

adf-ncgr commented on September 3, 2024

Got the initial "complete loading" finished over the weekend. There are some issues to discuss, but there's plenty of data to do development with.

by adf_ncgr

from gcv.

adf-ncgr commented on September 3, 2024

Noting some inconsistencies in the source files for future discussion with Wei and Steven. I corrected a few very minor issues in place, but for most I just took the necessary actions on my "munged" copies to get them to load.

  • Occasionally, the ##gff-version directive is misspelled, which causes the loader to fail.
  • There seems to be some mangling of the data in the matches property of the medtr comparisons to the Arachis species, e.g.: Name=Aradu.A06;matches=Aradu.A06103325318..103727484;median_Ks=0.8496
    (note the missing separator between A06 and the coordinates). This is easily fixed, but I left it in the main files in case you want to try to track down how it got introduced in these two cases but not systematically in all of them.
  • Occasionally, files contain a directive that the Perl loader doesn't seem to understand, which causes it to error; e.g.:
    ##genome-build JGI 2.0
    ##genome_build cicar ICC 4958 v2.0
    (Possibly one of these spellings of the directive is understood by the loader and the other is incorrect, causing the error I recall having to fix.)
  • Naming of chickpea pseudomolecules is a little inconsistent. I think in comparisons between the two chickpea genomes, Cicar1 is used to indicate chromosome 1 of the ICC4958 genome, but everywhere else chromosome 1 of both ICC4958 and CDCFrontier is referenced as Ca1. This forces some contortions in the renaming procedure I used to get everything consistently into the cicar.ICC4958.v2.Ca_LG_1 format (how they are named in chado).
  • Some of the file naming conventions are inconsistent: all the red clover files use a capital .X. whereas everyone else uses a lowercase .x.; also, the file name "glyma_Wm82_a2_recent_duplication.gff" is unusual, since most self-comparisons use the genome.x.genome naming convention.
  • I want to better understand how decisions about renaming sequences are made for non-pseudomolecule cases. For example, the original genome file for red clover used names like "Tp57577_TGAC_v2_scaf_3109" for scaffolds, which were converted to "scaffold_3109" in the genome files used by the browsers. By contrast, the vigan genome naming appears to preserve the distinction made in the original between names like SuperScaf_41 and scaffold_2139.
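A rough pre-load check for a few of these problems could save some trial and error with the loader. The Python sketch below is hypothetical (lint_gff_lines is not part of any existing loader). It flags a first line that isn't ##gff-version, any ## directive outside the set defined in the GFF3 specification (assuming the Perl loader accepts roughly that set, which I haven't confirmed), and matches values whose coordinates run straight into the sequence name, using the Name attribute as the reference the way it appears in the Aradu example. That last check is only a heuristic and could misfire if Name isn't the full matched sequence name.

```python
from typing import List

# Directive names defined in the GFF3 specification ("###" is handled separately).
# Assumption: the Perl loader accepts roughly this set; not confirmed.
GFF3_DIRECTIVES = {
    "gff-version", "sequence-region", "feature-ontology", "attribute-ontology",
    "source-ontology", "species", "genome-build", "FASTA",
}


def lint_gff_lines(lines: List[str]) -> List[str]:
    """Return warnings for the lines of one GFF file (hypothetical helper)."""
    warnings = []
    # GFF3 requires the very first line to be the ##gff-version directive.
    if not lines or not lines[0].startswith("##gff-version"):
        warnings.append("line 1: missing or misspelled ##gff-version directive")
    for n, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("###"):
            continue  # "###" only marks that forward references are resolved
        if line.startswith("##"):
            words = line[2:].split()
            name = words[0] if words else ""
            if name not in GFF3_DIRECTIVES:
                warnings.append(f"line {n}: unrecognized directive ##{name}")
            if name == "FASTA":
                break  # everything after ##FASTA is sequence data, not features
            continue
        if line.startswith("#") or not line.strip():
            continue  # ordinary comment or blank line
        cols = line.split("\t")
        if len(cols) != 9:
            warnings.append(f"line {n}: expected 9 tab-separated columns, got {len(cols)}")
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        seq, match = attrs.get("Name", ""), attrs.get("matches", "")
        # Heuristic: flag values like matches=Aradu.A06103325318..103727484,
        # where the coordinates run straight into the sequence name from Name.
        if seq and match.startswith(seq) and match[len(seq):len(seq) + 1].isdigit():
            warnings.append(f"line {n}: matches={match} has no separator after {seq}")
    return warnings
```

Run over each file's lines (e.g. open(path).readlines()) before handing it to the loader; an empty list means none of these particular problems were found, not that the file will necessarily load.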

adf-ncgr commented on September 3, 2024

A more serious issue I've just encountered: there appear to be asymmetries within the soybean self-comparison file "glyma_Wm82_a2_recent_duplication.gff". For example, Chr04 is listed as the src feature for only one block, but it is the target for many.
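One way to quantify the asymmetry is to count, for each sequence, how many blocks list it as the source versus as the target. The sketch below assumes (unverified against the actual file) that column 1 holds the source sequence and the Name attribute holds the target, as in the Aradu example in my earlier comment, and that both use the same sequence naming. It also assumes a symmetric self-comparison should list every block in both directions, so any sequence whose two counts differ gets reported. block_counts and unbalanced are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def block_counts(lines: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each sequence to (blocks where it is the source, blocks where it is the target).

    Assumptions (not confirmed against the real file): column 1 is the source
    sequence, the Name attribute is the target, and both use the same naming.
    """
    as_src: Counter = Counter()
    as_tgt: Counter = Counter()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("#") or not line.strip():
            continue  # directives, comments, blank lines
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # malformed rows are skipped here, not reported
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        as_src[cols[0]] += 1
        if "Name" in attrs:
            as_tgt[attrs["Name"]] += 1
    return {s: (as_src[s], as_tgt[s]) for s in sorted(set(as_src) | set(as_tgt))}


def unbalanced(counts: Dict[str, Tuple[int, int]]) -> List[str]:
    """Sequences whose source and target counts differ.

    Assumes a symmetric self-comparison lists every block in both directions.
    """
    return [s for s, (n_src, n_tgt) in counts.items() if n_src != n_tgt]
```

If the file turns out to list each duplicated pair only once by design, the counts would still be informative, but the "should be equal" assumption in unbalanced would need to be dropped.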

There are also some general questions about how to deal with the fact that some pairs of species were not included in block generation but clearly have syntenic relationships in the context viewer (e.g. glyma does not have blocks computed with respect to either cicar genome). This seems to fall under the following bullet from our "Housing a new genome" SOP:
Determine the list of species for calculating synteny with the new species (SC)
but the option to ignore some pairs seems a little more problematic given the application at hand. Perhaps worth discussing.
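To get a full list of species pairs that currently lack blocks, something like the sketch below could work from the file names alone. It assumes the genome.x.genome convention, matches the separator case-insensitively to cover the red clover .X. files, takes the first dot-separated token of each side as the species prefix, and silently skips names like glyma_Wm82_a2_recent_duplication.gff that don't follow the convention. compared_pairs and missing_pairs are hypothetical names, the species list has to be supplied by hand, and self-comparisons are not checked.

```python
import re
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple


def compared_pairs(filenames: Iterable[str]) -> Set[FrozenSet[str]]:
    """Species pairs that already have a block file (hypothetical helper)."""
    pairs: Set[FrozenSet[str]] = set()
    for name in filenames:
        # Drop the extension (.gff or .gff3) before looking for the separator.
        stem = name.rsplit(".gff", 1)[0]
        # Match ".x." case-insensitively, since the red clover files use ".X.".
        sides = re.split(r"\.[xX]\.", stem)
        if len(sides) != 2:
            continue  # not in genome.x.genome form, e.g. the glyma duplication file
        # Assumption: the first dot-separated token is the species prefix.
        a, b = (side.split(".")[0].lower() for side in sides)
        pairs.add(frozenset((a, b)))
    return pairs


def missing_pairs(species: Iterable[str], pairs: Set[FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Distinct species pairs with no block file; self-comparisons are not checked."""
    return [(a, b) for a, b in combinations(sorted(species), 2)
            if frozenset((a, b)) not in pairs]
```

Because the species prefix is taken from the file name, both cicar genomes collapse into a single "cicar" entry here; distinguishing ICC4958 from CDCFrontier would need the second token as well.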
