gff3sort's People

Contributors

sestaton

gff3sort's Issues

sorting fails for this example

Chr01 KBase exon 32364652 32365325 . + 0 ID=Potri.001G319000.1_exon_1; Parent=Potri.001G319000.1^M
Chr01 KBase gene 32364652 32372766 . + 0 ID=Potri.001G319000; JGI=fgenesh4_pg.C_LG_I002433; product=similar to RNA recognition motif %28RRM%29-containing protein%3B %5B co-ortholog %282of2%29 of At3g23900%2C %5D %28EC defLine%29; go=GO:0003676^M
Chr01 KBase mRNA 32364652 32372766 . + 0 ID=Potri.001G319000.1; Parent=Potri.001G319000; JGI=fgenesh4_pg.C_LG_I002433; product=similar to RNA recognition motif %28RRM%29-containing protein%3B %5B co-ortholog %282of2%29 of At3g23900%2C %5D %28EC defLine%29; go=GO:0003676^M
Chr01 KBase CDS 32364761 32365325 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase CDS 32365997 32366437 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32365997 32366437 . + 0 ID=Potri.001G319000.1_exon_2; Parent=Potri.001G319000.1^M
Chr01 KBase CDS 32366615 32367131 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32366615 32367131 . + 0 ID=Potri.001G319000.1_exon_3; Parent=Potri.001G319000.1^M
Chr01 KBase CDS 32369735 32371072 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32369735 32371072 . + 0 ID=Potri.001G319000.1_exon_4; Parent=Potri.001G319000.1^M
Chr01 KBase CDS 32371158 32371158 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32371158 32371592 . + 0 ID=Potri.001G319000.1_exon_5; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32372071 32372144 . + 0 ID=Potri.001G319000.1_exon_6; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32372359 32372766 . + 0 ID=Potri.001G319000.1_exon_7; Parent=Potri.001G319000.1^M

dealing with lines beginning with "#", including all GFF3 pragma lines

https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md

Lines beginning with '##' are directives (sometimes called pragmas or meta-data) and provide meta-information about the document as a whole.

Lines beginning with a single '#' are used for human-readable comments and can be ignored by parsers.

End-of-line comments (comments preceded by # at the end of and on the same line as a feature or directive line) are not allowed.
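
The three rules quoted above can be sketched as a small classifier. This is only an illustration, not code from gff3sort: the name classify_gff3_line and its return values are hypothetical. It tests for '##' before the single '#', and strips a trailing CR so that files with Windows line endings are classified the same way.

```perl
use strict;
use warnings;

# Hypothetical helper: classify one GFF3 line following the rules quoted above.
# Returns 'directive', 'comment', 'blank', or 'feature'.
sub classify_gff3_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                     # drop an LF or CRLF line ending
    return 'blank'     if $line =~ /^\s*$/;
    return 'directive' if $line =~ /^##/;     # test '##' before the single '#'
    return 'comment'   if $line =~ /^#/;
    return 'feature';
}

my @lines = (
    "##gff-version 3\n",
    "# a human-readable note\n",
    "Chr01\tKBase\tgene\t1\t9\t.\t+\t.\tID=g1\n",
);
print join(',', map { classify_gff3_line($_) } @lines), "\n";
# prints: directive,comment,feature
```

A sorter built on this could write directives and comments through unchanged and sort only the lines classified as features; whether that matches the intended behaviour is for the maintainers to decide.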

using Pod::Usage

Consider using Pod::Usage to show embedded POD documentation to the user instead of relying on perldoc.

Square brackets around a parameter usually denote an optional parameter. Placing them around the GFF3 file therefore suggests that the user need not specify it.

Improve the help output on the command line. For example, it is unclear whether optional parameters should be placed before or after the input GFF3 file.
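
A minimal sketch of how this could look with Pod::Usage and Getopt::Long, both core modules. The POD text, the --help option, and the parse_args name are illustrative assumptions, not the script's actual interface. The SYNOPSIS uses angle brackets for the required input file and square brackets only for optional parameters.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Pod::Usage qw(pod2usage);

=head1 NAME

gff3sort.pl - sort a GFF3 file (illustrative POD, not the script's actual text)

=head1 SYNOPSIS

gff3sort.pl [--help] <input.gff3>

=head1 OPTIONS

=over 4

=item B<--help>

Print this message and exit.

=back

=cut

# Hypothetical helper: return the input path, or print usage built from the
# POD above and exit with a non-zero status on misuse.
sub parse_args {
    my @args = @_;
    my $help = 0;
    GetOptionsFromArray(\@args, 'help|h' => \$help)
        or pod2usage(-exitval => 2, -verbose => 1);
    pod2usage(-exitval => 0, -verbose => 1) if $help;
    pod2usage(-exitval => 2, -verbose => 1,
              -message => 'Exactly one input GFF3 file is required.')
        unless @args == 1;
    return $args[0];
}
```

The script would call parse_args(@ARGV) near the top. With Getopt::Long's default settings, options are accepted both before and after the input file, which the help text could state explicitly.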

increase the minimum Perl version from 5.010 to 5.10.1

The smart match operator (~~) is a concern because its behaviour may change between Perl versions. We should raise the minimum Perl version from 5.010 (that is, 5.10.0) to 5.10.1: the operator was introduced in 5.10.0, and its behaviour changed between 5.10.0 and 5.10.1.
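
As a sketch of the change: "use 5.010" accepts Perl 5.10.0, whereas "use 5.010001" (equivalent to "use v5.10.1") refuses to compile on anything older than 5.10.1. The contains helper is a hypothetical illustration of a membership test that does not depend on ~~ at all; it is not taken from the script.

```perl
use 5.010001;      # requires Perl 5.10.1 or later; "use v5.10.1" is equivalent
use strict;
use warnings;

# Hypothetical helper: membership test written without ~~, so its behaviour
# does not depend on the smart match rules of a particular Perl version.
# Returns the number of list elements equal to $needle (0 means absent).
sub contains {
    my ($needle, @list) = @_;
    return scalar grep { $_ eq $needle } @list;
}

print contains('mRNA', qw(gene mRNA exon)) ? "found\n" : "missing\n";
```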

memory problem

The script reads the whole GFF3 file into memory to do the sort, so memory might be an issue for large files.

The first thing to do is measure how much memory it uses.

features with no ID attribute but with Parent attributes

The script assumes, incorrectly, that every feature with a Parent also has an ID. An ID is not required unless the feature itself has children, so such subfeatures would not be placed by the script.

push @{ $parent2children{$parent} }, $id;  # what happens when $id is undefined?

Thanks to Miklos Csuros for the review.

In fact, the current code is able to deal with this.

if (defined($id)) {
     $id2line{$id} = $line;
}
else {
     $id2line{$line} = $line;
}

Consider adding some comments here.
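
A self-contained sketch of the fallback, for illustration only. It reuses the hash names from the snippets above, but the index_feature name is hypothetical, and it assumes that the same fallback key (the full line) is also what gets pushed onto the parent's child list; the issue does not show that part of the script.

```perl
use strict;
use warnings;

my (%id2line, %parent2children);

# Hypothetical helper: index one feature line.
# A feature without an ID attribute is keyed by its full line instead.
sub index_feature {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                            # drop the line ending
    my $attributes = (split /\t/, $line)[8] // '';   # column 9 holds attributes
    my ($id)     = $attributes =~ /(?:^|;)\s*ID=([^;]+)/;
    my ($parent) = $attributes =~ /(?:^|;)\s*Parent=([^;]+)/;
    my $key = defined $id ? $id : $line;             # fallback key, as in the snippet above
    $id2line{$key} = $line;
    # Assumption: the child list stores the same key, so an ID-less child
    # can still be looked up in %id2line later.
    push @{ $parent2children{$parent} }, $key if defined $parent;
    return $key;
}

index_feature("Chr01\tKBase\tmRNA\t1\t90\t.\t+\t.\tID=mRNA1\n");
index_feature("Chr01\tKBase\tCDS\t1\t9\t.\t+\t0\tParent=mRNA1\n");
print scalar @{ $parent2children{mRNA1} }, " child placed under mRNA1\n";
```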

features whose attributes contain spaces

Both the Parent and ID attributes may be parsed incorrectly if the field contains spaces, as in "ID=exon1 ; Parent=transcript1 ; Name=first exon".

Thanks to Miklos Csuros for the review.

In fact, the current code is able to deal with this.

my ($id) = $note =~ /ID=([^;]+)/;
my ($parent) = $note =~ /Parent=([^;]+)/;

Consider adding some comments here.
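
The two captures above take everything up to the next ";", so in the example attribute string the captured ID is "exon1 " with its trailing space. Lookups stay consistent as long as an ID and the Parent values that refer to it are written the same way. If trimming is wanted, a sketch like the following could be used; attribute_value is a hypothetical helper, not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper: extract one attribute value from column 9 and trim
# the whitespace around it. Spaces inside the value are kept.
sub attribute_value {
    my ($attributes, $tag) = @_;
    my ($value) = $attributes =~ /(?:^|;)\s*\Q$tag\E\s*=\s*([^;]*)/;
    return unless defined $value;
    $value =~ s/\s+\z//;          # "exon1 " becomes "exon1"
    return $value;
}

my $note = 'ID=exon1 ; Parent=transcript1 ; Name=first exon';
printf "[%s] [%s] [%s]\n",
    map { attribute_value($note, $_) } qw(ID Parent Name);
# prints: [exon1] [transcript1] [first exon]
```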

Potential idea: turn into perl module?

If this library were structured as a Perl module, JBrowse could potentially install it and use it in its own scripts as part of a track loading pipeline.

sort the chromosomes by natural order/original order

Currently the script simply sorts chromosomes alphabetically. We could add an option that lets users sort in natural order, so that Chr2 is placed before Chr10,

or in original order (features on the same chromosome must still be kept together).

Thanks to Dr. Miklos Csuros for the comments.
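
Both orderings can be sketched in a few lines of core Perl. This is one possible approach under stated assumptions, not the script's implementation: natural_cmp and first_seen_rank are hypothetical names, and "natural order" is taken to mean that runs of digits are compared numerically while everything else is compared as text.

```perl
use strict;
use warnings;

# Compare two sequence names so that runs of digits are ordered numerically.
sub natural_cmp {
    my ($x, $y) = @_;
    my @x = $x =~ /(\d+|\D+)/g;     # e.g. "Chr10" -> ("Chr", "10")
    my @y = $y =~ /(\d+|\D+)/g;
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $c = ($p =~ /^\d/ && $q =~ /^\d/)
              ? ($p <=> $q || $p cmp $q)
              : ($p cmp $q);
        return $c if $c;
    }
    return @x <=> @y;               # the name with chunks left over sorts last
}

# Map each sequence name to the position of its first appearance in the input.
sub first_seen_rank {
    my %rank;
    my $next = 0;
    for my $name (@_) {
        $rank{$name} = $next++ unless exists $rank{$name};
    }
    return \%rank;
}

my @seen    = qw(Chr2 Chr10 Chr1 Chr2 ChrX Chr10);
my $rank    = first_seen_rank(@seen);
my @natural = sort { natural_cmp($a, $b) } keys %$rank;
my @input   = sort { $rank->{$a} <=> $rank->{$b} } keys %$rank;
print "natural:  @natural\n";   # natural:  Chr1 Chr2 Chr10 ChrX
print "original: @input\n";     # original: Chr2 Chr10 Chr1 ChrX
```

Sorting by first-seen rank keeps all features of one chromosome together even when the input interleaves them, because each name receives exactly one rank.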
