billzt / gff3sort
GFF3sort: A Perl script to sort GFF3 files and produce suitable results for tabix tools
License: GNU General Public License v3.0
Chr01 KBase exon 32364652 32365325 . + 0 ID=Potri.001G319000.1_exon_1; Parent=Potri.001G319000.1^M
Chr01 KBase gene 32364652 32372766 . + 0 ID=Potri.001G319000; JGI=fgenesh4_pg.C_LG_I002433; product=similar to RNA recognition motif %28RRM%29-containing protein%3B %5B co-ortholog %282of2%29 of At3g23900%2C %5D %28EC defLine%29; go=GO:0003676^M
Chr01 KBase mRNA 32364652 32372766 . + 0 ID=Potri.001G319000.1; Parent=Potri.001G319000; JGI=fgenesh4_pg.C_LG_I002433; product=similar to RNA recognition motif %28RRM%29-containing protein%3B %5B co-ortholog %282of2%29 of At3g23900%2C %5D %28EC defLine%29; go=GO:0003676^M
Chr01 KBase CDS 32364761 32365325 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase CDS 32365997 32366437 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32365997 32366437 . + 0 ID=Potri.001G319000.1_exon_2; Parent=Potri.001G319000.1^M
Chr01 KBase CDS 32366615 32367131 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32366615 32367131 . + 0 ID=Potri.001G319000.1_exon_3; Parent=Potri.001G319000.1^M
Chr01 KBase CDS 32369735 32371072 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32369735 32371072 . + 0 ID=Potri.001G319000.1_exon_4; Parent=Potri.001G319000.1^M
Chr01 KBase CDS 32371158 32371158 . + 0 ID=Potri.001G319000.1.CDS; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32371158 32371592 . + 0 ID=Potri.001G319000.1_exon_5; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32372071 32372144 . + 0 ID=Potri.001G319000.1_exon_6; Parent=Potri.001G319000.1^M
Chr01 KBase exon 32372359 32372766 . + 0 ID=Potri.001G319000.1_exon_7; Parent=Potri.001G319000.1^M
https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md
Lines beginning with '##' are directives (sometimes called pragmas or meta-data) and provide meta-information about the document as a whole.
Lines beginning with a single '#' are used for human-readable comments and can be ignored by parsers.
End-of-line comments (comments preceded by # at the end of and on the same line as a feature or directive line) are not allowed.
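The distinction quoted above can be sketched as a small line classifier. This is only an illustration, not code from gff3sort: `classify_line` is a hypothetical helper, and the sample lines are made up. The `##` test has to come before the `#` test, and the line ending is stripped first so that Windows-style `^M` endings do not leak into the result.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gff3sort): classify one GFF3 line as
# 'blank', 'directive', 'comment', or 'feature'.
sub classify_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                  # drop an LF or CRLF line ending
    return 'blank'     if $line =~ /^\s*$/;
    return 'directive' if $line =~ /^##/;  # must be tested before the single '#'
    return 'comment'   if $line =~ /^#/;
    return 'feature';
}

# Made-up sample lines, one of each kind.
my @samples = (
    "##gff-version 3\n",
    "# a human-readable note\n",
    "Chr01\tKBase\tgene\t1\t9\t.\t+\t.\tID=g1\r\n",
);
print classify_line($_), "\n" for @samples;
```

A sorter would typically keep directive and comment lines aside and only sort the lines classified as features.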
Fixed in f5a8778
Could you create a new release which corresponds to the version of the tool at the time of publication?
Consider using Pod::Usage to show embedded POD documentation to the user instead of relying on perldoc.
Square brackets around a parameter usually denote an optional parameter. Using them around the GFF3 file therefore suggests that the user need not specify it.
Improve the help output on the command line. For example, it is unclear if optional parameters should be placed before or after the input GFF3 file.
The script uses the smart match operator (~~), whose behaviour may change between Perl versions. We should increase the minimum Perl version from 5.010 to 5.10.1, since this operator was not introduced until 5.10.0 and its behaviour changed between 5.10.0 and 5.10.1.
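One way to sidestep the version concern entirely is to avoid `~~` for membership tests. The sketch below shows two common replacements, assuming the operator is used to ask "is this string in this list?"; how gff3sort actually uses it is not shown in this issue, and `in_list`, `@seen_ids`, and `%seen` are illustrative names.

```perl
use strict;
use warnings;

# Illustrative data; the real script's variables may differ.
my @seen_ids = ('gene1', 'mRNA1', 'exon1');

# Replacement 1: a linear scan with grep, equivalent to `$needle ~~ @haystack`
# for plain strings.
sub in_list {
    my ($needle, @haystack) = @_;
    return scalar(grep { $_ eq $needle } @haystack);
}

# Replacement 2: build a hash once, then test membership in constant time.
my %seen = map { $_ => 1 } @seen_ids;

print "mRNA1 found by grep\n" if in_list('mRNA1', @seen_ids);
print "exon1 found by hash\n" if exists $seen{'exon1'};
print "cds1 not found\n"      if !exists $seen{'cds1'};
```

The hash form is usually preferable when the same list is queried many times, as in a loop over every feature line.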
The script reads the whole GFF3 file into memory to do the sort, so memory might be an issue for large files.
What we should do is measure how much memory the script uses.
The script assumes, incorrectly, that all items with a Parent have an ID. An ID is not required (unless the items themselves have children), and such subfeatures would not be placed by the script.
push @{ $parent2children{$parent} }, $id;  # what happens when $id is undefined?
Thanks to Miklos Csuros’ review
In fact, the current code is able to deal with this.
if (defined($id)) {
    $id2line{$id} = $line;
}
else {
    $id2line{$line} = $line;
}
Considering adding some comments here.
Thanks to Miklos Csuros’ review
"Parent" is not the same as "parent"
Thanks to Miklos Csuros’ review
Both Parent and ID attributes may be parsed incorrectly if the field has spaces as in "ID=exon1 ; Parent=transcript1 ; Name=first exon".
Thanks to Miklos Csuros’ review
In fact, the current code is able to deal with this.
my ($id) = $note =~ /ID=([^;]+)/;
my ($parent) = $note =~ /Parent=([^;]+)/;
Considering adding some comments here.
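With the reviewer's example string, the `[^;]+` pattern captures `exon1 ` including the trailing space, which still works as a lookup key as long as the same spacing appears on both the ID and the Parent side. The sketch below is a stricter variant that trims whitespace and anchors the key at an attribute boundary. It is an illustration only: `get_attribute` is a hypothetical helper and not a function in gff3sort. Matching stays case-sensitive, so `Parent` is found and `parent` is not.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gff3sort): return the value of one
# attribute from a GFF3 column-9 string, or undef if the key is absent.
sub get_attribute {
    my ($attributes, $key) = @_;
    # The key must start the string or follow a ';', so 'ID' cannot match
    # inside a longer key. \Q...\E quotes any metacharacters in the key.
    my ($value) = $attributes =~ /(?:^|;)\s*\Q$key\E\s*=\s*([^;]*)/;
    return unless defined $value;
    $value =~ s/\s+\z//;   # trim trailing spaces and a stray carriage return
    return $value;
}

my $note = 'ID=exon1 ; Parent=transcript1 ; Name=first exon';
for my $key (qw(ID Parent Name parent)) {
    my $value = get_attribute($note, $key);
    printf "%-6s => %s\n", $key, defined $value ? "'$value'" : 'undef';
}
```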
If this library were structured as a Perl module, then JBrowse could potentially install it and use it in its own scripts as part of a track loading pipeline.
if the feature has multiple parents (given as a comma-separated list)
Thanks to Miklos Csuros’ review
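A minimal sketch of how a comma-separated Parent value could be handled, assuming a parent-to-children index shaped like the `%parent2children` hash quoted in an earlier issue. `register_child` is a hypothetical helper and the feature names are made up; whether the child should then be printed under each parent or only once is a design decision this sketch does not settle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of gff3sort): record a child under every
# parent named in a comma-separated Parent attribute value.
sub register_child {
    my ($index, $parent_attr, $child_key) = @_;
    for my $raw (split /,/, $parent_attr) {
        (my $parent = $raw) =~ s/^\s+|\s+$//g;   # trim spaces around each ID
        next if $parent eq '';                   # skip empty entries such as "a,,b"
        push @{ $index->{$parent} }, $child_key;
    }
    return;
}

my %parent2children;
register_child(\%parent2children, 'mRNA1,mRNA2', 'exon1');   # shared exon
register_child(\%parent2children, 'mRNA2',       'exon2');

for my $parent (sort keys %parent2children) {
    print "$parent: @{ $parent2children{$parent} }\n";
}
```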
Currently this script simply sorts chromosomes alphabetically. We could add an option to let users sort in natural order, so that Chr2 is placed before Chr10,
or in the original order (although features on the same chromosome must still be grouped together).
Thanks to Dr. Miklos Csuros's comments
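A natural-order comparison can be sketched by splitting each name into runs of digits and non-digits, comparing digit runs numerically and everything else as strings. This is an assumption about how such an option might work, not the script's implementation; `natural_cmp` and the chromosome names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical comparator (not part of gff3sort): compare two sequence names
# chunk by chunk, numerically where both chunks are digit runs.
sub natural_cmp {
    my ($x, $y) = @_;
    my @x = $x =~ /(\d+|\D+)/g;   # e.g. 'Chr10' becomes ('Chr', '10')
    my @y = $y =~ /(\d+|\D+)/g;
    while (@x && @y) {
        my ($p, $q) = (shift @x, shift @y);
        my $c = ($p =~ /^\d/ && $q =~ /^\d/) ? ($p <=> $q) : ($p cmp $q);
        return $c if $c;
    }
    return @x <=> @y;             # the name with fewer chunks sorts first
}

my @sorted = sort { natural_cmp($a, $b) } qw(Chr10 Chr2 Chr1 ChrX scaffold_3);
print "@sorted\n";   # Chr1 Chr2 Chr10 ChrX scaffold_3
```

Sorting in original order would instead need a hash that records the position at which each chromosome name is first seen, with that position used as the sort key.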