Comments (2)
Pragmas (‘##’) have a meaning that may depend on their context, so they should not be placed arbitrarily in the file without careful consideration.
In the current script, all the pragmas (‘##’) are written at the front of the file. This is wrong.
Thanks to Miklos Csuros’ review
from gff3sort.
Possible pragma lines:
##gff-version 3.2.1
Required; must be the first line of the file.
##sequence-region seqid start end
Optional
##feature-ontology URI
##attribute-ontology URI
##source-ontology URI
##species NCBI_Taxonomy_URI
##genome-build source buildName
###
This directive (three # signs in a row) indicates that all forward references to feature IDs that have been seen to this point have been resolved. After seeing this directive, a program that is processing the file serially can close off any open objects that it has created and return them, thereby allowing iterative access to the file. Otherwise, software cannot know that a feature has been fully populated by its subfeatures until the end of the file has been reached. It is recommended that complex features, such as the canonical gene, be terminated with the ### notation.
(To be indexed properly by tabix, features must be sorted strictly by their start position, regardless of which gene they belong to. This matters for overlapping genes, whose features may interleave with each other. However, we are considering adding an option that places a # mark in each block.)
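As a rough illustration of the serial processing that the ### directive enables, the sketch below groups feature lines into blocks and hands each block back as soon as a ### line is seen. It is written in Python for brevity and is not taken from gff3sort; the function name `iter_resolved_blocks` is hypothetical, and the sketch assumes that other comment and pragma lines can simply be skipped.

```python
from typing import Iterable, Iterator, List


def iter_resolved_blocks(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield lists of feature lines, one list per '###'-terminated block.

    Hypothetical helper for illustration only (not part of gff3sort).
    """
    block: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.strip() == "###":
            # All forward references are resolved: release the block.
            if block:
                yield block
                block = []
        elif line.startswith("##FASTA"):
            # The annotation portion ends here; stop reading features.
            break
        elif line.startswith("#") or not line.strip():
            # Assumption: other pragmas, comments and blank lines are skipped.
            continue
        else:
            block.append(line)
    # Without a trailing '###', the last block is only complete at end of file.
    if block:
        yield block
```

A caller can then work on one block at a time, for example `for block in iter_resolved_blocks(open(path)): ...`, instead of holding the whole file in memory.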
##FASTA
This notation indicates that the annotation portion of the file is at an end and that the remainder of the file contains one or more sequences (nucleotide or protein) in FASTA format. This allows features and sequences to be bundled together. All FASTA sequences included in the file must be included together at the end of the file and may not be interspersed with the features lines. Once a ##FASTA section is encountered no other content beyond valid FASTA sequence is allowed.
(Tabix cannot handle FASTA. Therefore any such FASTA block should be removed from the file and stored separately.)
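One possible way to separate the FASTA block before sorting and indexing is sketched below in Python. This is an assumption about how the split could be done, not the behaviour of gff3sort; the function name `split_annotation_and_fasta` is hypothetical.

```python
from typing import Iterable, List, Tuple


def split_annotation_and_fasta(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split GFF3 lines into (annotation lines, FASTA lines).

    Hypothetical helper for illustration only. The '##FASTA' marker line
    itself is dropped, since it belongs to neither output.
    """
    annotation: List[str] = []
    fasta: List[str] = []
    in_fasta = False
    for line in lines:
        if not in_fasta and line.strip() == "##FASTA":
            # Everything after this marker must be FASTA sequence.
            in_fasta = True
            continue
        if in_fasta:
            fasta.append(line)
        else:
            annotation.append(line)
    return annotation, fasta
```

The annotation part can then be sorted and passed to tabix, while the FASTA part is written to its own file.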