Dealing with lines beginning with "#", including all GFF3 pragma lines (gff3sort issue, closed)

billzt commented on September 22, 2024

Dealing with lines beginning with "#", including all GFF3 pragma lines

from gff3sort.

Comments (2)

billzt commented on September 22, 2024

Pragmas (‘##’) have a meaning that may depend on their context, so they should not be placed arbitrarily in the file without careful consideration.

In the current script, all pragmas (‘##’) are written at the top of the file. This is wrong.

Thanks to Miklos Csuros for the review.

billzt commented on September 22, 2024

Possible pragma lines:

##gff-version 3.2.1
Required; must be the first line of the file
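
The first-line requirement is easy to check before any sorting is done. A minimal sketch in Python, not taken from gff3sort itself; the function name has_version_pragma is hypothetical, and the check assumes the pragma and its version are separated by whitespace:

```python
def has_version_pragma(lines):
    """Return True if the first line is a ##gff-version pragma for GFF version 3."""
    if not lines:
        return False
    fields = lines[0].split()
    # Expect exactly two fields: "##gff-version" and a version such as 3 or 3.2.1
    return (
        len(fields) == 2
        and fields[0] == "##gff-version"
        and fields[1].split(".")[0] == "3"
    )
```

A sorter could call this on the input and then write that same line back as the first line of the output, whatever happens to the other pragmas.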

##sequence-region seqid start end
Optional
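
Since this pragma names a single seqid, a sorter might want to read its fields before deciding where the line should go. A rough sketch in Python, assuming whitespace-separated fields as in the line above; SequenceRegion and parse_sequence_region are hypothetical names, not part of gff3sort:

```python
from typing import NamedTuple, Optional


class SequenceRegion(NamedTuple):
    seqid: str
    start: int
    end: int


def parse_sequence_region(line: str) -> Optional[SequenceRegion]:
    """Parse '##sequence-region seqid start end'; return None for any other line."""
    fields = line.split()
    if len(fields) != 4 or fields[0] != "##sequence-region":
        return None
    try:
        start, end = int(fields[2]), int(fields[3])
    except ValueError:
        # start or end was not an integer, so treat the line as unparseable
        return None
    return SequenceRegion(fields[1], start, end)
```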

##feature-ontology URI
##attribute-ontology URI
##source-ontology URI
##species NCBI_Taxonomy_URI
##genome-build source buildName

###
This directive (three # signs in a row) indicates that all forward references to feature IDs that have been seen to this point have been resolved. After seeing this directive, a program that is processing the file serially can close off any open objects that it has created and return them, thereby allowing iterative access to the file. Otherwise, software cannot know that a feature has been fully populated by its subfeatures until the end of the file has been reached. It is recommended that complex features, such as the canonical gene, be terminated with the ### notation.
(To be indexed properly by tabix, features must be sorted strictly by their start position, regardless of which gene they belong to. This matters for overlapping genes, whose features may interleave with each other. However, we are considering adding a "#" mark to each block, as an option.)
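
The ordering described in the note above can be sketched as follows. This is only an illustration in Python, not the gff3sort implementation: sort_features is a hypothetical name, the input is assumed to be tab-separated GFF3 lines with the start coordinate in column 4, and lines beginning with "#" are simply skipped here because their handling is a separate question. Grouping by seqid and ordering by start mirrors the common `sort -k1,1 -k4,4n` preparation step.

```python
def sort_features(lines):
    """Sort GFF3 feature lines by (seqid, start); skip '#' lines and blank lines."""
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # pragmas and comments are not handled in this sketch
        cols = line.split("\t")
        features.append((cols[0], int(cols[3]), line))
    # list.sort() is stable, so features with equal keys keep their input order
    features.sort(key=lambda f: (f[0], f[1]))
    return [f[2] for f in features]
```

Because the sort is stable, a gene listed before its mRNA at the same start position stays ahead of it, but features of two overlapping genes end up interleaved, which is why a "###" line from the input no longer marks a meaningful boundary afterwards.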

##FASTA
This notation indicates that the annotation portion of the file is at an end and that the remainder of the file contains one or more sequences (nucleotide or protein) in FASTA format. This allows features and sequences to be bundled together. All FASTA sequences included in the file must be included together at the end of the file and may not be interspersed with the features lines. Once a ##FASTA section is encountered no other content beyond valid FASTA sequence is allowed.
(Tabix cannot handle FASTA. Any such FASTA block should therefore be removed from the annotation and stored separately.)
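
Separating the FASTA block could look like the sketch below. It is written in Python for illustration only; split_fasta is a hypothetical helper, and it assumes the pragma appears on a line of its own, spelled exactly ##FASTA:

```python
def split_fasta(lines):
    """Split GFF3 lines into (annotation, fasta) at the first ##FASTA pragma."""
    for i, line in enumerate(lines):
        if line.strip() == "##FASTA":
            # Everything before the pragma is annotation; everything after is sequence
            return lines[:i], lines[i + 1:]
    # No ##FASTA pragma: the whole input is annotation
    return list(lines), []
```

The annotation part can then be sorted and indexed, while the sequence part is written to its own file.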
