gff3.jl's Introduction

GFF3.jl

Project Status: Active - The project has reached a stable, usable state and is being actively developed.

This project follows the semver pro forma and uses the git-flow branching model.

Description

The GFF3 package provides I/O and utilities for the GFF3 file format.

Installation

You can install the GFF3 package from the Julia REPL. Press ] to enter pkg mode, then enter the following command:

add GFF3

If you are interested in the cutting edge of the development, please check out the develop branch to try new features before release.

Testing

GFF3 is tested against Julia 1.X on Linux, OS X, and Windows.

Contributing

We appreciate contributions from users including reporting bugs, fixing issues, improving performance and adding new features.

Take a look at the contributing files for detailed contributor and maintainer guidelines, and the code of conduct.

Financial contributions

We also welcome financial contributions in full transparency on our open collective. Anyone can file an expense. If the expense makes sense for the development of the community, it will be approved by the core contributors and the person who filed the expense will be reimbursed.

Backers & Sponsors

Thank you to all our backers and sponsors!

Love our work and community? Become a backer.

Does your company use BioJulia? Help keep BioJulia feature rich and healthy by sponsoring the project. Your logo will show up here with a link to your website.

Questions?

If you have a question about contributing or using BioJulia software, come on over and chat to us on the Julia Slack workspace, or you can try the Bio category of the Julia discourse site.

gff3.jl's Issues

bump compat for FASTX to v2

Could you add FASTX.jl v2 to the compat?
GFF3.jl has only small dependencies on FASTX.jl and it will require only simple changes.

Unexpected behaviour for `GFF3.strand`

When given an unstranded feature, strand doesn't return the expected results.

GFF3 specifies that features where strand is . are unstranded (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md):

Column 7: "strand"
The strand of the feature. + for positive strand (relative to the landmark), - for minus strand, and . for features that are not stranded. In addition, ? can be used for features whose strandedness is relevant, but unknown

Thus, when using GFF3.strand on an unstranded feature (like a chromosome) I would expect it to return a GenomicFeatures.STRAND_BOTH. Instead the function raises an error.

Expected Behavior

genome_file = "genomes/Mus_musculus.GRCm38.102.gff3.gz"
reader = open(genome_file, "r") |> GzipDecompressorStream |> GFF3.Reader
record = read(reader)
GFF3.strand(record)

returns

STRAND_BOTH

Current Behavior

genome_file = "genomes/Mus_musculus.GRCm38.102.gff3.gz"
reader = open(genome_file, "r") |> GzipDecompressorStream |> GFF3.Reader
record = read(reader)
GFF3.strand(record)

returns

ERROR: strand is missing
Stacktrace:
 [1] missingerror(field::Symbol)
   @ BioCore.Exceptions ~/.julia/packages/BioCore/YBJvb/src/Exceptions.jl:22
 [2] strand(record::GFF3.Record)
   @ GFF3 ~/.julia/packages/GFF3/b3VT6/src/record.jl:363
 [3] top-level scope
   @ REPL[66]:1

Possible Solution / Implementation

The implementation checks if the field is missing (i.e. set to . or 0x2e):

if ismissing(record, record.strand)

Handling a missing field by returning GenomicFeatures.STRAND_BOTH should produce the expected results.
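The proposed mapping can be sketched without touching the package. The `StrandKind` enum and the `parse_strand` function below are hypothetical stand-ins defined only for this illustration (they are not part of GFF3.jl or GenomicFeatures.jl); the four cases follow the column 7 text quoted from the specification above.

```julia
# Hypothetical stand-in for a strand type, defined locally so the sketch
# runs without any packages installed.
@enum StrandKind STRAND_UNKNOWN STRAND_POS STRAND_NEG STRAND_BOTH

# Map the raw column-7 byte to a strand value.
function parse_strand(byte::UInt8)
    byte == UInt8('+') && return STRAND_POS
    byte == UInt8('-') && return STRAND_NEG
    byte == UInt8('.') && return STRAND_BOTH     # unstranded, e.g. a chromosome
    byte == UInt8('?') && return STRAND_UNKNOWN  # strandedness relevant but unknown
    throw(ArgumentError("invalid strand byte: $(repr(Char(byte)))"))
end

# Convenience method for ASCII characters.
parse_strand(c::Char) = parse_strand(UInt8(c))
```

With this mapping, `.` (0x2e) is treated as a valid value rather than as a missing field, so callers do not need a separate check for unstranded features.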

Other

If there is some reason for this behaviour which is not obvious to me, please tell me and I will gladly be put into my place.
For me, this makes my code more complicated than it has to be, as I have to perform extra checks to see if something is unstranded.

Further, whether this should also apply to GFF3.phase is also a question, as that function also checks this. Here however a . is not specified to mean anything, so likely this shouldn't apply.

Accessors (like `GFF3.source`) should return missing instead of throwing exceptions

The current behaviour makes things like filtering records by something like source difficult if any sources are missing (.).

I feel like returning missing might be a bit more idiomatic? It would also make the following code less perilous...

Iterators.filter(r -> GFF3.source(r) == "Pfam", reader)

which is the code currently crashing my whole script when it hits a single record missing a source.

Let me know if others agree that the API should return missing instead of throwing exceptions, and I'm more than happy to file a PR!
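Until the API changes, a user-side workaround might look like the sketch below. `or_missing` and `getsource` are hypothetical names, and the records are plain named tuples standing in for `GFF3.Record`, so the example runs without the package. Note that `missing == "Pfam"` evaluates to `missing`, which a filter cannot use as a `Bool`, so even with accessors that return `missing` the comparison is safer written with `isequal`.

```julia
# Hypothetical helper: run an accessor and turn any thrown exception into `missing`.
# Note that this swallows every error, not only "field is missing" errors.
function or_missing(accessor, record)
    try
        return accessor(record)
    catch
        return missing
    end
end

# Stand-in records: named tuples, with `nothing` marking an absent source.
records = [(source = "Pfam",), (source = nothing,), (source = "Ensembl",)]

# Stand-in accessor that throws on an absent field, mimicking the reported behaviour.
getsource(r) = r.source === nothing ? error("source is missing") : r.source

# `isequal` always returns true or false, so the filter never sees `missing`.
pfam = collect(Iterators.filter(r -> isequal(or_missing(getsource, r), "Pfam"), records))
```

With the real package, the same pattern would be `isequal(or_missing(GFF3.source, r), "Pfam")`.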

GFF3 Record Encode Error in Attributes like ";product=xxx (a; b) xxx"

The code:

using GFF3

r = GFF3.Record("Ga0225945_11\timg_core_v400\tCDS\t350909\t352399\t.\t-\t0\tID=2800905551;locus_tag=Ga0225945_11352;product=respiratory nitrite reductase (cytochrome; ammonia-forming) precursor")

Expected Behavior

Expect product attribute to be ["respiratory nitrite reductase (cytochrome; ammonia-forming) precursor"]

Current Behavior

ERROR: ArgumentError: failed to index Any ~>""
Stacktrace:
 [1] macro expansion
   @ C:\Users\x\.julia\packages\GFF3\RXGVR\src\reader.jl:310 [inlined]
 [2] index!(stream::TranscodingStreams.NoopStream{IOBuffer}, record::GFF3.Record)
   @ GFF3 C:\Users\x\.julia\packages\Automa\1KOLQ\src\Stream.jl:126
 [3] index!
   @ C:\Users\x\.julia\packages\GFF3\RXGVR\src\reader.jl:118 [inlined]
 [4] convert
   @ C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:54 [inlined]
 [5] Record
   @ C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:44 [inlined]
 [6] convert(#unused#::Type{GFF3.Record}, str::String)
   @ GFF3 C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:69
 [7] GFF3.Record(str::String)
   @ GFF3 C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:65
 [8] top-level scope
   @ none:1

Possible Solution / Implementation

Treat ; in (), [] and {} as a non separator.
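An alternative on the writing side: the GFF3 specification reserves `;`, `=`, `&` and `,` in column 9 and asks for them to be percent-encoded when they appear inside a value. The sketch below shows such an escaping step in plain Julia. `escape_attribute` is a hypothetical helper, not a GFF3.jl function, and whether the reader decodes `%3B` back into `;` is not something this sketch verifies.

```julia
# Characters with a reserved meaning in column 9 of GFF3, plus '%' itself.
reserved_chars = (';', '=', '&', ',', '%')

# Percent-encode reserved characters and ASCII control characters
# (tab, newline, ...) in an attribute value.
function escape_attribute(value::AbstractString)
    io = IOBuffer()
    for c in value
        if c in reserved_chars || (isascii(c) && iscntrl(c))
            print(io, '%', uppercase(string(UInt32(c), base = 16, pad = 2)))
        else
            print(io, c)
        end
    end
    return String(take!(io))
end

product = "respiratory nitrite reductase (cytochrome; ammonia-forming) precursor"
escaped = escape_attribute(product)
```

Here `escaped` contains `cytochrome%3B ammonia-forming`, so the value no longer holds a bare `;` that could be read as an attribute separator.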

Your Environment

  • Package Version used: [af1dc308] GFF3 v0.2.1
  • Julia Version used: 1.6.1
  • Operating System and version (desktop or mobile): Windows 11
  • Link to your project: NA
(@v1.6) pkg> status
      Status `C:\Users\x\.julia\environments\v1.6\Project.toml`
  [c7e460c6] ArgParse v1.1.4
  [c52e3926] Atom v0.12.34
  [336ed68f] CSV v0.9.1
  [a93c6f00] DataFrames v1.2.2
  [1313f7d8] DataFramesMeta v0.9.1
  [31c24e10] Distributions v0.25.16
  [c2308a5c] FASTX v1.2.0
  [af1dc308] GFF3 v0.2.1
  [eeff360b] JobSchedulers v0.1.2
  [e5e0dc1b] Juno v0.8.4
  [ef544631] Pipelines v0.4.0
  [91a5bcdd] Plots v1.21.3
  [f3b207a7] StatsPlots v0.14.27
  [fdbf4ff8] XLSX v0.7.8
  [ddb6d928] YAML v0.4.7

Thank you.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

suspected typo in GFF3 Reader docs example

I suspect there's a typo in the latest docs for the GFF3 Reader.

It should say record rather than reader. Entering reader as it appears in the examples throws an error and entering record works.

It reads:

Iterate over records.

for record in reader
    # Do something on record (see Accessors section).
    seqid = GFF3.seqid(reader)
    # ...
end

But should read:

Iterate over records.

for record in reader
    # Do something on record (see Accessors section).
    seqid = GFF3.seqid(record)
    # ...
end

EOFError: read end of file

EOFError: read end of file

Hello, I use a while loop to read a GFF3 file, but I get this error. I do not know how it happens or how to solve it. Here is my code:

# Import the GFF3 module.
using GFF3

exonDict = Dict{String,Int64}()

# Open a GFF3 file.
reader = open(GFF3.Reader, "Homo_sapiens.GRCh38.106.gff3")

# Pre-allocate record.
record = GFF3.Record()

# Iterate over records.
while !eof(reader)
    empty!(record)
    read!(reader,record)
    # do something
    if GFF3.featuretype(record) == "exon"
        transid = split(GFF3.attributes(record,"Parent")[1],":")[2]
        exonLength = abs(GFF3.seqend(record) - GFF3.seqstart(record)) + 1
        # println(exonLength)
        if !haskey(exonDict,transid)
            exonDict[transid] = exonLength
        else
            exonDict[transid] += exonLength
        end
    end
end

# Finally, close the reader.
close(reader)

Thank you for your reply!
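One way to avoid managing `eof` by hand is to iterate instead of calling `read!` in a `while` loop, as in the `for record in reader` form from the docs example quoted in the previous issue. The sketch below goes a step further and bypasses GFF3.jl entirely: it parses the tab-separated columns directly from text lines, so it is a swapped-in technique rather than the package's reader, and it does not diagnose the cause of the reported `EOFError`. `exon_lengths` is a hypothetical helper, and the `Parent=transcript:ID` layout is assumed from the code in the report.

```julia
# Sum exon lengths per transcript from GFF3 text, given any iterable of lines.
function exon_lengths(lines)
    totals = Dict{String,Int}()
    for line in lines
        # Skip blank lines, comments and directives such as "###".
        (isempty(line) || startswith(line, "#")) && continue
        cols = split(line, '\t')
        length(cols) == 9 || continue      # not a feature line
        cols[3] == "exon" || continue
        m = match(r"(?:^|;)Parent=([^;]+)", cols[9])
        m === nothing && continue
        # Assumed Ensembl-style parent "transcript:ID"; keep the part after the colon.
        transid = String(last(split(m.captures[1], ':')))
        len = abs(parse(Int, cols[5]) - parse(Int, cols[4])) + 1
        totals[transid] = get(totals, transid, 0) + len
    end
    return totals
end

# Small in-memory example with a header, two exons, a directive and a blank line.
demo = [
    "##gff-version 3",
    "1\thavana\texon\t100\t199\t.\t+\t.\tParent=transcript:T1",
    "1\thavana\texon\t300\t349\t.\t+\t.\tParent=transcript:T1",
    "###",
    "",
]
lengths = exon_lengths(demo)
```

For a file on disk, the same function can be called as `exon_lengths(eachline("Homo_sapiens.GRCh38.106.gff3"))`. Using `get(totals, transid, 0)` also replaces the `haskey` branch from the original loop.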

Re: Reading GTF file

Background

I am trying to read a GTF file so that I can compute the coverage of my sequencing reads on each gene. However, I got errors while reading and couldn't access my GTF file. How can I open a GTF file?
By the way, my GTF file was downloaded from Gencode and contains mouse genome annotations.

Current Behavior

I tried to read my gtf file, but I got an error message.

using GenomicFeatures
GFF3.Reader( "gencode.vM18.annotation.gtf")

ERROR: MethodError: Cannot `convert` an object of type String to an object of type GenomicFeatures.GFF3.Reader
This may have arisen from a call to the constructor GenomicFeatures.GFF3.Reader(...), since type constructors fall back to convert methods.

I tried open, and this time I didn't get any error message.

reader = open(GFF3.Reader, "gencode.vM18.annotation.gtf")

GenomicFeatures.GFF3.Reader(BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}(BufferedStreams.BufferedInputStream{IOStream}(<128.0 KiB buffer, 100% filled, data immobilized>), -27, 6, false), false, Symbol[:feature], false, GenomicFeatures.GFF3.Record[], 0, 5)

Then, I executed IntervalCollection, but obtained an error

features = IntervalCollection(reader)

ERROR: GenomicFeatures.GFF3.Reader file format error on line 6 ~>"; gene_t"
Stacktrace:
 [1] _read!(::GenomicFeatures.GFF3.Reader, ::BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}, ::GenomicFeatures.GFF3.Record) at /home/donghoon/.julia/v0.6/BioCore/src/ReaderHelper.jl:164
 [2] read! at /home/donghoon/.julia/v0.6/BioCore/src/ReaderHelper.jl:134 [inlined]
 [3] tryread!(::GenomicFeatures.GFF3.Reader, ::GenomicFeatures.GFF3.Record) at /home/donghoon/.julia/v0.6/BioCore/src/Ragel.jl:241                                     
 [4] start(::GenomicFeatures.GFF3.Reader) at /home/donghoon/.julia/v0.6/BioCore/src/Ragel.jl:258                                                                       
 [5] _collect(::Type{GenomicFeatures.Interval{GenomicFeatures.GFF3.Record}}, ::GenomicFeatures.GFF3.Reader, ::Base.SizeUnknown) at ./array.jl:394                      
 [6] GenomicFeatures.IntervalCollection(::GenomicFeatures.GFF3.Reader) at /home/donghoon/.julia/v0.6/GenomicFeatures/src/gff3/reader.jl:73     

Your Environment

  • Package Version used: 0.2.1
  • Julia Version used: 0.6.4
  • Operating System and version (desktop or mobile): Ubuntu 16.04.5
  • Link to your project:
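The error position (`~>"; gene_t"`) is consistent with the difference in column 9 between the two formats: GTF writes attributes as `key "value";` pairs, while GFF3 expects `key=value` pairs separated by `;`. As an illustration only, the sketch below parses a GTF-style attribute column in plain Julia. `parse_gtf_attributes` is a hypothetical helper, not part of GFF3.jl or GenomicFeatures.jl, and the sample line is made up in the style of a GENCODE annotation.

```julia
# Parse a GTF attribute column of the form: key "value"; key value; ...
# Repeated keys (e.g. several `tag` entries) are collected into a vector.
function parse_gtf_attributes(column::AbstractString)
    attrs = Dict{String,Vector{String}}()
    # Group 2 captures a quoted value, group 3 an unquoted one such as `level 2;`.
    for m in eachmatch(r"(\w+) (?:\"([^\"]*)\"|([^;\s]+));", column)
        value = something(m.captures[2], m.captures[3])
        push!(get!(attrs, m.captures[1], String[]), value)
    end
    return attrs
end

column = "gene_id \"ENSMUSG00000102693.1\"; gene_type \"TEC\"; level 2; tag \"basic\"; tag \"mock\";"
attrs = parse_gtf_attributes(column)
```

Here `attrs["gene_type"]` is `["TEC"]` and `attrs["tag"]` holds both tag values.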
