biojulia / gff3.jl
License: MIT License
The code:
using GFF3
r = GFF3.Record("Ga0225945_11\timg_core_v400\tCDS\t350909\t352399\t.\t-\t0\tID=2800905551;locus_tag=Ga0225945_11352;product=respiratory nitrite reductase (cytochrome; ammonia-forming) precursor")
Expected the product attribute to be ["respiratory nitrite reductase (cytochrome; ammonia-forming) precursor"], but got:
ERROR: ArgumentError: failed to index Any ~>""
Stacktrace:
[1] macro expansion
@ C:\Users\x\.julia\packages\GFF3\RXGVR\src\reader.jl:310 [inlined]
[2] index!(stream::TranscodingStreams.NoopStream{IOBuffer}, record::GFF3.Record)
@ GFF3 C:\Users\x\.julia\packages\Automa\1KOLQ\src\Stream.jl:126
[3] index!
@ C:\Users\x\.julia\packages\GFF3\RXGVR\src\reader.jl:118 [inlined]
[4] convert
@ C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:54 [inlined]
[5] Record
@ C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:44 [inlined]
[6] convert(#unused#::Type{GFF3.Record}, str::String)
@ GFF3 C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:69
[7] GFF3.Record(str::String)
@ GFF3 C:\Users\x\.julia\packages\GFF3\RXGVR\src\record.jl:65
[8] top-level scope
@ none:1
Treat ; in (), [] and {} as a non-separator.
(@v1.6) pkg> status
Status `C:\Users\x\.julia\environments\v1.6\Project.toml`
[c7e460c6] ArgParse v1.1.4
[c52e3926] Atom v0.12.34
[336ed68f] CSV v0.9.1
[a93c6f00] DataFrames v1.2.2
[1313f7d8] DataFramesMeta v0.9.1
[31c24e10] Distributions v0.25.16
[c2308a5c] FASTX v1.2.0
[af1dc308] GFF3 v0.2.1
[eeff360b] JobSchedulers v0.1.2
[e5e0dc1b] Juno v0.8.4
[ef544631] Pipelines v0.4.0
[91a5bcdd] Plots v1.21.3
[f3b207a7] StatsPlots v0.14.27
[fdbf4ff8] XLSX v0.7.8
[ddb6d928] YAML v0.4.7
Thank you.
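The request above (ignore ; when it sits inside brackets) can be sketched in plain Julia. This is only an illustration of the idea, not how GFF3.jl parses records: split_attributes and parse_attributes are hypothetical helpers. Note also that the GFF3 specification asks for a literal ; inside an attribute value to be percent-encoded as %3B, so the input line is arguably non-conforming and a strict parser may reject it.

```julia
# Split a GFF3 attribute column on ';', skipping any ';' nested in (), [] or {}.
# Hypothetical helper for illustration; not part of GFF3.jl.
function split_attributes(column::AbstractString)
    fields = String[]
    buf = IOBuffer()
    depth = 0                       # current bracket nesting depth
    for c in column
        if c in ('(', '[', '{')
            depth += 1
        elseif c in (')', ']', '}')
            depth = max(depth - 1, 0)
        end
        if c == ';' && depth == 0
            push!(fields, String(take!(buf)))   # top-level separator: close the field
        else
            write(buf, c)
        end
    end
    push!(fields, String(take!(buf)))
    return filter(!isempty, fields)
end

# Turn "key=value" fields into key => values (values split on ',').
# Assumes every field contains '='.
function parse_attributes(column::AbstractString)
    attrs = Dict{String,Vector{String}}()
    for field in split_attributes(column)
        key, value = split(field, '='; limit=2)
        attrs[key] = split(value, ',')
    end
    return attrs
end

attrs = parse_attributes("ID=2800905551;locus_tag=Ga0225945_11352;product=respiratory nitrite reductase (cytochrome; ammonia-forming) precursor")
```

Here attrs["product"] holds the full product string as a single value, which is the result the report expects.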
When given an unstranded feature, strand doesn't return the expected results.
GFF3 specifies that features where strand is . are unstranded (https://github.com/The-Sequence-Ontology/Specifications/blob/master/gff3.md):
Column 7: "strand"
The strand of the feature. + for positive strand (relative to the landmark), - for minus strand, and . for features that are not stranded. In addition, ? can be used for features whose strandedness is relevant, but unknown
Thus, when using GFF3.strand on an unstranded feature (like a chromosome), I would expect it to return GenomicFeatures.STRAND_BOTH. Instead, the function raises an error.
using CodecZlib, GFF3  # CodecZlib provides GzipDecompressorStream
genome_file = "genomes/Mus_musculus.GRCm38.102.gff3.gz"
reader = open(genome_file, "r") |> GzipDecompressorStream |> GFF3.Reader
record = read(reader)
GFF3.strand(record)
should return
STRAND_BOTH
genome_file = "genomes/Mus_musculus.GRCm38.102.gff3.gz"
reader = open(genome_file, "r") |> GzipDecompressorStream |> GFF3.Reader
record = read(reader)
GFF3.strand(record)
but actually returns
ERROR: strand is missing
Stacktrace:
[1] missingerror(field::Symbol)
@ BioCore.Exceptions ~/.julia/packages/BioCore/YBJvb/src/Exceptions.jl:22
[2] strand(record::GFF3.Record)
@ GFF3 ~/.julia/packages/GFF3/b3VT6/src/record.jl:363
[3] top-level scope
@ REPL[66]:1
The implementation checks if the field is missing (i.e. set to . or 0x2e):
Line 361 in 76809cc
Returning GenomicFeatures.STRAND_BOTH in that case should produce the expected results.
If there is some reason for this behaviour which is not obvious to me, please tell me and I will gladly be put into my place.
For me, this makes my code more complicated than it has to be, as I have to perform extra checks to see if something is unstranded.
Whether this should also apply to GFF3.phase is a separate question, as that function performs the same check. Here, however, a . is not specified to mean anything, so it likely shouldn't apply.
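The column 7 rules quoted from the specification can be written out as a small lookup. This is a sketch in plain Julia only: strand_label and the symbols it returns are hypothetical stand-ins, not names from GFF3.jl or GenomicFeatures.jl.

```julia
# Map the raw column-7 character to a strand label, following the spec text quoted above.
# The returned symbols are illustrative stand-ins, not GenomicFeatures.jl constants.
function strand_label(field::Char)
    field == '+' && return :positive
    field == '-' && return :negative
    field == '.' && return :unstranded   # the case reported above as raising an error
    field == '?' && return :unknown      # strandedness relevant but unknown
    throw(ArgumentError("invalid strand field: $field"))
end
```

Under this reading, . is a valid value with its own meaning rather than a missing field, which is the behaviour the report asks for.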
Could you add FASTX.jl v2 to the compat?
GFF3.jl has only small dependencies on FASTX.jl and it will require only simple changes.
I suspect there's a typo in the latest docs for the GFF3 Reader.
It should say record rather than reader. Entering reader as it appears in the examples throws an error, and entering record works.
It reads:
for record in reader
    # Do something on record (see Accessors section).
    seqid = GFF3.seqid(reader)
    # ...
end
But should read:
for record in reader
    # Do something on record (see Accessors section).
    seqid = GFF3.seqid(record)
    # ...
end
I am trying to read a GTF file so that I can compute the coverage of my sequencing reads on each gene. However, I got errors while reading and couldn't access my GTF file. How can I open a GTF file?
By the way, my GTF file was downloaded from Gencode and contains mouse genome annotations.
I tried to read my GTF file, but I got an error message.
using GenomicFeatures
GFF3.Reader( "gencode.vM18.annotation.gtf")
ERROR: MethodError: Cannot `convert` an object of type String to an object of type GenomicFeatures.GFF3.Reader
This may have arisen from a call to the constructor GenomicFeatures.GFF3.Reader(...), since type constructors fall back to convert methods.
I tried open, and this time I didn't get any error message.
reader = open(GFF3.Reader, "gencode.vM18.annotation.gtf")
GenomicFeatures.GFF3.Reader(BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}(BufferedStreams.BufferedInputStream{IOStream}(<128.0 KiB buffer, 100% filled, data immobilized>), -27, 6, false), false, Symbol[:feature], false, GenomicFeatures.GFF3.Record[], 0, 5)
Then I executed IntervalCollection, but obtained an error:
features = IntervalCollection(reader)
ERROR: GenomicFeatures.GFF3.Reader file format error on line 6 ~>"; gene_t"
Stacktrace:
[1] _read!(::GenomicFeatures.GFF3.Reader, ::BioCore.Ragel.State{BufferedStreams.BufferedInputStream{IOStream}}, ::GenomicFeatures.GFF3.Record) at /home/donghoon/.julia/v0.6/BioCore/src/ReaderHelper.jl:164
[2] read! at /home/donghoon/.julia/v0.6/BioCore/src/ReaderHelper.jl:134 [inlined]
[3] tryread!(::GenomicFeatures.GFF3.Reader, ::GenomicFeatures.GFF3.Record) at /home/donghoon/.julia/v0.6/BioCore/src/Ragel.jl:241
[4] start(::GenomicFeatures.GFF3.Reader) at /home/donghoon/.julia/v0.6/BioCore/src/Ragel.jl:258
[5] _collect(::Type{GenomicFeatures.Interval{GenomicFeatures.GFF3.Record}}, ::GenomicFeatures.GFF3.Reader, ::Base.SizeUnknown) at ./array.jl:394
[6] GenomicFeatures.IntervalCollection(::GenomicFeatures.GFF3.Reader) at /home/donghoon/.julia/v0.6/GenomicFeatures/src/gff3/reader.jl:73
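The failure at "; gene_t" is consistent with the two formats using different attribute syntax in column 9: GFF3 writes key=value pairs, while GTF writes key "value"; pairs, so a GFF3 reader is not expected to accept a GTF file. As a rough sketch of the difference, the GTF attribute column could be parsed by hand as below. parse_gtf_attributes is a hypothetical helper, the sample string is made up for illustration, and the regex does not handle a ; inside a quoted value.

```julia
# Parse a GTF attribute column (key "value"; key value; ...) into key => values.
# Hypothetical helper for illustration; not part of GFF3.jl or GenomicFeatures.jl.
function parse_gtf_attributes(column::AbstractString)
    attrs = Dict{String,Vector{String}}()
    # key, whitespace, optionally quoted value, terminating ';'
    for m in eachmatch(r"(\S+)\s+\"?([^\";]*)\"?\s*;", column)
        # Keys such as `tag` may repeat, so collect values in a vector.
        push!(get!(attrs, String(m[1]), String[]), String(m[2]))
    end
    return attrs
end

# Made-up attribute column in GTF style.
gtf_attrs = parse_gtf_attributes("gene_id \"G1\"; gene_type \"TEC\"; level 2; tag \"a\"; tag \"b\";")
```

The first eight tab-separated columns have the same layout in both formats, so only this last column needs separate handling.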
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
Hello, I use a while loop to read a GFF3 file, but I get an error. I do not know how it happens or how to solve it. Here is my code:
# Import the GFF3 module.
using GFF3
exonDict = Dict{String,Int64}()
# Open a GFF3 file.
reader = open(GFF3.Reader, "Homo_sapiens.GRCh38.106.gff3")
# Pre-allocate record.
record = GFF3.Record()
# Iterate over records.
while !eof(reader)
    empty!(record)
    read!(reader, record)
    # do something
    if GFF3.featuretype(record) == "exon"
        transid = split(GFF3.attributes(record, "Parent")[1], ":")[2]
        exonLength = abs(GFF3.seqend(record) - GFF3.seqstart(record)) + 1
        # println(exonLength)
        if !haskey(exonDict, transid)
            exonDict[transid] = exonLength
        else
            exonDict[transid] += exonLength
        end
    end
end
# Finally, close the reader.
close(reader)
Thank you for your reply!
The current behaviour makes things like filtering records by something like source difficult if any sources are missing (.).
I feel like returning missing might be a bit more idiomatic? It would also make the following code less perilous...
Iterators.filter(r -> GFF3.source(r) == "Pfam", reader)
which is the code currently crashing my whole script when it hits a single record missing a source.
Let me know if others agree that the API should return missing instead of throwing exceptions, and I'm more than happy to file a PR!
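One detail worth keeping in mind if accessors did return missing: in Julia, missing == "Pfam" evaluates to missing rather than false, so a filter predicate written with == would still fail on a record without a source. The sketch below uses plain NamedTuples as stand-ins for records (they are not GFF3.Record values, and the data is made up) to show two ways a predicate could stay Boolean.

```julia
# Stand-in records; `source` is `missing` where column 2 would be ".".
records = [
    (id = "a", source = "Pfam"),
    (id = "b", source = missing),
    (id = "c", source = "ensembl"),
]

# `isequal` always returns a Bool, so it is safe inside a filter predicate.
pfam = collect(Iterators.filter(r -> isequal(r.source, "Pfam"), records))

# `coalesce` substitutes a default for `missing` before the comparison.
non_pfam = [r.id for r in records if coalesce(r.source, "") != "Pfam"]
```

With isequal, the record lacking a source is simply skipped instead of stopping the script.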