Comments (13)
I haven’t seen this one yet. Let me know if there are ways I can help.
from armor.
I ran the whole thing overnight, and the Salmon-to-edgeR part is solved now; only the shiny part still fails. Since that is a separate topic, I'll try to figure it out myself, and if that doesn't work I will open a new issue. I have a much better idea now of what to look out for when using ARMOR for non-model organisms. Thanks for all your help!
from armor.
Hm. ARMOR is really intended for use with species where you have a reference annotation from either GENCODE or Ensembl. Where did you get your reference annotation from? Maybe there's something that can help you in this thread: #97, but we do not make any guarantees for non-{GENCODE, Ensembl} references.
What's causing your specific error is this:
some CDS cannot be mapped to an exon
I'm not sure what your GTF file looks like, but this must work (for tximeta to work):
txdb <- makeTxDbFromGFF(gtf_file)
from armor.
Thank you Charlotte, sorry for the late reply, was on holidays last week.
I got the GTF from NCBI (I tried both a GenBank and a RefSeq assembly). It looks reasonably standard, but I guess it is not. I will look into the details this week and report any solutions I find. We are interested in using the pipeline with plant genomes as well (not all of which are in Ensembl), but if the adjustments turn out to be too much trouble, we will probably look for other options.
I can send the gtf file if it's of any use, or better here's the link: https://www.ncbi.nlm.nih.gov/assembly/GCF_002277975.1.
from armor.
So the issue indeed seems to be that the GTF file can not be read with makeTxDbFromGFF() (I downloaded and tried the GenBank one), seemingly because the CDS entries can not be matched with the exon entries. It appears that makeTxDbFromGFF() does support some GTF files where CDSs are direct children of genes (see e.g. this post from a few years back: http://supportupgrade.bioconductor.org/p/71117/), but this one does not seem to fall in that category. If I just crudely replace all CDS with exon in the file, it can be read. So this at least provides an explanation. As for a solution, I would first like to check with @mikelove if he has ever seen (and maybe solved 😃) a similar issue when reading non-model organism data with tximeta. Otherwise we'll see if it can be addressed in another way.
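For anyone who wants to reproduce the crude CDS-to-exon replacement, here is a minimal base-R sketch. The toy GTF lines and the helper name rename_feature() are made up for illustration; it only touches the feature-type column (column 3), so attribute text that happens to contain "CDS" is left alone. This is a diagnostic workaround, not a proper fix: it throws away the CDS information.

```r
# Toy GTF lines, made up for illustration (not taken from the NCBI file).
gtf_lines <- c(
  "#!genome-build toy",
  "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tgene_id \"g1\";",
  "chr1\tsrc\tCDS\t1\t900\t.\t+\t0\tgene_id \"g1\"; transcript_id \"t1\"; note \"CDS of g1\";"
)

# Hypothetical helper: rename a feature type in column 3 only.
rename_feature <- function(lines, from = "CDS", to = "exon") {
  is_comment <- startsWith(lines, "#")
  fields <- strsplit(lines[!is_comment], "\t", fixed = TRUE)
  lines[!is_comment] <- vapply(fields, function(f) {
    # Only the third tab-separated field is the feature type.
    if (length(f) >= 3 && f[3] == from) f[3] <- to
    paste(f, collapse = "\t")
  }, character(1))
  lines
}

converted <- rename_feature(gtf_lines)
cat(converted, sep = "\n")
```

With a real file, the lines would come from readLines() and go back out with writeLines() before trying makeTxDbFromGFF() again.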
from armor.
I've now tried to use an Ensembl genome annotation of a parent strain (with more genes) available in gff3, which I transformed into gtf before running. This has led me to a completely different error:
Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
generating transcript ranges
Error in checkAssays2Txps(assays, txps) :
none of the transcripts in the quantification files are in the GTF
Calls: -> checkAssays2Txps
Execution halted
which I currently interpret as being caused by a loss of gene-level data when going from the gff3 to the gtf. Here's the gtf file, with only exon and CDS data:
atcc_13032_v4.gtf.txt
Since the gene-level data is available in the gff3 file, I just need to do a better conversion to gtf, and then I guess it should work?
Corynebacterium_glutamicum_atcc_13032.ASM19633v1.47.chromosome.Chromosome.gff3.txt
So, no solution yet, but working on it.
from armor.
Did you requantify the data with the new annotation? It looks like the transcript IDs are not matching.
from armor.
As for the original GTF file, somehow all the transcripts with annotated CDSs are called "unknown_transcript_1" (regardless of the associated gene), which seems a bit strange.
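A quick way to spot this kind of thing in a GTF is to count how many distinct genes each transcript_id is attached to. The sketch below uses base R only; the toy lines and the helper get_attr() are made up for illustration, and the regular expression assumes the usual key "value"; attribute layout.

```r
# Toy GTF lines, made up to mimic the pattern described above.
gtf_lines <- c(
  "chr1\tsrc\tCDS\t1\t300\t.\t+\t0\tgene_id \"g1\"; transcript_id \"unknown_transcript_1\";",
  "chr1\tsrc\tCDS\t500\t900\t.\t+\t0\tgene_id \"g2\"; transcript_id \"unknown_transcript_1\";",
  "chr1\tsrc\tCDS\t1000\t1200\t.\t-\t0\tgene_id \"g3\"; transcript_id \"t3\";"
)

# Hypothetical helper: pull one attribute value out of each GTF line
# (NA where the attribute is absent).
get_attr <- function(lines, key) {
  pattern <- paste0("(^|[;\t ])", key, " \"([^\"]*)\"")
  hits <- regmatches(lines, regexec(pattern, lines))
  vapply(hits, function(h) if (length(h) == 3) h[3] else NA_character_,
         character(1))
}

ids <- unique(data.frame(
  gene = get_attr(gtf_lines, "gene_id"),
  tx = get_attr(gtf_lines, "transcript_id"),
  stringsAsFactors = FALSE
))

# Number of distinct genes per transcript ID; more than one is suspicious.
genes_per_tx <- tapply(ids$gene, ids$tx, function(g) length(unique(g)))
shared_tx <- names(genes_per_tx)[as.vector(genes_per_tx) > 1]
print(shared_tx)
```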
from armor.
Yes, the original gtf really is strange.
Yes, I did requantify (I always delete all outputs and the Salmon and STAR indexes, then run the pipeline on a small part of my dataset; my understanding is that this forces requantification), so I don't think this is the reason for the transcript IDs not matching.
from armor.
Could you post just the first lines of one of the quant.sf files from Salmon?
from armor.
Name Length EffectiveLength TPM NumReads
CAF18566 1575 1456.128 72.956849 179.012
CAF18567 327 162.000 0.000000 0.000
CAF18568 1185 913.293 344.404551 530.024
CAF18569 1185 992.982 82.987673 138.858
CAF18570 489 328.875 43.307604 24.000
CAF18571 2055 2078.305 377.541652 1322.179
CAF18572 969 943.521 25.158942 40.000
CAF18573 672 506.014 107.042140 91.271
from armor.
Right. So the transcript IDs in the GTF file are of the form transcript:CAF18566 (not CAF18566).
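The mismatch can be checked directly by comparing the Name column of quant.sf with the transcript IDs from the GTF. A rough base-R sketch, using the first two quant.sf rows posted above; the gtf_tx_ids vector is typed in by hand here and just stands in for whatever IDs get extracted from the real GTF.

```r
# First rows of the quant.sf posted above, as tab-separated text.
quant_text <- paste(
  "Name\tLength\tEffectiveLength\tTPM\tNumReads",
  "CAF18566\t1575\t1456.128\t72.956849\t179.012",
  "CAF18567\t327\t162.000\t0.000000\t0.000",
  sep = "\n"
)
quant <- read.delim(text = quant_text, stringsAsFactors = FALSE)

# Stand-in for the transcript_id values in the GTF (hand-typed assumption).
gtf_tx_ids <- c("transcript:CAF18566", "transcript:CAF18567")

# How many quantified transcripts are found in the GTF as-is,
# and how many after dropping the "transcript:" prefix?
n_direct <- sum(quant$Name %in% gtf_tx_ids)
n_stripped <- sum(quant$Name %in% sub("^transcript:", "", gtf_tx_ids))

cat("matching as-is:", n_direct, "\n")
cat("matching after stripping the prefix:", n_stripped, "\n")
```

If the first count is zero and the second is not, the prefix is the likely cause of the "none of the transcripts in the quantification files are in the GTF" error.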
from armor.
I see, here's where my utter lack of understanding of how gtf files work let me down. So, if I just change the form of the transcript IDs in the GTF file, it should work. I will try later today, thanks!
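One possible way to make that change, as a base-R sketch: rewrite only the transcript_id attribute and write the result to a new file. The helper strip_tx_prefix(), the temporary files, and the gene_id in the toy line are all made up for illustration; a real GTF may carry the same prefix in other attributes that need the same treatment.

```r
# Hypothetical helper: drop a prefix from the transcript_id attribute only.
strip_tx_prefix <- function(lines, prefix = "transcript:") {
  gsub(paste0("transcript_id \"", prefix), "transcript_id \"", lines,
       fixed = TRUE)
}

# Toy input file standing in for the converted GTF (gene_id is made up).
in_file <- tempfile(fileext = ".gtf")
writeLines(
  "Chromosome\tsrc\texon\t1\t1575\t.\t+\t.\tgene_id \"g1\"; transcript_id \"transcript:CAF18566\";",
  in_file
)

# Read, fix, and write to a separate file so the original stays untouched.
out_file <- tempfile(fileext = ".gtf")
writeLines(strip_tx_prefix(readLines(in_file)), out_file)

fixed_lines <- readLines(out_file)
cat(fixed_lines, sep = "\n")
```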
from armor.