
Comments (13)

mikelove commented on June 6, 2024

I haven’t seen this one yet. Let me know if there are ways I can help.

from armor.

anzezupanic commented on June 6, 2024

Ran the whole thing overnight, and the salmon to edgeR part is solved now; only the shiny part fails. Since this is another topic, I'll try to figure it out myself, and if that does not work I will open a new issue. I have a much better idea now of what to look out for when using ARMOR for non-model organisms, thanks for all your help!!!

from armor.

csoneson commented on June 6, 2024

Hm. ARMOR is really intended for use with species where you have a reference annotation from either GENCODE or Ensembl. Where did you get your reference annotation from? Maybe there's something that can help you in this thread: #97, but we do not make any guarantees for non-{GENCODE, Ensembl} references.

What's causing your specific error is this:

some CDS cannot be mapped to an exon

I'm not sure what your GTF file looks like, but this must work (for tximeta to work):

txdb <- makeTxDbFromGFF(gtf_file)

from armor.

anzezupanic commented on June 6, 2024

Thank you Charlotte, sorry for the late reply, was on holidays last week.

I got the gtf from NCBI (tried a GenBank and a RefSeq assembly). The gtf looks reasonably standard, but I guess it is not. I will look into the details this week and report any possible solutions I can find. We are interested in using the pipeline with plant genomes as well (not all of them are in Ensembl), but if there's too much trouble making adjustments, we will probably look for other options.
I can send the gtf file if it's of any use, or better here's the link: https://www.ncbi.nlm.nih.gov/assembly/GCF_002277975.1.

from armor.

csoneson commented on June 6, 2024

So the issue indeed seems to be that the GTF file can not be read with makeTxDbFromGFF() (I downloaded and tried the GenBank one), seemingly because the CDS entries can not be matched with the exon entries. It appears that makeTxDbFromGFF() does support some GTF files where CDSs are direct children of genes (see e.g. this post from a few years back: http://supportupgrade.bioconductor.org/p/71117/), but this one does not seem to fall in that category. If I just crudely replace all CDS with exon in the file, it can be read. So this at least provides an explanation. As for a solution - I would first like to check with @mikelove if he has ever seen (and maybe solved 😃) a similar issue when reading non-model organism data with tximeta. Otherwise we'll see if it can be addressed in another way.
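
For anyone who wants to reproduce the crude check described above, here is a minimal sketch. It assumes a standard nine-column, tab-separated GTF and only rewrites the feature-type column (column 3), which is slightly narrower than a blind search-and-replace over the whole file. The function name `cds_to_exon` is hypothetical. This is only a diagnostic to see whether the file can then be read, not a proper fix for the annotation.

```python
def cds_to_exon(gtf_lines):
    """Return GTF lines with feature type 'CDS' (column 3) relabelled as 'exon'.

    Comment lines and lines that do not have nine tab-separated fields
    are passed through unchanged.
    """
    out = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if not line.startswith("#") and len(fields) == 9 and fields[2] == "CDS":
            fields[2] = "exon"
            line = "\t".join(fields)
        out.append(line)
    return out


# Hypothetical example records (coordinates and IDs are made up).
example = [
    "#!genome-build example",
    'chr1\tGenbank\tCDS\t1\t900\t.\t+\t0\tgene_id "g1"; transcript_id "unknown_transcript_1";',
    'chr1\tGenbank\tgene\t1\t900\t.\t+\t.\tgene_id "g1";',
]
for row in cds_to_exon(example):
    print(row)
```

If the file already contains exon records for the same transcripts, relabelling CDS rows this way can introduce overlapping or duplicated exons, so the output should not be used for quantification without checking.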

from armor.

anzezupanic commented on June 6, 2024

I've now tried to use an Ensembl genome annotation of a parent strain (with more genes), which is available in gff3 and which I converted to gtf before running. This has led me to a completely different error:

Verifying validity of the information in the database:
Checking transcripts ... OK
Checking exons ... OK
generating transcript ranges
Error in checkAssays2Txps(assays, txps) :
none of the transcripts in the quantification files are in the GTF
Calls: -> checkAssays2Txps
Execution halted

which I am currently interpreting as being caused by a loss of gene-level data when going from the gff3 to the gtf. Here's the gtf file, with only exon and CDS data:
atcc_13032_v4.gtf.txt
Since the gene-level data is available in the gff3 file, I just need to convert it to gtf more carefully, and I guess it should work?
Corynebacterium_glutamicum_atcc_13032.ASM19633v1.47.chromosome.Chromosome.gff3.txt

So, no solution yet, but working on it.

from armor.

csoneson commented on June 6, 2024

Did you requantify the data with the new annotation? It looks like the transcript IDs are not matching.

from armor.

csoneson commented on June 6, 2024

As for the original GTF file, somehow all the transcripts with annotated CDSs are called "unknown_transcript_1" (regardless of the associated gene), which seems a bit strange.

from armor.

anzezupanic commented on June 6, 2024

yes, the original gtf really is strange.

yes, I did requantify (I always delete all outputs and the salmon and STAR indexes, then run the pipeline on a small part of my dataset; my understanding is that this forces requantification), so this should not be the reason for the transcript IDs not matching, I think

from armor.

csoneson commented on June 6, 2024

Could you post just the first lines of one of the quant.sf files from Salmon?

from armor.

anzezupanic commented on June 6, 2024

Name Length EffectiveLength TPM NumReads
CAF18566 1575 1456.128 72.956849 179.012
CAF18567 327 162.000 0.000000 0.000
CAF18568 1185 913.293 344.404551 530.024
CAF18569 1185 992.982 82.987673 138.858
CAF18570 489 328.875 43.307604 24.000
CAF18571 2055 2078.305 377.541652 1322.179
CAF18572 969 943.521 25.158942 40.000
CAF18573 672 506.014 107.042140 91.271

from armor.

csoneson commented on June 6, 2024

Right. So the transcript IDs in the GTF file are of the form transcript:CAF18566 (not CAF18566).
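
A quick way to spot this kind of mismatch before running the pipeline is to compare the Name column of quant.sf with the transcript_id attributes in the GTF. The sketch below is only an illustration: the helper names `quant_names` and `gtf_transcript_ids` are hypothetical, the two quant.sf rows are taken from the output posted above, and the GTF record is made up apart from the transcript:CAF18566 form of the ID.

```python
import re

# Matches the value of the transcript_id attribute in GTF column 9.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def quant_names(quant_lines):
    """Return the set of transcript names in a quant.sf file (first column)."""
    rows = [line.split() for line in quant_lines if line.strip()]
    return {row[0] for row in rows[1:]}  # rows[0] is the header line


def gtf_transcript_ids(gtf_lines):
    """Return the set of transcript_id values found in a GTF file."""
    ids = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


quant = [
    "Name Length EffectiveLength TPM NumReads",
    "CAF18566 1575 1456.128 72.956849 179.012",
    "CAF18567 327 162.000 0.000000 0.000",
]
# Hypothetical GTF record; only the ID form mirrors the thread.
gtf = [
    'Chromosome\tena\texon\t1\t1575\t.\t+\t.\tgene_id "g1"; transcript_id "transcript:CAF18566";',
]

shared = quant_names(quant) & gtf_transcript_ids(gtf)
print(f"{len(shared)} of {len(quant_names(quant))} quantified transcripts found in the GTF")
```

An empty intersection here corresponds to the situation that checkAssays2Txps reports as "none of the transcripts in the quantification files are in the GTF".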

from armor.

anzezupanic commented on June 6, 2024

I see, here's where my utter lack of understanding of how gtf files work let me down. So, if I just replace the form in the GTF file, it should work. I will try later today, thanks!
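
A minimal sketch of that replacement, assuming the only difference is a literal transcript: prefix on the transcript_id values in column 9 of the GTF. The function name `strip_transcript_prefix` and the example record are hypothetical. Other attributes are left untouched, and whether anything else in the file needs changing has not been checked here.

```python
import re


def strip_transcript_prefix(gtf_lines, prefix="transcript:"):
    """Remove `prefix` from the start of every transcript_id value in a GTF.

    Only column 9 of nine-column records is modified; comment lines and
    malformed lines are passed through unchanged.
    """
    pattern = re.compile(r'(transcript_id ")' + re.escape(prefix))
    out = []
    for line in gtf_lines:
        line = line.rstrip("\n")
        fields = line.split("\t")
        if not line.startswith("#") and len(fields) == 9:
            fields[8] = pattern.sub(r"\1", fields[8])
            line = "\t".join(fields)
        out.append(line)
    return out


# Hypothetical record; only the ID form mirrors the thread.
record = 'Chromosome\tena\texon\t1\t1575\t.\t+\t.\tgene_id "g1"; transcript_id "transcript:CAF18566";'
print(strip_transcript_prefix([record])[0])
```

The same mismatch could in principle be resolved from the other side, by making the names in the transcript FASTA used for the Salmon index match the GTF and then requantifying. Editing the GTF is simply the smaller change in this case.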

from armor.
