Comments (8)

telatin commented on August 29, 2024

Thanks for asking!
There are no assumptions made regarding the region amplified. I'll elaborate more on this later next week.

from dadaist2.

telatin commented on August 29, 2024

Back again, with a broader answer.
Dadaist2 essentially performs these steps:

  1. Input read preparation (e.g. Trimmomatic or fastp). This step is amplicon-agnostic, but Trimmomatic requires the primer sequences to be supplied.
  2. DADA2 denoising: no assumptions are made about the region.
  3. Taxonomy classification: an appropriate database (e.g. 16S or ITS) should be provided, but classification is done with DADA2 or DECIPHER, so no "pre-trained classifiers" are needed.

If you need any guidance, or if you find that some step could be implemented better, please let us know.
Best of luck


slambrechts commented on August 29, 2024

Thank you for your answer!

For me it didn't work with the V1-V3 region: I ended up with 62 ASVs, while I know from other tools that there are many more in our samples.

Maybe it has something to do with the error message below?

[2021-09-21 15:28:20] 62 representative sequences found.
DADA2 ERROR:
[2021-09-21 15:28:20] DADA2 filtered too many reads: 0.0091% from total 13526581 to 1234

Do you perhaps know anyone else that tried it with V1-V3 data?


telatin commented on August 29, 2024

Yes, it looks like the filtering was devastating. Could you please tell me the sequencing format (e.g. 2x250 bp) and platform (e.g. MiSeq)? If you want to send me a couple of samples, I can provide an email address for that. Thanks for reporting!


slambrechts commented on August 29, 2024

Yes, the primers used are pA (AGAGTTTGATCCTGGCTCAG, positions 8–27) and BKL1 (GTATTACCGCGGCTGCTGGCA, positions 536–516), and sequencing was done on an Illumina MiSeq giving 2 × 300 bp paired-end reads. I can send a couple of samples, yes. Thank you for your help!
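As a sanity check on merging, the primer positions above pin down the approximate amplicon geometry (back-of-the-envelope arithmetic; exact values depend on the reference coordinates and on any primer/quality trimming):

```python
# Rough V1-V3 amplicon geometry from the primer positions quoted above:
# pA starts at position 8 and BKL1 ends at position 536.
amplicon_len = 536 - 8 + 1            # ~529 bp, primers included
read_len = 300                        # 2 x 300 bp MiSeq reads
overlap = 2 * read_len - amplicon_len
print(amplicon_len, overlap)          # ~529 bp amplicon, ~71 bp overlap

# Any truncation eats into that ~71 bp overlap, so over-aggressive
# trimming can make read pairs fail to merge.
```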


telatin commented on August 29, 2024

Hi @slambrechts, I had a first look at the sequences: they contain the primers as expected, but the quality profile was a bit lower than usual. From the log, I saw that three samples did not pass the very mild QC at the beginning (109, 93, and 42).
As you correctly pointed out, the key error is
`DADA2 filtered too many reads: 0.0091% from total 13526581 to 1234`, and this led to the misleadingly low number of features detected.

To see at which step there is the biggest data loss you can open dada2_stats.tsv to see the overall picture:

| sample         | input  | filtered | denoised | merged | non-chimeric |
|----------------|--------|----------|----------|--------|--------------|
| 18_R1.fastq.gz | 227281 | 80865    | 78479    | 16908  | 8878         |
| 20_R1.fastq.gz | 98396  | 34157    | 33007    | 5796   | 3749         |

Here, the largest losses are at the filtering and merging steps.
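To put numbers on that, dada2_stats.tsv can be summarized with a short script (a sketch; the two rows are the ones shown above, and the column names are assumed to match the file's header):

```python
import csv
from io import StringIO

# The two rows of dada2_stats.tsv shown above (tab-separated).
TSV = """sample\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric
18_R1.fastq.gz\t227281\t80865\t78479\t16908\t8878
20_R1.fastq.gz\t98396\t34157\t33007\t5796\t3749
"""

def retention(tsv_text):
    """Fraction of the input reads surviving each step, per sample."""
    out = {}
    for row in csv.DictReader(StringIO(tsv_text), delimiter="\t"):
        total = int(row["input"])
        out[row["sample"]] = {
            step: int(row[step]) / total
            for step in ("filtered", "denoised", "merged", "non-chimeric")
        }
    return out

for sample, fracs in retention(TSV).items():
    print(sample, {step: f"{frac:.1%}" for step, frac in fracs.items()})
```

For these two samples the filtering and merging steps each discard more than half of the reads that reach them, matching the error message above.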
For the quality, one option (though I don't know whether it pays off in the final results) is to relax the Expected Errors thresholds via --maxee1 and --maxee2 (e.g. 1.5 and 2.0, respectively).
For the merging, a rule of thumb is to try trimming some 50 bp off the reads to see if this improves things (for example --trunc-len-1 285 --trunc-len-2 250, or just trimming the input files).
Can you try this way and let me know?
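For reference, --maxee1/--maxee2 control an expected-errors filter: the sum of the per-base error probabilities implied by the Phred scores, with reads above the threshold discarded. A minimal sketch of that standard definition (not dadaist2's exact implementation):

```python
def expected_errors(quals):
    """Expected errors of a read: the sum of per-base error
    probabilities, where a Phred score Q maps to an error
    probability of 10^(-Q/10). A read passes the filter when
    this sum is <= maxEE."""
    return sum(10 ** (-q / 10) for q in quals)

# Thirty Q30 bases: each contributes 0.001, so EE is about 0.03,
# comfortably under a maxEE of 1.5 or 2.0.
print(expected_errors([30] * 30))
```

Raising maxEE therefore tolerates more low-quality bases per read, which is why relaxing it recovers reads at the cost of some accuracy.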

PS: a random but important note: all the samples must come from the same sequencing run. Support for multiple runs is still under development, unfortunately.


slambrechts commented on August 29, 2024

Hi @telatin, thank you for looking into this. I also spoke to some people in our department, and they indeed use maxee=2 for the V1-V3 region, so I will try different values for --maxee1, --maxee2, --trunc-len-1 and --trunc-len-2 as you suggested.

Also, they use truncQ=11 in dada2. Is there an equivalent in dadaist2 to truncate reads at the first instance of a quality score less than or equal to truncQ?

Unfortunately I am indeed working with samples that come from multiple sequencing runs. Is this limitation specific to dadaist2, or is it also undesirable when working with DADA2 itself?


telatin commented on August 29, 2024

It's a DADA2 limitation: the error-model training assumes the reads come from the same run. What you can do is process the runs independently and then merge the OTU tables. This is possible because the OTUs are not clustered, so identical sequences are directly comparable across runs.
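The run-by-run merge described above can be sketched as pooling counts for identical sequences (the table layout here, sequence → per-sample counts, is a hypothetical illustration, not dadaist2's output format):

```python
from collections import defaultdict

def merge_runs(*run_tables):
    """Merge per-run feature tables into one.
    Each table maps an exact sequence to {sample: count}; because the
    features are exact sequences rather than clustered OTUs, identical
    sequences can simply be pooled across runs."""
    merged = defaultdict(dict)
    for table in run_tables:
        for seq, counts in table.items():
            for sample, n in counts.items():
                merged[seq][sample] = merged[seq].get(sample, 0) + n
    return dict(merged)

run1 = {"ACGT": {"s1": 10}, "GGCC": {"s1": 3}}
run2 = {"ACGT": {"s2": 7}}
print(merge_runs(run1, run2))  # "ACGT" occurs in both runs and is pooled
```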

At the moment truncQ is set to 10, following the SOP (and QIIME 2) defaults, but I'll expose it as a parameter in the next releases if that helps!
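For reference, truncQ truncates each read at the first base whose quality score is less than or equal to the threshold; a minimal sketch of that behaviour (qualities given as plain integers, not FASTQ-encoded characters):

```python
def trunc_q(seq, quals, threshold=10):
    """Truncate a read at the first base with quality <= threshold,
    mirroring DADA2's truncQ semantics. Returns the (possibly
    shortened) sequence and its matching quality list."""
    for i, q in enumerate(quals):
        if q <= threshold:
            return seq[:i], quals[:i]
    return seq, quals

# The fourth base has quality 9 <= 10, so the read is cut to 3 bases.
print(trunc_q("ACGTACGT", [35, 33, 30, 9, 38, 2, 40, 40]))
```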

