Comments (8)

telatin commented on August 29, 2024

Thanks for asking!
There are no assumptions made regarding the region amplified. I'll elaborate more on this later next week.

from dadaist2.

telatin commented on August 29, 2024

Back again, with a broader answer.
Dadaist2 essentially performs these steps:

  1. Input read preparation (e.g. Trimmomatic or fastp). This step is amplicon-agnostic, but Trimmomatic requires the primer sequences to be supplied.
  2. DADA2 denoising: no assumptions are made about the region.
  3. Taxonomy classification: an appropriate database (e.g. 16S or ITS) should be provided, but classification is done with DADA2 or DECIPHER, so no "pre-trained classifiers" are needed.

If you need any guidance, or if you find that some step could be implemented better, please let us know.
Best of luck


slambrechts commented on August 29, 2024

Thank you for your answer!

For me it didn't work with the V1-V3 region: I ended up with 62 ASVs, while I know from other tools that there are many more in our samples.

Maybe it has something to do with the error message below?

[2021-09-21 15:28:20] 62 representative sequences found.
DADA2 ERROR:
[2021-09-21 15:28:20] DADA2 filtered too many reads: 0.0091% from total 13526581 to 1234

Do you perhaps know anyone else that tried it with V1-V3 data?


telatin commented on August 29, 2024

Yes, it looks like the filtering was devastating. Could you please tell me the sequencing format (e.g. 2x250 bp) and platform (e.g. MiSeq)? If you want to send me a couple of samples, I can provide an email address for that. Thanks for reporting!


slambrechts commented on August 29, 2024

Yes, the primers used are pA (AGAGTTTGATCCTGGCTCAG, positions 8–27) and BKL1 (GTATTACCGCGGCTGCTGGCA, positions 536–516), and sequencing was done on an Illumina MiSeq giving 2 × 300 bp paired-end reads. I can send a couple of samples, yes. Thank you for your help!
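As a sanity check on merging, the primer positions above pin down the approximate amplicon geometry (back-of-the-envelope arithmetic; exact values depend on the reference coordinates and on any primer/quality trimming):

```python
# Rough V1-V3 amplicon geometry from the primer positions quoted above:
# pA starts at position 8 and BKL1 ends at position 536.
amplicon_len = 536 - 8 + 1            # ~529 bp, primers included
read_len = 300                        # 2 x 300 bp MiSeq reads
overlap = 2 * read_len - amplicon_len
print(amplicon_len, overlap)          # ~529 bp amplicon, ~71 bp overlap

# Any truncation eats into that ~71 bp overlap, so over-aggressive
# trimming can make read pairs fail to merge.
```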


telatin commented on August 29, 2024

Hi @slambrechts, I had a first look at the sequences: they contain the primers as expected, but the quality profile was a bit lower than usual. From the log, I saw that three samples did not pass the very mild QC at the beginning (109, 93, and 42).
As you correctly pointed out, the key error is
`DADA2 filtered too many reads: 0.0091% from total 13526581 to 1234`, and this led to the misleadingly low number of features detected.

To see at which step there is the biggest data loss you can open dada2_stats.tsv to see the overall picture:

| sample         | input  | filtered | denoised | merged | non-chimeric |
|----------------|--------|----------|----------|--------|--------------|
| 18_R1.fastq.gz | 227281 | 80865    | 78479    | 16908  | 8878         |
| 20_R1.fastq.gz | 98396  | 34157    | 33007    | 5796   | 3749         |

Here, the largest losses are at the filtering and merging steps.
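To put numbers on that, dada2_stats.tsv can be summarized with a short script (a sketch; the two rows are the ones shown above, and the column names are assumed to match the file's header):

```python
import csv
from io import StringIO

# The two rows of dada2_stats.tsv shown above (tab-separated).
TSV = """sample\tinput\tfiltered\tdenoised\tmerged\tnon-chimeric
18_R1.fastq.gz\t227281\t80865\t78479\t16908\t8878
20_R1.fastq.gz\t98396\t34157\t33007\t5796\t3749
"""

def retention(tsv_text):
    """Fraction of the input reads surviving each step, per sample."""
    out = {}
    for row in csv.DictReader(StringIO(tsv_text), delimiter="\t"):
        total = int(row["input"])
        out[row["sample"]] = {
            step: int(row[step]) / total
            for step in ("filtered", "denoised", "merged", "non-chimeric")
        }
    return out

for sample, fracs in retention(TSV).items():
    print(sample, {step: f"{frac:.1%}" for step, frac in fracs.items()})
```

For these two samples the filtering and merging steps each discard more than half of the reads that reach them, matching the error message above.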
For the quality, one option (though I don't know whether it pays off in the final results) is to relax the Expected Errors thresholds via --maxee1 and --maxee2 (e.g. 1.5 and 2.0, respectively).
For the merging, a rule of thumb is to try trimming some 50 bp off the reads to see if this improves things (for example --trunc-len-1 285 --trunc-len-2 250, or just trimming the input files).
Can you try this way and let me know?
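For reference, --maxee1/--maxee2 control an expected-errors filter: the sum of the per-base error probabilities implied by the Phred scores, with reads above the threshold discarded. A minimal sketch of that standard definition (not dadaist2's exact implementation):

```python
def expected_errors(quals):
    """Expected errors of a read: the sum of per-base error
    probabilities, where a Phred score Q maps to an error
    probability of 10^(-Q/10). A read passes the filter when
    this sum is <= maxEE."""
    return sum(10 ** (-q / 10) for q in quals)

# Thirty Q30 bases: each contributes 0.001, so EE is about 0.03,
# comfortably under a maxEE of 1.5 or 2.0.
print(expected_errors([30] * 30))
```

Raising maxEE therefore tolerates more low-quality bases per read, which is why relaxing it recovers reads at the cost of some accuracy.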

PS: a random but important note: all the samples must come from the same sequencing run. Support for multiple runs is still under development, unfortunately.


slambrechts commented on August 29, 2024

Hi @telatin, thank you for looking into this. I also spoke to some people in our department, and they indeed use maxee=2 for the V1-V3 region, so I will try different values for --maxee1, --maxee2, --trunc-len-1 and --trunc-len-2 as you suggested.

Also, they use truncQ=11 in dada2. Is there an equivalent in dadaist2 to truncate reads at the first instance of a quality score less than or equal to truncQ?

Unfortunately I am indeed working with samples that come from multiple sequencing runs. Is this limitation specific to dadaist2, or is it also undesirable when working with DADA2 itself?


telatin commented on August 29, 2024

It's a DADA2 limitation: the error-model training assumes the reads come from the same run. What you can do is process the runs independently and then merge the OTU tables. This is possible because the OTUs are not clustered, so identical sequences are directly comparable across runs.
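The run-by-run merge described above can be sketched as pooling counts for identical sequences (the table layout here, sequence → per-sample counts, is a hypothetical illustration, not dadaist2's output format):

```python
from collections import defaultdict

def merge_runs(*run_tables):
    """Merge per-run feature tables into one.
    Each table maps an exact sequence to {sample: count}; because the
    features are exact sequences rather than clustered OTUs, identical
    sequences can simply be pooled across runs."""
    merged = defaultdict(dict)
    for table in run_tables:
        for seq, counts in table.items():
            for sample, n in counts.items():
                merged[seq][sample] = merged[seq].get(sample, 0) + n
    return dict(merged)

run1 = {"ACGT": {"s1": 10}, "GGCC": {"s1": 3}}
run2 = {"ACGT": {"s2": 7}}
print(merge_runs(run1, run2))  # "ACGT" occurs in both runs and is pooled
```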

At the moment truncQ is set to 10, following the SOP (and QIIME 2) defaults, but I'll expose it as a parameter in the next releases if that helps!
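For reference, truncQ truncates each read at the first base whose quality score is less than or equal to the threshold; a minimal sketch of that behaviour (qualities given as plain integers, not FASTQ-encoded characters):

```python
def trunc_q(seq, quals, threshold=10):
    """Truncate a read at the first base with quality <= threshold,
    mirroring DADA2's truncQ semantics. Returns the (possibly
    shortened) sequence and its matching quality list."""
    for i, q in enumerate(quals):
        if q <= threshold:
            return seq[:i], quals[:i]
    return seq, quals

# The fourth base has quality 9 <= 10, so the read is cut to 3 bases.
print(trunc_q("ACGTACGT", [35, 33, 30, 9, 38, 2, 40, 40]))
```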

