Describe the bug hello, I installed dadaist2 on my computer and i

[BUG] Unable to replicate tutorial (quadram-institute-bioscience/dadaist2)

Comments (14)

telatin commented on August 29, 2024

Thanks again for the feedback. Lately I have pushed new releases of other tools, so apologies for the delay in taking care of this issue, but it's on my radar.

from dadaist2.

telatin commented on August 29, 2024

Hello there,
as you can see from the log

DADA2 ERROR:
[2022-03-22 11:13:33] DADA2 filtered too many reads: 3.5209% from total 16104 to 567

The current settings let DADA2 remove most of the reads, and this is the primary cause to investigate. We will test again with the tutorial, but can you please first clarify whether you used the dadaist2 binary from the repository that you cloned or the one from conda (I see you activated the environment)?
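
A note on reading that log line: judging from the numbers in this thread, the percentage appears to be the share of reads that survived filtering (567 of 16104), not the share that was removed. The sketch below reproduces it with awk. The function name `retained_pct` is made up for illustration and is not part of dadaist2.

```shell
#!/usr/bin/env bash
# retained_pct TOTAL KEPT: print the percentage of reads kept after filtering.
# Hypothetical helper for illustration only; not a dadaist2 command.
retained_pct() {
  awk -v total="$1" -v kept="$2" 'BEGIN {
    if (total <= 0) { print "total must be > 0" > "/dev/stderr"; exit 1 }
    printf "%.4f\n", 100 * kept / total
  }'
}

retained_pct 16104 567   # read counts from the log line above; prints 3.5209
```

The same calculation on the later MiSeq SOP run in this thread (152360 to 105409) gives 69.1842, which also matches the figure printed in that log.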


najibveto commented on August 29, 2024

I followed the recommendation and installed it through a conda environment:

wget -O dadaist2.yaml "https://quadram-institute-bioscience.github.io/dadaist2/dadaist2-$(uname).yaml"
conda env create --file dadaist2.yaml -n dadaist2
For the remaining part, I followed the first tutorial. I also tried the Mothur SOP tutorial and had a similar error.
It created the MicrobiomeAnalyst subdirectory with the different files; however, the R folder was empty.


najibveto commented on August 29, 2024

> Thanks again for the feedback. Lately I have pushed new releases of other tools, so apologies for the delay in taking care of this issue, but it's on my radar.

Thank you, no worries.


telatin commented on August 29, 2024

Hi @najibveto, the latest release 1.2.4 updated the defaults to more relaxed values, re-enabling the analysis of the test dataset. Thanks for reporting, and feel free to let me know if you experience further issues.


najibveto commented on August 29, 2024


Thank you very much for your answer. I tried dadaist2 again but got the same problem:

Dadaist2
[2022-07-06 10:27:00] Ready to log in /home/user/example-output/dadaist.log
[2022-07-06 10:27:00] dadaist2 1.1.0
[2022-07-06 10:27:00] DECIPHER Taxonomy database found: refs/SILVA_SSU_r138_2019.RData
 * Input directory: data/16S/
 * Output directory: /home/user/example-output/
 * Metadata: metadata.tsv
 * Reference database: skip
 * Threads: 8
 * Temporary directory: /tmp/dadaist2_0wOu2s
 * QC strategy: skip
[2022-07-06 10:27:00] SeqFu quality truncation at: 254 - 209
[2022-07-06 10:27:00] Checking dependencies
 * DECIPHER: <pass>
 * RScript: R scripting front-end version 4.0.5 (2021-03-31)
 * assign-taxonomy: dadaist2-assigntax 1.1.3
 * clustalo: 1.2.4
 * dada2 (lib): <pass>
 * exporter: dadaist2-exporter 1.4.0
 * fastp: fastp 0.20.1
 * fasttree: FastTree version 2.1.10 Double precision (No SSE3):
 * fu-primers: fu-primers 1.6.0
[2022-07-06 10:27:07] Temporary directory: /tmp/dadaist2_0wOu2s
[2022-07-06 10:27:07] Threads: 8
[2022-07-06 10:27:07] Output directory: /home/user/example-output/
[2022-07-06 10:27:07] Input directory "data/16S/": 3 found (paired-end)
[2022-07-06 10:27:07] (1/3) Processing A01: skip
[2022-07-06 10:27:07] (2/3) Processing A02: skip
[2022-07-06 10:27:07] (3/3) Processing F99: skip
[2022-07-06 10:27:07] Running DADA2...
[2022-07-06 10:27:07] Dada2 script parameters:
[2022-07-06 10:27:26] DADA2 Finished.
[2022-07-06 10:27:26] 13 representative sequences found.
DADA2 ERROR:
[2022-07-06 10:27:26] DADA2 filtered too many reads: 3.5209% from total 16104 to 567
[2022-07-06 10:27:26] Assigning taxonomy using DECIPHER: refs/SILVA_SSU_r138_2019.RData
[2022-07-06 10:27:45] Converting decipher taxonomy output: /tmp/dadaist2_0wOu2s/taxonomy.tsv
[2022-07-06 10:27:45] Multiple sequence alignment and tree generation
[2022-07-06 10:27:46] Feature tree generated
[2022-07-06 10:27:46] Dadaist finished, output files saved:
 * decipher-taxonomy-table: /home/user/example-output/taxonomy.txt
 * feature-table: /home/user/example-output/feature-table.tsv
 * features-tree: /home/user/example-output/rep-seqs.tree
 * multiple-alignment: /home/user/example-output/rep-seqs.msa
 * rep-seqs: /home/user/example-output/rep-seqs.fasta

As you can see, DADA2 still reports the error and filters out most of the reads. I tried several computers (my own and a workstation) but got the same error.
I am really sorry for disturbing you, and thank you for your help.


telatin commented on August 29, 2024

Hello @najibveto,
thanks for your swift response!

Just a couple of clarifications: the "toy" dataset is expected to lose a large share of its reads to filtering, and from your log I see you are still on "dadaist2 1.1.0", whereas the patch was applied in 1.2.4 (just a slight increase in sensitivity, which picks ~30 ASVs instead of 13; I think this is a decent out-of-the-box compromise for that dataset).
A quick fix is to trim the initial part of the reads using the "fastp" preprocessing step (add --fastp as a parameter).
In real-life datasets one should know the primers and remove them via --primers FOR:REV (recommended method).

I will further update the manual and release a more robust fix with 1.2.5.


najibveto commented on August 29, 2024


Thank you for your answer. Regarding dadaist2 1.1.0: I removed it and reinstalled it from conda, so I probably made a mistake.
Just one question: normally, running dadaist2 should create an R object that can be imported into phyloseq and used for analysis. I tried both tutorials but didn't get the phyloseq object.
Thank you again, and sorry for the trouble.


telatin commented on August 29, 2024

Some final analyses (including creation of PhyloSeq object and Rhea) are automatically done only if DADA2 doesn't filter too much. One workaround is to lower the threshold of filtered reads (--max-loss 0.05 for example).

Alternatively, there is a dedicated program to generate a PhyloSeq object starting from a Dadaist2 output directory. This will work if Dadaist2 made it to the MicrobiomeAnalyst folder:

dadaist2-phyloseqMake -i dadaist2-outputdir/

For conda, there might be a reason for the installation of 1.1.0, but one can try pinning the version to see if there are incompatibilities:

mamba create -n Dadaist2 -c conda-forge -c bioconda dadaist2=1.2.4

The recommended method to create the environment, though, is via a YAML file:

mamba env create --file dadaist2.yaml -n dadaist2-env

See: dadaist2/installation


najibveto commented on August 29, 2024


Thank you so much, I was able to install version 1.2.4 and it is working fine.
However, when I tried to generate the phyloseq object, I got this error:

dadaist2  --max-loss 0.05 -i MiSeq_SOP/ -o dadaist2-sop -m metadata.tsv -d ~/refs/silva_nr_v138_train_set.fa.gz
 Dadaist2 1.2.4

 [WARNING] Output directory found. This is a warning but in future releases this might require to specify --force to proceed.
[2022-07-07 17:26:34] Ready to log in /home/najib/dadaist2-sop/dadaist.log
[2022-07-07 17:26:34] dadaist2 1.2.4
[2022-07-07 17:26:34] Taxonomy database found: /home/najib/refs/silva_nr_v138_train_set.fa.gz
[2022-07-07 17:26:34] Parameter: taxonomy-type: dada2
[2022-07-07 17:26:34] Parameter: taxonomy-db: /home/najib/refs/silva_nr_v138_train_set.fa.gz
 * Input directory: MiSeq_SOP/
 * Output directory: /home/najib/dadaist2-sop/
 * Metadata: metadata.tsv
 * Reference database: /home/najib/refs/silva_nr_v138_train_set.fa.gz
 * Threads: 6
 * Temporary directory: /tmp/dadaist2_pwW88L
 * QC strategy: skip
[2022-07-07 17:26:34] QC: Checking quality profile with SeqFu
[2022-07-07 17:26:34] SeqFu quality truncation at (trunc-len-1 and trunc-len-2): 248 - 176
[2022-07-07 17:26:34] Checking dependencies
 * RScript: R scripting front-end version 4.0.5 (2021-03-31)
 * Taxonomy: dadaist2-assigntax 1.1.3
 * assign-taxonomy: dadaist2-assigntax 1.1.3
 * clustalo: 1.2.4
 * dada2 (lib): <pass>
 * exporter: dadaist2-exporter 1.4.0
 * fastp: fastp 0.23.2
 * fasttree: FastTree version 2.1.11 Double precision (No SSE3):
 * fu-primers: fu-primers 1.12.0
[2022-07-07 17:26:40] Temporary directory: /tmp/dadaist2_pwW88L
[2022-07-07 17:26:40] Threads: 6
[2022-07-07 17:26:40] Output directory: /home/najib/dadaist2-sop/
[2022-07-07 17:26:40] Input directory "MiSeq_SOP/": 20 found (paired-end)
[2022-07-07 17:26:40] (1/20) Processing F3D0: skip
[2022-07-07 17:26:40] (2/20) Processing F3D1: skip
[2022-07-07 17:26:41] (3/20) Processing F3D141: skip
[2022-07-07 17:26:41] (4/20) Processing F3D142: skip
[2022-07-07 17:26:41] (5/20) Processing F3D143: skip
[2022-07-07 17:26:41] (6/20) Processing F3D144: skip
[2022-07-07 17:26:41] (7/20) Processing F3D145: skip
[2022-07-07 17:26:42] (8/20) Processing F3D146: skip
[2022-07-07 17:26:42] (9/20) Processing F3D147: skip
[2022-07-07 17:26:42] (10/20) Processing F3D148: skip
[2022-07-07 17:26:43] (11/20) Processing F3D149: skip
[2022-07-07 17:26:43] (12/20) Processing F3D150: skip
[2022-07-07 17:26:44] (13/20) Processing F3D2: skip
[2022-07-07 17:26:44] (14/20) Processing F3D3: skip
[2022-07-07 17:26:45] (15/20) Processing F3D5: skip
[2022-07-07 17:26:45] (16/20) Processing F3D6: skip
[2022-07-07 17:26:45] (17/20) Processing F3D7: skip
[2022-07-07 17:26:45] (18/20) Processing F3D8: skip
[2022-07-07 17:26:46] (19/20) Processing F3D9: skip
[2022-07-07 17:26:46] (20/20) Processing Mock: skip
[2022-07-07 17:26:46] Running DADA2...
[2022-07-07 17:26:46] Dada2 script parameters:
[2022-07-07 17:29:24] DADA2 Finished.
[2022-07-07 17:29:24] Converting dada2 taxonomy output: /tmp/dadaist2_pwW88L/taxonomy.tsv
[2022-07-07 17:29:24] 223 representative sequences found.
[2022-07-07 17:29:24] DADA2 filtered 69.1842% from total 152360 to 105409
[2022-07-07 17:29:24] Multiple sequence alignment and tree generation
[2022-07-07 17:29:27] Feature tree generated
[2022-07-07 17:29:27] Exporting MicrobiomeAnalyst
[2022-07-07 17:29:32] PhyloSeq file not generated: /home/najib/dadaist2-sop/R/phyloseq.rds
[2022-07-07 17:29:32] Diagnostics:
 DADAIST2 Import to PhyloSeq
R version 4.0.5 (2021-03-31)
 * Input:  /home/najib/dadaist2-sop/
 * Loading feature table
 * Taxonomy loaded
 * Tree loaded
 * Metadata loaded
 * PhyloSeq: Feature table done
 * PhyloSeq: Taxonomy table done
 * PhyloSeq: adding tree
Error in validObject(.Object) : invalid class "phyloseq" object:
 Component sample names do not match.
 Try sample_names()
Calls: phyloseq ... do.call -> new -> initialize -> initialize -> validObject
Execution halted

PhyloSeq creation failed. at /home/najib/miniconda3/envs/dadaist/bin/dadaist2-phyloseqMake line 96.
[2022-07-07 17:29:34] Rhea normalization/alpha finished.
[2022-07-07 17:29:34] Dadaist finished, output files saved:
 * dada-taxonomy-table: /home/najib/dadaist2-sop/taxonomy.txt
 * feature-table: /home/najib/dadaist2-sop/feature-table.tsv
 * features-tree: /home/najib/dadaist2-sop/rep-seqs.tree
 * mba-files: /home/najib/dadaist2-sop/MicrobiomeAnalyst
 * multiple-alignment: /home/najib/dadaist2-sop/rep-seqs.msa
 * rep-seqs: /home/najib/dadaist2-sop/rep-seqs.fasta
 * rhea: /home/najib/dadaist2-sop/Rhea

[2022-07-07 17:29:34] Cleaning up

I also ran dadaist2-phyloseqMake on its own and got the same error.
Thank you so much for your help.


telatin commented on August 29, 2024

This is good progress!
What remains is a bug (to be fixed in the next release) that you can work around:

OUTDIR="dadaist2-sop"
sed -i 's/_S210_L001_R1_001.fastq.gz//' "$OUTDIR"/MicrobiomeAnalyst/table.csv
dadaist2-phyloseqMake  -i "$OUTDIR"

Let me know if this fixes the problem on this run.


najibveto commented on August 29, 2024


It worked. I think that when dadaist2 generates metadata.csv and table.csv, it gives the samples different names in the two files:
table.csv

metadata.csv

Thank you for your help.


telatin commented on August 29, 2024

Thanks for sharing!
The problem was in the feature table generation: its column names should be the sample IDs. This worked correctly when primers or trimming regions were specified (normal use), but in the tutorial that step was skipped and the problem wasn't caught.
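
To check for this kind of mismatch before building the PhyloSeq object, one could compare the sample IDs in the two MicrobiomeAnalyst files. This is only a sketch under stated assumptions: it assumes table.csv lists the sample IDs in its first row after a label column, and metadata.csv lists them in its first column after a header row (the actual layout is not shown in this thread). `sample_id_diff` is a hypothetical helper, not part of dadaist2.

```shell
#!/usr/bin/env bash
# sample_id_diff TABLE_CSV METADATA_CSV
# Print sample IDs that occur in only one of the two files.
# ASSUMED layout (not confirmed in this thread):
#   table.csv    -> first row: label column, then one sample ID per column
#   metadata.csv -> header row, then one sample ID per row in column 1
# Hypothetical helper for illustration only; not a dadaist2 command.
sample_id_diff() {
  local table="$1" metadata="$2" a b
  a=$(mktemp) && b=$(mktemp) || return 1
  # Sample IDs from the table header (drop the leading label field).
  head -n 1 "$table" | tr ',' '\n' | tail -n +2 | tr -d '"\r' | LC_ALL=C sort -u > "$a"
  # Sample IDs from the first metadata column (drop the header row).
  tail -n +2 "$metadata" | cut -d ',' -f 1 | tr -d '"\r' | LC_ALL=C sort -u > "$b"
  # comm needs sorted input; -23 keeps lines unique to the first file, -13 to the second.
  LC_ALL=C comm -23 "$a" "$b" | sed 's/^/table only: /'
  LC_ALL=C comm -13 "$a" "$b" | sed 's/^/metadata only: /'
  rm -f "$a" "$b"
}
```

It could be called as `sample_id_diff "$OUTDIR"/MicrobiomeAnalyst/table.csv "$OUTDIR"/MicrobiomeAnalyst/metadata.csv`, assuming metadata.csv sits next to table.csv. No output would mean the two files agree on sample names under the assumed layout.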


telatin commented on August 29, 2024

I will close this issue. Should new problems arise, please open a new issue and let me know :)
