telatin commented on August 29, 2024

Thanks again for the feedback. I have recently pushed new releases of other tools, so apologies for the delay in taking care of this issue, but it's on my radar...

telatin commented on August 29, 2024

Hello there,
As you can see from the log:

DADA2 ERROR:
[2022-03-22 11:13:33] DADA2 filtered too many reads: 3.5209% from total 16104 to 567

The current settings let DADA2 remove most of the reads, and this is the primary cause to investigate. We will test the tutorial again. But first, can you please clarify whether you used the dadaist2 binary from the repository you cloned or the one from conda (I see you activated the environment)?
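In the log above, the percentage on the "DADA2 filtered" line matches the share of reads kept rather than removed: 567 / 16104 ≈ 3.52%. A minimal shell sketch to recompute it from a dadaist.log, assuming such lines always end with "from total N to M" as shown in this thread (the function name retained_fraction is hypothetical, not part of dadaist2):

```shell
#!/usr/bin/env bash
# Recompute the share of reads DADA2 kept from the "DADA2 filtered" line(s)
# of a dadaist.log. Assumes each such line ends with "from total N to M".
retained_fraction() {
  awk '/DADA2 filtered/ {
    total = $(NF - 2)   # N: reads before filtering
    kept  = $NF         # M: reads after filtering
    if (total > 0)
      printf "%d/%d reads kept (%.4f%%)\n", kept, total, 100 * kept / total
  }' "$1"
}
```

For example, retained_fraction /home/user/example-output/dadaist.log would print 567/16104 reads kept (3.5209%) for the run reported here.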

najibveto commented on August 29, 2024

I followed the recommendation, so I installed it through a conda environment:

wget -O dadaist2.yaml "https://quadram-institute-bioscience.github.io/dadaist2/dadaist2-$(uname).yaml"
conda env create --file dadaist2.yaml -n dadaist2
For the remaining part, I followed the first tutorial. I also tried the Mothur SOP tutorial and got a similar error.
It created the MicrobiomeAnalyst subdirectory with the different files; however, the R folder was empty.

najibveto commented on August 29, 2024

Thank you, no worries.

telatin commented on August 29, 2024

Hi @najibveto, the latest release 1.2.4 updated the defaults to more relaxed values, re-enabling the analysis of the test dataset. Thanks for reporting, and feel free to let me know if you experience any issues.

najibveto commented on August 29, 2024

Thank you very much for your answer. I tried dadaist2 again, but I got the same problem:

Dadaist2
[2022-07-06 10:27:00] Ready to log in /home/user/example-output/dadaist.log
[2022-07-06 10:27:00] dadaist2 1.1.0
[2022-07-06 10:27:00] DECIPHER Taxonomy database found: refs/SILVA_SSU_r138_2019.RData
 * Input directory: data/16S/
 * Output directory: /home/user/example-output/
 * Metadata: metadata.tsv
 * Reference database: skip
 * Threads: 8
 * Temporary directory: /tmp/dadaist2_0wOu2s
 * QC strategy: skip
[2022-07-06 10:27:00] SeqFu quality truncation at: 254 - 209
[2022-07-06 10:27:00] Checking dependencies
 * DECIPHER: <pass>
 * RScript: R scripting front-end version 4.0.5 (2021-03-31)
 * assign-taxonomy: dadaist2-assigntax 1.1.3
 * clustalo: 1.2.4
 * dada2 (lib): <pass>
 * exporter: dadaist2-exporter 1.4.0
 * fastp: fastp 0.20.1
 * fasttree: FastTree version 2.1.10 Double precision (No SSE3):
 * fu-primers: fu-primers 1.6.0
[2022-07-06 10:27:07] Temporary directory: /tmp/dadaist2_0wOu2s
[2022-07-06 10:27:07] Threads: 8
[2022-07-06 10:27:07] Output directory: /home/user/example-output/
[2022-07-06 10:27:07] Input directory "data/16S/": 3 found (paired-end)
[2022-07-06 10:27:07] (1/3) Processing A01: skip
[2022-07-06 10:27:07] (2/3) Processing A02: skip
[2022-07-06 10:27:07] (3/3) Processing F99: skip
[2022-07-06 10:27:07] Running DADA2...
[2022-07-06 10:27:07] Dada2 script parameters:
[2022-07-06 10:27:26] DADA2 Finished.
[2022-07-06 10:27:26] 13 representative sequences found.
DADA2 ERROR:
[2022-07-06 10:27:26] DADA2 filtered too many reads: 3.5209% from total 16104 to 567
[2022-07-06 10:27:26] Assigning taxonomy using DECIPHER: refs/SILVA_SSU_r138_2019.RData
[2022-07-06 10:27:45] Converting decipher taxonomy output: /tmp/dadaist2_0wOu2s/taxonomy.tsv
[2022-07-06 10:27:45] Multiple sequence alignment and tree generation
[2022-07-06 10:27:46] Feature tree generated
[2022-07-06 10:27:46] Dadaist finished, output files saved:
 * decipher-taxonomy-table: /home/user/example-output/taxonomy.txt
 * feature-table: /home/user/example-output/feature-table.tsv
 * features-tree: /home/user/example-output/rep-seqs.tree
 * multiple-alignment: /home/user/example-output/rep-seqs.msa
 * rep-seqs: /home/user/example-output/rep-seqs.fasta

As you can see, DADA2 filters out a lot of reads. I tried several computers (mine and a workstation) but got the same error.
I am really sorry for disturbing you, and thank you for your help.

telatin commented on August 29, 2024

Hello @najibveto,
thanks for your swift response!

Just a couple of clarifications: the "toy" dataset is expected to undergo heavy filtering, and from your log I see you are still on dadaist2 1.1.0, while the patch was applied in 1.2.4 (just a slight increase in sensitivity, which picks ~30 ASVs instead of 13; I think that is a decent out-of-the-box compromise for that dataset).
A quick fix is to trim the initial part of the reads with the fastp preprocessing step (add --fastp as a parameter).
In real-life datasets, one should know the primers and remove them via --primers FOR:REV (the recommended method).

I will further update the manual and release a more robust fix with 1.2.5.
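Since --primers expects a FOR:REV pair, a malformed value is easy to catch before launching a long run. The following is a minimal sketch of such a pre-flight check; the helper check_primers is hypothetical (not part of dadaist2) and assumes primers are written in upper-case IUPAC nucleotide codes:

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a --primers FOR:REV value: exactly one
# colon, two non-empty sides, upper-case IUPAC nucleotide codes only.
check_primers() {
  local fwd rev p
  case $1 in
    *:*:* | :* | *:) return 1 ;;   # more than one colon, or an empty side
    *:*) ;;                         # looks like FOR:REV
    *) return 1 ;;                  # no colon at all
  esac
  fwd=${1%%:*}
  rev=${1#*:}
  for p in "$fwd" "$rev"; do
    case $p in
      *[!ACGTURYSWKMBDHVN]*) return 1 ;;   # any non-IUPAC character
    esac
  done
  return 0
}
```

Usage, with the widely used 515F/806R pair purely as an illustration (use the primers of your own protocol): check_primers "GTGYCAGCMGCCGCGGTAA:GGACTACNVGGGTWTCTAAT" && dadaist2 -i data/16S/ -o output/ -m metadata.tsv --primers "GTGYCAGCMGCCGCGGTAA:GGACTACNVGGGTWTCTAAT"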

najibveto commented on August 29, 2024

Thank you for your answer. Regarding dadaist2 1.1.0: I removed it and reinstalled it from conda, so I probably made a mistake.
Just one question: normally, when running dadaist2, it should create an R object to be imported into phyloseq and used for analysis. I tried both tutorials but didn't get the phyloseq object.
Thank you again, and sorry for the trouble.

telatin commented on August 29, 2024

Some final analyses (including the creation of the PhyloSeq object and Rhea) run automatically only if DADA2 does not filter out too many reads. One workaround is to lower the threshold of filtered reads (for example, --max-loss 0.05).

Alternatively, a dedicated program generates a PhyloSeq object from a Dadaist2 output directory. This works as long as Dadaist2 got as far as creating the MicrobiomeAnalyst folder:

dadaist2-phyloseqMake -i dadaist2-outputdir/

As for conda, there may be a reason it installed 1.1.0, but you can try pinning the version to see whether there are incompatibilities:

mamba create -n Dadaist2 -c conda-forge -c bioconda dadaist2=1.2.4

The recommended way to create the environment, though, is via a YAML file:

mamba env create --file dadaist2.yaml -n dadaist2-env

See: dadaist2/installation
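Because the log header records which dadaist2 actually ran (1.1.0 in the earlier log, even though 1.2.4 was expected), it can be worth checking it after reinstalling. A minimal sketch, assuming the version line has the form "[timestamp] dadaist2 X.Y.Z" seen in this thread and that sort supports -V (as GNU coreutils does); both function names are hypothetical:

```shell
#!/usr/bin/env bash
# Read the dadaist2 version recorded near the top of a dadaist.log
# (a line such as "[2022-07-06 10:27:00] dadaist2 1.1.0").
logged_version() {
  sed -nE 's/^\[[^]]*\] dadaist2 ([0-9]+(\.[0-9]+)*)$/\1/p' "$1" | head -n 1
}

# Succeed if version $1 >= version $2 (relies on sort -V for version ordering).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}
```

Usage: v=$(logged_version output/dadaist.log); version_at_least "$v" 1.2.4 || echo "dadaist2 $v ran, expected at least 1.2.4"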

najibveto commented on August 29, 2024

Thank you so much, I was able to install version 1.2.4 and it is working fine.
However, when I tried to generate the PhyloSeq object, I got this error:

dadaist2  --max-loss 0.05 -i MiSeq_SOP/ -o dadaist2-sop -m metadata.tsv -d ~/refs/silva_nr_v138_train_set.fa.gz
 Dadaist2 1.2.4

 [WARNING] Output directory found. This is a warning but in future releases this might require to specify --force to proceed.
[2022-07-07 17:26:34] Ready to log in /home/najib/dadaist2-sop/dadaist.log
[2022-07-07 17:26:34] dadaist2 1.2.4
[2022-07-07 17:26:34] Taxonomy database found: /home/najib/refs/silva_nr_v138_train_set.fa.gz
[2022-07-07 17:26:34] Parameter: taxonomy-type: dada2
[2022-07-07 17:26:34] Parameter: taxonomy-db: /home/najib/refs/silva_nr_v138_train_set.fa.gz
 * Input directory: MiSeq_SOP/
 * Output directory: /home/najib/dadaist2-sop/
 * Metadata: metadata.tsv
 * Reference database: /home/najib/refs/silva_nr_v138_train_set.fa.gz
 * Threads: 6
 * Temporary directory: /tmp/dadaist2_pwW88L
 * QC strategy: skip
[2022-07-07 17:26:34] QC: Checking quality profile with SeqFu
[2022-07-07 17:26:34] SeqFu quality truncation at (trunc-len-1 and trunc-len-2): 248 - 176
[2022-07-07 17:26:34] Checking dependencies
 * RScript: R scripting front-end version 4.0.5 (2021-03-31)
 * Taxonomy: dadaist2-assigntax 1.1.3
 * assign-taxonomy: dadaist2-assigntax 1.1.3
 * clustalo: 1.2.4
 * dada2 (lib): <pass>
 * exporter: dadaist2-exporter 1.4.0
 * fastp: fastp 0.23.2
 * fasttree: FastTree version 2.1.11 Double precision (No SSE3):
 * fu-primers: fu-primers 1.12.0
[2022-07-07 17:26:40] Temporary directory: /tmp/dadaist2_pwW88L
[2022-07-07 17:26:40] Threads: 6
[2022-07-07 17:26:40] Output directory: /home/najib/dadaist2-sop/
[2022-07-07 17:26:40] Input directory "MiSeq_SOP/": 20 found (paired-end)
[2022-07-07 17:26:40] (1/20) Processing F3D0: skip
[2022-07-07 17:26:40] (2/20) Processing F3D1: skip
[2022-07-07 17:26:41] (3/20) Processing F3D141: skip
[2022-07-07 17:26:41] (4/20) Processing F3D142: skip
[2022-07-07 17:26:41] (5/20) Processing F3D143: skip
[2022-07-07 17:26:41] (6/20) Processing F3D144: skip
[2022-07-07 17:26:41] (7/20) Processing F3D145: skip
[2022-07-07 17:26:42] (8/20) Processing F3D146: skip
[2022-07-07 17:26:42] (9/20) Processing F3D147: skip
[2022-07-07 17:26:42] (10/20) Processing F3D148: skip
[2022-07-07 17:26:43] (11/20) Processing F3D149: skip
[2022-07-07 17:26:43] (12/20) Processing F3D150: skip
[2022-07-07 17:26:44] (13/20) Processing F3D2: skip
[2022-07-07 17:26:44] (14/20) Processing F3D3: skip
[2022-07-07 17:26:45] (15/20) Processing F3D5: skip
[2022-07-07 17:26:45] (16/20) Processing F3D6: skip
[2022-07-07 17:26:45] (17/20) Processing F3D7: skip
[2022-07-07 17:26:45] (18/20) Processing F3D8: skip
[2022-07-07 17:26:46] (19/20) Processing F3D9: skip
[2022-07-07 17:26:46] (20/20) Processing Mock: skip
[2022-07-07 17:26:46] Running DADA2...
[2022-07-07 17:26:46] Dada2 script parameters:
[2022-07-07 17:29:24] DADA2 Finished.
[2022-07-07 17:29:24] Converting dada2 taxonomy output: /tmp/dadaist2_pwW88L/taxonomy.tsv
[2022-07-07 17:29:24] 223 representative sequences found.
[2022-07-07 17:29:24] DADA2 filtered 69.1842% from total 152360 to 105409
[2022-07-07 17:29:24] Multiple sequence alignment and tree generation
[2022-07-07 17:29:27] Feature tree generated
[2022-07-07 17:29:27] Exporting MicrobiomeAnalyst
[2022-07-07 17:29:32] PhyloSeq file not generated: /home/najib/dadaist2-sop/R/phyloseq.rds
[2022-07-07 17:29:32] Diagnostics:
 DADAIST2 Import to PhyloSeq
R version 4.0.5 (2021-03-31)
 * Input:  /home/najib/dadaist2-sop/
 * Loading feature table
 * Taxonomy loaded
 * Tree loaded
 * Metadata loaded
 * PhyloSeq: Feature table done
 * PhyloSeq: Taxonomy table done
 * PhyloSeq: adding tree
Error in validObject(.Object) : invalid class "phyloseq" object:
 Component sample names do not match.
 Try sample_names()
Calls: phyloseq ... do.call -> new -> initialize -> initialize -> validObject
Execution halted

PhyloSeq creation failed. at /home/najib/miniconda3/envs/dadaist/bin/dadaist2-phyloseqMake line 96.
[2022-07-07 17:29:34] Rhea normalization/alpha finished.
[2022-07-07 17:29:34] Dadaist finished, output files saved:
 * dada-taxonomy-table: /home/najib/dadaist2-sop/taxonomy.txt
 * feature-table: /home/najib/dadaist2-sop/feature-table.tsv
 * features-tree: /home/najib/dadaist2-sop/rep-seqs.tree
 * mba-files: /home/najib/dadaist2-sop/MicrobiomeAnalyst
 * multiple-alignment: /home/najib/dadaist2-sop/rep-seqs.msa
 * rep-seqs: /home/najib/dadaist2-sop/rep-seqs.fasta
 * rhea: /home/najib/dadaist2-sop/Rhea

[2022-07-07 17:29:34] Cleaning up

I also ran dadaist2-phyloseqMake and got the same error.
Thank you so much for your help.

telatin commented on August 29, 2024

This is good progress!
Now the problem is a bug (to be fixed in the next release) that you can work around:

OUTDIR="dadaist2-sop"
sed -i 's/_S210_L001_R1_001.fastq.gz//' "$OUTDIR"/MicrobiomeAnalyst/table.csv
dadaist2-phyloseqMake  -i "$OUTDIR"

Let me know if this fixes the problem on this run.
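The sed line above removes one specific suffix (_S210_L001_R1_001.fastq.gz). If the sample names in your table.csv carry Illumina-style suffixes with different S numbers, a hypothetical generalisation is to match any S number; this is a sketch under that assumption, not a dadaist2 feature, and it keeps a backup copy of the file:

```shell
#!/usr/bin/env bash
# Hypothetical generalisation of the sed workaround: remove any Illumina-style
# suffix such as "_S188_L001_R1_001.fastq.gz" from sample names in table.csv,
# whatever the S number. The original file is kept as table.csv.bak.
strip_illumina_suffix() {
  sed -i.bak -E 's/_S[0-9]+_L[0-9]{3}_R1_001\.fastq(\.gz)?//g' "$1"
}
```

Usage: strip_illumina_suffix "$OUTDIR"/MicrobiomeAnalyst/table.csv followed by dadaist2-phyloseqMake -i "$OUTDIR"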

najibveto commented on August 29, 2024

It worked. I think that when dadaist2 generates metadata.csv and table.csv, it gives the samples different names:
table.csv: [screenshot]

metadata.csv: [screenshot]

Thank you for your help.

telatin commented on August 29, 2024

Thanks for sharing!
The problem was in the feature table generation, which should use the sample IDs: it worked correctly when primers or trimming regions were specified (normal use), but the tutorial skipped that step, so the problem went unnoticed.
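A mismatch like this can be spotted before running dadaist2-phyloseqMake by comparing the sample names in the two MicrobiomeAnalyst files. A minimal sketch; the function name is hypothetical and the file layout is an assumption (samples as header columns 2..N in table.csv, one sample per row in column 1 of metadata.csv):

```shell
#!/usr/bin/env bash
# Report sample names that appear in only one of the two MicrobiomeAnalyst files.
# Assumed layout (not checked against dadaist2 docs): table.csv lists samples in
# header columns 2..N; metadata.csv lists one sample per row in column 1.
compare_sample_names() {
  local table=$1 meta=$2 a b
  a=$(mktemp) b=$(mktemp)
  head -n 1 "$table" | tr ',' '\n' | tail -n +2 | LC_ALL=C sort > "$a"
  tail -n +2 "$meta" | cut -d, -f1 | LC_ALL=C sort > "$b"
  LC_ALL=C comm -23 "$a" "$b" | sed 's/^/only in table: /'
  LC_ALL=C comm -13 "$a" "$b" | sed 's/^/only in metadata: /'
  rm -f "$a" "$b"
}
```

Usage: compare_sample_names "$OUTDIR"/MicrobiomeAnalyst/table.csv "$OUTDIR"/MicrobiomeAnalyst/metadata.csv prints nothing when the two sets of names agree.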

telatin commented on August 29, 2024

I will close this issue. Should new problems arise, please open a new issue and let me know :)
