
Comments (4)

telatin commented on August 29, 2024

Howdy!
The problem should be fixed at its source: too many reads are being filtered.
One way, of course, is simply to lower the threshold so that aggressive filtering is allowed, but in this case 4% of the total looks worth investigating, and it may be worth adjusting the parameters (truncQ, maxee, trunc...) so that fewer reads are filtered in the first place.
Different providers or sequencing cores can produce very different output: once you have tuned the parameters for your usual supplier, you should be able to adjust the pipeline quickly.

One way to find the biggest loss is to check the dada2-stats file, which shows the number of reads retained at each step. Could you please post it?

from dadaist2.

najibveto commented on August 29, 2024

Thank you for your reply.
The dada2-stats file is as follows:

sample             input  filtered  denoised  merged  non-chimeric
W2111_R1.fastq.gz  71411     19455     19455   13308          2653
W2201_R1.fastq.gz  76186     23861     23861   20669          2085
W2202_R1.fastq.gz  69819     25303     25303   21829          2245
W2203_R1.fastq.gz  91891     46711     46711   34512          6685
I tried lowering the loss to 5% as suggested, but I got the same error.


telatin commented on August 29, 2024

The stats show a significant, though not dramatic, loss at the filtering step, and a further significant loss at the non-chimeric (chimera removal) step. Relaxing the initial filtering might improve the outcome. If the chimera detection is right, the library may have been amplified a lot (or there could be other sources?).
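
To make the per-step losses easier to compare, here is a minimal Python sketch that computes, for each sample, the fraction of reads kept at every step relative to the previous one. The read counts are copied from the dada2-stats table posted above; the function names (`step_retention`, `worst_step`) are made up for this illustration and are not part of dadaist2 or DADA2.

```python
# Read counts copied from the dada2-stats table posted in this thread.
STEPS = ["input", "filtered", "denoised", "merged", "non-chimeric"]
STATS = {
    "W2111_R1.fastq.gz": [71411, 19455, 19455, 13308, 2653],
    "W2201_R1.fastq.gz": [76186, 23861, 23861, 20669, 2085],
    "W2202_R1.fastq.gz": [69819, 25303, 25303, 21829, 2245],
    "W2203_R1.fastq.gz": [91891, 46711, 46711, 34512, 6685],
}


def step_retention(counts):
    """Fraction of reads kept at each step, relative to the previous step."""
    return [after / before for before, after in zip(counts, counts[1:])]


def worst_step(counts):
    """Name and retained fraction of the step that keeps the fewest reads."""
    fractions = step_retention(counts)
    index = min(range(len(fractions)), key=fractions.__getitem__)
    return STEPS[index + 1], fractions[index]


for sample, counts in STATS.items():
    per_step = ", ".join(
        f"{name}: {fraction:.1%}"
        for name, fraction in zip(STEPS[1:], step_retention(counts))
    )
    overall = counts[-1] / counts[0]
    name, fraction = worst_step(counts)
    print(f"{sample}  {per_step}  | overall: {overall:.1%}")
    print(f"  lowest retention at '{name}' ({fraction:.1%} kept)")
```

With the numbers from the table, the step with the lowest retention is chimera removal in all four samples: only about 10% to 20% of the merged reads survive it, whereas filtering keeps roughly 27% to 51% of the input reads.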

I do not recommend lowering the loss parameter to bypass the issue: this is not a bug but a sanity check meant to prevent misinterpreting potentially noisy results. That said, if the message is "DADA2 filtered too many reads: 4.7926%", you should try 4% (or simply 1%, which effectively disables the check :) ).


najibveto commented on August 29, 2024

Hello,
I tried both loss percentages (1% and 4%); both worked, and I got the phyloseq object as well as the MicrobiomeAnalyst files. When I tried 5% loss, I got the usual error.
Here is the output of the run with 4% loss.


$ dadaist2 --max-loss 0.04 -i metagenome/16S/ -o water -m metadata.tsv -d ~/refs/silva_nr_v138_train_set.fa.gz
    ____            __      _      __ ___
   / __ \____ _____/ /___ _(_)____/ /|__ \
  / / / / __ `/ __  / __ `/ / ___/ __/_/ /
 / /_/ / /_/ / /_/ / /_/ / (__  ) /_/ __/
/_____/\__,_/\__,_/\__,_/_/____/\__/____/

1.2.5

[WARNING] Output directory found.
 This is a warning but in future releases this might require to specify --force to proceed.
[2022-08-16 09:20:21] Ready to log in /home/najib/water/dadaist.log
[2022-08-16 09:20:21] dadaist2 1.2.5
[2022-08-16 09:20:21] Taxonomy database found: /home/najib/refs/silva_nr_v138_train_set.fa.gz
[2022-08-16 09:20:21] Parameter: taxonomy-type: dada2
[2022-08-16 09:20:21] Parameter: taxonomy-db: /home/najib/refs/silva_nr_v138_train_set.fa.gz
 * Input directory: metagenome/16S/
 * Output directory: /home/najib/water/
 * Metadata: metadata.tsv
 * Reference database: /home/najib/refs/silva_nr_v138_train_set.fa.gz
 * Threads: 6
 * Temporary directory: /tmp/dadaist2_1fIjRN
 * QC strategy: skip
[2022-08-16 09:20:21] QC: Checking quality profile with SeqFu
[2022-08-16 09:20:22] SeqFu quality truncation at (trunc-len-1 and trunc-len-2): 290 - 231
[2022-08-16 09:20:22] Checking dependencies
 * RScript: R scripting front-end version 4.0.5 (2021-03-31)
 * Taxonomy: dadaist2-assigntax 1.1.3
 * assign-taxonomy: dadaist2-assigntax 1.1.3
 * clustalo: 1.2.4
 * dada2 (lib): <pass>
 * exporter: dadaist2-exporter 1.4.0
 * fastp: fastp 0.23.2
 * fasttree: FastTree version 2.1.11 Double precision (No SSE3):
 * fu-primers: fu-primers 1.12.0
[2022-08-16 09:20:27] Temporary directory: /tmp/dadaist2_1fIjRN
[2022-08-16 09:20:27] Threads: 6
[2022-08-16 09:20:27] Output directory: /home/najib/water/
[2022-08-16 09:20:27] Checked metadata for autumn
[2022-08-16 09:20:27] Checked metadata for spirng
[2022-08-16 09:20:27] Checked metadata for summer
[2022-08-16 09:20:27] Checked metadata for winter
[2022-08-16 09:20:27] Input directory "metagenome/16S/": 4 found (paired-end)
[2022-08-16 09:20:27] (1/4) Processing autumn: skip
[2022-08-16 09:20:27] Copying input reads for DADA2
[2022-08-16 09:20:27] (2/4) Processing spirng: skip
[2022-08-16 09:20:27] Copying input reads for DADA2
[2022-08-16 09:20:27] (3/4) Processing summer: skip
[2022-08-16 09:20:27] Copying input reads for DADA2
[2022-08-16 09:20:27] (4/4) Processing winter: skip
[2022-08-16 09:20:27] Copying input reads for DADA2
[2022-08-16 09:20:27] Running DADA2...
[2022-08-16 09:20:27] Dada2 script parameters:
 * [1] forward_reads: /tmp/dadaist2_1fIjRN/for
 * [2] reverse_reads: /tmp/dadaist2_1fIjRN/rev
 * [3] feature_table_output: /tmp/dadaist2_1fIjRN/dada2/dada2.tsv
 * [4] stats_output: /tmp/dadaist2_1fIjRN/dada2/stats.tsv
 * [5] filt_forward: /tmp/dadaist2_1fIjRN/for/filtered
 * [6] filt_reverse: /tmp/dadaist2_1fIjRN/rev/filtered
 * [7] truncLenF: 290
 * [8] truncLenR: 231
 * [9] trimLeftF: 0
 * [10] trimLeftR: 0
 * [11] maxEEF: 1
 * [12] maxEER: 1.5
 * [13] truncQ: 10
 * [14] chimeraMethod: consensus
 * [15] minFold: 1
 * [16] threads: 6
 * [17] nreads_learn: 0
 * [18] baseDir: /tmp/dadaist2_1fIjRN
 * [19] doPlots: do_plots
 * [20] taxonomyDb: /home/najib/refs/silva_nr_v138_train_set.fa.gz
 * [21] saveRDS: no
 * [22] noMerge: 0
 * [23] processPool: 0
[2022-08-16 09:28:44] DADA2 Finished.
[2022-08-16 09:28:44] Converting dada2 taxonomy output: /tmp/dadaist2_1fIjRN/taxonomy.tsv
[2022-08-16 09:28:44] 922 representative sequences found.
[2022-08-16 09:28:44] DADA2 filtered 4.7926% from total 486266 to 23305
[2022-08-16 09:28:44] Multiple sequence alignment and tree generation
[2022-08-16 09:29:20] Feature tree generated
[2022-08-16 09:29:20] Exporting MicrobiomeAnalyst
[2022-08-16 09:29:24] Generating PhyloSeq object
[2022-08-16 09:29:26] Rhea normalization/alpha finished.
[2022-08-16 09:29:26] Dadaist finished, output files saved:
 * dada-taxonomy-table: /home/najib/water/taxonomy.txt
 * feature-table: /home/najib/water/feature-table.tsv
 * features-tree: /home/najib/water/rep-seqs.tree
 * mba-files: /home/najib/water/MicrobiomeAnalyst
 * multiple-alignment: /home/najib/water/rep-seqs.msa
 * phyloseq: /home/najib/water/R/phyloseq.rds
 * rep-seqs: /home/najib/water/rep-seqs.fasta
 * rhea: /home/najib/water/Rhea
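
The percentage in the log line `DADA2 filtered 4.7926% from total 486266 to 23305` can be reproduced from the two read counts it reports. Although the log uses the word "filtered", the value works out as the share of reads that remain after DADA2. The sketch below also encodes an assumed rule, inferred only from the behaviour reported in this thread (1% and 4% worked, 5% failed) and not from the dadaist2 source code: the run proceeds when the retained fraction is at least the `--max-loss` value. The function names are hypothetical.

```python
# Read counts taken from the log line in this thread.
TOTAL_READS = 486266
FINAL_READS = 23305


def retained_percent(total_reads, final_reads):
    """Percentage of the input reads still present after DADA2."""
    return 100.0 * final_reads / total_reads


def passes_check(total_reads, final_reads, max_loss):
    """Assumed rule (not confirmed from the dadaist2 source): the run proceeds
    when the retained fraction is at least max_loss."""
    return final_reads / total_reads >= max_loss


print(f"retained: {retained_percent(TOTAL_READS, FINAL_READS):.4f}%")
for max_loss in (0.01, 0.04, 0.05):
    outcome = "proceeds" if passes_check(TOTAL_READS, FINAL_READS, max_loss) else "stops"
    print(f"--max-loss {max_loss}: {outcome}")
```

Under that assumption, thresholds of 0.01 and 0.04 let the run proceed and 0.05 stops it, which matches what was observed here.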
