markschl commented on July 20, 2024

Thanks for the hint, I was able to get it running. The results from a small mock community analysis look good: before, there were multiple OTUs for a few species (putative chimeras), and now there is just one per species. The Amptk DADA2 pipeline also gives almost the same results.

from amptk.

nextgenusfs commented on July 20, 2024

Thanks @markschl, would you be able to test the latest version by installing with pip, to see if it now works as you would expect?

markschl commented on July 20, 2024

Thanks for the quick response. I'm having some problems installing the pip version due to dependency issues; I might try at the weekend :)

nextgenusfs commented on July 20, 2024

From that environment you should be able to just do:

python -m pip install git+https://github.com/nextgenusfs/amptk.git --upgrade --force --no-deps

nextgenusfs commented on July 20, 2024

Great, thanks for the help. I will tag a new release with these fixes.

markschl commented on July 20, 2024

Great! However, the 'sortbysize' command from the description in torognes/vsearch#283 is still missing. But as I wrote earlier, I'm not absolutely sure whether it is required with the current VSEARCH (that is, whether the UNOISE output is always sorted and whether uchime3_denovo really needs sorted input).

nextgenusfs commented on July 20, 2024

Yes, it is doing sortbysize only for vsearch unoise3.

markschl commented on July 20, 2024

Sorry if I didn't write it clearly enough... I meant that the code linked above proposes a sorting step between the cluster_unoise and uchime3_denovo steps (in the example code the steps are called unoise and uchime_denovo, but I assume that they are equivalent).

However, after a closer look I actually think that an extra sorting step is not required; I apologize for bringing it up. The documentation for uchime_denovo and uchime2_denovo states that the input sequences are automatically sorted by size; this statement is missing only for uchime3_denovo. However, the code and the command output ("Sorting by abundance") suggest that VSEARCH automatically size-sorts the input in all three cases.

The sorting step before the cluster_unoise command should not be necessary either, since derep_fulllength and fastx_uniques already provide sorted output (this is not written in the VSEARCH docs for these commands themselves, but rather in the description of --output, --fastaout and --fastqout; it took me some time to find out...). If the output of the de-replication weren't size-sorted, USEARCH would stop with an error anyway.
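
To illustrate the point about size-sorted output, here is a minimal sketch of how one could check a dereplicated FASTA file by hand. It is not part of AMPtk or VSEARCH; the function names are hypothetical, and it assumes the headers carry abundance annotations of the form `;size=N`, as written when `--sizeout` is used.

```python
import re

# Abundance annotation as found in headers such as ">seq1;size=42" or ">seq1;size=42;"
SIZE_RE = re.compile(r";size=(\d+)")


def read_sizes(fasta_text):
    """Return the size annotation of every FASTA header, in file order."""
    sizes = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            match = SIZE_RE.search(line)
            if match is None:
                raise ValueError(f"no size annotation in header: {line!r}")
            sizes.append(int(match.group(1)))
    return sizes


def is_size_sorted(fasta_text):
    """True if abundances never increase from one record to the next."""
    sizes = read_sizes(fasta_text)
    return all(a >= b for a, b in zip(sizes, sizes[1:]))
```

Reading a file with `open(path).read()` and passing the text to `is_size_sorted` is enough to confirm whether a separate sorting step would change anything for a given dataset.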

nextgenusfs commented on July 20, 2024

Okay, I did implement the sorting after fastx_uniques or derep_fulllength (which one runs depends on the vsearch version; see https://github.com/nextgenusfs/amptk/blob/master/amptk/unoise3.py#L156-L183). It is probably safest to keep the sorting step in there, in case an older version of vsearch that lacks fastx_uniques does not produce properly sorted output before cluster_unoise, which is here: https://github.com/nextgenusfs/amptk/blob/master/amptk/unoise3.py#L187-L224

Does this seem appropriate as it currently is?

I also acknowledge your point about whether uchime_denovo is necessary after usearch-mediated unoise3, but to stay consistent with the other clustering methods I'd prefer to leave it in (hopefully it isn't harmful). If you have data that suggests otherwise, I can change it. I've not used usearch in a few years, since Mac OS versions stopped supporting 32-bit binaries......
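
For readers following along, the order of steps discussed in this thread can be sketched roughly as below. This is an illustration only, not the code in amptk/unoise3.py: the function name, the intermediate file names, the `use_fastx_uniques` switch and the `minsize` default are all assumptions made for the example. It only builds the vsearch command lines and does not run them.

```python
from pathlib import Path


def build_unoise_commands(reads, outdir, use_fastx_uniques=True, minsize=8):
    """Build the vsearch calls for: dereplicate, sort, denoise, chimera-filter.

    All file names here are hypothetical placeholders.
    """
    out = Path(outdir)
    derep = str(out / "derep.fa")
    sorted_fa = str(out / "sorted.fa")
    zotus = str(out / "zotus.fa")
    nonchimeras = str(out / "nonchimeras.fa")

    # Newer vsearch versions offer fastx_uniques; older ones only derep_fulllength.
    if use_fastx_uniques:
        derep_cmd = ["vsearch", "--fastx_uniques", str(reads),
                     "--sizeout", "--fastaout", derep]
    else:
        derep_cmd = ["vsearch", "--derep_fulllength", str(reads),
                     "--sizeout", "--output", derep]

    return [
        derep_cmd,
        # Explicit sort, kept as a safeguard even if the input is already sorted.
        ["vsearch", "--sortbysize", derep, "--output", sorted_fa],
        ["vsearch", "--cluster_unoise", sorted_fa, "--minsize", str(minsize),
         "--sizein", "--sizeout", "--centroids", zotus],
        ["vsearch", "--uchime3_denovo", zotus,
         "--sizein", "--sizeout", "--nonchimeras", nonchimeras],
    ]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in turn. The explicit sortbysize call costs little, which is the argument for keeping it regardless of which dereplication command was available.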

markschl commented on July 20, 2024

> Does this seem appropriate as it currently is?

Sure, the extra sorting step will not do harm at all...

Regarding the USEARCH approach: since the introduction of unoise and cluster_otus (UPARSE), uchime_denovo has been deprecated/removed from USEARCH because these commands do chimera removal "on the fly", and the author states that this is more effective. But your approach may not be a problem: at least my impression (based on rather limited experience) is that the VSEARCH uchime_denovo seems less strict (removes fewer chimeras), so the effect of this additional step may be limited.

markschl commented on July 20, 2024

Thanks a lot for fixing this!
