I noticed that Amptk misses the --sizein argument in

Problematic unoise3 implementation with VSEARCH (amptk issue, closed)

markschl commented on July 20, 2024

Problematic unoise3 implementation with VSEARCH

from amptk.

Comments (13)

markschl commented on July 20, 2024

Thanks for the hint, I was able to get it running. The results from a small mock community analysis look good: before, there were multiple OTUs for a few species (putative chimeras), and now there is just one per species. The Amptk DADA2 pipeline also gives almost the same results.

nextgenusfs commented on July 20, 2024

Thanks @markschl, would you be able to test the latest version by installing it with pip, to see whether it now works as you would expect?

markschl commented on July 20, 2024

Thanks for the quick response. I'm having some problems installing the pip version due to dependency issues; I might try at the weekend :)

nextgenusfs commented on July 20, 2024

From that environment should be able to just do:

python -m pip install git+https://github.com/nextgenusfs/amptk.git --upgrade --force-reinstall --no-deps

nextgenusfs commented on July 20, 2024

Great, thanks for the help. I will tag a new release with these fixes then.

markschl commented on July 20, 2024

Great! Although, the 'sortbysize' command from the description here torognes/vsearch#283 is still missing. But as I wrote earlier, I'm not absolutely sure whether this is required or not with the current VSEARCH (whether the UNOISE output is always sorted and whether uchime3_denovo really needs sorted input).
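For reference, the step ordering under discussion can be written out as VSEARCH argument lists. This is a minimal Python sketch, not amptk's code: the helper `build_postdenoise_cmds` and all file names are hypothetical, and the optional `--sortbysize` step stands in for the one proposed in torognes/vsearch#283 between `--cluster_unoise` and `--uchime3_denovo`. The commands are only built here, not executed.

```python
from typing import List


def build_postdenoise_cmds(
    derep_fasta: str,
    zotus_fasta: str,
    nonchim_fasta: str,
    minsize: int = 8,
    sort_between: bool = True,
) -> List[List[str]]:
    """Build VSEARCH argument lists: denoising, optional re-sort, de novo chimera removal.

    Hypothetical helper for illustration; file names are placeholders.
    """
    cmds = [
        # Denoise the size-annotated, dereplicated reads; --sizein reads the
        # ;size= annotations and --sizeout writes updated ones to the centroids.
        ["vsearch", "--cluster_unoise", derep_fasta,
         "--sizein", "--sizeout", "--minsize", str(minsize),
         "--centroids", zotus_fasta],
    ]
    chimera_input = zotus_fasta
    if sort_between:
        # Optional explicit re-sort by abundance between the two steps.
        sorted_fasta = zotus_fasta + ".sorted.fa"
        cmds.append(["vsearch", "--sortbysize", zotus_fasta,
                     "--output", sorted_fasta])
        chimera_input = sorted_fasta
    # De novo chimera filtering on the denoised sequences.
    cmds.append(["vsearch", "--uchime3_denovo", chimera_input,
                 "--nonchimeras", nonchim_fasta])
    return cmds
```

Each list could be passed to `subprocess.run(cmd, check=True)`. The `sort_between` flag simply makes the extra step easy to toggle while checking whether it changes the output.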

nextgenusfs commented on July 20, 2024

Yes, it does run sortbysize, but only for vsearch unoise3.

markschl commented on July 20, 2024

Sorry if I didn't write it clearly enough... I meant that the code linked above proposes a sorting step between the cluster_unoise and the uchime3_denovo step (in the example code the steps are called unoise and uchime_denovo, but I assume that they are equivalent).

However, after a closer look I actually think that an extra sorting step is not required; I apologize for bringing it up. The documentation for uchime_denovo and uchime2_denovo states that the input sequences are automatically sorted by size; only for uchime3_denovo is this statement missing. Still, the code and the command output (Sorting by abundance) suggest that VSEARCH automatically size-sorts the input in all three cases.

The sorting step before the cluster_unoise command should not be necessary either, since derep_fulllength and fastx_uniques already produce sorted output (this is not written in the VSEARCH docs for these commands themselves, but rather in the description of --output, --fastaout and --fastqout; it took me some time to find out...). If the de-replicated output weren't size-sorted, USEARCH would stop with an error anyway.
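To make "size sorted" concrete: abundance annotations look like `;size=N` in the sequence header, and size-sorted output means N never increases from one record to the next. The sketch below is a hypothetical stand-alone check, not part of amptk or VSEARCH. It parses those annotations from FASTA text and verifies the order; `sort_by_size` is only a rough pure-Python analogue of `--sortbysize`, and its tie handling (input order is kept) may differ from VSEARCH's.

```python
import re
from typing import List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)

# Matches a ;size=N annotation at the start of the header or after a ';'.
SIZE_RE = re.compile(r"(?:^|;)size=(\d+)")


def read_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Record] = []
    header, seq = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(seq)))
            header, seq = line[1:], []
        else:
            seq.append(line)
    if header is not None:
        records.append((header, "".join(seq)))
    return records


def abundance(header: str) -> int:
    """Return N from a ;size=N annotation, or raise if the header has none."""
    m = SIZE_RE.search(header)
    if m is None:
        raise ValueError(f"no size annotation in header: {header!r}")
    return int(m.group(1))


def is_size_sorted(records: List[Record]) -> bool:
    """True if abundances are in non-increasing order."""
    sizes = [abundance(h) for h, _ in records]
    return all(a >= b for a, b in zip(sizes, sizes[1:]))


def sort_by_size(records: List[Record]) -> List[Record]:
    """Sort by decreasing abundance; sorted() is stable, so ties keep input order."""
    return sorted(records, key=lambda r: abundance(r[0]), reverse=True)
```

Running `is_size_sorted(read_fasta(open(path).read()))` on a dereplicated file is one way to confirm the behaviour described above without relying on the docs.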

nextgenusfs commented on July 20, 2024

Okay, I did implement the sorting after fastx_uniques or derep_fulllength (which one is used depends on the vsearch version; that code is here: https://github.com/nextgenusfs/amptk/blob/master/amptk/unoise3.py#L156-L183). It is probably safest to keep the sorting step in there so that, in case an older vsearch version doesn't have fastx_uniques, the output is still properly sorted before cluster_unoise, which is here: https://github.com/nextgenusfs/amptk/blob/master/amptk/unoise3.py#L187-L224

Does this seem appropriate as it currently is?
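The version-dependent dereplication followed by a defensive re-sort can be sketched as below. This is a hypothetical illustration, not the code at the linked lines of amptk/unoise3.py. It assumes the `vsearch --version` banner starts like `vsearch v2.21.1_linux_x86_64`, and the caller passes in the cut-off version for `--fastx_uniques`, because the exact release that introduced it is not stated in this thread (check the VSEARCH changelog).

```python
import re
from typing import List, Tuple

Version = Tuple[int, int, int]


def parse_vsearch_version(banner: str) -> Version:
    """Parse an (assumed) banner like 'vsearch v2.21.1_linux_x86_64, ...' into (2, 21, 1)."""
    m = re.search(r"vsearch v(\d+)\.(\d+)\.(\d+)", banner)
    if m is None:
        raise ValueError(f"unrecognised vsearch version banner: {banner!r}")
    major, minor, patch = (int(x) for x in m.groups())
    return (major, minor, patch)


def build_derep_cmds(
    reads: str,
    derep_out: str,
    sorted_out: str,
    version: Version,
    fastx_uniques_min: Version,
) -> List[List[str]]:
    """Hypothetical helper: dereplicate with whichever command the version supports, then re-sort."""
    if version >= fastx_uniques_min:
        derep = ["vsearch", "--fastx_uniques", reads,
                 "--sizeout", "--fastaout", derep_out]
    else:
        derep = ["vsearch", "--derep_fulllength", reads,
                 "--sizeout", "--output", derep_out]
    # Defensive re-sort by abundance before --cluster_unoise; per the thread
    # it should be redundant, but it does no harm.
    sort = ["vsearch", "--sortbysize", derep_out, "--output", sorted_out]
    return [derep, sort]
```

Tuples compare element by element in Python, so `(2, 10, 0) < (2, 17, 0)` works as a version comparison without extra parsing.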

I also acknowledge your point about whether uchime_denovo is necessary after USEARCH-mediated unoise3, but to stay consistent with the other clustering methods I'd prefer to leave it in (hopefully it isn't harmful). If you have data that suggests otherwise, I can change it. I've not used USEARCH in a few years, since macOS versions stopped supporting 32-bit binaries...

markschl commented on July 20, 2024

Does this seem appropriate as it currently is?

Sure, the extra sorting step will do no harm at all...

Regarding the USEARCH approach: since the introduction of unoise and cluster_otus (UPARSE), uchime_denovo has been deprecated/removed from USEARCH, because these commands perform chimera removal "on the fly" and the author states that this is more effective. But your approach may not be a problem: my impression (based on rather limited experience) is that the VSEARCH uchime_denovo seems less strict (it removes fewer chimeras), so the effect of this additional step may be limited.

markschl commented on July 20, 2024

Thanks a lot for fixing this!
