Comments (13)
Thanks for the hint, I was able to get it running. The results from a small mock community analysis look good: before, there were multiple OTUs for a few species (putative chimeras), and now there is just one per species. The AMPtk DADA2 pipeline also gives almost the same results.
from amptk.
Thanks @markschl, would you be able to test the latest version by installing it with pip, to see if it now works as you would expect?
from amptk.
Thanks for the quick response. I'm having some problems installing the pip version due to dependency issues; I might try at the weekend :)
from amptk.
From that environment you should be able to just do:
python -m pip install git+https://github.com/nextgenusfs/amptk.git --upgrade --force --no-deps
from amptk.
Great thanks for the help. I will tag a new release then with these fixes.
from amptk.
Great! Although, the 'sortbysize' command from the description here torognes/vsearch#283 is still missing. But as I wrote earlier, I'm not absolutely sure whether this is required or not with the current VSEARCH (whether the UNOISE output is always sorted and whether uchime3_denovo really needs sorted input).
from amptk.
Yes it is doing sortbysize only for vsearch unoise3.
from amptk.
Sorry if I didn't write it clearly enough... I meant that the code linked above proposes a sorting step between the cluster_unoise and the uchime3_denovo step (in the example code the steps are called unoise and uchime_denovo, but I assume that they are equivalent).
However, after having a closer look I actually think that an extra sorting step is not required; I apologize for talking about it. The documentation for uchime_denovo and uchime2_denovo states that the input sequences are automatically sorted by size; only for uchime3_denovo is this statement missing. However, the code and the command output ("Sorting by abundance") suggest that the input is automatically size-sorted by VSEARCH in all three cases.
The sorting step before the cluster_unoise command should not be necessary, since derep_fulllength and fastx_uniques already provide sorted output (it's not written in the VSEARCH docs of these commands themselves, but rather in the description of --output, --fastaout and --fastqout; it took me some time to find out...). If the output of the de-replication wasn't size-sorted, USEARCH would stop with an error anyway.
from amptk.
Okay, I did implement the sorting after fastx_uniques or derep_fulllength (depending on the vsearch version; see https://github.com/nextgenusfs/amptk/blob/master/amptk/unoise3.py#L156-L183). It is probably safest to keep the sorting step in there so that, in case an older version of vsearch doesn't have fastx_uniques, the input is still properly sorted before cluster_unoise, which is here: https://github.com/nextgenusfs/amptk/blob/master/amptk/unoise3.py#L187-L224
Does this seem appropriate as it currently is?
And your point about whether uchime_denovo is necessary after usearch-mediated unoise3 I also acknowledge, but to stay consistent with the other clustering methods I'd prefer to leave it in (it hopefully isn't harmful). But if you have data that suggests otherwise I can change it. I've not used usearch in a few years, since Mac OS versions no longer support 32-bit binaries...
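The step order discussed in this comment (dereplicate, sort by size, denoise, then filter chimeras) can be sketched as a list of VSEARCH command lines. This is only an illustration under stated assumptions, not the code in unoise3.py: the function name `build_unoise3_commands`, the intermediate file names, and the `use_fastx_uniques` switch (standing in for the version check) are all hypothetical.

```python
import os


def build_unoise3_commands(reads, outdir, use_fastx_uniques=True,
                           minsize=8, vsearch="vsearch"):
    """Build (but do not run) the VSEARCH commands for a UNOISE3-style run."""
    # Hypothetical intermediate file names, chosen for illustration only.
    derep = os.path.join(outdir, "derep.fa")
    sorted_fa = os.path.join(outdir, "derep.sorted.fa")
    zotus = os.path.join(outdir, "zotus.fa")
    nonchimeras = os.path.join(outdir, "zotus.nonchimeras.fa")

    # Older VSEARCH releases lack fastx_uniques, so fall back to
    # derep_fulllength; which one applies is left to the caller.
    if use_fastx_uniques:
        derep_cmd = [vsearch, "--fastx_uniques", reads,
                     "--fastaout", derep, "--sizeout"]
    else:
        derep_cmd = [vsearch, "--derep_fulllength", reads,
                     "--output", derep, "--sizeout"]

    # Explicit sort: likely redundant with recent VSEARCH, but harmless.
    sort_cmd = [vsearch, "--sortbysize", derep, "--output", sorted_fa]

    denoise_cmd = [vsearch, "--cluster_unoise", sorted_fa,
                   "--minsize", str(minsize),
                   "--centroids", zotus, "--sizein", "--sizeout"]

    chimera_cmd = [vsearch, "--uchime3_denovo", zotus,
                   "--nonchimeras", nonchimeras]

    return [derep_cmd, sort_cmd, denoise_cmd, chimera_cmd]
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` in order; keeping command construction separate from execution makes the step order easy to inspect without VSEARCH installed.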
from amptk.
"Does this seem appropriate as it currently is?"
Sure, the extra sorting step will not do harm at all...
Regarding the USEARCH approach: since the introduction of unoise and cluster_otus (UPARSE), uchime_denovo has been deprecated/removed from USEARCH because these commands do chimera removal "on the fly" and the author states that this is more effective. But your approach may not be a problem, since at least my impression (based on rather limited experience) is that the VSEARCH uchime_denovo seems less strict (removes fewer chimeras), so the effect of this additional step may be limited.
from amptk.
Thanks a lot for fixing this!
from amptk.