Comments (8)
That's certainly easy to add, although many projects don't have replicate samples, so I can imagine a scenario where a real OTU is highly abundant but found in only a single sample. I would probably rather remove cases like that manually, but if it saves you some processing time, it's trivial to add an option like that.
from amptk.
Should this filter be applied before or after index-bleed filtering? I assume you want it applied after the current filtering, i.e., after index-bleed and subtraction filtering? That would be the most stringent option.
Right, post-filtering is what I was thinking of.
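Why applying the sample-count filter after index-bleed filtering is more stringent can be illustrated with a small sketch. It assumes the index-bleed step zeroes any count below a fraction `p` of that OTU's total reads across samples; that rule, the function names, and the toy table are illustrative assumptions, not AMPtk's actual code. Zeroing bled reads shrinks the number of samples an OTU appears in, so more OTUs fall below the minimum-samples cutoff.

```python
from typing import Dict, List

# Toy OTU table: {otu_id: [reads in sample 1, sample 2, ...]}
OtuTable = Dict[str, List[int]]

def index_bleed_filter(table: OtuTable, p: float) -> OtuTable:
    """Zero counts below fraction p of the OTU's total reads (assumed rule)."""
    out = {}
    for otu, counts in table.items():
        total = sum(counts)
        out[otu] = [c if c >= p * total else 0 for c in counts]
    return out

def min_samples_filter(table: OtuTable, min_samples: int) -> OtuTable:
    """Keep OTUs with non-zero counts in at least min_samples samples."""
    return {otu: counts for otu, counts in table.items()
            if sum(1 for c in counts if c > 0) >= min_samples}

table = {
    "OTU1": [500, 420, 3],   # real in two samples, 3 bled reads in a third
    "OTU2": [900, 2, 0],     # real in one sample, 2 bled reads elsewhere
    "OTU3": [40, 35, 50],
}

before = min_samples_filter(table, 2)
after = min_samples_filter(index_bleed_filter(table, 0.005), 2)
print(sorted(before))  # ['OTU1', 'OTU2', 'OTU3']
print(sorted(after))   # ['OTU1', 'OTU3']
```

Without bleed filtering, the two stray reads make OTU2 look like it occurs in two samples; once they are zeroed, OTU2 is a single-sample OTU and gets dropped.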
OK, done in 739255a. The default is set to 1. This isn't a very "smart" filter: it certainly removes some valid OTUs in the dataset I ran it on. But as long as you know what is happening, it is fine:
$ amptk filter -f otus.fa -i otu_table.txt -o test --min_samples_otu 2
-------------------------------------------------------
[Mar 09 10:57 AM]: OS: MacOSX 10.13.3, 8 cores, ~ 16 GB RAM. Python: 2.7.14
[Mar 09 10:57 AM]: AMPtk v1.1.1, USEARCH v9.2.64, VSEARCH v2.6.2
[Mar 09 10:57 AM]: Loading OTU table: otu_table.txt
[Mar 09 10:57 AM]: OTU table contains 1,133 OTUs and 6,306,944 read counts
[Mar 09 10:57 AM]: Sorting OTU table naturally
[Mar 09 10:57 AM]: Removing OTUs according to --min_reads_otu: (OTUs with less than 2 reads from all samples)
[Mar 09 10:57 AM]: Normalizing OTU table to number of reads per sample
[Mar 09 10:57 AM]: No spike-in mock (-b) or index-bleed (-p) specified, thus not running index-bleed filtering
[Mar 09 10:57 AM]: Dropped 83 OTUs found in fewer than 2 samples
[Mar 09 10:57 AM]: Filtering OTU table down to 1,050 OTUs and 6,281,116 read counts
[Mar 09 10:57 AM]: Filtering valid OTUs
-------------------------------------------------------
OTU Table filtering finished
-------------------------------------------------------
OTU Table Stats: test.stats.txt
Sorted OTU table: test.sorted.txt
Normalized/filter: test.normalized.txt
Final Binary table: test.final.binary.txt
Final OTU table: test.final.txt
Filtered OTUs: test.filtered.otus.fa
-------------------------------------------------------
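The core of a `--min_samples_otu`-style filter can be sketched in a few lines: count the samples in which each OTU has at least one read and drop the OTUs below the cutoff. This is a minimal sketch assuming a tab-delimited table whose first column holds OTU IDs and whose remaining columns are per-sample read counts; `filter_min_samples` is a hypothetical helper, not AMPtk's implementation, and the printed messages only mimic the log above.

```python
import csv
import io

def filter_min_samples(tsv_text: str, min_samples: int = 2):
    """Split OTU rows into kept/dropped by the number of samples with reads > 0."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)  # assumed layout: OTU ID column, then one column per sample
    kept, dropped = [], []
    for row in reader:
        otu, counts = row[0], [int(x) for x in row[1:]]
        n_present = sum(1 for c in counts if c > 0)
        (kept if n_present >= min_samples else dropped).append((otu, counts))
    return header, kept, dropped

example = (
    "#OTU ID\tS1\tS2\tS3\n"
    "OTU1\t120\t80\t0\n"
    "OTU2\t0\t0\t45\n"   # abundant, but only in S3
    "OTU3\t10\t5\t7\n"
)
header, kept, dropped = filter_min_samples(example, min_samples=2)
reads_kept = sum(sum(counts) for _, counts in kept)
print(f"Dropped {len(dropped)} OTUs found in fewer than 2 samples")
print(f"{len(kept)} OTUs and {reads_kept} read counts remain")
```

With `min_samples=1` the filter only removes OTUs that have no reads at all, which is why a default of 1 leaves the table effectively unchanged. OTU2 here is exactly the case raised at the top of the thread: abundant, plausibly real, and dropped because it occurs in one sample.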
Cool. I wonder what the read depth was for each dropped OTU, and how the average read depth of the dropped OTUs compares with that of the retained OTUs.
Likewise, how many samples were dropped when you did this? The report shows how the OTU count and read depth change, but not how many samples were affected.
Looks great though - thanks!
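The extra stats asked for here are straightforward to compute from the table. The sketch below is a hypothetical `drop_report` helper (not part of AMPtk) that reports the mean total read depth of dropped versus retained OTUs, which samples lost reads, and which samples were left with no reads at all. The toy table is made up for illustration.

```python
from statistics import mean
from typing import Dict, List, Optional

def drop_report(table: Dict[str, List[int]], samples: List[str],
                min_samples: int = 2) -> dict:
    """Summarize what a min-samples filter removes (hypothetical helper)."""
    def n_present(counts: List[int]) -> int:
        return sum(1 for c in counts if c > 0)

    dropped = {o: c for o, c in table.items() if n_present(c) < min_samples}
    kept = {o: c for o, c in table.items() if n_present(c) >= min_samples}

    def mean_total(otus: Dict[str, List[int]]) -> Optional[float]:
        # Mean total read depth per OTU; None if the group is empty
        return mean(sum(c) for c in otus.values()) if otus else None

    # Samples that lose at least one read to the dropped OTUs
    affected = [s for i, s in enumerate(samples)
                if any(c[i] > 0 for c in dropped.values())]
    # Samples that had reads before filtering but none afterwards
    emptied = [s for i, s in enumerate(samples)
               if sum(c[i] for c in table.values()) > 0
               and sum(c[i] for c in kept.values()) == 0]
    return {
        "n_dropped": len(dropped),
        "mean_reads_dropped": mean_total(dropped),
        "mean_reads_kept": mean_total(kept),
        "samples_affected": affected,
        "samples_emptied": emptied,
    }

samples = ["S1", "S2", "S3", "S4"]
table = {
    "OTU1": [100, 50, 0, 0],
    "OTU2": [0, 0, 30, 0],   # only in S3
    "OTU3": [20, 10, 0, 5],
    "OTU4": [0, 4, 0, 0],    # only in S2
}
report = drop_report(table, samples)
print(report)
```

Separating "affected" from "emptied" answers both versions of the question: S2 and S3 lose reads, but only S3 disappears entirely, because its only OTU was unique to it.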
I can add some more stats. A sample with a single OTU not found in any other sample would be highly unlikely in most datasets, wouldn't it?
Yeah, you're right. I don't see that, even with this messy diet stuff. That said, for many samples I have no more than 10 or so OTUs I'm confident in. I'm increasingly finding that a few sequences drive the vast majority of the signal in each sample. They're not necessarily the same sequences in every sample, but the COI amplicons from the bird and bat guano I've been working with suggest that you often do find samples with little diversity.
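One way to check how much a few sequences dominate each sample is to compute, per sample, the number of detected OTUs and the fraction of reads held by the top `k` OTUs. This is a minimal sketch; `sample_dominance`, the `top_k` cutoff, and the guano sample names and counts are all hypothetical.

```python
from typing import Dict, List

def sample_dominance(table: Dict[str, List[int]], samples: List[str],
                     top_k: int = 3) -> Dict[str, dict]:
    """Per sample: OTU richness and share of reads in the top_k OTUs."""
    out = {}
    for i, s in enumerate(samples):
        # Non-zero counts for this sample, largest first
        counts = sorted((c[i] for c in table.values() if c[i] > 0), reverse=True)
        total = sum(counts)
        out[s] = {
            "richness": len(counts),
            "top_k_share": sum(counts[:top_k]) / total if total else 0.0,
        }
    return out

samples = ["guano1", "guano2"]
table = {
    "OTU1": [900, 0],
    "OTU2": [60, 10],
    "OTU3": [30, 300],
    "OTU4": [10, 290],
    "OTU5": [0, 400],
}
dominance = sample_dominance(table, samples, top_k=2)
print(dominance)
```

Both toy samples have four OTUs, but in `guano1` the top two account for 96% of the reads, while in `guano2` they account for 70%. A high top-k share with low richness is the "few sequences drive the signal" pattern described above.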
Perhaps LULU filtering would be useful? In theory, it would merge "errors" that may have propagated from a single parent "OTU".
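The idea behind LULU-style curation can be sketched as follows: a less abundant OTU is treated as an error ("daughter") of a more abundant, highly similar OTU ("parent") if it almost always co-occurs with the parent and is less abundant than the parent in every sample they share. This is a simplified sketch in the spirit of the LULU R package, not its actual code. Pairwise identities are assumed to be precomputed (e.g. from a VSEARCH or BLAST match list), OTUs are ranked by total reads only, and the thresholds (84% match, 0.95 co-occurrence, ratio 1) are what I understand LULU's defaults to be; check its documentation before relying on them.

```python
from typing import Dict, List, Tuple

def lulu_like_curation(
    table: Dict[str, List[int]],
    match: Dict[Tuple[str, str], float],  # (daughter, parent) -> % identity, precomputed
    min_match: float = 84.0,
    min_cooccurrence: float = 0.95,
    min_ratio: float = 1.0,
) -> Dict[str, List[int]]:
    # Simplified ordering: most abundant OTUs first
    ranked = sorted(table, key=lambda o: sum(table[o]), reverse=True)
    parent_of: Dict[str, str] = {}
    for i, daughter in enumerate(ranked):
        d = table[daughter]
        d_samples = [k for k, c in enumerate(d) if c > 0]
        if not d_samples:
            continue
        for parent in ranked[:i]:  # only more abundant OTUs can be parents
            if match.get((daughter, parent), 0.0) < min_match:
                continue
            p = table[parent]
            shared = [k for k in d_samples if p[k] > 0]
            # Daughter must occur (almost) only where the parent occurs
            if not shared or len(shared) / len(d_samples) < min_cooccurrence:
                continue
            # Parent must outnumber the daughter in every shared sample
            if min(p[k] / d[k] for k in shared) <= min_ratio:
                continue
            root = parent
            while root in parent_of:  # merge into a retained OTU
                root = parent_of[root]
            parent_of[daughter] = root
            break
    curated = {o: list(table[o]) for o in ranked if o not in parent_of}
    for daughter, root in parent_of.items():
        curated[root] = [a + b for a, b in zip(curated[root], table[daughter])]
    return curated

table = {
    "OTU1": [500, 300, 0, 200],
    "OTU1_err": [5, 3, 0, 2],   # tracks OTU1 at ~1% abundance
    "OTU2": [0, 0, 400, 0],
}
match = {("OTU1_err", "OTU1"): 98.5, ("OTU2", "OTU1"): 80.0}
curated = lulu_like_curation(table, match)
print(curated)
```

Unlike a minimum-samples cutoff, this keeps an abundant single-sample OTU such as OTU2 and only folds in variants that look like byproducts of a parent sequence.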