Hi Jon, I ultimately end up doing this in R with the OTU table output from amptk,

feature request for 'amptk filter' about amptk HOT 8 CLOSED

nextgenusfs commented on July 20, 2024

feature request for 'amptk filter'

from amptk.

Comments (8)

nextgenusfs commented on July 20, 2024

Certainly easy to add, although many projects don't have replicate samples, so I can imagine a scenario where a real OTU is highly abundant and found in only a single sample. I guess I might rather manually remove samples like that, but if it saves you some processing time, it is trivial to add an option like that.

nextgenusfs commented on July 20, 2024

Should this filter be applied before or after index-bleed filtering? I guess you are currently using it after the existing filtering, so it would be applied after index-bleed and subtraction filtering? That would be the most stringent.

devonorourke commented on July 20, 2024

Right, post filtering was the one I was thinking of.

nextgenusfs commented on July 20, 2024

Ok, 739255a. Default is set at 1. This isn't a very "smart" filter; it certainly gets rid of some very valid OTUs in the dataset that I ran it on. But as long as you know what is happening, it is fine:

$ amptk filter -f otus.fa -i otu_table.txt -o test --min_samples_otu 2
-------------------------------------------------------
[Mar 09 10:57 AM]: OS: MacOSX 10.13.3, 8 cores, ~ 16 GB RAM. Python: 2.7.14
[Mar 09 10:57 AM]: AMPtk v1.1.1, USEARCH v9.2.64, VSEARCH v2.6.2
[Mar 09 10:57 AM]: Loading OTU table: otu_table.txt
[Mar 09 10:57 AM]: OTU table contains 1,133 OTUs and 6,306,944 read counts
[Mar 09 10:57 AM]: Sorting OTU table naturally
[Mar 09 10:57 AM]: Removing OTUs according to --min_reads_otu: (OTUs with less than 2 reads from all samples)
[Mar 09 10:57 AM]: Normalizing OTU table to number of reads per sample
[Mar 09 10:57 AM]: No spike-in mock (-b) or index-bleed (-p) specified, thus not running index-bleed filtering
[Mar 09 10:57 AM]: Dropped 83 OTUs found in fewer than 2 samples
[Mar 09 10:57 AM]: Filtering OTU table down to 1,050 OTUs and 6,281,116 read counts
[Mar 09 10:57 AM]: Filtering valid OTUs
-------------------------------------------------------
OTU Table filtering finished
-------------------------------------------------------
OTU Table Stats:      test.stats.txt
Sorted OTU table:     test.sorted.txt
Normalized/filter:    test.normalized.txt
Final Binary table:   test.final.binary.txt
Final OTU table:      test.final.txt
Filtered OTUs:        test.filtered.otus.fa
-------------------------------------------------------
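
For readers who want to see the rule in isolation, here is a minimal Python sketch of a "minimum samples per OTU" filter like the one the log line "Dropped 83 OTUs found in fewer than 2 samples" describes. It is not the amptk implementation: the function name `drop_rare_otus`, the nested-dict table layout, and the toy counts are all assumptions made for illustration.

```python
# Hypothetical sketch, not amptk's code: keep an OTU only if it has a
# nonzero count in at least `min_samples` samples.

def drop_rare_otus(table, min_samples=2):
    """Split an OTU table into (kept, dropped).

    table: dict mapping OTU id -> dict mapping sample id -> read count
    (an assumed layout chosen for this example).
    """
    kept, dropped = {}, {}
    for otu, counts in table.items():
        # Number of samples in which this OTU was observed at all
        n_present = sum(1 for c in counts.values() if c > 0)
        if n_present >= min_samples:
            kept[otu] = counts
        else:
            dropped[otu] = counts
    return kept, dropped


# Toy data: OTU2 is highly abundant but present in a single sample
table = {
    "OTU1": {"s1": 120, "s2": 45, "s3": 0},
    "OTU2": {"s1": 0, "s2": 9000, "s3": 0},
    "OTU3": {"s1": 3, "s2": 1, "s3": 7},
}

kept, dropped = drop_rare_otus(table, min_samples=2)
print(sorted(kept))     # ['OTU1', 'OTU3']
print(sorted(dropped))  # ['OTU2']
```

Note that OTU2 is removed despite its 9,000 reads, which is the "real OTU that is highly abundant and only in a single sample" case raised earlier in the thread. With `min_samples=1` (the stated default) nothing in this toy table is dropped.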

devonorourke commented on July 20, 2024

Cool. I wonder what the average read depth was for the OTUs that were dropped, and how that compares to the average read depth of the OTUs that were retained.

Likewise, I wonder how many samples were dropped when you did this? The report indicates how OTU and read depth change, but not how many samples were affected by it.
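
One way to answer both questions outside of amptk is sketched below in Python. It assumes you already have the retained and dropped OTUs as two dictionaries (OTU id -> sample id -> read count); that layout, the function name `summarize_filter`, and the toy numbers are hypothetical and are not part of amptk's output.

```python
from statistics import mean

# Hypothetical sketch: compare read depth of dropped vs. retained OTUs and
# list samples that had reads before filtering but none afterwards.

def summarize_filter(kept, dropped):
    def totals(tbl):
        # Total reads per OTU, summed across all samples
        return [sum(counts.values()) for counts in tbl.values()]

    samples = {s for tbl in (kept, dropped) for c in tbl.values() for s in c}
    # Reads per sample after filtering (retained OTUs only)
    after = {s: sum(c.get(s, 0) for c in kept.values()) for s in samples}
    # Reads per sample removed by the filter (dropped OTUs only)
    lost = {s: sum(c.get(s, 0) for c in dropped.values()) for s in samples}
    emptied = sorted(s for s in samples if lost[s] > 0 and after[s] == 0)

    return {
        "mean_reads_dropped": mean(totals(dropped)) if dropped else 0,
        "mean_reads_kept": mean(totals(kept)) if kept else 0,
        "samples_emptied": emptied,
    }


# Toy data: sample s3 contains only the single-sample OTU2
kept = {
    "OTU1": {"s1": 120, "s2": 45, "s3": 0},
    "OTU3": {"s1": 3, "s2": 1, "s3": 0},
}
dropped = {"OTU2": {"s1": 0, "s2": 0, "s3": 9000}}

stats = summarize_filter(kept, dropped)
print(stats["mean_reads_dropped"])  # 9000
print(stats["mean_reads_kept"])     # 84.5
print(stats["samples_emptied"])     # ['s3']
```

Here a sample counts as "affected" only if the filter removed every read it had; a looser definition (any sample that lost at least one OTU) would just test `lost[s] > 0`.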

Looks great though - thanks!

nextgenusfs commented on July 20, 2024

I can add some more stats. A sample with a single OTU not found in any other sample would be highly unlikely in most datasets, wouldn't it?

devonorourke commented on July 20, 2024

Yeah, you're right. I don't get that, even with this messy diet stuff. Though I'd say there are a lot of samples for which I don't have more than 10 or so OTUs I'm confident in. I'm finding more and more that a few sequences tend to drive the vast majority of the signal in each sample. They're not necessarily the same sequences in all samples, but the COI amplicons in the bird and bat guano I've been dealing with suggest to me that you do often find samples with little diversity.

nextgenusfs commented on July 20, 2024

Perhaps LULU filtering might be useful? In theory, that would merge "errors" that have propagated from what is really a single "OTU".
