
Comments (8)

nextgenusfs commented on July 20, 2024

Certainly easy to add, although many projects don't have replicate samples, so I can imagine a scenario where a real OTU is highly abundant but present in only a single sample. I guess I might rather manually remove samples like that, but if it saves you some processing time, it's trivial to add an option like that.

from amptk.

nextgenusfs commented on July 20, 2024

Should this filter be applied before or after index-bleed filtering? I guess you are using it after the current filtering? So it would be applied after index-bleed and subtraction filtering? That would be the most stringent.


devonorourke commented on July 20, 2024

Right, post filtering was the one I was thinking of.


nextgenusfs commented on July 20, 2024

Ok, see 739255a. The default is set at 1. This isn't a very "smart" filter... It certainly gets rid of some very valid OTUs in the dataset that I ran it on. But as long as you know what is happening, it is fine:

$ amptk filter -f otus.fa -i otu_table.txt -o test --min_samples_otu 2
-------------------------------------------------------
[Mar 09 10:57 AM]: OS: MacOSX 10.13.3, 8 cores, ~ 16 GB RAM. Python: 2.7.14
[Mar 09 10:57 AM]: AMPtk v1.1.1, USEARCH v9.2.64, VSEARCH v2.6.2
[Mar 09 10:57 AM]: Loading OTU table: otu_table.txt
[Mar 09 10:57 AM]: OTU table contains 1,133 OTUs and 6,306,944 read counts
[Mar 09 10:57 AM]: Sorting OTU table naturally
[Mar 09 10:57 AM]: Removing OTUs according to --min_reads_otu: (OTUs with less than 2 reads from all samples)
[Mar 09 10:57 AM]: Normalizing OTU table to number of reads per sample
[Mar 09 10:57 AM]: No spike-in mock (-b) or index-bleed (-p) specified, thus not running index-bleed filtering
[Mar 09 10:57 AM]: Dropped 83 OTUs found in fewer than 2 samples
[Mar 09 10:57 AM]: Filtering OTU table down to 1,050 OTUs and 6,281,116 read counts
[Mar 09 10:57 AM]: Filtering valid OTUs
-------------------------------------------------------
OTU Table filtering finished
-------------------------------------------------------
OTU Table Stats:      test.stats.txt
Sorted OTU table:     test.sorted.txt
Normalized/filter:    test.normalized.txt
Final Binary table:   test.final.binary.txt
Final OTU table:      test.final.txt
Filtered OTUs:        test.filtered.otus.fa
-------------------------------------------------------
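For readers who want to see what a filter of this kind does, here is a minimal sketch in plain Python. It is not the AMPtk implementation: it assumes the OTU table is held as a dict mapping each OTU ID to its per-sample read counts, and the function name `drop_rare_otus` and the toy data are hypothetical.

```python
def drop_rare_otus(table, min_samples=2):
    """Split an OTU table into (kept, dropped) by sample occupancy.

    table: dict mapping OTU ID -> dict of sample name -> read count.
    An OTU is kept when it has a non-zero count in at least
    `min_samples` samples; otherwise it goes to `dropped`.
    """
    kept, dropped = {}, {}
    for otu, counts in table.items():
        # Number of samples in which this OTU was observed at all.
        occupancy = sum(1 for n in counts.values() if n > 0)
        if occupancy >= min_samples:
            kept[otu] = counts
        else:
            dropped[otu] = counts
    return kept, dropped


# Toy data (hypothetical): OTU2 is abundant but occurs in one sample only.
toy_table = {
    "OTU1": {"s1": 120, "s2": 80, "s3": 0},
    "OTU2": {"s1": 0, "s2": 5000, "s3": 0},
    "OTU3": {"s1": 3, "s2": 0, "s3": 7},
}
kept, dropped = drop_rare_otus(toy_table, min_samples=2)
print(sorted(kept), sorted(dropped))
```

Note that OTU2 is dropped despite its high read count, which is the "not very smart" behaviour described above: occupancy is the only criterion, so abundance does not protect an OTU.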


devonorourke commented on July 20, 2024

Cool. I wonder what the average read depth was for each dropped OTU, and how the average read depth of the dropped OTUs compares to that of the retained OTUs.

Likewise, I wonder how many samples were dropped when you did this? The report indicates how the OTU and read counts change, but not how many samples were affected.

Looks great though - thanks!
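The two questions above (read depth of dropped versus retained OTUs, and how many samples are affected) could be answered with something like the following. This is only a sketch under the same assumed layout, a dict of OTU ID to per-sample read counts; `compare_groups` and the toy values are hypothetical and are not part of the AMPtk report.

```python
from statistics import mean


def compare_groups(kept, dropped):
    """Summarize a filtering step.

    Returns (mean reads per kept OTU, mean reads per dropped OTU,
    sorted list of samples whose reads came only from dropped OTUs).
    """
    def total_reads(counts):
        return sum(counts.values())

    mean_kept = mean(total_reads(c) for c in kept.values()) if kept else 0
    mean_dropped = mean(total_reads(c) for c in dropped.values()) if dropped else 0

    # Collect every sample name seen in either group.
    samples = set()
    for counts in list(kept.values()) + list(dropped.values()):
        samples.update(counts)

    # A sample is "emptied" if it had reads before filtering but none after.
    emptied = sorted(
        s for s in samples
        if sum(c.get(s, 0) for c in kept.values()) == 0
        and sum(c.get(s, 0) for c in dropped.values()) > 0
    )
    return mean_kept, mean_dropped, emptied


# Toy data (hypothetical): sample s4 contains only the dropped OTU2.
kept_otus = {
    "OTU1": {"s1": 120, "s2": 80, "s3": 0, "s4": 0},
    "OTU3": {"s1": 3, "s2": 0, "s3": 7, "s4": 0},
}
dropped_otus = {
    "OTU2": {"s1": 0, "s2": 0, "s3": 0, "s4": 5000},
}
mean_kept, mean_dropped, emptied = compare_groups(kept_otus, dropped_otus)
print(mean_kept, mean_dropped, emptied)
```

Here "affected" is read narrowly as samples left with zero reads after filtering; counting samples that lose any reads at all would be an equally reasonable definition.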


nextgenusfs commented on July 20, 2024

I can add some more stats. A sample with a single OTU not found in any other sample would be highly unlikely in most datasets, wouldn't it?


devonorourke commented on July 20, 2024

Yeah, you're right. I don't get that, even with this messy diet stuff. Though I'd say there are many samples for which I'm confident in no more than 10 or so OTUs. I'm finding more and more that a few sequences tend to drive the vast majority of the signal in each sample; they're not necessarily the same sequences in all samples, but the COI amplicons in the bird and bat guano I've been dealing with suggest to me that you do often find samples with little diversity.


nextgenusfs commented on July 20, 2024

Perhaps LULU filtering might be useful? In theory, it would combine "errors" that have propagated from what is perhaps a single "OTU".

