Composition of each OTU? (amptk issue, closed)

nextgenusfs commented on July 20, 2024
Composition of each OTU?

from amptk.

Comments (6)

MycoMap commented on July 20, 2024

My primary interest in environmental data is for floristic efforts. Clustering at 97% generally works well for fungi, but if you are looking for species-level taxonomic resolution, some distinct species will be clustered together that should not be, and some single species will be split across separate OTUs. Being able to examine the composition of an OTU is helpful when looking closely at individual OTUs.

MycoMap commented on July 20, 2024

I wouldn't be looking so much for all of the individual reads, but rather the representative sequences that make up the OTUs.

nextgenusfs commented on July 20, 2024

In UPARSE, the OTU sequence is by definition the most abundant sequence in each cluster. The data are first dereplicated to find identical sequences, which are then sorted by abundance (the assumption being that the more abundant a sequence is, the less likely it is to be an error). The algorithm then moves from most abundant to least abundant, defining clusters at the 97% threshold. So essentially all reads that map to the OTU at 100% are representative sequences. Other OTU-picking algorithms identify a "centroid" sequence, which can be interpreted as the representative sequence; UPARSE by default uses the representative sequence as the consensus OTU.
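
As a rough illustration of the dereplicate, sort, and greedily cluster steps described above, here is a minimal Python sketch. It is not UPARSE itself: the function names are hypothetical, identity is a toy measure based on plain edit distance, each sequence simply joins the first representative that passes the threshold, and there is no chimera filtering. It does show how the unique sequences that make up each cluster can be kept alongside the representative.

```python
from collections import Counter


def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def identity(a, b):
    """Toy identity (assumption): 1 - edit distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def greedy_cluster(reads, threshold=0.97):
    """Dereplicate, sort by abundance, then cluster greedily.

    Returns {representative: {member_sequence: read_count}}.
    """
    unique = Counter(reads)  # dereplication: identical reads collapse
    # Most abundant first; ties broken by sequence so results are stable.
    ordered = sorted(unique.items(), key=lambda kv: (-kv[1], kv[0]))
    clusters = {}
    for seq, count in ordered:
        for rep in clusters:
            if identity(seq, rep) >= threshold:
                clusters[rep][seq] = count  # joins an existing cluster
                break
        else:
            clusters[seq] = {seq: count}  # becomes a new representative
    return clusters
```

Calling `greedy_cluster(list_of_read_strings)` returns one entry per cluster, keyed by its most abundant sequence, with the member sequences and their read counts as the value.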

There are lots of reasons that 97% is used for establishing OTUs, and ITS is more complicated than other amplicons such as 16S because there are many examples of intra-isolate variation in ITS; that is, a single isolate has multiple ITS sequences that may diverge by more than 3%, i.e. share less than 97% identity (this happens quite a bit). So actually getting species-level resolution isn't really possible for some groups, while for other groups ITS works well. I've also seen different species with the same ITS sequence. So really the idea behind OTUs is in the name, "operational taxonomic unit": it isn't really a proxy for species (although it's easier to think that it is).

You can give the DADA2 or UNOISE2 algorithms a try. They are a "new breed" of algorithms that try to error-correct sequences as opposed to clustering them. DADA2 has been shown to be accurate to a single base pair (but currently requires that all sequences be trimmed to a set length), whereas UNOISE2 claims to be better than or equal to DADA2 (these are all author claims, of course, and all were tested with 16S data). Both algorithms are incorporated into ufits.

MycoMap commented on July 20, 2024

I've done a bit of testing of different clustering methods with curated reference datasets. Most of this involved the centroid clustering methodology. I am well aware that ITS OTU clusters are not a perfect proxy for species-level resolution, but they are very informative in most cases for macrofungi. I will check out the DADA2 and UNOISE2 algorithms.

MycoMap commented on July 20, 2024

Just an FYI: the software automatically tries to install DADA2 in a location on the compute cluster where I do not have write permissions. I am going to try it out on a server where I have better control over the entire file structure.

nextgenusfs commented on July 20, 2024

If you install DADA2 manually, the software will not try to install it automatically.
