
Comments (5)

nextgenusfs avatar nextgenusfs commented on August 20, 2024

I will have to look back and get confirmation, but my recollection is that these sequences came from Sanger sequencing of the actual clones -- where we might have used a different forward sequencing primer and/or the Sanger data wasn't good. Surely that is where the Y came from (i.e., the Sanger data was inconclusive at that position).

from amptk.

peterjc avatar peterjc commented on August 20, 2024

Yes, the ambiguity is undoubtedly from Sanger capillary sequencing - I was puzzled that it was Y in one file, but the more vague N in another.

My main query was about the trimming being different between the three files under https://github.com/nextgenusfs/amptk/tree/master/amptk/DB - but I may not have understood why there are three files?

$ grep -c "^>" amptk_mock*.fa
amptk_mock1.fa:20
amptk_mock2.fa:26
amptk_mock3.fa:23
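For anyone wanting to go one step beyond the `grep` count, here is a minimal Python sketch that counts FASTA records and also reports which records contain IUPAC ambiguity codes such as Y or N. The function names `parse_fasta` and `ambiguity_report` are made up for illustration and are not part of amptk, and the example records are invented rather than taken from the mock files.

```python
from collections import Counter


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ambiguity_report(text):
    """Return (record_count, {header: Counter of symbols other than A/C/G/T})."""
    count = 0
    flagged = {}
    for header, seq in parse_fasta(text):
        count += 1
        odd = Counter(base for base in seq.upper() if base not in "ACGT")
        if odd:
            flagged[header] = odd
    return count, flagged


# Invented example records (hypothetical, not from amptk_mock*.fa).
example = ">clone_a\nACGTYACG\n>clone_b\nACGTNNAC\n>clone_c\nACGTACGT\n"
count, flagged = ambiguity_report(example)
print(count, {name: dict(symbols) for name, symbols in flagged.items()})
```

To run it on a real file, read the file contents into a string (for example with `pathlib.Path(...).read_text()`) and pass that to `ambiguity_report`.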

Given they have differing numbers of sequences, my assumption was three different mock communities. Quoting the paper,

To construct the biological mock community (BioMock) we selected 26 identified fungal
cultures (Table S1) ...

and:

The BioMock-standards consist of an equimolar mixture of 26 PCR
products thereby removing the PCR bias from mixed DNA samples, while the BioMock
communities consist of an equimolar mixture of 23 single-copy plasmids.

That suggests amptk_mock2.fa is the BioMock-standards (equimolar mixture of 26 PCR products), and amptk_mock3.fa is the BioMock communities (equimolar mixture of 23 single-copy plasmids), leaving amptk_mock1.fa unexplained - perhaps a precursor dataset?
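The trimming difference mentioned above could be checked per clone with something like the following rough sketch. It assumes you have already paired up the sequences for the same clone from two of the files (how the records are named across files is not something I know), and `describe_trimming` is a hypothetical helper, not an amptk function. It uses exact substring matching, so a record that differs only by an ambiguity code (Y in one file, N in another) would be reported as differing by more than trimming.

```python
def describe_trimming(seq_a, seq_b):
    """Describe how two versions of one sequence differ by end trimming.

    Uses exact, case-insensitive substring matching; ambiguity codes are
    treated as ordinary characters, so Y versus N counts as a mismatch.
    """
    a, b = seq_a.upper(), seq_b.upper()
    if a == b:
        return "identical"
    # Work out which version is the shorter (trimmed) one.
    if len(a) >= len(b):
        longer, shorter, label = a, b, "second"
    else:
        longer, shorter, label = b, a, "first"
    start = longer.find(shorter)
    if start == -1:
        return "differ by more than trimming"
    right = len(longer) - start - len(shorter)
    return f"{label} is trimmed by {start} bases at the 5' end and {right} at the 3' end"


# Invented toy sequences, not real ITS data.
print(describe_trimming("AACCGGTT", "CCGG"))
print(describe_trimming("ACGTYACG", "ACGTNACG"))
```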


nextgenusfs avatar nextgenusfs commented on August 20, 2024

Hi @peterjc. Yes, I think you have it figured out. I had used these files just to keep track of what we were running when we were doing these experiments and writing this pipeline. amptk_mock1.fa was the first control sample we ran on Ion Torrent; once we saw the results, we added a few more species to capture more variability and make sure we understood what was going on (it actually pre-dates the data in the publication, so at this point it probably shouldn't be in the repo other than for legacy/historical reasons).

So the numerical suffix marks successive iterations of the "biological mock" communities that we ran. We had included a few clones in biomock2 that consistently did not sequence well on Ion Torrent for various reasons (mind you, this was before we developed the synthetic mock), so we then pared the set down to the 23 that had enough sequence divergence to give us some confidence in the data analysis pipelines.

The more times we ran this, the more we came to think that the biological sequences are problematic due to index-bleed, hence synthetic mocks as the final solution. I still recommend that people run the synthetic mock along with a biological mock community that is representative of the species/ITS sequences they are looking to differentiate. The other practical advice when using Illumina is to never re-use any of the index/barcodes; then "index-bleed" is actually quite low on the MiSeq platform. I don't think many folks are using Ion Torrent anymore, so the practical advice would be to buy the MiSeq instead of the Ion Torrent :).


peterjc avatar peterjc commented on August 20, 2024

That makes sense - as much as we try to document as we go along, there will always be hidden assumptions it takes fresh eyes to see.

I found your work while looking for other people doing metabarcoding with synthetic controls. Our paper plans are all delayed through lockdown, but many of our methodological conclusions agree (we're only using the MiSeq). Expect a citation to follow...

So, as to this issue (#89), amptk_mock1.fa was used for pilot data before the work described in the paper, so we shouldn't read too much into it differing from amptk_mock2.fa and amptk_mock3.fa. I will close this issue now, thank you.


nextgenusfs avatar nextgenusfs commented on August 20, 2024

Cool, looking forward to the publication! It took me a while to remember the different mocks that we tried -- most of this work/sequencing was done in the 2014-2016 time frame, so it is hard to recall all the details!

