
Comments (3)

nextgenusfs commented on July 20, 2024

You can demultiplex each of the files using amptk illumina2 and then concatenate the results. The command would look something like:

amptk illumina2 -i bcl2fastq_output1_R1.fastq --reverse bcl2fastq_output1_R2.fastq \
     -f GTGARTCATCGARTCTTTG -r ITS4 --barcode_fasta forwardTags.fa  \
     --reverse_barcode reverseTags.fa -o output1

So if you run something like this for each of the files the sequencing center demultiplexed and then concatenate the outputs (output1.demux.fq, output2.demux.fq, etc.), you should be all set for any of the downstream AMPtk steps. The forward and reverse tags need to be in multi-FASTA files, i.e.:

>Tag1F
GATCCGATA
>Tag2F
GATTTTAAG
...
...
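
If the tags currently live in a spreadsheet or sample sheet, a few lines of Python can write them out in this layout. This is only a minimal sketch: the function name is hypothetical, and the duplicate check is an extra safeguard that is not something AMPtk requires you to run.

```python
def write_tag_fasta(tags, path):
    """Write {tag_name: sequence} pairs as a multi-FASTA file, one record per tag."""
    seen = set()
    with open(path, "w") as handle:
        for name, seq in tags.items():
            seq = seq.upper()
            if seq in seen:
                # Two samples sharing a tag could not be told apart later.
                raise ValueError(f"duplicate tag sequence: {seq}")
            seen.add(seq)
            handle.write(f">{name}\n{seq}\n")
```

For example, write_tag_fasta({"Tag1F": "GATCCGATA", "Tag2F": "GATTTTAAG"}, "forwardTags.fa") would produce the two records shown above.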

Another setting to consider is the --trim_len parameter. The default is 300, so since you have 2x250 reads you may want to reduce it to something like 240 bp so that forward reads can be rescued when merging of the PE reads isn't successful. If the reverse reads are high quality you should be okay, but keep in mind that if you specify a --trim_len larger than your read length, only reads that are properly merged will be used. If the files are too large to be merged with 32-bit usearch (the free version), you can switch PE merging to vsearch by adding --merge_method vsearch to the above commands. If it was a MiSeq run, you shouldn't have to do this.
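
To tie the steps together, here is a rough Python sketch of the per-pool workflow: build one amptk illumina2 command per pool (with a reduced --trim_len and, optionally, --merge_method vsearch), then concatenate the resulting .demux.fq files. It uses only the flags mentioned in this thread; the function names and the combined file name in the usage note are made up for illustration, and the sketch assumes every demultiplexed file ends with a newline.

```python
import shutil
from pathlib import Path


def illumina2_command(r1, r2, out, trim_len=240, merge_method=None):
    """Build the argument list for one `amptk illumina2` run (flags as in the command above)."""
    cmd = [
        "amptk", "illumina2",
        "-i", r1, "--reverse", r2,
        "-f", "GTGARTCATCGARTCTTTG", "-r", "ITS4",
        "--barcode_fasta", "forwardTags.fa",
        "--reverse_barcode", "reverseTags.fa",
        "-o", out,
        "--trim_len", str(trim_len),
    ]
    if merge_method is not None:
        # Only add the flag when a non-default merger (e.g. "vsearch") is requested.
        cmd += ["--merge_method", merge_method]
    return cmd


def concatenate_fastq(inputs, output):
    """Append each demultiplexed FASTQ file to one combined file, in order."""
    with Path(output).open("wb") as out_handle:
        for path in inputs:
            with Path(path).open("rb") as in_handle:
                # Stream in chunks so large files are never loaded whole.
                shutil.copyfileobj(in_handle, out_handle)
    return Path(output)
```

Each command list can be handed to subprocess.run(cmd, check=True). Once all runs have finished, concatenate_fastq(["output1.demux.fq", "output2.demux.fq"], "combined.demux.fq") does the same job as cat on the command line.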


nextgenusfs commented on July 20, 2024

What format is your data in now? Probably the most important part of the data analysis is actually the pre-processing of reads, so I would caution against doing primer stripping and quality filtering with other methods. The pre-processing scripts are fairly versatile and might work with the way you have your raw data. And I'm willing to add another method if it is a commonly used format.

You can generate a compatible OTU table with vsearch; the FASTQ headers just need to contain the sample ID that each read originated from. AMPtk uses the ;barcodelabel=sample; format.
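
As a rough illustration of that header format, the sketch below appends ;barcodelabel=<sample>; to the read ID of every record in a four-line FASTQ. Where exactly the label sits within the header is an assumption here (appended to the read ID, with any description after whitespace dropped), and both function names are hypothetical.

```python
def label_header(header, sample):
    """Return a FASTQ header with ;barcodelabel=<sample>; appended to the read ID."""
    # Assumption: keep only the read ID (text before the first whitespace).
    read_id = header.rstrip("\n").split()[0]
    return f"{read_id};barcodelabel={sample};"


def relabel_fastq(lines, sample):
    """Relabel the header of every record, assuming strict four-line FASTQ records."""
    out = []
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        # Every fourth line, starting with the first, is a header.
        out.append(label_header(line, sample) if i % 4 == 0 else line)
    return out
```

The sample name passed in would be whatever ID you want to appear as a column in the OTU table.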


jack1120 commented on July 20, 2024

My samples were prepared (MiSeq, PE, 2x250bp) with unique combinations of tags on the 5' end of both the forward (gITS7) and reverse (ITS4) primers. These samples were split into pools and Illumina indices were ligated onto the constructs (TruSeq PCR-Free LT Library Prep Kit), with a different set of indices used for each pool. The final construct looks like:

P5-Index1-For_Tag:For_Primer-Amplicon-Rev_Primer:Rev_Tag-Index2-P7

The sequencing center demultiplexed the sequences by the indices such that I received files for each of my pooled samples that, after merging, would theoretically look like:

For_Tag1:For_Primer-Amplicon-Rev_Primer:Rev_Tag1
For_Tag2:For_Primer-Amplicon-Rev_Primer:Rev_Tag2
For_Tag3:For_Primer-Amplicon-Rev_Primer:Rev_Tag3
etc.

Where each unique tag combination represents amplicons from a given sample (but all in a single fastq file). Although these are Illumina sequences, the structure is more similar to what you describe for processing Ion Torrent or 454 data, but with an additional tag attached to the reverse primer.

I am currently demultiplexing the R1 (using the For_Tags) and R2 (using the Rev_Tags) files independently, looking in both the 5'-3' and 3'-5' orientations (how does amptk handle mixed orientations?). The tags and primers are then trimmed, and the sequences are fed into DADA2 where they are quality filtered, merged, and denoised. The resulting OTU tables are then re-formatted into a vsearch-compatible format for further processing.
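
For readers following along, the tag-pair logic described above can be sketched as follows. This is a simplified, exact-match illustration for merged reads; it is neither AMPtk's implementation nor the scripts used in this workflow. It assumes a merged read starts with the forward tag and ends with the reverse complement of the reverse tag (or the mirror image when the read is in the opposite orientation). The function names are hypothetical, and tag pairs are passed in as a dictionary.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def revcomp(seq):
    """Reverse-complement a plain A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_sample(read, tag_pairs):
    """Return the sample for a merged read, or None if no tag pair matches.

    tag_pairs maps (forward_tag, reverse_tag) -> sample name.
    Both orientations of the read are tried, exact matches only.
    """
    for candidate in (read, revcomp(read)):
        for (fwd, rev), sample in tag_pairs.items():
            if candidate.startswith(fwd) and candidate.endswith(revcomp(rev)):
                return sample
    return None
```

A real demultiplexer would also need to tolerate mismatches in the tags and locate the primers, which this sketch leaves out.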

