Comments (3)
You can demux each of the files using amptk illumina2 and then concatenate them together, i.e. it would look something like:
amptk illumina2 -i bcl2fastq_output1_R1.fastq --reverse bcl2fastq_output1_R2.fastq \
-f GTGARTCATCGARTCTTTG -r ITS4 --barcode_fasta forwardTags.fa \
--reverse_barcode reverseTags.fa -o output1
So if you run something like this for each of the files demultiplexed by the sequencing center and then concatenate the outputs (output1.demux.fq, output2.demux.fq, etc.), you should be all set for any of the downstream AMPtk steps. The forward and reverse tags need to be in a multi-FASTA file, i.e.
>Tag1F
GATCCGATA
>Tag2F
GATTTTAAG
...
...
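As a rough sketch of the two bookkeeping steps above (writing the tags as multi-FASTA and concatenating the per-pool demux outputs), something like the following Python could be used. The helper names `write_tag_fasta` and `concatenate_files` are hypothetical and not part of AMPtk; the file names are the ones used in the example command.

```python
import shutil


def write_tag_fasta(tags, path):
    """Write a {tag_name: sequence} mapping as a multi-FASTA file.

    Hypothetical helper, not an AMPtk function.
    """
    with open(path, "w") as handle:
        for name, seq in tags.items():
            handle.write(f">{name}\n{seq}\n")


def concatenate_files(inputs, output):
    """Append each input file to the output file, in the order given.

    Hypothetical helper; equivalent to `cat in1 in2 > out` for plain files.
    """
    with open(output, "wb") as out:
        for name in inputs:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)
```

For example, `write_tag_fasta({"Tag1F": "GATCCGATA", "Tag2F": "GATTTTAAG"}, "forwardTags.fa")` would produce the file shown above, and `concatenate_files(["output1.demux.fq", "output2.demux.fq"], "combined.demux.fq")` would join two demuxed pools (the combined file name is just an illustration).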
Another setting to consider is the --trim_len parameter. The default is 300, so since you have 2x250 you may want to reduce this to something like 240 bp so that you can rescue forward reads if merging of the PE reads isn't successful. If the reverse reads are high quality you should be okay, but keep in mind that if you specify a --trim_len larger than your read length, only reads that are properly merged will be used. If the files are too large to be merged with 32-bit usearch (free version), you can switch PE merging to vsearch by adding --merge_method vsearch to the above commands. If it was a MiSeq run then you shouldn't have to do this.
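The --trim_len rule of thumb above can be written down as a tiny check. This is only a sketch of the statement in this comment, not of AMPtk's internal logic, and the function name `only_merged_reads_used` is hypothetical.

```python
def only_merged_reads_used(read_length, trim_len):
    """Return True when trim_len exceeds the raw read length.

    Encodes only the rule of thumb from the comment above: unmerged
    forward reads are shorter than trim_len in that case, so only
    properly merged pairs would be kept. Hypothetical helper.
    """
    return trim_len > read_length


# 2x250 reads with the default trim length of 300 vs. the suggested 240
print(only_merged_reads_used(250, 300))
print(only_merged_reads_used(250, 240))
```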
from amptk.
What format is your data in now? Probably the most important part of the data analysis is actually the pre-processing of reads, so I would caution against doing primer stripping and quality filtering with other methods. The pre-processing scripts are fairly versatile and might work with the way you have your raw data. And I'm willing to add another method if it is a commonly used format.
You can generate a compatible OTU table with vsearch, but the FASTQ headers need to contain the sampleID the read originated from. AMPtk uses the ;barcodelabel=sample; format.
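If reads have already been demultiplexed elsewhere, the headers could be relabeled along these lines. This is a minimal sketch under several assumptions: the label is appended directly to the read ID, anything after the first whitespace in the header is dropped, and the input uses four-line FASTQ records. The function name `add_barcodelabel` is hypothetical and not part of AMPtk.

```python
def add_barcodelabel(fastq_lines, sample):
    """Append ';barcodelabel=<sample>;' to every FASTQ header line.

    Assumes strict four-line FASTQ records (header, sequence, '+', quality).
    Hypothetical helper, not an AMPtk function.
    """
    relabeled = []
    for index, line in enumerate(fastq_lines):
        line = line.rstrip("\n")
        if index % 4 == 0:
            # Keep only the read ID (text before the first whitespace).
            read_id = line.split()[0]
            line = f"{read_id};barcodelabel={sample};"
        relabeled.append(line)
    return relabeled
```

Called as `add_barcodelabel(["@read1 1:N:0", "ACGT", "+", "IIII"], "sampleA")`, the header becomes `@read1;barcodelabel=sampleA;` while the sequence and quality lines are left unchanged.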
from amptk.
My samples were prepared (MiSeq, PE, 2x250bp) with unique combinations of tags on the 5' end of both the forward (gITS7) and reverse (ITS4) primers. These samples were split into pools and Illumina indices were ligated onto the constructs (TruSeq PCR-Free LT Library Prep Kit), with a different set of indices used for each pool. The final construct looks like:
P5-Index1-For_Tag:For_Primer-Amplicon-Rev_Primer:Rev_Tag-Index2-P7
The sequencing center demultiplexed the sequences by the indices such that I received files for each of my pooled samples that, after merging, would theoretically look like:
For_Tag1:For_Primer-Amplicon-Rev_Primer:Rev_Tag1
For_Tag2:For_Primer-Amplicon-Rev_Primer:Rev_Tag2
For_Tag3:For_Primer-Amplicon-Rev_Primer:Rev_Tag3
etc.
Where each unique tag combination represents amplicons from a given sample (but all in a single fastq file). Although these are Illumina sequences, the structure is more similar to what you describe for processing Ion Torrent or 454 data, but with an additional tag attached to the reverse primer.
I am currently demultiplexing the R1 (using the For_Tags) and R2 (using the Rev_Tags) files independently, looking in both the 5'-3' and 3'-5' orientations (how does amptk handle mixed orientations?). The tags and primers are then trimmed, and the sequences are fed into DADA2 where they are quality filtered, merged, and denoised. The resulting OTU tables are then re-formatted into a vsearch-compatible format for further processing.
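To make the dual-tag structure concrete, here is a minimal sketch of assigning a merged read to a sample by its forward/reverse tag combination, checking both orientations. It is an illustration only, not how amptk or my current scripts do it: it uses exact matching, ignores the primers, and the names `revcomp` and `assign_sample` are hypothetical.

```python
def revcomp(seq):
    """Reverse-complement a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def assign_sample(read, tag_pairs):
    """Return the sample for a merged read, or None if no tag pair matches.

    tag_pairs maps (forward_tag, reverse_tag) -> sample name, with both tags
    written 5'-3' as synthesized. In a merged read the reverse tag appears
    reverse-complemented at the 3' end. The read is tested as given and as
    its reverse complement to cover mixed orientations.
    """
    for candidate in (read, revcomp(read)):
        for (fwd_tag, rev_tag), sample in tag_pairs.items():
            if candidate.startswith(fwd_tag) and candidate.endswith(revcomp(rev_tag)):
                return sample
    return None
```

With a table such as `{("GATCCGATA", "GATTTTAAG"): "sample1"}` (placeholder tag sequences), a read beginning with `GATCCGATA` and ending with the reverse complement of `GATTTTAAG` is assigned to `sample1` in either orientation.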
from amptk.