Comments (11)

nextgenusfs commented on August 20, 2024

I'm not entirely sure what the error is. Are you able to make a smaller test dataset where this error is reproducible so I can debug?

from amptk.

nextgenusfs commented on August 20, 2024

Or, if this data is public via SRA, I can try to download it. I see this accession in your demux log: SRR11014838. Just give me the commands you issued so I can repeat them.

haideruni commented on August 20, 2024

The data is publicly available; that accession contains single-end reads. This error also occurred for my paired-end reads.

Here are the commands I ran for the SRR11014838 file. $INPUT is the input file variable and $OUTPUT is the output file variable.
amptk SRA -i $INPUT -f LR0R -r LR2R -o "$OUTPUT" --cpus 40 --min_len 80 --require_primer off --primer_mismatch 2
amptk cluster -i "$OUTPUT".demux.fq.gz -o "$OUTPUT"
The amptk cluster step is where the error occurs.

Before running those amptk commands, I also trimmed the file with trimmomatic using the command below. I'm not sure if it makes a difference, but I've included it here. $EBROOTTRIMMOMATIC/trimmomatic-0.36.jar is just the path to the trimmomatic jar.
java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.36.jar SE -phred33 $INFILE $OUTFILE.fastq ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 CROP:300

nextgenusfs commented on August 20, 2024

Definitely do not trim with trimmomatic. I will try to look at it this weekend.

nextgenusfs commented on August 20, 2024

So this experiment appears to be metagenomic data? -- AMPtk is for amplicons... https://www.ncbi.nlm.nih.gov/sra/?term=SRR11014838

haideruni commented on August 20, 2024

The metagenomic dataset should include amplicon data so it should be okay. The other files I worked with that are metagenomic were able to provide me with results and a taxonomy file at the end when using AMPtk.

Is there a reason why trimmomatic shouldn't be used for trimming?

nextgenusfs commented on August 20, 2024

Sounds like you might be confused, or at least I am, about what is in this dataset. AMPtk is for pooled amplicons, i.e. environmental samples with different barcodes -- so the software is expecting barcode--primer--amplicon--primer--barcode. Metagenomics data, by contrast, is derived from random ligation of adapters. While you could certainly pull out reads that have primer sequences and cluster them, there really isn't a method for that in AMPtk. As for trimmomatic, I thought you had amplicon data. AMPtk works by first finding the forward/reverse primers, trimming them off, then using expected-errors quality trimming to filter the data to be clustered into OTUs -- so you do not want to 3' trim or adapter trim the data before it gets to AMPtk.
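
To make the primer-trimming and expected-errors steps described above more concrete, here is a minimal Python sketch. It is not AMPtk's actual code: the function names, the toy primer ACGTAC, and the max_ee threshold of 1.0 are all assumptions for illustration. It handles only a forward primer with plain A/C/G/T bases (no IUPAC degenerate codes) and assumes Phred+33 quality strings.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum the per-base error probabilities encoded in a Phred quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def find_primer(seq: str, primer: str, max_mismatch: int = 2) -> int:
    """Return the leftmost index where the primer matches within
    max_mismatch substitutions, or -1 if it is not found."""
    for i in range(len(seq) - len(primer) + 1):
        window = seq[i:i + len(primer)]
        mismatches = sum(1 for a, b in zip(window, primer) if a != b)
        if mismatches <= max_mismatch:
            return i
    return -1


def trim_and_filter(seq, qual, primer, max_ee=1.0, max_mismatch=2):
    """Trim everything up to and including the forward primer, then keep the
    read only if its expected errors do not exceed max_ee.

    Returns (trimmed_seq, trimmed_qual), or None if the read is discarded.
    """
    pos = find_primer(seq, primer, max_mismatch)
    if pos < 0:
        return None  # no primer found: this sketch drops the read
    start = pos + len(primer)
    seq, qual = seq[start:], qual[start:]
    if expected_errors(qual) > max_ee:
        return None  # too many expected errors after trimming
    return seq, qual


# Hypothetical read: two leading bases, the toy primer, then the amplicon.
result = trim_and_filter("TTACGTACGGAA", "IIIIIIIIIIII", "ACGTAC")
```

Under these assumptions, a read that was already adapter- or 3'-trimmed, or a shotgun read that never contained the primer, either fails the primer search or reaches the filter in a different shape than expected, which is the concern raised above.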

haideruni commented on August 20, 2024

Sorry for the delayed responses; I've been having some technical issues on my end this week. Thank you for that thorough explanation. You can probably tell by now that I am a beginner in computational biology and learning as I go, so I appreciate the in-depth explanation. I'll try to provide more context in this response. It's a bit tricky to explain as an undergrad student, but I'll try my best.

The lab I'm working with has tasked me with exploring the output of AMPtk on metagenomic data, which is why I'm running AMPtk on metagenomic files even though that isn't the intended use. To make sure I understand correctly: is the TypeError caused purely by the fact that metagenomic data isn't the input AMPtk expects? I've run other publicly available metagenomic files that still produce a final taxonomy result. If one wanted to fix the TypeError so that metagenomic files could go through AMPtk, how would one do that?

nextgenusfs commented on August 20, 2024

I don't know the exact cause of the TypeError in the code -- but I suspect it is because the input data is not what AMPtk is expecting.

Just googled quickly and saw this comparison of the methods, perhaps that will be helpful. There is a lot of terminology to become familiar with. https://microbialdarkmatter.org/index.php/blog/2-the-difference-between-metagenomics-and-amplicon-sequencing

Generally, metagenomics experiments use different tools, e.g. something like Kraken (and many, many others), aimed at classifying the reads in that experiment, whereas amplicon sequencing (metabarcoding) involves amplifying a single region from all the organisms in an environment -- and then we try to process those data and identify which organisms are in that sample.

haideruni commented on August 20, 2024

Thank you for that resource, I'll continue researching and learning the different terminology.

Could you provide some insight into why you think some metagenomic data files can be processed while others can't? The original file couldn't be processed (https://www.ncbi.nlm.nih.gov/sra/?term=SRR11014838); however, other metagenomic files in a similar format are processed without hitting the TypeError (https://www.ncbi.nlm.nih.gov/sra/?term=SRR14024353). If AMPtk isn't meant for metagenomic files, shouldn't it be unable to process any of them at all?

haideruni commented on August 20, 2024

It's been a year since I opened this issue, and I've learned more about bioinformatics and metagenomics. I'll close the issue because the error is very likely due to my use of incorrect input files. Thank you for your help and for discussing some of the background on metagenomics and amplicons in your responses.
