
SE adapter trimmer (htstream issue, closed)

s4hts avatar s4hts commented on July 20, 2024
SE adapter trimmer

from htstream.

Comments (9)

msettles avatar msettles commented on July 20, 2024

Adapters can only be found on the 3' end of the read, and what is present can range from 1 bp of the adapter (which we would not expect to identify) to the full adapter plus bogus sequence afterward. You should actually know this, and now I'm asking about it at your defense, so study up ;)
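For reference, here is a minimal Python sketch of the 3' case described above: the adapter may show up as only a prefix at the very end of the read, or in full with arbitrary sequence after it. This is not HTStream's implementation and not sickle's algorithm; the function name `trim_3prime_adapter`, the `min_overlap` and `max_mismatch_rate` parameters, and the example adapter string are all assumptions made for illustration.

```python
def trim_3prime_adapter(read, adapter, min_overlap=3, max_mismatch_rate=0.1):
    """Return `read` with a full or partial 3' adapter removed.

    Hypothetical helper for illustration only. Parameter names and
    defaults are assumptions, not HTStream options.
    """
    n = len(read)
    # Scan left to right so the earliest plausible adapter start wins.
    for start in range(n):
        # Compare only the part of the adapter that fits inside the read.
        overlap = min(len(adapter), n - start)
        if overlap < min_overlap:
            # Too few bases left to call an adapter with any confidence
            # (e.g. a 1 bp remnant), so stop looking.
            break
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= int(overlap * max_mismatch_rate):
            # Everything from `start` onward is adapter plus any
            # trailing sequence, so drop it all.
            return read[:start]
    return read


# Example adapter prefix, used here purely as test data.
ADAPTER = "AGATCGGAAGAGC"

full = trim_3prime_adapter("ACGTACGTAC" + ADAPTER + "TTTTT", ADAPTER)
partial = trim_3prime_adapter("ACGTACGTAC" + "AGATC", ADAPTER)
untouched = trim_3prime_adapter("ACGTACGTACGT", ADAPTER)
```

With these inputs, `full` and `partial` both come back as `ACGTACGTAC`, and `untouched` is returned unchanged. A read ending in a single adapter base is left alone because it falls under `min_overlap`, which matches the "not expected to identify" caveat. Short matches are the hard part: the smaller the overlap, the more likely a chance hit on genomic sequence, so the threshold is a trade-off rather than a correct value.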

Look at the description/algorithm in sickle http://bioinformatics.ucdavis.edu/research-computing/software/ and other papers.

I would add it to the existing adapter-removal application (for SE input only). You can verify it using the PE approach: an R1 trimmed of adapter on its own should produce the same result as R1/R2 trimmed of adapter using the overlap method.

On 5' primers, I would do something different and maybe we can put in as a feature request later. I have an algorithm for this that I use in dbcAmplicons that can be used as a starting point, but this is not a priority.

We are doing a lot of Quantseq libraries right now, which are SE reads; preprocessing includes 5' trim, adapter removal, and polyA/T removal. At the moment, it would seem SE adapter removal is the only thing missing from HTStream.


samhunter avatar samhunter commented on July 20, 2024

A few thoughts,

  1. Adapter trimming for SE reads would be nice, but alternatively, maybe tell the lab to size select if you plan to do SE reads? Or... sequence shorter reads? Some exploratory testing I did with pattern-based adapter trimming suggested that it became very difficult to remove adapters accurately once the adapter remnant got below a certain length. A more sophisticated method could probably do better, but I don't think anyone has great results. Also, I have read recently that the new pseudo-mappers are much less negatively impacted by poly-A and adapter sequences, so this might be somewhat of a moot point (reads with full-length adapters may not map uniquely anyway).

  2. This is probably a wet-lab problem.... Just tell the lab to size select your libraries better. What are they doing sequencing such short fragments? That short junk is probably mostly primer dimers anyway!!!

  3. Weird, why do you have a 5' adapter on your Quantseq libraries? Are you doing the 3' mRNA-Seq FWD kit, or the REV kit? The FWD kit suggests only sequencing toward the poly-A end of the transcript, so your reads should only have poly-A and adapter on the 3' end, correct? Or do you mean that you are trimming the reads on the 5' end for quality or something else?


msettles avatar msettles commented on July 20, 2024


samhunter avatar samhunter commented on July 20, 2024

Ahh, so trimming off the random primer for the Quantseq libs. I guess that sort of makes sense, but, the point is just to count transcripts, so it would be interesting to see if there are real impacts.

David and I chatted about this a bit. We like the idea of having a pattern-based 3' or 5' trimmer that could be used for adapters, primers, or whatever else. It will almost certainly be a different application from any of the current trimmers, and might not get implemented right away (due to other more pressing priorities), but would add a nice tool to the tool set.


msettles avatar msettles commented on July 20, 2024


dstreett avatar dstreett commented on July 20, 2024

@msettles - Just an FYI, they are called LIns (Long INserts) and SIns (Short INserts) now. :) (Biggies and smalls was rejected due to some minor issues with trademarks.)

LIns and SIns are better ways to describe Illumina overlaps.


msettles avatar msettles commented on July 20, 2024


msettles avatar msettles commented on July 20, 2024

Well, shouldn't it be overlaps instead of inserts? Really, as far as inserts go, there are 3 classes: 'short', which are < the read length and contain adapter; 'medium', which are > the read length but < 2x the read length; and 'long', which are > the length of both reads combined (resulting in a pair).
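To make the three classes concrete, here is a small Python sketch that bins an insert by its length relative to the read length. The function name `classify_insert` is hypothetical. The comment above does not say where an insert exactly equal to the read length, or exactly twice it, should go, so assigning those boundary cases to the larger class is an assumption made here.

```python
def classify_insert(insert_len, read_len):
    """Bin an insert as 'short', 'medium', or 'long' for paired reads
    of length `read_len`.

    Hypothetical helper. Boundary handling (insert_len == read_len and
    insert_len == 2 * read_len) is an assumption, not from the thread.
    """
    if insert_len < read_len:
        # Reads run off the end of the insert into adapter.
        return "short"
    if insert_len < 2 * read_len:
        # No adapter in the reads, but R1 and R2 overlap each other.
        return "medium"
    # R1 and R2 do not overlap, so they remain a pair.
    return "long"


# Example with 150 bp reads.
classes = [classify_insert(n, 150) for n in (90, 220, 400)]
```

For 150 bp reads, inserts of 90, 220, and 400 bp fall into 'short', 'medium', and 'long' respectively.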


msettles avatar msettles commented on July 20, 2024

addressed in #122

