Comments (9)
Adapters can only be found on the 3' end, and what is present can range from 1 bp of the adapter (not expected to be identifiable) to the full adapter plus bogus sequence afterward. You should actually know this, and now I'm asking about it at your defense, so study up ;)
Look at the description/algorithm in sickle http://bioinformatics.ucdavis.edu/research-computing/software/ and other papers.
I would add it to the existing adapter-removal application (for SE input only). You can verify it against the PE approach: an R1 trimmed of adapter on its own should produce the same result as the R1 from an R1/R2 pair trimmed of adapter using the overlap method.
On 5' primers, I would do something different, and maybe we can put it in as a feature request later. I have an algorithm for this that I use in dbcAmplicons that can be used as a starting point, but this is not a priority.
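To make the 3' case above concrete, here is a minimal sketch in Python of a pattern-based SE trimmer. It is not the HTStream implementation; the function names, the `min_overlap` and `max_error_rate` parameters, and the example sequences are all assumptions made for illustration. It scans the read left to right, compares the adapter prefix against the read at each position, and cuts at the first acceptable match, so a full adapter followed by bogus sequence and a partial adapter at the very end are both handled.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def trim_3p_adapter(read: str, adapter: str,
                    min_overlap: int = 3,
                    max_error_rate: float = 0.1) -> str:
    """Return the read with a 3' adapter (and anything after it) removed.

    min_overlap and max_error_rate are hypothetical tuning knobs, not
    options of any existing tool.
    """
    for start in range(len(read)):
        # Full adapter if it fits, otherwise only the part that fits
        # before the read ends.
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # the overlap only shrinks from here on
        mismatches = hamming(read[start:start + overlap], adapter[:overlap])
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]  # drop adapter plus any trailing bases
    return read


# Illustrative sequences only.
ADAPTER = "AGATCGGAAGAGC"
INSERT = "ACGTTGCAACGTTGCA"

full = trim_3p_adapter(INSERT + ADAPTER + "TTTT", ADAPTER)
partial = trim_3p_adapter(INSERT + ADAPTER[:5], ADAPTER)
one_bp = trim_3p_adapter(INSERT + ADAPTER[:1], ADAPTER)
```

With these inputs, `full` and `partial` both come back as the bare insert, while `one_bp` is left untouched because a 1 bp overlap is below `min_overlap`. For the verification suggested above, the same R1 could be run through this kind of SE trimmer and through a PE overlap trimmer, and the two outputs compared.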
We are doing a lot of Quantseq libraries right now, which are SE reads; preprocessing includes 5' trim, adapter removal, and polyA/T removal. At the moment, it would seem SE adapter removal is the only thing missing from HTStream.
from htstream.
A few thoughts:
- Adapter trimming for SE reads would be nice, but alternatively, maybe tell the lab to size select if you plan to do SE reads? Or sequence shorter reads? Some exploratory testing I did with pattern-based adapter trimming suggested that it became very difficult to remove adapters accurately once the adapter fragment left in the read got below a certain length. A more sophisticated method could probably do better, but I don't think anyone has great results. Also, I have read recently that the new pseudo-mappers are much less negatively impacted by poly-A and adapter sequences, so this might be somewhat of a moot point (reads with full-length adapters may not map uniquely anyway).
- This is probably a wet-lab problem... Just tell the lab to size select your libraries better. What are they doing sequencing such short fragments? That short junk is probably mostly primer dimers anyway!!!
- Weird, why do you have a 5' adapter on your Quantseq libraries? Are you doing the 3' mRNA-Seq FWD kit or the REV kit? The FWD kit suggests only sequencing toward the poly-A end of the transcript, so your reads should only have poly-A and adapter on the 3' end, correct? Or do you mean that you are trimming the reads on the 5' end for quality or something else?
Ahh, so trimming off the random primer for the Quantseq libs. I guess that sort of makes sense, but the point is just to count transcripts, so it would be interesting to see if there are real impacts.
David and I chatted about this a bit. We like the idea of having a pattern-based 3' or 5' trimmer that could be used for adapters, primers, or whatever else. It will almost certainly be a different application from any of the current trimmers, and might not get implemented right away (due to other more pressing priorities), but would add a nice tool to the tool set.
@msettles - Just an FYI, they are called LIns (Long INserts) and SIns (Short INserts) now. :) (Biggies and smalls was rejected due to some minor issues with trademarks.)
LIns and SIns are better ways to describe Illumina overlaps.
Well, shouldn't it be overlaps instead of inserts? Really, as far as inserts go there are 3 classes: 'short', which are < the read length and contain adapter; 'medium', which are > the read length but < 2x the read length; and 'long', which are > the length of both reads (results in a pair).
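The three classes above can be written down as a small sketch in Python. The function name is hypothetical, and the comment does not say where an insert exactly equal to the read length, or to twice the read length, belongs; putting those in 'medium' and 'long' respectively is an assumption made here.

```python
def classify_insert(insert_len: int, read_len: int) -> str:
    """Classify an insert relative to the read length (hypothetical helper)."""
    if insert_len < read_len:
        # The read runs past the insert, so it contains adapter.
        return "short"
    if insert_len < 2 * read_len:
        # R1 and R2 overlap, but neither reaches the adapter.
        return "medium"
    # No overlap between R1 and R2; the result is a plain pair.
    return "long"


# Example with 150 bp reads.
classes = [classify_insert(n, 150) for n in (80, 200, 400)]
```

Here `classes` is `["short", "medium", "long"]`. A real overlapper would also need a minimum overlap length before calling something 'medium', which this sketch ignores.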
addressed in #122