Comments (8)
Hi Adam, looks like I grabbed the wrong bam file :-) I ran it again with 0x100 and 0x2308 and the results are almost identical. No 'NNNN' reads in either of them. It's only when I omit that step that the 'NNNN' reads creep in. False alarm :-)
from teloclip.
Hi Johannes,
That is curious. Could you please post example SAM records for one read that shows this behaviour and one that does not?
What happens if you filter secondary alignments?
Hi Adam, I added the secondary alignment filtering step samtools view -h -F 0x2308 and that seems to have taken care of the problem :-) Cheers.
Cool, that makes sense. When teloclip writes out the overhang sequence, it draws on the read sequence stored in the SAM alignment record, which is only present for primary alignments.
The choice to discard secondary alignments was based on the assumption that sub-telomeric sequences are generally repetitive and therefore long-reads (containing telomeric repeats in the soft-clipped overhang) may also have secondary alignments to other chromosome/contig ends. It may be worthwhile checking where the primary alignments are for some of those overhanging secondary alignments - do those reads show up at the end of another contig?
I've corrected the hex code for filtering non-primary alignments; it should be 0x100. The previous code was also meant to catch PE reads where the mate was not mapped, which doesn't apply to long reads.
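For anyone comparing the two masks, here is a minimal sketch that decodes a FLAG value into the bit names used by the SAM specification. The helper name decode_flag and the UNDEFINED label are made up for this illustration; they are not part of samtools or teloclip.

```shell
#!/usr/bin/env bash
# Decode a SAM FLAG value into the bit names defined in the SAM specification.
# decode_flag is a hypothetical helper written for this example.
decode_flag() {
    local flag=$(( $1 ))   # accepts decimal or 0x-prefixed hex
    # Names for bits 0x1 .. 0x800, in ascending bit order.
    local names=(PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE
                 READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY)
    local out=() i
    for i in "${!names[@]}"; do
        if (( flag & (1 << i) )); then
            out+=("${names[i]}")
        fi
    done
    # Bits above 0x800 are not assigned a meaning by the SAM specification.
    if (( flag >> 12 )); then
        out+=("UNDEFINED")
    fi
    local IFS=,
    echo "${out[*]}"
}

decode_flag 0x100    # SECONDARY
decode_flag 0x2308   # MUNMAP,SECONDARY,QCFAIL,UNDEFINED
```

Read this way, 0x100 excludes secondary alignments only, while 0x2308 additionally excludes mate-unmapped and QC-fail records and sets one bit the specification leaves undefined. Neither mask contains 0x800, so supplementary alignments pass both filters.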
Hi Adam,
I just ran it with the 0x100 hex code, and the 'NNNNN' reads are back. 0x2308 removes them.
Can you please post a few example SAM records so I can figure out what's going on and update the docs accordingly?
Sure, if you can tell me how to do that :-) I just run the command above and get the bam file.
For a contig where you know that there are overhanging reads with telomeric repeats on at least one end you can extract those alignments like this:
Align reads to reference + sort:
minimap2 -ax map-ont P9424_final.fasta ../P9424.correctedReads.fasta.gz | samtools sort > P9424_sorted.bam
Index on position:
samtools index P9424_sorted.bam
Filter for reads only on "Chr1" from position 1 to some number slightly longer than your longest read:
samtools view -b P9424_sorted.bam "Chr1:1-500000" > Chr1_leftend.bam
If you can post that final file + a fasta file with just the target contig (i.e. Chr1 in the example) I'll take a look. Can also email me if you don't want to post data.
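To see which records in an extracted region lack a stored sequence, a rough sketch like the one below may help. It reads SAM text on stdin and reports, per record, whether the 0x100 bit is set and whether the SEQ column is "*". The function name seq_status and the two sample records are invented for illustration and are not taken from the data in this thread.

```shell
#!/usr/bin/env bash
# Report, per SAM record, whether it is secondary (FLAG bit 0x100)
# and whether a read sequence is stored in the SEQ column.
# seq_status is a hypothetical helper; it reads SAM text on stdin.
seq_status() {
    awk -F'\t' '
        /^@/ { next }                          # skip header lines
        {
            secondary = int($2 / 256) % 2      # bit 0x100 of the FLAG column
            has_seq   = ($10 != "*")           # SEQ is "*" when not stored
            printf "%s\t%s\t%s\n", $1,
                (secondary ? "secondary" : "not_secondary"),
                (has_seq ? "seq_present" : "seq_missing")
        }'
}

# Two made-up records for illustration only: a primary alignment with SEQ,
# and a secondary alignment of the same read with SEQ set to "*".
printf 'readA\t0\tChr1\t1\t60\t4S6M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\nreadA\t256\tChr2\t1\t0\t4S6M\t*\t0\t0\t*\t*\n' | seq_status
```

On real data the same function could be fed from samtools view, for example samtools view Chr1_leftend.bam | seq_status, assuming the file is in BAM format.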