
NNNNN reads about teloclip (closed)

JWDebler commented on June 24, 2024
NNNNN reads

from teloclip.

Comments (8)

JWDebler commented on June 24, 2024

Hi Adam, looks like I grabbed the wrong BAM file :-) I ran it again with 0x100 and with 0x2308 and the results are almost identical. No 'NNNN' reads in either of them. It's only when I omit the filtering step that the 'NNNN' reads creep in. False alarm :-)


Adamtaranto commented on June 24, 2024

Hi Johannes,

That is curious. Could you please post example SAM records for one read that shows this behaviour and one that does not?

What happens if you filter secondary alignments?


JWDebler commented on June 24, 2024

Hi Adam, I added the secondary alignment filtering step (samtools view -h -F 0x2308) and that seems to have taken care of the problem :-) Cheers.


Adamtaranto commented on June 24, 2024

Cool, that makes sense. When teloclip writes out the overhang sequence, it draws on the read sequence stored in the SAM alignment record, which is only present for primary alignments.
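
One way to check this on your own data is to count the records that carry no stored sequence. The sketch below is only an illustration: it assumes SAM text on stdin, relies on the SAM convention that SEQ is column 10 and is written as "*" when no sequence is stored, and uses a hypothetical helper name (count_missing_seq) and a placeholder file name (in.bam).

```shell
# Count SAM records whose SEQ field (column 10) is "*", i.e. no read sequence stored.
# Reads SAM text on stdin; header lines (starting with "@") are skipped.
count_missing_seq() {
  awk -F'\t' '!/^@/ && $10 == "*" { n++ } END { print n + 0 }'
}

# Hypothetical usage, assuming samtools is installed and in.bam is your alignment file:
#   samtools view in.bam | count_missing_seq
```

If the count drops to zero after filtering, the sequence-less records were the ones removed by the filter.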

The choice to discard secondary alignments was based on the assumption that sub-telomeric sequences are generally repetitive, so long reads (containing telomeric repeats in the soft-clipped overhang) may also have secondary alignments to other chromosome/contig ends. It may be worthwhile checking where the primary alignments are for some of those overhanging secondary alignments: do those reads show up at the end of another contig?

I've corrected the hex code for filtering non-primary alignments; it should be 0x100. The previous code was also meant to catch paired-end (PE) reads where the mate was not mapped, which doesn't apply to long reads.
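
To see which bits a given mask actually sets, a small decoder can help. This is a minimal sketch, not part of teloclip: the helper name decode_flag is hypothetical, and the bit names follow the twelve flag bits defined in the SAM specification. (samtools also ships a flags subcommand for the same purpose.)

```shell
# Print the names of the SAM flag bits set in a mask (hex or decimal).
decode_flag() {
  mask=$(( $1 ))
  names=""
  for pair in \
    "1:paired" "2:proper_pair" "4:unmapped" "8:mate_unmapped" \
    "16:reverse" "32:mate_reverse" "64:read1" "128:read2" \
    "256:secondary" "512:qc_fail" "1024:duplicate" "2048:supplementary"
  do
    bit=${pair%%:*}
    name=${pair#*:}
    if [ $(( mask & bit )) -ne 0 ]; then
      names="${names:+$names,}$name"
    fi
  done
  # Report any bits outside the 12 defined by the SAM specification.
  extra=$(( mask & ~4095 ))
  if [ "$extra" -ne 0 ]; then
    names="${names:+$names,}undefined($extra)"
  fi
  echo "$names"
}

decode_flag 0x100    # secondary
decode_flag 0x2308   # mate_unmapped,secondary,qc_fail,undefined(8192)
decode_flag 2308     # unmapped,secondary,supplementary
```

Both 0x100 and 0x2308 include the secondary bit, while decimal 2308 decodes to a different set of bits, so the hex and decimal spellings are not interchangeable.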


JWDebler commented on June 24, 2024

Hi Adam,

I just ran it with the 0x100 hex code, and the 'NNNNN' reads are back. 0x2308 removes them.


Adamtaranto commented on June 24, 2024

Can you please post a few example SAM records so I can figure out what's going on and update the docs accordingly?


JWDebler commented on June 24, 2024

Sure, if you can tell me how to do that :-) I just run the command above and get the BAM file.


Adamtaranto commented on June 24, 2024

For a contig where you know there are overhanging reads with telomeric repeats on at least one end, you can extract those alignments like this:

Align reads to reference + sort:
minimap2 -ax map-ont P9424_final.fasta ../P9424.correctedReads.fasta.gz | samtools sort > P9424_sorted.bam

Index on position:
samtools index P9424_sorted.bam

Extract only the alignments on "Chr1" from position 1 to a coordinate slightly beyond the length of your longest read (-b keeps the output in BAM format, matching the file name):
samtools view -b P9424_sorted.bam "Chr1:1-500000" > Chr1_leftend.bam

If you can post that final file + a fasta file with just the target contig (i.e. Chr1 in the example) I'll take a look. Can also email me if you don't want to post data.
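
The example above covers the left end of the contig. For the right end, the region has to be computed from the contig length. A rough sketch, assuming a FASTA index (.fai, as produced by samtools faidx) whose first two columns are contig name and length; the helper name right_end_region and the output file name Chr1_rightend.bam are hypothetical:

```shell
# Build a samtools region string covering the last WINDOW bases of a contig.
# Reads a FASTA index (.fai) on stdin: column 1 = contig name, column 2 = length.
right_end_region() {
  awk -F'\t' -v contig="$1" -v window="$2" '
    $1 == contig {
      start = $2 - window + 1
      if (start < 1) start = 1
      printf "%s:%d-%d\n", contig, start, $2
    }'
}

# Hypothetical usage, assuming samtools is installed:
#   samtools faidx P9424_final.fasta
#   region=$(right_end_region Chr1 500000 < P9424_final.fasta.fai)
#   samtools view -b P9424_sorted.bam "$region" > Chr1_rightend.bam
```

The window is clamped to position 1 so that contigs shorter than the window are returned whole.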

