Comments (2)

suhrig commented on July 23, 2024

But what does this mean? Will breakpoints be merged only when the distance between bkp2 and bkp1 is equal?

It means that breakpoints will only be merged if they are shifted by the same number of bases. For example, let's say we have two pairs of breakpoints:

breakpoint1: chr1:1000, breakpoint2: chr2:1000
breakpoint1: chr1:1002, breakpoint2: chr2:998

In this case, the breakpoints will be merged, because the two extra bases that are aligned at breakpoint1 of the second breakpoint pair are subtracted from breakpoint2. This is a frequent occurrence: when breakpoints have sequence homology, multiple alternative alignments are possible, all of which have the same alignment score. For example, when fusion breakpoints have two bases of sequence homology, the aligner (STAR) can place the two bases on either end of the breakpoints and both placements will have the same alignment score. In practice, STAR will choose between them at random, giving rise to a 50:50 ratio of one or the other alignment.
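To make the homology argument concrete, here is a minimal sketch that enumerates every split position at which a chimeric read can be divided between two reference segments without a mismatch. The function name `equivalent_split_points` and the toy sequences are hypothetical and chosen only for illustration; this is not how STAR or Arriba compute alignments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of STAR or Arriba): return every k such that
// the first k bases of the read match the start of left_ref and the remaining
// bases match the end of right_ref. Each k is an equally scoring placement
// of the fusion junction.
std::vector<std::size_t> equivalent_split_points(const std::string& read,
                                                 const std::string& left_ref,
                                                 const std::string& right_ref) {
    std::vector<std::size_t> splits;
    const std::size_t n = read.size();
    for (std::size_t k = 1; k < n; ++k) {
        const std::size_t right_len = n - k;
        // Skip splits that would run past either reference segment.
        if (k > left_ref.size() || right_len > right_ref.size())
            continue;
        const bool left_ok = read.compare(0, k, left_ref, 0, k) == 0;
        const bool right_ok =
            read.compare(k, right_len, right_ref,
                         right_ref.size() - right_len, right_len) == 0;
        if (left_ok && right_ok)
            splits.push_back(k);
    }
    return splits;
}
```

With the toy read `AAAACGCCCC`, left segment `AAAACGTT` and right segment `GGCGCCCC`, the two homologous bases `CG` can be assigned to either side (or split between the sides), so the junction can sit after base 4, 5 or 6 of the read. Every base moved onto one side of the junction is taken away from the other, which is why both breakpoints shift by the same number of bases.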

In contrast, the following example will not be merged:

breakpoint1: chr1:1000, breakpoint2: chr2:1000
breakpoint1: chr1:1002, breakpoint2: chr2:1000

This is because they are not equally scoring alternative alignments. These alignments must arise from different sequences, hence Arriba does not consider them to originate from the same event.
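The two examples can be checked with a small sketch of the stringent rule. It is an illustration only and not Arriba's actual code: the struct `BreakpointPair` and the function `shifted_by_same_amount` are hypothetical names, and comparing absolute shifts is an assumption made so that the sketch does not depend on the orientation of the fusion.

```cpp
#include <cstdlib>

// Hypothetical stand-in for a pair of fusion breakpoints on two contigs.
struct BreakpointPair {
    long breakpoint1;
    long breakpoint2;
};

// Merge only if both breakpoints moved by the same number of bases
// (assumption: compared as absolute shifts) and the shift is small enough.
bool shifted_by_same_amount(const BreakpointPair& a, const BreakpointPair& b,
                            long max_distance) {
    const long shift1 = std::labs(a.breakpoint1 - b.breakpoint1);
    const long shift2 = std::labs(a.breakpoint2 - b.breakpoint2);
    return shift1 == shift2 && shift1 <= max_distance;
}
```

Under this sketch, the pair (1000, 1000) and (1002, 998) is merged because both breakpoints move by two bases, whereas (1000, 1000) and (1002, 1000) is not, because only breakpoint1 moves.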

In the past, the code used to be written as you suggest - merging anything within a given distance - regardless of whether the breakpoints arise from alternative alignments or whether they are unrelated alignments. In my experience, this often led to breakpoints being merged which do not belong to each other, which in turn led to a higher false positive rate, because the read numbers were inflated. This is why I changed the merging procedure to be as stringent as it is now.

Admittedly, this will occasionally lead to breakpoints not being merged even though they should be, as in your case. If you have the time, could you kindly extract the reads overlapping your fusion breakpoints and send them to me? I would gladly take a look at how STAR chose to align these reads and why they are not alternative alignments. My guess is that there are sequencing errors right at the fusion junction.

Is this change correct?

I think your adaptation is more complex than it has to be. I haven't tested this, but it should suffice to only compare breakpoint1 to breakpoint1 and breakpoint2 to breakpoint2 - regardless of the fusion direction, i.e.:

abs((**fusion).breakpoint2 - (**previous_fusion).breakpoint2) <= max_distance && abs((**fusion).breakpoint1 - (**previous_fusion).breakpoint1) <= max_distance

breakpoint1 always contains the smaller coordinate and breakpoint2 always contains the bigger coordinate.
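As a self-contained sketch of the simpler comparison suggested above, the condition could be wrapped in a function as follows. The struct `Fusion` and the function `within_max_distance` are hypothetical and only mirror the member names `breakpoint1` and `breakpoint2` from the snippet; the real Arriba data structures differ, and like the snippet this has not been tested against Arriba itself.

```cpp
#include <cstdlib>

// Hypothetical, reduced stand-in for a fusion record; by the convention
// stated above, breakpoint1 holds the smaller coordinate.
struct Fusion {
    long breakpoint1;
    long breakpoint2;
};

// Permissive rule: merge whenever each breakpoint lies within max_distance
// of its counterpart, regardless of the direction of the fusion.
bool within_max_distance(const Fusion& fusion, const Fusion& previous_fusion,
                         long max_distance) {
    return std::labs(fusion.breakpoint2 - previous_fusion.breakpoint2) <= max_distance &&
           std::labs(fusion.breakpoint1 - previous_fusion.breakpoint1) <= max_distance;
}
```

Note that this rule also merges (1000, 1000) with (1002, 1000) when `max_distance` is 2 or more, which is exactly the kind of pair the stringent procedure keeps apart.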

from arriba.

karlestira commented on July 23, 2024

OK, thanks, but the people who set my requirements say that fusions too close to each other always need to be merged (or, to be more precise, "filtered", as they put it). They think the way the code used to work is right.

My sequencing data was captured using specific probes and has UMIs, resulting in a very high duplication rate. In addition, the markdup program is specially designed to meet certain requirements (such as building consensus sequences for PCR duplicates, which is very important for mutation detection). All of this may cause additional problems and make the data behave differently from conventional RNA sequencing data.

Anyway, thank you for your answer. Maybe I need to merge the results of STAR-Fusion and Arriba.

I will close this issue.
