Hello. I ran Arriba on my sample and found that it gives me results like

Error in merging adjacent breakpoints? about arriba (closed)

karlestira commented on July 23, 2024

Error in merging adjacent breakpoints?

from arriba.

Comments (2)

suhrig commented on July 23, 2024

But what the hell does this mean? Will breakpoints be merged only when the distances between bkp2 and bkp1 are equal?

It means that breakpoints will only be merged if they are shifted by the same number of bases. For example, let's say we have two pairs of breakpoints:

breakpoint1: chr1:1000, breakpoint2: chr2:1000
breakpoint1: chr1:1002, breakpoint2: chr2:998

In this case, the breakpoints will be merged, because the two extra bases that are aligned at breakpoint1 of the second breakpoint pair are subtracted from breakpoint2. This happens frequently: when breakpoints have sequence homology, multiple alternative alignments are possible, all of which have the same alignment score. For example, when fusion breakpoints have two bases of sequence homology, the aligner (STAR) can place the two bases on either side of the breakpoint, and both placements have the same alignment score. In practice, STAR chooses between them at random, giving rise to a 50:50 ratio of one alignment or the other.

In contrast, the following example will not be merged:

breakpoint1: chr1:1000, breakpoint2: chr2:1000
breakpoint1: chr1:1002, breakpoint2: chr2:1000

This is because they are not equally scoring alternative alignments. These alignments must arise from different sequences, and hence Arriba does not consider them to originate from the same event.

In the past, the code was written as you suggest: it merged anything within a given distance, regardless of whether the breakpoints arose from alternative alignments or from unrelated alignments. In my experience, this often led to breakpoints being merged that do not belong together, which in turn led to a higher false positive rate, because the read counts were inflated. This is why I made the merging procedure as stringent as it is now.

Admittedly, this will occasionally lead to breakpoints not being merged even though they should be, as in your case. If you have the time, could you kindly extract the reads overlapping your fusion breakpoints and send them to me? I would gladly take a look at how STAR chose to align these reads and why they are not alternative alignments. My guess is that there are sequencing errors right at the fusion junction.

Is this change correct?

I think your adaptation is more complex than it has to be. I haven't tested this, but it should suffice to compare only breakpoint1 to breakpoint1 and breakpoint2 to breakpoint2, regardless of the fusion direction, i.e.:

abs((**fusion).breakpoint2 - (**previous_fusion).breakpoint2) <= max_distance && abs((**fusion).breakpoint1 - (**previous_fusion).breakpoint1) <= max_distance

breakpoint1 always contains the smaller coordinate and breakpoint2 always contains the bigger coordinate.


karlestira commented on July 23, 2024

OK, thanks, but my client says that fusions too close to each other always need to be merged (or "filtered", to use their exact word). They think the way "the code used to be" is right.

My sequencing data was captured using specific probes and has UMIs, resulting in a very high duplication rate. In addition, the duplicate-marking (markdup) program is specially designed to meet certain requirements, such as building consensus sequences from PCR duplicates, which is very important for mutation detection. All of this may cause additional problems and make the data behave differently from conventional RNA sequencing data.

Anyway, thank you for your answer. Maybe I need to merge the results of STAR-Fusion and Arriba.

I will close this issue.
