But what the hell does this mean? Will breakpoints be merged only when the distances between breakpoint2 and breakpoint1 are equal?
It means that breakpoints will only be merged if they are shifted by the same number of bases. For example, let's say we have two pairs of breakpoints:
breakpoint1: chr1:1000, breakpoint2: chr2:1000
breakpoint1: chr1:1002, breakpoint2: chr2:998
In this case, the breakpoints will be merged, because the two extra bases aligned at breakpoint1 of the second breakpoint pair are subtracted from breakpoint2. This happens frequently: when breakpoints have sequence homology, multiple alternative alignments are possible, all of which have the same alignment score. For example, when fusion breakpoints have two bases of sequence homology, the aligner (STAR) can place the two bases on either side of the junction, and both alignments will have the same score. In practice, STAR chooses between them at random, giving rise to a roughly 50:50 ratio of one alignment to the other.
In contrast, the following example will not be merged:
breakpoint1: chr1:1000, breakpoint2: chr2:1000
breakpoint1: chr1:1002, breakpoint2: chr2:1000
This is because they are not equally scoring alternative alignments. These alignments must arise from different sequences, and hence Arriba does not consider them to originate from the same event.
In the past, the code used to be written as you suggest, merging anything within a given distance, regardless of whether the breakpoints arise from alternative alignments or from unrelated alignments. In my experience, this often led to breakpoints being merged that do not belong together, which in turn led to a higher false positive rate because the read counts were inflated. This is why I made the merging procedure as stringent as it is now.
Admittedly, this will occasionally lead to breakpoints not being merged even though they should be, as in your case. If you have the time, could you kindly extract the reads overlapping your fusion breakpoints and send them to me? I will gladly take a look at how STAR aligned these reads and why they are not alternative alignments. My guess is that there are sequencing errors right at the fusion junction.
Is this change correct?
I think your adaptation is more complex than it needs to be. I haven't tested this, but it should suffice to compare only breakpoint1 to breakpoint1 and breakpoint2 to breakpoint2, regardless of the fusion direction, i.e.:
abs((**fusion).breakpoint2 - (**previous_fusion).breakpoint2) <= max_distance && abs((**fusion).breakpoint1 - (**previous_fusion).breakpoint1) <= max_distance
breakpoint1 always contains the smaller coordinate and breakpoint2 always contains the bigger coordinate.
from arriba.
OK, thanks, but my client says that fusions too close to each other always need to be merged (or, to be more precise, as they put it, "filtered"). They think the way "the code used to be" is right.
My sequencing data was generated by capture with specific probes and carries UMIs, resulting in a very high duplication rate. In addition, the duplicate-marking program is specially designed to meet certain requirements (such as building consensus sequences from PCR duplicates, which is very important for mutation detection). All of this may cause additional problems and make the data behave differently from conventional RNA-seq data.
Anyway, thank you for your answer. Maybe I need to merge the results of STAR-Fusion and Arriba.
I will close this issue.