
split reads (arriba, closed)

suhrig commented on August 26, 2024
split reads

from arriba.

Comments (2)

suhrig commented on August 26, 2024

Hi Krutika,

Would you agree that split_reads1 + split_reads2 == spanning reads, which can then be used to identify how many spanning reads support each call? Specifically for read-based filtering purposes.

Yes. The sum of split_reads1 and split_reads2 equates to the number of reads which overlap the fusion breakpoints, i.e. which are not discordant mates. split_reads1 counts the number of reads with the anchor read in gene1, and split_reads2 those with the anchor read in gene2. Arriba distinguishes between the two, because doing so conveys information about the quality of the alignment. If all the anchors are in the same gene, the gene lacking anchor reads has no high quality alignments, i.e., reads which align completely.
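For read-based filtering, the sum described above can be appended as an extra column. The following is only a minimal sketch: the function name add_split_read_total and the new column name split_reads_total are made up for this example, and it assumes the Arriba output has a header line containing the columns split_reads1 and split_reads2.

```shell
# Hypothetical helper (not part of Arriba): append split_reads1 + split_reads2
# as a new column. Column positions are looked up by name in the header line.
# The column name "split_reads_total" is an assumption made for this example.
add_split_read_total() {
	awk -F '\t' '
		FNR==1 { for (i=1;i<=NF;i++) col[$i]=i; print $0"\tsplit_reads_total"; next }
		{ print $0"\t"($col["split_reads1"]+$col["split_reads2"]) }
	' "$1"
}
```

It could be used as: add_split_read_total fusions.tsv > fusions.with_total.tsv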

For discordant reads, would support from at least 1 read be strictly required by arriba to make a true fusion call? Some high/medium calls have 0 discordant/junction reads.

No. To get a high-confidence call, at least two of the following columns must be greater than zero: split_reads1, split_reads2, discordant_mates. If at least two of these are non-zero, both genes have high quality alignments.
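The "at least two of three" condition can be checked on the output file directly. This is a rough sketch of that single condition only, not of Arriba's full confidence scoring; the function name filter_two_of_three is made up, and the file is assumed to have a header line with the three column names.

```shell
# Hypothetical helper (not part of Arriba): keep the header plus all calls where
# at least two of split_reads1, split_reads2 and discordant_mates are > 0.
# This mirrors only the condition stated above, not the full confidence logic.
filter_two_of_three() {
	awk -F '\t' '
		FNR==1 { for (i=1;i<=NF;i++) col[$i]=i; print; next }
		{
			n = ($col["split_reads1"]>0) + ($col["split_reads2"]>0) + ($col["discordant_mates"]>0)
			if (n >= 2) print
		}
	' "$1"
}
```

It could be used as: filter_two_of_three fusions.tsv > fusions.two_of_three.tsv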

How would you recommend we annotate arriba calls with databases such as ChimerDB?

Download the Excel sheets from the ChimerDB website and convert them to a tab-separated file (.tsv): http://203.255.191.229:8080/chimerdbv31/mdownload.cdb

You can then use the following awk snippet to add a column counting the number of fusion pair hits in the database. The snippet takes two arguments: the ChimerDB file in TSV format and the fusion predictions from Arriba to be annotated. Make sure the ChimerDB file is given first and has ChimerDB in the name.

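awk -F '\t' '
	# map column names to column numbers using the header line of each file
	FNR==1{ for (i=1;i<=NF;i++) col[$i]=i }
	# ChimerDB file: count the entries per fusion pair
	FILENAME~/ChimerDB/{ hits[$col["Fusion_pair"]]++ }
	# Arriba file: append the number of hits for the gene pair in either orientation
	FILENAME!~/ChimerDB/{ print $0"\t"(FNR==1 ? "chimerdb_pair_hits" : hits[$col["#gene1"]"_"$col["gene2"]]+hits[$col["gene2"]"_"$col["#gene1"]]) }
' ChimerDB3.0_ChimerSeq.tsv fusions.tsv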

Alternatively, here is an awk snippet to add two columns, which count the number of hits for gene1 and gene2 separately:

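awk -F '\t' '
	# map column names to column numbers using the header line of each file
	FNR==1{ for (i=1;i<=NF;i++) col[$i]=i }
	# ChimerDB file: count how often each gene appears in the H_gene or T_gene column
	FILENAME~/ChimerDB/{ hits[$col["H_gene"]]++; hits[$col["T_gene"]]++ }
	# Arriba file: append one hit count for gene1 and one for gene2
	FILENAME!~/ChimerDB/{ print $0"\t"(FNR==1 ? "chimerdb_hits1\tchimerdb_hits2" : (hits[$col["#gene1"]]+0)"\t"(hits[$col["gene2"]]+0)) }
' ChimerDB3.0_ChimerSeq.tsv fusions.tsv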

Let me know if you have trouble running these snippets or if you want more detailed annotation.

Regards,
Sebastian


kgaonkar6 commented on August 26, 2024

Thank you for the information and the link to ChimerDB with the helpful snippets. I was able to filter the arriba calls with the suggestions above and add the ChimerDB information to the fusions. Thank you!
