Coder Social home page Coder Social logo

genefuse's Introduction

install with conda

GeneFuse

A tool to detect and visualize target gene fusions by scanning FASTQ files directly. This tool accepts FASTQ files and reference genome as input, and outputs detected fusion results in TEXT, JSON and HTML formats.

Take a quick glance of the informative report

Get genefuse program

install with Bioconda

install with conda

conda install -c bioconda genefuse

download binary

This binary is only for Linux systems, http://opengene.org/GeneFuse/genefuse

# this binary was compiled on CentOS, and tested on CentOS/Ubuntu
wget http://opengene.org/GeneFuse/genefuse
chmod a+x ./genefuse

or compile from source

# get source (you can also use browser to download from master or releases)
git clone https://github.com/OpenGene/genefuse.git

# build
cd genefuse
make

# Install
sudo make install

Usage

You should provide following arguments to run genefuse

  • the reference genome fasta file, specified by -r or --ref=
  • the fusion setting file, specified by -f or --fusion=
  • the fastq file(s), specified by -1 or --read1= for single-end data. If dealing with pair-end data, specify the read2 file by -2 or --read2=
  • use -h or --html= to specify the file name of HTML report
  • use -j or --json= to specify the file name of JSON report
  • the plain text result is directly printed to STDOUT, you can pipe it to a file using a >

Example

genefuse -r hg19.fasta -f genes/druggable.hg19.csv -1 genefuse.R1.fq.gz -2 genefuse.R2.fq.gz -h report.html > result

Reference genome

The reference genome should be a single whole FASTA file containg all chromosome data. This file shouldn't be compressed. For human data, typicall hg19/GRch37 or hg38/GRch38 assembly is used, which can be downloaded from following sites:

Fusion file

The fusion file is a list of coordinated target genes together with their exons. A sample is:

>EML4_ENST00000318522.5,chr2:42396490-42559688
1,42396490,42396776
2,42472645,42472827
3,42483641,42483770
4,42488261,42488434
5,42490318,42490446
...

>ALK_ENST00000389048.3,chr2:29415640-30144432
1,30142859,30144432
2,29940444,29940563
3,29917716,29917880
4,29754781,29754982
5,29606598,29606725
...

The coordination system should be consistent with the reference genome.

Fusion files provided in this package

Four fusion files are provided with genefuse:

  1. genes/druggable.hg19.csv: all druggable fusion genes based on hg19/GRch37 reference assembly.
  2. genes/druggable.hg38.csv: all druggable fusion genes based on hg38/GRch38 reference assembly.
  3. genes/cancer.hg19.csv: all COSMIC curated fusion genes (http://cancer.sanger.ac.uk/cosmic/fusion) based on hg19/GRch37 reference assembly.
  4. genes/cancer.hg38.csv: all COSMIC curated fusion genes (http://cancer.sanger.ac.uk/cosmic/fusion) based on hg38/GRch38 reference assembly.

Notes:

  • genefuse runs much faster with druggable genes than cancer genes, since druggable genes are only a small subset of cancer genes. Use this one if you only care about the fusion related personalized medicine for cancers.
  • The cancer genes should be enough for most cancer related studies, since all COSMIC curated fusion genes are included.
  • If you want to create a custom gene list, please follow the instructions given on next section.

Create a fusion file based on hg19 or hg38

If you'd like to create a custom fusion file, you can use scripts/make_fusion_genes.py
As the script uses refFlat.txt file to determine genomic coordinates of exons, you need to download a refFlat.txt file from UCSC Genome Browser in advance. Of course, the choice of using either hg19 or hg38 is up to you.

Please make sure unzip the file to txt format before you continue

As for the input gene list file, all genes should be listed in separate lines. By default, the longest transcript will be used. However, you can specify a different transcript by adding the transcript ID to the end of a gene. The gene and its transcript should be separated by a tab or a space. Please note that each gene should be the HGNC official gene symbol, and each transcript should be NCBI RefSeq transcript ID.

An example of gene list file:

BRCA2	NM_000059
FAM155A
IRS2

When both input gene list file (gene_list.txt) and refFlat.txt file are prepared, you can use following command to generate a user-defined fusion file (fusion.csv):

python3 scripts/make_fusion_genes.py gene_list.txt -r /path/to/refflat -o fusion.csv

HTML report

GeneFuse can generate very informative and interactive HTML pages to visualize the fusions with following information:

  • the fusion genes, along with their transcripts.
  • the inferred break point with reference genome coordinations.
  • the inferred fusion protein, with all exons and the transcription direction.
  • the supporting reads, with all bases colorized according to their quality scores.
  • the number of supporting reads, and how many of them are unique (the rest may be duplications)

A HTML report example

image
See the HTML page of this picture: http://opengene.org/GeneFuse/report.html

All options

options:
  -1, --read1       read1 file name (string)
  -2, --read2       read2 file name (string [=])
  -f, --fusion      fusion file name, in CSV format (string)
  -r, --ref         reference fasta file name (string)
  -u, --unique      specify the least supporting read number is required to report a fusion, default is 2 (int [=2])
  -d, --deletion    specify the least deletion length of a intra-gene deletion to report, default is 50 (int [=50])
  -h, --html        file name to store HTML report, default is genefuse.html (string [=genefuse.html])
  -j, --json        file name to store JSON report, default is genefuse.json (string [=genefuse.json])
  -t, --thread      worker thread number, default is 4 (int [=4])
  -?, --help        print this message

Cite GeneFuse

If you used GeneFuse in you work, you can cite it as:

Shifu Chen, Ming Liu, Tanxiao Huang, Wenting Liao, Mingyan Xu and Jia Gu. GeneFuse: detection and visualization of target gene fusions from DNA sequencing data. International Journal of Biological Sciences, 2018; 14(8): 843-848. doi: 10.7150/ijbs.24626

genefuse's People

Contributors

sfchen avatar zkai0801 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

genefuse's Issues

fusion only option

I got a lot of deletions when I ran our real data using illumina targeted RNA fusion panel.
This panel is sequenced by 2x75 bp. Would you add an option for fusion only?

GeneFuse ignores fusions in TCF3

Hello,
Thanks for developing this tool! I was testing it on some hematological samples with TCF3 gene fusions (confirmed by FISH), and it seems like GeneFuse is not picking them up. I have checked that the gene is correctly reflected in the reference, and the reads connecting TCF3 and its partners are uniquely mapped to its location in the corresponding BAM files. I have two samples with different fusions, and in both of them, the fusion was not predicted even as untranscribed. Can you suggest what else can I try to fix this problem?
Thanks in advance!
Cheers,
Nadezda

make: *** No rule to make target 'install'. Stop.

Need help to compile from source code.
Got "make: *** No rule to make target 'install'. Stop."
when sudo make install, in ubuntu 16.04 LTS

Running binary without any error but found no fusions when run with dataset for testing.
Is there any library must have before installation, Thanks.

different gene set gives different results for same fusion

Hi thank-you for a great tool. I ran genefuse on my WES data from a sample with a known ETV6--NTRK3 fusion. I ran with druggable.hg38.csv and cancer.hg38.csv which both have ETV6 and NTRK3 gene, the exons and coordinates are identical for these 2 genes in the 2 .csv files. But I only detect the ETV6--NTRK3 fusion if I use the smaller druggable.hg38.csv gene set, and not if I use cancer.hg38.csv. I was wondering if having more genes in cancer.hg38.csv potentially means reads with better alignment to other genes are aligning to those genes rather than ETV6 or NTRK3?

Support cross pair

Read1

@name1
AGGGTTACCTGAGGATCGAATGAATTGAAATGTGTAAATTGCCGAGCACGTAGTAACCATGCAACAAGTGTTAGCTCCTATTATCCTGTCCCTTTGAGGGATGGCACCATATGGGGACACAGTGTGTGCTGCCATCTCCCTTCTACCGGC
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

Read2

@name1
TGGGCCCTCCCTCCCAAGCACTGGAGGAGGTACTCGTTCTGTGGGCCGGGGCCCCTCCCTCCTGAGCACTGGAGGAGGCACTTGTTCTGTGGGCCACGGCCCCTCCCTCCCCAGCACTGGAGGAGGCACTGGTTGTGTGGGCCCTGGCTC
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

Can not get the fusions with the testdata

Can not detect the fusions with the test data,is anyone can help me? all the testdata is from the git and
http://opengene.org/dataset.html

Here is my shell command:
./genefuse -r Homo_sapiens_assembly19.fasta -f genes/cancer.hg19.csv -1 ./testdata/genefuse.R1.fq.gz -2 ./testdata/genefuse.R2.fq.gz -h report.html > result
12:17:45 start with 4 threads
12:18:1 mapper indexing done
12:18:6 sequence number before filtering: 0
12:18:6 removeByComplexity: 0
12:18:6 removeByDistance: 0
12:18:6 removeIndels: 0
12:18:53 matcher indexing done
12:18:53 removeAlignables: 0
12:18:53 found 0 fusions
12:18:53 done

How to get VAF info from genefuse results?

Hi there:
#Fusion: EML4_ENST00000318522.5:intron:18|-2:42544153___ALK_ENST00000389048.3:intron:19|+2:29446769 (total: 155, unique:13)
155 refers to the number of supporting reads and 13 is the unique reads. Can we calculate VAF by 13/155*100 = 8.4% ?
Or where else can we get VAF info from the result file? Thanks!

How to use GeneFuse to detect fusion for nanopore sequencing fastq?

Hi sfchen,
I have installed genefuse and worked for the test data successfully. However, it did not work for nanopore fastq. Could you help me? Thank you!
My command line as follow:
genefuse -r ~/database/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa -f cosmic-fusion-list/cancer.GRCh38.csv -t 8 -1 ***.fastq -h ***.fusion.html > ***.log

gen_fusion_file.jl

Hi,

the same issue with gen_fusion_file.jl for csv file preparation while using Julia 0.6 version.

ERROR: LoadError: UndefVarError: readstring not defined in detect at /home/ubuntu/julia/gen_fusion_file.jl:33 in main at /home/ubuntu/julia/gen_fusion_file.jl:22 while loading /home/ubuntu/julia/gen_fusion_file.jl, in expression starting on line 119

Could you please generate result file .

My genes are attached.
genes (2).txt

Many thanks for your responce

What is the minimum required read length?

I conducted a simulation. When my read length is 89bp, it outputs expected fusions. However when I use read length of 85bp. There are no results.

The main change is the LOG information: "sequence number before filtering: XXX". Shorter read length result in very few (or even zero) sequence number before filtering.

How do I adjust the source code to satisfy more shorter reads?

error when creating fusion.csv file

Hi,

I am trying to create the fusion.csv file using the script that you provide gen_fusion_file.jl but I get the error:

loading gencode grch38
ERROR: LoadError: KeyError: key "localfile" not found

And this is what I run:
julia scripts/gen_fusion_file.jl -r GRch38 -g genes_for_csv.txt -f fusion.csv

Thanks a lot,

Erica

tmprss-erg

>TMPRSS2_ENST00000332149.9,chr21:41464551-41531116
1,41508081,41508159
2,41498119,41498189
3,41494356,41494578
4,41489507,41489593
5,41488394,41488513
6,41480476,41480602
7,41479172,41479282
8,41476577,41476620
9,41473325,41473496
10,41471806,41471981
11,41470648,41470743
12,41468396,41468538
13,41467734,41467886
14,41464551,41466153
>ERG_ENST00000288319.11,chr21:38380027-38661780
1,38498363,38498483
2,38445404,38445621
3,38423410,38423561
4,38403506,38403709
5,38402557,38402637
6,38400574,38400645
7,38392376,38392444
8,38391659,38391715
9,38390995,38391042
10,38380027,38383923

No fusion found using tests data

Hi Shifu,
no fusion found using tests data
./genefuse -r Homo_sapiens_assembly19.fasta -f druggable.hg19.csv -1 R1.fq.gz -2 R2.fq.gz -h r1r2n.html
15:51:11 start with 4 threads
15:51:50 mapper indexing done
15:52:20 sequence number before filtering: 0
15:52:20 removeByComplexity: 0
15:52:20 removeByDistance: 0
15:52:20 removeIndels: 0
15:54:5 matcher indexing done
15:54:5 removeAlignables: 0
15:54:5 found 0 fusions

./genefuse -r Homo_sapiens_assembly19.fasta -f druggable.hg19.csv -1 genefuse.R1.fq.gz -2 genefuse.R2.fq.gz -h genefuser1r2n.html
15:55:45 start with 4 threads
15:56:25 mapper indexing done
15:56:36 sequence number before filtering: 0
15:56:36 removeByComplexity: 0
15:56:36 removeByDistance: 0
15:56:36 removeIndels: 0
15:58:25 matcher indexing done
15:58:25 removeAlignables: 0
15:58:25 found 0 fusions

Dataset was downloaded from: http://opengene.org/dataset.html

Thanks.

std::bad_alloc

Hi:
I got a problem:
19:31:16 start with 12 threads
19:34:24 mapper indexing done
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

I am wondering what caused this problem?

No fusion found in tests data

amar@amar-Virtual-Machine:~/GeneFuse$ ./genefuse -1 testdata/R1.fq -2 testdata/R2.fq -r Homo_sapiens_assembly19.fasta -f genes/cancer.hg19.csv
22:53:5 start with 4 threads
22:53:9 mapper indexing done
22:53:9 sequence number before filtering: 0
22:53:9 removeByComplexity: 0
22:53:9 removeByDistance: 0
22:53:9 removeIndels: 0
22:53:25 matcher indexing done
22:53:25 removeAlignables: 0
22:53:25 found 0 fusions

./genefuse -1 testdata/R1.fq -2 testdata/R2.fq -r Homo_sapiens_assembly19.fasta -f genes/cancer.hg19.csv

genefuse v0.3.0, time used: 20 seconds

22:53:25 done
No fusion gene found in test data.Kindly reply....

GeneFuse on RNA-seq

Dear sir,
I tried to apply genefuse to our RNA-seq with known fusion gene.
Somehow we couldn't find any fusion gene.
command I used:
genefuse -r hg19.fa -f druggable.hg19.csv -1 A2076_FFPE_RNA_S1_L001_R1_001.fastq.gz" -2 A2076_FFPE_RNA_S1_L001_R2_001.fastq.gz" -h A2076_FFPE_RNA.html > A2076_FFPE_RNA.result
14:39:27 start with 4 threads
14:39:51 mapper indexing done
14:39:55 sequence number before filtering: 45
14:39:55 removeByComplexity: 0
14:39:55 removeByDistance: 22
14:39:55 removeIndels: 2
14:41:14 matcher indexing done
14:41:14 removeAlignables: 3
14:41:14 found 0 fusions
14:41:16 done
What your suggestion to apply genefuse to RNA-seq ?

JSON output corrupted due to a not escaped double-quotes character

I'm using genefuse JSON output to handle fusions information in order to prioritize them, so I'm using a Python script to read the JSON file. Sometimes the JSON output is corrupted because in the "qual" value of each read it could be used the double-quotes symbol. For example:
"reads":[ { "break":33, "strand":"reversed", "seq":"TTTTTTATAGGATTTGGGAAGGTAATGGAAAATTCCAGTCAAAGGGGGTTGTTCTCTGGTGGGCAGGGGCGGGGGTCACAAGGTGCTCAGTGGGGGAG", "qual":"9.B>?<5:=B94ACABA;BA@79B>B?6@A?@BB"@=<@:A=9@ABB>BB>B>?B.@A6BB?@>?@@B?7AAB>=<B=AA@A7B@:?<A9AA@@?=??" },

When genefuse writes the JSON file the double-quotes is not escaped, so trying to read the JSON file cause an error.
At the moment I'm avoiding the problem using a regex pattern to manipulate the JSON file and make it correct. I hope there's a way to update the construction of the JSON output in order to add the escape slash before double-quotes in "qual" value.

可否用gtf文件产生融合基因列表的csv文件?

如题,大概看了一下制作融合基因列表的jl脚本和文件。脚本仅仅支持hg19或hg38。我理解OpenGene的julia包里这俩人类基因组的外显子信息应该是有来源的,例如gtf文件。那么,我考虑可否通过gtf文件直接来制造候选的融合基因列表呢?

the gen_fusion_file.jl script does not work because OpenGene can not be installed

Hi !

I would like to run the julia script on a list of genes to get the file fusion.csv but I have not managed to install OpenGene, here is the error returned


julia> Pkg.add("OpenGene")
   Cloning default registries into /root/.julia/registries
   Cloning registry General from "https://github.com/JuliaRegistries/General.git"
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
ERROR: Unsatisfiable requirements detected for package OpenGene [63bcc5cb]:
 OpenGene [63bcc5cb] log:
 ├─possible versions are: 0.1.0-0.1.11 or uninstalled
 ├─restricted to versions * by an explicit requirement, leaving only versions 0.1.0-0.1.11
 └─restricted by julia compatibility requirements to versions: uninstalled — no versions left
Stacktrace:
 [1] #propagate_constraints!#61(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:1005
 [2] propagate_constraints! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:946 [inlined]
 [3] #simplify_graph!#121(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:1460
 [4] simplify_graph! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:1460 [inlined]
 [5] macro expansion at ./logging.jl:319 [inlined]
 [6] resolve_versions!(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/Operations.jl:338
 [7] #add_or_develop#58(::Array{Base.UUID,1}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/Operations.jl:1164
 [8] #add_or_develop at ./none:0 [inlined]
 [9] #add_or_develop#13(::Symbol, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:64
 [10] #add_or_develop at ./none:0 [inlined]
 [11] #add_or_develop#12 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:29 [inlined]
 [12] #add_or_develop at ./none:0 [inlined]
 [13] #add_or_develop#11 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:28 [inlined]
 [14] #add_or_develop at ./none:0 [inlined]
 [15] #add_or_develop#10 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:27 [inlined]
 [16] #add_or_develop at ./none:0 [inlined]
 [17] #add#18 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:69 [inlined]
 [18] add(::String) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:69 
 [19] top-level scope at none:0

here my geneList (iI working with hg19), I would be happy if you can help me ()

CAMTA1
CCNB3
CIC
EPC1
EWSR1
FOXO1
FUS
GLI1
HMGA2
JAZF1
MEAF6
MKL2
NCOA2
NTRK3
PDGFB
PLAG1
ROS1
SS18
STAT6
TAF15
TCF12
TFE3
TFG
USP6
YWHAE

Thank you

while running Stuck

hi,
Sometimes it gets stuck and does not run. The final state is as follows. and the task is suspended. Why is it? Please help me, thanks!
23:11:23 start with 4 threads
23:11:27 mapper indexing done

the data size of raw data effected fusion result

Hi,I cut off my raw data fastq to 300M(1000X),and use gene fuse to detect. The result was negative, no fusion detected. Using genefuse to detect 2G of my raw data , genefuse gets a lot of fusion results. Maybe the size of data effected genefuse.

gen_fusion_file.jl is out of date

Hi,

I notice that the scripts/gen_fusion_file.jl is out of date and can't be used anymore.
So I wrote a simple python script to replace its function, and I would like to share it with others. However any pushes to this repo is banned, would you mind to add this script manually?

"""
Generate a user-defined fusion file 

Input file should be a text file containing 
1. gene name and 
2. transcript name (optional)

For example:
gene1   transcript1
gene2
gene3   transcript3
"""
__author__ = "Kai"
__date__ = "20/05/2020"

import argparse


def read_genelist(inputf):
    with open(inputf, "r") as fh:
        for line in fh:
            yield line.rstrip("\n").split()


def make_fusion_gene(gene, fw, refflat):
    # no transcript specified --> use the longest transcript
    if len(gene) == 1:
        transcripts = {}
        with open(refflat, "r") as fh:
            for line in fh:
                if gene[0] not in line:
                    continue
                _, transcript, chrom, _, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
                transcripts[transcript] = (chrom, start, end, exonstart, exonend)
        transcript = get_longest_transcript(transcripts.keys(), refflat)
        chrom, start, end, exonstart, exonend  = transcripts[transcript]

    # use user-specified transcript
    elif len(gene) == 2:
        with open(refflat, "r") as fh:
            for line in fh:
                if gene[1] not in line:
                    continue
                _, transcript, chrom, _, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
                break
    
    # write to a file
    header = f">{gene[0]}_{transcript},{chrom}:{start}-{end}\n"
    fw.write(header)
    exons = list(zip(exonstart.split(","), exonend.split(",")))[:-1]
    for index, each_exon in enumerate(exons, start=1):
        fw.write(f'{index},{each_exon[0]},{each_exon[1]}\n')
    fw.write("\n")


def get_longest_transcript(transcripts, refflat):
    longest_length = 0
    longest_transcript = ""
    with open(refflat, "r") as fh:
        for line in fh:
            line = line.strip().split()
            if line[1] in transcripts:
                length = int(line[5]) - int(line[4])
                if length > longest_length:
                    longest_length = length
                    longest_transcript = line[1]
    return longest_transcript


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("input", help="Input filename")
    parser.add_argument("-r", "--refflat", required=True, help="Path to the refFlat.txt file, need to be downloaded from UCSC in advance")
    parser.add_argument("-o", "--output", required=True, help="The output filename")
    args = parser.parse_args()
    
    with open(args.output, "w") as fw:
        for gene in read_genelist(args.input):
            make_fusion_gene(gene, fw, args.refflat)

Fusion-making .py script does not take into account gene orientation

Thanks for creating this tool! I have noticed that the scripts/make_fusion_genes.py reports wrong order of exons for genes in '-' orientation, which results in considering some fusions as untranscribed and showing wrong fusion structure.

E.g. NTRK3 in cancer.hg38.csv:

NTRK3_ENST00000394480.6,chr15:87859751-88256768
1,88256644,88256747
2,88256284,88256486
3,88255906,88256168
4,88184225,88184299
...

Versus NTRK3 from the fusions.csv file generated by the script:

NTRK3_NM_002530,chr15:87859750-88256739
1,87859750,87877120
2,87880269,87880428
3,87929190,87929434
4,87933011,87933184
...

Here is a quick fix I introduced to make_fusion_genes.py:

Line 33:
_, transcript, chrom, strand, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
Line 44:
_, transcript, chrom, strand, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
And insert these after line 50:
if strand == '-':
exons = exons[::-1]

This returns the exons in the right order.

IGH-BCL2 is not detected but bam is supported

I have a sample with BCL2-IGH verified by FISH. There are lots of reads to support viewed by IGV. the pictuers is as follows. I am running with default parameters,and BCL2,IGH is included in csv file

BCL2_NM_000633,chr18:60790579-60987361
1,60986406,60986613
2,60985314,60986185
3,60790578,60795992

IGH,chr14:106052774-107288051

I would like to know whether or not relevant parameters need to be adjusted. Thanks a lot

Looking for "Encompassing Reads" in the output.

I have a question. Does GeneFuse report "Encompassing Reads" in the output? Thank you.

Note:
Encompassing reads are pairs of reads with each in the pair aligned to a different gene, thereby surrounding the fusion junctions or genomic breakpoints, and spanning reads are reads partially aligned to both genes corresponding to a fusion junction.

how to see the version of genefuse?

Hi, I got the genefuse by git clone on the command line, and make, finally, I got the binary genefuse, but I can't find the version of it. How to check which version of the software, thanks!

Multiple runs show different results.

Multiple runs show different unique read count.

vec<Match> is sorted differently a bit per run.

Some Matchs have the same read_break and seq.length().

This was found with hg19.fa, druggable.hg18.csv, test genefuse fastq files.

primary transcripts used in fusion.csv

Hi, Developers
Could you explain the exact meaing of "primary transcript" in fusion.csv.
Does it mean the longest transcript or pre-mature transcript ?

many thanks
zhengxc

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.