Coder Social home page Coder Social logo

genefuse's Issues

gen_fusion_file.jl

Hi,

the same issue with gen_fusion_file.jl for csv file preparation while using Julia 0.6 version.

ERROR: LoadError: UndefVarError: readstring not defined in detect at /home/ubuntu/julia/gen_fusion_file.jl:33 in main at /home/ubuntu/julia/gen_fusion_file.jl:22 while loading /home/ubuntu/julia/gen_fusion_file.jl, in expression starting on line 119

Could you please generate result file .

My genes are attached.
genes (2).txt

Many thanks for your responce

gen_fusion_file.jl is out of date

Hi,

I notice that the scripts/gen_fusion_file.jl is out of date and can't be used anymore.
So I wrote a simple python script to replace its function, and I would like to share it with others. However any pushes to this repo is banned, would you mind to add this script manually?

"""
Generate a user-defined fusion file 

Input file should be a text file containing 
1. gene name and 
2. transcript name (optional)

For example:
gene1   transcript1
gene2
gene3   transcript3
"""
__author__ = "Kai"
__date__ = "20/05/2020"

import argparse


def read_genelist(inputf):
    with open(inputf, "r") as fh:
        for line in fh:
            yield line.rstrip("\n").split()


def make_fusion_gene(gene, fw, refflat):
    # no transcript specified --> use the longest transcript
    if len(gene) == 1:
        transcripts = {}
        with open(refflat, "r") as fh:
            for line in fh:
                if gene[0] not in line:
                    continue
                _, transcript, chrom, _, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
                transcripts[transcript] = (chrom, start, end, exonstart, exonend)
        transcript = get_longest_transcript(transcripts.keys(), refflat)
        chrom, start, end, exonstart, exonend  = transcripts[transcript]

    # use user-specified transcript
    elif len(gene) == 2:
        with open(refflat, "r") as fh:
            for line in fh:
                if gene[1] not in line:
                    continue
                _, transcript, chrom, _, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
                break
    
    # write to a file
    header = f">{gene[0]}_{transcript},{chrom}:{start}-{end}\n"
    fw.write(header)
    exons = list(zip(exonstart.split(","), exonend.split(",")))[:-1]
    for index, each_exon in enumerate(exons, start=1):
        fw.write(f'{index},{each_exon[0]},{each_exon[1]}\n')
    fw.write("\n")


def get_longest_transcript(transcripts, refflat):
    longest_length = 0
    longest_transcript = ""
    with open(refflat, "r") as fh:
        for line in fh:
            line = line.strip().split()
            if line[1] in transcripts:
                length = int(line[5]) - int(line[4])
                if length > longest_length:
                    longest_length = length
                    longest_transcript = line[1]
    return longest_transcript


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument("input", help="Input filename")
    parser.add_argument("-r", "--refflat", required=True, help="Path to the refFlat.txt file, need to be downloaded from UCSC in advance")
    parser.add_argument("-o", "--output", required=True, help="The output filename")
    args = parser.parse_args()
    
    with open(args.output, "w") as fw:
        for gene in read_genelist(args.input):
            make_fusion_gene(gene, fw, args.refflat)

How to get VAF info from genefuse results?

Hi there:
#Fusion: EML4_ENST00000318522.5:intron:18|-2:42544153___ALK_ENST00000389048.3:intron:19|+2:29446769 (total: 155, unique:13)
155 refers to the number of supporting reads and 13 is the unique reads. Can we calculate VAF by 13/155*100 = 8.4% ?
Or where else can we get VAF info from the result file? Thanks!

primary transcripts used in fusion.csv

Hi, Developers
Could you explain the exact meaing of "primary transcript" in fusion.csv.
Does it mean the longest transcript or pre-mature transcript ?

many thanks
zhengxc

What is the minimum required read length?

I conducted a simulation. When my read length is 89bp, it outputs expected fusions. However when I use read length of 85bp. There are no results.

The main change is the LOG information: "sequence number before filtering: XXX". Shorter read length result in very few (or even zero) sequence number before filtering.

How do I adjust the source code to satisfy more shorter reads?

the data size of raw data effected fusion result

Hi,I cut off my raw data fastq to 300M(1000X),and use gene fuse to detect. The result was negative, no fusion detected. Using genefuse to detect 2G of my raw data , genefuse gets a lot of fusion results. Maybe the size of data effected genefuse.

tmprss-erg

>TMPRSS2_ENST00000332149.9,chr21:41464551-41531116
1,41508081,41508159
2,41498119,41498189
3,41494356,41494578
4,41489507,41489593
5,41488394,41488513
6,41480476,41480602
7,41479172,41479282
8,41476577,41476620
9,41473325,41473496
10,41471806,41471981
11,41470648,41470743
12,41468396,41468538
13,41467734,41467886
14,41464551,41466153
>ERG_ENST00000288319.11,chr21:38380027-38661780
1,38498363,38498483
2,38445404,38445621
3,38423410,38423561
4,38403506,38403709
5,38402557,38402637
6,38400574,38400645
7,38392376,38392444
8,38391659,38391715
9,38390995,38391042
10,38380027,38383923

No fusion found using tests data

Hi Shifu,
no fusion found using tests data
./genefuse -r Homo_sapiens_assembly19.fasta -f druggable.hg19.csv -1 R1.fq.gz -2 R2.fq.gz -h r1r2n.html
15:51:11 start with 4 threads
15:51:50 mapper indexing done
15:52:20 sequence number before filtering: 0
15:52:20 removeByComplexity: 0
15:52:20 removeByDistance: 0
15:52:20 removeIndels: 0
15:54:5 matcher indexing done
15:54:5 removeAlignables: 0
15:54:5 found 0 fusions

./genefuse -r Homo_sapiens_assembly19.fasta -f druggable.hg19.csv -1 genefuse.R1.fq.gz -2 genefuse.R2.fq.gz -h genefuser1r2n.html
15:55:45 start with 4 threads
15:56:25 mapper indexing done
15:56:36 sequence number before filtering: 0
15:56:36 removeByComplexity: 0
15:56:36 removeByDistance: 0
15:56:36 removeIndels: 0
15:58:25 matcher indexing done
15:58:25 removeAlignables: 0
15:58:25 found 0 fusions

Dataset was downloaded from: http://opengene.org/dataset.html

Thanks.

error when creating fusion.csv file

Hi,

I am trying to create the fusion.csv file using the script that you provide gen_fusion_file.jl but I get the error:

loading gencode grch38
ERROR: LoadError: KeyError: key "localfile" not found

And this is what I run:
julia scripts/gen_fusion_file.jl -r GRch38 -g genes_for_csv.txt -f fusion.csv

Thanks a lot,

Erica

How to use GeneFuse to detect fusion for nanopore sequencing fastq?

Hi sfchen,
I have installed genefuse and worked for the test data successfully. However, it did not work for nanopore fastq. Could you help me? Thank you!
My command line as follow:
genefuse -r ~/database/genome/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa -f cosmic-fusion-list/cancer.GRCh38.csv -t 8 -1 ***.fastq -h ***.fusion.html > ***.log

make: *** No rule to make target 'install'. Stop.

Need help to compile from source code.
Got "make: *** No rule to make target 'install'. Stop."
when sudo make install, in ubuntu 16.04 LTS

Running binary without any error but found no fusions when run with dataset for testing.
Is there any library must have before installation, Thanks.

可否用gtf文件产生融合基因列表的csv文件?

如题,大概看了一下制作融合基因列表的jl脚本和文件。脚本仅仅支持hg19或hg38。我理解OpenGene的julia包里这俩人类基因组的外显子信息应该是有来源的,例如gtf文件。那么,我考虑可否通过gtf文件直接来制造候选的融合基因列表呢?

how to see the version of genefuse?

Hi, I got the genefuse by git clone on the command line, and make, finally, I got the binary genefuse, but I can't find the version of it. How to check which version of the software, thanks!

IGH-BCL2 is not detected but bam is supported

I have a sample with BCL2-IGH verified by FISH. There are lots of reads to support viewed by IGV. the pictuers is as follows. I am running with default parameters,and BCL2,IGH is included in csv file

BCL2_NM_000633,chr18:60790579-60987361
1,60986406,60986613
2,60985314,60986185
3,60790578,60795992

IGH,chr14:106052774-107288051

I would like to know whether or not relevant parameters need to be adjusted. Thanks a lot

Support cross pair

Read1

@name1
AGGGTTACCTGAGGATCGAATGAATTGAAATGTGTAAATTGCCGAGCACGTAGTAACCATGCAACAAGTGTTAGCTCCTATTATCCTGTCCCTTTGAGGGATGGCACCATATGGGGACACAGTGTGTGCTGCCATCTCCCTTCTACCGGC
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

Read2

@name1
TGGGCCCTCCCTCCCAAGCACTGGAGGAGGTACTCGTTCTGTGGGCCGGGGCCCCTCCCTCCTGAGCACTGGAGGAGGCACTTGTTCTGTGGGCCACGGCCCCTCCCTCCCCAGCACTGGAGGAGGCACTGGTTGTGTGGGCCCTGGCTC
+
EEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE

Multiple runs show different results.

Multiple runs show different unique read count.

vec<Match> is sorted differently a bit per run.

Some Matchs have the same read_break and seq.length().

This was found with hg19.fa, druggable.hg18.csv, test genefuse fastq files.

GeneFuse on RNA-seq

Dear sir,
I tried to apply genefuse to our RNA-seq with known fusion gene.
Somehow we couldn't find any fusion gene.
command I used:
genefuse -r hg19.fa -f druggable.hg19.csv -1 A2076_FFPE_RNA_S1_L001_R1_001.fastq.gz" -2 A2076_FFPE_RNA_S1_L001_R2_001.fastq.gz" -h A2076_FFPE_RNA.html > A2076_FFPE_RNA.result
14:39:27 start with 4 threads
14:39:51 mapper indexing done
14:39:55 sequence number before filtering: 45
14:39:55 removeByComplexity: 0
14:39:55 removeByDistance: 22
14:39:55 removeIndels: 2
14:41:14 matcher indexing done
14:41:14 removeAlignables: 3
14:41:14 found 0 fusions
14:41:16 done
What your suggestion to apply genefuse to RNA-seq ?

Can not get the fusions with the testdata

Can not detect the fusions with the test data,is anyone can help me? all the testdata is from the git and
http://opengene.org/dataset.html

Here is my shell command:
./genefuse -r Homo_sapiens_assembly19.fasta -f genes/cancer.hg19.csv -1 ./testdata/genefuse.R1.fq.gz -2 ./testdata/genefuse.R2.fq.gz -h report.html > result
12:17:45 start with 4 threads
12:18:1 mapper indexing done
12:18:6 sequence number before filtering: 0
12:18:6 removeByComplexity: 0
12:18:6 removeByDistance: 0
12:18:6 removeIndels: 0
12:18:53 matcher indexing done
12:18:53 removeAlignables: 0
12:18:53 found 0 fusions
12:18:53 done

Fusion-making .py script does not take into account gene orientation

Thanks for creating this tool! I have noticed that the scripts/make_fusion_genes.py reports wrong order of exons for genes in '-' orientation, which results in considering some fusions as untranscribed and showing wrong fusion structure.

E.g. NTRK3 in cancer.hg38.csv:

NTRK3_ENST00000394480.6,chr15:87859751-88256768
1,88256644,88256747
2,88256284,88256486
3,88255906,88256168
4,88184225,88184299
...

Versus NTRK3 from the fusions.csv file generated by the script:

NTRK3_NM_002530,chr15:87859750-88256739
1,87859750,87877120
2,87880269,87880428
3,87929190,87929434
4,87933011,87933184
...

Here is a quick fix I introduced to make_fusion_genes.py:

Line 33:
_, transcript, chrom, strand, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
Line 44:
_, transcript, chrom, strand, start, end, _, _, _, exonstart, exonend = line.rstrip("\n").split("\t")
And insert these after line 50:
if strand == '-':
exons = exons[::-1]

This returns the exons in the right order.

while running Stuck

hi,
Sometimes it gets stuck and does not run. The final state is as follows. and the task is suspended. Why is it? Please help me, thanks!
23:11:23 start with 4 threads
23:11:27 mapper indexing done

fusion only option

I got a lot of deletions when I ran our real data using illumina targeted RNA fusion panel.
This panel is sequenced by 2x75 bp. Would you add an option for fusion only?

different gene set gives different results for same fusion

Hi thank-you for a great tool. I ran genefuse on my WES data from a sample with a known ETV6--NTRK3 fusion. I ran with druggable.hg38.csv and cancer.hg38.csv which both have ETV6 and NTRK3 gene, the exons and coordinates are identical for these 2 genes in the 2 .csv files. But I only detect the ETV6--NTRK3 fusion if I use the smaller druggable.hg38.csv gene set, and not if I use cancer.hg38.csv. I was wondering if having more genes in cancer.hg38.csv potentially means reads with better alignment to other genes are aligning to those genes rather than ETV6 or NTRK3?

JSON output corrupted due to a not escaped double-quotes character

I'm using genefuse JSON output to handle fusions information in order to prioritize them, so I'm using a Python script to read the JSON file. Sometimes the JSON output is corrupted because in the "qual" value of each read it could be used the double-quotes symbol. For example:
"reads":[ { "break":33, "strand":"reversed", "seq":"TTTTTTATAGGATTTGGGAAGGTAATGGAAAATTCCAGTCAAAGGGGGTTGTTCTCTGGTGGGCAGGGGCGGGGGTCACAAGGTGCTCAGTGGGGGAG", "qual":"9.B>?<5:=B94ACABA;BA@79B>B?6@A?@BB"@=<@:A=9@ABB>BB>B>?B.@A6BB?@>?@@B?7AAB>=<B=AA@A7B@:?<A9AA@@?=??" },

When genefuse writes the JSON file the double-quotes is not escaped, so trying to read the JSON file cause an error.
At the moment I'm avoiding the problem using a regex pattern to manipulate the JSON file and make it correct. I hope there's a way to update the construction of the JSON output in order to add the escape slash before double-quotes in "qual" value.

std::bad_alloc

Hi:
I got a problem:
19:31:16 start with 12 threads
19:34:24 mapper indexing done
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

I am wondering what caused this problem?

No fusion found in tests data

amar@amar-Virtual-Machine:~/GeneFuse$ ./genefuse -1 testdata/R1.fq -2 testdata/R2.fq -r Homo_sapiens_assembly19.fasta -f genes/cancer.hg19.csv
22:53:5 start with 4 threads
22:53:9 mapper indexing done
22:53:9 sequence number before filtering: 0
22:53:9 removeByComplexity: 0
22:53:9 removeByDistance: 0
22:53:9 removeIndels: 0
22:53:25 matcher indexing done
22:53:25 removeAlignables: 0
22:53:25 found 0 fusions

./genefuse -1 testdata/R1.fq -2 testdata/R2.fq -r Homo_sapiens_assembly19.fasta -f genes/cancer.hg19.csv

genefuse v0.3.0, time used: 20 seconds

22:53:25 done
No fusion gene found in test data.Kindly reply....

the gen_fusion_file.jl script does not work because OpenGene can not be installed

Hi !

I would like to run the julia script on a list of genes to get the file fusion.csv but I have not managed to install OpenGene, here is the error returned


julia> Pkg.add("OpenGene")
   Cloning default registries into /root/.julia/registries
   Cloning registry General from "https://github.com/JuliaRegistries/General.git"
  Updating registry at `~/.julia/registries/General`
  Updating git-repo `https://github.com/JuliaRegistries/General.git`
 Resolving package versions...
ERROR: Unsatisfiable requirements detected for package OpenGene [63bcc5cb]:
 OpenGene [63bcc5cb] log:
 ├─possible versions are: 0.1.0-0.1.11 or uninstalled
 ├─restricted to versions * by an explicit requirement, leaving only versions 0.1.0-0.1.11
 └─restricted by julia compatibility requirements to versions: uninstalled — no versions left
Stacktrace:
 [1] #propagate_constraints!#61(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:1005
 [2] propagate_constraints! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:946 [inlined]
 [3] #simplify_graph!#121(::Bool, ::Function, ::Pkg.GraphType.Graph, ::Set{Int64}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:1460
 [4] simplify_graph! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/GraphType.jl:1460 [inlined]
 [5] macro expansion at ./logging.jl:319 [inlined]
 [6] resolve_versions!(::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/Operations.jl:338
 [7] #add_or_develop#58(::Array{Base.UUID,1}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/Operations.jl:1164
 [8] #add_or_develop at ./none:0 [inlined]
 [9] #add_or_develop#13(::Symbol, ::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:64
 [10] #add_or_develop at ./none:0 [inlined]
 [11] #add_or_develop#12 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:29 [inlined]
 [12] #add_or_develop at ./none:0 [inlined]
 [13] #add_or_develop#11 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:28 [inlined]
 [14] #add_or_develop at ./none:0 [inlined]
 [15] #add_or_develop#10 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:27 [inlined]
 [16] #add_or_develop at ./none:0 [inlined]
 [17] #add#18 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:69 [inlined]
 [18] add(::String) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.0/Pkg/src/API.jl:69 
 [19] top-level scope at none:0

here my geneList (iI working with hg19), I would be happy if you can help me ()

CAMTA1
CCNB3
CIC
EPC1
EWSR1
FOXO1
FUS
GLI1
HMGA2
JAZF1
MEAF6
MKL2
NCOA2
NTRK3
PDGFB
PLAG1
ROS1
SS18
STAT6
TAF15
TCF12
TFE3
TFG
USP6
YWHAE

Thank you

GeneFuse ignores fusions in TCF3

Hello,
Thanks for developing this tool! I was testing it on some hematological samples with TCF3 gene fusions (confirmed by FISH), and it seems like GeneFuse is not picking them up. I have checked that the gene is correctly reflected in the reference, and the reads connecting TCF3 and its partners are uniquely mapped to its location in the corresponding BAM files. I have two samples with different fusions, and in both of them, the fusion was not predicted even as untranscribed. Can you suggest what else can I try to fix this problem?
Thanks in advance!
Cheers,
Nadezda

Looking for "Encompassing Reads" in the output.

I have a question. Does GeneFuse report "Encompassing Reads" in the output? Thank you.

Note:
Encompassing reads are pairs of reads with each in the pair aligned to a different gene, thereby surrounding the fusion junctions or genomic breakpoints, and spanning reads are reads partially aligned to both genes corresponding to a fusion junction.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.