Coder Social home page Coder Social logo

sunqiangzai / fusionscan Goto Github PK

View Code? Open in Web Editor NEW

This project forked from iamlife/fusionscan

0.0 1.0 0.0 11.71 MB

a fast and accurate prediction tool for fusion genes from RNA-seq data

Home Page: http://biome.ewha.ac.kr/fusionscan/fusionscan.html

Python 81.62% Perl 18.14% Shell 0.24%

fusionscan's Introduction

======================================================================================================
Name		: FusionScan
Version		: 1.0
Last Updated	: 2014-10-25
Description	: FusionScan is	a accurate prediction tool for fusion genes from RNA-seq data. 
		 FusionScan has better	precision and recall than other	tools.
		 Please see http://fusionscan.ewha.ac.kr/ for FusionScan manual and documentation.
======================================================================================================

DOWNLOAD
	GETTING	THE SOFTWARE AND INDEX	FILES
	: You can get the recent version of fusionscan from https://github.com/iamlife/FusionScan or under link.

	FILE NAME			DESCRIPTION					RECOMENDED LOCATION
	fusionscan			fusionscan src files				your wanted location
											tar xvfz fusionscan.tar.gz
											cd fusionscan/
	hg19.2bit.gz			hg19.2bit file to run gfServer(blat)		/fusionscan/src_dir/
	nib_files_hg19.tar.gz		hg19 nib files to use nibFrag			/fusionscan/src_dir/
	bowtie2_index_hg19.tar.gz	bowtie2	index files for	hg19 & hg19.fa		/fusionscan/src_dir/
	bowtie2_index_refMrna.tar.gz	bowtie2	index files for	refMrna	& refMrna.fa	/fusionscan/src_dir/
	ssaha2_index_hg19.tar.gz	ssaha2 index files for hg19			/fusionscan/src_dir/
	gene_dir.tar.gz			gene annotation	files				/fusionscan/
	rmsk_dir.tar.gz			repeatMasker files				/fusionscan/


REQUIREMENT

	ENVIROMENT TO RUN THE SOFTWARE
	- JDK1.7 or Jre1.7, Python
	: FusionScan's main process is developed with JAVA language and	whole-process is in Python script. 


	REQUIRED INDEX AND SEQUENCE FILES
	- bowtie2 index	files for hg19 & hg19.fa
	- bowtie2 index	files for refMrna & refMrna.fa
	- ssaha2 index files for hg19


	REQUIRED PROGRAMS (go to SETTING)
	- bowtie2: to obtain unmapped reads
	- SSAHA2: to align reads to genome
	- BLAT (gfServer, gfClient): to	confirm	final candidates
	- nibFrag: to make pseudo fusion transcript
	- bl2seq (blast): to search sequence homology
	- fastq_quality_trimmer, fastx_collapser, fastx_artifacts_filter (FASTX-Toolkit): to do	pre-processes


HOW TO RUN

INSTALL

  $ tar -xvzf fusionscan.tar.gz
  $ cd fusionscan/

  *** each directory	includes ***

  - fusionscan/ : 
		- Run_FusionScan.py					: to run whole process

  - fusionscan/src_dir/: contains various programs which are	used for 
			 sequence alignment,	sequence comparison, getting sequences from genome.
			
		- FusionScan.jar: FusionScan jar file
	
		* Under	files should be	downloaded in src_dir/
		
		- hg19.2bit: to	use gfServer
		- nib_files_hg19/* : to	use nibFrag
		- bowtie2_index_hg19/*:	to use BOWTIE2
		- bowtie2_index_refMrna/*: to use BOWTIE2
		- ssaha2_index_hg19/*: to use SSAHA2
		
		* Under	binary src files which are proper to your system should	be copied in src_dir/ 
		 in case of linux_ix86_64, all	binaries are included in fusionscan/src_dir/	
		
		- nibFrag: from	BLAT
		- gfClient: from BLAT
		- gfServer: from BLAT
		- bl2seq: from BLAST
		- bowtie2: from	BOWTIE2
		- bowtie2-align: from BOWTIE2
		- bowtie2-align-debug: from BOWTIE2
		- fastq_quality_trimmer: from FASTX-Toolkit
		- fastx_collapser: from	FASTX-Toolkit
		- fastx_artifacts_filter: from FASTX-Toolkit
		- ssaha2: from SSAHA2
		- ssaha2Build: from SSAHA2

	- fusionscan/gene_dir/	: contains various gene	annotation files.
				 These	files can be downloaded	from here and extract it in fusionscan/	

		- duplicated_gene_database_dgd_Hsa_all_v68.tsv:	duplicated gene	database file
		- pseudogeneHUGO.txt: HGUO file's pseudogene part info file
		- refGene_and_indexes.txt: indexed refGene file
		- refGene.txt: raw refGene file

	- fusionscan/rmsk_dir/	: contains simple formatted repeatMasker data.
				 These	files can be downloaded	from here and extract it in fusionscan/


SETTING
	0. download index files and fasta files for hg19 and refMrna from here and extract it in fusionscan/

	*** FusionScan needs bowtie2, SSAHA2, bl2seq and FASTX-Toolkit binary files.

	1. Bowtie2 :	 http://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.1.0/
	Copy bowtie2's executable files (bowtie2, bowtie2-align,	bowtie2-align-debug) 
	proper to your linux system to fusionscan/src_dir/
	$ cp bowtie2 fusionscan/src_dir/
	$ cp bowtie2-align fusionscan/src_dir/
	$ cp bowtie2-align-debug	fusionscan/src_dir/

	2. SSAHA2 : http://www.sanger.ac.uk/resources/software/ssaha2/
	Copy SSAHA2's executalbe	files (ssaha2, ssaha2Build) 
	proper to your linux system to fusionscan/src_dir/
	$ cp ssaha2 fusionscan/src_dir/
	$ cp ssaha2Build	fusionscan/src_dir/

	3. FASTX-Toolkit : http://hannonlab.cshl.edu/fastx_toolkit/download.html
	Copy FASTX-Toolkit's executable file (fastq_quality_trimmer, fastx_collapser, fastx_artifacts_filter) 
	proper to your linux system to fusionscan/src_dir/
	$ cp fastq_quality_trimmer fusionscan/src_dir/
	$ cp fastx_collapser fusionscan/src_dir/
	$ cp fastx_artifacts_filter fusionscan/src_dir/
 
	4. blast (bl2seq) : ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/ 
	Copy blast's executable file (bl2seq) 
	proper to your linux system to fusionscan/src_dir/
	$ cp bl2seq fusionscan/src_dir/

	5. blat : http://hgdownload.cse.ucsc.edu/admin/exe/
	Copy blat's executable file (gfClient, gfServer,	nibFrag) 
	proper to your linux system to fusionscan/src_dir/
	$ cp gfClient fusionscan/src_dir/
	$ cp gfServer fusionscan/src_dir/
	$ cp nibFrag fusionscan/src_dir/

	*** cf1.	all index files	for two	alignment program are included in fusionscan/src_dir/)
	*** cf2.	in case	of linux_ix86_64 : all binaries	are included in	fusionscan/src_dir/


COMMANDS
	
	Usage:
	python Run_FusionScan.py <reads1[,reads2]> <output-dir>	<read-length> <gfserver	host:port> [options]

	** <gfserver host:port>	is need	to use blat (gfclient) at the last step.

	Options:
	--phred33 : qualities are Phred+33 [ default]
	--phred64 : qualities are Phred+64
	-P/--threads : number of threads <int> [ default: 1]
	-ms/--min-seed : minimum number of seed reads <int> [ default: 2]
	-md/--min-distance : minimum distance between two genes <int> [ default: 100000 ]
	-kfgs/--known-fusion-gene-search : searching known kinase contained fusion genes

	Example:
		python	Run_FusionScan.py Test.fa Test 75 00.00.00.00:8100 --phred33 -P 1 -ms 2


OUTPUTS

	1. FUSION INFORMATION FILE
		fusionscan/project_name/all_candidates_for_project_name_seed2.txt 
		-------------------------------------------------------
		| column         | explanation             | example  |
		--------------------------------------------------------
		| Hgene          | head gene symbol        | EML4     |
		| Hchr           | head gene's chromosome  | chr2     |
		| Hstrand        | head gene's strnad      | +        |
		| Hbp            | head gene's break point | 42522656 |
		| Tgene          | tail gene symbol        | ALK      |
		| Tchr           | tail gene's chromosome  | chr2     | 
		| Tstrand        | tail gene's strand      | -        |
		| Tbp            | tail gene's break point | 29446394 |
		| Hg-Tg          | head-tail gene pair     | EML4-ALK |
		| # of seed read | seed read count         | 13       |
		| BPseq          | 60bp sequence before & after of exon junction break point | TAGAGCCCACACCTGGGAAAGGACCTAAAGtgtaccgccggaagcaccaggagctgcaag |
		-------------------------------------------------------

	2. READ ALIGNMENT FILE FOR EACH FUSION CANDIDATE
		fusionscan/project_name/alignment_dir/EML4_ALK.RNAalign

	3. CREATED DIRECTORIES
		- fusionscan/project_name/: is created when you run Run_FusionScan.py.
					 under all directories and files are created in project_name directory.

		- alignment_dir/ (*.RNAalign*): contains the alignment files for fusion candidate reads, 
						 these are created when the total analysis is end.

		- fa_dir/: contains the splited fasta files.

		- pslx_dir/: contains the pslx formated alignment files.

		- log_dir/: contains the log files.

		- all_candidates_for_project_name_seed1.txt: result report for all candidates.

		- all_candidates_for_project_name_seed2.txt: result report for candidates having more than 2 seed reads.

		- cout_per_each_step_project_name: statistics for each step.

		- log_project_name: logs for each reads during running FusionScan.jar


	4. COVERAGE PLOT FOR FUSON GENE

		Usage: python make_cvg_plot.py <FusionScan_result_file> <RNA_seq_bamfile>

		** FusionScan's result as same as upper format , especially cases seed >= 2
		** RNA-seq has to be aligned on hg19. This should be input by user.
======================================================================================================

fusionscan's People

Contributors

iamlife avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.