This project forked from vladsavelyev/gridss-purple-linx

Standalone GRIDSS/PURPLE/LINX pipeline

License: GNU General Public License v3.0

GRIDSS PURPLE LINX

The GRIDSS/PURPLE/LINX toolkit takes a pair of matched tumour/normal BAM files and performs somatic genomic rearrangement detection and classification.

  • GRIDSS: performs structural variant calling
  • PURPLE: performs allele specific copy number calling
  • LINX: performs event classification and visualisation

The simplest way to run the toolkit is via the docker image.

Reference data

The toolkit requires multiple reference data files. These have been packaged into a single file each for HG37 and HG38, which can be downloaded from the following locations:

Reference Genome    Download Location
GRCh37              https://resources.hartwigmedicalfoundation.nl/ then navigate to HMFTools-Resources/GRIDSS-Purple-Linx-Docker/gpl_ref_data_hg37.gz
GRCh38              https://resources.hartwigmedicalfoundation.nl/ then navigate to HMFTools-Resources/GRIDSS-Purple-Linx-Docker/gpl_ref_data_hg38.gz

Download the docker image

The docker image can be downloaded from Docker Hub with the latest tag. For alternative versions, see the tag list of the gridss/gridss-purple-linx repository on Docker Hub.

# Download the latest version of the docker images
docker pull gridss/gridss-purple-linx:latest

Building the docker image / release package

Building the docker image is not required. If you do want to build it yourself, the release packaging build script creates the release package and generates a docker image for that release:

./build.sh

Running the Docker Image Pipeline

The docker image assumes the following:

  • The reference data is mounted read/write in /refdata
  • The input/output directory is mounted read/write in /data

Run the docker image as follows:

docker run --ulimit nofile=100000:100000 \
	-v /path_to_ref_data/:/refdata \
	-v /path_to_sample_data/:/data/:Z \
	gridss/gridss-purple-linx:latest \
	-n /data/SAMPLE.sv.normal.bam \
	-t /data/SAMPLE.sv.tumor.bam \
	-s SAMPLE \
	--snvvcf /data/SAMPLE.somatic.vcf.gz \
	--ref_genome_version HG37

Providing a somatic point-mutation VCF can improve PURPLE's copy number fit for samples with low aneuploidy. The VCF must have the AD field populated. If no such VCF is available, use the --nosnvvcf argument instead.
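One way to sanity-check a VCF before passing it to --snvvcf is to look for an AD declaration in its header. The sketch below is only illustrative: vcf_declares_ad is a hypothetical helper, not part of the pipeline, and it confirms only that the header declares an AD FORMAT field, not that every record populates it.

```shell
# Sketch: succeed if a gzip-compressed VCF declares the AD FORMAT field.
# vcf_declares_ad is a hypothetical helper, not part of gridss-purple-linx.
vcf_declares_ad() {
    vcf="$1"
    # Meta-information lines start with '##'; stop at the first line that does not.
    gzip -dc "$vcf" | awk '
        /^##FORMAT=<ID=AD,/ { found = 1; exit }
        !/^##/              { exit }
        END                 { exit(found ? 0 : 1) }
    '
}
```

For example, `vcf_declares_ad /path_to_sample_data/SAMPLE.somatic.vcf.gz || echo "no AD field declared; consider --nosnvvcf"`.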

The ulimit increase is due to GRIDSS multi-threading using many file handles.
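When running outside docker, the same limit applies to the invoking shell. As a minimal sketch, the open-file limit of the current shell can be compared against a threshold before starting; nofile_limit_ok is a hypothetical helper, and using 100000 as the threshold is an assumption carried over from the docker example.

```shell
# Sketch: succeed if the current soft limit on open files is at least $1.
# nofile_limit_ok is a hypothetical helper, not part of gridss-purple-linx.
nofile_limit_ok() {
    required="$1"
    current=$(ulimit -n)
    # 'unlimited' always satisfies the requirement.
    [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]
}
```

For example, `nofile_limit_ok 100000 || echo "open-file limit is $(ulimit -n); consider raising it"`.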

Optional arguments

Argument               Description                                     Default
--output_dir           Output directory                                /data/
--threads              Number of threads to use                        number of cores available
--ref_genome_version   Either HG37 or HG38                             HG37
--jvmheap              Maximum java heap size for high-memory steps    25g
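
As a rough illustration of how the optional arguments combine with the required ones, the sketch below prints (rather than runs) a docker command with overridden values so it can be inspected first. gpl_docker_cmd is a hypothetical helper, the paths and SAMPLE are placeholders as elsewhere in this README, and placing --ulimit before the image name follows the usual docker run option ordering.

```shell
# Sketch: print a docker command line that overrides the optional arguments.
# gpl_docker_cmd is a hypothetical helper; paths and SAMPLE are placeholders.
# Usage: gpl_docker_cmd [threads] [jvmheap] [ref_genome_version]
gpl_docker_cmd() {
    threads="${1:-8}"
    heap="${2:-25g}"
    genome="${3:-HG37}"
    cat <<EOF
docker run --ulimit nofile=100000:100000 \\
	-v /path_to_ref_data/:/refdata \\
	-v /path_to_sample_data/:/data/:Z \\
	gridss/gridss-purple-linx:latest \\
	-n /data/SAMPLE.sv.normal.bam \\
	-t /data/SAMPLE.sv.tumor.bam \\
	-s SAMPLE \\
	--snvvcf /data/SAMPLE.somatic.vcf.gz \\
	--ref_genome_version $genome \\
	--threads $threads \\
	--jvmheap $heap
EOF
}
```

Calling `gpl_docker_cmd 4 20g HG38` prints a command that requests 4 threads, a 20g heap, and the HG38 reference data.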

Outputs

Outputs are located in subdirectories of --output_dir, one for each of the tools. Consult each tool's documentation for details of the output file formats.

Memory/CPU usage

With its default settings, the pipeline uses 25GB of memory and as many cores as are available for the multi-threaded stages (such as GRIDSS assembly and variant calling). These can be overridden using the --jvmheap and --threads arguments. A minimum of 14GB of memory is required, and at least 3GB per core should be allocated. The recommended settings are 8 threads and a 25GB heap size (actual memory usage will be slightly higher than the heap size).
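
The sizing rules above can be turned into a small calculation. This is a sketch only: suggest_threads is a hypothetical helper, and it assumes that the 14GB minimum and the 3GB-per-core guideline can both be applied to the heap size passed to --jvmheap.

```shell
# Sketch: suggest a --threads value for a given heap size (GB) and core count.
# suggest_threads is a hypothetical helper, not part of gridss-purple-linx.
suggest_threads() {
    heap_gb="$1"
    cores="$2"
    if [ "$heap_gb" -lt 14 ]; then
        echo "error: at least 14GB of memory is required" >&2
        return 1
    fi
    # Allow one thread per 3GB of memory, capped at the number of cores.
    by_memory=$(( heap_gb / 3 ))
    if [ "$by_memory" -lt "$cores" ]; then
        echo "$by_memory"
    else
        echo "$cores"
    fi
}
```

With a 25GB heap on a 16-core machine, `suggest_threads 25 16` prints 8, which matches the recommended settings.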

Reference Genomes

If the BAMs have been aligned to a different reference genome from the one provided in the Hartwig reference data, then either:

  • overwrite the reference genome files in /ref_data/refgenomes/ OR
  • realign the reads to the reference genome supplied in /ref_data/refgenomes/
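
To judge whether a BAM was aligned to a different reference, its contig names and lengths can be compared with the reference index. The sketch below assumes the BAM header has been saved as text (for example with `samtools view -H`) and that a .fai index exists for the reference FASTA; contigs_match is a hypothetical helper, and the file names in the usage note are placeholders.

```shell
# Sketch: compare @SQ lines of a SAM header against a FASTA .fai index.
# contigs_match is a hypothetical helper, not part of gridss-purple-linx.
# Usage: contigs_match header.sam reference.fa.fai
contigs_match() {
    header="$1"
    fai="$2"
    awk -F'\t' '
        # First file: .fai index, whose first two columns are name and length.
        NR == FNR { ref_len[$1] = $2; next }
        # Second file: SAM header; only @SQ lines describe contigs.
        $1 == "@SQ" {
            name = ""; len = ""
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^SN:/) name = substr($i, 4)
                if ($i ~ /^LN:/) len  = substr($i, 4)
            }
            if (!(name in ref_len) || ref_len[name] != len) {
                print "mismatch: " name
                bad = 1
            }
        }
        END { exit(bad ? 1 : 0) }
    ' "$fai" "$header"
}
```

For example, after `samtools view -H SAMPLE.sv.tumor.bam > header.sam`, running `contigs_match header.sam reference.fa.fai` lists any contig that is missing from the reference or has a different length, and returns a non-zero status if there is one.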

Running the Pipeline Directly

As an alternative to running the pipeline via the docker image, the following script can be called directly to execute each component in turn:

install_dir=~/
GRIDSS_VERSION=2.9.4
# AMBER_VERSION is not given here; set it to the installed AMBER release.
COBALT_VERSION=1.11
PURPLE_VERSION=2.51
LINX_VERSION=1.12
export GRIDSS_JAR=$install_dir/gridss/gridss-${GRIDSS_VERSION}-gridss-jar-with-dependencies.jar
# Fails with a message if AMBER_VERSION has not been set.
export AMBER_JAR=$install_dir/hmftools/amber-${AMBER_VERSION:?set AMBER_VERSION to the installed AMBER release}-jar-with-dependencies.jar
export COBALT_JAR=$install_dir/hmftools/count-bam-lines-${COBALT_VERSION}-jar-with-dependencies.jar
export PURPLE_JAR=$install_dir/hmftools/purity-ploidy-estimator-${PURPLE_VERSION}-jar-with-dependencies.jar
export LINX_JAR=$install_dir/hmftools/sv-linx-${LINX_VERSION}-jar-with-dependencies.jar

$install_dir/gridss-purple-linx/gridss-purple-linx.sh \
	-n /path_to_sample_data/SAMPLE.sv.normal.bam \
	-t /path_to_sample_data/SAMPLE.sv.tumor.bam \
	-s SAMPLE \
	--snvvcf /path_to_sample_data/SAMPLE.somatic.vcf.gz \
	--ref_dir ~/refdata \
	--install_dir $install_dir \
	--rundir ~/colo829_example

For the list of packages and tools required for the pipeline, see Dockerfile (https://github.com/hartwigmedical/gridss-purple-linx/blob/master/Dockerfile).
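
Before launching a direct run, it may help to confirm that each exported JAR path points at a real file. The following is a minimal sketch under that assumption; check_jars is a hypothetical helper, and it only tests that the files exist, not that their versions are compatible.

```shell
# Sketch: verify that each JAR environment variable names an existing file.
# check_jars is a hypothetical helper, not part of gridss-purple-linx.
check_jars() {
    missing=0
    for var in GRIDSS_JAR AMBER_JAR COBALT_JAR PURPLE_JAR LINX_JAR; do
        # Look up the value of the variable whose name is stored in $var.
        eval "path=\${$var:-}"
        if [ -z "$path" ] || [ ! -f "$path" ]; then
            echo "missing: $var (${path:-unset})" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_jars || exit 1` placed after the export lines stops the run early if any JAR is absent.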

Contributors

alexiswl, charlesshale, d-cameron, hkeward, kulinseth, p-priestley, vladsavelyev
