Coder Social home page Coder Social logo

hrdetect-pipeline's Introduction

HRDetect

This implementation of the HRDetect pipeline is intended for RESEARCH USE ONLY and NOT FOR THE PURPOSE OF INFORMING CLINICAL DECISION-MAKING.

This repository contains an implementation of HRDetect used by Eric Y. Zhao at the Genome Sciences Centre.

HRDetect is run via a Snakemake pipeline. It depends upon a working installation of R, with numerous dependencies. These may be installed by running make dependencies, which builds a miniconda environment with the necessary installations. To source that environment after it is installed, simply run source dependencies/miniconda3/bin/activate dependencies prior to running the pipeline.

Depending on your system and needs, some of these dependencies may require tweaking. Notably, the dependency installer assumes a linux-based environment.

Also necessary are two in-house tools called SignIT and hrdtools, which can be acquired by running make.

> make

if [ -d git/hrdtools ]; \
    then(cd git/hrdtools && git pull); \
    else git clone [email protected]:eyzhao/hrdtools.git git/hrdtools; \
    fi
Cloning into 'git/hrdtools'...
remote: Counting objects: 97, done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 97 (delta 1), reused 0 (delta 0), pack-reused 91
Receiving objects: 100% (97/97), 163.64 KiB | 821.00 KiB/s, done.
Resolving deltas: 100% (16/16), done.
if [ -d git/SignIT ]; \
    then(cd git/SignIT && git pull); \
    else git clone [email protected]:eyzhao/SignIT.git git/SignIT; \
    fi
Cloning into 'git/SignIT'...
remote: Counting objects: 394, done.
remote: Compressing objects: 100% (113/113), done.
remote: Total 394 (delta 50), reused 58 (delta 22), pack-reused 259
Receiving objects: 100% (394/394), 1.11 MiB | 1.44 MiB/s, done.
Resolving deltas: 100% (153/153), done.

Adding a project-specific pipeline

HRDetect expects data files of specific types and formats at specific locations. All files are housed at data/{project}/{subproject}/patients/{patient}/{sample}/, where {patient} and {sample} together form a unique ID pair for a given sample. Within each such directory, there must be four files specific to the sample.

  1. segments.tsv
  2. somatic_indels.vcf
  3. somatic_snvs.vcf
  4. somatic_sv.tsv

If you would like to use the Snakemake pipeline as is, then you can provide a project-specific file under the projects directory. Some projects files are already there as an example. You can then link to the project by adding a line include: "project/myproject.smk" in Snakefile.

If you would like to construct your own pipeline structure, please feel free to use the scripts in the scripts folder as needed.

Example pipeline

For those interested in constructing their own working pipeline using Snakemake, there is an example project named example within this repository to demonstrate how this could work.

  • An example of raw input data can be found in examples/example.
  • This example data is copied and/or processed into the appropriate format, which is then stored in data/example. The Snakemake pipeline commands that pre-process the raw data are found in projects/example.smk.
  • The expected output after running the full pipeline can be found in output/example
  • The Snakefile is currently set up to run this example data. If you would like to run an integration test yourself, try deleting part or all of output/example and data/example and running snakemake -p (adding -p doesn't change how Snakemake runs, but will show you the commands being issued in each step).

Note that you may still run into errors/issues depending on your specific environment setup. As this pipeline has not yet been tested across a wide variety of environments, we are continuing to work out the kinks.

segments.tsv

This is a file with segmented CNV/LOH calls with at least 5 columns.

  1. chr: The chromosome name
  2. start: Start position of CNV/LOH call
  3. end: End position of CNV/LOH call
  4. copy_number: The tumour copy number of the segment
  5. lohtype: The type of LOH state. Should be amongst the following:
    • ASCNA: Allele-specific copy number amplification
    • BCNA: Balanced copy number amplification
    • HET: Heterozygous (normal)
    • NLOH: Neutral LOH (loss of heterozygosity, but 2 copies present)
    • DLOH: Deletion LOH
    • ALOH: Amplification LOH

somatic_indels.vcf

A VCF file containing indels which can be parsed by R using the readVCF() function of VariantAnnotation.

somatic_snvs.vcf

A VCF file containing SNVs which can be parsed by R using the readVCF() function of VariantAnnotation.

somatic_sv.tsv

A tab-delimited file with structural variant data. Should contain the following columns:

  1. chr1: Name of the first chromosome involved
  2. pos1: Coordinate of the SV breakpoint corresponding to the first chromosome
  3. chr2: Name of the second chromosome involved
  4. pos2: Coordinate of the SV breakpoint corresponding to the second chromosome
  5. type: Can take on values DEL, DUP, TRA, or INV. Unless the value is TRA, the two chromosomes should be the same.

Citation

If you use HRDtools or this implementation of HRDetect in your publication, please cite the following study:

Zhao EY, Shen Y, Pleasance E, ... Jones SJM, Homologous Recombination Deficiency and Platinum-Based Therapy Outcomes in Advanced Breast Cancer. Clinical Cancer Research. 2017 Dec 15;23(24):7521-7530

hrdetect-pipeline's People

Contributors

eyzhao avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.