Coder Social home page Coder Social logo

jh2663 / sv-callers Goto Github PK

View Code? Open in Web Editor NEW

This project forked from googlingthecancergenome/sv-callers

0.0 1.0 0.0 215.43 MB

Snakemake-based workflow for detecting structural variants in WGS data

Home Page: https://research-software.nl/software/sv-callers

License: Apache License 2.0

Python 93.55% Shell 6.45%

sv-callers's Introduction

sv-callers

DOI Published in PeerJ Build Status Codacy Badge

Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. sv-callers is a Snakemake-based workflow that combines several state-of-the-art tools for detecting SVs in whole genome sequencing (WGS) data. The workflow is easy to use and deploy on any Linux-based machine. In particular, the workflow supports automated software deployment, easy configuration and addition of new analysis tools as well as enables to scale from a single computer to different HPC clusters with minimal effort.

Dependencies

  • Python 3
  • Conda - package/environment management system
  • Snakemake - workflow management system
  • Xenon CLI - command-line interface to compute and storage resources
  • jq - command-line JSON processor (optional)

The workflow includes the following bioinformatics tools:

1. Clone this repo.

git clone https://github.com/GooglingTheCancerGenome/sv-callers.git
cd sv-callers

2. Install dependencies.

# download Miniconda3 installer
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
# install Conda (respond by 'yes')
bash miniconda.sh
# update Conda
conda update -y conda
# create & activate new env with installed deps
conda env create -n wf -f environment.yaml
conda activate wf
cd snakemake

3. Configure the workflow.

  • config files:

    • analysis.yaml - analysis-specific settings (e.g., workflow mode, I/O files, SV callers, post-processing or resources used etc.)
    • environment.yaml - software dependencies and versions
  • input files:

    • example data in sv-callers/snakemake/data directory
    • reference genome in .fasta (incl. index files)
    • excluded regions in .bed (optional)
    • WGS samples in .bam (incl. index files)
    • list of (paired) samples in samples.csv
  • output files:

    • (filtered) SVs per caller and merged calls in .vcf (incl. index files)

4. Execute the workflow.

# 'dry' run only checks I/O files
snakemake -np

# 'vanilla' run (default) mimics the execution of SV callers by writing (dummy) VCF files
snakemake -C echo_run=1

Note: One sample or a tumor/normal pair generates eight SV calling jobs (i.e., 1 x Manta, 1 x LUMPY, 1 x GRIDSS and 5 x DELLY) and six post-processing jobs. See the workflow instance of single-sample (germline) or paired-sample (somatic) analysis.

Submit jobs to Slurm or GridEngine cluster

SCH=slurm   # or gridengine
snakemake -C echo_run=1 mode=p enable_callers="['manta','delly','lumpy','gridss']" --use-conda --latency-wait 30 --jobs 14 \
--cluster "xenon scheduler $SCH --location local:// submit --name smk.{rule} --inherit-env --cores-per-task {threads} --max-run-time 1 --max-memory {resources.mem_mb} --working-directory . --stderr stderr-%j.log --stdout stdout-%j.log" &>smk.log&

To perform SV calling:

  • overwrite (default) parameters directly in analysis.yaml or via the snakemake CLI (use the -C argument)

    • set echo_run=0
    • choose between two workflow modes: single- (s) or paired-sample (p - default)
    • select one or more callers using enable_callers (default all: "['manta','delly,'lumpy','gridss']")
  • use xenon CLI to set:

    • --max-run-time of workflow jobs (in minutes)
    • --temp-space (optional, in MB)
  • adjust compute requirements per SV caller according to the system used:

    • the number of threads,
    • the amount of memory(in MB),
    • the amount of temporary disk space or tmpspace (path in TMPDIR env variable) can be used for intermediate files by LUMPY and GRIDSS only.

Query job accounting information

SCH=slurm   # or gridengine
xenon --json scheduler $SCH --location local:// list --identifier [jobID] | jq ...

sv-callers's People

Contributors

arnikz avatar cshneider-zz avatar sverhoeven avatar codacy-badger avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.