wf-cnv

This repository contains a Nextflow workflow for carrying out copy number analysis, using a read depth method implemented by the R package QDNAseq. The input to the workflow is sequence data in either FASTQ or BAM format, and the output per sample is an HTML report containing a chromosome copy summary, an ideoplot, a plot of read counts per bin, links to genes in detected CNVs, and QC data. The workflow also produces read statistics, a BAM alignment file (if FASTQ was provided as input), BED files of both raw and normalised read counts, and a VCF file.

Please note that CNV calling is currently restricted to human genome builds hg19 and hg38. For more information about the workflow, please see this EPI2ME Labs blog post.

Introduction

The workflow takes BAM or FASTQ data, aligns to a reference genome (if FASTQ files are supplied), and uses the R package QDNAseq to call copy number aberrations.

Best practices for human copy number calling are actively being investigated by the ONT applications team, and this workflow puts some of that work into something that can be easily used by our community.

wf-cnv also uses our new reporting and plotting package ezcharts. This builds on the Python library dominate and an API for Apache ECharts, allowing us to make modern, responsive layouts and plots with relative ease.

Quickstart

The workflow uses Nextflow to manage compute and software resources, and as such Nextflow will need to be installed before attempting to run the workflow.

The workflow can currently be run using either Docker or Singularity to provide isolation of the required software. Both methods are automated out-of-the-box provided either Docker or Singularity is installed.

It is not required to clone or download the git repository in order to run the workflow. For more information on running EPI2ME Labs workflows visit our website.

Workflow options

To obtain the workflow, having installed Nextflow, users can run:

nextflow run epi2me-labs/wf-cnv --help

to see the options for the workflow.

Example command (BAM):

nextflow run epi2me-labs/wf-cnv --bam <PATH_TO_BAM> --bin_size <BIN_SIZE>

Example command (FASTQ):

nextflow run epi2me-labs/wf-cnv --fastq <PATH_TO_FASTQS> --reference <PATH_TO_REFERENCE> --bin_size <BIN_SIZE>

The FASTQs for three test samples are available here and can be used with the accompanying sample sheet from here.

Example command with test data:

nextflow run epi2me-labs/wf-cnv --fastq <PATH_TO_DOWNLOADED_FASTQ> --sample_sheet <PATH_TO_DOWNLOADED_SAMPLE_SHEET> --reference /path/to/hg38.fa.gz --bin_size 500
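
When the workflow is launched from a script, for example once per batch of samples, it can help to assemble the command line programmatically. The following is a minimal sketch, not part of wf-cnv itself: the function name build_wf_cnv_command is hypothetical, it uses only the options shown in the examples above, and the two validation rules (exactly one of BAM or FASTQ; a reference is needed for FASTQ input) are assumptions inferred from those examples rather than documented behaviour.

```python
import shlex
from typing import List, Optional


def build_wf_cnv_command(
    bam: Optional[str] = None,
    fastq: Optional[str] = None,
    sample_sheet: Optional[str] = None,
    reference: Optional[str] = None,
    bin_size: Optional[int] = None,
) -> List[str]:
    """Assemble a wf-cnv command line from the options shown in this README."""
    # Assumption: exactly one input type is supplied per run.
    if (bam is None) == (fastq is None):
        raise ValueError("provide exactly one of bam or fastq")
    # Assumption: FASTQ input needs a reference genome to align against.
    if fastq is not None and reference is None:
        raise ValueError("fastq input requires a reference")

    cmd = ["nextflow", "run", "epi2me-labs/wf-cnv"]
    if bam is not None:
        cmd += ["--bam", bam]
    else:
        cmd += ["--fastq", fastq]
    if sample_sheet is not None:
        cmd += ["--sample_sheet", sample_sheet]
    if reference is not None:
        cmd += ["--reference", reference]
    if bin_size is not None:
        cmd += ["--bin_size", str(bin_size)]
    return cmd


if __name__ == "__main__":
    # Placeholder paths; substitute the downloaded test data locations.
    cmd = build_wf_cnv_command(
        fastq="fastq_dir",
        sample_sheet="sample_sheet.csv",
        reference="/path/to/hg38.fa.gz",
        bin_size=500,
    )
    # Print a shell-quoted command that can be reviewed before running it.
    print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with paths that contain spaces.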

Workflow outputs

The primary outputs of the workflow include, per sample:

  • <SAMPLE_NAME>_wf-cnv-report.html: HTML CNV report containing chromosome copy summary, ideoplot, plot of read counts per bin, links to genes in detected CNVs, and QC data: read length histogram, noise plot (noise as a function of sequence depth) and isobar plot (median read counts per bin shown as a function of GC content and mappability)
  • <SAMPLE_NAME>.stats: Read stats
  • BAM/<SAMPLE_NAME>.bam: Alignment of reads to reference (FASTQ input only)
  • BAM/<SAMPLE_NAME>.bam.bai: BAM index (FASTQ input only)
  • qdna_seq/<SAMPLE_NAME>_plots.pdf: QDNAseq-generated plots
  • qdna_seq/<SAMPLE_NAME>_raw_bins.bed: BED file of raw read counts per bin
  • qdna_seq/<SAMPLE_NAME>_bins.bed: BED file of corrected, normalised, and smoothed read counts per bin
  • qdna_seq/<SAMPLE_NAME>_calls.vcf: VCF file of CNV calls
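
After a run, a quick check that every listed file exists for a sample can catch incomplete outputs early. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the paths listed above are relative to a single output directory, whose location is passed in by the caller because this README does not name it.

```python
from pathlib import Path
from typing import Dict, List


def expected_outputs(out_dir: str, sample: str, fastq_input: bool) -> Dict[str, Path]:
    """Map a short label to each per-sample output path listed in this README."""
    root = Path(out_dir)  # assumption: all outputs sit under one directory
    outputs = {
        "report": root / f"{sample}_wf-cnv-report.html",
        "stats": root / f"{sample}.stats",
        "plots": root / "qdna_seq" / f"{sample}_plots.pdf",
        "raw_bins": root / "qdna_seq" / f"{sample}_raw_bins.bed",
        "bins": root / "qdna_seq" / f"{sample}_bins.bed",
        "calls": root / "qdna_seq" / f"{sample}_calls.vcf",
    }
    # The BAM and its index are listed for FASTQ input.
    if fastq_input:
        outputs["bam"] = root / "BAM" / f"{sample}.bam"
        outputs["bam_index"] = root / "BAM" / f"{sample}.bam.bai"
    return outputs


def missing_outputs(out_dir: str, sample: str, fastq_input: bool) -> List[str]:
    """Return the labels of expected outputs that do not exist on disk."""
    expected = expected_outputs(out_dir, sample, fastq_input)
    return sorted(label for label, path in expected.items() if not path.exists())


if __name__ == "__main__":
    # "output" and "sample01" are placeholders for a real run.
    for label in missing_outputs("output", "sample01", fastq_input=True):
        print(f"missing: {label}")
```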

Reference

Scheinin I, Sie D, Bengtsson H, van de Wiel MA, Olshen AB, van Thuijl HF, van Essen HF, Eijk PP, Rustenburg F, Meijer GA, Reijneveld JC, Wesseling P, Pinkel D, Albertson DG, Ylstra B. DNA copy number analysis of fresh and formalin-fixed specimens by shallow whole-genome sequencing with identification and exclusion of problematic regions in the genome assembly. Genome Res. 2014 Dec;24(12):2022-32. doi: 10.1101/gr.175141.114. Epub 2014 Sep 18. PMCID
