Coder Social home page Coder Social logo

bdolmo / grapes Goto Github PK

View Code? Open in Web Editor NEW
0.0 2.0 1.0 95.12 MB

Integrated analysis of Structural and Copy Number Variants on clinical targeted sequencing data

License: GNU General Public License v3.0

Perl 41.64% Makefile 0.12% C++ 23.26% Dockerfile 0.22% CSS 0.26% R 0.83% C 2.77% HTML 30.91%
cnv gene-panels structural-variation structural-variants

grapes's Introduction

What is GRAPES?

GRAPES stands for "Genomic Rearrangement Analysis of Panel Enrichment Sequencing". It is yet another SV and CNV caller, but with new interesting features (Keep reading!)

Features

  • SV and CNV calling for targeted sequencing: using data from on-target, off-target and breakpoint-informative reads.
  • Clustering of samples with highly correlated read depth for improved CNV calling specificity.
  • Dynamic update of sample references using SQLite.

What's new in GRAPES 0.9.5 ? March-2021

  • New scoring metric to evaluate the confidence of each CNV.
  • Improved single-exon CNV detection.
  • Improved speed performance.
  • BAF extraction on the fly.
  • Several bug fixes.

Slide

Installation

Recommended Installation (Docker)

There is a docker repository available at:

docker pull bdolmo/grapes

And run commands with:

docker run -it bdolmo/grapes:latest GRAPES

Source installation

GRAPES will only work on Unix-based systems and supports only human genome. Before building and installing GRAPES, make sure you have available on path:

  • Perl
  • R (>= 3.3). In addition Rscript must be accessible
  • g++ compiler (>= 4.7)
  • Boost C++ library
  • openMP C++ API
  • BEDtools
  • SAMtools
  • Tabix
  • macs2
  • SQLite3
  • GNU core utils: wget, awk, sort, cat, grep, head, tail, sed, cut, paste, uniq, wc and mv. macOS users: need to install Homebrew (https://brew.sh/) and the latest XCode.

Perl modules

  • Parallel::ForkManager
  • Sort::Key::Natural
  • Statistics::Descriptive
  • JSON::MaybeXS
  • Excel::Writer::XLSX

You can install them through CPAN:

cpan Parallel::ForkManager Sort::Key::Natural Statistics::Descriptive JSON::MaybeXS Excel::Writer::XLSX

Then you can download and install the latest release:

 git clone --recursive https://github.com/bdolmo/GRAPES.git
 cd GRAPES
 ./INSTALL.PL

INSTALL.PL will compile all C++ code along with the required R packages for segmentation and plotting. In addition it will download both 75-mer and 100-mer mappability tracks for GRCh37 and GRCh38 and gnomaAD annotation files.

Commands

Targeted sequencing analysis: GRAPES wes

Workflow 1: Pooled Analysis (creates a reference using all available samples):

GRAPES wes --pooled <bam_dir> --outdir <output_dir> --bed <roi> --fasta <genome_fa> --genome <hg19/hg38> --all

(docker)

docker run -t -i \
-v $HOME/BAM_FOLDER:/bam_folder \
-v $HOME/BED_FOLDER:/bed_folder \
-v $HOME/GENOME_FOLDER:/genome_folder \
-v $HOME/OUTPUT_FOLDER:/output_folder \
-it bdolmo/grapes:latest GRAPES wes \
-all -pooled /bam_folder/ -b /bed_folder/targets.bed -f /genome_folder/genome.fa -g <hg19/hg38> -o /out_dir/ -t 4

Workflow 2: Case-Control analysis:

GRAPES wes --cases <bam_dir> --control <bam_dir> --outdir <output_dir> --bed <roi> --fasta <genome_fa> --genome <hg19/hg38> --all

(docker)

docker run -t -i \
-v $HOME/CASE_FOLDER:/case_folder \
-v $HOME/CONTROL_FOLDER:/control_folder \
-v $HOME/BED_FOLDER:/bed_folder \
-v $HOME/GENOME_FOLDER:/genome_folder \
-v $HOME/OUTPUT_FOLDER:/output_folder \
-it bdolmo/grapes:latest GRAPES wes \
-all -cases /case_folder/ -controls /control_folder/ -b /bed_folder/targets.bed -f /genome_folder/genome.fa -g <hg19/hg38> -o /out_dir/ -t 4
I/O:
-o,--outdir          STRING    Output directory
-f,--fasta          STRING    Output directory
-g,--genome      STRING    Genome reference in FASTA format
-r,--genome_version  STRING    Genome version. Choose: hg19, hg38 (default = hg19)
-b,--bed             STRING    Regions file in BED format
-t,--threads         INT       Number of CPUs (default = 1)
Options:
--all                      Perform all steps below
--refdir                   Input directory where DB references will be stored.
--breakpoint               Perform Breakpoint analysis
  --nobreakpoint           Turn off breakpoint analysis
--extract                  Extract Depth, GC and Mappability
--offtarget                Perform Off-target analysis
  --noofftarget            Turn off offtarget analysis
--buildref                 Build a reference from a pool of samples
--callcnv                  Segment and call CNVs
  --nocallcnv              Turn off CNV calling
--normalize                Normalize read depth. Choose from 'median', 'PCA'. (default='median')
--plotsingleexon           Plot single exon CNVs
  --noplotcnv              Turn off CNV plotting
--plotlargecnv             Plot segmented CNVs
  --noplotcnv              Turn off CNV plotting
--plotscatter              Plot genome-wide CNV scatter plot
--vaf                      Include Variant-Allele Frequency (VAF) analysis
  --novaf                  Turn off VAF analysis
--samtools                 Default if --vaf set. Perform variant call with samtools
--freebayes                if --vaf set, perform variant call with freebayes
--annotate                 Annotate VCF
  --noannotate             Turn off VCF annotation
--filtervcf                Filter low qual VCF entries
  --nofiltervcf            Turn off VCF filtering
--reporthtml               Write results to an HTML file (Only for gene panels)
  --noreporthtml           Turn off report HTML creation
--filterdiscordantonly     Filter discordant-only SV predictions (default = true)
  --nofilterdiscordantonly Turn off discordant-pair only filtering
--verbose                  Print sub-command messages
Tuning parameters:
 --mincorr            FLOAT   Minimum pairwise-correlation to build a reference set (default = 0.91)
 --minrefsize         INT     Default minimum number of samples to build a single baseline (default = 2)
 --maxrefsize         INT     Default maximum number of samples to build a single baseline (default = 15)
 --minzscore          FLOAT   Minimum Z-score required to output a CNV prediction (default = 2.58)
 --pcavariance        FLOAT   Variance to remove when normalizing by PCA (default = 0.7)
 --lowerdelcutoff     FLOAT   Lower-bound deletion cutoff ratio (default = 0.35)
 --upperdelcutoff     FLOAT   Upper-bound deletion cutoff ratio (default = 0.71)
 --lowerdupcutoff     FLOAT   Lower-bound duplication cutoff ratio (default = 1.24)
 --minofftargetreads  INT     Default minimum number of offtarget reads required to trigger off-target analysis (default = 1e6)
 --minofftargetsd     FLOAT   Default minimum number of std.dev from off-target rartios trigger off-target analysis	(default = 0.2)
 --minsvsize          INT     Minimum SV size to report a breakpoint call (default = 15)
 --maxsvsize          INT     Maximum SV size to report a breakpoint call (default = 5000000)
 --mindiscordants     INT     Minimum number of discordant read pairs (default = 5)
 --mindiscordantssd   INT     Minimum std.deviations from the mean insert size to consider discordant pairs (default = 10)
 --breakreads         INT     Minimum number of break reads (default = 5)

SV annotation: GRAPES annotate

GRAPES annotate --input_vcf <vcf> --output_dir <output_dir> --r_overlap <reciprocal_overlap>
Tuning parameters:
-i,--input_vcf  STRING  Input VCF file
-o,--output_dir STRING  Output directory (default=.)
-l,--r_overlap  FLOAT   Reciprocal overlap fraction (default=0.5)

grapes's People

Contributors

bdolmo avatar

Watchers

 avatar  avatar

Forkers

mel-lina

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.