vap's Introduction

(V)ariant (A)nalysis (P)ipeline

Thank you for your interest in using the Variant Analysis Pipeline. VAP is a comprehensive workflow for reference mapping and variant detection of genomic and transcriptomic reads using a suite of bioinformatics tools.

Article Source:

Adetunji MO, Lamont SJ, Abasht B, Schmidt CJ (2019) Variant analysis pipeline for accurate detection of genomic variants from transcriptome sequencing data. PLOS ONE 14(9): e0216838. https://doi.org/10.1371/journal.pone.0216838

Bioinformatic tools

The tools are grouped by the type of sequencing reads they handle.

Genomic Sequencing

  • BOWTIE2
  • BWA

Transcriptomic Sequencing

  • TOPHAT2
  • STAR (2-PASS)
  • HISAT2

Variant Calling (for both Genomic/Transcriptomic Sequencing)

  • PICARD + GATK HaplotypeCaller
    • sort, add read groups and mark duplicates using Picard Tools.
    • split reads with N in the CIGAR string using GATK (transcriptomic sequencing reads only).
    • variant detection using GATK.
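
The order of these steps can be sketched as a small command builder. This is an illustration only, not VAP's own code: the function name `variant_calling_commands`, the jar locations, the intermediate file names and the read-group values are all placeholders, and the options shown are standard Picard 2.x and GATK 3.x options rather than the exact invocations the pipeline issues.

```python
def variant_calling_commands(bam, ref, transcriptomic,
                             picard="picard.jar", gatk="GenomeAnalysisTK.jar"):
    """Return the Picard/GATK 3 command lines, in order, for one alignment.

    All file names and read-group values are placeholders for illustration.
    """
    p = f"java -jar {picard}"
    g = f"java -jar {gatk}"
    cmds = [
        f"{p} SortSam I={bam} O=sorted.bam SO=coordinate",
        f"{p} AddOrReplaceReadGroups I=sorted.bam O=rg.bam "
        "RGID=sample RGLB=lib RGPL=illumina RGPU=unit RGSM=sample",
        f"{p} MarkDuplicates I=rg.bam O=dedup.bam M=dedup.metrics CREATE_INDEX=true",
    ]
    calling_input = "dedup.bam"
    if transcriptomic:
        # Spliced RNA-seq reads carry N in the CIGAR string and are split first.
        cmds.append(f"{g} -T SplitNCigarReads -R {ref} -I dedup.bam -o split.bam")
        calling_input = "split.bam"
    cmds.append(f"{g} -T HaplotypeCaller -R {ref} -I {calling_input} -o variants.vcf")
    return cmds


for line in variant_calling_commands("aln.bam", "reference.fa", transcriptomic=True):
    print(line)
```

With `transcriptomic=False` the split step is left out and HaplotypeCaller reads the duplicate-marked BAM directly.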

N.B.: Parameters of all tools are set to their defaults.

The VAP workflow was designed with the following software versions:

Software        Version
TopHat2         2.1.1
HISAT2          2.1.0
STAR            2.5.2b
SAMtools        1.4.1
Picard tools    2.13.2
GATK            3.8
BWA-mem         0.7.17
Bowtie2         2.3.5.1

The current pipeline is not compatible with GATK v4.

Contact the maintainer to make custom changes to the different tools.

Things to be aware of

Job File

  • Edit the config_job.file file with your settings, or rename it as required.
  • If parameters are not needed, they must be either removed or changed to false.
    • Workflows that are needed (prefix: run) must be changed to true (case-sensitive).
    • e.g. SAM = false (this means there is no SAM file); otherwise, give the file location: /path/to/samfiles/*sam
    • e.g. runTopHAT = true (this means the pipeline should run TopHat2)
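
A quick way to check which workflows a job file switches on is sketched below. It assumes the job file holds one `key = value` pair per line, as in the two examples above; the helper names `read_job_config` and `enabled_workflows` are hypothetical, and skipping lines that start with `#` is an assumption about the file format. How VAP itself handles a value such as `True` is not stated beyond "case-sensitive"; the sketch simply treats it as not enabled.

```python
def read_job_config(text):
    """Parse `key = value` lines; blank lines and lines starting with # are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value
    return settings


def enabled_workflows(settings):
    """Names of run* options set to exactly `true` (the comparison is case-sensitive)."""
    return [k for k, v in settings.items() if k.startswith("run") and v == "true"]


example = """
SAM = false
runTopHAT = true
runMergeFilter = True
"""
settings = read_job_config(example)
print(enabled_workflows(settings))  # ['runTopHAT']; "True" does not match "true"
```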

Indexes for Assembly tools and Variant Calling tools

Before running the pipeline, create indexes for the aligners and variant calling tools you have specified. Reference genome index syntax:

  • GATK :
	java -jar <picard directory>/picard.jar CreateSequenceDictionary R=<reference.fa> O=<reference.dict>
  • HISAT :
	hisat2-build <reference.fa> <path to reference.fa>/<index_name>
  • BOWTIE/TOPHAT :
	bowtie2-build <reference.fa> <path to reference.fa>/<index_name>
  • BWA :
	bwa index <reference.fa> 

N.B. For ease of use, make sure every <index_name> is the same and is stored in the <reference.fa> directory.
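
The naming convention in the note can be made concrete with a short helper that prints the four commands for one reference, using the same index name throughout and placing it in the reference directory. The function name `index_commands` and the example paths are hypothetical; the command syntax is the one listed above.

```python
from pathlib import PurePosixPath


def index_commands(reference, index_name, picard_dir="<picard directory>"):
    """Build the index commands so every index shares one name and one directory."""
    ref = PurePosixPath(reference)
    prefix = ref.parent / index_name  # index prefix stored next to the reference
    return [
        f"java -jar {picard_dir}/picard.jar CreateSequenceDictionary "
        f"R={ref} O={ref.with_suffix('.dict')}",
        f"hisat2-build {ref} {prefix}",
        f"bowtie2-build {ref} {prefix}",
        # bwa index takes no index name here; it writes its files beside the reference.
        f"bwa index {ref}",
    ]


for command in index_commands("/data/genome/reference.fa", "genome_index"):
    print(command)
```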

Downstream Merge and Filter Step (runMergeFilter)

The downstream step performs the following:

  1. Merge the SNPs called from every aligner that was set to run (TopHat2/HISAT2/STAR or Bowtie2/BWA).
  2. Apply pre-set filtering criteria using the GATK VariantFiltration tool. A variant fails the filter when any of the following holds:
    1. Read position rank sum (ReadPosRankSum) < -8
    2. Quality by depth (QD) < 5
    3. Read depth (DP) < 10
    4. Phred-scaled Fisher’s exact test p-value for strand bias (FS) > 60
    5. Mapping quality (MQ) < 40
    6. SnpCluster (3 SNPs in 35 bp)
    7. Mapping quality rank sum test (MQRankSum) < -12.5
  3. Compute exploratory statistics for all variant files.
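
To show how the thresholds read, here is a minimal sketch that applies them to the annotations of a single variant. It is not the GATK implementation: the function names are hypothetical, `ReadPosRankSum` is the annotation name as GATK spells it, an annotation that is absent from the record is assumed to be skipped rather than counted as a failure, and the exact window boundary used for the SNP cluster test is an assumption.

```python
import operator

# Annotation name, comparison and threshold for each pre-set criterion.
THRESHOLDS = [
    ("ReadPosRankSum", operator.lt, -8.0),
    ("QD", operator.lt, 5.0),
    ("DP", operator.lt, 10),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
]


def failed_filters(info):
    """Return the criteria a variant fails, given its INFO annotations as a dict."""
    failed = []
    for name, compare, threshold in THRESHOLDS:
        value = info.get(name)
        # Assumption: an annotation that is absent is skipped, not failed.
        if value is not None and compare(value, threshold):
            failed.append(name)
    return failed


def snp_cluster_members(positions, size=3, window=35):
    """Positions in any run of `size` SNPs that spans fewer than `window` bp."""
    ordered = sorted(positions)
    members = set()
    for i in range(len(ordered) - size + 1):
        if ordered[i + size - 1] - ordered[i] < window:
            members.update(ordered[i:i + size])
    return sorted(members)


print(failed_filters({"QD": 3.2, "DP": 25, "FS": 71.0, "MQ": 60.0}))  # ['QD', 'FS']
print(snp_cluster_members([100, 110, 120, 500]))  # [100, 110, 120]
```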

To run the workflow

perl VariantAnalysisPipeline.pl -c config_job.file

vap's People

Contributors

modupeore

vap's Issues

Can this tool be used to analyse human data?

Hi,

I find your tool relevant to my RNA-seq data, although my data are human. Can I use it to analyse my data, or is it meant purely for non-human data, as stated in the paper?

Problem to deal with NCBI SRA data

Dear Dr. Adetunji,

I have recently become interested in your Variant Analysis Pipeline (VAP) and started using it, but I have run into some problems.

The pipeline reports:

mkdir: cannot create directory ‘/fastqc’: Permission denied
mkdir: cannot create directory ‘/bowtie’: Permission denied
mkdir: cannot create directory ‘/bowtie’: Permission denied
mkdir: cannot create directory ‘/MERGE_FILTER’: Permission denied
ls: cannot access /newlustre/home/yangliandong/03.pipeline/00.WGSpipeline/VAP/test/out//*/*/*snp.vcf: No such file or directory
cat: /newlustre/home/yangliandong/03.pipeline/00.WGSpipeline/VAP/test/out/tmp/-merge.txt.notice.log: No such file or directory

The command, error log file and config files are attached. Can you help me solve these problems? Thanks very much.

Best wishes,
Liandong

Warnings on VariantFiltration

Hi, I've been running this pipeline to test it on a human RNA sample (it works really nicely using the tool versions you developed the pipeline on), and in the final merge and select variants step, I'm seeing a lot of lines like the following:

WARN  03:22:38,287 Interpreter - ![0,2]: 'QD < 5.0;' undefined variable QD
WARN  03:22:38,288 Interpreter - ![0,14]: 'ReadPosRankSum < -8.0;' undefined variable ReadPosRankSum
WARN  03:22:38,288 Interpreter - ![0,2]: 'MQ < 40.0;' undefined variable MQ
WARN  03:22:38,288 Interpreter - ![0,9]: 'MQRankSum < -12.5;' undefined variable MQRankSum
WARN  03:22:38,288 Interpreter - ![0,2]: 'FS > 60.0;' undefined variable FS

This arises from the variables not being defined during HaplotypeCaller (GATK v3.8-0-ge9d806836) I think.

Is this a problem? I've noticed that the filtered vcf does not contain any filter lines referring to these filters.

Example of multiple paired-end fastq samples in config_job file

Hi!

This looks like it will greatly simplify my workflows. But could you give an example of how I would specify multiple paired-end fastq files? For instance, I have the following files I'd like to analyse:

AA3.r1.fq.gz
AA3.r2.fq.gz
CVS.r1.fq.gz
CVS.r2.fq.gz
TRR.r1.fq.gz
TRR.r2.fq.gz
