
allbiotc2's Introduction

SV-Autopilot

Structural Variation AUTOmated PIpeLine Optimization Tool

By: Wai Yi Leung, Tobias Marschall, Laurent Falquet, Yogesh Paudel, Hailiang Mei, Alex Schoenhuth and Tiffanie Yael Moss

This repository stores the scripts written during the ALLBio Testcase 2 hackathon.

We aim to provide:

  • a pipeline for automated structural variation (SV) calling
  • an automated approach for benchmarking (new) SV tools.

More information about the project can be found on the following website:

ALLBio Bioinformatics Testcase #2 Google Site (members only)

How to install

Clone this repository from GitHub into a directory named allbiotc2 in your home folder:

cd ~
git clone https://github.com/ALLBio/allbiotc2.git
cd allbiotc2/
make install

The make install command performs a system-wide installation and therefore requires sudo rights.
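Because make install stops at the first missing tool, it can help to confirm beforehand that the commands the install step calls are available. The function below is a minimal POSIX shell sketch; check_prereqs is a hypothetical helper, and the command list used in the example (git, make, sudo, apt-get, pip) is an assumption based on the install log reported in the issue tracker, not an official requirements list.

```shell
# Hypothetical pre-flight check: report every command that cannot be found.
# Returns 0 if all commands exist, 1 otherwise.
check_prereqs() {
    missing=""
    for cmd in "$@"; do
        # command -v succeeds only if the name resolves to an executable,
        # builtin or function in the current PATH.
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    if [ -n "$missing" ]; then
        echo "Missing prerequisites:$missing" >&2
        return 1
    fi
    return 0
}
```

For example: check_prereqs git make sudo apt-get pip && make install. Note that sudo may use a different PATH than your login shell (for example through secure_path in /etc/sudoers), so a pip found by this check can still be reported as "command not found" under sudo.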

Installation instructions for sysadmins (advanced)

The installation scripts used to build the workshop-ready and production-ready virtual machines are located in the following repository:

https://github.com/ALLBio/allbiovm

Comments are welcome via the GitHub issue tracker.

Preprocessing reference VCF (optional)

If the reference calls are provided in SDI format, they can be converted to VCF as follows:

make -f ../scripts/Makefile \
    REFERENCE_VCF=~/myworkdir/ref_all.complete.vcf \
    SDI_FILE=~/myworkdir/ler_0.v7c.sdi \
    preprocess
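A quick check that the converted file at least has a VCF header can catch a failed preprocessing step before the benchmark runs. This is a minimal sketch; check_vcf is a hypothetical helper that only tests two header rules of the VCF 4.x specification (the first line is the ##fileformat meta line, and a #CHROM header line is present). It does not validate the records themselves.

```shell
# Hypothetical header check for a (converted) reference VCF.
check_vcf() {
    vcf="$1"
    [ -r "$vcf" ] || { echo "Cannot read $vcf" >&2; return 1; }
    # The VCF specification requires ##fileformat to be the very first line.
    first=$(head -n 1 "$vcf")
    case "$first" in
        "##fileformat=VCFv4"*) ;;
        *) echo "$vcf: first line is not ##fileformat=VCFv4.x" >&2; return 1 ;;
    esac
    # The column header line (#CHROM POS ID ...) must precede the records.
    if ! grep -q '^#CHROM' "$vcf"; then
        echo "$vcf: missing #CHROM header line" >&2
        return 1
    fi
    return 0
}
```

For example: check_vcf ~/myworkdir/ref_all.complete.vcf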

Installing the software

All software used by the pipeline is installed in one central location, organized as follows:

allbio@workbench:/virdir/Scratch/software$ tree -L 1
.
├── bowtie2-2.1.0
├── breakdancer
├── bwa-0.7.4
├── circos-0.63-4
├── clever-sv
├── delly_v0.0.9
├── dwac-seq0.7
├── FastQC
├── gasv
├── picard-tools-1.86
├── pindel
├── PRISM_1_1_6
├── samtools-0.1.19
├── sickle-master
└── SVDetect_r0.8b
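Before starting a run, it can be worth checking that the directory passed as PROGRAMS actually contains every tool. A minimal sketch: check_software and EXPECTED_TOOLS are hypothetical names, and the directory names are copied from the tree above, so they will differ if other versions are installed.

```shell
# Directory names taken from the example software tree (version-specific).
EXPECTED_TOOLS="bowtie2-2.1.0 breakdancer bwa-0.7.4 circos-0.63-4 clever-sv
delly_v0.0.9 dwac-seq0.7 FastQC gasv picard-tools-1.86 pindel
PRISM_1_1_6 samtools-0.1.19 sickle-master SVDetect_r0.8b"

# Hypothetical check: report every expected tool directory that is missing
# under the given programs directory. Returns 1 if any is missing.
check_software() {
    programs_dir="$1"
    shift
    status=0
    for tool in "$@"; do
        if [ ! -d "$programs_dir/$tool" ]; then
            echo "Missing: $programs_dir/$tool" >&2
            status=1
        fi
    done
    return $status
}
```

For example: check_software /virdir/Scratch/software $EXPECTED_TOOLS (the variable is left unquoted on purpose so that it splits into one argument per tool).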

Running the pipeline

The pipeline can be configured in conf.mk, or by passing variables on the command line when the pipeline is invoked.

The most important (and required) variables are:

  • PROGRAMS: Path to the directory where the programs are installed
  • PYTHON_EXE: Path to the Python executable; defaults to python (the system-distributed version)
  • REFERENCE_DIR: Path to the reference directory
  • REFERENCE_VCF: Full path to the VCF file with reference SV calls used for benchmarking
  • FASTQ_EXTENSION: Filename extension of the FastQ files
  • PEA_MARK: Marker identifying the left (first) read file, named <sample><PEA_MARK>.<FASTQ_EXTENSION>, e.g. sim-read-511_10.1.fastq for PEA_MARK=.1
  • PEB_MARK: Marker identifying the right (second) read file, named <sample><PEB_MARK>.<FASTQ_EXTENSION>
  • *_THREADS: Number of cores used by each program (e.g. FASTQC_THREADS, BWA_OPTION_THREADS)
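PEA_MARK, PEB_MARK and FASTQ_EXTENSION together determine which files are treated as a read pair. The function below is a minimal sketch of that matching, assuming the <sample><PEA_MARK>.<FASTQ_EXTENSION> convention listed above; list_samples is a hypothetical helper, not part of the pipeline's Makefile.

```shell
# Hypothetical helper: print one sample name per complete read pair found in
# the current directory, and warn about left-read files without a mate.
#   $1 = PEA_MARK, $2 = PEB_MARK, $3 = FASTQ_EXTENSION
list_samples() {
    pea_mark="$1"
    peb_mark="$2"
    ext="$3"
    # The quoted part of the glob is matched literally.
    for left in *"$pea_mark.$ext"; do
        [ -e "$left" ] || continue          # glob matched nothing
        sample="${left%"$pea_mark.$ext"}"   # strip the left-read suffix
        if [ -e "$sample$peb_mark.$ext" ]; then
            echo "$sample"
        else
            echo "No mate for $left (expected $sample$peb_mark.$ext)" >&2
        fi
    done
}
```

Run inside the input directory of the example layout below, list_samples .1 .2 fastq prints sim-reads.409_10 and sim-reads.511_10; sim-reads_1.fastq is not picked up because it does not end in .1.fastq.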

Example invocation of the pipeline:

THREADS=8

make -f ../scripts/Makefile \
    PROGRAMS=/virdir/Scratch/software \
    REFERENCE_DIR=../input/reference_tair9 \
    FASTQC_THREADS=$THREADS \
    BWA_OPTION_THREADS=$THREADS \
    PEA_MARK=.1 \
    PEB_MARK=.2 \
    FASTQ_EXTENSION=fastq \
    REFERENCE_VCF=/virdir/Backup/reads_and_reference/vcf_reference/ref_all.complete.vcf

Example setup of pipeline directories

allbio@workbench:/opt/allbio/runs/synthetic_run$ tree -L 1
.
├── input
│   ├── reference_tair10
│   │   ├── bowtie2
│   │   ├── bwa
│   │   ├── reference.fa
│   │   └── reference.fa.fai
│   ├── sim-reads_1.fastq
│   ├── sim-reads_2.fastq
│   ├── sim-reads.409_10.1.fastq
│   ├── sim-reads.409_10.2.fastq
│   ├── sim-reads.511_10.1.fastq
│   └── sim-reads.511_10.2.fastq
├── log
├── run_integrationtest
│   ├── bd.cfg
│   ├── comparison.tex
│   ├── run.sh
│   ├── sim-read-511_10.1.fastq -> ../input/sim-reads.511_10.1.fastq
│   ├── sim-read-511_10.1.filtersync.stats
│   ├── sim-read-511_10.1.singles.fastq
│   ├── sim-read-511_10.1.trimmed.fastq
│   ├── sim-read-511_10.2.fastq -> ../input/sim-reads.511_10.2.fastq
│   ├── sim-read-511_10.2.trimmed.fastq
│   ├── sim-read-511_10.bam
│   ├── sim-read-511_10.bam.bai
│   ├── sim-read-511_10.bd.vcf
│   ├── sim-read-511_10.breakdancer
│   ├── sim-read-511_10.delly
│   ├── sim-read-511_10.delly.vcf
│   ├── sim-read-511_10.flagstat
│   ├── sim-read-511_10.gasv
│   ├── sim-read-511_10.gasv.vcf
│   ├── sim-read-511_10.pindel
│   ├── sim-read-511_10.pindel.vcf
│   ├── sim-read-511_10.prism
│   ├── sim-read-511_10.prism.vcf
│   ├── sim-read-511_10.raw_fastqc
│   ├── sim-read-511_10.sam
│   ├── sim-read-511_10.trimmed_fastqc
│   └── sim-read-511_10.unsort.bam
└── scripts
    └── Makefile -> ~/allbiotc2/Makefile
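A run directory like the one above can be prepared with a few commands. The sketch below is hypothetical: setup_run and its arguments are illustrative, the reads are assumed to be in input/ already, and the links keep the original file names (in the tree above they were also renamed, e.g. sim-reads.511_10.1.fastq to sim-read-511_10.1.fastq).

```shell
# Hypothetical helper that recreates the example layout for a new run.
#   $1 = run root        (e.g. /opt/allbio/runs/synthetic_run)
#   $2 = repo checkout   (e.g. ~/allbiotc2)
#   $3 = run subdirectory name (e.g. run_integrationtest)
setup_run() {
    run_root="$1"
    checkout="$2"
    run_name="$3"

    mkdir -p "$run_root/input" "$run_root/log" \
             "$run_root/scripts" "$run_root/$run_name" || return 1

    # scripts/Makefile points at the Makefile in the repository checkout.
    ln -sf "$checkout/Makefile" "$run_root/scripts/Makefile" || return 1

    # Link every FASTQ file from input/ into the run directory using a
    # relative target (../input/<file>), as in the tree above.
    for fq in "$run_root"/input/*.fastq; do
        [ -e "$fq" ] || continue
        name=$(basename "$fq")
        ln -sf "../input/$name" "$run_root/$run_name/$name" || return 1
    done
}
```

After setup_run /opt/allbio/runs/synthetic_run ~/allbiotc2 run_integrationtest, change into the run subdirectory and invoke make -f ../scripts/Makefile with the variables described earlier.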

allbiotc2's People

Contributors

timtebeek, wyleung


allbiotc2's Issues

Error message

When using the virtual machine provided, the process stops with the following message:

make: *** No rule to make target `/data/bench/runs/neanderthal/1.sam', needed by `alignment'. Stop.

Any clues concerning what might be happening?

Thanks in advance!

Issue installing allbiotc2

Dear Sir,

I downloaded allbiotc2 using git. When I run make install, I get the errors below. Can you please help me install this software?

ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_2.fastq: No such file or directory
Install python packages for the pipeline
Reading package lists...
Building dependency tree...
Reading state information...
python-biopython is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_2.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_2.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_2.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_2.fastq: No such file or directory
/home/sukesh/allbiotc2/pindel/Makefile:91: warning: overriding commands for target `install'
/home/sukesh/allbiotc2/breakdancer/Makefile:57: warning: ignoring old commands for target `install'
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_2.fastq: No such file or directory
/home/sukesh/allbiotc2/clever/Makefile:62: warning: overriding commands for target `install'
/home/sukesh/allbiotc2/pindel/Makefile:91: warning: ignoring old commands for target `install'
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_1.fastq: No such file or directory
ls: cannot access *_2.fastq: No such file or directory
Reading package lists...
Building dependency tree...
Reading state information...
libboost-all-dev is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 16 not upgraded.
sudo: pip: command not found
make[1]: *** [install] Error 1
make: *** [install] Error 2

Thanks
Sukesh

makefile improvements

Dear Wai,

I see the following improvements that would make the Makefile more user-friendly:

a) add a PATH for the reads in the config file (as for the reference), and then create the symlinks automatically within the Makefile
b) calculate all the indexes for the reference automatically (bwa, samtools etc…) within the Makefile

These 2 automatic steps would already make life easier. ;-)
Laurent
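Suggestion (b) could start from a step that builds the indexes found in the example reference directory (reference.fa.fai plus bwa/ and bowtie2/ subdirectories). This sketch prints the commands rather than running them, so they can be reviewed first. print_index_cmds is a hypothetical helper, and the index prefixes bwa/reference and bowtie2/reference are assumptions: the README does not say which prefixes the Makefile expects. The commands themselves (samtools faidx, bwa index -p, bowtie2-build) are the standard indexing commands of those tools.

```shell
# Hypothetical helper: print the indexing commands for <ref_dir>/reference.fa.
# Paths are not quoted in the output, so they must not contain spaces.
print_index_cmds() {
    ref_dir="$1"
    fasta="$ref_dir/reference.fa"
    echo "samtools faidx $fasta"                             # -> reference.fa.fai
    echo "mkdir -p $ref_dir/bwa $ref_dir/bowtie2"
    echo "bwa index -p $ref_dir/bwa/reference $fasta"        # assumed prefix
    echo "bowtie2-build $fasta $ref_dir/bowtie2/reference"   # assumed prefix
}
```

Once the output looks right, it can be executed with print_index_cmds ../input/reference_tair10 | sh.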

sample.gasv.in: No such file or directory

run.sh does not create the file sample.gasv.in.txt.

List of files in my working directory:

drwxrwxr-x 6 allbio allbio 4096 Mar 12 15:20 ./
drwxr-xr-x 4 allbio allbio 4096 Mar 12 14:02 ../
-rw-rw-r-- 1 allbio allbio 114 Mar 12 14:22 fake.vcf
drwxrwxr-x 2 allbio allbio 12288 Mar 12 15:03 .log/
-rwxr-x--x 1 allbio allbio 367 Mar 12 14:17 run.sh*
-rw-rw-r-- 1 allbio allbio 63287543 Mar 12 14:34 s_8.bam
-rw-rw-r-- 1 allbio allbio 2120 Mar 12 14:34 s_8.bam.bai
-rw-rw-r-- 1 allbio allbio 189 Mar 12 14:34 s_8.bd.cfg
-rw-rw-r-- 1 allbio allbio 382 Mar 12 14:34 s_8.flagstat
drwxrwxr-x 2 allbio allbio 4096 Mar 12 15:03 s_8.gasv/
-rw-r--r-- 1 allbio allbio 116087446 Mar 12 14:01 s_8.MHC_1.fastq
-rw-rw-r-- 1 allbio allbio 212 Mar 12 14:20 s_8.MHC_1.filtersync.stats
-rw-rw-r-- 1 allbio allbio 0 Mar 12 14:20 s_8.MHC_1.singles.fastq
-rw-rw-r-- 1 allbio allbio 0 Mar 12 14:20 s_8.MHC_1.trimmed.fastq
-rw-r--r-- 1 allbio allbio 116087446 Mar 12 14:01 s_8.MHC_2.fastq
-rw-rw-r-- 1 allbio allbio 0 Mar 12 14:20 s_8.MHC_2.trimmed.fastq
drwxrwxr-x 4 allbio allbio 4096 Mar 12 14:20 s_8.raw_fastqc/
-rw-rw-r-- 1 allbio allbio 279057452 Mar 12 14:33 s_8.sam
drwxrwxr-x 2 allbio allbio 4096 Mar 12 14:20 s_8.trimmed_fastqc/

Start with .bam files?

Hello.

I already have alignments and would prefer not to go back to the fastq files. Is it possible to run the pipeline on .bam files?

PRISM SV runs on single chromosome only

The pipeline currently starts PRISM with the complete reference genome. PRISM, however, only performs its analysis per chromosome.

When it is started with a full reference genome, PRISM produces segfaults (due to a memory limit in Python in detect_sv.py).

Proposed solution:

  • Initiate the PRISM analysis by first dividing the genome into individual chromosome/contig files.
  • Start the analysis per contig.
  • Merge the results from all contig analyses.
  • Report the merged results back to the main pipeline.
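The split and merge steps of this proposal can be sketched in POSIX shell and awk, assuming a plain FASTA reference and one VCF output per contig. The PRISM call itself is left as a hypothetical placeholder (run_prism_on_contig), since its invocation is not given here. The merge keeps the header of the first file only, so per-contig header lines such as ##contig are dropped from the other files, and records stay in input-file order rather than being sorted.

```shell
# Step 1: write each FASTA record to <out_dir>/<contig>.fa, where <contig>
# is the first word after ">".
split_fasta() {
    fasta="$1"
    out_dir="$2"
    mkdir -p "$out_dir" || return 1
    awk -v dir="$out_dir" '
        /^>/ {
            if (out) close(out)           # keep the number of open files at 1
            name = substr($1, 2)
            out = dir "/" name ".fa"
        }
        out != "" { print > out }         # skip any text before the first ">"
    ' "$fasta"
}

# Step 2 (hypothetical placeholder, not a real PRISM command):
#   for c in contigs/*.fa; do run_prism_on_contig "$c"; done

# Step 3: concatenate per-contig VCFs, keeping the header of the first file
# and only the data lines of the others.
merge_vcfs() {
    awk 'NR == FNR || !/^#/' "$@"
}
```

For example: split_fasta reference.fa contigs, then, after the per-contig runs, merge_vcfs contigs/*.prism.vcf > sample.prism.vcf (the per-contig output names here are illustrative).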

VCF files

Hi Wai,

All generated VCF files should be compatible with VCF version 4.1.
In addition, I would filter out all calls shorter than 20 bp, since they create a lot of noise in the merged VCF and are beyond our scope of large structural variants.

Cheers
Laurent
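The proposed size filter could look like the sketch below. It assumes the size of each record can be taken from INFO/SVLEN (absolute value of the first listed value), else from INFO/END minus POS, else from the REF/ALT length difference; records with a symbolic ALT (e.g. <INS>) and no size information are kept rather than guessed. filter_small_svs is a hypothetical helper, and the 20 bp default comes from the suggestion above.

```shell
# Hypothetical filter: drop VCF records shorter than a minimum size.
#   $1 = input VCF, $2 = minimum size in bp (default 20). Writes to stdout.
filter_small_svs() {
    min_len="${2:-20}"
    awk -v min="$min_len" 'BEGIN { FS = OFS = "\t" }
        /^#/ { print; next }                    # keep all header lines
        {
            svlen = ""; endpos = ""
            n = split($8, info, ";")            # column 8 is INFO
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^SVLEN=/)    svlen  = substr(info[i], 7)
                else if (info[i] ~ /^END=/) endpos = substr(info[i], 5)
            }
            if (svlen != "")       size = svlen + 0
            else if (endpos != "") size = endpos - $2
            else if ($5 ~ /^</)    { print; next }  # symbolic ALT, size unknown
            else                   size = length($5) - length($4)
            if (size < 0) size = -size
            if (size >= min) print
        }
    ' "$1"
}
```

For example: filter_small_svs sample.merged.vcf 20 > sample.merged.min20.vcf (file names illustrative).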
