2015-fish-dna Assembly Pipeline

About

This repository contains a processing pipeline developed for an Illumina short-read de novo assembly project of an African cichlid. It includes configuration and sample metadata, a pipeline implementation built with pydoit, and analyses in IPython notebooks.

The data was generated by Russell Neches, and the pipeline in this repository was developed by Camille Scott. Both are PhD students at UC Davis.

Tutorial

Dependencies

We recommend Anaconda for managing Python dependencies, or virtualenv to sandbox your environment. The instructions are given for Anaconda, but apply to virtualenv installs as well. We assume you are running on a Debian system; dependencies that are hosted in Debian repositories may need to be installed manually on other platforms.

First, get our Python dependencies: pydoit, which is used to manage task dependencies; jinja2, a templating library; khmer, a library for k-mer and short-read analysis; and screed, a FASTA/Q parsing library:

pip install pydoit jinja2 khmer screed
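
To confirm that the install worked, a minimal sketch like the following checks that the interpreter can locate each package. It assumes the import names are doit, jinja2, khmer, and screed (pydoit is imported as doit); the helper name missing_modules is illustrative and is not part of the pipeline.

```python
import importlib.util

# Import names assumed for the packages installed above
# (pydoit is imported as `doit`).
REQUIRED = ["doit", "jinja2", "khmer", "screed"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing Python packages:", ", ".join(missing))
    else:
        print("All Python dependencies found.")
```

Run it inside the same Anaconda or virtualenv environment that you will use for the pipeline, since that is the environment whose packages are being checked.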

Install fastqc, a program for evaluating short-read sample quality:

sudo apt-get install fastqc

Install Trimmomatic. Trimmomatic is available in Ubuntu PPAs, but many HPC environments install it in a non-standard way, so we will install it manually. First, download the archive and unpack it:

mkdir -p $HOME/bin
cd $HOME/bin
curl -O http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.33.zip
unzip Trimmomatic-0.33.zip

Now, export the Trimmomatic directory as an environment variable. On some HPC systems, the $TRIM variable may already be set, depending on local configuration. For example, on the MSU HPCC, this is automatically set by loading the trimmomatic module, and this step can be skipped:

export TRIM=$HOME/bin/Trimmomatic-0.33
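
As a quick sanity check on the variable, the sketch below looks for a Trimmomatic jar inside the directory that $TRIM points to. It assumes the jar sits directly in that directory and is named trimmomatic-<version>.jar; the function name find_trimmomatic_jar is hypothetical and is not something the pipeline provides.

```python
import os
from pathlib import Path


def find_trimmomatic_jar(env=os.environ):
    """Return the Trimmomatic jar under $TRIM, or None if it is not found."""
    trim_dir = env.get("TRIM")
    if not trim_dir:
        return None
    # Assumption: the jar sits directly inside the unpacked directory
    # and is named trimmomatic-<version>.jar.
    jars = sorted(Path(trim_dir).glob("trimmomatic-*.jar"))
    return jars[0] if jars else None


if __name__ == "__main__":
    jar = find_trimmomatic_jar()
    print(jar if jar else "TRIM is unset or contains no Trimmomatic jar")
```

On a system where a module sets $TRIM for you, the same check shows whether the directory it points to is laid out the way this sketch assumes.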

Install kmergenie. First download and unpack it:

cd $HOME/bin
curl -O http://kmergenie.bx.psu.edu/kmergenie-1.6982.tar.gz
tar -xvzf kmergenie-1.6982.tar.gz

kmergenie requires R; if you do not have it, it can be installed with:

sudo apt-get install r-base-core

Then compile and install kmergenie:

cd kmergenie-1.6982/
sudo make install

Install the velvet assembler:

sudo apt-get install velvet
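
Before running anything, it can help to confirm that the external tools are on your PATH. The sketch below is one way to do that; the executable names (fastqc, java, kmergenie, velveth, velvetg) are assumptions about what the installs above provide, and missing_tools is an illustrative helper, not part of the pipeline.

```python
import shutil

# Executable names assumed for the tools installed above; java is
# included because Trimmomatic is distributed as a jar.
TOOLS = ["fastqc", "java", "kmergenie", "velveth", "velvetg"]


def missing_tools(names):
    """Return the executables in `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(TOOLS)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

On HPC systems where tools are provided through modules, load the relevant modules first, since they typically adjust PATH.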

Running the Pipeline

Execution is managed with the awesome pydoit library. To launch with test data, use:

./pipeline -n 4 --resources test/resources.json --config test/config.json --work-dir _test/

Here, -n 4 sets the number of tasks to run in parallel. To run the main pipeline:

./pipeline -n 4

That is, the default parameters run the main pipeline. This downloads the samples from a remote server, and they are large, so it may take a while. Assuming all the dependencies are installed correctly, when it finishes you should find a file named {DATE-TIME}-velvet.sh, a PBS script for submitting the velvet job. To run it as a normal shell script instead, edit it to remove all the #PBS directives and the module loading. Otherwise, submit it with:

qsub {DATE-TIME}-velvet.sh
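
For the non-cluster route, removing the #PBS directives and module loading by hand can be sketched as a small filter. This is only an illustration: it assumes module loading appears as lines beginning with "module ", and the sample script contents below are hypothetical, not the actual output of the pipeline.

```python
def strip_pbs(lines):
    """Drop #PBS directives and `module` commands from script lines."""
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("#PBS"):
            continue
        # Assumption: module loading appears as lines starting with `module `.
        if stripped.startswith("module "):
            continue
        kept.append(line)
    return kept


# Hypothetical script contents, for illustration only.
example = [
    "#!/bin/bash\n",
    "#PBS -l walltime=24:00:00\n",
    "module load velvet\n",
    'echo "assembly commands go here"\n',
]

if __name__ == "__main__":
    # Prints the shebang line and the echo line; the other two are dropped.
    print("".join(strip_pbs(example)), end="")
```

To apply it to a real script, read the generated file's lines, pass them through strip_pbs, and write the result to a new file so that the original stays available for qsub.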
