2015-fish-dna Assembly Pipeline

About

A processing pipeline developed for an Illumina short-read de novo assembly project of an African cichlid. It contains configuration and sample metadata, a pipeline implementation built with pydoit, and analyses in IPython notebooks.

The data was generated by Russell Neches, and the pipeline in this repository was developed by Camille Scott. Both are PhD students at UC Davis.

Tutorial

Dependencies

We recommend Anaconda for managing Python dependencies, or virtualenv to sandbox your environment. The instructions are given for Anaconda but apply to virtualenv installs as well. We assume you are running a Debian system; on other platforms, dependencies hosted in the Debian repositories may need to be installed manually.

First, get our Python dependencies: pydoit, which is used to manage task dependencies; jinja2, a templating library; khmer, a library for k-mer and short-read analysis; and screed, a FASTA/Q parsing library:

pip install doit jinja2 khmer screed
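To confirm that the Python dependencies are visible in the active environment, a minimal check along the following lines may help. It is a sketch rather than part of the pipeline: `missing_modules` is a hypothetical helper, and the import names are assumptions (in particular, the pydoit project is imported as `doit`).

```python
import importlib.util

# Import names are assumptions; the pydoit project is imported as "doit".
REQUIRED_MODULES = ["doit", "jinja2", "khmer", "screed"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All Python dependencies found.")
```

An empty result means every listed module can be located by the interpreter you ran the check with; it does not verify versions.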

Install fastqc, a program for evaluating short-read sample quality:

sudo apt-get install fastqc

Install Trimmomatic. Trimmomatic is available in Ubuntu PPAs, but many HPC environments install it in a non-standard way, so we will install it manually. First, download the archive and unpack it:

mkdir -p $HOME/bin
cd $HOME/bin
curl -O http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.33.zip
unzip Trimmomatic-0.33.zip

Now, export the Trimmomatic directory as an environment variable. On some HPC systems, the $TRIM variable may already be set, depending on local configuration. For example, on the MSU HPCC, loading the trimmomatic module sets it automatically, and this step can be skipped:

export TRIM=$HOME/bin/Trimmomatic-0.33
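If you want to verify that $TRIM points at a usable Trimmomatic directory, something like the sketch below could be used. The helper `find_trimmomatic_jar` is hypothetical, and the `trimmomatic*.jar` filename pattern is an assumption about how the unpacked archive is laid out.

```python
import glob
import os


def find_trimmomatic_jar(trim_dir=None):
    """Return the first trimmomatic*.jar under trim_dir (default: $TRIM), or None."""
    trim_dir = trim_dir or os.environ.get("TRIM")
    if not trim_dir:
        return None
    # Assumption: the jar sits directly inside the directory and its
    # name starts with "trimmomatic".
    matches = sorted(glob.glob(os.path.join(trim_dir, "trimmomatic*.jar")))
    return matches[0] if matches else None


if __name__ == "__main__":
    jar = find_trimmomatic_jar()
    print(jar if jar else "No Trimmomatic jar found; is $TRIM set?")
```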

Install kmergenie. First download and unpack it:

cd $HOME/bin
curl -O http://kmergenie.bx.psu.edu/kmergenie-1.6982.tar.gz
tar -xvzf kmergenie-1.6982.tar.gz

kmergenie requires R; if you do not have it, it can be installed with:

sudo apt-get install r-base-core

Then compile and install kmergenie:

cd kmergenie-1.6982/
sudo make install

Install the velvet assembler:

sudo apt-get install velvet
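Once the external tools are installed, a quick way to see whether they are reachable on your PATH is sketched below. The executable names are assumptions and may differ on your system; `java` is listed because Trimmomatic is a Java program. `missing_tools` is a hypothetical helper, not part of this repository.

```python
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ["fastqc", "kmergenie", "velveth", "velvetg", "java"]


def missing_tools(names=REQUIRED_TOOLS):
    """Return the executables from names that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```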

Running the Pipeline

Execution is managed with the awesome pydoit library. To launch with test data, use:

./pipeline -n 4 --resources test/resources.json --config test/config.json --work-dir _test/

Here, -n 4 sets the number of tasks to run in parallel. To run the main pipeline:

./pipeline -n 4
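For readers new to pydoit, the way it decides which tasks can run in parallel can be illustrated with a small, hypothetical dodo.py. The tasks and file names below are invented for illustration and are not taken from this repository. pydoit links tasks through their declared files: because `report` depends on a file that `count_lines` produces, the two are ordered, while unrelated tasks are free to run at the same time.

```python
# dodo.py: a hypothetical two-task example, not taken from this repository.


def task_count_lines():
    """Count the lines in reads.txt and store the result in counts.txt."""
    return {
        "actions": ["wc -l reads.txt > counts.txt"],
        "file_dep": ["reads.txt"],    # inputs: task reruns when these change
        "targets": ["counts.txt"],    # outputs produced by the actions
    }


def task_report():
    """Copy the counts into a report; runs after count_lines."""
    return {
        "actions": ["cat counts.txt > report.txt"],
        "file_dep": ["counts.txt"],   # a target of count_lines, so it runs first
        "targets": ["report.txt"],
    }
```

With pydoit itself, such a file would be run as `doit -n 4` to allow up to four parallel processes.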

The default parameters run the main pipeline. This downloads the samples from a remote server, and they are big, so it may take a while. Assuming all the dependencies are installed correctly, when it finishes you should find a file named {DATE-TIME}-velvet.sh, a PBS script for submitting the velvet job. To run it as a normal shell script instead, edit it and remove all the #PBS directives and the module loading. Otherwise, submit it with:

qsub {DATE-TIME}-velvet.sh
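If you would rather run the generated script without a scheduler, the manual edit described above can be approximated with a small filter. This is a sketch under assumptions: `strip_pbs` is a hypothetical helper, and it assumes that module loading appears as lines beginning with `module `. Check the generated script before relying on it.

```python
import sys


def strip_pbs(script_text):
    """Drop '#PBS' directive lines and 'module ...' commands from a job script."""
    kept = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#PBS"):
            continue
        # Assumption: module loading appears as lines starting with "module ".
        if stripped.startswith("module "):
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python strip_pbs.py SCRIPT.sh > plain.sh
    with open(sys.argv[1]) as handle:
        sys.stdout.write(strip_pbs(handle.read()))
```

The shebang line and ordinary comments are left in place, so the output can be run directly with your shell.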
