toil-rnaseq-sc

University of California, Santa Cruz Genomics Institute

Guide: Running the Single Cell RNA-seq Pipeline using Toil

This guide walks the user through running this pipeline from start to finish. If you have any questions, please contact John Vivian ([email protected]). If you find any errors or have corrections, please feel free to make a pull request. Feedback of any kind is appreciated.

Overview

RNA-seq FASTQ files generated from 10x Chromium single-cell experiments are quantified to produce a gene-by-cell matrix. Additional QC plots are also generated.

This pipeline produces a tarball (tar.gz) file for a given sample that contains n subdirectories:

The output tarball is named after the sample's UUID (e.g. UUID.tar.gz).
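To check a finished run, it can help to list the directories inside a sample's tarball. The sketch below relies only on what is stated above (a gzipped tarball named UUID.tar.gz). The helper names `output_tarball_path` and `list_output_dirs` are hypothetical, and the assumption that the tarball is written directly into the config's output-dir is ours, not the pipeline's documented behavior.

```python
import os
import tarfile


def output_tarball_path(output_dir, uuid):
    """Expected location <output_dir>/<UUID>.tar.gz (assumed layout)."""
    return os.path.join(output_dir, uuid + '.tar.gz')


def list_output_dirs(tarball_path):
    """Return every directory path that appears in a .tar.gz archive."""
    dirs = set()
    with tarfile.open(tarball_path, 'r:gz') as tar:
        for member in tar.getmembers():
            parts = member.name.strip('/').split('/')
            # Every parent prefix of a member is a directory ...
            for i in range(1, len(parts)):
                dirs.add('/'.join(parts[:i]))
            # ... and so is the member itself if it is a directory entry.
            if member.isdir():
                dirs.add('/'.join(parts))
    return sorted(dirs)
```

Deriving directories from member paths, rather than trusting explicit directory entries alone, also works for archives that were built by adding files only.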

Dependencies

This pipeline has been tested on Ubuntu 14.04, but should also run on other Unix-based systems. apt-get and pip often require sudo privileges, so if the commands below fail, try prepending sudo. If you do not have sudo privileges, you will need to build these tools from source or bug a sysadmin about how to get them (they don't mind).

General Dependencies

1. Python 2.7
2. Curl         apt-get install curl
3. Docker       http://docs.docker.com/engine/installation/

Python Dependencies

1. Toil         pip install toil
2. S3AM         pip install --pre s3am (optional, needed for uploading output to S3)

System Dependencies

Installation

Inputs

This pipeline requires a Kallisto index file in order to run. The file is hosted on Synapse and can be downloaded after creating an account, which is free and takes about a minute.

  • Register for a Synapse account
  • Either download the index file through the website GUI or use the Python API:
  • pip install synapseclient
  • python
    • import synapseclient
    • syn = synapseclient.Synapse()
    • syn.login('[email protected]', 'password')
    • Get the Kallisto index reference
      • syn.get('syn5889216', downloadLocation='.')

All samples and inputs must be submitted as URLs using one of the following schemes: http://, file://, s3://, ftp://.
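A quick pre-flight check can catch a bare local path before the pipeline is launched. This is a minimal sketch, not part of the package: `check_input_url` is a hypothetical helper, and including `https` is an assumption, since it is not in the list above but the example `--samples` command uses an https:// URL.

```python
try:  # Python 3
    from urllib.parse import urlparse
except ImportError:  # Python 2.7, which the pipeline was tested with
    from urlparse import urlparse

# Schemes listed in this README. 'https' is an assumption: it is not listed,
# but the example --samples command uses an https:// URL.
SUPPORTED_SCHEMES = {'http', 'https', 'file', 's3', 'ftp'}


def check_input_url(url):
    """Return url unchanged, or raise ValueError for an unsupported scheme."""
    scheme = urlparse(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError('Unsupported URL scheme %r in %s; use one of %s'
                         % (scheme, url, ', '.join(sorted(SUPPORTED_SCHEMES))))
    return url
```

A plain path such as /full/path/to/sample.tar has an empty scheme and is rejected, which is the usual reminder to write it as file:///full/path/to/sample.tar.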

Samples submitted as tarballs of FASTQ files must follow a naming convention: each file name must end in R1/R2 or _1/_2, followed by .fastq.gz, .fastq, .fq.gz, or .fq.
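The naming convention above can be expressed as a single regular expression, which makes it easy to check the contents of a tarball before submitting it. This is a sketch under assumptions: `read_number` and `pair_fastqs` are hypothetical helpers, and the requirement that R1 and R2 counts match is our own addition implied by paired reads, not a rule stated above.

```python
import re

# R1/R2 or _1/_2, then .fastq.gz, .fastq, .fq.gz or .fq at the end of the name.
FASTQ_NAME = re.compile(r'(?:R|_)([12])\.(?:fastq|fq)(?:\.gz)?$')


def read_number(filename):
    """Return 1 or 2 for a correctly named FASTQ file, else None."""
    match = FASTQ_NAME.search(filename)
    return int(match.group(1)) if match else None


def pair_fastqs(filenames):
    """Split names into (read-1 files, read-2 files); reject anything else."""
    r1, r2 = [], []
    for name in sorted(filenames):
        number = read_number(name)
        if number is None:
            raise ValueError('%s does not follow the R1/R2 or _1/_2 convention' % name)
        (r1 if number == 1 else r2).append(name)
    # Assumption: paired-end input should have as many R1 files as R2 files.
    if len(r1) != len(r2):
        raise ValueError('Unequal number of R1 (%d) and R2 (%d) files' % (len(r1), len(r2)))
    return r1, r2
```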

General Usage

Type toil-rnaseq-sc to get a basic help menu and instructions.

  1. Type toil-rnaseq-sc generate to create an editable manifest and config in the current working directory.
  2. Parameterize the pipeline by editing the config.
  3. Fill in the manifest with information pertaining to your samples.
  4. Type toil-rnaseq-sc run [jobStore] to execute the pipeline.

Example Commands

Run sample(s) locally using the manifest

  1. toil-rnaseq-sc generate
  2. Fill in config and manifest
  3. toil-rnaseq-sc run ./example-jobstore

Toil options can be appended to toil-rnaseq-sc run, for example: toil-rnaseq-sc run ./example-jobstore --retryCount=1 --workDir=/data

For a complete list of Toil options, type toil-rnaseq-sc run -h

Run a variety of samples locally

  1. toil-rnaseq-sc generate-config
  2. Fill in config
  3. toil-rnaseq-sc run ./example-jobstore --retryCount=1 --workDir=/data --samples \
         s3://example-bucket/sample_1.tar file:///full/path/to/sample_2.tar https://sample-depot.com/sample_3.tar
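When launching runs from a script, building the command as an argument list avoids shell quoting problems with long sample lists. The helper `build_run_command` below is hypothetical and not part of toil-rnaseq-sc; it only reassembles the subcommand, Toil options, and `--samples` flag in the same order as the example command.

```python
def build_run_command(job_store, sample_urls, toil_options=None):
    """Assemble the argv for `toil-rnaseq-sc run` with optional Toil flags."""
    argv = ['toil-rnaseq-sc', 'run', job_store]
    argv.extend(toil_options or [])
    # --samples goes last, as in the example, so its list of URLs
    # cannot be confused with the Toil options that precede it.
    if sample_urls:
        argv.append('--samples')
        argv.extend(sample_urls)
    return argv


cmd = build_run_command(
    './example-jobstore',
    ['s3://example-bucket/sample_1.tar', 'file:///full/path/to/sample_2.tar'],
    toil_options=['--retryCount=1', '--workDir=/data'],
)
```

Passing the list to `subprocess.check_call(cmd)` would then launch the pipeline, provided toil-rnaseq-sc is installed and on the PATH.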

Example Config

kallisto-index: s3://cgl-pipeline-inputs/rnaseq_cgl/kallisto_hg38.idx
output-dir: /data/my-toil-run
ssec: 
ci-test:
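The example config consists of flat key: value pairs. The sketch below parses that subset by hand so it needs no third-party library, which makes it handy for a sanity check before a long run. It is not how the pipeline itself loads the file, and treating kallisto-index and output-dir as the required keys is an assumption based only on those being the two keys filled in above. `parse_config` and `missing_keys` are hypothetical names.

```python
# Assumption: these are the keys a run cannot do without.
REQUIRED_KEYS = ('kallisto-index', 'output-dir')


def parse_config(text):
    """Parse flat 'key: value' lines; empty values become None."""
    config = {}
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        # Split at the first colon only, so URLs like s3://... stay intact.
        key, sep, value = line.partition(':')
        if not sep:
            raise ValueError('Expected "key: value", got %r' % line)
        config[key.strip()] = value.strip() or None
    return config


def missing_keys(config):
    """List required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]
```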

Distributed Run

To run on a distributed AWS cluster, see CGCloud for instance provisioning, then run toil-rnaseq-sc run aws:us-west-2:example-jobstore-bucket --batchSystem=mesos --mesosMaster mesos-master:5050 to use the AWS job store and mesos batch system.

Methods

Tools

All tool containers can be found on our quay.io account.

Reference Data

Tool Options

Contributors

jvivian, tpesout

Issues

Add consolidation step

run_single_cell should launch a childJob for the graphing and a followOn job that consolidates the outputs of the graphing step and the primary job into a single output tarball.

You'll need to use promises to pass the output of the child job to the follow-on.
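The job graph proposed in this issue can be sketched with Toil's `addChildJobFn`, `addFollowOnJobFn`, and `rv()` promises. The names `run_graphing` and `consolidate_outputs` are hypothetical, the signature of `run_single_cell` is assumed, and all bodies are placeholders; only the child, follow-on, and promise wiring is the point.

```python
# Sketch of the job graph proposed in the "Add consolidation step" issue.
# run_graphing and consolidate_outputs are hypothetical names and their
# bodies are placeholders; only the child/follow-on/promise wiring matters.
# No Toil import is needed here because Toil passes the `job` object in.


def run_graphing(job, matrix_id):
    """Child job: would draw QC plots and return their file store ID."""
    return 'plots-for-%s' % matrix_id  # placeholder value


def consolidate_outputs(job, uuid, matrix_id, plots_id):
    """Follow-on job: receives plots_id already resolved from the promise."""
    # Placeholder: a real version would pack both outputs into UUID.tar.gz.
    return {'uuid': uuid, 'matrix': matrix_id, 'plots': plots_id}


def run_single_cell(job, uuid):
    """Primary job: do the main work, then add a child and a follow-on."""
    matrix_id = 'matrix-for-%s' % uuid  # placeholder for the quantification output
    graph_job = job.addChildJobFn(run_graphing, matrix_id)
    # graph_job.rv() is a promise for run_graphing's return value. Follow-ons
    # start only after this job and all of its children have finished, so the
    # promise is resolved by the time consolidate_outputs runs.
    job.addFollowOnJobFn(consolidate_outputs, uuid, matrix_id, graph_job.rv())
```

Under Toil, the graph would be started from a root job, e.g. `Job.Runner.startToil(Job.wrapJobFn(run_single_cell, uuid), options)` with `options = Job.Runner.getDefaultOptions('./example-jobstore')`.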

Add tests

Model tests after toil-rnaseq or, preferably, just write plain test functions (example), since we'll be using py.test as the test runner and I really dislike Python's native unittest framework.

A nice convenience is that we can piggyback on the pachterlab test data, so we don't need to re-host anything in S3; just git clone it when running the tests.

Update README

Once we have CI set up, we'll be pushing this project to PyPI and will want an up-to-date README.
