Pandora

Identification and Discovery of Tumor Associated Microbes via RNAseq

Introduction

Pandora is a multi-step pipeline for finding pathogen sequences in RNAseq data. It includes modules for host separation, de novo assembly, BLAST searches of the assembled contigs, and ORF discovery. As input, Pandora takes paired FASTQ files; as output, it produces a report.

Dependencies

The following programs must be in your PATH:

Pandora depends on the following Python modules:

Workflow

Pandora is organized into subcommands, in the same way as git. The primary subcommand is scan, a pipeline comprising the following steps:

  1. Subtraction of reads mapping to the host genome
  2. De novo assembly of the remaining reads
  3. BLAST of the assembled contigs
  4. ORF search in contigs of unknown origin
  5. Filtering and parsing of the BLAST results into a tidy, human-readable report
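
To make step 4 concrete, here is a minimal sketch of a naive ORF scan over a single contig. It is not Pandora's implementation: the function names, the ATG-to-stop definition of an ORF, and the min_len threshold are all assumptions chosen for illustration.

```python
# Illustrative only: a naive six-frame ORF scan, not Pandora's actual code.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_len=30):
    """Return (strand, frame, start, end) for each ATG...stop ORF.

    Coordinates are 0-based and end-exclusive on the strand being scanned.
    min_len is a hypothetical cutoff in nucleotides, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Walk the frame codon by codon.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # open an ORF at the first start codon
                elif start is not None and codon in STOP_CODONS:
                    if i + 3 - start >= min_len:
                        orfs.append((strand, frame, start, i + 3))
                    start = None  # close the ORF and keep scanning
    return orfs
```

For example, find_orfs("CCATGAAATTTGGGTAACC", min_len=9) reports one ORF on the forward strand in frame 2, spanning positions 2 to 17.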

The aggregate subcommand [...].

Additional Files

Pandora requires various references and annotation files.

For scan step 1, please provide:

  • a host genome indexed for STAR
  • a host genome indexed for bowtie2
  • (optional) a gtf describing the genes of the host

For scan step 3, please provide:

  • the BLAST nucleotide collection nt database

For scan step 4, you can optionally provide:

  • the BLAST protein collection nr database

For scan step 5, you can optionally provide:

  • a text file of "blacklist" non-pathogen taxids for filtering. If you do not provide one, the script uses resources/blacklist.txt by default. This list contains every taxid descended from the nodes Chordata (Taxonomy ID: 7711) or "other sequences" (Taxonomy ID: 28384).
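
The blacklist filter in step 5 can be sketched as follows. This is a minimal illustration, not Pandora's code: it assumes the blacklist file holds one taxid per line, and the hit dictionaries with "contig" and "taxid" keys are a hypothetical stand-in for parsed BLAST results.

```python
def parse_blacklist(lines):
    """Collect taxids from an iterable of lines.

    Assumed format: one taxid per line; blank lines and lines
    starting with '#' are skipped.
    """
    taxids = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            taxids.add(line)
    return taxids


def filter_hits(hits, blacklist):
    """Keep only the hits whose taxid is not in the blacklist."""
    return [hit for hit in hits if hit["taxid"] not in blacklist]


# Hypothetical example data; 9606 is the taxid of Homo sapiens.
blacklist = parse_blacklist(["9606\n", "\n", "# host species\n", "10090\n"])
hits = [
    {"contig": "contig_1", "taxid": "9606"},
    {"contig": "contig_2", "taxid": "10376"},
]
kept = filter_hits(hits, blacklist)
```

With a real file, the same function can be fed an open file handle, for example parse_blacklist(open("resources/blacklist.txt")).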

Because there are a considerable number of files involved, you can specify their paths with a configuration file instead of command line flags. See pandora.config.txt for example formatting. Note that options specified as flags take precedence over options specified via the configuration file.
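
The precedence rule can be sketched as below. This is an illustration under stated assumptions rather than Pandora's own logic: the "key = value" layout is a guess at the configuration format (see pandora.config.txt for the real one), and parse_config, merge_options and build_parser are hypothetical helper names. Only the flags --refstar, --refbowtie and -db are taken from the usage examples.

```python
import argparse


def parse_config(lines):
    """Parse 'key = value' lines (assumed format), ignoring '#' comments."""
    options = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


def merge_options(config, flags):
    """Overlay command line flags on config values; flags win when set."""
    merged = dict(config)
    merged.update({k: v for k, v in flags.items() if v is not None})
    return merged


def build_parser():
    """A reduced parser with three of the flags from the usage examples."""
    parser = argparse.ArgumentParser(prog="example")
    parser.add_argument("--refstar")    # defaults to None when absent
    parser.add_argument("--refbowtie")
    parser.add_argument("-db")
    return parser


config = parse_config(["refstar = /cfg/STAR", "db = /cfg/nt", "# a comment"])
flags = vars(build_parser().parse_args(["-db", "/cli/nt"]))
merged = merge_options(config, flags)
```

Leaving every flag's default as None is what lets the merge tell "not given on the command line" apart from a real value.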

Usage Examples

Here is an example command using command line flags only:

pandora.py scan -id patient1 -r1 mate_1.fastq.gz -r2 mate_2.fastq.gz --gzip --refstar /path/ref/STAR --refbowtie /path/ref/bowtie/hg19 -db /path/ref/blastdb/nt

Here is an example command using a configuration file:

pandora.py scan -id patient1 -r1 mate_1.fastq.gz -r2 mate_2.fastq.gz --gzip --verbose -c pandora.config.txt

Notes

Currently, Pandora uses the Oracle Grid Engine (formerly Sun Grid Engine, SGE) by default. BLAST is computationally intensive and embarrassingly parallel, so it is well suited to cluster computing. You can turn this off with the --noSGE flag, but BLAST will then be very slow.
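
BLAST is embarrassingly parallel because each contig can be searched independently of the others. A minimal sketch of the idea, assuming the contigs arrive as FASTA text, is to split them into chunks that separate cluster jobs could each process. The helper names and the round-robin split are illustrative assumptions, not how Pandora divides its work.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, parts = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif line and header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def split_round_robin(records, n_chunks):
    """Deal records into at most n_chunks lists, dropping empty ones."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return [chunk for chunk in chunks if chunk]


# Hypothetical contigs; each chunk could become one BLAST job.
fasta = [">c1", "ACGT", "AC", ">c2", "GGGG", ">c3", "TTTT"]
chunks = split_round_robin(read_fasta(fasta), 2)
```

Each chunk could then be written to its own FASTA file and submitted as a separate job, with the results concatenated afterwards.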

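Note that RNAseq libraries enriched for poly-A transcripts will largely miss prokaryotic pathogens, because bacterial mRNA generally lacks the stable poly-A tails that this enrichment selects for.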

Status: Active Development

Contributors

szairis, ioanfilip2
