The art of decorating a phage genome by gluing feature cutouts into it.

Home Page: https://labs.voorloop.com/decouphage/

License: MIT License


Decouphage: the art of decorating a Phage genome by gluing feature cutouts into it.

As the name suggests, decouphage is a tool designed to annotate phage genomes. Its only external dependency is ncbi-blast+; everything else is optional.

Table of contents

  1. Highlights
  2. Validation
  3. How can I use decouphage
  4. Installation
  5. Databases

Highlights

  • Can be easily installed on Linux or Mac computers. The only requirement is ncbi-blast+.
  • Uses phanotate for ORF calling by default; prodigal can be used instead.
  • Decouphage is fast: on a MacBook, most phage genomes can be annotated in less than a minute.
  • Uses the NCBI NR database, which contains non-identical sequences from GenBank CDS translations, PDB, Swiss-Prot, PIR, and PRF.
  • Allows manual curation using the web interface.

[Figure: decouphage web interface]

Validation

Decouphage was validated against RAST (Rapid Annotation using Subsystem Technology), a tool that is often praised for its prokaryotic annotation capabilities.

Decouphage outperforms RAST when calling some of the most relevant product categories:

[Figure: product categories called by Decouphage and by RAST]

The CDS annotation agreement between Decouphage and RAST is high, reaching up to 94% for some products:

Enzyme             Agreement rate with RAST
endonuclease       94%
exonuclease        58%
helicase           70%
hydrolase          73%
kinase             86%
ligase             94%
methyltransferase  65%
polymerase         76%
primase            78%
protease           85%
recombinase        28%
reductase          90%
synthase           84%
terminase          94%
transferase        60%

A precise comparison of product-to-position is difficult given differences in spelling, typos, synonyms, and interchangeable names, but the table above can give a good idea of the similarities.
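
One plausible way to compute a per-keyword agreement rate is sketched below. It assumes both tools annotated the same CDS list, so products can be paired by index, and it counts a match when both product strings contain the keyword, ignoring case. The function name `agreement_rate`, this definition, and the toy data are illustrative assumptions; the exact formula behind the table above is not stated.

```python
def agreement_rate(rast_products, decouphage_products, keyword):
    """Fraction of CDS that RAST labels with `keyword` where decouphage agrees.

    Assumes both lists describe the same CDS in the same order.
    Returns None when RAST never uses the keyword.
    """
    if len(rast_products) != len(decouphage_products):
        raise ValueError("both lists must describe the same CDS")
    keyword = keyword.lower()
    rast_hits = 0
    both_hits = 0
    for rast, deco in zip(rast_products, decouphage_products):
        if keyword in rast.lower():
            rast_hits += 1
            if keyword in deco.lower():
                both_hits += 1
    if rast_hits == 0:
        return None
    return both_hits / rast_hits


# Toy data, invented for illustration only.
rast = ["DNA helicase", "Phage terminase, large subunit",
        "hypothetical protein", "helicase"]
deco = ["helicase", "terminase large subunit",
        "DNA ligase", "hypothetical protein"]

print(agreement_rate(rast, deco, "helicase"))   # 0.5
print(agreement_rate(rast, deco, "terminase"))  # 1.0
```

Substring matching like this is exactly where spelling variants and synonyms cause undercounting, which is why the rates are best read as approximate.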

To corroborate the surplus of annotations that decouphage achieves, the number of products labeled "hypothetical protein" and "phage protein" was also counted:

Product               Decouphage  RAST  Agreement rate with RAST
hypothetical protein  3945        6302  53%
phage protein         0           1626  N/A [1]
Total products        9692        9692  N/A [2]

  1. Decouphage does not include products containing "phage protein", as they are usually a source of noise.
  2. The GenBank file generated by RAST was used as input for decouphage to ensure no difference in the number of CDS.

This table shows that Decouphage potentially assigns roughly twice as many meaningful products as RAST when annotating a phage genome.
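
The ratio can be rechecked from the counts in the table. The short calculation below is a sketch: the source does not say which labels were treated as uninformative, so two readings are shown, and the helper name `informative` is hypothetical.

```python
TOTAL_PRODUCTS = 9692  # same CDS set for both tools

# Counts copied from the table above.
counts = {
    "decouphage": {"hypothetical protein": 3945, "phage protein": 0},
    "rast": {"hypothetical protein": 6302, "phage protein": 1626},
}


def informative(tool, uninformative_labels):
    """Products left after removing the labels considered uninformative."""
    removed = sum(counts[tool][label] for label in uninformative_labels)
    return TOTAL_PRODUCTS - removed


# Reading 1: only "hypothetical protein" is uninformative.
labels_1 = ["hypothetical protein"]
ratio_1 = informative("decouphage", labels_1) / informative("rast", labels_1)

# Reading 2: "phage protein" is uninformative as well.
labels_2 = ["hypothetical protein", "phage protein"]
ratio_2 = informative("decouphage", labels_2) / informative("rast", labels_2)

print(f"{ratio_1:.2f}")  # 1.70  (5747 / 3390)
print(f"{ratio_2:.2f}")  # 3.26  (5747 / 1764)
```

Under the first reading the ratio is close to the stated factor of two; under the second it is larger.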

How can I use decouphage

Options

Usage: decouphage [OPTIONS] INPUT_FILE

Options:
  --prodigal             Use prodigal for orf calling instead of phanotate.
  -d, --db PATH
  -o, --output TEXT
  -t, --threads INTEGER  [default: 1]
  --tmpdir TEXT          Folder for intermediate files.
  --no_orf_calling       Annotate CDS from genbank file.
  --locus_tag TEXT       Locus tag prefix.
  --download_db          Download default database.
  -v, --verbose          More verbose logging for debugging purpose.
  --help                 Show this message and exit.

I want to discover and annotate a lot of ORFs

decouphage genome.fasta -o genome.gb

I want to use prodigal to find my genes

decouphage genome.fasta -o genome.gb --prodigal

I have a GenBank file with poor annotation and want more

In this mode decouphage reuses the ORFs already present in the GenBank file and only runs the annotation procedure.

decouphage genome.gbk -o genome.gb --no_orf_calling
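
To annotate several genomes in one go, the commands above can be assembled from a script. The following is a minimal sketch that uses only flags listed in the help text (spelled as they appear there); the helper name `build_command`, its parameter names, and the input file names are hypothetical.

```python
import shlex
import subprocess  # only needed if the run() call below is enabled
from pathlib import Path


def build_command(input_file, output, threads=1, prodigal=False,
                  reuse_genbank_orfs=False, db=None):
    """Build a decouphage argument list from the documented options."""
    cmd = ["decouphage", str(input_file), "-o", str(output), "-t", str(threads)]
    if prodigal:
        cmd.append("--prodigal")
    if reuse_genbank_orfs:
        cmd.append("--no_orf_calling")
    if db is not None:
        cmd += ["--db", str(db)]
    return cmd


# Hypothetical input files, for illustration only.
for name in ["phage_a.fasta", "phage_b.fasta"]:
    cmd = build_command(name, Path(name).with_suffix(".gb"), threads=4)
    print(shlex.join(cmd))
    # subprocess.run(cmd, check=True)  # uncomment to actually run decouphage
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.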

Installation

You have multiple options to install and run decouphage:

Ubuntu

Install decouphage:

pip install decouphage

(Required) Install ncbi-blast+

apt install ncbi-blast+

(Optional) Install dependencies:

apt install prodigal trnascan-se

Docker

Run with Docker (already includes dependencies and databases):

docker run decouphage/decouphage

Databases

The decouphage database is derived from the NCBI NR database, clustered at 90% identity and 90% sequence length.

Downloading database

Download the database to the default location, $HOME/.decouphage/db/:

decouphage --download_db

Making custom databases

Build a BLAST protein database from a FASTA file:

makeblastdb -in database.fa -parse_seqids -blastdb_version 5 -dbtype prot
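
Because -parse_seqids makes makeblastdb parse the identifier in every FASTA header, repeated identifiers will typically cause the build to fail. The sketch below is a simple pre-check for that case; the helper name `duplicate_fasta_ids` and the example records are hypothetical.

```python
from collections import Counter


def duplicate_fasta_ids(lines):
    """Return the sequence IDs that appear in more than one FASTA header.

    The ID is taken as the first whitespace-delimited token after '>'.
    """
    ids = Counter(
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and len(line.strip()) > 1
    )
    return sorted(seq_id for seq_id, n in ids.items() if n > 1)


# Invented records, for illustration only.
example = [
    ">seq1 terminase", "MKT",
    ">seq2 helicase", "MAA",
    ">seq1 duplicate entry", "MKK",
]
print(duplicate_fasta_ids(example))  # ['seq1']

# For a real file, pass the open handle instead of a list:
# with open("database.fa") as handle:
#     print(duplicate_fasta_ids(handle))
```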
