Coder Social home page Coder Social logo

ivon's Introduction

IVON: Integrated Viral Ontology Network

alt text

What is IVON?

IVON is a systematic classification system for viruses. By relying on only the genomic sequence of the query virus and its similarity to known genes (present in custom databases), IVON can assign the query virus a barcode. Each position within the barcoe represents a distinct viral feature from the type of polymerase present to method of interaction with its host. This sequence similarity based approach removes the methodological issues associated with current viral taxonomy such as morphology and host type and instead provides viral groupings based entirely on feature similarity.

Installing

IVON is not a program or tool but a framework for the annotation of viruses, therefore no installation is required. To run IVON download this repository and the Example can be run straight away with your local sequence aligner of choice.

Before running IVON on novel genomes we suggest you running the Database_update.sh script in the Databases folder using the code sh Database_update.sh. This will download up to date versions of each of the databases IVON requires that cannot be provided via GitHub due to size limits. We also suggest citing the date of download when publishing with IVON so that the exact databases can be retrieved and experiments replicated.

Barcode value table

This table identifies which databases require a match for each of the barcode values to be assigned. This therefore allows researchers to better understand the IVON barcoding system.

An example of a barcode is 1.0.0.0.1.0.12.0 which is the barcode for HIV. The first value indicates the presence of an RNA-directed DNA polymerase complex. Each 0 indicates the lack of any identified protein associated with that process, in this case that includes integrase, capsid proteins, tail proteins, host entry and host exit processes. The 1 in the fifth position indicates the presence of a viral envelope and the 12 in the seventh position identifies HIV-1 as interacting with its hsot by modulating host morphology or physiology, negative regulation of viral processes and regulating the hosts viral response.

Databases

Each of the IVON databases was originally named after a key viral process in its lifecycle, grouped into 8 catagories making up the barcodes. Each viral process was original named based on its GO term, however inconsistancy between databases has led to alteration to these terms being used for each of the databases e.g. DNA-directed DNA polymerase complex has been replaced with DNA-directed DNA polymerase activity. This allows for utilisation of the UNIPROT database for updating of databases.

IVON annotation

For the annotation of a viral genome against the IVON provided databases we suggest using the python code provided below. This script will annotate yours genome using DIAMOND (https://github.com/bbuchfink/diamond), however the script can easily be adapted to run your local sequence aligner (LSA) of choice.

Before its use, you must convert each of the FASTA database files into the database format for your LSA of choice, in this example DIAMOND and so *.dmnd files are used within a folder called DIAMOND_db. With DIAMOND this can be achieved using the diamond makedb --in DATABASE.fa -d DATABASE. Your genome must also have been passed through a gene predictor such as Prodigal (https://github.com/hyattpd/Prodigal). This will provide a fasta file containing coding regions that can be used as input by IVON.

import glob
import sys
import subprocess

novel_genome = 'GENOME_CDS.fa'

for cfile in glob.glob('DIAMOND_db/*.dmnd'):
    print 'Comparing genome to... ' + cfile.split('/')[1].replace('.dmnd','')
    sys.stdout.flush()
    needed = cfile.replace('.dmnd','')
    bashCommand =  'diamond blastx -d ' + needed + ' -q ' + novel_genome + ' -a ' + novel_genome + '-OUTPUT/' + cfile.split('/')[1].split('.')[0] + '-RAN '
    process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
    output, error = process.communicate()
    #print output
    
    bashCommand = 'diamond view -a ' + novel_genome + '-OUTPUT/' + cfile.split('/')[1].split('.')[0] + '-RAN -o ' + novel_genome + '-OUTPUT/' + cfile.split('/')[1].split('.')[0] + '-RAN.m8'
    process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
    output, error = process.communicate()    
    

Once the genomes coding regions have been annotated against the IVON databases IVON_barcoder.py can be run to assign the viral genome an IVON barcode. IVON_barcoder.py can be run as shown below;

python IVON_barcoder.py -i GENOME_CDS.fa -f GENOME_CDS.fa-OUTPUT

This will then provide a barcode such as 1.0.0.0.1.0.12.0 (barcode for HIV) that can be compared to other barcoded viral species using the NCBI_viral_species_barcode.txt file.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.