IVON is a systematic classification system for viruses. By relying on only the genomic sequence of the query virus and its similarity to known genes (present in custom databases), IVON can assign the query virus a barcode. Each position within the barcoe represents a distinct viral feature from the type of polymerase present to method of interaction with its host. This sequence similarity based approach removes the methodological issues associated with current viral taxonomy such as morphology and host type and instead provides viral groupings based entirely on feature similarity.
IVON is not a program or tool but a framework for the annotation of viruses, therefore no installation is required. To run IVON download this repository and the Example can be run straight away with your local sequence aligner of choice.
Before running IVON on novel genomes we suggest you running the Database_update.sh
script in the Databases
folder using the code sh Database_update.sh
. This will download up to date versions of each of the databases IVON requires that cannot be provided via GitHub due to size limits. We also suggest citing the date of download when publishing with IVON so that the exact databases can be retrieved and experiments replicated.
This table identifies which databases require a match for each of the barcode values to be assigned. This therefore allows researchers to better understand the IVON barcoding system.
An example of a barcode is 1.0.0.0.1.0.12.0
which is the barcode for HIV. The first value indicates the presence of an RNA-directed DNA polymerase complex. Each 0
indicates the lack of any identified protein associated with that process, in this case that includes integrase, capsid proteins, tail proteins, host entry and host exit processes. The 1
in the fifth position indicates the presence of a viral envelope and the 12
in the seventh position identifies HIV-1 as interacting with its hsot by modulating host morphology or physiology, negative regulation of viral processes and regulating the hosts viral response.
Each of the IVON databases was originally named after a key viral process in its lifecycle, grouped into 8 catagories making up the barcodes. Each viral process was original named based on its GO term, however inconsistancy between databases has led to alteration to these terms being used for each of the databases e.g. DNA-directed DNA polymerase complex has been replaced with DNA-directed DNA polymerase activity. This allows for utilisation of the UNIPROT database for updating of databases.
For the annotation of a viral genome against the IVON provided databases we suggest using the python code provided below. This script will annotate yours genome using DIAMOND (https://github.com/bbuchfink/diamond), however the script can easily be adapted to run your local sequence aligner (LSA) of choice.
Before its use, you must convert each of the FASTA database files into the database format for your LSA of choice, in this example DIAMOND and so *.dmnd
files are used within a folder called DIAMOND_db
. With DIAMOND this can be achieved using the diamond makedb --in DATABASE.fa -d DATABASE
. Your genome must also have been passed through a gene predictor such as Prodigal (https://github.com/hyattpd/Prodigal). This will provide a fasta file containing coding regions that can be used as input by IVON.
import glob
import sys
import subprocess
novel_genome = 'GENOME_CDS.fa'
for cfile in glob.glob('DIAMOND_db/*.dmnd'):
print 'Comparing genome to... ' + cfile.split('/')[1].replace('.dmnd','')
sys.stdout.flush()
needed = cfile.replace('.dmnd','')
bashCommand = 'diamond blastx -d ' + needed + ' -q ' + novel_genome + ' -a ' + novel_genome + '-OUTPUT/' + cfile.split('/')[1].split('.')[0] + '-RAN '
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
#print output
bashCommand = 'diamond view -a ' + novel_genome + '-OUTPUT/' + cfile.split('/')[1].split('.')[0] + '-RAN -o ' + novel_genome + '-OUTPUT/' + cfile.split('/')[1].split('.')[0] + '-RAN.m8'
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
Once the genomes coding regions have been annotated against the IVON databases IVON_barcoder.py
can be run to assign the viral genome an IVON barcode. IVON_barcoder.py
can be run as shown below;
python IVON_barcoder.py -i GENOME_CDS.fa -f GENOME_CDS.fa-OUTPUT
This will then provide a barcode such as 1.0.0.0.1.0.12.0
(barcode for HIV) that can be compared to other barcoded viral species using the NCBI_viral_species_barcode.txt
file.