VirSorter

This is not the source code of the VirSorter App (which is available on CyVerse). It is a fork of the VirSorter repository, cleaned up a touch so it runs more easily outside of Docker. If you would like to check out the real VirSorter App, simply head over to Big Simon's repo.

The inspiration for me to fork this repository was to incorporate it into the Baby Virome pipeline, a lightweight and (somewhat) scalable virome (viral metagenome) analysis pipeline.

The only modifications you'll see in this repository are meant to improve the VirSorter code base's running time and command-line documentation (I hope). Oh, and to remove all of the Docker-related features and documentation. It's really difficult to run Docker on Linux systems at your institution or company because they won't dish out those sweet, sweet sudo privileges (even if you have a PhD). And yeah, you can run Docker without sudo, but good luck getting your IT department on board with that.

Publication

Result files

The main output files of VirSorter are:

File                                         Description
VIRSorter_global-phage-signal.csv            Comma-separated table listing the viral predictions from VirSorter (one row per prediction).
Metrics_files/VIRSorter_affi-contigs.tab     Pipe-delimited table listing the annotation of all predicted ORFs in all contigs. More details below.
Predicted_viral_sequences/                   FASTA and GenBank files of predicted viral sequences.
Fasta_files/                                 Intermediary files, including predicted proteins.
Tab_files/                                   Intermediary files, including results of the search against PFAM and the virus database.

More details on the VIRSorter_affi-contigs.tab file: lines starting with ">" are headers, i.e. information about the contig (contig name, number of genes, and "c" for circular or "l" for linear). All other lines describe the genes, with the following columns: gene name, start, stop, length, strand, hit in the virus protein cluster database, hit score, hit e-value, category of the virus protein cluster (see below), hit in PFAM, hit score, hit e-value.
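As a rough illustration of the layout described above, the Python sketch below reads VIRSorter_affi-contigs.tab into per-contig records. It is not part of VirSorter: the `Contig` class, the `parse_affi_contigs` function and the column keys are hypothetical names, and the sketch assumes that "|" only ever appears as the delimiter inside gene lines and that each header line is exactly name|gene count|c-or-l.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List

# Hypothetical column names, in the order listed above (not official VirSorter identifiers).
GENE_COLUMNS = [
    "gene", "start", "stop", "length", "strand",
    "virus_hit", "virus_score", "virus_evalue", "virus_category",
    "pfam_hit", "pfam_score", "pfam_evalue",
]


@dataclass
class Contig:
    name: str
    n_genes: int
    topology: str  # "c" = circular, "l" = linear
    genes: List[Dict[str, Any]] = field(default_factory=list)


def parse_affi_contigs(lines: Iterable[str]) -> List[Contig]:
    """Group gene lines under the most recent ">" header line."""
    contigs: List[Contig] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Assumed header layout: >name|gene_count|c-or-l
            # rsplit keeps any "|" inside the contig name attached to the name.
            name, n_genes, topology = line[1:].rsplit("|", 2)
            contigs.append(Contig(name, int(n_genes), topology))
            continue
        if not contigs:
            raise ValueError("gene line found before any contig header")
        gene: Dict[str, Any] = dict(zip(GENE_COLUMNS, line.split("|")))
        for key in ("start", "stop", "length"):
            gene[key] = int(gene[key])  # coordinates and length are integers
        contigs[-1].genes.append(gene)
    return contigs
```

For example, `parse_affi_contigs(open("Metrics_files/VIRSorter_affi-contigs.tab"))` would return one `Contig` per header line. Score, e-value and category fields are deliberately left as strings, since a gene without a hit may carry a placeholder instead of a number.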

The categories of virus clusters represent the range of genomes in which each virus cluster was detected: 0: hallmark gene found in Caudovirales, 1: non-hallmark gene found in Caudovirales, 2: non-hallmark gene found exclusively in virome(s), 3: hallmark gene not found in Caudovirales, 4: non-hallmark gene not found in Caudovirales.
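The five codes can be captured in a small lookup, for example when tallying hallmark genes per contig. The Python sketch below is illustrative only: the enum member names and the helpers `parse_category` and `count_hallmark` are hypothetical, and treating a non-numeric value (such as a placeholder for "no hit") as missing is an assumption about the file, not documented VirSorter behavior.

```python
from enum import IntEnum
from typing import Iterable, Optional


class VirusClusterCategory(IntEnum):
    """Category codes 0-4 as described above (member names are illustrative)."""
    HALLMARK_CAUDOVIRALES = 0
    NON_HALLMARK_CAUDOVIRALES = 1
    NON_HALLMARK_VIROME_ONLY = 2
    HALLMARK_NOT_CAUDOVIRALES = 3
    NON_HALLMARK_NOT_CAUDOVIRALES = 4

    @property
    def is_hallmark(self) -> bool:
        # Only categories 0 and 3 are hallmark genes.
        return self in (
            VirusClusterCategory.HALLMARK_CAUDOVIRALES,
            VirusClusterCategory.HALLMARK_NOT_CAUDOVIRALES,
        )


def parse_category(value: str) -> Optional[VirusClusterCategory]:
    """Map a category column value to an enum member.

    Returns None for non-numeric values (assumed to mean "no virus-cluster hit");
    raises ValueError for numbers outside 0-4.
    """
    value = value.strip()
    if not value.isdigit():
        return None
    return VirusClusterCategory(int(value))


def count_hallmark(values: Iterable[str]) -> int:
    """Count hallmark genes among a contig's category column values."""
    count = 0
    for value in values:
        category = parse_category(value)
        if category is not None and category.is_hallmark:
            count += 1
    return count
```

Using an `IntEnum` keeps the raw numeric code available (`int(category)`) while giving each code a readable name.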

Dependencies

Check out the INSTALL.md file.

Data Container

The 12 GB of reference data that VirSorter depends on lives in a separate data container called "virsorter-data".

This is the Dockerfile for that:

FROM perl:latest
LABEL maintainer="Ken Youens-Clark <[email protected]>"
COPY Generic_ref_file.refs /data/
COPY PFAM_27 /data/PFAM_27
COPY Phage_gene_catalog /data/Phage_gene_catalog
COPY Phage_gene_catalog_plus_viromes /data/Phage_gene_catalog_plus_viromes
COPY VirSorter_Readme.txt /data/
COPY VirSorter_Readme_viromes.txt /data/
VOLUME ["/data"]

Then build the image and create the data container:

$ docker build -t kyclark/virsorter-data .
$ docker create --name virsorter-data kyclark/virsorter-data /bin/true

Authors

Simon Roux ([email protected]) is the author of VirSorter.

Rev DJN 26Jan2018
