HIV-TRACE

HIV-TRACE is an application that identifies potential transmission clusters within a supplied FASTA file with an option to find potential links against the Los Alamos HIV Sequence Database.

Installation

System Dependencies

  • gcc >= 6.0.0
  • python3 >= 3.5.1
  • tn93 >= 1.0.6

HIV-TRACE requires both tn93 and python3 to be installed.

Install using pip

pip3 install biopython
pip3 install numpy
pip3 install scipy
pip3 install cython
pip3 install hivtrace

Tested with Python 3.5.1, 3.5.3, 3.6.1, 3.9.*, 3.11.

Example Usage

hivtrace -i ./INPUT.FASTA -a resolve -r HXB2_prrt -t .015 -m 500 -g .05 -c
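
If the example above needs to be launched from a Python script, the argument list can be assembled as in the following sketch. The helper name `build_hivtrace_command` and its keyword names are illustrative assumptions, not part of hivtrace; only the command-line flags come from this README.

```python
import shlex

def build_hivtrace_command(input_fasta, ambiguities="resolve", reference="HXB2_prrt",
                           threshold=0.015, min_overlap=500, fraction=0.05,
                           compare=False):
    """Assemble the argv list for the example invocation (hypothetical helper)."""
    cmd = [
        "hivtrace",
        "-i", input_fasta,
        "-a", ambiguities,
        "-r", reference,
        "-t", str(threshold),
        "-m", str(min_overlap),
        "-g", str(fraction),
    ]
    if compare:
        cmd.append("-c")  # compare against public sequences
    return cmd

cmd = build_hivtrace_command("./INPUT.FASTA", compare=True)
print(" ".join(shlex.quote(arg) for arg in cmd))

# To actually run it (requires hivtrace and tn93 to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when the input path contains spaces.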

Options Summary

-i --input

A FASTA file of nucleotide sequences to be analyzed. Each sequence will be aligned to the chosen reference sequence prior to network inference. Sequence names may include munged attributes, e.g. ISOLATE_XYZ|2005|SAN DIEGO|MSM
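
A minimal sketch of how such a munged sequence name could be split back into attributes. The field names (`isolate`, `year`, `location`, `risk_group`) are assumptions made for illustration; this README does not define what each position means.

```python
def parse_header(name, fields=("isolate", "year", "location", "risk_group"), sep="|"):
    """Split a delimited sequence name into a dict of attributes.

    The field names are hypothetical labels, not something hivtrace defines.
    """
    return dict(zip(fields, name.split(sep)))

attrs = parse_header("ISOLATE_XYZ|2005|SAN DIEGO|MSM")
print(attrs)
```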

-a --ambiguities

Handle ambiguous nucleotides using one of the following strategies.

Option     Description
resolve    count any resolutions that match as a perfect match
average    average all possible resolutions
skip       skip all positions with ambiguities
gapmm      count character-gap positions as 4-way mismatches, otherwise same as average

For more details, please see the MBE paper.
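
To illustrate how the resolve, average, and skip strategies differ, the sketch below scores a single aligned site using IUPAC ambiguity codes. It is a deliberately simplified mismatch score, not the distance that tn93 computes, and the function name `site_mismatch` is an assumption. The gapmm strategy is omitted.

```python
# IUPAC nucleotide codes mapped to the bases they can resolve to.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}

def site_mismatch(a, b, strategy):
    """Mismatch contribution of one site (0.0 to 1.0), or None if skipped."""
    ra, rb = IUPAC[a], IUPAC[b]
    ambiguous = len(ra) > 1 or len(rb) > 1
    if strategy == "skip":
        # Ignore any site where either character is ambiguous.
        return None if ambiguous else float(a != b)
    shared = len(ra & rb)  # resolutions the two characters have in common
    if strategy == "resolve":
        # Any matching resolution counts as a perfect match.
        return 0.0 if shared else 1.0
    if strategy == "average":
        # Fraction of all resolution pairs that disagree.
        return 1.0 - shared / (len(ra) * len(rb))
    raise ValueError("unknown strategy: " + strategy)

for strategy in ("resolve", "average", "skip"):
    print(strategy, site_mismatch("R", "A", strategy))
```

For the pair R (A or G) against A, resolve treats the site as a match, average counts half a mismatch, and skip drops the site.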

-r --reference

The reference sequence to which all provided sequences will be aligned. It is assumed that the input sequences are in fact homologous to the reference and do not have too much indel variation.

Option              Description
HXB2_vif            Viral Infectivity Factor
HXB2_vpu            Viral Protein U
HXB2_int
HXB2_vpr            Viral Protein R
HXB2_pr
HXB2_pol            The genomic region encoding the viral enzymes protease, reverse transcriptase, and integrase
HXB2_tat            Transactivator of HIV gene expression
HXB2_rt
NL4-3_prrt
HXB2_prrt
HXB2_nef            27-kd myristoylated protein produced by an ORF located at the 3' end of primate lentiviruses
HXB2_gag            The genomic region encoding the capsid proteins (group specific antigens)
HXB2_env            Viral glycoproteins produced as a precursor (gp160)
HXB2_rev            The second necessary regulatory factor for HIV expression
Path/to/FASTA/file  Path to a custom reference file

Please refer to the landmarks of the HIV-1 genome if the presets are unfamiliar.

-t --threshold

Two sequences will be connected with a putative link (subject to filtering, see below), if and only if their pairwise distance does not exceed this threshold.
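
The threshold rule can be sketched as follows, assuming pairwise distances are already available as a dictionary keyed by sequence pairs. The distances and sequence names here are made up for illustration, and the filtering steps mentioned above are not applied.

```python
def links_within_threshold(distances, threshold):
    """Return the pairs whose distance does not exceed the threshold."""
    return [pair for pair, d in sorted(distances.items()) if d <= threshold]

# Hypothetical pairwise distances, for illustration only.
distances = {
    ("s1", "s2"): 0.010,
    ("s1", "s3"): 0.015,
    ("s2", "s3"): 0.020,
}
links = links_within_threshold(distances, 0.015)
print(links)
```

Note that the comparison is inclusive: a pair exactly at the threshold is still linked.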

-m --minoverlap

Only pairs of sequences that overlap by at least this many non-gap characters will be included in distance calculations. Be sure to adjust this based on the length of the input sequences. You should aim to have at least 2/(distance threshold) aligned characters.
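
As a worked example of the 2/(distance threshold) guideline, the sketch below computes the smallest whole number of aligned characters that satisfies it. The function name is an assumption; rounding up is simply the conservative choice.

```python
import math

def suggested_min_overlap(threshold):
    """Smallest integer overlap that is at least 2 / threshold."""
    return math.ceil(2 / threshold)

overlap = suggested_min_overlap(0.015)
print(overlap)
```

With the threshold of .015 used in the example invocation, 2/.015 is about 133.3, so at least 134 aligned characters are needed; the -m 500 setting in that example comfortably exceeds this.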

-g --fraction

Affects only the resolve option for handling ambiguities. Any sequence whose proportion of ambiguous nucleotides does not exceed the selected value [0 - 1] will have its ambiguities resolved (if possible), and ambiguities in sequences with higher fractions of them will be averaged. This mitigates spurious linkages due to highly ambiguous sequences.
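
A rough sketch of the per-sequence decision described above. It assumes the fraction is counted over non-gap characters and that any character other than A, C, G, or T is ambiguous; how hivtrace and tn93 count this internally may differ.

```python
def ambiguity_fraction(seq):
    """Fraction of non-gap characters that are not A, C, G, or T (assumed definition)."""
    residues = [c for c in seq.upper() if c != "-"]
    if not residues:
        return 0.0
    ambiguous = sum(1 for c in residues if c not in "ACGT")
    return ambiguous / len(residues)

def strategy_for(seq, max_fraction):
    """Resolve ambiguities at or below the cutoff, otherwise average them."""
    return "resolve" if ambiguity_fraction(seq) <= max_fraction else "average"

print(strategy_for("ACGTACGTACGTACGTACGR", 0.05))  # 1 of 20 characters ambiguous
print(strategy_for("ACGTACGTAR", 0.05))            # 1 of 10 characters ambiguous
```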

-u --curate

Screen for contaminants by marking or removing sequences that cluster with any of the contaminant IDs.

Option      Description
remove      Remove spurious edges from the inferred network
report      Flag all sequences sharing a cluster with the reference
separately  Flag all sequences and report them via secondary tn93 command
none        Do nothing

-f --filter

Use a phylogenetic test of conditional independence on each triangle in the network to remove spurious transitive connections which make A->B->C chains look like A-B-C triangles.

Option  Description
remove  removes spurious transitive connections
report  reports spurious transitive connections
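
The filter operates on triangles in the network. The sketch below only shows how the candidate triangles could be enumerated from an edge list; it does not implement the phylogenetic test itself, and the function name and edge list are illustrative assumptions.

```python
from itertools import combinations

def triangles(edges):
    """Return all node triples in which every pair is connected by an edge."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    found = []
    for a, b, c in combinations(sorted(neighbors), 3):
        if b in neighbors[a] and c in neighbors[a] and c in neighbors[b]:
            found.append((a, b, c))
    return found

# Hypothetical network: A-B-C form a triangle, D hangs off C.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
print(triangles(edges))
```

Checking every triple is cubic in the number of nodes, which is fine for a small illustration but would be too slow for large networks.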

-s --strip_drams

Masks known DRAM (Drug Resistance-Associated Mutation) positions in the provided sequences.

Option   Description
lewis    Mask (with ---) the list of codon sites defined in Lewis et al.
wheeler  Mask (with ---) the list of codon sites defined in Wheeler et al.
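
Masking a codon site with --- can be pictured as in the sketch below. The codon numbers used are arbitrary placeholders, not the actual Lewis et al. or Wheeler et al. site lists, and the sequence is assumed to be in frame, starting at codon 1.

```python
def mask_codons(seq, codon_sites):
    """Replace each listed codon (1-based, in frame) with '---'."""
    chars = list(seq)
    for site in codon_sites:
        start = (site - 1) * 3
        if start + 3 <= len(chars):  # ignore sites beyond the sequence end
            chars[start:start + 3] = "---"
    return "".join(chars)

# Placeholder codon sites, for illustration only.
masked = mask_codons("ATGGCCAAATTT", [2, 4])
print(masked)
```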

-c --compare

Compare uploaded sequences to all public sequences, which are retrieved periodically from the Los Alamos HIV Sequence Database.

-o --output

Specify output filename. If no output filename is provided, then the output filename will be <input_filename>.results.json

Viewing JSON files

You can either use the command hivtrace_viz <path_to_json_file> or visit https://veg.github.io/hivtrace-viz/ and click Load File.
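
For a quick look at a results file without the viewer, the standard json module is enough. The structure of the results JSON is not described in this README, so the sketch below makes no assumptions about key names and simply lists whatever top-level keys are present.

```python
import json

def top_level_keys(path):
    """Load a JSON file and return its sorted top-level keys (empty if not an object)."""
    with open(path) as handle:
        results = json.load(handle)
    return sorted(results) if isinstance(results, dict) else []

# Example usage, with the path replaced by your own results file:
# print(top_level_keys("path/to/results.json"))
```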
