VNom

Viroid Nominator (VNom): a reference-free tool for nominating viroid-like de novo assembled contigs

=================================================

overview

More detail can be found in the accompanying preprint.

[Figure: VNom overview]

VNom works by sequential filtering:

  1. identify contigs with terminal k-mer repeats (consistent with circularity) and attempt to resolve any concatemers within said contigs
  2. cluster contigs based on sequence identity allowing for circular permutation
  3. keep clusters that contain both positive and negative sense polarities (indicative of active replication in the sample)
  4. using these clusters, query all the previously discarded contigs for high confidence hits and add to said clusters
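The first three filtering steps can be illustrated with a small, exact-match sketch. This is not VNom's implementation (VNom relies on circUCLUST, USEARCH and MARS and tolerates sequence divergence); the function names and the default k of 10 are assumptions chosen for illustration only.

```python
# Illustrative sketch only: exact-match versions of the ideas behind steps 1-3.
# All function names here are hypothetical and are not part of VNom.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase ACGT string."""
    return seq.translate(COMPLEMENT)[::-1]


def circular_unit(seq, k=10):
    """Step 1: if the contig ends with a repeat of its own start (at least
    k bases), return a single unit-length copy; otherwise return None.
    The smallest period is tried first, so a concatemer collapses to one unit."""
    n = len(seq)
    for period in range(1, n - k + 1):
        # seq is periodic with this period if it equals itself shifted by it
        if seq[period:] == seq[:n - period]:
            return seq[:period]
    return None


def is_rotation(a, b):
    """Step 2 (exact version): True if b is a circular permutation of a."""
    return len(a) == len(b) and b in a + a


def has_both_polarities(units):
    """Step 3 (exact version): True if some member is a circular permutation
    of the reverse complement of another member."""
    for i, a in enumerate(units):
        for b in units[i + 1:]:
            if is_rotation(reverse_complement(a), b):
                return True
    return False
```

A contig made of one unit plus a 10-base terminal repeat, or of several tandem copies, both reduce to the same unit under `circular_unit`. Real data needs identity-based clustering rather than the exact string comparison used here.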

Outputs are written to 4_final_clusters. If this directory was not written, VNom failed to nominate any viroid-like contigs, and the stdout might contain enough information to say what happened. I find that the dual-polarity filter is often where VNom quits - that is, it appears to be a fairly high bar. Omitting it doesn't make molecular sense, though, and appears to give a large false positive rate.

VNom purposefully takes a vague approach to nominating viroid-like contigs, which means its outputs are not guaranteed to be viroids. Strictly, VNom gives a set of clusters whose molecular characteristics are not inconsistent with being viroids. I've found that this is a reasonably stringent set of requirements, but repetitive sequences (say, centromeric sequences) do pop up.

Because of how VNom works, the input contigs need to be derived from stranded RNA-seq. VNom is built to use the output from rnaSPAdes as its source of contigs. Other de Bruijn graph assemblers should work, but VNom currently contains some hard-coded, SPAdes-specific seqID manipulations (you can take any contigs you want and spoof SPAdes seqIDs to try VNom - it works reasonably well).
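Spoofing seqIDs for contigs from another assembler could look like the sketch below. It assumes the default rnaSPAdes header layout is NODE_<n>_length_<len>_cov_<cov>_g<gene>_i<isoform>; check this against a real rnaSPAdes output before relying on it. The function name, the placeholder coverage of 1.0 and the one-gene-per-contig numbering are all hypothetical, and which fields VNom actually parses is not stated here.

```python
# Hypothetical helper, not part of VNom: give arbitrary contigs seqIDs that
# mimic the (assumed) rnaSPAdes layout
# NODE_<n>_length_<len>_cov_<cov>_g<gene>_i<isoform>.

def spoof_spades_ids(records, prefix="NODE", cov=1.0):
    """records: iterable of (original_header, sequence) pairs.
    Yields (spoofed_id, sequence) pairs. The coverage value is a placeholder,
    and each contig is treated as its own 'gene' with a single isoform."""
    for n, (_, seq) in enumerate(records, start=1):
        yield f"{prefix}_{n}_length_{len(seq)}_cov_{cov:.6f}_g{n - 1}_i0", seq
```

Keep `prefix` free of underscores, since VNom is sensitive to the number of underscores in each seqID.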

=================================================

installation (Linux)

  1. make sure you have conda installed

  2. create conda environment:

cd VNom/

conda env create -f VNom_conda.yml

conda activate VNom

  3. install circUCLUST

cd dependencies/

wget https://github.com/rcedgar/circuclust/releases/download/v1.0/circuclust_linux64

mv circuclust_linux64 circuclust

chmod +x circuclust

  4. install USEARCH

(in dependencies/)

wget https://www.drive5.com/downloads/usearch11.0.667_i86linux32.gz

gunzip usearch11.0.667_i86linux32.gz

mv usearch11.0.667_i86linux32 usearch

chmod +x usearch

  5. install MARS

(in dependencies/)

git clone https://github.com/lorrainea/MARS

cd MARS/

./pre-install.sh

make -f Makefile

  6. test VNom

Here, I filter out any contigs containing 'N's, and also replace the 'NODE' string in each contig seqID with something more informative for later.

A KEY POINT ON NAMES:

a. the contig seqIDs need to be in the exact same layout (with respect to underscores) as the default rnaSPAdes output. Here I replace 'NODE' with a more informative string - adding more underscores will cause VNom to crash

b. the contigs file name must contain a single underscore and end in .fasta (so X_Y.fasta is good, but XY.fasta is bad)

c. you must pass this single-underscore name to VNom without the file ending (so -i peach_subset for peach_subset.fasta)
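The naming rules above can be checked before a run with a sketch like the following. The regular expression encodes an assumed rnaSPAdes seqID layout (<label>_<n>_length_<len>_cov_<cov>_g<gene>_i<isoform>), and both function names are hypothetical; VNom itself does not provide them.

```python
import re

# Assumed rnaSPAdes-style layout with 'NODE' replaced by an underscore-free label.
SPADES_ID = re.compile(r"^[^_\s]+_\d+_length_\d+_cov_\d+(\.\d+)?_g\d+_i\d+$")


def check_fasta_name(filename):
    """Return the name to pass to -i (file ending removed), or raise
    ValueError if the file name breaks rules b and c above."""
    if not filename.endswith(".fasta"):
        raise ValueError(f"{filename}: expected a .fasta file ending")
    stem = filename[: -len(".fasta")]
    if stem.count("_") != 1:
        raise ValueError(f"{filename}: expected exactly one underscore in the name")
    return stem


def check_seq_id(seq_id):
    """True if the seqID matches the assumed layout (rule a above)."""
    return bool(SPADES_ID.match(seq_id))
```

For example, `check_fasta_name("peach_subset.fasta")` returns `peach_subset`, while a seqID whose label contains an extra underscore fails `check_seq_id`.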

cd ../../test_data

sed 's/NODE/SRR11060618/g' SRR11060618_subset.fasta > peach_subset.fasta

seqkit grep -v -s -p 'N' peach_subset.fasta > temp && mv temp peach_subset.fasta

python ../VNom.py -i peach_subset -max 2000 -CF_k 10 -CF_simple 0 -CF_tandem 1 -USG_vs_all 1 > peach_subset_VNom.log
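If seqkit is not at hand, the renaming and 'N' filtering done by the sed and seqkit commands above can be approximated in plain Python. This is a sketch with hypothetical function names; it works on an iterable of lines, only renames headers, and treats only uppercase 'N' as ambiguous.

```python
# Hypothetical stand-in for the sed + seqkit pre-processing; not part of VNom.

def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def prepare_contigs(lines, label):
    """Drop contigs containing 'N' and replace 'NODE' in each header with label."""
    for header, seq in read_fasta(lines):
        if "N" in seq:
            continue  # skip contigs with ambiguous bases
        yield header.replace("NODE", label), seq
```

With an open file handle `fh` for SRR11060618_subset.fasta, `prepare_contigs(fh, "SRR11060618")` yields the records that would then be written to peach_subset.fasta.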
