Coder Social home page Coder Social logo

ust's Introduction

UST

UST is a bioinformatics tool for constructing a spectrum-preserving string set (SPSS) representation from sets of k-mers.

Note: This software has been subsumed by ESSCompress. To use UST, download ESSCompress and follow the UST instructions in the README.

Requirements

GCC >= 4.8 or a C++11 capable compiler

Quick start

To install, compile from source:

git clone https://github.com/medvedevgroup/UST
cd UST
make

After compiling, use

./ust -i [unitigs.fa] -k [kmer_size]

e.g.

./ust -i examples/k11.unitigs.fa -k 11

The important parameters are:

  • k [int] : The k-mer size that was used to generate the input, i.e. the length of the nodes of the node-centric de Bruijn graph.
  • i [input-file] : Unitigs file produced by BCALM2 in FASTA format.
  • a [0 or 1] : Default is 0. A value of 1 tells UST to preserve abundance. Use this option when the input file was generated with the -all-abundance counts option of BCALM2.

The output is a FASTA file with extenstion "ust.fa" in the working folder, which is the SPSS representaiton of the input. If the program is run with the option -a 1, an additional count file with extension "ust.counts" will also be generated.

Detailed Usage

In order to build a SPSS representation for your k-mer set, you must first run BCALM2 on your set of k-mers. BCALM2 will construct a set of unitigs. Those unitigs are then fed as input to ust, which outputs a FASTA file with the SPSS representation. Note that the k parameter to ust must match the -kmer-size used when running BCALM2.

If you would like to store the data on disk in compressed form (like UST-Compress in our paper), you can then install and run MFCompress on the output of UST as follows: MFCompressC mykmers.ust.fa

If you would like to build a membership data structure based on UST, then

  • Install bwtdisk and dbgfm.
  • Change the two variables "DBGFM_DIRECTORY" and "BWTDISK_DIRECTORY" in the script ust-fm.sh to point to the locations where dbgfm and bwtdisk are installed. Alternatively, you can add the path to both tools in your environment PATH variable and then modify the script accordingly.
  • Run ust-fm.sh as follows: ust-fm.sh mykmers.ust.fa

Citation

If using UST in your research, please cite

@inproceedings{RahmanMedvedevRECOMB20,
  author    = {Amatur Rahman and Paul Medvedev},
  title     = {Representation of $k$-mer sets using spectrum-preserving string sets},
  booktitle = {Research in Computational Molecular Biology - 24th Annual International Conference, {RECOMB} 2020, Padua, Italy, May 10-13, 2020, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {12074},
  pages     = {152--168},
  publisher = {Springer},
  year      = {2020
}

Note that the general notion of an SPSS was independently introduced under the name of simplitigs. Therefore, if citing this general notion, please also cite:

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.