nus-cs2103-ay1819s2 / meshclust

This project forked from bioinformaticstoolsmith/meshclust

MeShClust: an intelligent tool for clustering DNA sequences

Home Page: https://doi.org/10.1093/nar/gky315

MeShClust
Release version

Requirements: g++ 4.9.1 or later. On Mac OS X, g++ must be installed through Homebrew.

Mac OS X compilation, using Homebrew's g++ and GNU Make:
CXX=g++-7 make

For background on compiling OpenMP programs with GCC on OS X, see: https://stackoverflow.com/questions/29057437/compile-openmp-programs-with-gcc-compiler-on-os-x-yosemite


Linux/Unix compilation:
make

If you find this tool helpful, please cite:

James, Benjamin T. et al. (2018), MeShClust: an intelligent tool for clustering DNA sequences. Nucleic Acids Research, gky315.

Usage: bin/meshclust *.fasta [--id 0.90] [--kmer 3] [--delta 5] [--output output.clstr] [--iterations 20] [--align] [--sample 3000] [--pivot 40] [--threads TMAX]
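
As an illustration of the usage line above, the following sketch assembles a MeShClust command line from Python. The helper name `build_meshclust_command` and its keyword arguments are hypothetical; only the flags themselves (--id, --kmer, --output, --threads, --align) come from the usage line. No defaults are assumed: a flag is emitted only when a value is passed.

```python
from typing import List, Optional


def build_meshclust_command(
    fasta_files: List[str],
    identity: Optional[float] = None,
    kmer: Optional[int] = None,
    output: Optional[str] = None,
    threads: Optional[int] = None,
    align: bool = False,
    binary: str = "bin/meshclust",  # path taken from the usage line
) -> List[str]:
    """Build an argument list using only flags shown in the usage line."""
    if not fasta_files:
        raise ValueError("at least one FASTA input file is required")
    # Arguments that are not flags are interpreted as input files.
    cmd = [binary, *fasta_files]
    if identity is not None:
        cmd += ["--id", str(identity)]
    if kmer is not None:
        cmd += ["--kmer", str(kmer)]
    if output is not None:
        cmd += ["--output", output]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if align:
        cmd.append("--align")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even where MeShClust has not been built.
    print(" ".join(build_meshclust_command(["input.fasta"], identity=0.9,
                                           output="output.clstr")))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` once the binary has been compiled.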

The most important parameter, --id, sets the sequence identity threshold used for clustering.
    If the identity is below 60%, alignment is used automatically instead of k-mer measures.
    Alignment can also be forced at any identity with the --align parameter.

--kmer sets the k-mer size. By default it is chosen automatically from the average sequence length,
       but if provided, MeShClust can run slightly faster because it does not have to find the largest sequence length.
       Increasing the k-mer size can increase accuracy, but each increment of k increases memory consumption fourfold.
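
The fourfold growth can be illustrated with a small sketch. It is not MeShClust's implementation; it assumes a histogram with one slot per possible k-mer over the DNA alphabet A, C, G, T, so the number of slots is 4^k. The function names `kmer_histogram` and `histogram_size` are hypothetical.

```python
from collections import Counter


def kmer_histogram(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a DNA sequence."""
    sequence = sequence.upper()
    # A sequence shorter than k yields an empty histogram.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def histogram_size(k: int, alphabet: str = "ACGT") -> int:
    """Number of distinct k-mers possible over the alphabet (4**k for DNA)."""
    return len(alphabet) ** k


if __name__ == "__main__":
    print(kmer_histogram("ACGTAC", 2))
    for k in (3, 4, 5):
        # Each step in k multiplies the number of slots by four.
        print(k, histogram_size(k))
```

Under this assumption, moving from --kmer 3 to --kmer 4 grows the histogram from 64 to 256 slots per sequence.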

--delta sets how many neighboring clusters are examined in the final clustering stage.
	Increasing it improves accuracy, but takes more time.

--output specifies the output file, which is written in CD-HIT's CLSTR format.
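
For downstream processing, a CLSTR file can be read back with a few lines of Python. The sketch below assumes the usual CD-HIT layout: a `>Cluster N` header followed by member lines of the form `index<TAB>length, >name... status`. The exact fields MeShClust writes after the name are not specified here, so the parser ignores them. The function name `parse_clstr` and the sample records are hypothetical.

```python
import re
from typing import Dict, List, Optional

# Member line: index, whitespace, length with unit, ">name...", trailing status.
MEMBER = re.compile(r"^\d+\s+\d+(?:nt|aa),\s+>(.+?)\.\.\.")


def parse_clstr(text: str) -> Dict[str, List[str]]:
    """Map each cluster header to the sequence names listed under it."""
    clusters: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]
            clusters[current] = []
        else:
            match = MEMBER.match(line)
            # Lines that do not match the assumed layout are skipped.
            if match and current is not None:
                clusters[current].append(match.group(1))
    return clusters


if __name__ == "__main__":
    # Made-up records for illustration only.
    example = (
        ">Cluster 0\n"
        "0\t120nt, >seqA... *\n"
        "1\t118nt, >seqB... at +/97.50%\n"
        ">Cluster 1\n"
        "0\t95nt, >seqC... *\n"
    )
    print(parse_clstr(example))
```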

--iterations specifies how many iterations of the final merging stage are run until convergence.

--align forces alignment to be used, which can be much slower than k-mer features, but is
	more accurate than using k-mer features to guess alignment.

--threads sets the number of threads to be used. By default, OpenMP uses the number of available cores
	  on your machine, but this parameter overrides that.

--sample selects the total number of sample pairs of sequences used for both training and testing.
	 1500 is the default value.

--pivot sets the maximum number of pairs selected from one pivot sequence. Increasing this means
	fewer pivots are available, but more pairs are selected for one sequence, which can lead to
	higher training accuracy. The default value is 40.

Any argument not listed here is interpreted as an input file.


License

Academic use: The software is provided as-is under the GNU GPLv3.
For-profit or non-academic use: a license is needed.

Contributors: benjamin-james, zhangjiayu0303
