fastkmers

A simple program for getting k-mer counts from a fastq/fasta file, written in Rust.

Description

This command-line program takes a fastq/fasta file as input and outputs the counts of k-mers of a specified length. It is implemented with hash tables and a simple algorithm, but is still reasonably fast, mostly because it uses parallel computation with the Rayon library. It can also report per-cycle base content for Illumina reads when the k-mer size is set to the cycle count.
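
As a rough illustration of the hash-table approach described above, here is a minimal sketch that uses only the Rust standard library. It is not taken from the fastkmers source: the function names `count_kmers` and `count_kmers_parallel` are hypothetical, and scoped threads from `std::thread` stand in for Rayon so that the example has no external dependencies. It also assumes the reads are already in memory as plain strings, so fastq/fasta parsing is left out.

```rust
use std::collections::HashMap;

/// Count every k-mer (overlapping window of length `k`) in one sequence.
/// Hypothetical helper for illustration, not the fastkmers implementation.
fn count_kmers(seq: &str, k: usize, counts: &mut HashMap<String, u64>) {
    let bytes = seq.as_bytes();
    if k == 0 || bytes.len() < k {
        return; // no complete window fits in this sequence
    }
    for window in bytes.windows(k) {
        // Sequence data is expected to be ASCII, so each window is valid text.
        let kmer = String::from_utf8_lossy(window).into_owned();
        *counts.entry(kmer).or_insert(0) += 1;
    }
}

/// Count k-mers over many reads: each thread fills its own map for one
/// chunk of reads, and the per-thread maps are merged at the end.
fn count_kmers_parallel(reads: &[&str], k: usize, n_threads: usize) -> HashMap<String, u64> {
    let n_threads = n_threads.max(1);
    // Ceiling division, and at least 1 so that `chunks` never receives 0.
    let chunk_size = ((reads.len() + n_threads - 1) / n_threads).max(1);

    let partials: Vec<HashMap<String, u64>> = std::thread::scope(|scope| {
        let handles: Vec<_> = reads
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    let mut local = HashMap::new();
                    for read in chunk {
                        count_kmers(read, k, &mut local);
                    }
                    local
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    // Merge the per-thread tables into one.
    let mut total = HashMap::new();
    for partial in partials {
        for (kmer, n) in partial {
            *total.entry(kmer).or_insert(0) += n;
        }
    }
    total
}
```

Calling `count_kmers_parallel(&["ACGT", "CGTA"], 2, 2)` would return one map that holds the combined 2-mer counts of both reads. Giving each thread its own map avoids locking a shared table, at the cost of a merge step at the end.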

Install

I provide precompiled binaries for Linux only, but it is simple to compile from source:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
git clone https://github.com/angelovangel/fastkmers.git

cd fastkmers
cargo build --release

The executable file fastkmers is now under ./target/release/

Usage

# Make sure the executable is in your path
# check available options

fastkmers -h

# to get 4-mer counts and a summary
fastkmers -k 4 -s file.fastq.gz

# output json, input fasta
fastkmers -k 4 -j file.fasta

# stdin can also be used as input, use -
cat file.fasta | fastkmers -k 4 -j -

# query for a specific k-mer
fastkmers -k 5 -q "AATTG" file.fastq.gz

# query with regex is also supported
# this example matches all 5-mers whose last 4 bases are: not T, then A, then T or G, then A
fastkmers -k 5 -q "[^T]A[TG]A$" file.fastq.gz

# get base contents per cycle (the number of cycles has to be known beforehand)
fastkmers -k 126 -c tests/test.fasta

The k-mer counts are printed to stdout as a tab-separated table or as json.
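
To make the per-cycle base content option more concrete, the sketch below tallies which bases occur at each read position (cycle). It only illustrates the idea and is not how fastkmers computes it internally: the function name `base_content_per_cycle` is hypothetical, and the reads are assumed to be already loaded as plain strings.

```rust
use std::collections::BTreeMap;

/// Tally the bases seen at each of the first `cycles` positions of every read.
/// Returns one map per cycle, from base to count. Hypothetical helper.
fn base_content_per_cycle(reads: &[&str], cycles: usize) -> Vec<BTreeMap<char, u64>> {
    let mut table = vec![BTreeMap::new(); cycles];
    for read in reads {
        // Reads shorter than `cycles` simply contribute to fewer positions.
        for (cycle, base) in read.chars().take(cycles).enumerate() {
            *table[cycle].entry(base).or_insert(0) += 1;
        }
    }
    table
}
```

For the reads `ACG`, `ATG` and `AC` with 3 cycles, the first cycle would hold only `A` (3 times), the second `C` (2) and `T` (1), and the third `G` (2). A `BTreeMap` is used here so that the bases come out in a stable, sorted order when the table is printed.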

Speed

I haven't compared fastkmers to other programs (e.g. jellyfish). Below are some measurements of execution time for different k-mer sizes on the E. coli MG1655 genome, performed on a 2018 MacBook Pro (Intel i5, 8 GB RAM).

hyperfine -r 4 --warmup 1 --export-csv hyperfine-kmer-size.csv -P kmer 4 29 'fastkmers -k {kmer} -a mg1655.fasta'

[Figure: execution time for k-mer sizes 4 to 29]
