fastutils

Quick start
Available commands
Command details
Bug report
Copyright and License

Quick start

# check number of reads, number of bases, and base composition of a fasta/q file
fastutils stat -i reads.fastq
# check mean read length
fastutils length -i reads.fastq | datamash mean 2
# convert fastq to fasta
fastutils format -i reads.fastq > reads.fasta
# print reads longer than 1000 bp and format in lines of length 60 bp
fastutils format -i reads.fastq -m 1000 -w 60 > reads.1000.fasta
# interleave paired-end dataset
fastutils interleave -1 reads_1.fastq -2 reads_2.fastq -q > reads.fastq
# subsample 25x coverage of reads randomly (assuming E.coli dataset)
fastutils subsample -i reads.fastq -d 25 -g 4.6m -r > reads.subsample.fasta
# print first 1 million bp of chr1 and format in lines of length 60 bp
fastutils subseq -i hg38.fa -o - chr1:0-1000000 | fastutils format -w 60 > chr1.chunk.fasta
# compare each sequence with its reverse complement and print the lexicographically smaller one
fastutils revcomp -i reads.fastq -l > reads.lex.fasta
# piping example: get all contigs of chrX
cat hg38.fa | fastutils format | grep ">chrX" -A1 | fastutils cutN -i - -o - > chrX.contigs.fa

Available commands

stat         prints general statistics of fasta/q files
length       prints read ids and their length in tabular format
format       re-formats the fasta/q file based on user's needs
interleave   generates interleaved paired-end reads
revcomp      prints the reverse complement of each sequence
subsample    outputs a fraction of reads depending on the desired coverage
subseq       extracts a subsequence from the fasta/q file
cutN         breaks fasta entries into contigs (if containing N's)

For details about a command, run fastutils <command> -h.

Command details

fastutils stat

Reports the number of reads, number of bases, and base composition of the input FASTA/Q file.

Usage: fastutils stat [options]

I/O options:
     -i,--in STR         input file in fasta/q format [stdin]
     -o,--out STR        output file [stdout]

More options:
     -m,--minLen INT     min read length [0]
     -M,--maxLen INT     max read length [INT64_MAX]
     -h,--help           print this help

fastutils length

Prints the name and length of each read (separated by tab), one read per line.

Usage: fastutils length [options]

I/O options:
     -i,--in STR            input file in fasta/q format [stdin]
     -o,--out STR           output file [stdout]

More options:
     -m,--minLen INT        min read length [0]
     -M,--maxLen INT        max read length [LLONG_MAX]
     -t,--total             print total number of bases in third column
     -h,--help              print this help
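
The tab-separated output is easy to post-process with standard tools. As a rough sketch, assuming the read length is in the second column as described above, the helpers below compute the mean read length and the N50 using only sort and awk. The names mean_length and n50_length are hypothetical and are not part of fastutils.

```shell
# Hypothetical helpers, not part of fastutils.
# Both read "name<TAB>length" lines on stdin.

# Mean read length, printed with two decimals.
mean_length() {
    awk -F '\t' '{ sum += $2; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# N50: the length of the read at which the cumulative sum of lengths,
# taken from longest to shortest, first reaches half of all bases.
n50_length() {
    sort -t "$(printf '\t')" -k 2,2nr | awk -F '\t' '
        { len[NR] = $2; total += $2 }
        END {
            for (i = 1; i <= NR; i++) {
                cum += len[i]
                if (cum * 2 >= total) { print len[i]; break }
            }
        }'
}

# Intended usage (requires fastutils and a reads.fastq file):
#   fastutils length -i reads.fastq | mean_length
#   fastutils length -i reads.fastq | n50_length
```

This avoids a dependency on datamash when only simple summaries are needed.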

fastutils format

Re-formats the input FASTA/Q file: converts between fastq and fasta, filters reads by length, wraps sequence lines, and renames reads.

Usage: fastutils format [options]

I/O options:
     -i,--in STR            input file in fasta/q format [stdin]
     -o,--out STR           output file [stdout]

More options:
     -w,--lineWidth INT     size of lines in fasta output. Use 0 for no wrapping [0]
     -m,--minLen INT        min read length [0]
     -M,--maxLen INT        max read length [LLONG_MAX]
     -q,--fastq             output reads in fastq format if possible
     -n,--noN               do not print entries with N's
     -c,--comment           print comments in headers
     -d,--digital           use read index as read name
     -k,--keep              keep name as a comment when using -d
     -p,--prefix STR        prepend STR to the name
     -s,--suffix STR        append STR to the name
     -P,--pacbio            use pacbio's header format
     -h,--help              print this help
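
To make the conversion and the length filter concrete, here is a rough awk sketch of fastq-to-fasta conversion with a minimum read length. It is an illustration only, not how fastutils is implemented. It assumes strict 4-line fastq records, treats the minimum as inclusive (an assumption), and keeps the whole header line. The name fastq_to_fasta is hypothetical.

```shell
# Hypothetical helper, not part of fastutils.
# Reads 4-line FASTQ records on stdin and writes unwrapped FASTA to stdout.
fastq_to_fasta() {
    # $1 = minimum read length (optional, default 0; assumed inclusive)
    awk -v min="${1:-0}" '
        NR % 4 == 1 { name = substr($0, 2) }    # header line: drop the leading "@"
        NR % 4 == 2 && length($0) >= min {      # sequence line: apply the length filter
            print ">" name
            print $0
        }'
}

# Example:
#   fastq_to_fasta 1000 < reads.fastq > reads.1000.fasta
```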

fastutils interleave

Takes the two fasta/q files of one or more paired-end/mate-pair libraries and prints the sequences in interleaved (interlaced) format.

Usage: fastutils interleave [options] -1 lib1_1.fq -2 lib1_2.fq [-1 lib2_1.fq -2 lib2_2.fq ...]

I/O options:
     -1,--in1 STR           fasta/q file containing forward (left) reads [required]
     -2,--in2 STR           fasta/q file containing reverse (right) reads [required]
     -o,--out STR           output interlaced reads in STR file [stdout]

More options:
     -q,--fastq             output reads in fastq format if possible
     -s,--separator CHR     separator character [.]
     -h,--help              print this help
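
Interleaved output places each forward read immediately before its mate. The sketch below shows the idea with awk. It assumes both inputs are strict 4-line fastq files with reads in the same order, and it does not reproduce the read renaming controlled by -s or the fasta handling of the real command. The name interleave_fastq is hypothetical.

```shell
# Hypothetical helper, not part of fastutils.
# $1 = forward reads, $2 = reverse reads (both 4-line FASTQ, same read order).
interleave_fastq() {
    awk -v mates="$2" '{
        print                                   # header of the forward read
        for (i = 1; i < 4; i++) {               # remaining 3 lines of the forward read
            if ((getline) > 0) print
        }
        for (i = 0; i < 4; i++) {               # the 4 lines of its mate
            if ((getline line < mates) > 0) print line
        }
    }' "$1"
}

# Example:
#   interleave_fastq reads_1.fastq reads_2.fastq > reads.fastq
```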

fastutils revcomp

Prints the reverse complement of the sequences contained in the input.

Usage: fastutils revcomp [options]

I/O options:
     -i,--in STR            input file in fasta/q format [stdin]
     -o,--out STR           output file [stdout]

More options:
     -w,--lineWidth INT     size of lines in fasta output. Use 0 for no wrapping [0]
     -q,--fastq             output reads in fastq format if possible
     -c,--comment           print comments in headers
     -l,--lex               output lexicographically smaller sequence
     -h,--help              print this help
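
The -l option picks, for each sequence, whichever of the sequence and its reverse complement sorts first. A minimal sketch of that comparison for a single sequence is shown below. It assumes plain A/C/G/T bases and byte-wise (C locale) ordering; how fastutils treats other characters is not covered here. The names revcomp_seq and lex_smaller are hypothetical.

```shell
# Hypothetical helpers, not part of fastutils.

# Print the reverse complement of the sequence given as $1.
revcomp_seq() {
    printf '%s\n' "$1" | tr 'ACGTacgt' 'TGCAtgca' | awk '{
        out = ""
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)   # reverse the line
        print out
    }'
}

# Print the lexicographically smaller of $1 and its reverse complement.
lex_smaller() {
    rc=$(revcomp_seq "$1")
    printf '%s\n%s\n' "$1" "$rc" | LC_ALL=C sort | head -n 1
}

# Example:
#   lex_smaller TTGA
```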

fastutils subsample

Downsamples the input file to a desired depth of coverage. The user can choose to select reads randomly, select the longest reads, or take reads from the top of the file (default).

Usage: fastutils subsample -i input -d depth -g genomeSize

I/O options:
     -i,--in STR            input file in fasta/q format. This option is required if -r or -l is used [stdin]
     -o,--out STR           output file [stdout]

More options:
     -d,--depth INT         coverage of the subsampled set [required]
     -g,--genomeSize FLT    length of the genome. Accepted suffixes are k,m,g [required]
     -r,--random            subsample randomly instead of selecting top reads
     -l,--longest           subsample longest reads instead of selecting top reads
     -s,--seed INT          seed for random number generator
     -q,--fastq             output reads in fastq format if possible
     -c,--comment           print comments in headers
     -n,--num               use read index instead of read name
     -k,--keep              keep name as a comment when using -n
     -h,--help              print this help
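
Depth and genome size together imply a budget of roughly depth x genomeSize bases. The sketch below computes that budget, assuming the k, m, and g suffixes mean 10^3, 10^6, and 10^9 (an assumption; the exact stopping rule used by fastutils is not described here). The name target_bases is hypothetical.

```shell
# Hypothetical helper, not part of fastutils.
# $1 = depth, $2 = genome size with an optional k/m/g suffix.
# Prints the approximate number of bases needed for that coverage.
target_bases() {
    awk -v d="$1" -v g="$2" 'BEGIN {
        mult = 1
        suffix = tolower(substr(g, length(g), 1))
        if (suffix == "k") mult = 1000                  # assumed: k = 10^3
        else if (suffix == "m") mult = 1000000          # assumed: m = 10^6
        else if (suffix == "g") mult = 1000000000       # assumed: g = 10^9
        if (mult != 1) g = substr(g, 1, length(g) - 1)  # strip the suffix
        printf "%.0f\n", d * g * mult
    }'
}

# Example: 25x coverage of a 4.6 Mbp genome
#   target_bases 25 4.6m
```

Comparing this number with the total base count reported by fastutils stat gives a quick check on whether the input holds enough data for the requested depth.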

fastutils subseq

Extracts desired subsequences from input file.

Usage: fastutils subseq [options] <name:start-end> [<name2:start2-end2> ...]

Required options:
         -i STR        input file in fastx format. Use - for stdin.
         -o STR        output file. Use - for stdout.

More options:
         -v            print version and build date
         -h            print this help

fastutils cutN

Cuts fasta entries at N bases. This is useful for converting scaffolds to contigs.

Usage: fastutils cutN [options]

Required options:
         -i STR        input file in fastx format. Use - for stdin.
         -o STR        output file in fasta format. Use - for stdout.

More options:
         -v            print version and build date
         -h            print this help
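
Splitting at N bases can be pictured as follows: every run of N characters is a gap, and each stretch between gaps becomes one contig. The sketch below applies this to a single unwrapped sequence line. It does not reproduce how fastutils names the resulting contigs or any other behavior of the real command. The name cut_at_n is hypothetical.

```shell
# Hypothetical helper, not part of fastutils.
# Reads one unwrapped sequence per line on stdin and prints each
# N-free stretch on its own line.
cut_at_n() {
    # Turn every N (or n) into a newline, squeeze repeated newlines,
    # then drop the empty line left by a leading run of N.
    tr -s 'Nn' '\n\n' | awk 'length($0) > 0'
}

# Example:
#   printf 'ACGTNNNNGGCCNA\n' | cut_at_n
```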

Bug report

Please report bugs through the issue tracker at https://github.com/haghshenas/fastutils/issues.

Copyright and License

This software is released under the GNU General Public License (v3.0).
