🧮 🔢 Generate multi-sample k-mer count matrix from WGS

License: GNU General Public License v3.0

Perl 100.00%
kmer-counting genomics-data machine-learning data-science gwas

kounta

Introduction

This tool will take a bunch (N) of contigs (FASTA) or reads (FASTQ.gz) and generate a tab-separated matrix with M rows and N+1 columns, where M is the number of unique k-mers found across the inputs, and the columns are the k-mer string and the counts for the N genomes.

It relies on kmc for efficient k-mer counting, then uses standard Unix tools like sort, paste, cut and join to combine all the data into an output file without ever holding it all in memory at once. The more --threads and --ram you can give it, the faster it will run, assuming your disk can keep up.
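
The combining step described above can be sketched with the same Unix tools. This is only an illustration, not kounta's actual code: the function name build_matrix and its argument layout are hypothetical, and each input is assumed to be a "KMER<TAB>COUNT" file already sorted under LC_ALL=C, such as a sorted k-mer dump.

```sh
# Sketch only: build_matrix and its argument layout are hypothetical,
# not kounta's actual code. Each input is assumed to hold
# "KMER<TAB>COUNT" lines already sorted under LC_ALL=C.
build_matrix() {
  out=$1; shift
  tmp=$(mktemp -d) || return 1
  tab=$(printf '\t')

  # 1. Merge the pre-sorted inputs and keep each distinct k-mer once.
  LC_ALL=C sort -m "$@" | cut -f1 | uniq > "$tmp/uniqmers"

  # 2. Start the header, then build one count column per sample.
  printf '#KMER' > "$out"
  i=0
  cols=""
  for f in "$@"; do
    i=$((i + 1))
    printf '\t%s' "$(basename "$f")" >> "$out"
    # Left join against the full k-mer list; k-mers missing from this
    # sample get a count of 0. Only the count field is kept.
    LC_ALL=C join -t "$tab" -a 1 -e 0 -o 2.2 "$tmp/uniqmers" "$f" > "$tmp/$i.count"
    cols="$cols $i.count"
  done
  printf '\n' >> "$out"

  # 3. Glue the k-mer column and the count columns together line by line.
  (cd "$tmp" && paste uniqmers $cols) >> "$out"
  rm -rf "$tmp"
}

# Usage (file names are placeholders):
#   build_matrix kmers.tsv 01.kmers 02.kmers 03.kmers
```

Because sort -m, join and paste all stream their inputs, no step needs the whole matrix in memory; the temporary files on disk carry the intermediate state.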

Quick Start

Using contigs

% ls *.fna
01.fna 02.fna 03.fna 04.fna

% kounta --kmer 7 --out kmers.tsv *.fna
<snip>
Done.

% head kmers.tsv
#KMER    01.fna 02.fna 03.fna 04.fna
AAAAAAA  0      1      2      1
AAAAAAT  1      1      1      1
AAAAAAG  3      0      0      0
AAAAATA  0      1      1      0
etc.

Using reads

% ls *q.gz
AX_R1.fq.gz BX_R1.fq.gz CX_R1.fq.gz DX_R1.fq.gz

% kounta --kmer 7 --threads 8 --ram 4 --out kmers.tsv *.fq.gz
<snip>
Done.

% head kmers.tsv
#KMER    AX_R1.fq.gz BX_R1.fq.gz CX_R1.fq.gz DX_R1.fq.gz
AAAAAAA            0          45          21          33
AAAAAAT           22          21          26          87
AAAAAAG           34           0           0           0
AAAAATA            0          91          76           0
etc.

Notes

  • Do not mix samples of reads and contigs, because the k-mer frequencies will not be comparable.
  • When using reads, only k-mers occurring at least --minfreq times are reported.
  • When using reads, use only R1 and ignore R2: R2 is normally noisier and more error-prone, and adds little extra information.
  • If you only want "core" k-mers, you can grep -v -w 0 kmers.tsv > core.tsv (NOTE: this will remove the header line)
  • To binarize the results to presence/absence you can sed -e '1 ! s/[1-9][0-9]*/1/g' kmers.tsv > yesno.tsv (NOTE: will mess up header line)
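
If the header caveats on the last two notes are a problem, awk can pass the first line through untouched and work only on the count columns. This is a minimal sketch, assuming the matrix is tab-separated with the header on line 1; the function names core_kmers and binarize are made up for illustration.

```sh
# Sketch only: hypothetical helpers, assuming a tab-separated matrix
# whose first line is the "#KMER ..." header.

# core_kmers FILE: keep the header plus rows where every count is non-zero.
core_kmers() {
  awk -F '\t' 'NR == 1 { print; next }
       { for (i = 2; i <= NF; i++) if ($i == 0) next; print }' "$1"
}

# binarize FILE: keep the header and the k-mer column, turn counts into 0/1.
binarize() {
  awk -F '\t' 'BEGIN { OFS = "\t" }
       NR == 1 { print; next }
       { for (i = 2; i <= NF; i++) $i = ($i > 0) ? 1 : 0; print }' "$1"
}

# Usage:
#   core_kmers kmers.tsv > core.tsv
#   binarize   kmers.tsv > yesno.tsv
```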

Installation

Conda

Install Conda or Miniconda:

conda install -c conda-forge -c bioconda -c defaults kounta

Homebrew

Install HomeBrew (Mac OS X) or LinuxBrew (Linux).

brew install brewsci/bio/kounta

Source

This will install the latest version directly from GitHub. You'll need to add the kounta bin directory to your $PATH, and also ensure all the dependencies are installed.

cd $HOME
git clone https://github.com/tseemann/kounta.git
$HOME/kounta/bin/kounta --help

Dependencies

  • perl >= 5.26
  • kmc >= 3.1
  • GNU parallel >= 20160101
  • GNU sort, paste, join, cut, uniq, wc

License

kounta is free software, released under the GPL 3.0.

Issues

Please submit suggestions and bug reports to the Issue Tracker.

Author

Torsten Seemann

kounta's Issues

Only print rows with at least S non-zero entries

Filter out rows where fewer than S samples contain that k-mer,
i.e. exclude a row if fewer than S of its counts are non-zero.

e.g.

AAAAAAAAAAAAAAAAAAAAC   0       0       0       0       0       0       0       0       0       0    00       0       0       0       0       0       0       0       0       0       0       0       0    00       0       0       0       0       0       0       0       0       0       0       0       0    00       0       0       0       0       0       0       0       0       0       0       0       0    00       0       0       0       0       0       0       0       0       0       0       0       0    00       0       0       0       0       0       0       0       0       0       0       0       0    00       0       0       0       0       0       0       0       14      0       0       0       0    00       0
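
Until such an option exists, the filter can be applied to the finished matrix. A possible sketch, assuming tab-separated output with a header line; filter_min_samples is a hypothetical helper, not a kounta option.

```sh
# Sketch only: filter_min_samples FILE S is a hypothetical helper.
# Prints the header, then every row with at least S non-zero counts.
filter_min_samples() {
  awk -F '\t' -v s="$2" 'NR == 1 { print; next }
       { n = 0
         for (i = 2; i <= NF; i++) if ($i > 0) n++
         if (n >= s) print }' "$1"
}

# Usage: keep k-mers present in at least 5 samples
#   filter_min_samples kmers.tsv 5 > filtered.tsv
```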

Empty output file containing only header!

Hello, I ran into a problem. Using macOS, the example files and a kounta conda installation (separate environment), I got an empty output file containing only the header. Terminal output:

(kounta_env) Mhkls-MacBook-Pro:Desktop mhklm$ kounta --kmer 7 --out OUTS example/*.fna
Option: --threads = 1
Option: --minfreq = 3
Option: --fofn = undefined
Option: --tempdir = auto
Option: --kmer = 7
Option: --out = OUTS
Option: --ram = 4
Found: sort 2.3 => /usr/bin/sort
Found: join unknown => /usr/bin/join
Found: paste unknown => /usr/bin/paste
Found: uniq unknown => /usr/bin/uniq
Found: wc unknown => /usr/bin/wc
Found: parallel unknown => /Users/mihkelm/miniconda3/envs/kounta_env/bin/parallel
Found: kmc 3.1.0 => /Users/mihkelm/miniconda3/envs/kounta_env/bin/kmc
Found: kmc_tools unknown => /Users/mihkelm/miniconda3/envs/kounta_env/bin/kmc_tools
Temp folder: /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP
Counting k-mers in example/NC_002023.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_002023.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_002023.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna.kmers
AAAAAAA	2
AAAAAAC	7
AAAAAAG	8
AAAAAAT	2
AAAAACA	11
Counting k-mers in example/NC_002204.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_002204.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_002204.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna.kmers
AAAAAAA	1
AAAAAAC	3
AAAAAAG	4
AAAAAAT	6
AAAAACA	9
Counting k-mers in example/NC_004910.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_004910.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_004910.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna.kmers
AAAAAAC	2
AAAAAAG	2
AAAAAAT	1
AAAAACA	5
AAAAACC	4
Counting k-mers in example/NC_006307.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_006307.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_006307.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna.kmers
AAAAAAC	5
AAAAAAG	4
AAAAAAT	11
AAAAACA	14
AAAAACC	5
Counting k-mers in example/NC_007357.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_007357.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_007357.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna.kmers
AAAAAAC	5
AAAAAAG	2
AAAAAAT	1
AAAAACA	3
AAAAACC	1
Counting k-mers in example/NC_007373.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_007373.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_007373.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna.kmers
AAAAAAA	4
AAAAAAC	7
AAAAAAG	5
AAAAAAT	4
AAAAACA	12
Counting k-mers in example/NC_007378.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_007378.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_007378.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna.kmers
AAAAAAA	1
AAAAAAC	2
AAAAAAG	3
AAAAAAT	1
AAAAACA	2
Counting k-mers in example/NC_026423.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_026423.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_026423.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna.kmers
AAAAAAA	1
AAAAAAC	3
AAAAAAT	1
AAAAACA	6
AAAAACC	2
Counting k-mers in example/NC_026438.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_026438.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_026438.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna.kmers
AAAAAAA	2
AAAAAAC	3
AAAAAAG	5
AAAAACA	4
AAAAAGA	5
Counting k-mers in example/NC_026952.fna
Running: LC_ALL=C kmc -cs65535 -m4 -sm -k7 -t1 -fm -ci1 example/NC_026952.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP 1>/dev/null
Stage 1: 100%
Sorting k-mers in NC_026952.fna
Running: LC_ALL=C kmc_tools transform /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna dump -s /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna.kmers
in1: 100% 
Running: LC_ALL=C head -n 5 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna.kmers
AAAAAAC	3
AAAAAAG	8
AAAAAAT	1
AAAAACA	11
AAAAACC	3
Find unique kmers from 10 files
Running: LC_ALL=C sort --parallel 1 --buffer-size 4G -m --batch-size=10 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/*.kmers | cut -f1 | uniq > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers
sort: unrecognized option `--parallel'
Usage: sort [-bcCdfigMmnrsuz] [-kPOS1[,POS2] ... ] [+POS1 [-POS2]] [-S memsize] [-T tmpdir] [-t separator] [-o outfile] [--batch-size size] [--files0-from file] [--heapsort] [--mergesort] [--radixsort] [--qsort] [--mmap] [--human-numeric-sort] [--version-sort] [--random-sort [--random-source file]] [--compress-program program] [file ...]
Running: LC_ALL=C head -n 10 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers
Joining uniqmers to sample k-mer counts
Running: LC_ALL=C parallel -j 1 -v "join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers {} | cut -d ' ' -f 2 > {.}.count" ::: /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna.kmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna.kmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna.kmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna.kmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna.kmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna.kmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna.kmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna.kmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna.kmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna.kmers
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:

  O. Tange (2018): GNU Parallel 2018, Mar 2018, ISBN 9781387509881,
  DOI https://doi.org/10.5281/zenodo.1146014

This helps funding further development; AND IT WON'T COST YOU A CENT.
If you pay 10000 EUR you should feel free to use GNU Parallel without citing.

More about funding GNU Parallel and the citation notice:
https://www.gnu.org/software/parallel/parallel_design.html#Citation-notice

To silence this citation notice: run 'parallel --citation' once.

Come on: You have run parallel 5332 times. Isn't it about time 
you run 'parallel --citation' once to silence the citation notice?

join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna.count
join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna.count
join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna.count
join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna.count
join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna.count
join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna.count
join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna.count
join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna.count
join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna.count
join -o '1.1 2.2' -j 1 -a 1 -e 0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna.kmers | cut -d ' ' -f 2 > /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna.count
Writing header for output file: OUTS
Combining 10 count files.
Running: LC_ALL=C paste /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna.count /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna.count /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna.count /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna.count /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna.count /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna.count /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna.count /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna.count /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna.count /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna.count >> OUTS
Running: LC_ALL=C wc -l /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/*
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna.count
    5537 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002023.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna.count
    5385 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_002204.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna.count
    5660 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_004910.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna.count
    4755 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_006307.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna.count
    5625 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007357.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna.count
    5592 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007373.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna.count
    5568 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_007378.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna.count
    5652 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026423.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna.count
    5563 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026438.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna.count
    4998 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/NC_026952.fna.kmers
       0 /var/folders/ry/xmq4zmld1jn8ry5czjsfv3040000gn/T/su_7cPMSJP/uniqmers
   54335 total
Result in: OUTS
Done.
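
In the log above, the merge step fails with "sort: unrecognized option `--parallel'", after which uniqmers and every .count file contain 0 lines, so only the header reaches the output. One way a wrapper could guard against this is to probe the local sort before using the flag. The sketch below is an assumption about how that might look, not kounta's behaviour; sort_merge_cmd is a hypothetical name.

```sh
# Sketch only: sort_merge_cmd is a hypothetical helper.
# Prints a sort invocation for merging pre-sorted files, adding
# --parallel only when the local sort accepts that option.
sort_merge_cmd() {
  threads=${1:-1}
  if sort --parallel="$threads" < /dev/null > /dev/null 2>&1; then
    printf 'sort --parallel=%s -m\n' "$threads"
  else
    printf 'sort -m\n'
  fi
}

# Usage (file names are placeholders):
#   LC_ALL=C $(sort_merge_cmd 4) *.kmers | cut -f1 | uniq > uniqmers
```

Checking that the resulting uniqmers file is non-empty before the join step would also turn a silent empty matrix into an explicit error.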

Duplicated kmer lines

Seems to be S lines of each kmer.

AAAAAAAAAAAAAAAAAAAAAAAAA       0       0       7       3       0       0       29      26  >
AAAAAAAAAAAAAAAAAAAAAAAAA       0       0       7       3       0       0       29      26  >
AAAAAAAAAAAAAAAAAAAAAAAAA       0       0       7       3       0       0       29      26  >
AAAAAAAAAAAAAAAAAAAAAAAAA       0       0       7       3       0       0       29      26  >
AAAAAAAAAAAAAAAAAAAAAAAAA       0       0       7       3       0       0       29      26  >
AAAAAAAAAAAAAAAAAAAAACAAA       0       0       0       3       0       0       0       0   >
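
This does not address the cause, but an affected matrix can be checked and cleaned after the fact. A small sketch, assuming the repeated rows are byte-identical and adjacent as in the excerpt above, and that the file is tab-separated with a header line; both function names are hypothetical.

```sh
# Sketch only: hypothetical helpers for a tab-separated matrix with header.

# duplicated_kmers FILE: list k-mers that appear on more than one row.
duplicated_kmers() {
  tail -n +2 "$1" | cut -f1 | LC_ALL=C sort | uniq -d
}

# drop_repeated_rows FILE: collapse runs of identical adjacent rows.
drop_repeated_rows() {
  uniq "$1"
}

# Usage:
#   duplicated_kmers kmers.tsv | head
#   drop_repeated_rows kmers.tsv > kmers.dedup.tsv
```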

Kmers are in every second column

#KMER   NC_002023.fna   NC_002204.fna   NC_004910.fna   NC_006307.fna   NC_0073 

AAAAAAACA       AAAAAAACA 1     AAAAAAACA 0     AAAAAAACA 0     AAAAAAACA 0    

AAAAAAACT       AAAAAAACT 0     AAAAAAACT 0     AAAAAAACT 0     AAAAAAACT 0 

AAAAAAAGA       AAAAAAAGA 1     AAAAAAAGA 0     AAAAAAAGA 0     AAAAAAAGA 0    

AAAAAAAGC       AAAAAAAGC 0     AAAAAAAGC 1     AAAAAAAGC 0     AAAAAAAGC 0  

counter precision?

I've just tried out kounta - very nice for a quick look at some data!

At low K (8 for me) some mers are overflowing the available counter... any thoughts on how to deal with that?
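
The kmc command lines in the log of the empty-output issue above pass -cs65535, which suggests counts are capped at that value in that version; whether the same cap applied to the run reported here is an assumption. Under that assumption, saturated k-mers can at least be listed so they are treated as "at least this many" rather than exact counts. saturated_rows is a hypothetical helper.

```sh
# Sketch only: saturated_rows FILE [CAP] is a hypothetical helper.
# Prints every k-mer with at least one count at or above CAP
# (default 65535, taken from the -cs65535 flag seen in the logs).
saturated_rows() {
  awk -F '\t' -v cap="${2:-65535}" 'NR > 1 {
      for (i = 2; i <= NF; i++) if ($i >= cap) { print $1; next }
  }' "$1"
}

# Usage:
#   saturated_rows kmers.tsv         # default cap
#   saturated_rows kmers.tsv 255     # a lower cap, if one was used
```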

Add --fofn option

Allow user to provide file of filenames to use.
Optionally with LABEL \t FILENAME ?
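
One possible reading of this request, sketched in shell: each line of the file is either FILENAME or LABEL<TAB>FILENAME, and a missing label falls back to the file's basename. The function name read_fofn and the fallback rule are assumptions for illustration, not the option's actual behaviour.

```sh
# Sketch only: read_fofn FILE is a hypothetical helper.
# Prints "LABEL<TAB>FILENAME" for every non-empty line; lines with a
# single column use the file's basename as the label.
read_fofn() {
  tab=$(printf '\t')
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    case "$line" in
      *"$tab"*) printf '%s\n' "$line" ;;
      *) printf '%s\t%s\n' "$(basename "$line")" "$line" ;;
    esac
  done < "$1"
}

# Usage:
#   read_fofn samples.txt | cut -f2    # just the file names
```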
