alekseyzimin / masurca

License: GNU General Public License v3.0

Makefile 3.89% Perl 25.37% M4 69.72% Shell 1.02%

masurca genome assembly bioinformatics

masurca's Issues

RuntimeError Usage: new_Options(); at /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../lib/perl/mummer.pm line 262, <STDIN> line 2004.

Hello MaSuRCA team,

I used MaSuRCA (3.2.7) for a hybrid assembly of a Brassica genome and ran into the following two errors:

**./assemble.sh: line 158: work2.1/readPlacementsInSuperReads.final.read.superRead.offset.ori.txt: No such file or directory**

**RuntimeError Usage: new_Options(); at /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../lib/perl/mummer.pm line 262, <STDIN> line 2004.**

[Sat Jun 30 13:09:06 CST 2018] Processing pe library reads
[Sat Jun 30 13:27:49 CST 2018] Processing sj library reads
[Sat Jun 30 13:49:18 CST 2018] Average PE read length 150
[Sat Jun 30 13:49:19 CST 2018] Using kmer size of 99 for the graph
[Sat Jun 30 13:49:20 CST 2018] MIN_Q_CHAR: 33
[Sat Jun 30 13:49:20 CST 2018] Creating mer database for Quorum
[Sat Jun 30 14:22:21 CST 2018] Error correct PE.
[Sat Jun 30 15:30:09 CST 2018] Error correct JUMP.
[Sat Jun 30 16:10:38 CST 2018] Estimating genome size.
[Sat Jun 30 19:26:30 CST 2018] Estimated genome size: 423828686
[Sat Jun 30 19:26:30 CST 2018] Creating k-unitigs with k=99
[Sat Jun 30 21:51:17 CST 2018] Creating k-unitigs with k=31
[Sat Jun 30 23:51:16 CST 2018] Filtering mate pairs
Assuming outtie orientation
./assemble.sh: line 158: work2.1/readPlacementsInSuperReads.final.read.superRead.offset.ori.txt: No such file or directory
Chimeric/Redundant jump reads:
80879566 chimeric_sj.txt
385590798 redundant_sj.txt
466470364 total
[Sun Jul 1 11:00:32 CST 2018] Creating FRG files
[Sun Jul 1 11:09:06 CST 2018] Computing super reads from PE
Using CABOG from is /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../CA8/Linux-amd64/bin
Running mega-reads correction/assembly
Using mer size 15 for mapping, B=17, d=0.029
Estimated Genome Size 423828686
Estimated Ploidy 1
Using 30 threads
Output prefix mr.41.15.17.0.029
Pacbio coverage <30x, using the longest subreads
Reducing super-read k-mer size
Mega-reads pass 1
Running locally in 1 batch
compute_psa 4268399 1643220171
Processed 500000 super reads, irreducible 370521, processing 682 super reads per second
Processed 1000000 super reads, irreducible 797782, processing 841 super reads per second
Processed 1500000 super reads, irreducible 1258713, processing 1392 super reads per second
Processed 2000000 super reads, irreducible 1715565, processing 1655 super reads per second
Processed 2500000 super reads, irreducible 2141569, processing 2118 super reads per second
Mega-reads pass 2
Running locally in 1 batch
compute_psa 2237976 6053800446
Refining alignments
RuntimeError Usage: new_Options(); at /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../lib/perl/mummer.pm line 262, <STDIN> line 2004.
[the RuntimeError line above repeats 112 more times; one repeat reports "line 2002" instead of "line 2004"]
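
When one message floods a log like this, collapsing identical lines makes the distinct messages easier to spot. A minimal Python sketch, assuming the log lines have been read from a saved file; `summarize_repeats` is a hypothetical helper and not part of MaSuRCA:

```python
from collections import Counter


def summarize_repeats(lines):
    """Return (line, count) pairs in first-seen order, skipping blank lines."""
    counts = Counter(line.rstrip("\n") for line in lines if line.strip())
    # Counter keeps insertion order, so messages appear in first-seen order.
    return list(counts.items())


if __name__ == "__main__":
    # Placeholder messages for illustration only, not real MaSuRCA output.
    sample = ["message A\n", "message B\n", "message A\n", "\n", "message A\n"]
    for text, count in summarize_repeats(sample):
        print(f"{count:>4}x  {text}")
```

Near-identical lines that differ in a single number stay separate, so variants remain visible in the summary.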

configure: error: cannot find boost/icl/interval_set.hpp

home@home-Lenovo-H30-50:~/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7$ sudo ./install.sh

pwd
ROOT=/home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7
[ -z ]
DEST=/home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7
mkdir -p dist-bin
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7/dist-bin
which make
ln -sf /usr/bin/make /home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7/dist-bin/gmake
ln -sf /home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7/PkgConfig.pm /home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7/dist-bin/pkg-config
grep -c ^processor /proc/cpuinfo
export NUM_THREADS=4
BINDIR=/home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7/bin
LIBDIR=/home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7/lib
export PKG_CONFIG_PATH=/home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7/lib/pkgconfig:
cd global-1
./configure --prefix=/home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7 --bindir=/home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7/bin --libdir=/home/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7/lib
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... no
checking for mawk... mawk
checking whether make sets $(MAKE)... yes
checking whether make supports nested variables... yes
checking whether make supports nested variables... (cached) yes
checking how to print strings... printf
checking for style of include used by make... GNU
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking whether gcc understands -c and -o together... yes
checking dependency style of gcc... gcc3
checking for a sed that does not truncate output... /bin/sed
checking for grep that handles long lines and -e... /bin/grep
checking for egrep... /bin/grep -E
checking for fgrep... /bin/grep -F
checking for ld used by gcc... /usr/bin/ld
checking if the linker (/usr/bin/ld) is GNU ld... yes
checking for BSD- or MS-compatible name lister (nm)... /usr/bin/nm -B
checking the name lister (/usr/bin/nm -B) interface... BSD nm
checking whether ln -s works... yes
checking the maximum length of command line arguments... 1572864
checking whether the shell understands some XSI constructs... yes
checking whether the shell understands "+="... yes
checking how to convert x86_64-unknown-linux-gnu file names to x86_64-unknown-linux-gnu format... func_convert_file_noop
checking how to convert x86_64-unknown-linux-gnu file names to toolchain format... func_convert_file_noop
checking for /usr/bin/ld option to reload object files... -r
checking for objdump... objdump
checking how to recognize dependent libraries... pass_all
checking for dlltool... no
checking how to associate runtime and link libraries... printf %s\n
checking for ar... ar
checking for archiver @file support... @
checking for strip... strip
checking for ranlib... ranlib
checking command to parse /usr/bin/nm -B output from gcc object... ok
checking for sysroot... no
checking for mt... mt
checking if mt is a manifest tool... no
checking how to run the C preprocessor... gcc -E
checking for ANSI C header files... yes
checking for sys/types.h... yes
checking for sys/stat.h... yes
checking for stdlib.h... yes
checking for string.h... yes
checking for memory.h... yes
checking for strings.h... yes
checking for inttypes.h... yes
checking for stdint.h... yes
checking for unistd.h... yes
checking for dlfcn.h... yes
checking for objdir... .libs
checking if gcc supports -fno-rtti -fno-exceptions... no
checking for gcc option to produce PIC... -fPIC -DPIC
checking if gcc PIC flag -fPIC -DPIC works... yes
checking if gcc static flag -static works... yes
checking if gcc supports -c -o file.o... yes
checking if gcc supports -c -o file.o... (cached) yes
checking whether the gcc linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking whether -lc should be explicitly linked in... no
checking dynamic linker characteristics... GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking whether stripping libraries is possible... yes
checking if libtool supports shared libraries... yes
checking whether to build shared libraries... yes
checking whether to build static libraries... yes
checking for gcc... (cached) gcc
checking whether we are using the GNU C compiler... (cached) yes
checking whether gcc accepts -g... (cached) yes
checking for gcc option to accept ISO C89... (cached) none needed
checking whether gcc understands -c and -o together... (cached) yes
checking dependency style of gcc... (cached) gcc3
checking for g++... g++
checking whether we are using the GNU C++ compiler... yes
checking whether g++ accepts -g... yes
checking dependency style of g++... gcc3
checking how to run the C++ preprocessor... g++ -E
checking for ld used by g++... /usr/bin/ld -m elf_x86_64
checking if the linker (/usr/bin/ld -m elf_x86_64) is GNU ld... yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking for g++ option to produce PIC... -fPIC -DPIC
checking if g++ PIC flag -fPIC -DPIC works... yes
checking if g++ static flag -static works... yes
checking if g++ supports -c -o file.o... yes
checking if g++ supports -c -o file.o... (cached) yes
checking whether the g++ linker (/usr/bin/ld -m elf_x86_64) supports shared libraries... yes
checking dynamic linker characteristics... (cached) GNU/Linux ld.so
checking how to hardcode library paths into programs... immediate
checking for pthread_create in -lpthread... yes
checking for library containing clock_gettime... none required
checking for __int128... yes
checking for OpenMP flag of C++ compiler... -fopenmp
checking for Boost headers version >= 1.46.0... yes
checking for Boost's header version... 1_54
checking boost/icl/interval_set.hpp usability... no
checking boost/icl/interval_set.hpp presence... yes
configure: WARNING: boost/icl/interval_set.hpp: present but cannot be compiled
configure: WARNING: boost/icl/interval_set.hpp: check for missing prerequisite headers?
configure: WARNING: boost/icl/interval_set.hpp: see the Autoconf documentation
configure: WARNING: boost/icl/interval_set.hpp: section "Present But Cannot Be Compiled"
configure: WARNING: boost/icl/interval_set.hpp: proceeding with the compiler's result
configure: WARNING: ## ------------------------------- ##
configure: WARNING: ## Report this to [email protected] ##
configure: WARNING: ## ------------------------------- ##
checking for boost/icl/interval_set.hpp... no
configure: error: cannot find boost/icl/interval_set.hpp

runCA error

I am running MaSuRCA v3.2.3 and got an error at the Celera Assembler step. The log is below:

[Wed 23 May 17:16:57 BST 2018] Processing pe library reads
[Wed 23 May 17:34:23 BST 2018] Average PE read length 200
[Wed 23 May 17:34:24 BST 2018] Using kmer size of 127 for the graph
MIN_Q_CHAR: 33
[Wed 23 May 17:34:25 BST 2018] Creating mer database for Quorum
[Wed 23 May 17:44:48 BST 2018] Error correct PE.
[Wed 23 May 18:49:29 BST 2018] Estimating genome size.
Estimated genome size: 641438513
[Wed 23 May 18:59:38 BST 2018] Creating k-unitigs with k=127
[Wed 23 May 19:39:32 BST 2018] Computing super reads from PE
Running mega-reads correction/assembly
Using mer size 15 for mapping, B=13, d=0.02
Using MaSuRCA files from work1, k-unitig mer 41
Estimated Genome Size 641438513
Estimated Ploidy 1
Using CA installation from /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/../CA8/Linux-amd64/bin
Using 320 threads
Output prefix mr.41.15.13.0.02
Detected nanopore data, we have to rename the reads
/tsl/scratch/witekk/nanopore/high_quality_minion_data.fastq generated
Reducing super-read k-mer size
Mega-reads pass 1
compute_psa 1578015 1250919193
Processed 500000 super reads, irreducible 359409, processing 902 super reads per second
Processed 1000000 super reads, irreducible 681217, processing 2403 super reads per second
Processed 1500000 super reads, irreducible 1016462, processing 2325 super reads per second
Processed 2000000 super reads, irreducible 1342366, processing 2083 super reads per second
Mega-reads pass 2
compute_psa 1393233 5012389650
Refining alignments
read sequence for 136f2596-55be-408d-8250-ab8b5e4e744e not found at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/add_pb_seq.pl line 19, line 1.
/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/refine.sh: line 15: delta-filter: command not found
/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/refine.sh: line 16: show-coords: command not found
Can't load '/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/../lib/perl/mummer.so' for module mummer: /usr/lib64/libc.so.6: version `GLIBC_2.18' not found (required by /tsl/software/testing/brew/default/x86_64/lib/libstdc++.so.6) at /usr/lib64/perl5/DynaLoader.pm line 190.
at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/../lib/perl/mummer.pm line 11.
Compilation failed in require at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/refine_alignments.pl line 8.
BEGIN failed--compilation aborted at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/refine_alignments.pl line 8.
rm: cannot remove ‘t..matches.0.maximal_mr.fa’: No such file or directory
rm: cannot remove ‘t..matches.0.maximal_mr.names’: No such file or directory
Joining
awk: cmd. line:1: fatal: cannot open file `mr.41.15.13.0.02.all.txt' for reading (No such file or directory)
/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/mega_reads_assemble_nomatch.sh: line 269: mr.41.15.13.0.02.all.txt: No such file or directory
Generating assembly input files
awk: cmd. line:1: fatal: cannot open file `mr.41.15.13.0.02.1.fa' for reading (No such file or directory)
stat: cannot stat ‘mr.41.15.13.0.02.1.fa’: No such file or directory
/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/mega_reads_assemble_nomatch.sh: line 288: /641438513/1+1: syntax error: operand expected (error token is "/641438513/1+1")
Coverage threshold for splitting unitigs is 20 minimum ovl 250
Running assembly
runCA -s runCA.spec consensus=pbutgcns -p genome -d CA.mr.41.15.13.0.02 stopAfter=consensusAfterUnitigger mr.41.15.13.0.02.1.frg mr.41.15.13.0.02.1.mates.frg cgwErrorRate=0.12 useGrid=0 scriptOnGrid=0 merylThreads=16 frgCorrThreads=1 frgCorrConcurrency=12 cnsConcurrency=6 ovlCorrConcurrency=10 ovlConcurrency=10 ovlThreads=8 ovlMemory=8GB
Assembly stopped or failed, see CA.mr.41.15.13.0.02.log
[Sun 27 May 10:06:44 BST 2018] Assembly stopped or failed, see CA.mr.41.15.13.0.02.log

The file CA.mr.41.15.13.0.02.log reports this:

runCA failed.

Stack trace:

at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/runCA line 1121.
main::caFailure('invalid unitigger specified (bogart); must be 'utg' or 'bog'', undef) called at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/runCA line 706
main::setParameters() called at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/runCA line 5314

Failure message:

invalid unitigger specified (bogart); must be 'utg' or 'bog'

The Celera Assembler version I am running is 6.1.
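
The log above reports `delta-filter` and `show-coords` as "command not found". A quick pre-flight check before a long run can catch that kind of problem early. This is a minimal Python sketch, assuming the tools are expected to be on `PATH`; `missing_tools` is a hypothetical helper, not something MaSuRCA ships:

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    # The two commands the log above reports as "command not found".
    missing = missing_tools(["delta-filter", "show-coords"])
    if missing:
        print("Not on PATH:", ", ".join(missing))
    else:
        print("All listed tools were found")
```

This only checks that the executables can be located. It would not detect the GLIBC version mismatch or the unitigger setting reported in the same log.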

no genome.ctg.fasta

We used MaSuRCA 3.2.6 for genome assembly but found no FASTA file of contig sequences at the end. Only final.genome.scf.fasta appeared in the CA folder. Is this normal? Below is the content of the log file:

Processing pe library reads
Average PE read length 126
Using kmer size of 83 for the graph
cat: write error: Broken pipe
MIN_Q_CHAR: 33
Creating mer database for Quorum
Error correct PE.
Estimating genome size.
Estimated genome size: 697935949
Creating k-unitigs with k=83
Computing super reads from PE
Celera Assembler
ovlMerThreshold=75
Overlap/unitig success
recomputing A-stat for super-reads
recomputing A-stat for super-reads
Unitig consensus success
CA success
No gap closing possible.
Assembly complete, final scaffold sequences are in CA/final.genome.scf.fasta
All done

Thanks

masurca 3.2.5

I am running MaSuRCA on a highly homozygous plant genome of 1.3 Gb, on a cluster with 2 TB RAM and 90 cores. I have 100x Illumina coverage and ca. 12x PacBio. MaSuRCA has been running for 8 days and predicted 89000 overlap jobs, which are completing at ca. 100 per hour.
I have 2 questions:

1. At this speed, the overlap jobs alone will take about 36 more days. Is this expected for this configuration and genome size? If so, I need to discuss it with the cluster administrators.
2. I have been running v3.2.5 for 8 days and have only now noticed that a previous post advises against it. Should I stop and revert to 3.2.4? Is there any way I can still use the files already output by 3.2.5?
Thanks a lot
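
The estimate in this post is simple rate arithmetic. A minimal Python sketch, assuming the completion rate stays constant; the job count and rate come from the post, while `remaining_days` and the `jobs_done` value are illustrative assumptions:

```python
def remaining_days(total_jobs, jobs_done, jobs_per_hour):
    """Days left if the remaining jobs keep finishing at a constant rate."""
    if jobs_per_hour <= 0:
        raise ValueError("jobs_per_hour must be positive")
    remaining = max(total_jobs - jobs_done, 0)
    return remaining / jobs_per_hour / 24


if __name__ == "__main__":
    # 89000 jobs at ~100 jobs/hour are the figures from the post.
    # jobs_done=0 is an assumption: the post does not say how many have finished.
    print(f"{remaining_days(89000, 0, 100):.1f} days")
```

With no jobs finished, 89000 jobs at 100 per hour is 890 hours, or about 37 days, which is close to the 36 days quoted once some jobs have already completed.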

Gatekeeper fails due to genome.gkpStore existing

I have encountered an error in mega_reads_assemble_cluster.sh in line 610:
$CA_PATH/runCA -s runCA.spec -p genome -d $CA stopAfter=consensusAfterUnitigger $COORDS.1.frg $SR_FRG $OTHER_FRG 1>> $CA.log 2>&1
The previous step leaves the folder genome.gkpStore, which leads to runCA failing because the folder already exists. I checked the folder and it's empty. It only contains empty directories.
Erasing the folder before that line solves the issue and executes the code.
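
The workaround described above (erasing the leftover folder) can be made a little safer by deleting the store only when it really holds no files. This is a sketch of that idea, not part of MaSuRCA; the helper name is hypothetical and the path passed to it is whatever location the leftover genome.gkpStore has in your run:

```python
import os
import shutil

def remove_if_no_files(path):
    """Delete `path` only if the tree under it contains no files.

    Returns True when the directory was removed, False otherwise.
    """
    if not os.path.isdir(path):
        return False
    for _root, _dirs, files in os.walk(path):
        if files:
            return False  # real data present: leave the store alone
    shutil.rmtree(path)
    return True
```

Calling `remove_if_no_files("genome.gkpStore")` before the runCA line would mimic the manual fix while refusing to touch a store that already has content.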

No pre-processing of Mate-pair/jump libraries?

Does MaSuRCA have a module to detect the biotin stuffer sequence in Nextera mate-pair libraries and split the reads? Does "IMPORTANT! Do not use third party tools to pre-process the Illumina data before providing it to MaSuRCA" apply to MP libraries? I'm guessing that MaSuRCA uses k-mer coverage data to compute lots of stuff and is sensitive to different trimming parameters. Since MP data is pretty biased and requires processing to be useful, I'm guessing it doesn't get used for this purpose. I have tried running MaSuRCA with PE, PacBio and MP (unprocessed) data and the assemblies were very poor compared to using just PE and PacBio data (n50 of ~30 vs. 220kb, respectively), so I'm guessing there is at least something wrong with my raw MP data. Right now I'm running it again including the split MP data. This splitting was preceded by a trimming step recommended for bbtools splitnextera (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/split-nextera-guide/).

Related question: Does the MP data just come into play for the assembly? Can I use an existing output directory and rerun the assembly process including the MP data? I deleted the PE, MP, PB masurca directory and don't remember if there were MP.cor files :(

Thanks,
Earl

(also, thanks for developing this assembler!)

The assembly of tetraploid

I just tried to assemble a tetraploid with MaSuRCA 3.2.7, but the file "PLOIDY.txt" shows "1". How can I fix that?

Assembly failure during terminator step

I received the error message below during an assembly with Masurca 3.2.4. The file with the incorrect array type is present in the 7-CGW folder and the 8-consensus step appears to have run successfully. Please let me know what I might do to correct this error. Thanks!

----------------------------------------START Wed Mar 21 09:50:44 2018
/clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/CA/Linux-amd64/bin/terminator -g /global/scratch/blackman/LRD/CA/genome.gkpStore -t /global/scratch/blackman/LRD/CA/genome.tigStore 13 -c /global/scratch/blackman/LRD/CA/7-CGW/genome 12 -o /global/scratch/blackman/LRD/CA/9-terminator/genome > /global/scratch/blackman/LRD/CA/9-terminator/genome.asm.err
/clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/CA/Linux-amd64/bin/terminator: AS_configure()-- AS_CGW_ERROR_RATE set to 0.15
====> Reading /global/scratch/blackman/LRD/CA/7-CGW/genome.ckp.12 at Wed Mar 21 09:50:44 2018

Expecting array of type <EdgeCGW_T> but read array of type <>
----------------------------------------END Wed Mar 21 09:50:56 2018 (12 seconds)
ERROR: Failed with signal HUP (1)
================================================================================

runCA failed.

Stack trace:

at /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/bin/../CA/Linux-amd64/bin/runCA line 1121.
main::caFailure('terminator failed', '/global/scratch/blackman/LRD/CA/9-terminator/terminator.err') called at /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/bin/../CA/Linux-amd64/bin/runCA line 4394
main::terminate() called at /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/bin/../CA/Linux-amd64/bin/runCA line 5348

Failure message:

terminator failed

Error in assemble.sh

MaSuRCA 3.2.5 writes 'CA8/Linux-amd64/bin' into assemble.sh as the path for starting the CA run, but the install path is actually just 'CA/Linux-amd64/bin'.
This is the error:
[Mon Mar 5 09:45:54 CST 2018] Using linking mates
Using CABOG from is /media/data/software/MaSuRCA-3.2.5/bin/../CA8/Linux-amd64/bin
runCA not found at /media/data/software/MaSuRCA-3.2.5/bin/../CA8/Linux-amd64/bin!
cat: CA_dir.txt: No such file or directory
[Mon Mar 5 09:45:54 CST 2018] Assembly stopped or failed, see .log
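
Before editing assemble.sh by hand, it may help to confirm which of the two directory layouts named in this report actually exists in the install. A small sketch; the helper name and the candidate order are assumptions for illustration, and only the 'CA8' and 'CA' directory names come from the report:

```python
import os

def find_runca(masurca_root, candidates=("CA8", "CA")):
    """Return the first <root>/<candidate>/Linux-amd64/bin that holds runCA, or None."""
    for name in candidates:
        bin_dir = os.path.join(masurca_root, name, "Linux-amd64", "bin")
        if os.path.isfile(os.path.join(bin_dir, "runCA")):
            return bin_dir
    return None
```

With the install from the log, `find_runca("/media/data/software/MaSuRCA-3.2.5")` would be expected to return the 'CA' path, which is the value the CA path in assemble.sh would need to point at.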

jellyfish memory allocation error

FYI: estimated genome size is 2.2 Gb and coverage of illumina ~ 70X and PacBio coverage: ~14X

Getting memory allocation error below (using MaSuRCA v3.2.6) :

terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 770000000000 bytes of memoryterminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 770000000000 bytes of memory
./assemble.sh: line 120: 59122 Aborted jellyfish count -m 31 -t 16 -C -s $JF_SIZE -o k_u_hash_0 pe.cor.fa sj.cor.fa
Failed to open input file 'k_u_hash_0'
Estimated genome size:
Creating k-unitigs with k=99
./assemble.sh: line 125: *2: syntax error: operand expected (error token is "*2")
Creating k-unitigs with k=31
./assemble.sh: line 130: *2: syntax error: operand expected (error token is "*2")
Super reads failed, check super2.err and files in ./work2/
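
The failed request of 770000000000 bytes is exactly five times the product of the genome size and Illumina coverage given at the top of this post (2.2 Gb × 70). That would be consistent with JF_SIZE having been set to genome size × coverage, although the post does not state the JF_SIZE used, so treat this as a guess. The sketch below only compares a requested allocation with the machine's physical memory; the function names are made up for illustration:

```python
import os

def physical_memory_bytes():
    """Total physical RAM as reported by the OS (Linux and most Unix systems)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")

def allocation_fits(requested_bytes, available_bytes):
    """True when a single allocation of this size could fit in the given memory."""
    return requested_bytes <= available_bytes

requested = 770_000_000_000   # from the "Failed to allocate" message
product = 2_200_000_000 * 70  # genome size x Illumina coverage from the post
print(requested / product)
```

On the assembly machine, `allocation_fits(requested, physical_memory_bytes())` gives a quick answer to whether the hash could ever have been allocated there. The later `*2: syntax error` lines look like a knock-on effect of the empty "Estimated genome size:" value rather than separate problems.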

runCA failed and meryl failed -- Too many open files

MaSuRCA-3.2.6b was running on a 144-CPU workstation (Ubuntu 17.10) with 2 TB RAM available.
Unfortunately the runCA and meryl steps failed on this command:

/nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/CA8/Linux-amd64/bin/meryl  -B -C -v -m 22 -memory 65536 -threads 158 -c 0  -L 2  -s /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore:chain  -o /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/0-mercounts/genome-C-ms22-cm0 > /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/0-mercounts/meryl.err 2>&1
----------------------------------------END Thu Apr 19 03:53:38 2018 (69 seconds)
ERROR: Failed with signal HUP (1)
================================================================================

runCA failed.

----------------------------------------
Stack trace:

 at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 1613.
        main::caFailure("meryl failed", "/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0"...) called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 2483
        main::runMeryl(22, 0, "-C", undef, undef, undef, "obt", 1) called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 2698
        main::meryl() called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 3667
        main::createOverlapJobs("trim") called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 4062
        main::overlapTrim() called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 6522

Here is the error log content

----------------------------------------
Last few lines of the relevant log file (/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/0-mercounts/meryl.err):

Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/uid (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qsb (r): Too many open files
failed to open gatekeeper store '/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/inf': Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
Can fit 154190604 mers into table with prefix of 23 bits, using  414.000MB (   0.000MB for positions)
Can fit 154190604 mers into table with prefix of 23 bits, using  414.000MB (   0.000MB for positions)
Can fit 154190604 mers into table with prefix of 23 bits, using  414.000MB (   0.000MB for positions)
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
failed to open gatekeeper store '/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/inf': Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/snm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
failed to open gatekeeper store '/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/inf': Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/snm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/uid (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
failed to open gatekeeper store '/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/inf': Too many open files

----------------------------------------
Failure message:

meryl failed

I restarted MaSuRCA several times, but the same issue persists (runCA and meryl failed).
Do you have an idea what might cause this?
Thanks.
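
The repeated "Too many open files" lines suggest the per-process file descriptor limit is being hit, although the post does not show the limit in effect, so this is only one thing worth checking. A minimal sketch using Python's standard `resource` module to inspect and raise the soft limit for the current process and its children; the helper names and the default of 65536 are arbitrary choices for illustration (in a shell session, `ulimit -n` reports the same limit):

```python
import resource

def target_soft_limit(soft, hard, wanted):
    """Soft limit to request: never below the current one, never above the hard limit."""
    if soft == resource.RLIM_INFINITY:
        return soft
    if hard == resource.RLIM_INFINITY:
        return max(soft, wanted)
    return max(soft, min(wanted, hard))

def raise_open_file_limit(wanted=65536):
    """Raise RLIMIT_NOFILE for this process (inherited by children); return the new soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target_soft_limit(soft, hard, wanted)
    if new_soft != soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft
```

The soft limit can only be raised up to the hard limit without administrator rights, which is why the helper clamps the requested value.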

overlapInCore issues, one set with 454 data, not another

MaSuRCA 3.2.2 RC1

MaSuRCA was run before on a related organism "previous" without any particular problems.
It had PacBIO, Illumina PE, and 454 data. Another organism "current" with a similar data
mix is having all sorts of problems at overlapInCore. The data is summarized below:

            previous     current
--------------------------------  CA reports
numFrags   101696805   225978692   <- CA number of seqs
merThresh        131         173   <-  it calculated this
ovl jobs        8839       58860   <- current 6.6X larger 
--------------------------------  statistics for input files
pe N        54131150    99999732   <- current 1.8X larger 
pe bp     7945611718 10027259444   <- current 1.3X larger 
mr N          329516     1384768   <- current 4.2X larger 
mr bp      807632269  5246427833   <- current 6.5X larger 
mates N       200170      707934   <- current 3.5X larger 
mates bp    80268170   283881534   <- current 3.5X larger 
454 N       43616768    74925978   <- current 1.7X larger 
454 bp    3093627853  5315577364   <- current 1.7X larger 
total N     98277604   177018412   <- != CA count
total bp    11.9 Gbp    20.9 Gbp

Previous had only one paired and one single 454 library; current has several. Each library was processed similarly, and the frg file settings are the same. Note the discrepancy between the number of sequences in the input and the number of fragments that CA reports. The runCA.spec file produced by MaSuRCA for current differs only slightly from the one used for previous:

diff $CURRENT/runCA.spec $PREVIOUS/runCA.spec
1c1
< batOptions=-repeatdetect 41 41 41 -el 64 
---
> batOptions=-repeatdetect 36 36 36 -el 63

The first problem is that the lines in ovlopt like:
-h 1-27573 -r 1-27573 --hashstrings 27573 --hashdatalen 100003008
apparently have a hashstrings value which is too large. When it runs with these values the jobs invariably just emit a series of messages:
ERROR: Hash table full
to their out files, and run forever. Some experimenting showed that reducing the --hashstrings value by 25% let jobs run. So a copy of overlap.sh was patched to reduce the hashstrings value by 25% at run time (also to only use a single thread, because -t 2 processes were often far below 200% CPU, but -t 1 processes never are), and a copy of runCA was patched to use that version of the script. So far it is up to 2250 jobs (in 18 hours) and none have hung completely. (The estimated ~20 day run time just for this phase is not wonderful, it seems pretty long for a 40CPU machine and an ~1Gbp organism.)
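
For reference, the 25% reduction described above could be applied to an ovlopt line along these lines. This is a sketch of the same idea, not the poster's actual patch to overlap.sh; the function name is hypothetical, and only the `--hashstrings` value is touched, as in the post:

```python
import re

def shrink_hashstrings(ovlopt_line, keep_percent=75):
    """Scale the --hashstrings value down to keep_percent of its original value."""
    def _scale(match):
        scaled = int(match.group(2)) * keep_percent // 100
        return f"{match.group(1)}{scaled}"
    return re.sub(r"(--hashstrings\s+)(\d+)", _scale, ovlopt_line)

line = "-h 1-27573 -r 1-27573 --hashstrings 27573 --hashdatalen 100003008"
print(shrink_hashstrings(line))
```

Integer arithmetic is used so the result is always a whole number of strings; the `-h` and `-r` ranges are left unchanged here.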

However....

There is a huge variation in run times. Some jobs complete in less than 5 minutes. Others have been running for more than 1000 minutes, and might run for who knows how much longer. That is at least a 200X difference in run times. All the overlapInCore jobs are using ~100% CPU.

Why is there such a huge variation?

Another issue is the amount of memory used, which is too little. Even the longest running jobs are only using 3045m VIRT and 2.7g RES in top. lscpu shows 40 "CPUs" and that is the number used, so around 120 GB of RAM is employed by these processes. The system has 512 GB. I tried to make overlapInCore use more with the switch:
-M '8GB'
but it acted like the switch didn't exist. (Just checked the source code for OverlapInCore.C, the parameter reading section has no handling for "-M".) Is there some other way to induce this program to use more memory?

Thanks.

multiple errors with 3.2.7

Good morning,
Masurca has generated the following pair of errors. I was wondering if they are related?
First,

[Tue Jul  3 18:41:15 EDT 2018] Processing pe library reads
[Tue Jul  3 19:19:34 EDT 2018] Average PE read length 250
[Tue Jul  3 19:19:34 EDT 2018] Using kmer size of 127 for the graph
[Tue Jul  3 19:19:36 EDT 2018] MIN_Q_CHAR: 33
[Tue Jul  3 19:19:36 EDT 2018] Creating mer database for Quorum
[Tue Jul  3 19:55:31 EDT 2018] Error correct PE.
[Tue Jul  3 22:25:02 EDT 2018] Estimating genome size.
[Tue Jul  3 22:45:18 EDT 2018] Estimated genome size: 2305482575
[Tue Jul  3 22:45:18 EDT 2018] Creating k-unitigs with k=127
[Wed Jul  4 00:37:37 EDT 2018] Computing super reads from PE 
Using CABOG from is /opt/packages/masurca/3.2.7/bin/../CA8/Linux-amd64/bin
stat: cannot stat ‘work1/superReadSequences.fasta’: No such file or directory
/opt/packages/masurca/3.2.7/bin/mega_reads_assemble_cluster.sh: line 127: /2305482575/3: syntax error: operand expected (error token is "/2305482575/3")
/opt/packages/masurca/3.2.7/bin/mega_reads_assemble_cluster.sh: line 128: [: -lt: unary operator expected
/opt/packages/masurca/3.2.7/bin/mega_reads_assemble_cluster.sh: line 129: [: -gt: unary operator expected

It's unclear to me why the "cannot stat" message appears. The superReadSequences.fasta file is indeed missing, yet the work1 directory was created and contains a createLengthStatisticsFiles.Failed subdirectory.

The run continued despite this error message, with additional .log file information being produced:

Running mega-reads correction/assembly
Using mer size 15 for mapping, B=15, d=0.02
Estimated Genome Size 2305482575
Estimated Ploidy 
Using 24 threads
Output prefix mr.41.15.15.0.02
Using 30x of the longest ONT reads
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ufasta)
failed to extract the best long reads

I'm curious why MaSuRCA fails here. Could you specify what to look for on our system to provide the libstdc++ versions (GLIBCXX_3.4.20 and GLIBCXX_3.4.21) it seems to want?

Thanks
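
One way to see what the system library offers is to list the GLIBCXX version tags embedded in libstdc++.so.6 and compare them with the two versions named in the error. The sketch below scans the file's bytes for those tags (much like running `strings` on the library and filtering for GLIBCXX); the function names are made up for illustration:

```python
import re

def glibcxx_versions(library_bytes):
    """Return the GLIBCXX version tags found in a library's bytes, sorted numerically."""
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", library_bytes))
    versions = [tag.decode() for tag in tags]
    return sorted(versions, key=lambda v: tuple(int(part) for part in v.split(".")))

def missing_versions(library_path, required):
    """List the required versions that the library at library_path does not provide."""
    with open(library_path, "rb") as handle:
        present = set(glibcxx_versions(handle.read()))
    return [version for version in required if version not in present]
```

For the log above, `missing_versions("/lib64/libstdc++.so.6", ["3.4.20", "3.4.21"])` would be expected to return both versions. If so, the prebuilt ufasta binary needs a newer libstdc++ than the system one, and building from source on that machine or pointing the run at a newer libstdc++ are the usual ways out.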

LHE_COVERAGE parameter

Hi,

I wish to confirm that the LHE_COVERAGE parameter is the estimated coverage of long reads given to the assembler. Currently its description is difficult to interpret.

RuntimeError Usage: Options_minmatch(self,m)

Hello,

We seem to be getting "RuntimeError Usage: Options_minmatch(self,m)" errors in our MaSurca run. Please see brief logs below:

[Fri Jun  1 09:54:31 BST 2018] Estimated genome size: 536282973
[Fri Jun  1 09:54:31 BST 2018] Creating k-unitigs with k=41
[Fri Jun  1 10:06:19 BST 2018] Creating k-unitigs with k=31
[Fri Jun  1 10:25:34 BST 2018] Filtering mate pairs
..
..
Output prefix mr.41.15.17.0.029
Pacbio coverage >10x, using 10x of the longest reads
Reducing super-read k-mer size
Mega-reads pass 1
Running locally in 1 batch
compute_psa 6939941 1423963606
Processed 500000 super reads, irreducible 420469, processing 14285 super reads per second
Processed 1000000 super reads, irreducible 769474, processing 35714 super reads per second
..
..
Processed 3500000 super reads, irreducible 2410744, processing 38461 super reads per second
Processed 4000000 super reads, irreducible 2783819, processing 45454 super reads per second
Mega-reads pass 2
Running locally in 1 batch
compute_psa 3073547 2330298699
Refining alignments
Attempt to free unreferenced scalar: SV 0x32db870, Perl interpreter: 0x7d5010 at /cm/shared/apps/MaSuRCA/3.2.6/bin/../lib/perl/mummer.pm line 262, <STDIN> line 2002.
..
..

Attempt to free unreferenced scalar: SV 0x558fd80, Perl interpreter: 0x7d5010 at /cm/shared/apps/MaSuRCA/3.2.6/bin/../lib/perl/mummer.pm line 262, <STDIN> line 18346.
RuntimeError Usage: Options_minmatch(self,m); at /cm/shared/apps/MaSuRCA/3.2.6/bin/refine_alignments.pl line 51, <STDIN> line 18346.
Joining
refine/join alignments failed
[Sat Jun  2 00:20:26 BST 2018] Assembly stopped or failed, see CA.mr.41.15.17.0.029.log

Any thoughts on what's going on here? Our first guess was that the error is coming from mummer, where the --minmatch limit is causing issues.
The log file referenced at the assembly failure does not seem to exist at all, so we could not debug further.

Aborted quorum_error_correct_reads

Hi, I'm currently using MaSuRCA-3.2.6 to assemble my genome and I ran the following script:

    #PBS -S /bin/bash
    #PBS -l nodes=1:ppn=8:bigmem,mem=100gb
    #PBS -e /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.error
    #PBS -o /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.out
    #PBS -N ACG-006
    #PBS -q q1week
    
    
    DATA
    PE= pe 150 22 /pandata/LEPIWASP/ACG-0006_0027/frag_1.fastq /pandata/LEPIWASP/ACG-0006_0027/frag_2.fastq
    
    END
    
    PARAMETERS
    #set this to 1 if your Illumina jumping library reads are shorter than 100bp
    EXTEND_JUMP_READS=0
    #this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
    GRAPH_KMER_SIZE = auto
    #set this to 1 for all Illumina-only assemblies
    #set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs
    #otherwise keep at 0
    USE_LINKING_MATES = 0
    #specifies whether to run mega-reads correction on the grid
    USE_GRID=0
    #specifies queue to use when running on the grid MANDATORY
    GRID_QUEUE=all.q
    #batch size in the amount of long read sequence for each batch on the grid
    GRID_BATCH_SIZE=300000000
    #coverage by the longest Long reads to use
    LHE_COVERAGE=30
    #this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms 
    LIMIT_JUMP_COVERAGE = 300
    #these are the additional parameters to Celera Assembler.  do not worry about performance, number or processors or batch sizes -- these are computed automatically. 
    #set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
    CA_PARAMETERS =  cgwErrorRate=0.15
    #minimum count k-mers used in error correction 1 means all k-mers are used.  one can increase to 2 if Illumina coverage >100
    KMER_COUNT_THRESHOLD = 1
    #whether to attempt to close gaps in scaffolds with Illumina data
    CLOSE_GAPS=1
    #auto-detected number of cpus to use
    NUM_THREADS = 16
    #this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
    JF_SIZE = 200000000
    #set this to 1 to use SOAPdenovo contigging/scaffolding module.  Assembly will be worse but will run faster. Useful for very large (>5Gbp) genomes from Illumina-only data
    SOAP_ASSEMBLY=0
    END

Then I got the assemble.sh file, ran it as well, and got the following .out:

[Sat Jun 16 22:32:45 CEST 2018] Processing pe library reads
 [Sat Jun 16 22:49:04 CEST 2018] Average PE read length 150
 [Sat Jun 16 22:49:05 CEST 2018] Using kmer size of 49 for the graph
 [Sat Jun 16 22:49:06 CEST 2018] MIN_Q_CHAR: 33
 WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 1115876884, this automatic increase may be not enough!
 [Sat Jun 16 22:49:06 CEST 2018] Creating mer database for Quorum
 [Sat Jun 16 23:09:23 CEST 2018] Error correct PE.
 [Sat Jun 16 23:11:49 CEST 2018] Error correction of PE reads failed. Check pe.cor.log.

and .error:

 /panhome/TOOLS/MaSuRCA-3.2.6/assemble.sh: line 102: 46750 Aborted                 quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)) --contaminant=/panhome/TOOLS/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 16 -w 10 -e 3 -M quorum_mer_db.jf pe.renamed.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1

Does someone have an idea of what is going on here? Thanks for your help.

The 2 read files come from an Illumina HiSeq 3000 (150 bp) and the genome size of my species is around 1.5 Gb.
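
The config template in this post describes a safe JF_SIZE as estimated_genome_size*estimated_coverage. A quick sketch of that rule with the 1.5 Gb genome size mentioned above; the coverage of 50 is a placeholder, since the post does not state it, and the function name is made up for illustration:

```python
def suggested_jf_size(genome_size_bp, coverage):
    """JF_SIZE rule of thumb from the config comment: genome size times coverage."""
    return int(genome_size_bp * coverage)

configured = 200_000_000     # JF_SIZE from the posted config
genome_size = 1_500_000_000  # "around 1.5 Gb" from the post
print(suggested_jf_size(genome_size, 50))            # 50x is a hypothetical coverage
print(suggested_jf_size(genome_size, 1) > configured)
```

Even at 1x coverage the rule gives a value well above the configured 200000000, which fits the "JF_SIZE set too low" warning in the log. Whether this is what made quorum_error_correct_reads abort is not something the posted output shows; pe.cor.log and quorum.err would be the places to look.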

Results should have lots of duplication.

Dear Dr. Zimin,
I used MaSuRCA to assemble a plant genome. The PE data predicted about 2% heterozygosity and a genome size of about 1.7 Gb. I used four short-read libraries and one PacBio library, comprising about 50x PE data and about 30x PacBio reads.
In my experience, MaSuRCA takes a very long time on two steps:
One is the mega-reads cluster step using PacBio reads, which takes more than three months.
The other is overlapping the .frg files, which takes more than three months since I have about 240,000 overlap jobs.
I changed some things to finish the assembly faster. Maybe the changes are not correct. One is that the PacBio reads were split into 1000 chunks. I am not sure that this change is correct.
The other is the overlap files, but I think this change is correct.
Finally,
I got the genome sequence with the following results. I think my genome sequence has a lot of duplication.
Could you help me check the results and give me some suggestions for improving them?
Thanks,
Fuyou

#BUSCO was run in mode: genome
Summarized benchmarks in BUSCO notation:
C:83%[D:53%],F:3.4%,M:12%,n:1440
Representing:
1205 Complete Single-Copy BUSCOs
764 Complete Duplicated BUSCOs
49 Fragmented BUSCOs
186 Missing BUSCOs
1440 Total BUSCO groups searched

====================
Scaffolds | withGaps | withoutGaps

#Seqs | 137,936
Min | 101 | 101
1st Qu.| 8,009 | 7,987
Median | 13,537 | 13,476
Mean | 22,901 | 22,790
3rd Qu.| 23,890 | 23,709
Max | 1,020,449 | 1,018,898
Total | 3,159,008,652 | 3,143,578,659
n50 | 36,831 | 36,555
n90 | 10,703 | 10,654
n95 | 8,089 | 8,063

Contigs | withNs | withoutNs

#Seqs | 145,321
Min | 70 | 70
1st Qu.| 7,669 | 7,669
Median | 13,041 | 13,041
Mean | 21,631 | 21,631
3rd Qu.| 22,638 | 22,637
Max | 967,921 | 967,901
Total | 3,143,578,659 | 3,143,503,525
n50 | 33,855 | 33,855
n90 | 10,263 | 10,263
n95 | 7,715 | 7,715

Gaps

#Seqs | 7,385
Min | 25
1st Qu.| 676
Median | 1,514
Mean | 2,089
3rd Qu.| 2,965
Max | 18,505
Total | 15,429,993
n50 | 3,390
n90 | 1,094
n95 | 750

Non-gapped Ns Count: 75134
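
Two numbers in this report can be turned into rough duplication indicators: the assembled size relative to the expected 1.7 Gb genome, and the share of duplicated BUSCOs (taking 764 of 1440 as the duplicated count, which matches the D:53% in the summary line). A minimal sketch with made-up function names; it only restates figures from the post:

```python
def size_ratio(assembled_bp, expected_bp):
    """How many times larger the assembly is than the expected genome size."""
    return assembled_bp / expected_bp

def busco_fraction(count, total):
    """Fraction of searched BUSCO groups falling into one category."""
    return count / total

# Contig total without Ns versus the 1.7 Gb estimate from the PE data.
print(round(size_ratio(3_143_503_525, 1_700_000_000), 2))
# Duplicated BUSCOs out of all groups searched.
print(round(busco_fraction(764, 1440), 2))
```

An assembly about 1.85 times the expected size, with roughly half of the BUSCO groups duplicated, is what one would expect if the two haplotypes of a highly heterozygous genome were largely assembled separately. That is an interpretation of the figures rather than something the report itself establishes.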

Unitig has no placement using v3.6.2

Hi,

I am assembling several closely related draft genomes with MaSuRCA v3.6.2 (not beta) based on Illumina PE-only libraries. Most assemblies finish without problems, but one assembly fails to create one consensus unitig.
(from runCA3 and CA/7-0-CGW/cgw.out)

ERROR:  Unitig 30217 has no placement; probably not run through consensus.
Segmentation fault (core dumped)

As I am using the official release v3.6.2, CA attempts to fix this problem but fails
(from CA/fix_unitig_consensus/unitig_failures)
../5-consensus/genome_017.err:MultiAlignUnitig()-- Unitig 30217 FAILED. Could not align fragment 1034014.

I suppose fragment 1034014 fails to align and, as a consequence, no consensus unitig is produced (i.e. no UTG in either version 2 or 3 of the tigStore)
(from tigStore v1:)

...
FRG type R ident  18705479 container   1199472 parent   1199472 hang    217    -97 position    769    619
FRG type R ident   1011721 container         0 parent   1058008 hang    170    184 position    660   1174
FRG type R ident   1034014 container         0 parent   1472858 hang     90    189 position    660   1167
FRG type R ident  13210399 container   1011721 parent   1011721 hang     92   -272 position    752    902
FRG type R ident  13210403 container   1011721 parent   1011721 hang     92   -272 position    752    902
...

If I extract Unitig 30217 from the tigstore, manually remove fragment 1034014, replace the tigstore version 1 entry and try to generate a unitig consensus

tigStore -g genome.gkpStore -t genome.tigStore 1 -d layout -u 30217 > unitig30217.tmp
tigStore -g genome.gkpStore -t genome.tigStore 1 -R unitig30217.tmp
utgcns -g genome.gkpStore -t genome.tigStore 1 3 -u 30217

it claims to be successful

MultiAlignStore::dumpMASRfile()-- Writing 'genome.tigStore/seqDB.v002.p003.utg' partitioned.

NumColumnsInUnitigs             = 0
NumGapsInUnitigs                = 0
NumRunsOfGapsInUnitigReads      = 0
NumColumnsInContigs             = 0
NumGapsInContigs                = 0
NumRunsOfGapsInContigReads      = 0
NumAAMismatches                 = 0
NumVARRecords                   = 0
NumVARStringsWithFlankingGaps   = 0
NumUnitigRetrySuccess           = 0

Consensus finished successfully.  Bye.

but does not produce the required UTG entry in tigStore version 3 (or 2):

unitig 30217
len 0
cns 
qlt 
data.unitig_coverage_stat -4.867138
data.unitig_microhet_prob 1.000000
data.unitig_status        X
data.unitig_unique_rept   X
data.contig_status        U
data.num_frags            30
data.num_unitigs          0
FRG type R ident   1502113 container         0 parent   1214644 hang   -208   -266 position    402      0
FRG type R ident  16305263 container   1502113 parent   1502113 hang    167    -85 position    317    167
FRG type R ident  10313351 container   1502113 parent   1502113 hang    173    -79 position    323    173
FRG type R ident  14227777 container   1502113 parent   1502113 hang    192    -60 position    342    192
FRG type R ident  11548311 container   1502113 parent   1502113 hang    202    -50 position    352    202
FRG type R ident   1214644 container         0 parent   1502113 hang    208    266 position    208    668
FRG type R ident   1864621 container   1214644 parent   1214644 hang     34    -99 position    571    242
FRG type R ident   1880555 container   1214644 parent   1214644 hang     50    -87 position    258    583
FRG type R ident  18705478 container   1214644 parent   1214644 hang     62   -248 position    270    420
FRG type R ident   1304841 container         0 parent   1214644 hang     96     76 position    304    744
FRG type R ident  18891280 container   1214644 parent   1214644 hang     99   -211 position    307    457
FRG type R ident  18895288 container   1214644 parent   1214644 hang     99   -211 position    307    457
FRG type R ident   9462916 container   1214644 parent   1214644 hang    106   -204 position    464    314
FRG type R ident   1771722 container   1214644 parent   1214644 hang    110      0 position    318    668
FRG type R ident  14883894 container   1214644 parent   1214644 hang    152   -158 position    510    360
FRG type R ident  12415492 container   1214644 parent   1214644 hang    169   -141 position    527    377
FRG type R ident   1653488 container         0 parent   1304841 hang     80     14 position    758    384
FRG type R ident   1199472 container         0 parent   1653488 hang     18    108 position    402    866
FRG type R ident   1199473 container   1199472 parent   1199472 hang      0      0 position    402    866
FRG type R ident  11474300 container   1199473 parent   1199473 hang     81   -233 position    633    483
FRG type R ident  11474474 container   1199473 parent   1199473 hang     81   -233 position    633    483
FRG type R ident   1058008 container         0 parent   1199472 hang     88    124 position    990    490
FRG type R ident   3571944 container   1058008 parent   1058008 hang      9   -341 position    649    499
FRG type R ident   9092394 container   1199473 parent   1199473 hang    111   -203 position    663    513
FRG type R ident   9092814 container   1199473 parent   1199473 hang    111   -203 position    663    513
FRG type R ident  18891281 container   1058008 parent   1058008 hang     31   -319 position    671    521
FRG type R ident  18895289 container   1058008 parent   1058008 hang     31   -319 position    671    521
FRG type R ident   1472858 container         0 parent   1199472 hang    168    112 position    570    978
FRG type R ident  18705479 container   1199472 parent   1199472 hang    217    -97 position    769    619
FRG type R ident   1011721 container         0 parent   1058008 hang    170    184 position    660   1174

I am totally happy to completely delete the problematic unitig because the assembly merely serves to error correct long reads, for which I use alternative approaches in parallel.
However, I don't seem to use the correct syntax and also don't know how to remove it from all versions of the tigstore (the following just prints the help function):

tigStore -g genome.gkpStore -t genome.tigStore 1 -D -u 30217

Any suggestion on how to fix the problematic unitig or kick it out completely would be much appreciated!

Thanks and best,
Evelien

ERROR: failed to merge alignments at position 554

Hello MaSuRCA team,
I used MaSuRCA (version: MaSuRCA-3.2.7) to de novo assemble Brassica genomes. However, I get the following error message:

Running locally in 1 batch
compute_psa 2044593 5215127586
Refining alignments
Joining

ERROR: failed to merge alignments at position 554

   Please file a bug report

terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
xargs: nucmer: terminated by signal 6
Generating assembly input files
Coverage threshold for splitting unitigs is 20 minimum ovl 250
Running assembly
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

compiling issue

Dear Mr Zimin,

I'd like to compile MaSuRCA on our server. I've followed all your steps, but once I run make, I receive the attached error message (error.txt). Could you please help me solve this problem?

Cheers

Bastian Heimburger

Using Nanopore, PacBio, and Illumina together

Hello Aleksey,

I was wondering if I could use Nanopore and PacBio together for the hybrid assembly. We have about 70x PacBio, 20x long-read Nanopore, and ~100x Illumina for a 650 Mb genome. What do you reckon is the best strategy I can use with MaSuRCA?

Thank you in advance!

MaSuRCA-3.2.7 fail to create kunitigs

Hello,
masurca-3.2.7 has stopped and generated the following error in super2.err file:

Error with file '/ceph/sge-tmp/jnguinka/fbn.fish.gen/Assemblies/MaSuRCA/guillaumeKUnitigsAtLeast32bases_all.jump.fasta'
Output file "work2/kUnitigLengths.txt" is of size 0, must be at least of size 1. Bye!
mv work2/numKUnitigs.txt work2/createLengthStatisticsFiles.Failed
mv work2/maxKUnitigNumber.txt work2/createLengthStatisticsFiles.Failed
mv work2/kUnitigLengths.txt work2/createLengthStatisticsFiles.Failed

Here is the content of quorum.err

[2018/07/27 17:43:53] Loading mer database
[2018/07/27 19:11:49] Loading contaminant sequences
[2018/07/27 19:11:49] Computing Poisson cutoff
[2018/07/27 19:25:46] distinct mers:5419768881 total mers:79782915223 estimated coverage:14.7207
[2018/07/27 19:25:46] lambda:0.0490691 collision_prob:0.00333333 poisson_threshold:0.0001
[2018/07/27 19:25:46] Using cutoff of 4
[2018/07/27 19:25:46] Correcting reads
[2018/07/28 01:21:11] Done
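
A side note on the numbers in this log: the "estimated coverage" value looks like it is simply total mers divided by distinct mers. That reading is inferred from the figures above, not from the Quorum source, so treat it as an assumption. A minimal check:

```python
# Values copied from the quorum.err line above.
distinct_mers = 5_419_768_881
total_mers = 79_782_915_223

# Assumption: "estimated coverage" is total mers / distinct mers.
estimated_coverage = total_mers / distinct_mers
print(f"{estimated_coverage:.4f}")
```

If the assumption holds, the printed value should agree with the 14.7207 reported in the log.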

The following 3 files are all empty:

guillaumeKUnitigsAtLeast32bases_all.jump.fasta.tmp
guillaumeKUnitigsAtLeast32bases_all.fasta.tmp
ESTIMATED_GENOME_SIZE.txt

Every second read is reported as skipped in pe.cor.tmp.log and sj.cor.tmp.log.
Here is the content of pe.cor.tmp.log:

tail -100 pe.cor.tmp.log
Skipped pe2761295299: No high quality mer
Skipped pe2761295053: No high quality mer
Skipped pe2761295055: No high quality mer
Skipped pe2761295057: No high quality mer
Skipped pe2761295059: No high quality mer
Skipped pe2761295061: No high quality mer
Skipped pe2761295063: No high quality mer
Skipped pe2761295065: No high quality mer
Skipped pe2761295067: No high quality mer
Skipped pe2761295069: No high quality mer
Skipped pe2761295071: No high quality mer
Skipped pe2761295073: No high quality mer
Skipped pe2761295075: No high quality mer

tail sj.cor.tmp.log
Skipped m2363681195: No high quality mer
Skipped m2363700910: No high quality mer
Skipped m2363700911: No high quality mer
Skipped m2363727470: No high quality mer
Skipped m2363727471: No high quality mer
Skipped m2363765722: No high quality mer
Skipped m2363765723: No high quality mer
Skipped m2363801496: No high quality mer
Skipped m2363801497: No high quality mer
Skipped m2363809641: No high quality mer
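
To quantify the "every second read" observation, the "Skipped" lines can be tallied by library prefix and by whether the read number is odd or even. This is only a sketch: it assumes the log lines have exactly the form shown above, and that consecutive read numbers correspond to the two mates of a pair, which the logs themselves do not confirm.

```python
import re
from collections import Counter

# Matches lines such as "Skipped pe2761295299: No high quality mer".
SKIP_RE = re.compile(r"^Skipped\s+([A-Za-z]+)(\d+):")


def skipped_parity(lines):
    """Count skipped reads by (prefix, 'odd' or 'even' read number)."""
    counts = Counter()
    for line in lines:
        match = SKIP_RE.match(line.strip())
        if match:
            prefix, number = match.group(1), int(match.group(2))
            counts[(prefix, "odd" if number % 2 else "even")] += 1
    return counts


# A few lines taken from the excerpts above.
sample = [
    "Skipped pe2761295299: No high quality mer",
    "Skipped pe2761295053: No high quality mer",
    "Skipped m2363700910: No high quality mer",
    "Skipped m2363700911: No high quality mer",
]
parity = skipped_parity(sample)
print(parity)
```

On a real log, pass an open file object instead of `sample`. A strong odd/even imbalance for one prefix would suggest that one mate file of that library is the one being rejected.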


My OS is : Ubuntu 16.04 

I have no clue what might have gone wrong and at which step.

I would appreciate any help, 
Thanks

Difference between "final.genome.scf.fasta" and "9-terminator/genome.scf.fasta"

After running Masurca, I found out that the statistics from CA.mr.41.15.15.0.02.log are computed from the fasta file found in the 9-terminator folder.
In Masurca doc it is said the output is final.genome.scf.fasta, however the total length of this file is lower than the one found in the 9-terminator folder.

I'm a bit confused as to which assembly is the correct one, and why there is a difference in length (and associated stats) between these two assemblies.

Cheers,
Martin Binet

make errors

make
autoreconf -fi
Fcntl.c: loadable library and perl binaries are mismatched (got handshake key 0xdb00080, needed 0xde00080)
make: *** [configure] Error 1

ESTIMATED_GENOME_SIZE value is twice the genome size

We used MaSuRCA to assemble the genome. The value in the resulting ESTIMATED_GENOME_SIZE.txt file is twice the actual genome size.

Unitig failure in CA

Hi,
I was using masurca to assemble a genome. Unfortunately, I got an error which I cannot fix, can you please help to fix it.

cat: write error: Broken pipe
mkdir: cannot create directory `CA/fix_unitig_consensus': File exists
INSERTING unitig 82475
MultiAlignStore::dumpMASRfile()-- Writing '../genome.tigStore/seqDB.v001.p005.utg' partitioned.
INSERTING unitig 151276
MultiAlignStore::dumpMASRfile()-- Writing '../genome.tigStore/seqDB.v001.p008.utg' partitioned.

In runCA3.out, it says

ERROR: Failed with signal SEGV (11)
================================================================================

runCA failed.

----------------------------------------
Stack trace:

 at /gpfs1/sw1/Projects/MaSuRCA/3.2.4/bin/../CA/Linux-amd64/bin/runCA line 1121.
        main::caFailure("scaffolder failed", "/30days/GROUPS/Q0196RW/yyuan/myProject/masurca_20+30/"...) called at /gpfs1/sw1/Projects/MaSuRCA/3.2.4/bin/../CA/Linux-amd64/bin/runCA line 4066
        main::CGW("7-0-CGW", undef, "/30days/GROUPS/Q0196RW/yyuan/myProject/masurca_20+30/"..., 2, undef, 1) called at /gpfs1/sw1/Projects/MaSuRCA/3.2.4/bin/../CA/Linux-amd64/bin/runCA line 4260
        main::scaffolder() called at /gpfs1/sw1/Projects/MaSuRCA/3.2.4/bin/../CA/Linux-amd64/bin/runCA line 5346

----------------------------------------
Last few lines of the relevant log file (/30days/GROUPS/Q0196RW/yyuan/myProject/masurca_20+30/CA/7-0-CGW/cgw.out):

...processed 61000000 fragments.
...processed 62000000 fragments.
...processed 63000000 fragments.
...processed 64000000 fragments.
...processed 65000000 fragments.
...processed 66000000 fragments.
...processed 67000000 fragments.
...processed 68000000 fragments.
...processed 69000000 fragments.
...processed 70000000 fragments.
...processed 71000000 fragments.
...processed 72000000 fragments.
...processed 73000000 fragments.
...processed 74000000 fragments.
...processed 75000000 fragments.
...processed 76000000 fragments.
...processed 77000000 fragments.
...processed 78000000 fragments.
Reading unitigs.
ERROR:  Unitig 82475 has no placement; probably not run through consensus.

----------------------------------------
Failure message:

scaffolder failed

I also looked at http://wgs-assembler.sourceforge.net/wiki/index.php/Unitig_Consensus_Failures_in_CA_6. However, it didn't give a clear solution.

Assembly stopped or failed, see CA.mr.41.15.13.0.02.log

Dear Team MaSuRCA,
I'm getting the below error in MaSuRCA - 3.2.2
"Assembly stopped or failed, see CA.mr.41.15.13.0.02.log"
CA.mr.41.15.13.0.02.log
config.txt

Please resolve ASAP.

Thanks
Regards,
Hithesh

estimated genome size value by masurca is half the genome size

Estimated genome size text file is showing the value that is half the genome size. Should I be concerned?

Could not parse delta file, /dev/stdin

MaSuRCA 2.3.1 running on Ubuntu 14.04 LTS produces the work1 directory and the super1.err output file, then reports "Refining alignments". The next output is "ERROR: Could not parse delta file, /dev/stdin", followed by "error no: 402", then by "rm: cannot remove 't..matches.0.maximal_mr.fa'". The rm message repeats for a series of tmp files numbered up to about 850.

Confusion about configuration file

Hi Aleksey,

thanks for the great program!

I've been testing MaSuRCA with a few different genomes because we are evaluating which assembler will perform best for our data (~8-10x Nanopore, ~20-40x Illumina, 2-3 Gb genome size).

I'm a bit confused about what is your recommended best configuration file, because in your README on GitHub you seem to have two configuration files as examples, however with varying recommendations. For instance:

File 1:
#set this to 1 for all Illumina-only assemblies

#set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs

#otherwise keep at 0

USE_LINKING_MATES = 0

File 2
• USE_LINKING_MATES=1

Most of the paired-end reads end up in the same super read and thus are not passed to the assembler. Those that do not end up in the same super read are called "linking mates". The best assembly results are achieved by setting this parameter to 1 for Illumina-only assemblies. If you have more than 2x coverage by long (454, Sanger, etc) reads, set this to 0.

Now our data falls in-between 2x and 20x long-read coverage, so you understand my confusion. Could you maybe edit the README so that it is less confusing? Thank you!
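
To make the mismatch concrete, the two README passages quoted above can be written as two small rules and evaluated on the same input. This is only a literal encoding of the quoted text, not MaSuRCA code; the function names and the 0x clone coverage used in the example are hypothetical.

```python
def linking_mates_file1(illumina_only, long_read_cov, clone_cov):
    """Rule from the config-file comments ("File 1")."""
    if illumina_only:
        return 1
    # "less than 20x long reads ... and less than 50x CLONE coverage"
    return 1 if (long_read_cov < 20 and clone_cov < 50) else 0


def linking_mates_file2(illumina_only, long_read_cov):
    """Rule from the parameter description ("File 2").

    Returns None where the quoted text gives no guidance.
    """
    if illumina_only:
        return 1
    if long_read_cov > 2:
        return 0
    return None


# Example: 10x long reads and (hypothetically) no mate-pair clone coverage.
file1_value = linking_mates_file1(False, long_read_cov=10, clone_cov=0)
file2_value = linking_mates_file2(False, long_read_cov=10)
print(file1_value, file2_value)
```

For any long-read coverage between 2x and 20x with low clone coverage, the first rule gives 1 and the second gives 0, which is exactly the range described in this issue.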

jellyfish in runCA1 fails because $ovlT is not set

My masurca run failed in the runCA1 step with the message "Jellyfish failed". When I look at the logs and the scripts I think it is because the $ovlIT variable is not set. Interestingly, in this run this variable is set to "" (=it is empty) in the environment.sh file. After a bit of digging, I think this is because I provide a ESTIMATED_GENOME_SIZE.txt file. Due to that the line "jellyfish count -m 31 -t 36 -C -s $JF_SIZE -o k_u_hash_0 pe.cor.fa" in the assemble.sh script is not executed and therefore no k_u_hash_0 file is written. Later in the assemble.sh this k_u_hash_0 is needed to set the ovlMerThreshold variable with jellyfish. But because the file is not available, ovlMerThreshold ends up empty and therefore it is empty also in the environment.sh file.
Should I not provide an ESTIMATED_GENOME_SIZE.txt file? Or could the code be updated so that the assemble.sh always creates a k_u_hash_0 file?

And by the way, thanks for the great software!
Stefan

Absolute paths in release tarballs...

I'm trying to package MaSuRCA for bioconda, but the conda build process fails to extract the MaSuRCA tarballs. The tarballs were created with a leading '/' on the path, which conda build tries to extract in a way equivalent to 'tar -P', and it subsequently fails to create a /MaSuRCA-3.2.4 directory.

I've raised this issue with the bioconda team, who will try to resolve it upstream with the conda developers, but in the meantime would it be possible to generate release tarballs with relative paths (and no leading '/')?

Many thanks
James

mean and standard deviation of insert size?

Given the fastq files of paired-end data and mate-pair data, how does one calculate the mean and standard deviation of the insert size?
Thanks
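
One common approach, not specific to MaSuRCA, is to map a subset of the read pairs to a draft assembly or a related reference and take the statistics from the template length (TLEN, column 9) of the resulting SAM records. The sketch below assumes SAM text input and counts each properly paired template once by using only positive TLEN values. The `max_tlen` cutoff is an arbitrary illustration, and mate-pair (outtie) libraries may need different filtering.

```python
import statistics


def insert_size_stats(sam_lines, max_tlen=100_000):
    """Return (mean, stdev) of TLEN over properly paired SAM records."""
    sizes = []
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment record
            continue
        flag, tlen = int(fields[1]), int(fields[8])
        # 0x2 = properly paired; a positive TLEN marks the leftmost mate.
        if flag & 0x2 and 0 < tlen <= max_tlen:
            sizes.append(tlen)
    if len(sizes) < 2:
        raise ValueError("need at least two properly paired templates")
    return statistics.mean(sizes), statistics.stdev(sizes)


def _sam(flag, tlen):
    """Build a synthetic SAM record for the demo."""
    return f"r\t{flag}\tctg1\t100\t60\t50M\t=\t200\t{tlen}\t*\t*"


demo = ["@HD\tVN:1.6",
        _sam(99, 180), _sam(147, -180),
        _sam(99, 200), _sam(147, -200),
        _sam(99, 220), _sam(147, -220)]
mean, stdev = insert_size_stats(demo)
print(mean, stdev)
```

The resulting mean and standard deviation are the two numbers that go into fields 2 and 3 of a PE or JUMP line in the configuration file.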

Running assembly :1 overlap correction jobs failed with buffer overflow detected error

Hi, my run failed during the "Running assembly" step. I've tried to rerun it, but got exactly the same error. Any hints?
Bests,

----------------------------------------START CONCURRENT Tue Mar 20 16:57:51 2018
/home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/ovlcorr.sh 1 > /home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/0001.err 2>&1
----------------------------------------END CONCURRENT Tue Mar 20 16:57:58 2018 (7 seconds)
Overlap correction job 1 (/home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/0001) failed.
================================================================================

runCA failed.

----------------------------------------
Stack trace:

 at /home/lpryszcz/src/MaSuRCA-3.2.4/bin/../CA8/Linux-amd64/bin/runCA line 1613.
        main::caFailure('1 overlap correction jobs failed; remove /home/lpryszcz/clust...', undef) called at /home/lpryszcz/src/MaSuRCA-3.2.4/bin/../CA8/Linux-amd64/bin/runCA line 4514
        main::overlapCorrection() called at /home/lpryszcz/src/MaSuRCA-3.2.4/bin/../CA8/Linux-amd64/bin/runCA line 6526

----------------------------------------
Failure message:

1 overlap correction jobs failed; remove /home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/ovlcorr.sh (or run by hand) to try again

memory usage

Hi, I am using the MaSuRCA (v3.2.6) assembler to assemble a plant with a 1 Gbp genome, and I observe quite high memory usage in assemble.sh at this line:

create_k_unitigs_large_k -c $(($KMER-1)) -t 32 -m $KMER -n $(($ESTIMATED_GENOME_SIZE*2)) -l $KMER -f `perl -e 'print 1/'$KMER'/1e5'` pe.cor.fa | grep --text -v '^>' | perl -ane '{$seq=$F[0]; $F[0]=~tr/ACTGactg/TGACtgac/;$revseq=reverse($F[0]); $h{($seq ge $revseq)?$seq:$revseq}=1;}END{$n=0;foreach $k(keys %h){print ">",$n++," length:",length($k),"\n$k\n"}}' > guillaumeKUnitigsAtLeast32bases_all.fasta.tmp && mv guillaumeKUnitigsAtLeast32bases_all.fasta.tmp guillaumeKUnitigsAtLeast32bases_all.fasta

Specifically, the second perl command:

perl -ane '{$seq=$F[0]; $F[0]=~tr/ACTGactg/TGACtgac/;$revseq=reverse($F[0....

is using over 750G of RAM. Is this memory usage normal for this step? Do I have to use a server with more RAM, or is there a way to decrease memory usage?
Best regards,
Petr
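
For readers trying to see where the memory goes, here is a rough Python equivalent of what the second perl command in the pipeline above does, written from the one-liner itself. It keeps one hash entry per distinct canonical sequence and prints nothing until the end, so memory use grows with the number and length of distinct k-unitigs. The function names are illustrative, and the output order differs from Perl, whose hash order is arbitrary.

```python
# Same translation table as tr/ACTGactg/TGACtgac/ in the one-liner.
COMPLEMENT = str.maketrans("ACTGactg", "TGACtgac")


def canonical(seq):
    """Return the lexicographically larger of seq and its reverse complement."""
    revseq = seq.translate(COMPLEMENT)[::-1]
    return seq if seq >= revseq else revseq


def dedup_kunitigs(lines):
    """Collect distinct canonical sequences, skipping FASTA header lines."""
    seen = {}
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith(">"):
            continue
        seen[canonical(fields[0])] = 1  # every distinct sequence stays in memory
    return list(seen)


def to_fasta(seqs):
    """Format sequences like the END block: '>n length:L' then the sequence."""
    return "".join(f">{n} length:{len(s)}\n{s}\n" for n, s in enumerate(seqs))


unique = dedup_kunitigs([">0", "AAAC", "GTTT", "TTTT"])
print(to_fasta(unique), end="")
```

In the demo, AAAC and GTTT are reverse complements of each other and collapse to a single entry, which is the whole purpose of the step.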

Restart run from version 3.2.6

Dear Dr. Zimin,
Would it be possible to restart a run from version 3.2.6 using version 3.2.7 and if so, at what stage? Specifically, if at the megaread stage, do all the megaread (mr.41.15.17.0.029) files need to be recalculated, or can 3.2.7 pick up after mr.41.15.17.0.029.txt is created, but before the other mr files such as mr.41.15.17.0.029.all_mr.fa are created?
This would be a major time saver for comparing assemblies, as the mr.41.15.17.0.029.txt file took many days to create but the remaining mr files took only a short time.
Thanks for the update and software.
tsetsob

estimating JF_SIZE

Hi,
The readme.md file lists two possible ways to estimate an appropriate value for the JF_SIZE parameter. The first one listed on that page states:

#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
JF_SIZE = 200000000

However a little later on in the document where an example is provided, the way proposed to derive that value is possibly a bit different:

JF_SIZE=2000000000
jellyfish hash size, set this to about 10x the genome size.

I have two questions related to this value:

1. I have only two kinds of read types to use in my genome assembly: paired-end Illumina data and long-read Nanopore data. If I was to estimate by coverage, should I estimate that according to all data, or just short (or long?) read data?
2. I'm working with a mammalian genome that is about 2 Gb. Taking into consideration that this is a moderately large genome, is there a minimum amount of memory requirement that should be allocated when the JF_SIZE parameter gets beyond a certain value?

Thanks very much!
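
To show how far apart the two README suggestions can land, here is the arithmetic for a 2 Gb genome. The 30x coverage is a hypothetical value chosen only for the example, and which reads should count toward coverage is exactly the open question above.

```python
def jf_size_by_coverage(genome_size, coverage):
    """First README rule: estimated_genome_size * estimated_coverage."""
    return int(genome_size * coverage)


def jf_size_by_genome(genome_size, factor=10):
    """Second README rule: about 10x the genome size."""
    return int(genome_size * factor)


genome_size = 2_000_000_000   # ~2 Gb mammalian genome, as in the question
coverage = 30                 # hypothetical coverage, for illustration only

by_coverage = jf_size_by_coverage(genome_size, coverage)
by_genome = jf_size_by_genome(genome_size)
print(by_coverage, by_genome)
```

The two rules agree only when coverage happens to be about 10x. With deeper coverage, the first rule gives a proportionally larger hash size.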

Failure on Illumina-ONT hybrid assembly

Hi there

Hybrid assembly gives following error:

Using CABOG from is MaSuRCA-3.2.6/bin/../CA8/Linux-amd64/bin
Running mega-reads correction/assembly
Using mer size 15 for mapping, B=15, d=0.02
Estimated Genome Size 368796815
Estimated Ploidy 1
Using 12 threads
Output prefix mr.41.15.15.0.02
Using 30x of the longest ONT reads
Reducing super-read k-mer size
Mega-reads pass 1
Running locally in 1 batch
compute_psa 4935290 1781966868

Processed 500000 super reads, irreducible 381804, processing 2222 super reads per second
Processed 1000000 super reads, irreducible 706011, processing 1818 super reads per second
Processed 1500000 super reads, irreducible 1042813, processing 1930 super reads per second
Processed 2000000 super reads, irreducible 1421777, processing 1779 super reads per second
Processed 2500000 super reads, irreducible 1876509, processing 1779 super reads per second
Mega-reads pass 2
Running locally in 1 batch
compute_psa 2280190 2472577921
Refining alignments
ERROR: failed to merge alignments at position 487
Please file a bug report
xargs: refine.sh: exited with status 255; aborting
Joining
awk: cmd. line:1: fatal: cannot open file `mr.41.15.15.0.02.all.txt' for reading (No such file or directory)
MaSuRCA-3.2.6/bin/mega_reads_assemble_cluster.sh: line 504: mr.41.15.15.0.02.all.txt: No such file or directory
mega-reads joining failed
[Tue Jul 24 22:56:40 UTC 2018] Assembly stopped or failed, see CA.mr.41.15.15.0.02.log

Can you give any advice?

Nick

Error correction of PE reads failed

Hi there,
I tried to use MaSuRCA-3.2.6 to assemble a genome (size about 500 Mb). We have 200x Illumina and 10x PacBio coverage, but I ran into a problem (log shown below).

[Mon May 14 06:14:24 CEST 2018] Creating mer database for Quorum
terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 220000000000 bytes of memory
./assemble.sh: line 99: 47367 Exit 2 awk '{print substr($0,1,200)}' p1.renamed.fastq p2.renamed.fastq p3.renamed.fastq
47368 Aborted (core dumped) | quorum_create_database -t 40 -s $JF_SIZE -b 7 -m 24 -q $((MIN_Q_CHAR + 5)) -o quorum_mer_db.jf.tmp /dev/stdin
[Mon May 14 06:14:25 CEST 2018] Error correct PE.
./assemble.sh: line 106: 47371 Aborted (core dumped) quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)) --contaminant=/home/lin/liyanbo/Tools/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 40 -w 10 -e 3 -M quorum_mer_db.jf p1.renamed.fastq p2.renamed.fastq p3.renamed.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1
[Mon May 14 06:14:25 CEST 2018] Error correction of PE reads failed. Check pe.cor.log.

There is no pe.cor.log, only a quorum.err file, which says:
[2018/05/14 06:14:25] Loading mer database
terminate called after throwing an instance of 'std::runtime_error'
what(): Can't open 'quorum_mer_db.jf' for reading

Any advice to help me get up and running would be appreciated.
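
Reading the log in order, the database step fails first, and the later "Can't open 'quorum_mer_db.jf'" message looks like a consequence of that, since the database file was never written. To put the failed allocation in perspective, the requested size can be pulled out of the message and converted to GiB. This is a small sketch that assumes the message format shown above.

```python
import re

ALLOC_RE = re.compile(r"Failed to allocate (\d+) bytes of memory")


def requested_gib(log_text):
    """Return the allocation size from the error message in GiB, or None."""
    match = ALLOC_RE.search(log_text)
    if match is None:
        return None
    return int(match.group(1)) / 2**30


gib = requested_gib("what(): Failed to allocate 220000000000 bytes of memory")
print(f"{gib:.1f} GiB requested")
```

That comes to roughly 205 GiB for a single allocation, which can be compared with the memory actually available on the machine and with the JF_SIZE value in the configuration file.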

gzip: stdout: Broken pipe

Hi,
I am running MaSuRCA version 3.2.6, but I got numerous "Broken pipe" messages, which probably caused the run to fail.

Here is my config file:

# example configuration file

# DATA is specified as type {PE,JUMP,OTHER,PACBIO} and 5 fields:
# 1)two_letter_prefix 2)mean 3)stdev 4)fastq(.gz)_fwd_reads
# 5)fastq(.gz)_rev_reads. The PE reads are always assumed to be
# innies, i.e. --->.<---, and JUMP are assumed to be outties
# <---.--->. If there are any jump libraries that are innies, such as
# longjump, specify them as JUMP and specify NEGATIVE mean. Reverse reads
# are optional for PE libraries and mandatory for JUMP libraries. Any
# OTHER sequence data (454, Sanger, Ion torrent, etc) must be first
# converted into Celera Assembler compatible .frg files (see
# http://wgs-assembler.sourceforge.com)
DATA
#PE= pe 180 20  /FULL_PATH/frag_1.fastq  /FULL_PATH/frag_2.fastq
#JUMP= sh 3600 200  /FULL_PATH/short_1.fastq  /FULL_PATH/short_2.fastq

#read length 260
PE= pa 21 2 /scratch/waterhouse_team/benth/illumina/SZ004_NoIndex_L001_R1_001.fastq.gz /scratch/waterhouse_team/benth/illumina/SZ004_NoIndex_L001_R2_001.fastq.gz
PE= pb 21 2 /scratch/waterhouse_team/benth/illumina/SZ004_NoIndex_L001_R1_002.fastq.gz /scratch/waterhouse_team/benth/illumina/SZ004_NoIndex_L001_R2_002.fastq.gz
...

PE= pm 464 26 /scratch/waterhouse_team/benth/illumina/SZ005_NoIndex_L002_R1_001.fastq.gz /scratch/waterhouse_team/benth/illumina/SZ005_NoIndex_L002_R2_001.fastq.gz
PE= pn 464 26 /scratch/waterhouse_team/benth/illumina/SZ005_NoIndex_L002_R1_002.fastq.gz /scratch/waterhouse_team/benth/illumina/SZ005_NoIndex_L002_R2_002.fastq.gz
...
#pacbio reads must be in a single fasta file! make sure you provide absolute path
PACBIO=/work/waterhouse_team/All_RawData/Benth/PacBio_gDNA/PacB_NbAll_Temp1R.fasta
#OTHER=/FULL_PATH/file.frg
END

PARAMETERS
#set this to 1 if your Illumina jumping library reads are shorter than 100bp
EXTEND_JUMP_READS=0
#this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
GRAPH_KMER_SIZE = auto
#set this to 1 for all Illumina-only assemblies
#set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs
#otherwise keep at 0
USE_LINKING_MATES = 0
#specifies whether to run mega-reads correction on the grid
USE_GRID=0
#specifies queue to use when running on the grid MANDATORY
GRID_QUEUE=all.q
#batch size in the amount of long read sequence for each batch on the grid
GRID_BATCH_SIZE=300000000
#coverage by the longest Long reads to use
LHE_COVERAGE=30
#this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms
LIMIT_JUMP_COVERAGE = 300
#these are the additional parameters to Celera Assembler.  do not worry about performance, number or processors or batch sizes -- these are computed automatically.
#set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
CA_PARAMETERS =  cgwErrorRate=0.15
#minimum count k-mers used in error correction 1 means all k-mers are used.  one can increase to 2 if Illumina coverage >100
KMER_COUNT_THRESHOLD = 1
#whether to attempt to close gaps in scaffolds with Illumina data
CLOSE_GAPS=0
#auto-detected number of cpus to use
NUM_THREADS = 8
#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
JF_SIZE = 544000000000
#set this to 1 to use SOAPdenovo contigging/scaffolding module.  Assembly will be worse but will run faster. Useful for very large (>5Gbp) genomes from Illumina-only data
SOAP_ASSEMBLY=0
END

Here is the output:

Verifying PATHS...
jellyfish OK
runCA OK
createSuperReadsForDirectory.perl OK
nucmer OK
mega_reads_assemble_cluster.sh OK
creating script file for the actions...done.
execute assemble.sh to run assembly
[Tue Jun 26 12:54:26 AEST 2018] Processing pe library reads

gzip: 
gzip: stdout: Broken pipe
stdout: Broken pipe

gzip: stdout: Broken pipe
...
[Tue Jun 26 13:36:55 AEST 2018] Average PE read length 250
[Tue Jun 26 13:37:02 AEST 2018] Using kmer size of 127 for the graph
cat: write error: Broken pipe
[Tue Jun 26 13:37:03 AEST 2018] MIN_Q_CHAR: 33
[Tue Jun 26 13:37:03 AEST 2018] Creating mer database for Quorum
awk: cmd. line:1: (FILENAME=pa.renamed.fastq FNR=680) fatal: print to "standard output" failed (Broken pipe)
./assemble.sh: line 143: 64061 Exit 2                  awk '{print substr($0,1,200)}' pa.renamed.fastq pb.renamed.fastq pc.renamed.fastq pd.renamed.fastq pe.renamed.fastq pf.renamed.fastq pg.renamed.fastq ph.renamed.fastq pi.renamed.fastq pj.renamed.fastq pk.renamed.fastq pl.renamed.fastq pm.renamed.fastq pn.renamed.fastq po.renamed.fastq pp.renamed.fastq pq.renamed.fastq pr.renamed.fastq ps.renamed.fastq pt.renamed.fastq pu.renamed.fastq pv.renamed.fastq pw.renamed.fastq px.renamed.fastq py.renamed.fastq
     64062 Killed                  | quorum_create_database -t 8 -s $JF_SIZE -b 7 -m 24 -q $((MIN_Q_CHAR + 5)) -o quorum_mer_db.jf.tmp /dev/stdin
[Tue Jun 26 13:40:57 AEST 2018] Error correct PE.
./assemble.sh: line 150: 64406 Aborted                 (core dumped) quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)) --contaminant=/lustre/work-lustre/waterhouse_team/apps/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 8 -w 10 -e 3 -M quorum_mer_db.jf pa.renamed.fastq pb.renamed.fastq pc.renamed.fastq pd.renamed.fastq pe.renamed.fastq pf.renamed.fastq pg.renamed.fastq ph.renamed.fastq pi.renamed.fastq pj.renamed.fastq pk.renamed.fastq pl.renamed.fastq pm.renamed.fastq pn.renamed.fastq po.renamed.fastq pp.renamed.fastq pq.renamed.fastq pr.renamed.fastq ps.renamed.fastq pt.renamed.fastq pu.renamed.fastq pv.renamed.fastq pw.renamed.fastq px.renamed.fastq py.renamed.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1
[Tue Jun 26 13:40:57 AEST 2018] Error correction of PE reads failed. Check pe.cor.log.

pe.cor.log has not been generated.

What did I miss?

Thank you in advance.

Michal
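
When checking a configuration like the one above, it can help to parse the DATA section and compare each library's stated mean against the average read length the pipeline reports. The sketch below is not part of MaSuRCA. It relies only on the field layout described in the config comments (prefix, mean, stdev, forward reads, optional reverse reads), and the file paths in the demo are placeholders. Whether a mean shorter than the read length has anything to do with the failure here is not established; it is simply a value worth double-checking.

```python
def parse_data_section(config_text):
    """Extract PE and JUMP library definitions from the DATA section."""
    libs, in_data = [], False
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == "DATA":
            in_data = True
            continue
        if line == "END":
            in_data = False
            continue
        if not in_data or not line or line.startswith("#"):
            continue
        kind, _, rest = line.partition("=")
        kind = kind.strip()
        if kind in ("PE", "JUMP"):
            fields = rest.split()
            libs.append({"type": kind, "prefix": fields[0],
                         "mean": int(fields[1]), "stdev": int(fields[2]),
                         "files": fields[3:]})
    return libs


def mean_below_read_length(libs, read_length):
    """Return prefixes of PE libraries whose mean is below the read length."""
    return [lib["prefix"] for lib in libs
            if lib["type"] == "PE" and 0 < lib["mean"] < read_length]


# Placeholder paths; prefixes, means and stdevs are taken from the config above.
demo_config = """DATA
#PE= pe 180 20  /FULL_PATH/frag_1.fastq  /FULL_PATH/frag_2.fastq
PE= pa 21 2 /path/R1_001.fastq.gz /path/R2_001.fastq.gz
PE= pm 464 26 /path/R1_002.fastq.gz /path/R2_002.fastq.gz
END
"""
libraries = parse_data_section(demo_config)
flagged = mean_below_read_length(libraries, read_length=250)
print(flagged)
```

The read length of 250 is the "Average PE read length" printed in the output above.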

MaSuRCA -3.2.5 slurm job

I am running MaSuRCA on a cluster that supports SLURM. What should I do if I want to submit multiple jobs for a hybrid assembly with Illumina and PacBio data?

make error invalid unsigned integer 10M

Checking out the latest version, 3.2.6b, and building it on Ubuntu 16.04 produces the following error:

make[2]: Entering directory '/mnt/soft/masurca/build/global/MUMmer'
  YAGGO    tests/generate_sequences_cmdline.hpp
/mnt/soft/masurca/MUMmer/tests/generate_sequences_cmdline.yaggo:17: In Option genome-size|G: Option genome-size|G: Invalid unsigned integer '10M'
Makefile:2603: recipe for target 'tests/generate_sequences_cmdline.hpp' failed
make[2]: *** [tests/generate_sequences_cmdline.hpp] Error 1
make[2]: Leaving directory '/mnt/soft/masurca/build/global/MUMmer'
Makefile:849: recipe for target 'install-special' failed
make[1]: *** [install-special] Error 2

Yaggo has version number 1.5.10

Disk space

Hi,

I am running a de novo hybrid assembly with Illumina PE and PacBio long reads, and I'm facing a disk space issue. The intermediate files that MaSuRCA writes are consuming all my disk space, and the assembly has failed several times because of it.
At this point, my assembly is running the following script: /usr/local/MaSuRCA-3.2.4/bin/translateReduceFile.perl work1_mr1/superReadNames.txt work1_mr1/reduce.tmp > work1_mr1/reduce.tmp.renamed
I would like to know if there are any files among the outputs that are safe to delete.

Thanks a lot,

Isabela
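
Before deciding what to remove, it may help to see which entries in the run directory take the most space. The sketch below only reports sizes and says nothing about which files are safe to delete. The directory and file names in the demo are made up.

```python
import os
import tempfile


def dir_size_bytes(path):
    """Sum the sizes of regular files under path, ignoring symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file may vanish while the assembly is still running
    return total


def largest_entries(run_dir, top=10):
    """Return (size, name) for the largest top-level entries, biggest first."""
    sizes = []
    for entry in os.scandir(run_dir):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((dir_size_bytes(entry.path), entry.name))
        elif entry.is_file(follow_symlinks=False):
            sizes.append((entry.stat(follow_symlinks=False).st_size, entry.name))
    return sorted(sizes, reverse=True)[:top]


# Small self-contained demo in a temporary directory.
with tempfile.TemporaryDirectory() as demo:
    os.mkdir(os.path.join(demo, "work1"))
    with open(os.path.join(demo, "work1", "big.tmp"), "wb") as fh:
        fh.write(b"x" * 1000)
    with open(os.path.join(demo, "small.txt"), "wb") as fh:
        fh.write(b"x" * 10)
    report = largest_entries(demo)
print(report)
```

On a real run, call `largest_entries` with the assembly directory.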

how to completely uninstall masurca

how to completely uninstall masurca?
thank you

Scaffolding stage 7: cgw running only one thread, very long time

I have a 1.2 GB plant genome with ~100X illumina plus ~5X pacbio. Overall 3.2.6b has been running on a 64 core machine for 25 days now, and has been in the stage 7 scaffolding for the past 18 days. Early stages made use of all the cores but cgw is running in only single thread. Is this expected behavior, and how long should I let it go before getting too worried and pulling the plug?
It does appear to be progressing, as the tail of cgw.out is showing both successes and failures for different scaffolds:

ExamineSEdgeForUsability_Interleaved()-- Interleaving failed, will not merge.
isQualityScaffoldMergingEdge()-- Merge scaffolds 238878 (241194.0bp) and 246626 (8586182.0bp): gap -322821.5bp +- 6836.8bp weight 2 BA_AB edge
isQualityScaffoldMergingEdge()-- Merge scaffolds 238878 (241194.0bp) and 246626 (8586182.0bp): FAIL LARGE NEGATIVE GAP
isQualityScaffoldMergingEdge()-- NEW fail (545711/806589)
isQualityScaffoldMergingEdge()-- Merge scaffolds 223313 (47479.4bp) and 246625 (8458792.0bp): gap -331469.1bp +- 5009.2bp weight 2 AB_BA edge
isQualityScaffoldMergingEdge()-- scaffold 223313 instrumenter happy 154.0 gap 52.4 misorient close 0.0 correct 4.0 far 0.0 oriented close 0.0 far 6.0 missing 127.2 external 15.4
isQualityScaffoldMergingEdge()-- scaffold 246625 instrumenter happy 123185.0 gap 2799.6 misorient close 389.0 correct 838.0 far 2453.0 oriented close 81.0 far 5019.0 missing 119506.7 external 81.8
isQualityScaffoldMergingEdge()-- scaffold (new) instrumenter happy 123341.0 gap 2800.2 misorient close 389.0 correct 842.0 far 2453.0 oriented close 81.0 far 5025.0 missing 119697.1 external 81.8
isQualityScaffoldMergingEdge()-- before: 0.490 satisfied (123338/128423 good/bad mates) after: 0.490 satisfied (123340/128487 good/bad mates)
isQualityScaffoldMergingEdge()-- ARE happy enough to merge 101 (0.490 >= 0.975) || (0.490 >= 0.490) || ((123340 > 123338) && (32.000 <= 0.300))
isQualityScaffoldMergingEdge()-- NEW pass (545712/806589)
ExamineSEdgeForUsability_Interleaved()-- Interleaving failed, will not merge.
isQualityScaffoldMergingEdge()-- Merge scaffolds 194762 (20141.8bp) and 246626 (8586182.0bp): gap -336435.0bp +- 5923.6bp weight 2 AB_AB edge
isQualityScaffoldMergingEdge()-- scaffold 194762 instrumenter happy 108.0 gap 6.5 misorient close 1.0 correct 4.0 far 0.0 oriented close 0.0 far 6.0 missing 179.7 external 37.8
isQualityScaffoldMergingEdge()-- scaffold 246626 instrumenter happy 118815.0 gap 3212.7 misorient close 441.0 correct 951.0 far 2730.0 oriented close 98.0 far 5559.0 missing 118110.4 external 111.9
isQualityScaffoldMergingEdge()-- scaffold (new) instrumenter happy 118925.0 gap 3209.7 misorient close 442.0 correct 956.0 far 2733.0 oriented close 98.0 far 5573.0 missing 118309.4 external 111.9
isQualityScaffoldMergingEdge()-- before: 0.481 satisfied (118922/128080 good/bad mates) after: 0.481 satisfied (118924/128111 good/bad mates)
isQualityScaffoldMergingEdge()-- ARE happy enough to merge 101 (0.481 >= 0.975) || (0.481 >= 0.481) || ((118924 > 118922) && (15.500 <= 0.300))
isQualityScaffoldMergingEdge()-- NEW pass (545713/806589)
ExamineSEdgeForUsability_Interleaved()-- Interleaving failed, will not merge.
isQualityScaffoldMergingEdge()-- Merge scaffolds 182770 (45783.6bp) and 246625 (8458792.0bp): gap -338272.0bp +- 4727.2bp weight 2 AB_BA edge

Mate-pair values incorrect in meanAndStdevByPrefix.sj.txt

Hi,

The values in the meanAndStdevByPrefix.sj.txt file are different from what I provided in the config.txt file.
All of my mate-pair libraries are given a mean of 500 and a standard deviation of 100 in the meanAndStdevByPrefix.sj.txt file. Is this an expected behavior?
The values I provided for my paired-end data are correctly displayed in meanAndStdevByPrefix.pe.txt.

make error

swig/perl5/swig_wrap.cpp:341:20: fatal error: string.h: No such file or directory
#include <string.h>

restarting during consensus step

I'm wondering how robust Masurca is to restarting after getting killed during the assembly step. Basically, my cluster has a queue that is preemptable so jobs can be killed and restarted if a higher priority job gets assigned to the node it is running on.

I have three samples I'm assembling. The consensus step seems to take a ton of time, thus it has been preempted in 2/3 assemblies. The 5-consensus/ from the assembly that was not preempted has outputs like this:
[user@login3 assemblies]$ ls SAMPLE1_masurca.nxtrim/CA.mr.41.15.17.0.029/5-consensus|tail -n 20
genome.129.iid
genome_129.success
genome_130.cns.err
genome.130.fa
genome_130.fix.err
genome_130.fixes
genome.130.iid
genome_130.success
genome_131.cns.err
genome.131.fa
genome_131.fix.err
genome_131.fixes
genome.131.iid
genome_131.success
genome.fixes
genome.fixes.err
genome.partitioned
genome.partitioned.err
genome.sampling
genome.sampling.dat

Another is missing .fa .iid .lay files for the last 2 iterations and has some extra files for earlier iterations:
[user@login3 assemblies]$ ls SAMPLE2_masurca.nxtrim/CA.mr.41.15.17.0.029/5-consensus|tail -n 40
genome.130.iid
genome.130.lay
genome_130.success
genome.130.tmp.layout
genome_131.cns.err
genome.131.fa
genome.131.fasta
genome.131.fasta.qual
genome.131.fasta.qv
genome_131.fix.err
genome_131.fixes
genome.131.iid
genome.131.lay
genome_131.success
genome.131.tmp.layout
genome_132.cns.err
genome.132.fa
genome.132.fasta
genome.132.fasta.qual
genome.132.fasta.qv
genome_132.fix.err
genome_132.fixes
genome.132.iid
genome.132.lay
genome_132.success
genome.132.tmp.layout
genome_133.cns.err
genome_133.fix.err
genome_133.fixes
genome_133.success
genome_134.cns.err
genome_134.fix.err
genome_134.fixes
genome_134.success
genome.fixes
genome.fixes.err
genome.partitioned
genome.partitioned.err
genome.sampling
genome.sampling.dat

Finally, the third assembly is still running and has been stuck writing the final .cns.err for 40 hours.
[earlm1@login3 assemblies]$ ll SAMPLE3_masurca.nxtrim/CA.mr.41.15.17.0.029/5-consensus|tail -n 20
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_129.success
-rw-r--r-- 1 earlm1 schlenke 47970 May 1 16:49 genome_130.cns.err
-rw-r--r-- 1 earlm1 schlenke 269 May 1 16:49 genome_130.fix.err
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_130.fixes
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_130.success
-rw-r--r-- 1 earlm1 schlenke 43216 May 1 16:49 genome_131.cns.err
-rw-r--r-- 1 earlm1 schlenke 269 May 1 16:49 genome_131.fix.err
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_131.fixes
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_131.success
-rw-r--r-- 1 earlm1 schlenke 41547 May 1 16:49 genome_132.cns.err
-rw-r--r-- 1 earlm1 schlenke 269 May 1 16:49 genome_132.fix.err
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_132.fixes
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_132.success
-rw-r--r-- 1 earlm1 schlenke 18943 May 1 17:06 genome_133.cns.err
-rw-r--r-- 1 earlm1 schlenke 1972 May 1 17:44 genome_133.fix.err
-rw-r--r-- 1 earlm1 schlenke 7951113 May 1 17:44 genome_133.fixes
-rw-r--r-- 1 earlm1 schlenke 0 May 1 17:44 genome_133.success
-rw-r--r-- 1 earlm1 schlenke 1142416 May 3 14:27 genome_134.cns.err
-rw-r--r-- 1 earlm1 schlenke 0 Apr 24 17:37 genome.partitioned
-rw-r--r-- 1 earlm1 schlenke 0 Apr 24 16:59 genome.partitioned.err

The file is largely just a list of alignment failures.
[user@login3 assemblies]$ tail SAMPLE3_masurca.nxtrim/CA.mr.41.15.17.0.029/5-consensus/genome_134.cns.err
MultiAlignUnitig()-- failed to align fragment 59091605 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 45901675 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 43236611 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 10799714 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 8139984 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 49529163 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 36254081 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 30386433 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 13041716 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 39923494 in unitig 6690241.

Question 1: Should I kill the SAMPLE3 assembly, remove the 5-consensus dir, and restart (and prevent it from getting preempted)?
Question 2: Should I trust the assembly of SAMPLE2? It eventually finished the consensus step and continued through to produce a scaffold file that doesn't seem obviously messed up.

Thanks,
Earl
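
To compare the three directories without eyeballing long listings, something like the sketch below could list consensus partitions that have a `genome_N.cns.err` log but no `genome_N.success` marker. It assumes, from the file names alone, that the `.success` file marks a completed partition; that is an inference from the listings above, not documented behavior. The function names are made up.

```python
import re
from pathlib import Path

# Matches the per-partition files seen in the listings, e.g. genome_134.cns.err
PARTITION_FILE = re.compile(r"^genome_(\d+)\.(cns\.err|success)$")


def unfinished_partitions(filenames):
    """Return partition numbers that have a .cns.err log but no .success marker."""
    started, finished = set(), set()
    for name in filenames:
        match = PARTITION_FILE.match(name)
        if not match:
            continue
        number = int(match.group(1))
        if match.group(2) == "success":
            finished.add(number)
        else:
            started.add(number)
    return sorted(started - finished)


def scan_directory(path):
    """Apply the same check to a 5-consensus directory on disk."""
    return unfinished_partitions(p.name for p in Path(path).iterdir())


# File names taken from the SAMPLE3 listing above.
sample3 = ["genome_133.cns.err", "genome_133.fixes", "genome_133.success",
           "genome_134.cns.err", "genome.partitioned"]
print(unfinished_partitions(sample3))
```

For the SAMPLE3 names this reports partition 134 as the only one without a marker. It says nothing about whether a partition that does have a marker was written completely before a preemption.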

trouble compiling Masurca on Ubuntu

Hi. Are you willing to make a MaSuRCA binary available for Ubuntu? I'm interested in using MaSuRCA for transcript assembly as described in the StringTie paper, but I'm giving up after spending a couple of hours trying to install it to no avail.

The first challenging dependency is gcc. I tried the Ubuntu default (7.2.0) and it complains about -V and -qversion. I commented those out but ran into the problems with boost (below). Because of the cryptic fatal error messages, I also tried installing gcc v4.7 for Ubuntu but it behaves the same way as 7.2.0 (doesn't recognize -V and -qversion). There doesn't seem to be an Ubuntu option at CERN.

The second challenging dependency is boost. FYI, it is not mentioned here. I installed boost and tried pointing install.sh to that folder, but it reports that it can't find a working installation of boost:

configure: Detected BOOST_ROOT; continuing with --with-boost=/usr/include/boost/
checking for Boost headers version >= 1.46.0... no
configure: cannot find Boost headers version >= 1.46.0
## ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ##
## Could not find a working installation of Boost. Set BOOST_ROOT to the path where the Boost headers are installed or set BOOST_ROOT=install to have it downloaded from the Internet and installed locally. For example: BOOST_ROOT=install ./install.sh ##

The same thing happens if I modify install.sh and pass that directory directly to configure (./configure ... --with-boost=...).

I copied the boost directory into global-1 and got a different error:

checking for Boost headers version >= 1.46.0... /usr/include
checking for Boost's header version... 1_58
checking boost/icl/interval_set.hpp usability... no
checking boost/icl/interval_set.hpp presence... yes
configure: WARNING: boost/icl/interval_set.hpp: present but cannot be compiled
configure: WARNING: boost/icl/interval_set.hpp:     check for missing prerequisite headers?
configure: WARNING: boost/icl/interval_set.hpp: see the Autoconf documentation
configure: WARNING: boost/icl/interval_set.hpp:     section "Present But Cannot Be Compiled"
configure: WARNING: boost/icl/interval_set.hpp: proceeding with the compiler's result
configure: WARNING:     ## ------------------------------- ##
configure: WARNING:     ## Report this to [email protected] ##
configure: WARNING:     ## ------------------------------- ##
checking for boost/icl/interval_set.hpp... no
configure: error: cannot find boost/icl/interval_set.hpp

Thanks.
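
One thing that stands out in the two logs: the first attempt used /usr/include/boost/ as the location, while the second run reports finding the headers under /usr/include. A small check like the one below can show which directory actually contains `boost/version.hpp` and `boost/icl/interval_set.hpp`. It is only a sketch and does not replicate what configure does; the function name is made up. `BOOST_LIB_VERSION` is the macro defined in Boost's own `version.hpp`.

```python
import re
from pathlib import Path


def boost_header_report(include_dir):
    """Check whether include_dir/boost/ holds the headers configure asks for."""
    include_dir = Path(include_dir)
    version_file = include_dir / "boost" / "version.hpp"
    version = None
    if version_file.is_file():
        match = re.search(r'#define\s+BOOST_LIB_VERSION\s+"([^"]+)"',
                          version_file.read_text(errors="replace"))
        if match:
            version = match.group(1)
    has_icl = (include_dir / "boost" / "icl" / "interval_set.hpp").is_file()
    return {"include_dir": str(include_dir),
            "version": version,
            "has_icl_interval_set": has_icl}


# The two directories that appear in the configure output above.
for candidate in ("/usr/include", "/usr/include/boost"):
    print(boost_header_report(candidate))
```

If the report shows a version for /usr/include but not for /usr/include/boost, the location passed to configure was probably one level too deep. The "present but cannot be compiled" warning is a separate problem and usually needs a look at config.log for the actual compiler error.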


====================
Scaffolds | withGaps | withoutGaps

#Seqs | 137,936
Min | 101 | 101
1st Qu.| 8,009 | 7,987
Median | 13,537 | 13,476
Mean | 22,901 | 22,790
3rd Qu.| 23,890 | 23,709
Max | 1,020,449 | 1,018,898
Total | 3,159,008,652 | 3,143,578,659
n50 | 36,831 | 36,555
n90 | 10,703 | 10,654
n95 | 8,089 | 8,063

Contigs | withNs | withoutNs

#Seqs | 145,321
Min | 70 | 70
1st Qu.| 7,669 | 7,669
Median | 13,041 | 13,041
Mean | 21,631 | 21,631
3rd Qu.| 22,638 | 22,637
Max | 967,921 | 967,901
Total | 3,143,578,659 | 3,143,503,525
n50 | 33,855 | 33,855
n90 | 10,263 | 10,263
n95 | 7,715 | 7,715

Gaps

#Seqs | 7,385
Min | 25
1st Qu.| 676
Median | 1,514
Mean | 2,089
3rd Qu.| 2,965
Max | 18,505
Total | 15,429,993
n50 | 3,390
n90 | 1,094
n95 | 750
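
For readers unfamiliar with the n50/n90/n95 rows, here is a minimal sketch of the usual definition: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches the given fraction of the total. The tool that produced the table is not named in the post, so it may use a slightly different variant. The lengths below are made up.

```python
def nxx(lengths, fraction=0.5):
    """Length L such that sequences of length >= L cover `fraction` of the total."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length


# Hypothetical contig lengths, total 300.
contigs = [100, 80, 60, 40, 20]
print("n50:", nxx(contigs, 0.50))
print("n90:", nxx(contigs, 0.90))
```

With these five lengths the n50 is 80 (100 + 80 already covers half of 300) and the n90 is 40.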

ERROR: failed to merge alignments at position 554

