alekseyzimin / masurca
License: GNU General Public License v3.0
Hello MaSuRCA team,
I used MaSuRCA (3.2.7) for a hybrid assembly of a Brassica genome and ran into the following two errors:
**./assemble.sh: line 158: work2.1/readPlacementsInSuperReads.final.read.superRead.offset.ori.txt: No such file or directory
RuntimeError Usage: new_Options(); at /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../lib/perl/mummer.pm line 262, line 2004.**
[Sat Jun 30 13:09:06 CST 2018] Processing pe library reads
[Sat Jun 30 13:27:49 CST 2018] Processing sj library reads
[Sat Jun 30 13:49:18 CST 2018] Average PE read length 150
[Sat Jun 30 13:49:19 CST 2018] Using kmer size of 99 for the graph
[Sat Jun 30 13:49:20 CST 2018] MIN_Q_CHAR: 33
[Sat Jun 30 13:49:20 CST 2018] Creating mer database for Quorum
[Sat Jun 30 14:22:21 CST 2018] Error correct PE.
[Sat Jun 30 15:30:09 CST 2018] Error correct JUMP.
[Sat Jun 30 16:10:38 CST 2018] Estimating genome size.
[Sat Jun 30 19:26:30 CST 2018] Estimated genome size: 423828686
[Sat Jun 30 19:26:30 CST 2018] Creating k-unitigs with k=99
[Sat Jun 30 21:51:17 CST 2018] Creating k-unitigs with k=31
[Sat Jun 30 23:51:16 CST 2018] Filtering mate pairs
Assuming outtie orientation
./assemble.sh: line 158: work2.1/readPlacementsInSuperReads.final.read.superRead.offset.ori.txt: No such file or directory
Chimeric/Redundant jump reads:
80879566 chimeric_sj.txt
385590798 redundant_sj.txt
466470364 total
[Sun Jul 1 11:00:32 CST 2018] Creating FRG files
[Sun Jul 1 11:09:06 CST 2018] Computing super reads from PE
Using CABOG from is /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../CA8/Linux-amd64/bin
Running mega-reads correction/assembly
Using mer size 15 for mapping, B=17, d=0.029
Estimated Genome Size 423828686
Estimated Ploidy 1
Using 30 threads
Output prefix mr.41.15.17.0.029
Pacbio coverage <30x, using the longest subreads
Reducing super-read k-mer size
Mega-reads pass 1
Running locally in 1 batch
compute_psa 4268399 1643220171
Processed 500000 super reads, irreducible 370521, processing 682 super reads per second
Processed 1000000 super reads, irreducible 797782, processing 841 super reads per second
Processed 1500000 super reads, irreducible 1258713, processing 1392 super reads per second
Processed 2000000 super reads, irreducible 1715565, processing 1655 super reads per second
Processed 2500000 super reads, irreducible 2141569, processing 2118 super reads per second
Mega-reads pass 2
Running locally in 1 batch
compute_psa 2237976 6053800446
Refining alignments
RuntimeError Usage: new_Options(); at /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../lib/perl/mummer.pm line 262, line 2004.
RuntimeError Usage: new_Options(); at /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../lib/perl/mummer.pm line 262, line 2004.
RuntimeError Usage: new_Options(); at /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../lib/perl/mummer.pm line 262, line 2004.
RuntimeError Usage: new_Options(); at /data/mg1/caix/src/AssemblySoftware/MaSuRCA-3.2.7/bin/../lib/perl/mummer.pm line 262, line 2002.
home@home-Lenovo-H30-50:~/Documents/Chloroplast_Assembly/MaSuRCA-3.2.7$ sudo ./install.sh
I am running MaSuRCA v3.2.3 and got an error at the Celera Assembler step. The log is below:
[Wed 23 May 17:16:57 BST 2018] Processing pe library reads
[Wed 23 May 17:34:23 BST 2018] Average PE read length 200
[Wed 23 May 17:34:24 BST 2018] Using kmer size of 127 for the graph
MIN_Q_CHAR: 33
[Wed 23 May 17:34:25 BST 2018] Creating mer database for Quorum
[Wed 23 May 17:44:48 BST 2018] Error correct PE.
[Wed 23 May 18:49:29 BST 2018] Estimating genome size.
Estimated genome size: 641438513
[Wed 23 May 18:59:38 BST 2018] Creating k-unitigs with k=127
[Wed 23 May 19:39:32 BST 2018] Computing super reads from PE
Running mega-reads correction/assembly
Using mer size 15 for mapping, B=13, d=0.02
Using MaSuRCA files from work1, k-unitig mer 41
Estimated Genome Size 641438513
Estimated Ploidy 1
Using CA installation from /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/../CA8/Linux-amd64/bin
Using 320 threads
Output prefix mr.41.15.13.0.02
Detected nanopore data, we have to rename the reads
/tsl/scratch/witekk/nanopore/high_quality_minion_data.fastq generated
Reducing super-read k-mer size
Mega-reads pass 1
compute_psa 1578015 1250919193
Processed 500000 super reads, irreducible 359409, processing 902 super reads per second
Processed 1000000 super reads, irreducible 681217, processing 2403 super reads per second
Processed 1500000 super reads, irreducible 1016462, processing 2325 super reads per second
Processed 2000000 super reads, irreducible 1342366, processing 2083 super reads per second
Mega-reads pass 2
compute_psa 1393233 5012389650
Refining alignments
read sequence for 136f2596-55be-408d-8250-ab8b5e4e744e not found at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/add_pb_seq.pl line 19, line 1.
/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/refine.sh: line 15: delta-filter: command not found
/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/refine.sh: line 16: show-coords: command not found
Can't load '/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/../lib/perl/mummer.so' for module mummer: /usr/lib64/libc.so.6: version `GLIBC_2.18' not found (required by /tsl/software/testing/brew/default/x86_64/lib/libstdc++.so.6) at /usr/lib64/perl5/DynaLoader.pm line 190.
at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/../lib/perl/mummer.pm line 11.
Compilation failed in require at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/refine_alignments.pl line 8.
BEGIN failed--compilation aborted at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/refine_alignments.pl line 8.
rm: cannot remove ‘t..matches.0.maximal_mr.fa’: No such file or directory
rm: cannot remove ‘t..matches.0.maximal_mr.names’: No such file or directory
Joining
awk: cmd. line:1: fatal: cannot open file `mr.41.15.13.0.02.all.txt' for reading (No such file or directory)
/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/mega_reads_assemble_nomatch.sh: line 269: mr.41.15.13.0.02.all.txt: No such file or directory
Generating assembly input files
awk: cmd. line:1: fatal: cannot open file `mr.41.15.13.0.02.1.fa' for reading (No such file or directory)
stat: cannot stat ‘mr.41.15.13.0.02.1.fa’: No such file or directory
/tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/mega_reads_assemble_nomatch.sh: line 288: /641438513/1+1: syntax error: operand expected (error token is "/641438513/1+1")
Coverage threshold for splitting unitigs is 20 minimum ovl 250
Running assembly
runCA -s runCA.spec consensus=pbutgcns -p genome -d CA.mr.41.15.13.0.02 stopAfter=consensusAfterUnitigger mr.41.15.13.0.02.1.frg mr.41.15.13.0.02.1.mates.frg cgwErrorRate=0.12 useGrid=0 scriptOnGrid=0 merylThreads=16 frgCorrThreads=1 frgCorrConcurrency=12 cnsConcurrency=6 ovlCorrConcurrency=10 ovlConcurrency=10 ovlThreads=8 ovlMemory=8GB
Assembly stopped or failed, see CA.mr.41.15.13.0.02.log
[Sun 27 May 10:06:44 BST 2018] Assembly stopped or failed, see CA.mr.41.15.13.0.02.log
On checking the file CA.mr.41.15.13.0.02.log, it reported this:
runCA failed.
Stack trace:
at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/runCA line 1121.
main::caFailure('invalid unitigger specified (bogart); must be 'utg' or 'bog'', undef) called at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/runCA line 706
main::setParameters() called at /tsl/software/testing/brew/default/x86_64/Cellar/masurca/3.2.3/bin/runCA line 5314
Failure message:
invalid unitigger specified (bogart); must be 'utg' or 'bog'
The Celera Assembler version I am running is 6.1.
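The log above contains "delta-filter: command not found" and "show-coords: command not found", which suggests these two commands were not visible on PATH when refine.sh ran. A minimal sketch for checking this before a run is shown here; the helper name missing_tools is made up for illustration, and the check does not address the separate GLIBC_2.18 loading error.

```shell
#!/usr/bin/env bash
# Report which of the given commands are missing from PATH.
# Prints one missing command name per line; prints nothing if all are found.
# (missing_tools is a hypothetical helper, not part of MaSuRCA.)
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# The two commands that refine.sh could not find in the log above.
missing=$(missing_tools delta-filter show-coords)
if [ -n "$missing" ]; then
  echo "Not on PATH:" $missing
else
  echo "delta-filter and show-coords are on PATH"
fi
```

If either name is reported as missing, the directory that holds it would need to be added to PATH in the environment that launches the assembly.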
We used MaSuRCA 3.2.6 for genome assembly but found no FASTA file of contig sequences at the end. Only final.genome.scf.fasta appeared in the CA folder. Is this normal? Below is the content of the log file:
Processing pe library reads
Average PE read length 126
Using kmer size of 83 for the graph
cat: write error: Broken pipe
MIN_Q_CHAR: 33
Creating mer database for Quorum
Error correct PE.
Estimating genome size.
Estimated genome size: 697935949
Creating k-unitigs with k=83
Computing super reads from PE
Celera Assembler
ovlMerThreshold=75
Overlap/unitig success
recomputing A-stat for super-reads
recomputing A-stat for super-reads
Unitig consensus success
CA success
No gap closing possible.
Assembly complete, final scaffold sequences are in CA/final.genome.scf.fasta
All done
Thanks
I am running MaSuRCA on a (highly homozygous) plant genome of 1.3 Gb on a cluster with 2 TB RAM and 90 cores. I have 100x Illumina coverage and ca. 12x PacBio. MaSuRCA has already been running for 8 days, and it predicted 89000 overlap jobs, which are running at a rate of ca. 100/hour.
I have 2 questions:
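For scale, the figures quoted in this post (89000 overlap jobs at roughly 100 per hour) can be turned into a rough time estimate. This is only back-of-the-envelope arithmetic that assumes the rate stays constant; the helper name eta_hours is made up for illustration.

```shell
#!/usr/bin/env bash
# Hours needed to finish a number of jobs at a steady rate (integer division).
eta_hours() {
  local jobs=$1 per_hour=$2
  echo $(( jobs / per_hour ))
}

hours=$(eta_hours 89000 100)   # figures quoted in the post
days=$(( hours / 24 ))         # whole days, rounded down
echo "about ${hours} hours, roughly ${days} days"
```

With these inputs the estimate is 890 hours, or about 37 days, for the overlap stage alone.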
I have encountered an error in mega_reads_assemble_cluster.sh in line 610:
$CA_PATH/runCA -s runCA.spec -p genome -d $CA stopAfter=consensusAfterUnitigger $COORDS.1.frg $SR_FRG $OTHER_FRG 1>> $CA.log 2>&1
The previous step leaves the folder genome.gkpStore, which leads to runCA failing because the folder already exists. I checked the folder and it's empty. It only contains empty directories.
Erasing the folder before that line solves the issue and executes the code.
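The workaround described above could be sketched as follows. To be on the safe side, this version removes the leftover directory only when it holds no files at all, which matches the "only empty directories" observation. The helper name remove_if_no_files is made up for illustration, and the location of genome.gkpStore under $CA is an assumption based on the -d $CA and -p genome options in the quoted command.

```shell
#!/usr/bin/env bash
# Remove a leftover directory only if it contains nothing but (possibly
# nested) empty directories. Returns 1 and leaves it alone otherwise.
# (remove_if_no_files is a hypothetical helper, not part of MaSuRCA.)
remove_if_no_files() {
  local dir=$1
  [ -d "$dir" ] || return 0              # nothing to clean up
  if [ -z "$(find "$dir" ! -type d | head -n 1)" ]; then
    rm -rf -- "$dir"
  else
    echo "refusing to remove $dir: it contains files" >&2
    return 1
  fi
}

# Hypothetical placement: just before the runCA call quoted above.
# The path below is an assumption, adjust it to where the store actually is.
# remove_if_no_files "$CA/genome.gkpStore"
```

Refusing to delete a store that contains files avoids throwing away a partially built gkpStore from a run that is meant to be resumed.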
Does MaSuRCA have a module to detect the biotin stuffer sequence in Nextera mate-pair libraries and split the reads? Does "IMPORTANT! Do not use third party tools to pre-process the Illumina data before providing it to MaSuRCA" apply to MP libraries? I'm guessing that MaSuRCA uses k-mer coverage data for many of its computations and is sensitive to different trimming parameters. Since MP data is pretty biased and requires processing to be useful, I'm guessing it doesn't get used for this purpose. I have tried running MaSuRCA with PE, PacBio and MP (unprocessed) data, and the assemblies were very poor compared to using just PE and PacBio data (N50 of ~30 vs. 220 kb, respectively), so I'm guessing there is at least something wrong with my raw MP data. Right now I'm running it again including the split MP data. This splitting was preceded by a trimming step recommended for bbtools splitnextera (https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/split-nextera-guide/).
Related question: Does the MP data just come into play for the assembly? Can I use an existing output directory and rerun the assembly process including the MP data? I deleted the PE, MP, PB masurca directory and don't remember if there were MP.cor files :(
Thanks,
Earl
(also, thanks for developing this assembler!)
I just tried to assemble a tetraploid with MaSuRCA 3.2.7, but the file "PLOIDY.txt" shows "1". How can I fix that?
I received the error message below during an assembly with Masurca 3.2.4. The file with the incorrect array type is present in the 7-CGW folder and the 8-consensus step appears to have run successfully. Please let me know what I might do to correct this error. Thanks!
----------------------------------------START Wed Mar 21 09:50:44 2018
/clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/CA/Linux-amd64/bin/terminator -g /global/scratch/blackman/LRD/CA/genome.gkpStore -t /global/scratch/blackman/LRD/CA/genome.tigStore 13 -c /global/scratch/blackman/LRD/CA/7-CGW/genome 12 -o /global/scratch/blackman/LRD/CA/9-terminator/genome > /global/scratch/blackman/LRD/CA/9-terminator/genome.asm.err
/clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/CA/Linux-amd64/bin/terminator: AS_configure()-- AS_CGW_ERROR_RATE set to 0.15
====> Reading /global/scratch/blackman/LRD/CA/7-CGW/genome.ckp.12 at Wed Mar 21 09:50:44 2018
runCA failed.
Stack trace:
at /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/bin/../CA/Linux-amd64/bin/runCA line 1121.
main::caFailure('terminator failed', '/global/scratch/blackman/LRD/CA/9-terminator/terminator.err') called at /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/bin/../CA/Linux-amd64/bin/runCA line 4394
main::terminate() called at /clusterfs/vector/home/groups/software/sl-7.x86_64/modules/MaSuRCA-3.2.4/bin/../CA/Linux-amd64/bin/runCA line 5348
Failure message:
terminator failed
MaSuRCA 3.2.5 writes 'CA8/Linux-amd64/bin' in assemble.sh for starting the CA run, but the install path is actually just 'CA/Linux-amd64/bin'.
This is the error:
[Mon Mar 5 09:45:54 CST 2018] Using linking mates
Using CABOG from is /media/data/software/MaSuRCA-3.2.5/bin/../CA8/Linux-amd64/bin
runCA not found at /media/data/software/MaSuRCA-3.2.5/bin/../CA8/Linux-amd64/bin!
cat: CA_dir.txt: No such file or directory
[Mon Mar 5 09:45:54 CST 2018] Assembly stopped or failed, see .log
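One way to confirm which of the two directory names from this report actually holds runCA in a given install is sketched below. The helper name find_ca_bin is made up for illustration, and the install root is simply the path that appears in the log above.

```shell
#!/usr/bin/env bash
# Print the first candidate directory under the install root that holds an
# executable runCA; return non-zero if neither candidate does.
# (find_ca_bin is a hypothetical helper, not part of MaSuRCA.)
find_ca_bin() {
  local root=$1 sub
  for sub in CA8/Linux-amd64/bin CA/Linux-amd64/bin; do
    if [ -x "$root/$sub/runCA" ]; then
      printf '%s\n' "$root/$sub"
      return 0
    fi
  done
  return 1
}

# Example with the install root from the log above (adjust to your system).
find_ca_bin /media/data/software/MaSuRCA-3.2.5 \
  || echo "runCA not found under CA8/ or CA/"
```

If the function prints the CA/ path while assemble.sh refers to CA8/, the mismatch described in this report is present in that install.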
FYI: the estimated genome size is 2.2 Gb, with Illumina coverage of ~70X and PacBio coverage of ~14X.
I am getting the memory allocation error below (using MaSuRCA v3.2.6):
terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 770000000000 bytes of memory
terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 770000000000 bytes of memory
./assemble.sh: line 120: 59122 Aborted jellyfish count -m 31 -t 16 -C -s $JF_SIZE -o k_u_hash_0 pe.cor.fa sj.cor.fa
Failed to open input file 'k_u_hash_0'
Estimated genome size:
Creating k-unitigs with k=99
./assemble.sh: line 125: *2: syntax error: operand expected (error token is "*2")
Creating k-unitigs with k=31
./assemble.sh: line 130: *2: syntax error: operand expected (error token is "*2")
Super reads failed, check super2.err and files in ./work2/
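The "*2: syntax error: operand expected" messages above are what bash prints when an empty value is used as the left operand of an arithmetic expansion, which fits the empty "Estimated genome size:" line that follows the aborted jellyfish count. A minimal sketch of a guard is shown below; the variable name ESTIMATED_GENOME_SIZE and the helper is_unsigned_integer are assumptions for illustration and are not taken from assemble.sh.

```shell
#!/usr/bin/env bash
# Succeeds only if the argument is a non-empty string of decimal digits.
# (is_unsigned_integer is a hypothetical helper, not part of MaSuRCA.)
is_unsigned_integer() {
  case $1 in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

ESTIMATED_GENOME_SIZE=""   # hypothetical variable; empty, as in the failed run
if is_unsigned_integer "$ESTIMATED_GENOME_SIZE"; then
  echo $(( ESTIMATED_GENOME_SIZE * 2 ))
else
  echo "genome size estimate is missing; stopping before the arithmetic step" >&2
fi
```

A check like this only makes the failure easier to read. The underlying problem in this log is still the failed 770000000000-byte allocation in jellyfish.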
MaSuRCA 3.2.6b was running on a 144-CPU workstation (Ubuntu 17.10) with 2 TB RAM available.
Unfortunately, the runCA and meryl steps failed in this command:
/nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/CA8/Linux-amd64/bin/meryl -B -C -v -m 22 -memory 65536 -threads 158 -c 0 -L 2 -s /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore:chain -o /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/0-mercounts/genome-C-ms22-cm0 > /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/0-mercounts/meryl.err 2>&1
----------------------------------------END Thu Apr 19 03:53:38 2018 (69 seconds)
ERROR: Failed with signal HUP (1)
================================================================================
runCA failed.
----------------------------------------
Stack trace:
at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 1613.
main::caFailure("meryl failed", "/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0"...) called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 2483
main::runMeryl(22, 0, "-C", undef, undef, undef, "obt", 1) called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 2698
main::meryl() called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 3667
main::createOverlapJobs("trim") called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 4062
main::overlapTrim() called at /nfs/fbn_fish_gen/Downloads/MaSuRCA-3.2.6b/bin/../CA8/Linux-amd64/bin/runCA line 6522
Here is the error log content
----------------------------------------
Last few lines of the relevant log file (/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/0-mercounts/meryl.err):
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
Thread exits.
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/uid (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qsb (r): Too many open files
failed to open gatekeeper store '/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/inf': Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
Can fit 154190604 mers into table with prefix of 23 bits, using 414.000MB ( 0.000MB for positions)
Can fit 154190604 mers into table with prefix of 23 bits, using 414.000MB ( 0.000MB for positions)
Can fit 154190604 mers into table with prefix of 23 bits, using 414.000MB ( 0.000MB for positions)
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
failed to open gatekeeper store '/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/inf': Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/snm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
failed to open gatekeeper store '/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/inf': Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/snm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/uid (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/qnm (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/plc (r): Too many open files
openStore()-- Failed to open store /nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/fsb (r): Too many open files
failed to open gatekeeper store '/nfs/fbn_fish_gen/ASSEMBLIES/MaSuRCA-Results/CA.mr.41.15.17.0.029/genome.gkpStore/inf': Too many open files
----------------------------------------
Failure message:
meryl failed
I restarted MaSuRCA several times, but the same issue persists (runCA and meryl fail).
Do you have any idea what might cause this bug?
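The "Too many open files" messages in meryl.err point at the per-process limit on open file descriptors (what `ulimit -n` reports in a shell). As a first check, and not a confirmed diagnosis, the limit seen by a process can be printed with a short Python sketch like the one below; the helper names are illustrative only.

```python
import resource

def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to a readable word."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

soft, hard = open_file_limits()
print("soft limit on open files:", describe(soft))
print("hard limit on open files:", describe(hard))
```

If the soft limit is low compared with the number of threads and store files that meryl opens, raising it in the shell or job script that launches the assembly is one thing to try.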
Thanks.
MaSuRCA 3.2.2 RC1
MaSuRCA was run before on a related organism "previous" without any particular problems.
It had PacBIO, Illumina PE, and 454 data. Another organism "current" with a similar data
mix is having all sorts of problems at overlapInCore. The data is summarized below:
              previous      current
-------------------------------- CA reports
numFrags     101696805    225978692   <- CA number of seqs
merThresh          131          173   <- it calculated this
ovl jobs          8839        58860   <- current 6.6X larger
-------------------------------- statistics for input files
pe N          54131150     99999732   <- current 1.8X larger
pe bp       7945611718  10027259444   <- current 1.3X larger
mr N            329516      1384768   <- current 4.2X larger
mr bp        807632269   5246427833   <- current 6.5X larger
mates N         200170       707934   <- current 3.5X larger
mates bp      80268170    283881534   <- current 3.5X larger
454 N         43616768     74925978   <- current 1.7X larger
454 bp      3093627853   5315577364   <- current 1.7X larger
total N       98277604    177018412   <- != CA count
total bp      11.9 Gbp     20.9 Gbp
Previous had only one paired and one single 454 library; current has several. Each library was processed similarly, and the frg file settings are the same. Note the discrepancy between the number of sequences in the input files and the number of fragments that CA reports. The runCA.spec file that MaSuRCA produced for current differs only slightly from the one used for previous:
diff $CURRENT/runCA.spec $PREVIOUS/runCA.spec
1c1
< batOptions=-repeatdetect 41 41 41 -el 64
---
> batOptions=-repeatdetect 36 36 36 -el 63
The first problem is that the lines in ovlopt like:
-h 1-27573 -r 1-27573 --hashstrings 27573 --hashdatalen 100003008
apparently have a hashstrings value which is too large. When it runs with these values the jobs invariably just emit a series of messages:
ERROR: Hash table full
to their out files, and run forever. Some experimenting showed that reducing the --hashstrings value by 25% let jobs run. So a copy of overlap.sh was patched to reduce the hashstrings value by 25% at run time (also to only use a single thread, because -t 2 processes were often far below 200% CPU, but -t 1 processes never are), and a copy of runCA was patched to use that version of the script. So far it is up to 2250 jobs (in 18 hours) and none have hung completely. (The estimated ~20 day run time just for this phase is not wonderful, it seems pretty long for a 40CPU machine and an ~1Gbp organism.)
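The 25% reduction described above amounts to rewriting one number in each option line. A minimal Python sketch of that rewrite follows; the function name and the way the option string is passed in are assumptions for illustration, not how the patched overlap.sh actually does it.

```python
import re

def scale_hashstrings(options, factor=0.75):
    """Return the option string with the --hashstrings value multiplied by factor."""
    def shrink(match):
        # Truncate to an integer, since the option expects a whole number of strings.
        return "--hashstrings {}".format(int(int(match.group(1)) * factor))
    return re.sub(r"--hashstrings\s+(\d+)", shrink, options)

# Option line quoted in the post above.
line = "-h 1-27573 -r 1-27573 --hashstrings 27573 --hashdatalen 100003008"
print(scale_hashstrings(line))
```

Only the --hashstrings value changes; the -h and -r ranges and --hashdatalen are left exactly as given.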
However....
There is a huge variation in run times. Some jobs complete in less than 5 minutes. Others have been running for more than 1000 minutes, and might run for who knows how much longer. That is at least a 200X difference in run times. All the overlapInCore jobs are using ~100% CPU.
Why is there such a huge variation?
Another issue is the amount of memory used, which is too little. Even the longest running jobs are only using 3045m VIRT and 2.7g RES in top. lscpu shows 40 "CPUs" and that is the number used, so around 120Gb of RAM is employed by these processes. The system has 512 Gb. I tried to make overlapInCore use more with the switch:
-M '8GB'
but it acted like the switch didn't exist. (Just checked the source code for OverlapInCore.C, the parameter reading section has no handling for "-M".) Is there some other way to induce this program to use more memory?
Thanks.
Good morning,
Masurca has generated the following pair of errors. I was wondering if they are related?
First,
[Tue Jul 3 18:41:15 EDT 2018] Processing pe library reads
[Tue Jul 3 19:19:34 EDT 2018] Average PE read length 250
[Tue Jul 3 19:19:34 EDT 2018] Using kmer size of 127 for the graph
[Tue Jul 3 19:19:36 EDT 2018] MIN_Q_CHAR: 33
[Tue Jul 3 19:19:36 EDT 2018] Creating mer database for Quorum
[Tue Jul 3 19:55:31 EDT 2018] Error correct PE.
[Tue Jul 3 22:25:02 EDT 2018] Estimating genome size.
[Tue Jul 3 22:45:18 EDT 2018] Estimated genome size: 2305482575
[Tue Jul 3 22:45:18 EDT 2018] Creating k-unitigs with k=127
[Wed Jul 4 00:37:37 EDT 2018] Computing super reads from PE
Using CABOG from is /opt/packages/masurca/3.2.7/bin/../CA8/Linux-amd64/bin
stat: cannot stat ‘work1/superReadSequences.fasta’: No such file or directory
/opt/packages/masurca/3.2.7/bin/mega_reads_assemble_cluster.sh: line 127: /2305482575/3: syntax error: operand expected (error token is "/2305482575/3")
/opt/packages/masurca/3.2.7/bin/mega_reads_assemble_cluster.sh: line 128: [: -lt: unary operator expected
/opt/packages/masurca/3.2.7/bin/mega_reads_assemble_cluster.sh: line 129: [: -gt: unary operator expected
It's unclear to me why the "cannot stat" message appears. Indeed the superReadSequences.fasta is missing, yet the work1 directory indeed was created and a createLengthStatisticsFiles.Failed subdirectory exists.
The run continued despite this error message, with additional .log file information being produced:
Running mega-reads correction/assembly
Using mer size 15 for mapping, B=15, d=0.02
Estimated Genome Size 2305482575
Estimated Ploidy
Using 24 threads
Output prefix mr.41.15.15.0.02
Using 30x of the longest ONT reads
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by ufasta)
ufasta: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ufasta)
failed to extract the best long reads
I'm curious why masurca fails here. Could you specify what to search for in our system to recover the required gclib it seems to want?
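The ufasta messages name symbol versions (GLIBCXX_3.4.20, GLIBCXX_3.4.21) that the system's /lib64/libstdc++.so.6 does not provide. One way to see which versions a given libstdc++ file does carry is to scan it for GLIBCXX_ strings. The sketch below does that in Python; the function names are hypothetical, and it only inspects bytes, it does not install or change anything.

```python
import re

def glibcxx_versions(data):
    """Return the GLIBCXX_x.y.z version tags found in raw library bytes, sorted numerically."""
    found = set(re.findall(rb"GLIBCXX_\d+(?:\.\d+)*", data))
    def numeric_key(tag):
        return tuple(int(part) for part in tag.split("_")[1].split("."))
    return sorted((tag.decode() for tag in found), key=numeric_key)

def missing_versions(data, required):
    """Return the required version tags that are absent from the library bytes."""
    available = set(glibcxx_versions(data))
    return [tag for tag in required if tag not in available]

# Usage against a real file (path taken from the error message above):
#   with open("/lib64/libstdc++.so.6", "rb") as handle:
#       print(missing_versions(handle.read(), ["GLIBCXX_3.4.20", "GLIBCXX_3.4.21"]))
```

If both tags come back as missing, the binary was presumably built against a newer libstdc++ than the one on the system, so a newer copy of that library (or a rebuild of MaSuRCA with the system compiler) would be the thing to look for.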
Thanks
Hi,
I wish to confirm that the LHE_COVERAGE parameter is the estimated coverage of long reads given to the assembler. Currently its description is difficult to interpret.
Hello,
We seem to be getting "RuntimeError Usage: Options_minmatch(self,m)" errors in our MaSurca run. Please see brief logs below:
[Fri Jun 1 09:54:31 BST 2018] Estimated genome size: 536282973
[Fri Jun 1 09:54:31 BST 2018] Creating k-unitigs with k=41
[Fri Jun 1 10:06:19 BST 2018] Creating k-unitigs with k=31
[Fri Jun 1 10:25:34 BST 2018] Filtering mate pairs
..
..
Output prefix mr.41.15.17.0.029
Pacbio coverage >10x, using 10x of the longest reads
Reducing super-read k-mer size
Mega-reads pass 1
Running locally in 1 batch
compute_psa 6939941 1423963606
Processed 500000 super reads, irreducible 420469, processing 14285 super reads per second
Processed 1000000 super reads, irreducible 769474, processing 35714 super reads per second
..
..
Processed 3500000 super reads, irreducible 2410744, processing 38461 super reads per second
Processed 4000000 super reads, irreducible 2783819, processing 45454 super reads per second
Mega-reads pass 2
Running locally in 1 batch
compute_psa 3073547 2330298699
Refining alignments
Attempt to free unreferenced scalar: SV 0x32db870, Perl interpreter: 0x7d5010 at /cm/shared/apps/MaSuRCA/3.2.6/bin/../lib/perl/mummer.pm line 262, <STDIN> line 2002.
..
..
Attempt to free unreferenced scalar: SV 0x558fd80, Perl interpreter: 0x7d5010 at /cm/shared/apps/MaSuRCA/3.2.6/bin/../lib/perl/mummer.pm line 262, <STDIN> line 18346.
RuntimeError Usage: Options_minmatch(self,m); at /cm/shared/apps/MaSuRCA/3.2.6/bin/refine_alignments.pl line 51, <STDIN> line 18346.
Joining
refine/join alignments failed
[Sat Jun 2 00:20:26 BST 2018] Assembly stopped or failed, see CA.mr.41.15.17.0.029.log
Any thoughts on what's going on here? Our first guess was that the error is coming from mummer, where the --minmatch limit is causing issues.
The log file referenced at the assembly failure does not seem to exist at all, so we could not debug further.
Hi, I'm currently using MaSuRCA-3.2.6 to assemble my genome and I ran the following script:
#PBS -S /bin/bash
#PBS -l nodes=1:ppn=8:bigmem,mem=100gb
#PBS -e /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.error
#PBS -o /pandata/ACG-0006_0027/LOGS/ACG-006_assembly.out
#PBS -N ACG-006
#PBS -q q1week
DATA
PE= pe 150 22 /pandata/LEPIWASP/ACG-0006_0027/frag_1.fastq /pandata/LEPIWASP/ACG-0006_0027/frag_2.fastq
END
PARAMETERS
#set this to 1 if your Illumina jumping library reads are shorter than 100bp
EXTEND_JUMP_READS=0
#this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
GRAPH_KMER_SIZE = auto
#set this to 1 for all Illumina-only assemblies
#set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs
#otherwise keep at 0
USE_LINKING_MATES = 0
#specifies whether to run mega-reads correction on the grid
USE_GRID=0
#specifies queue to use when running on the grid MANDATORY
GRID_QUEUE=all.q
#batch size in the amount of long read sequence for each batch on the grid
GRID_BATCH_SIZE=300000000
#coverage by the longest Long reads to use
LHE_COVERAGE=30
#this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms
LIMIT_JUMP_COVERAGE = 300
#these are the additional parameters to Celera Assembler. do not worry about performance, number or processors or batch sizes -- these are computed automatically.
#set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
CA_PARAMETERS = cgwErrorRate=0.15
#minimum count k-mers used in error correction 1 means all k-mers are used. one can increase to 2 if Illumina coverage >100
KMER_COUNT_THRESHOLD = 1
#whether to attempt to close gaps in scaffolds with Illumina data
CLOSE_GAPS=1
#auto-detected number of cpus to use
NUM_THREADS = 16
#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
JF_SIZE = 200000000
#set this to 1 to use SOAPdenovo contigging/scaffolding module. Assembly will be worse but will run faster. Useful for very large (>5Gbp) genomes from Illumina-only data
SOAP_ASSEMBLY=0
END
Then, I got the assemble.sh file, ran it as well, and got the following .out:
[Sat Jun 16 22:32:45 CEST 2018] Processing pe library reads
[Sat Jun 16 22:49:04 CEST 2018] Average PE read length 150
[Sat Jun 16 22:49:05 CEST 2018] Using kmer size of 49 for the graph
[Sat Jun 16 22:49:06 CEST 2018] MIN_Q_CHAR: 33
WARNING: JF_SIZE set too low, increasing JF_SIZE to at least 1115876884, this automatic increase may be not enough!
[Sat Jun 16 22:49:06 CEST 2018] Creating mer database for Quorum
[Sat Jun 16 23:09:23 CEST 2018] Error correct PE.
[Sat Jun 16 23:11:49 CEST 2018] Error correction of PE reads failed. Check pe.cor.log.
and .error:
/panhome/TOOLS/MaSuRCA-3.2.6/assemble.sh: line 102: 46750 Aborted quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)) --contaminant=/panhome/TOOLS/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 16 -w 10 -e 3 -M quorum_mer_db.jf pe.renamed.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1
Does anyone have an idea of what is going on here? Thanks for your help.
The two read files come from an Illumina HiSeq 3000 (150 bp reads), and the genome size of my species is around 1.5 Gb.
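The config comment above gives the rule of thumb JF_SIZE = estimated_genome_size * estimated_coverage, and the log warns that the configured 200000000 is too low. A small Python sketch of that arithmetic follows. The 1.5 Gb genome size is from the post; the 50x coverage is a made-up placeholder, since the actual coverage is not stated.

```python
def jf_size(estimated_genome_size, estimated_coverage):
    """Apply the rule of thumb from the config comment: genome size times coverage."""
    return int(estimated_genome_size * estimated_coverage)

genome_size = 1_500_000_000  # about 1.5 Gb, as stated in the post
coverage = 50                # hypothetical value; replace with the real Illumina coverage

suggested = jf_size(genome_size, coverage)
print("JF_SIZE =", suggested)
```

Under these assumed numbers the result is well above both the configured value and the 1115876884 minimum mentioned in the warning.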
Dear Dr. Zimin,
I used MaSuRCA to assemble a plant genome. Analysis of the PE sequence data predicts about 2% heterozygosity and a genome size of about 1.7 Gb. I used about four short-read libraries and one PacBio library, including about 50x PE data and about 30x PacBio reads.
In my experience, MaSuRCA takes a very long time in two steps:
One is the mega-reads cluster step using the PacBio reads, which takes more than three months.
The other is overlapping the .frg files, which takes more than three months since I have about 240,000 overlaps.
I changed some things to finish the assembly faster, but the changes may not be correct. One is that the PacBio reads were split into 1000 chunks. I am not sure that this change is correct.
The other concerns the overlap files, but I think this change is correct.
Finally,
I got the genome sequence with the following results. I think my genome sequence has a lot of duplication.
Could you help me check the results and give me some suggestions for improving them?
Thanks,
Fuyou
#BUSCO was run in mode: genome
Summarized benchmarks in BUSCO notation:
C:83%[D:53%],F:3.4%,M:12%,n:1440
Representing:
1205 Complete Single-Copy BUSCOs
764 Complete Duplicated BUSCOs
49 Fragmented BUSCOs
186 Missing BUSCOs
1440 Total BUSCO groups searched
Non-gapped Ns Count: 75134
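As a sanity check on the BUSCO block, the counts can be compared with the summary line. The Python sketch below assumes each number belongs with the label that follows it in the pasted output (1205 complete, 764 of those duplicated, 49 fragmented, 186 missing, 1440 total); that reading is an interpretation of the flattened paste, supported by the fact that the three categories add up to the total.

```python
counts = {"complete": 1205, "duplicated": 764, "fragmented": 49, "missing": 186}
total = 1440

def percent(n, total):
    """Share of BUSCO groups in a category, as a percentage of all groups searched."""
    return 100.0 * n / total

# Complete, fragmented and missing should partition the searched groups.
assert counts["complete"] + counts["fragmented"] + counts["missing"] == total

for name, n in counts.items():
    print("{:<11}{:>5}  {:.1f}%".format(name, n, percent(n, total)))

# Fraction of complete BUSCOs that are present in more than one copy.
dup_share = percent(counts["duplicated"], counts["complete"])
print("duplicated among complete: {:.1f}%".format(dup_share))
```

More than half of the searched groups being duplicated is consistent with the poster's suspicion that the assembly contains a lot of duplicated sequence, which is a common outcome for a genome with about 2% heterozygosity.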
Hi,
I am assembling several closely related draft genomes with masurca v3.6.2 (not beta) based on illumina PE-only libs. Most assemblies finish without problems but one assembly fails to create one consensus unitig.
(from runCA3 and CA/7-0-CGW/cgw.out)
ERROR: Unitig 30217 has no placement; probably not run through consensus.
Segmentation fault (core dumped)
As I am using the official release v 3.6.2, CA attempts to fix this problem but fails
(from CA/fix_unitig_consensus/unitig_failures)
../5-consensus/genome_017.err:MultiAlignUnitig()-- Unitig 30217 FAILED. Could not align fragment 1034014.
I suppose fragment 1034014 fails to align and, as a consequence, no consensus unitig is produced (i.e. no UTG in either version 2 or 3 of the tigStore)
(from tigStore v1:)
...
FRG type R ident 18705479 container 1199472 parent 1199472 hang 217 -97 position 769 619
FRG type R ident 1011721 container 0 parent 1058008 hang 170 184 position 660 1174
FRG type R ident 1034014 container 0 parent 1472858 hang 90 189 position 660 1167
FRG type R ident 13210399 container 1011721 parent 1011721 hang 92 -272 position 752 902
FRG type R ident 13210403 container 1011721 parent 1011721 hang 92 -272 position 752 902
...
If I extract Unitig 30217 from the tigstore, manually remove fragment 1034014, replace the tigstore version 1 entry and try to generate a unitig consensus
tigStore -g genome.gkpStore -t genome.tigStore 1 -d layout -u 30217 > unitig30217.tmp
tigStore -g genome.gkpStore -t genome.tigStore 1 -R unitig30217.tmp
utgcns -g genome.gkpStore -t genome.tigStore 1 3 -u 30217
it claims to be successful
MultiAlignStore::dumpMASRfile()-- Writing 'genome.tigStore/seqDB.v002.p003.utg' partitioned.
NumColumnsInUnitigs = 0
NumGapsInUnitigs = 0
NumRunsOfGapsInUnitigReads = 0
NumColumnsInContigs = 0
NumGapsInContigs = 0
NumRunsOfGapsInContigReads = 0
NumAAMismatches = 0
NumVARRecords = 0
NumVARStringsWithFlankingGaps = 0
NumUnitigRetrySuccess = 0
Consensus finished successfully. Bye.
but does not produce the required UTG entry in tigStore version 3 (or 2):
unitig 30217
len 0
cns
qlt
data.unitig_coverage_stat -4.867138
data.unitig_microhet_prob 1.000000
data.unitig_status X
data.unitig_unique_rept X
data.contig_status U
data.num_frags 30
data.num_unitigs 0
FRG type R ident 1502113 container 0 parent 1214644 hang -208 -266 position 402 0
FRG type R ident 16305263 container 1502113 parent 1502113 hang 167 -85 position 317 167
FRG type R ident 10313351 container 1502113 parent 1502113 hang 173 -79 position 323 173
FRG type R ident 14227777 container 1502113 parent 1502113 hang 192 -60 position 342 192
FRG type R ident 11548311 container 1502113 parent 1502113 hang 202 -50 position 352 202
FRG type R ident 1214644 container 0 parent 1502113 hang 208 266 position 208 668
FRG type R ident 1864621 container 1214644 parent 1214644 hang 34 -99 position 571 242
FRG type R ident 1880555 container 1214644 parent 1214644 hang 50 -87 position 258 583
FRG type R ident 18705478 container 1214644 parent 1214644 hang 62 -248 position 270 420
FRG type R ident 1304841 container 0 parent 1214644 hang 96 76 position 304 744
FRG type R ident 18891280 container 1214644 parent 1214644 hang 99 -211 position 307 457
FRG type R ident 18895288 container 1214644 parent 1214644 hang 99 -211 position 307 457
FRG type R ident 9462916 container 1214644 parent 1214644 hang 106 -204 position 464 314
FRG type R ident 1771722 container 1214644 parent 1214644 hang 110 0 position 318 668
FRG type R ident 14883894 container 1214644 parent 1214644 hang 152 -158 position 510 360
FRG type R ident 12415492 container 1214644 parent 1214644 hang 169 -141 position 527 377
FRG type R ident 1653488 container 0 parent 1304841 hang 80 14 position 758 384
FRG type R ident 1199472 container 0 parent 1653488 hang 18 108 position 402 866
FRG type R ident 1199473 container 1199472 parent 1199472 hang 0 0 position 402 866
FRG type R ident 11474300 container 1199473 parent 1199473 hang 81 -233 position 633 483
FRG type R ident 11474474 container 1199473 parent 1199473 hang 81 -233 position 633 483
FRG type R ident 1058008 container 0 parent 1199472 hang 88 124 position 990 490
FRG type R ident 3571944 container 1058008 parent 1058008 hang 9 -341 position 649 499
FRG type R ident 9092394 container 1199473 parent 1199473 hang 111 -203 position 663 513
FRG type R ident 9092814 container 1199473 parent 1199473 hang 111 -203 position 663 513
FRG type R ident 18891281 container 1058008 parent 1058008 hang 31 -319 position 671 521
FRG type R ident 18895289 container 1058008 parent 1058008 hang 31 -319 position 671 521
FRG type R ident 1472858 container 0 parent 1199472 hang 168 112 position 570 978
FRG type R ident 18705479 container 1199472 parent 1199472 hang 217 -97 position 769 619
FRG type R ident 1011721 container 0 parent 1058008 hang 170 184 position 660 1174
I am totally happy to completely delete the problematic unitig because the assembly merely serves to error correct long reads, for which I use alternative approaches in parallel.
However, I don't seem to use the correct syntax and also don't know how to remove it from all versions of the tigstore (the following just prints the help function):
tigStore -g genome.gkpStore -t genome.tigStore 1 -D -u 30217
Any suggestion on how to fix the problematic unitig or kick it out completely would be much appreciated!
Thanks and best,
Evelien
Hello MaSuRCA team,
I used MaSuRCA (version: MaSuRCA-3.2.7) to de novo assemble Brassica genomes. However, I encountered the following error message:
Running locally in 1 batch
compute_psa 2044593 5215127586
Refining alignments
Joining
Please file a bug report
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
xargs: nucmer: terminated by signal 6
Generating assembly input files
Coverage threshold for splitting unitigs is 20 minimum ovl 250
Running assembly
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
Dear Mr Zimin,
I'd like to compile MaSuRCA on our server. I've followed all your steps, but once I run make, I receive the attached error message (error.txt). Could you please help me solve my problem?
Cheers
Bastian Heimburger
Hello Aleksey,
I was wondering if I could use Nanopore and PacBio together for the hybrid assembly. We have about 70x PacBio, 20x long-read Nanopore, and ~100x Illumina for a 650 Mb genome. What do you reckon is the best strategy I can use with MaSuRCA?
Thank you in advance!
Hello,
masurca-3.2.7 has stopped and generated the following error in super2.err file:
Error with file '/ceph/sge-tmp/jnguinka/fbn.fish.gen/Assemblies/MaSuRCA/guillaumeKUnitigsAtLeast32bases_all.jump.fasta'
Output file "work2/kUnitigLengths.txt" is of size 0, must be at least of size 1. Bye!
mv work2/numKUnitigs.txt work2/createLengthStatisticsFiles.Failed
mv work2/maxKUnitigNumber.txt work2/createLengthStatisticsFiles.Failed
mv work2/kUnitigLengths.txt work2/createLengthStatisticsFiles.Failed
Here is the content of quorum.err
[2018/07/27 17:43:53] Loading mer database
[2018/07/27 19:11:49] Loading contaminant sequences
[2018/07/27 19:11:49] Computing Poisson cutoff
[2018/07/27 19:25:46] distinct mers:5419768881 total mers:79782915223 estimated coverage:14.7207
[2018/07/27 19:25:46] lambda:0.0490691 collision_prob:0.00333333 poisson_threshold:0.0001
[2018/07/27 19:25:46] Using cutoff of 4
[2018/07/27 19:25:46] Correcting reads
[2018/07/28 01:21:11] Done
The following 3 files are all empty:
guillaumeKUnitigsAtLeast32bases_all.jump.fasta.tmp
guillaumeKUnitigsAtLeast32bases_all.fasta.tmp
ESTIMATED_GENOME_SIZE.txt
Every second read in pe.cor.tmp.log and sj.cor.tmp.log is skipped.
Here is the content of pe.cor.tmp.log:
tail -100 pe.cor.tmp.log
Skipped pe2761295299: No high quality mer
Skipped pe2761295053: No high quality mer
Skipped pe2761295055: No high quality mer
Skipped pe2761295057: No high quality mer
Skipped pe2761295059: No high quality mer
Skipped pe2761295061: No high quality mer
Skipped pe2761295063: No high quality mer
Skipped pe2761295065: No high quality mer
Skipped pe2761295067: No high quality mer
Skipped pe2761295069: No high quality mer
Skipped pe2761295071: No high quality mer
Skipped pe2761295073: No high quality mer
Skipped pe2761295075: No high quality mer
tail sj.cor.tmp.log
Skipped m2363681195: No high quality mer
Skipped m2363700910: No high quality mer
Skipped m2363700911: No high quality mer
Skipped m2363727470: No high quality mer
Skipped m2363727471: No high quality mer
Skipped m2363765722: No high quality mer
Skipped m2363765723: No high quality mer
Skipped m2363801496: No high quality mer
Skipped m2363801497: No high quality mer
Skipped m2363809641: No high quality mer
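To check the "every second read is skipped" observation across the whole log rather than just the tail, the skipped read numbers can be tallied by parity. In the pe excerpt above all thirteen IDs are odd. The Python sketch below counts odd and even IDs per prefix; whether parity corresponds to the first or second mate of a pair is an assumption about how the reads were renamed, not something the logs state.

```python
import re
from collections import Counter

# Matches lines such as "Skipped pe2761295299: No high quality mer".
SKIPPED = re.compile(r"^Skipped (?P<prefix>[A-Za-z]+)(?P<number>\d+): (?P<reason>.+)$")

def skipped_parity(lines):
    """Count skipped reads per (prefix, parity of the read number)."""
    tally = Counter()
    for line in lines:
        match = SKIPPED.match(line.strip())
        if match:
            parity = "even" if int(match.group("number")) % 2 == 0 else "odd"
            tally[(match.group("prefix"), parity)] += 1
    return tally

# Usage on a full log file:
#   with open("pe.cor.tmp.log") as handle:
#       print(skipped_parity(handle))
```

If one parity dominates in pe.cor.tmp.log, that would suggest looking at one of the two input read files (for example its quality encoding or whether it is truncated) before anything else.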
My OS is : Ubuntu 16.04
I have no clue what might have gone wrong and at which step.
I would appreciate any help,
Thanks
After running Masurca, I found out that the statistics from CA.mr.41.15.15.0.02.log are computed from the fasta file found in the 9-terminator folder.
In Masurca doc it is said the output is final.genome.scf.fasta, however the total length of this file is lower than the one found in the 9-terminator folder.
I'm a bit confused as to which assembly is the correct one, and why there is a difference in length (and associated stats) between these two assemblies.
Cheers,
Martin Binet
make
autoreconf -fi
Fcntl.c: loadable library and perl binaries are mismatched (got handshake key 0xdb00080, needed 0xde00080)
make: *** [configure] Error 1
We used MaSuRCA to assemble the genome and got the ESTIMATED_GENOME_SIZE.txt file. The value in ESTIMATED_GENOME_SIZE.txt was twice the expected genome size.
Hi,
I was using masurca to assemble a genome. Unfortunately, I got an error which I cannot fix, can you please help to fix it.
cat: write error: Broken pipe
mkdir: cannot create directory `CA/fix_unitig_consensus': File exists
INSERTING unitig 82475
MultiAlignStore::dumpMASRfile()-- Writing '../genome.tigStore/seqDB.v001.p005.utg' partitioned.
INSERTING unitig 151276
MultiAlignStore::dumpMASRfile()-- Writing '../genome.tigStore/seqDB.v001.p008.utg' partitioned.
In runCA3.out, it says
ERROR: Failed with signal SEGV (11)
================================================================================
runCA failed.
----------------------------------------
Stack trace:
at /gpfs1/sw1/Projects/MaSuRCA/3.2.4/bin/../CA/Linux-amd64/bin/runCA line 1121.
main::caFailure("scaffolder failed", "/30days/GROUPS/Q0196RW/yyuan/myProject/masurca_20+30/"...) called at /gpfs1/sw1/Projects/MaSuRCA/3.2.4/bin/../CA/Linux-amd64/bin/runCA line 4066
main::CGW("7-0-CGW", undef, "/30days/GROUPS/Q0196RW/yyuan/myProject/masurca_20+30/"..., 2, undef, 1) called at /gpfs1/sw1/Projects/MaSuRCA/3.2.4/bin/../CA/Linux-amd64/bin/runCA line 4260
main::scaffolder() called at /gpfs1/sw1/Projects/MaSuRCA/3.2.4/bin/../CA/Linux-amd64/bin/runCA line 5346
----------------------------------------
Last few lines of the relevant log file (/30days/GROUPS/Q0196RW/yyuan/myProject/masurca_20+30/CA/7-0-CGW/cgw.out):
...processed 61000000 fragments.
...processed 62000000 fragments.
...processed 63000000 fragments.
...processed 64000000 fragments.
...processed 65000000 fragments.
...processed 66000000 fragments.
...processed 67000000 fragments.
...processed 68000000 fragments.
...processed 69000000 fragments.
...processed 70000000 fragments.
...processed 71000000 fragments.
...processed 72000000 fragments.
...processed 73000000 fragments.
...processed 74000000 fragments.
...processed 75000000 fragments.
...processed 76000000 fragments.
...processed 77000000 fragments.
...processed 78000000 fragments.
Reading unitigs.
ERROR: Unitig 82475 has no placement; probably not run through consensus.
----------------------------------------
Failure message:
scaffolder failed
I also looked at http://wgs-assembler.sourceforge.net/wiki/index.php/Unitig_Consensus_Failures_in_CA_6. However, it didn't give a clear solution.
Dear Team MaSuRCA,
I'm getting the below error in MaSuRCA - 3.2.2
"Assembly stopped or failed, see CA.mr.41.15.13.0.02.log"
CA.mr.41.15.13.0.02.log
config.txt
Please resolve ASAP.
Thanks
Regards,
Hithesh
Estimated genome size text file is showing the value that is half the genome size. Should I be concerned?
MaSuRCA 2.3.1 running on Ubuntu 14.04 LTS produces the work1 directory and the super1.err output file, then reports "Refining alignments". The next output is "ERROR: Could not parse delta file, /dev/stdin", followed by "error no: 402", then by "rm: cannot remove 't..matches.0.maximal_mr.fa'"; this continues for a series of tmp files numbered up to about 850.
Hi Aleksey,
thanks for the great program!
I've been testing MaSuRCA with a few different genomes because we are evaluating which assembler will perform best for our data (~8-10x Nanopore, ~20-40x Illumina, 2-3 Gb genome size).
I'm a bit confused about what is your recommended best configuration file, because in your README on GitHub you seem to have two configuration files as examples, however with varying recommendations. For instance:
File 1:
#set this to 1 for all Illumina-only assemblies
#set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs
#otherwise keep at 0
USE_LINKING_MATES = 0
File 2
• USE_LINKING_MATES=1
most of the paired end reads end up in the same super read and thus are not passed to the assembler. Those that do not end up in the same super read are called ”linking mates” . The best assembly results are achieved by setting this parameter to 1 for Illumina-only assemblies. If you have more than 2x coverage by long (454, Sanger, etc) reads, set this to 0.
Now our data falls in-between 2x and 20x long-read coverage, so you understand my confusion. Could you maybe edit the README so that it is less confusing? Thank you!
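To make the conflict concrete, the two README passages can be written out as rules. The Python sketch below is only a literal reading of the two quoted texts, not MaSuRCA code; both function names are invented, and the zero clone coverage in the example is an assumed value because the post does not mention mate-pair data.

```python
def use_linking_mates_file1(illumina_only, long_read_coverage, clone_coverage):
    """Literal reading of the File 1 comment quoted above."""
    if illumina_only:
        return 1
    if long_read_coverage < 20 and clone_coverage < 50:
        return 1
    return 0

def use_linking_mates_file2(illumina_only, long_read_coverage):
    """Literal reading of the File 2 text quoted above; None where the text gives no rule."""
    if illumina_only:
        return 1
    if long_read_coverage > 2:
        return 0
    return None

# About 9x long reads, with an assumed 0x clone coverage from mate pairs.
print(use_linking_mates_file1(False, 9, 0))
print(use_linking_mates_file2(False, 9))
```

For coverage between 2x and 20x the two readings give different answers, which is exactly the ambiguity described in the post.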
My masurca run failed in the runCA1 step with the message "Jellyfish failed". When I look at the logs and the scripts I think it is because the $ovlIT variable is not set. Interestingly, in this run this variable is set to "" (=it is empty) in the environment.sh file. After a bit of digging, I think this is because I provide a ESTIMATED_GENOME_SIZE.txt file. Due to that the line "jellyfish count -m 31 -t 36 -C -s $JF_SIZE -o k_u_hash_0 pe.cor.fa" in the assemble.sh script is not executed and therefore no k_u_hash_0 file is written. Later in the assemble.sh this k_u_hash_0 is needed to set the ovlMerThreshold variable with jellyfish. But because the file is not available, ovlMerThreshold ends up empty and therefore it is empty also in the environment.sh file.
Should I not provide an ESTIMATED_GENOME_SIZE.txt file? Or could the code be updated so that the assemble.sh always creates a k_u_hash_0 file?
And by the way, thanks for the great software!
Stefan
I'm trying to package masurca for bioconda, however the conda build process is failing to successfully extract masurca tarballs. This is a result of the tarballs being created with a preceding '/' on the path, which conda build tries to extract equivalent to 'tar -P', and subsequently fails to create a /MaSuRCA-3.2.4 directory.
I've raised this issue with the bioconda team who will try to resolve the issue upstream with the conda developers, but in the meantime would it be possible to generate release tarballs with relative paths (and no preceding '/').
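Until relative-path tarballs are available, one workaround is to repack an archive with the leading '/' stripped from every member name. The Python sketch below does this with the standard tarfile module; the function names are illustrative, and it is a generic repacking sketch under the assumption that the archive holds ordinary files and directories, not something taken from the MaSuRCA or conda build scripts.

```python
import io
import tarfile

def relative_name(name):
    """Drop any leading '/' so the member extracts relative to the target directory."""
    return name.lstrip("/")

def rewrite_with_relative_paths(src, dst):
    """Copy a tar archive from file object src to gzip-compressed dst with relative member names."""
    with tarfile.open(fileobj=src, mode="r:*") as tin, \
         tarfile.open(fileobj=dst, mode="w:gz") as tout:
        for member in tin:
            # Only regular files carry data; directories and links are copied as headers.
            data = tin.extractfile(member) if member.isfile() else None
            member.name = relative_name(member.name)
            # A stored pax "path" record would otherwise override the new name.
            member.pax_headers.pop("path", None)
            tout.addfile(member, data)

# Usage with real files (file names here are placeholders):
#   with open("original.tar.gz", "rb") as src, open("relative.tar.gz", "wb") as dst:
#       rewrite_with_relative_paths(src, dst)
```

Hard links whose targets are stored as absolute paths would need their link names rewritten too, which this sketch does not attempt.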
Many thanks
James
Given the fastq files of paired-end data and mate-pair data, how does one calculate the mean and standard deviation of the insert size?
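One common approach, sketched here under stated assumptions, is to align a sample of the read pairs to a draft assembly or a related reference, collect the observed template lengths (the TLEN field, column 9 of a SAM record), and compute their mean and standard deviation. The Python sketch below covers only the last step; the function name and the optional outlier cutoff are illustrative choices.

```python
import statistics

def insert_size_stats(template_lengths, max_size=None):
    """Return (mean, sample standard deviation) of absolute template lengths.

    Zero values (unpaired or unaligned mates) are ignored, and values above
    max_size, if given, are dropped as likely chimeric or misaligned pairs.
    """
    usable = [
        abs(length)
        for length in template_lengths
        if length != 0 and (max_size is None or abs(length) <= max_size)
    ]
    return statistics.mean(usable), statistics.stdev(usable)

# Toy values standing in for TLEN fields from aligned pairs.
mean, sd = insert_size_stats([300, -310, 290, 0, 5000], max_size=1000)
print(mean, sd)
```

The two resulting numbers are what go after the library prefix in the config DATA section, as in the "PE= pe 150 22 ..." line shown earlier on this page.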
Thanks
Hi, my run failed during "Running assembly". I've tried to rerun it, but got exactly the same error. Any hints?
Bests,
----------------------------------------START CONCURRENT Tue Mar 20 16:57:51 2018
/home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/ovlcorr.sh 1 > /home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/0001.err 2>&1
----------------------------------------END CONCURRENT Tue Mar 20 16:57:58 2018 (7 seconds)
Overlap correction job 1 (/home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/0001) failed.
================================================================================
runCA failed.
----------------------------------------
Stack trace:
at /home/lpryszcz/src/MaSuRCA-3.2.4/bin/../CA8/Linux-amd64/bin/runCA line 1613.
main::caFailure('1 overlap correction jobs failed; remove /home/lpryszcz/clust...', undef) called at /home/lpryszcz/src/MaSuRCA-3.2.4/bin/../CA8/Linux-amd64/bin/runCA line 4514
main::overlapCorrection() called at /home/lpryszcz/src/MaSuRCA-3.2.4/bin/../CA8/Linux-amd64/bin/runCA line 6526
----------------------------------------
Failure message:
1 overlap correction jobs failed; remove /home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/ovlcorr.sh (or run by hand) to try again
Overlap correction job 1 (/home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/0001) failed.
================================================================================
runCA failed.
----------------------------------------
Stack trace:
at /home/lpryszcz/src/MaSuRCA-3.2.4/bin/../CA8/Linux-amd64/bin/runCA line 1613.
main::caFailure('1 overlap correction jobs failed; remove /home/lpryszcz/clust...', undef) called at /home/lpryszcz/src/MaSuRCA-3.2.4/bin/../CA8/Linux-amd64/bin/runCA line 4514
main::overlapCorrection() called at /home/lpryszcz/src/MaSuRCA-3.2.4/bin/../CA8/Linux-amd64/bin/runCA line 6526
----------------------------------------
Failure message:
1 overlap correction jobs failed; remove /home/lpryszcz/cluster/hybrids/mbizzarri/masurca/ATCC42981/CA.mr.41.15.15.0.02/3-overlapcorrection/ovlcorr.sh (or run by hand) to try again
Hi, I am using the MaSuRCA (v3.2.6) assembler to assemble a plant with a 1 Gbp genome, and I observe quite high memory usage in assemble.sh at this line:
create_k_unitigs_large_k -c $(($KMER-1)) -t 32 -m $KMER -n $(($ESTIMATED_GENOME_SIZE*2)) -l $KMER -f `perl -e 'print 1/'$KMER'/1e5'` pe.cor.fa | grep --text -v '^>' | perl -ane '{$seq=$F[0]; $F[0]=~tr/ACTGactg/TGACtgac/;$revseq=reverse($F[0]); $h{($seq ge $revseq)?$seq:$revseq}=1;}END{$n=0;foreach $k(keys %h){print ">",$n++," length:",length($k),"\n$k\n"}}' > guillaumeKUnitigsAtLeast32bases_all.fasta.tmp && mv guillaumeKUnitigsAtLeast32bases_all.fasta.tmp guillaumeKUnitigsAtLeast32bases_all.fasta
Specifically, the second perl command:
perl -ane '{$seq=$F[0]; $F[0]=~tr/ACTGactg/TGACtgac/;$revseq=reverse($F[0....
is using over 750 GB of RAM. Is this memory usage normal for this step? Do I need a server with more RAM, or is there a way to decrease memory usage?
Best regards,
Petr
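To make clear what that perl step does, here is a rough Python equivalent of the one-liner quoted above. For each sequence it computes the reverse complement, keeps whichever of the two strings compares greater, and stores it in an in-memory set, so memory grows with the number and total length of distinct sequences. The function names are made up, and this only illustrates the logic; it is not a replacement for the pipeline step.

```python
# Same character mapping as the perl tr/ACTGactg/TGACtgac/
COMPLEMENT = str.maketrans("ACTGactg", "TGACtgac")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]

def canonical_unique(sequences):
    """Collapse each sequence and its reverse complement into one representative."""
    seen = set()
    for seq in sequences:
        rc = reverse_complement(seq)
        # Mirrors the perl expression ($seq ge $revseq) ? $seq : $revseq
        seen.add(seq if seq >= rc else rc)
    return seen

def to_fasta(sequences):
    """Format sequences the way the END block does: '>n length:L' then the sequence."""
    return "".join(
        f">{n} length:{len(seq)}\n{seq}\n" for n, seq in enumerate(sequences)
    )

# AAAC and GTTT are reverse complements of each other, so only one is kept.
unique = canonical_unique(["ACGT", "AAAC", "GTTT"])
print(to_fasta(sorted(unique)), end="")
```

The perl version prints the hash keys in arbitrary order; the demo sorts them only to make the output reproducible.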
Dear Dr. Zimin,
Would it be possible to restart a run from version 3.2.6 using version 3.2.7 and if so, at what stage? Specifically, if at the megaread stage, do all the megaread (mr.41.15.17.0.029) files need to be recalculated, or can 3.2.7 pick up after mr.41.15.17.0.029.txt is created, but before the other mr files such as mr.41.15.17.0.029.all_mr.fa are created?
This would be a major time saver for comparing assemblies, as the mr.41.15.17.0.029.txt file took many days to create but the remaining mr files took only a short time.
Thanks for the update and software.
tsetsob
Hi,
The readme.md file lists two possible ways to estimate an appropriate value for the JF_SIZE parameter. The first one listed on that page states:
#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
JF_SIZE = 200000000
However a little later on in the document where an example is provided, the way proposed to derive that value is possibly a bit different:
JF_SIZE=2000000000
jellyfish hash size, set this to about 10x the genome size.
I have two questions related to this value:
JF_SIZE parameter gets beyond a certain value?
Thanks very much!
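The two rules of thumb quoted above can give quite different numbers, which a few lines of arithmetic make concrete. This is only a sketch: the function names are made up, and the 200 Mbp genome and 50x coverage are assumed values chosen for illustration, not taken from the README.

```python
def jf_size_by_coverage(genome_size, coverage):
    """First rule quoted above: estimated_genome_size * estimated_coverage."""
    return int(genome_size * coverage)

def jf_size_by_genome(genome_size, factor=10):
    """Second rule quoted above: about 10x the genome size."""
    return int(genome_size * factor)

# Assumed example values, for illustration only.
genome_size = 200_000_000   # 200 Mbp
coverage = 50               # 50x Illumina coverage

print("genome_size * coverage:", jf_size_by_coverage(genome_size, coverage))
print("10 x genome_size:      ", jf_size_by_genome(genome_size))
```

With these assumed inputs the first rule gives 10,000,000,000 and the second gives 2,000,000,000, so the two rules agree only when coverage is about 10x.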
Hi there
Hybrid assembly gives following error:
Using CABOG from is MaSuRCA-3.2.6/bin/../CA8/Linux-amd64/bin
Running mega-reads correction/assembly
Using mer size 15 for mapping, B=15, d=0.02
Estimated Genome Size 368796815
Estimated Ploidy 1
Using 12 threads
Output prefix mr.41.15.15.0.02
Using 30x of the longest ONT reads
Reducing super-read k-mer size
Mega-reads pass 1
Running locally in 1 batch
compute_psa 4935290 1781966868
Processed 500000 super reads, irreducible 381804, processing 2222 super reads per second
Processed 1000000 super reads, irreducible 706011, processing 1818 super reads per second
Processed 1500000 super reads, irreducible 1042813, processing 1930 super reads per second
Processed 2000000 super reads, irreducible 1421777, processing 1779 super reads per second
Processed 2500000 super reads, irreducible 1876509, processing 1779 super reads per second
Mega-reads pass 2
Running locally in 1 batch
compute_psa 2280190 2472577921
Refining alignments
ERROR: failed to merge alignments at position 487
Please file a bug report
xargs: refine.sh: exited with status 255; aborting
Joining
awk: cmd. line:1: fatal: cannot open file `mr.41.15.15.0.02.all.txt' for reading (No such file or directory)
MaSuRCA-3.2.6/bin/mega_reads_assemble_cluster.sh: line 504: mr.41.15.15.0.02.all.txt: No such file or directory
mega-reads joining failed
[Tue Jul 24 22:56:40 UTC 2018] Assembly stopped or failed, see CA.mr.41.15.15.0.02.log
Can you give any advice?
Nick
Hi there,
I tried to use MaSuRCA-3.2.6 to assemble a genome (about 500 Mbp in size). We have 200x Illumina and 10x PacBio coverage, but I ran into a problem (log shown below).
[Mon May 14 06:14:24 CEST 2018] Creating mer database for Quorum
terminate called after throwing an instance of 'jellyfish::large_hash::array_base<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, jellyfish::large_hash::array<jellyfish::mer_dna_ns::mer_base_static<unsigned long, 0>, unsigned long, atomic::gcc, allocators::mmap> >::ErrorAllocation'
what(): Failed to allocate 220000000000 bytes of memory
./assemble.sh: line 99: 47367 Exit 2 awk '{print substr($0,1,200)}' p1.renamed.fastq p2.renamed.fastq p3.renamed.fastq
47368 Aborted (core dumped) | quorum_create_database -t 40 -s
[Mon May 14 06:14:25 CEST 2018] Error correct PE.
./assemble.sh: line 106: 47371 Aborted (core dumped) quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)) --contaminant=/home/lin/liyanbo/Tools/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 40 -w 10 -e 3 -M quorum_mer_db.jf p1.renamed.fastq p2.renamed.fastq p3.renamed.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1
[Mon May 14 06:14:25 CEST 2018] Error correction of PE reads failed. Check pe.cor.log.
There is no pe.cor.log, only quorum.err, which says:
[2018/05/14 06:14:25] Loading mer database
terminate called after throwing an instance of 'std::runtime_error'
what(): Can't open 'quorum_mer_db.jf' for reading
Any advice to help me get up and running would be appreciated.
Hi,
I am running MaSuRCA version 3.2.6, but I got numerous "Broken pipe" messages, which probably caused the run to fail.
Here is my config file:
# example configuration file
# DATA is specified as type {PE,JUMP,OTHER,PACBIO} and 5 fields:
# 1)two_letter_prefix 2)mean 3)stdev 4)fastq(.gz)_fwd_reads
# 5)fastq(.gz)_rev_reads. The PE reads are always assumed to be
# innies, i.e. --->.<---, and JUMP are assumed to be outties
# <---.--->. If there are any jump libraries that are innies, such as
# longjump, specify them as JUMP and specify NEGATIVE mean. Reverse reads
# are optional for PE libraries and mandatory for JUMP libraries. Any
# OTHER sequence data (454, Sanger, Ion torrent, etc) must be first
# converted into Celera Assembler compatible .frg files (see
# http://wgs-assembler.sourceforge.com)
DATA
#PE= pe 180 20 /FULL_PATH/frag_1.fastq /FULL_PATH/frag_2.fastq
#JUMP= sh 3600 200 /FULL_PATH/short_1.fastq /FULL_PATH/short_2.fastq
#read length 260
PE= pa 21 2 /scratch/waterhouse_team/benth/illumina/SZ004_NoIndex_L001_R1_001.fastq.gz /scratch/waterhouse_team/benth/illumina/SZ004_NoIndex_L001_R2_001.fastq.gz
PE= pb 21 2 /scratch/waterhouse_team/benth/illumina/SZ004_NoIndex_L001_R1_002.fastq.gz /scratch/waterhouse_team/benth/illumina/SZ004_NoIndex_L001_R2_002.fastq.gz
...
PE= pm 464 26 /scratch/waterhouse_team/benth/illumina/SZ005_NoIndex_L002_R1_001.fastq.gz /scratch/waterhouse_team/benth/illumina/SZ005_NoIndex_L002_R2_001.fastq.gz
PE= pn 464 26 /scratch/waterhouse_team/benth/illumina/SZ005_NoIndex_L002_R1_002.fastq.gz /scratch/waterhouse_team/benth/illumina/SZ005_NoIndex_L002_R2_002.fastq.gz
...
#pacbio reads must be in a single fasta file! make sure you provide absolute path
PACBIO=/work/waterhouse_team/All_RawData/Benth/PacBio_gDNA/PacB_NbAll_Temp1R.fasta
#OTHER=/FULL_PATH/file.frg
END
PARAMETERS
#set this to 1 if your Illumina jumping library reads are shorter than 100bp
EXTEND_JUMP_READS=0
#this is k-mer size for deBruijn graph values between 25 and 127 are supported, auto will compute the optimal size based on the read data and GC content
GRAPH_KMER_SIZE = auto
#set this to 1 for all Illumina-only assemblies
#set this to 1 if you have less than 20x long reads (454, Sanger, Pacbio) and less than 50x CLONE coverage by Illumina, Sanger or 454 mate pairs
#otherwise keep at 0
USE_LINKING_MATES = 0
#specifies whether to run mega-reads correction on the grid
USE_GRID=0
#specifies queue to use when running on the grid MANDATORY
GRID_QUEUE=all.q
#batch size in the amount of long read sequence for each batch on the grid
GRID_BATCH_SIZE=300000000
#coverage by the longest Long reads to use
LHE_COVERAGE=30
#this parameter is useful if you have too many Illumina jumping library mates. Typically set it to 60 for bacteria and 300 for the other organisms
LIMIT_JUMP_COVERAGE = 300
#these are the additional parameters to Celera Assembler. do not worry about performance, number or processors or batch sizes -- these are computed automatically.
#set cgwErrorRate=0.25 for bacteria and 0.1<=cgwErrorRate<=0.15 for other organisms.
CA_PARAMETERS = cgwErrorRate=0.15
#minimum count k-mers used in error correction 1 means all k-mers are used. one can increase to 2 if Illumina coverage >100
KMER_COUNT_THRESHOLD = 1
#whether to attempt to close gaps in scaffolds with Illumina data
CLOSE_GAPS=0
#auto-detected number of cpus to use
NUM_THREADS = 8
#this is mandatory jellyfish hash size -- a safe value is estimated_genome_size*estimated_coverage
JF_SIZE = 544000000000
#set this to 1 to use SOAPdenovo contigging/scaffolding module. Assembly will be worse but will run faster. Useful for very large (>5Gbp) genomes from Illumina-only data
SOAP_ASSEMBLY=0
END
Here is the output:
Verifying PATHS...
jellyfish OK
runCA OK
createSuperReadsForDirectory.perl OK
nucmer OK
mega_reads_assemble_cluster.sh OK
creating script file for the actions...done.
execute assemble.sh to run assembly
[Tue Jun 26 12:54:26 AEST 2018] Processing pe library reads
gzip:
gzip: stdout: Broken pipe
stdout: Broken pipe
gzip: stdout: Broken pipe
...
[Tue Jun 26 13:36:55 AEST 2018] Average PE read length 250
[Tue Jun 26 13:37:02 AEST 2018] Using kmer size of 127 for the graph
cat: write error: Broken pipe
[Tue Jun 26 13:37:03 AEST 2018] MIN_Q_CHAR: 33
[Tue Jun 26 13:37:03 AEST 2018] Creating mer database for Quorum
awk: cmd. line:1: (FILENAME=pa.renamed.fastq FNR=680) fatal: print to "standard output" failed (Broken pipe)
./assemble.sh: line 143: 64061 Exit 2 awk '{print substr($0,1,200)}' pa.renamed.fastq pb.renamed.fastq pc.renamed.fastq pd.renamed.fastq pe.renamed.fastq pf.renamed.fastq pg.renamed.fastq ph.renamed.fastq pi.renamed.fastq pj.renamed.fastq pk.renamed.fastq pl.renamed.fastq pm.renamed.fastq pn.renamed.fastq po.renamed.fastq pp.renamed.fastq pq.renamed.fastq pr.renamed.fastq ps.renamed.fastq pt.renamed.fastq pu.renamed.fastq pv.renamed.fastq pw.renamed.fastq px.renamed.fastq py.renamed.fastq
64062 Killed | quorum_create_database -t 8 -s $JF_SIZE -b 7 -m 24 -q $((MIN_Q_CHAR + 5)) -o quorum_mer_db.jf.tmp /dev/stdin
[Tue Jun 26 13:40:57 AEST 2018] Error correct PE.
./assemble.sh: line 150: 64406 Aborted (core dumped) quorum_error_correct_reads -q $((MIN_Q_CHAR + 40)) --contaminant=/lustre/work-lustre/waterhouse_team/apps/MaSuRCA-3.2.6/bin/../share/adapter.jf -m 1 -s 1 -g 1 -a 3 -t 8 -w 10 -e 3 -M quorum_mer_db.jf pa.renamed.fastq pb.renamed.fastq pc.renamed.fastq pd.renamed.fastq pe.renamed.fastq pf.renamed.fastq pg.renamed.fastq ph.renamed.fastq pi.renamed.fastq pj.renamed.fastq pk.renamed.fastq pl.renamed.fastq pm.renamed.fastq pn.renamed.fastq po.renamed.fastq pp.renamed.fastq pq.renamed.fastq pr.renamed.fastq ps.renamed.fastq pt.renamed.fastq pu.renamed.fastq pv.renamed.fastq pw.renamed.fastq px.renamed.fastq py.renamed.fastq --no-discard -o pe.cor.tmp --verbose > quorum.err 2>&1
[Tue Jun 26 13:40:57 AEST 2018] Error correction of PE reads failed. Check pe.cor.log.
pe.cor.log has not been generated.
What did I miss?
Thank you in advance.
Michal
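As a side note on the DATA section format described in the config comments above, here is a minimal sketch that parses the library lines and applies two checks stated in those comments (a two-letter prefix, and mandatory reverse reads for JUMP libraries). The function names and the sample config are hypothetical, the paths are placeholders, and the sketch does not diagnose the failure reported in this post.

```python
def parse_data_section(config_text):
    """Parse library lines between DATA and END into a list of dicts."""
    libraries = []
    in_data = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == "DATA":
            in_data = True
            continue
        if line == "END":
            in_data = False
            continue
        # Skip everything outside DATA, plus comments and non-assignment lines.
        if not in_data or not line or line.startswith("#") or "=" not in line:
            continue
        kind, _, rest = line.partition("=")
        kind = kind.strip()
        fields = rest.split()
        if kind in ("PE", "JUMP"):
            libraries.append({
                "type": kind,
                "prefix": fields[0],
                "mean": int(fields[1]),
                "stdev": int(fields[2]),
                "reads": fields[3:],
            })
        else:
            libraries.append({"type": kind, "reads": fields})
    return libraries

def check_libraries(libraries):
    """Report violations of the rules stated in the config comments."""
    problems = []
    for lib in libraries:
        if lib["type"] not in ("PE", "JUMP"):
            continue
        prefix = lib["prefix"]
        if len(prefix) != 2 or not prefix.isalpha():
            problems.append(f"{prefix}: prefix is not two letters")
        if lib["type"] == "JUMP" and len(lib["reads"]) < 2:
            problems.append(f"{prefix}: JUMP library needs reverse reads")
    return problems

# Hypothetical config with placeholder paths.
sample_config = """DATA
#PE= pe 180 20 /FULL_PATH/frag_1.fastq /FULL_PATH/frag_2.fastq
PE= pa 21 2 /data/a_R1.fastq.gz /data/a_R2.fastq.gz
JUMP= sh 3600 200 /data/short_1.fastq
...
PACBIO=/data/pacbio.fasta
END
PARAMETERS
NUM_THREADS = 8
END
"""
libs = parse_data_section(sample_config)
print(check_libraries(libs))
```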
I am running MaSuRCA on a cluster that supports SLURM. What should I do if I want to submit multiple jobs for a hybrid assembly with Illumina and PacBio data?
Checking out the latest version, 3.2.6b, and building it on Ubuntu 16.04 produces the following error:
make[2]: Entering directory '/mnt/soft/masurca/build/global/MUMmer'
YAGGO tests/generate_sequences_cmdline.hpp
/mnt/soft/masurca/MUMmer/tests/generate_sequences_cmdline.yaggo:17: In Option genome-size|G: Option genome-size|G: Invalid unsigned integer '10M'
Makefile:2603: recipe for target 'tests/generate_sequences_cmdline.hpp' failed
make[2]: *** [tests/generate_sequences_cmdline.hpp] Error 1
make[2]: Leaving directory '/mnt/soft/masurca/build/global/MUMmer'
Makefile:849: recipe for target 'install-special' failed
make[1]: *** [install-special] Error 2
Yaggo has version number 1.5.10
Hi,
I am running a de novo hybrid assembly with Illumina PE and PacBio long reads, and I'm facing a disk space issue. The intermediate files that MaSuRCA writes are consuming all my disk space, and the assembly has failed several times because of it.
At this point, my assembly is running the following script: /usr/local/MaSuRCA-3.2.4/bin/translateReduceFile.perl work1_mr1/superReadNames.txt work1_mr1/reduce.tmp > work1_mr1/reduce.tmp.renamed
I would like to know if there are any files that are safe to delete from the output files.
Thanks a lot,
Isabela
How do I completely uninstall MaSuRCA?
Thank you
I have a 1.2 GB plant genome with ~100X illumina plus ~5X pacbio. Overall 3.2.6b has been running on a 64 core machine for 25 days now, and has been in the stage 7 scaffolding for the past 18 days. Early stages made use of all the cores but cgw is running in only single thread. Is this expected behavior, and how long should I let it go before getting too worried and pulling the plug?
It does appear to be progressing, as the tail of cgw.out is showing both successes and failures for different scaffolds:
ExamineSEdgeForUsability_Interleaved()-- Interleaving failed, will not merge.
isQualityScaffoldMergingEdge()-- Merge scaffolds 238878 (241194.0bp) and 246626 (8586182.0bp): gap -322821.5bp +- 6836.8bp weight 2 BA_AB edge
isQualityScaffoldMergingEdge()-- Merge scaffolds 238878 (241194.0bp) and 246626 (8586182.0bp): FAIL LARGE NEGATIVE GAP
isQualityScaffoldMergingEdge()-- NEW fail (545711/806589)
isQualityScaffoldMergingEdge()-- Merge scaffolds 223313 (47479.4bp) and 246625 (8458792.0bp): gap -331469.1bp +- 5009.2bp weight 2 AB_BA edge
isQualityScaffoldMergingEdge()-- scaffold 223313 instrumenter happy 154.0 gap 52.4 misorient close 0.0 correct 4.0 far 0.0 oriented close 0.0 far 6.0 missing 127.2 external 15.4
isQualityScaffoldMergingEdge()-- scaffold 246625 instrumenter happy 123185.0 gap 2799.6 misorient close 389.0 correct 838.0 far 2453.0 oriented close 81.0 far 5019.0 missing 119506.7 external 81.8
isQualityScaffoldMergingEdge()-- scaffold (new) instrumenter happy 123341.0 gap 2800.2 misorient close 389.0 correct 842.0 far 2453.0 oriented close 81.0 far 5025.0 missing 119697.1 external 81.8
isQualityScaffoldMergingEdge()-- before: 0.490 satisfied (123338/128423 good/bad mates) after: 0.490 satisfied (123340/128487 good/bad mates)
isQualityScaffoldMergingEdge()-- ARE happy enough to merge 101 (0.490 >= 0.975) || (0.490 >= 0.490) || ((123340 > 123338) && (32.000 <= 0.300))
isQualityScaffoldMergingEdge()-- NEW pass (545712/806589)
ExamineSEdgeForUsability_Interleaved()-- Interleaving failed, will not merge.
isQualityScaffoldMergingEdge()-- Merge scaffolds 194762 (20141.8bp) and 246626 (8586182.0bp): gap -336435.0bp +- 5923.6bp weight 2 AB_AB edge
isQualityScaffoldMergingEdge()-- scaffold 194762 instrumenter happy 108.0 gap 6.5 misorient close 1.0 correct 4.0 far 0.0 oriented close 0.0 far 6.0 missing 179.7 external 37.8
isQualityScaffoldMergingEdge()-- scaffold 246626 instrumenter happy 118815.0 gap 3212.7 misorient close 441.0 correct 951.0 far 2730.0 oriented close 98.0 far 5559.0 missing 118110.4 external 111.9
isQualityScaffoldMergingEdge()-- scaffold (new) instrumenter happy 118925.0 gap 3209.7 misorient close 442.0 correct 956.0 far 2733.0 oriented close 98.0 far 5573.0 missing 118309.4 external 111.9
isQualityScaffoldMergingEdge()-- before: 0.481 satisfied (118922/128080 good/bad mates) after: 0.481 satisfied (118924/128111 good/bad mates)
isQualityScaffoldMergingEdge()-- ARE happy enough to merge 101 (0.481 >= 0.975) || (0.481 >= 0.481) || ((118924 > 118922) && (15.500 <= 0.300))
isQualityScaffoldMergingEdge()-- NEW pass (545713/806589)
ExamineSEdgeForUsability_Interleaved()-- Interleaving failed, will not merge.
isQualityScaffoldMergingEdge()-- Merge scaffolds 182770 (45783.6bp) and 246625 (8458792.0bp): gap -338272.0bp +- 4727.2bp weight 2 AB_BA edge
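One rough way to watch whether cgw is still advancing is to pull the counter out of the "NEW pass"/"NEW fail" lines in cgw.out. This sketch assumes, without confirmation from the Celera Assembler source, that the two numbers in parentheses are "edges examined so far" and "total edges in this round"; the function name is made up.

```python
import re

# Matches e.g. "NEW pass (545713/806589)" or "NEW fail (545711/806589)"
PROGRESS = re.compile(r"NEW (pass|fail) \((\d+)/(\d+)\)")

def latest_progress(log_text):
    """Return (done, total, fraction) from the last counter in the log, or None."""
    last = None
    for match in PROGRESS.finditer(log_text):
        last = match
    if last is None:
        return None
    done, total = int(last.group(2)), int(last.group(3))
    # ASSUMPTION: the second number is a fixed total for the current round.
    return done, total, done / total

# Counter lines copied from the log excerpt above.
log_tail = (
    "isQualityScaffoldMergingEdge()-- NEW fail (545711/806589)\n"
    "isQualityScaffoldMergingEdge()-- NEW pass (545712/806589)\n"
    "isQualityScaffoldMergingEdge()-- NEW pass (545713/806589)\n"
)
print(latest_progress(log_tail))
```

Comparing the value from two readings taken some hours apart gives a crude rate, though scaffolding may run more than one round, so this says nothing reliable about total remaining time.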
Hi,
The values in the meanAndStdevByPrefix.sj.txt file are different from what I provided in the config.txt file.
All of my mate-pair libraries are given a mean of 500 and a standard deviation of 100 in the meanAndStdevByPrefix.sj.txt file. Is this an expected behavior?
The values I provided for my paired-end data are correctly displayed in meanAndStdevByPrefix.pe.txt.
swig/perl5/swig_wrap.cpp:341:20: fatal error: string.h: No such file or directory
#include <string.h>
I'm wondering how robust Masurca is to restarting after getting killed during the assembly step. Basically, my cluster has a queue that is preemptable so jobs can be killed and restarted if a higher priority job gets assigned to the node it is running on.
I have three samples I'm assembling. The consensus step seems to take a ton of time, thus it has been preempted in 2/3 assemblies. The 5-consensus/ from the assembly that was not preempted has outputs like this:
[user@login3 assemblies]$ ls SAMPLE1_masurca.nxtrim/CA.mr.41.15.17.0.029/5-consensus|tail -n 20
genome.129.iid
genome_129.success
genome_130.cns.err
genome.130.fa
genome_130.fix.err
genome_130.fixes
genome.130.iid
genome_130.success
genome_131.cns.err
genome.131.fa
genome_131.fix.err
genome_131.fixes
genome.131.iid
genome_131.success
genome.fixes
genome.fixes.err
genome.partitioned
genome.partitioned.err
genome.sampling
genome.sampling.dat
Another is missing .fa .iid .lay files for the last 2 iterations and has some extra files for earlier iterations:
[user@login3 assemblies]$ ls SAMPLE2_masurca.nxtrim/CA.mr.41.15.17.0.029/5-consensus|tail -n 40
genome.130.iid
genome.130.lay
genome_130.success
genome.130.tmp.layout
genome_131.cns.err
genome.131.fa
genome.131.fasta
genome.131.fasta.qual
genome.131.fasta.qv
genome_131.fix.err
genome_131.fixes
genome.131.iid
genome.131.lay
genome_131.success
genome.131.tmp.layout
genome_132.cns.err
genome.132.fa
genome.132.fasta
genome.132.fasta.qual
genome.132.fasta.qv
genome_132.fix.err
genome_132.fixes
genome.132.iid
genome.132.lay
genome_132.success
genome.132.tmp.layout
genome_133.cns.err
genome_133.fix.err
genome_133.fixes
genome_133.success
genome_134.cns.err
genome_134.fix.err
genome_134.fixes
genome_134.success
genome.fixes
genome.fixes.err
genome.partitioned
genome.partitioned.err
genome.sampling
genome.sampling.dat
Finally, the third assembly is still running and has been stuck on making the final .cns.err for 40 hrs.
[earlm1@login3 assemblies]$ ll SAMPLE3_masurca.nxtrim/CA.mr.41.15.17.0.029/5-consensus|tail -n 20
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_129.success
-rw-r--r-- 1 earlm1 schlenke 47970 May 1 16:49 genome_130.cns.err
-rw-r--r-- 1 earlm1 schlenke 269 May 1 16:49 genome_130.fix.err
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_130.fixes
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_130.success
-rw-r--r-- 1 earlm1 schlenke 43216 May 1 16:49 genome_131.cns.err
-rw-r--r-- 1 earlm1 schlenke 269 May 1 16:49 genome_131.fix.err
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_131.fixes
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_131.success
-rw-r--r-- 1 earlm1 schlenke 41547 May 1 16:49 genome_132.cns.err
-rw-r--r-- 1 earlm1 schlenke 269 May 1 16:49 genome_132.fix.err
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_132.fixes
-rw-r--r-- 1 earlm1 schlenke 0 May 1 16:49 genome_132.success
-rw-r--r-- 1 earlm1 schlenke 18943 May 1 17:06 genome_133.cns.err
-rw-r--r-- 1 earlm1 schlenke 1972 May 1 17:44 genome_133.fix.err
-rw-r--r-- 1 earlm1 schlenke 7951113 May 1 17:44 genome_133.fixes
-rw-r--r-- 1 earlm1 schlenke 0 May 1 17:44 genome_133.success
-rw-r--r-- 1 earlm1 schlenke 1142416 May 3 14:27 genome_134.cns.err
-rw-r--r-- 1 earlm1 schlenke 0 Apr 24 17:37 genome.partitioned
-rw-r--r-- 1 earlm1 schlenke 0 Apr 24 16:59 genome.partitioned.err
The file is largely just a list of alignment failures.
[user@login3 assemblies]$ tail SAMPLE3_masurca.nxtrim/CA.mr.41.15.17.0.029/5-consensus/genome_134.cns.err
MultiAlignUnitig()-- failed to align fragment 59091605 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 45901675 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 43236611 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 10799714 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 8139984 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 49529163 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 36254081 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 30386433 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 13041716 in unitig 6690241.
MultiAlignUnitig()-- failed to align fragment 39923494 in unitig 6690241.
Question 1: Should I kill the SAMPLE3 assembly, remove the 5-consensus dir, and restart (and prevent it from getting preempted)?
Question 2: Should I trust the assembly of SAMPLE2? It eventually finished the consensus step and continued through to produce a scaffold file that doesn't seem obviously messed up.
Thanks,
Earl
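To compare the three 5-consensus directories more systematically, a small script can list which partitions have a `genome_N.success` marker and which only have a `genome_N.cns.err`. This sketch assumes that a `.success` file means the partition finished, which matches the listings above but is not confirmed by the runCA documentation; it says nothing about whether the outputs of a finished partition are valid. The function name is made up.

```python
import re

MARKER = re.compile(r"genome_(\d+)\.(cns\.err|success)")

def partition_status(filenames):
    """Return (finished, unfinished) partition numbers based on marker files."""
    started, finished = set(), set()
    for name in filenames:
        match = MARKER.fullmatch(name)
        if not match:
            continue  # other files such as genome.130.fa or genome.fixes
        number = int(match.group(1))
        if match.group(2) == "success":
            finished.add(number)
        else:
            started.add(number)
    return sorted(finished), sorted(started - finished)

# File names taken from the SAMPLE3 listing above.
sample3 = [
    "genome_132.cns.err", "genome_132.success",
    "genome_133.cns.err", "genome_133.fixes", "genome_133.success",
    "genome_134.cns.err", "genome.partitioned",
]
print(partition_status(sample3))
```

For a real directory, pass `os.listdir(path_to_5_consensus)` instead of the hard-coded list.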
Hi. Are you willing to make a MaSuRCA binary available for Ubuntu? I'm interested in using MaSuRCA for transcript assembly as described in the StringTie paper, but I'm giving up after spending a couple of hours trying to install it, to no avail.
The first challenging dependency is gcc. I tried the Ubuntu default (7.2.0) and it complains about -V and -qversion. I commented those out but ran into the problems with boost (below). Because of the cryptic fatal error messages, I also tried installing gcc v4.7 for Ubuntu but it behaves the same way as 7.2.0 (doesn't recognize -V and -qversion). There doesn't seem to be an Ubuntu option at CERN.
The second challenging dependency is boost. FYI, it is not mentioned here. I installed boost and tried pointing install.sh to that folder, but it reports that it can't find a working installation of boost:
configure: Detected BOOST_ROOT; continuing with --with-boost=/usr/include/boost/
checking for Boost headers version >= 1.46.0... no
configure: cannot find Boost headers version >= 1.46.0
## ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ ##
## Could not find a working installation of Boost. Set BOOST_ROOT to the path where the Boost headers are installed or set BOOST_ROOT=install to have it downloaded from the Internet and installed locally. For example: BOOST_ROOT=install ./install.sh ##
The same thing happens if I modify install.sh and pass that directory directly to configure (./configure ... --with-boost=...).
I copied the boost directory into global-1 and got a different error:
checking for Boost headers version >= 1.46.0... /usr/include
checking for Boost's header version... 1_58
checking boost/icl/interval_set.hpp usability... no
checking boost/icl/interval_set.hpp presence... yes
configure: WARNING: boost/icl/interval_set.hpp: present but cannot be compiled
configure: WARNING: boost/icl/interval_set.hpp: check for missing prerequisite headers?
configure: WARNING: boost/icl/interval_set.hpp: see the Autoconf documentation
configure: WARNING: boost/icl/interval_set.hpp: section "Present But Cannot Be Compiled"
configure: WARNING: boost/icl/interval_set.hpp: proceeding with the compiler's result
configure: WARNING: ## ------------------------------- ##
configure: WARNING: ## Report this to [email protected] ##
configure: WARNING: ## ------------------------------- ##
checking for boost/icl/interval_set.hpp... no
configure: error: cannot find boost/icl/interval_set.hpp
Thanks.
bz