Comments (5)
Oddly, after uncompressing the vcf file and re-running, I now get a std::out_of_range error in the varMer.C file. It seems to happen when there is nothing in kstrs, so calling kstrs.at(0) throws the out of range.
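For illustration, here is a minimal sketch of that failure mode and one possible guard. It assumes kstrs is a std::vector of strings; the struct and function names are hypothetical stand-ins, not merfin's actual code.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the object holding the k-mer strings;
// the real varMer class in merfin may look different.
struct VarMerSketch {
    std::vector<std::string> kstrs;
};

// Reproduces the reported failure: at(0) on an empty vector throws
// std::out_of_range, whereas operator[] would be undefined behaviour.
bool atZeroThrows(const VarMerSketch& vm) {
    try {
        (void)vm.kstrs.at(0);
        return false;
    } catch (const std::out_of_range&) {
        return true;
    }
}

// One possible guard: check for emptiness first and report failure
// to the caller instead of throwing.
bool firstKmerString(const VarMerSketch& vm, std::string& out) {
    if (vm.kstrs.empty()) {
        return false;
    }
    out = vm.kstrs.front();
    return true;
}
```

With a guard like this, a caller could skip a variant that produced no k-mer strings rather than terminate.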
This attempt seemingly loaded the variants correctly, reporting
Loaded 7017436 records with 683 unique contig(s) from 712 contig IDs.
while excluding 0 invalid records.
I also tried recompressing the vcf file with bcftools and gzip, and each time it reported different contig counts, with most records flagged as invalid. I'm not sure what is causing that.
I trimmed the stack trace to leave out the std lib allocator failures, but the relevant part is:
Generating fasta index.
Processing 'NC_006853.1_RagTag'
terminate called after throwing an instance of 'std::out_of_range'
...
0x000000000041bf5b in varMer::getAvgAbsK (this=0x1a8a2ed80, idx=0)
at merfin/varMer.C:399
#10 0x00000000004068bc in varMers (
seqName=0x7fffffffcab6 "asm.scaffolds.fasta", sfile=0x1a8a02b10,
vfile=0x664650, rlookup=0x65fdd0, alookup=0x65ffb0,
out=0x7fffffffcb46 "test_merfin.fa", comb=15, nosplit=false,
copyKmerDict=..., bykstar=false, threads=8) at merfin/merfin.C:617
#11 0x0000000000408b98 in main (argc=18, argv=0x7fffffffc438)
at merfin/merfin.C:988
from merfin.
Hello @ASLeonard, thanks for tracking this down.
Could you try again with an uncompressed vcf, after filtering out the RefCalls?
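As a rough sketch of what such a filter could look like: the code below assumes the calls are marked with RefCall in the FILTER column (the 7th tab-separated field), which is an assumption about the caller's output and not something stated in this thread. The function names are hypothetical; an existing VCF tool would normally be the more robust choice.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Returns true if the line should be kept. Header lines (starting with '#')
// and empty lines always pass; data lines pass unless their FILTER column
// (7th tab-separated field) is exactly "RefCall".
bool keepVcfLine(const std::string& line) {
    if (line.empty() || line[0] == '#') {
        return true;
    }
    std::istringstream fields(line);
    std::string field;
    for (int col = 1; col <= 7; ++col) {
        if (!std::getline(fields, field, '\t')) {
            return true;  // fewer than 7 columns: leave the line untouched
        }
    }
    return field != "RefCall";
}

// Copies an uncompressed VCF stream to out, dropping RefCall records.
void filterRefCalls(std::istream& in, std::ostream& out) {
    std::string line;
    while (std::getline(in, line)) {
        if (keepVcfLine(line)) {
            out << line << '\n';
        }
    }
}
```

Calling filterRefCalls(std::cin, std::cout) from a small main would turn this into a stdin-to-stdout filter for an uncompressed vcf.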
Filtering out refcalls fixes the problem, it successfully generates the debug and polish.vcf files. Thanks for that tip.
I tried again by compressing the filtered vcf, and ran into the same weird issues of near random contig numbers and mostly invalid records.
I just tried out the new commits, and it now works correctly reading compressed vcf. I also tried using the original unfiltered vcf (containing RefCalls), and this also worked fine. The updated meryl version is great as well.
It is out of scope for this issue, but the varMer section still doesn't appear to be threading correctly. I added back the private(varMerId) clause to see if its absence was causing the serialisation, but that didn't fix it. I also added an extra call to see which thread was being used, and it does use different threads, but only in serial.
Processing 'ptg000283l' on thread 6
Processing 'ptg000604l' on thread 12
Processing 'ptg000502l' on thread 10
Processing 'ptg000457l' on thread 9
Processing 'ptg000606l' on thread 12
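One way to confirm that kind of serialisation is to record when each item starts and finishes and then check whether any two time spans overlap. The sketch below assumes the loop is an OpenMP parallel for (private(...) is an OpenMP clause); Span, anyOverlap and timeItems are hypothetical helpers, not part of merfin. If the code is compiled without OpenMP, the pragma is skipped and the loop simply runs serially.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Start and end time of one work item, in microseconds since the loop began.
struct Span {
    long long startUs;
    long long endUs;
};

// True if at least two spans overlap in time, i.e. some items ran concurrently.
// Spans that merely touch (one ends exactly when the next starts) do not count.
bool anyOverlap(std::vector<Span> spans) {
    std::sort(spans.begin(), spans.end(),
              [](const Span& a, const Span& b) { return a.startUs < b.startUs; });
    long long latestEnd = spans.empty() ? 0 : spans[0].endUs;
    for (std::size_t i = 1; i < spans.size(); ++i) {
        if (spans[i].startUs < latestEnd) {
            return true;
        }
        latestEnd = std::max(latestEnd, spans[i].endUs);
    }
    return false;
}

// Runs work(i) for i in [0, n) and records when each call started and ended.
template <typename Work>
std::vector<Span> timeItems(int n, Work work) {
    using Clock = std::chrono::steady_clock;
    using Us = std::chrono::microseconds;
    const Clock::time_point t0 = Clock::now();
    std::vector<Span> spans(n);
#ifdef _OPENMP
#pragma omp parallel for schedule(dynamic)
#endif
    for (int i = 0; i < n; ++i) {
        const long long s = std::chrono::duration_cast<Us>(Clock::now() - t0).count();
        work(i);
        const long long e = std::chrono::duration_cast<Us>(Clock::now() - t0).count();
        spans[i] = {s, e};  // each iteration writes only its own slot
    }
    return spans;
}
```

If anyOverlap returns false for a loop whose items each take a measurable amount of time, the items were effectively processed one after another, even if different thread IDs show up in the log.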
I'll hold off on opening a new issue for this as it looks like a work in progress already.
Thanks!
Yes, there was an issue regarding compressed input vcf files, which is now fixed with commit 012a51f.
Feel free to try the latest version, and let me know if you see any additional issues!