gatb / mindthegap

MindTheGap is a SV caller for short read sequencing data dedicated to insertion variants (all sizes and types). It can also be used as a local assembly tool.

License: GNU Affero General Public License v3.0

CMake 1.85% C++ 74.62% Python 15.90% Shell 7.12% Dockerfile 0.51%

bioinformatics debruijn-graph gatb genomics structural-variants

mindthegap's Issues

Config Files

Hi, I'm running MindTheGap on a cluster, but it uses up the space in my /home directory by writing a lot of trashme_* files. I'm using MindTheGap on 3k rice genomes. Are these files necessary? Can we disable them?
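
To gauge how much space these temporary files occupy before deciding what to do with them, a short script can total them up. This is a minimal sketch and not part of MindTheGap: the function name `summarize_temp_files` is hypothetical, and only the `trashme_` prefix comes from the report above. Whether the files are safe to delete while a run is in progress is not something this sketch can tell you.

```python
from pathlib import Path


def summarize_temp_files(directory, prefix="trashme_"):
    """Return (count, total_bytes) for top-level entries whose name starts with `prefix`.

    Matching directories are walked recursively so their contents are counted too.
    """
    count = 0
    total_bytes = 0
    for entry in Path(directory).iterdir():
        if not entry.name.startswith(prefix):
            continue
        count += 1
        if entry.is_dir():
            # Sum every regular file below the matching directory.
            total_bytes += sum(
                p.stat().st_size for p in entry.rglob("*") if p.is_file()
            )
        elif entry.is_file():
            total_bytes += entry.stat().st_size
    return count, total_bytes
```

Calling `summarize_temp_files(Path.home())` would report the number of matching entries in the home directory and their combined size in bytes.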

Memory usage of fill module

Is there a way to limit the memory consumption of the fill module? I have samples which are allocating ~40-50GB of RAM (for a sample of only ~2 million 150 bp paired-end reads).

Extremely Large Run-Time in 'Contig-Fill' Mode

Hi, I'm running MindTheGap version 2.2.2 in the contig gap-filling mode. I gave it 28 threads to run off of and started it on May 27. I logged into my computer to check for any updates, and it estimated a remaining time of 115,085 minutes (just under 80 days). Is this what should be expected?

As a note - I am threading this process, not parallelizing it.

result issue

I have a problem with the result of MindTheGap.
I simulated 1000 variants in chr15.fa, including 524 insertions and 476 deletions, with SURVIVOR and ART. I ran MindTheGap in find and fill mode, as shown in the README:
MindTheGap find -in pair-end1.fq,pair-end2.fq -ref ../chr15/chr15.fa -out mindthegap
MindTheGap fill -graph mindthegap.h5 -bkpt mindthegap.breakpoints -out mind-result
Finally, I got 507 insertions in mind-result.insertion.vcf. The breakpoints shown in the VCF file are very different from the simulated data. Do the positions in the VCF file correspond to the simulated insertion breakpoints?
Did I miss something or do something wrong?
I hope you can reply soon, and I would be grateful for any clues.
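
One way to quantify how far the called breakpoints are from the simulated ones is to match positions within a tolerance window rather than requiring exact equality, since coordinate conventions and normalization can shift a call by a few bases. The following is a sketch under stated assumptions: the function names and the 10 bp default tolerance are illustrative choices, the truth set is passed as plain `(chrom, pos)` pairs rather than parsed from any particular SURVIVOR file, and matching is not one-to-one (several calls may match the same simulated site).

```python
import bisect


def read_vcf_positions(lines):
    """Extract (chrom, pos) pairs from VCF text lines, skipping headers and blanks."""
    positions = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        positions.append((fields[0], int(fields[1])))
    return positions


def match_breakpoints(called, truth, tolerance=10):
    """Split called sites into those within `tolerance` bp of a truth site and the rest.

    Returns (matched, unmatched), each a list of called (chrom, pos) pairs.
    """
    truth_by_chrom = {}
    for chrom, pos in truth:
        truth_by_chrom.setdefault(chrom, []).append(pos)
    for sites in truth_by_chrom.values():
        sites.sort()

    matched, unmatched = [], []
    for chrom, pos in called:
        sites = truth_by_chrom.get(chrom, [])
        # First truth site at or beyond the left edge of the window.
        i = bisect.bisect_left(sites, pos - tolerance)
        if i < len(sites) and sites[i] <= pos + tolerance:
            matched.append((chrom, pos))
        else:
            unmatched.append((chrom, pos))
    return matched, unmatched
```

Running it with several tolerance values (for example 0, 10, and 50 bp) shows whether the calls are slightly shifted or unrelated to the simulated sites.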

No license information for src/CircularBuffer.hpp

Hello,

My name is Shayan Doust, a contributor to the Debian-Med team. I have packaged MindTheGap, however uploading efforts are unsuccessful as there is no licensing information in src/CircularBuffer.hpp.

Could you please clarify the licensing information within this file? Right now, it only contains a copyright holder but no licensing information. Ideally, include the licensing information in this file (just like the other source files) and generate a new release when you are ready. That way, I can simply integrate the new changes within the package and try for another upload to the Debian repository.

Kind regards,
Shayan Doust

EXCEPTION: Failure because of unhandled kmer size 128

Thanks a lot for this great tool.

I wanted to try kmer size 128, but it seems not to work:

EXCEPTION: Failure because of unhandled kmer size 128

-kmer-size 96 works fine (using my real data, I can find a known breakpoint - although the insertion is too large to be assembled (~9kb))

I tried both MindTheGap-v2.2.1-bin-Linux.tar.gz
and cloning from github.

Using Centos 7.

Thanks in advance for your help

Best wishes

Matt Shenton

simple test fails on my machine

All tests pass except:

n-in-solid-stretch : FAILED
n-after-clean-insert : FAILED

Exception: Hash16: max size for this hash is 2^32, but ask for 33

Hello,

I ran MindTheGap on a high coverage (~200x) whole human genome data with a command like this:
./MindTheGap find -in S1_1.fastq.gz,S1_2.fastq.gz,S2_1.fastq.gz,S2_2.fastq.gz,...,S18_1.fastq.gz,S18_2.fastq.gz -ref human_g1k_v37.fasta -nb-cores 72 -max-memory 200000 -out SAMPLE

and got this exception after running for quite a while:

"EXCEPTION: Hash16: max size for this hash is 2^32, but ask for 33."

What might cause this problem? Did I misuse the computational parameters again? (The machine has 99 cores and 1TB memory.)

Thank you very much,
pinar

ERROR: Unknown parameter '-contig'

I am running MindTheGap in 'contig gap-filling' mode and am attempting to run this command:

'''
MindTheGap fill -nb-cores ${task.cpus} -in 18-01_reads.fq -contig 18-01_assembly.fa -kmer-size 51 -abundance-min 5 -max-nodes 300 -max-length 50000 -out 18-01_gapFilled
'''

However, I keep receiving this error:
'''
ERROR: Unknown parameter '-contig'
ERROR: Unknown parameter '18-01_assembly.fa'
'''
All my reads were gzipped at first to conserve space, and I initially thought that the program could only handle unzipped files, so I gunzipped all of them and re-ran the command, but I am still receiving this same error. I am running this program on an HPC through a container that I downloaded from https://quay.io/repository/biocontainers/mindthegap?tab=tags.

Thank you for your help,
Ashley

./MindTheGap 'fill' takes too long time for WGS data

Hi,
I ran MindTheGap on whole-genome sequencing data (30x, paired-end 101bp). The 'find' part finished in a reasonable time. I ran the 'fill' part with a command similar to the following:

./MindTheGap fill -graph example.h5 -bkpt example.breakpoints -out example -max-memory 500000 -max-disk 1000000 -nb-cores 84

I increased the -max-memory, -max-disk, and -nb-cores just to speed up the process (The machine has 96 cores (I did not want to use all of the cores), 1TB memory, and more than 2TB disk space).

After ~4.5-5 hours, I get this message as the time estimate:
[Filling breakpoints ] 1.03 % elapsed: 288 min 1 sec remaining: 27789 min 39 sec

which makes 19 more days! Am I doing something wrong? How can I speed up the 'fill' function?

Thank you very much for your help!
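
The arithmetic behind the estimate can be checked by parsing the progress line directly. This is a small sketch: the regular expression is written against the single line quoted above, so the assumption is that other progress lines follow the same layout, and `parse_progress` is a hypothetical helper name.

```python
import re

# Matches e.g. "1.03 % elapsed: 288 min 1 sec remaining: 27789 min 39 sec".
PROGRESS_RE = re.compile(
    r"(?P<pct>\d+(?:\.\d+)?)\s*%\s*"
    r"elapsed:\s*(?P<emin>\d+)\s*min\s*(?P<esec>\d+)\s*sec\s*"
    r"remaining:\s*(?P<rmin>\d+)\s*min\s*(?P<rsec>\d+)\s*sec"
)


def parse_progress(line):
    """Return percent done, elapsed hours, and remaining days from a progress line."""
    m = PROGRESS_RE.search(line)
    if m is None:
        raise ValueError("line does not look like a progress report")
    elapsed_min = int(m["emin"]) + int(m["esec"]) / 60
    remaining_min = int(m["rmin"]) + int(m["rsec"]) / 60
    return {
        "percent": float(m["pct"]),
        "elapsed_hours": elapsed_min / 60,
        "remaining_days": remaining_min / (60 * 24),
    }
```

For the line quoted in this report, this gives about 4.8 hours elapsed and about 19.3 days remaining, which agrees with the figure of 19 more days.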

Readme issue

-no-[type]: to disable the detection of certain types of variants.

It's not clear that, e.g., "-no-snp" is an option, as [type] is never defined.

installation issue

When I install MindTheGap, it produces "/usr/lib64/libstdc++.so.6: version `GLIBCXX_3.4.15' not found", but I do not have permission to install that library version on the server. Is there any way to move forward?
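
To see which GLIBCXX versions the system library actually provides, the version tags can be read straight out of the shared object, much as `strings libstdc++.so.6 | grep GLIBCXX` would do. This is a diagnostic sketch only: the helper names are hypothetical, it scans raw bytes instead of parsing the ELF symbol-version tables, and it does not install or change anything.

```python
import re


def glibcxx_versions(data):
    """Return the sorted GLIBCXX version tuples found in raw library bytes."""
    tags = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", data))
    versions = {
        tuple(int(part) for part in tag.decode().split(".")) for tag in tags
    }
    return sorted(versions)


def library_provides(path, required):
    """Check whether the library at `path` contains the tag for `required`, e.g. "3.4.15"."""
    with open(path, "rb") as handle:
        versions = glibcxx_versions(handle.read())
    return tuple(int(part) for part in required.split(".")) in versions
```

Calling `library_provides("/usr/lib64/libstdc++.so.6", "3.4.15")` on the server would confirm whether the version named in the error message is missing from that file.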