Comments (5)
@DarioS How much memory are you providing to Java via the -Xmx option, and how much physical memory do you have available? You can see how to pass the -Xmx option to GATK here: https://github.com/broadinstitute/gatk?tab=readme-ov-file#jvmoptions
from gatk.
-Xmx52g was used. The compute node has 1.5 TB of physical RAM. I use af-only-gnomad.hg38.vcf.gz for both -V and -L.
from gatk.
@DarioS You could try increasing the size of the Java heap (say, doubling it to 104g). Does your bam/cram have extremely high depth?
from gatk.
I copied the 60× BAM file to an interactive Linux server with 768 GB of physical RAM and eighty cores, and used version 4.5.0.0.
%Cpu(s): 1.3 us, 0.0 sy, 0.1 ni, 98.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
GiB Mem : 754.5 total, 52.1 free, 107.3 used, 600.3 buff/cache
GiB Swap: 931.3 total, 924.9 free, 6.4 used. 647.3 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
171365 dario 20 0 35.0g 31.3g 23040 S 100.0 4.1 32:18.12 java
I removed -Xmx and used top to confirm that the process stays consistently at about 32 GB. So, -Xmx is irrelevant to the problem.
12:15:04.531 INFO ProgressMeter - Current Locus Elapsed Minutes Loci Processed Loci/Minute
12:57:32.208 INFO GetPileupSummaries - Shutting down engine
[January 13, 2024 at 12:57:32 PM AEDT] org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 50.36 minutes.
Runtime.totalMemory()=20753416192
java.lang.OutOfMemoryError: Java heap space: failed reallocation of scalar replaced objects
What does "reallocation of scalar replaced objects" mean? I don't think it could possibly have run out of memory.
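For anyone comparing these numbers: the Runtime.totalMemory() value in the log is in bytes, and the heap ceiling the JVM actually picked (with or without -Xmx) can be read from Runtime.maxMemory(). The following is only a minimal sketch of that arithmetic. The class name HeapReport and the helper toGiB are made up for illustration and are not part of GATK.

```java
import java.util.Locale;

public class HeapReport {
    private static final double BYTES_PER_GIB = 1024.0 * 1024.0 * 1024.0;

    /** Converts a byte count to GiB (hypothetical helper, not a GATK API). */
    static double toGiB(long bytes) {
        return bytes / BYTES_PER_GIB;
    }

    /** Formats a byte count as GiB with two decimals, independent of locale. */
    static String formatGiB(long bytes) {
        return String.format(Locale.ROOT, "%.2f GiB", toGiB(bytes));
    }

    public static void main(String[] args) {
        // Value copied from the Runtime.totalMemory() log line above.
        long loggedTotal = 20753416192L;
        System.out.println("logged totalMemory: " + formatGiB(loggedTotal)); // 19.33 GiB

        // Limits of the JVM running this sketch; they change with -Xmx.
        Runtime rt = Runtime.getRuntime();
        System.out.println("maxMemory (heap ceiling): " + formatGiB(rt.maxMemory()));
        System.out.println("totalMemory (currently reserved): " + formatGiB(rt.totalMemory()));
    }
}
```

Running it with the same java binary and the same -Xmx setting (or none) as the GATK job shows which ceiling applied, which may differ from the RES column in top because RES also counts non-heap memory.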
from gatk.
I am in a similar boat. -Xmx has a small default value. Setting an explicit 448 GB limit shows that this module is memory-inefficient.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
172833 thind 20 0 468.7g 378.7g 31360 S 99.9 50.2 29:16.31 java
The analysis dies a few seconds later because GATK tries to create an impossibly large Java array.
org.broadinstitute.hellbender.tools.walkers.contamination.GetPileupSummaries done. Elapsed time: 25.63 minutes.
Runtime.totalMemory()=481036337152
java.lang.OutOfMemoryError: Required array length 2147483640 + 16 is too large
at java.base/jdk.internal.util.ArraysSupport.hugeLength(ArraysSupport.java:649)
at java.base/jdk.internal.util.ArraysSupport.newLength(ArraysSupport.java:642)
I can independently reproduce Dario's problem on the same Linux server.
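A note on why more heap cannot help with that second error: Java array lengths are of type int, so no array can hold more than Integer.MAX_VALUE (2147483647) elements, however large -Xmx is. The sketch below only reproduces the arithmetic implied by the message "2147483640 + 16". It is not the JDK's ArraysSupport code, and the class and method names are invented for illustration.

```java
public class ArrayLengthCheck {
    /**
     * Returns true when growing an array of oldLength by minGrowth elements
     * would need a length that does not fit in an int.
     * (Illustrative helper; not the JDK implementation.)
     */
    static boolean exceedsIntRange(int oldLength, int minGrowth) {
        // Add as long so the sum itself cannot overflow.
        long required = (long) oldLength + minGrowth;
        return required > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        int oldLength = 2147483640; // first number in the error message
        int minGrowth = 16;         // second number in the error message

        long required = (long) oldLength + minGrowth;
        System.out.println("required length   = " + required);
        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("exceeds int range = " + exceedsIntRange(oldLength, minGrowth));
    }
}
```

So the failure is a hard language limit on a single array rather than a shortage of heap, which is consistent with it appearing even at a 448 GB limit.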
from gatk.