Comments (1)
It won't run any faster than BWA-MEM does with a similar number of cores, since it is essentially just running BWA-MEM. It's potentially faster as part of a Spark pipeline, where you can load and process the data once instead of saving it to disk and reloading it repeatedly.
The complete list of Spark configuration parameters is available in the Spark docs. Many of them are not relevant in local mode. From what I understand, local mode executes as a single executor with the number of cores specified in the local[#] block (or the total number of system threads if it's set to *). It will use whatever memory Java is configured with. I'm pretty sure it's ignoring the memory and configuration parameters you've set. Those will be relevant if you configure a standalone Spark cluster (potentially one running exclusively on your local machine).
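To make the local[#] convention concrete, here is a minimal Python sketch of how a local master string maps to a thread count. It is not Spark's own parser: the function name `local_thread_count` is hypothetical, and the pattern only mirrors the documented forms `local`, `local[K]`, `local[*]`, and `local[K,F]` (where F is a retry limit, not a thread count).

```python
import os
import re


def local_thread_count(master: str) -> int:
    """Return the worker-thread count implied by a Spark local master string.

    Hypothetical helper for illustration only; Spark does its own parsing.
    """
    # Plain "local" means a single worker thread.
    if master == "local":
        return 1

    # "local[K]" or "local[*]", optionally with a max-failures value: "local[K,F]".
    match = re.fullmatch(r"local\[(\*|\d+)(?:\s*,\s*\d+)?\]", master)
    if match is None:
        raise ValueError(f"not a local-mode master string: {master!r}")

    threads = match.group(1)
    if threads == "*":
        # "*" means one thread per logical core on the machine.
        return os.cpu_count() or 1
    return int(threads)


if __name__ == "__main__":
    for master in ("local", "local[8]", "local[*]"):
        print(master, "->", local_thread_count(master))
```

Note that nothing in the master string controls memory. In local mode everything runs inside one JVM, so the heap size given to that JVM is what matters.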
Our Spark tools are, for the most part, not being actively developed. We've moved away from them in favor of single-threaded tools, widely sharded and managed by Cromwell. The additional complexity of the Spark environment made it hard to see much benefit when most of the tools are embarrassingly parallel and easily shardable.
from gatk.