Comments (13)
Please port CombineVariants. It is really really a good tool.
from gatk.
I wonder if users mainly just want something that adds the "set" annotation, so that you can do things like -select 'set == "Intersection"' in SelectVariants
...
from gatk.
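The "set" annotation idea mentioned above can be illustrated with a minimal sketch. This is not GATK code: the function name `annotate_sets`, the site key `(chrom, pos, ref, alt)`, and joining callset names with "-" are assumptions made for illustration only. It labels a site "Intersection" when every input callset contains it, and otherwise lists the callsets that do.

```python
from collections import defaultdict


def annotate_sets(callsets):
    """Return a dict mapping each site to a "set" label.

    callsets: dict of callset name -> iterable of (chrom, pos, ref, alt).
    Hypothetical helper for illustration; not part of GATK or bcftools.
    """
    names = list(callsets)
    membership = defaultdict(list)
    # Visit callsets in their given order so labels are deterministic.
    for name in names:
        for site in set(callsets[name]):
            membership[site].append(name)

    labels = {}
    for site, present in membership.items():
        if len(present) == len(names):
            labels[site] = "Intersection"
        else:
            # Assumption: partial membership is written as joined names.
            labels[site] = "-".join(present)
    return labels


callsets = {
    "hc": [("1", 100, "A", "T"), ("1", 200, "G", "C")],
    "mutect": [("1", 100, "A", "T"), ("1", 300, "C", "G")],
}
labels = annotate_sets(callsets)
shared = [site for site, label in labels.items() if label == "Intersection"]
```

Filtering on the label afterwards plays the role of the select expression: only sites seen in every callset are kept in `shared`.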
I didn't want to port CombineVariants because it has some really insidious behavior when it comes to choosing attributes that differ between/among the variants being combined. I did have to use it myself recently, so I appreciate its utility. Someone suggested I use bcftools merge instead. When I did I found that bcftools was at least two orders of magnitude faster and addressed my use case perfectly. Is there CombineVariants functionality that is not represented in bcftools?
from gatk.
@ldgauthier
Hi, can you share the bcftools merge code for different callers' results here? Thanks a lot.
from gatk.
The requirement is to port the CombineVariants tool and its tests. The current tests use Broad-visible data, which needs to be reviewed and shared publicly if possible.
from gatk.
@lbergelson Can we close this issue since we now use MergeVcfs in the picard package? 3 1/2 years is a long time to be high priority...
from gatk.
Users have shown interest in porting CombineVariants to GATK4. Features such as combining VCFs with variants present in all or a fraction of samples exist in CombineVariants but not in MergeVcfs/SelectVariants. It would be helpful to look into this.
from gatk.
The behavior of the GATK3 CombineVariants was very inconsistent and the arguments weren't entirely clear. I also suspect that some operations weren't possible with the arguments given. Rather than port that old broken version, I would advocate for an overhaul or rewrite.
@bhanugandham it's going to be a big project to collect requirements and expected behavior for this tool. For example, what should the MQ be for the combined VCF for two different input VCFs with different MQ values? Much of the confusion stemmed from the old ability to merge VCFs containing the same sample. In the case where we take one genotype for each sample name (e.g. the old -genotypeMergeOptions PRIORITIZE), I believe the old behavior was wrong in some cases, taking the filter status from an input VCF at random. We also need to clarify the FilteredRecordMergeType options, e.g. https://github.com/broadinstitute/gsa-unstable/issues/935
from gatk.
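One way to make the prioritized case above unambiguous is to take the whole record, including genotype, filter status, and MQ, from the single highest-priority input that contains the site. The sketch below only illustrates that rule; it is not how GATK3 CombineVariants behaved, and the function name `merge_prioritized` and the record fields are hypothetical.

```python
def merge_prioritized(records_by_input, priority):
    """Merge per-site records, taking each site from one input only.

    records_by_input: dict of input name -> dict of site -> record dict.
    priority: list of input names, highest priority first.
    Hypothetical helper for illustration; not part of GATK.
    """
    merged = {}
    for name in priority:
        for site, record in records_by_input.get(name, {}).items():
            if site not in merged:
                # Copy the record and remember which input supplied it, so
                # GT, FILTER and MQ never come from different inputs.
                merged[site] = dict(record, source=name)
    return merged


inputs = {
    "callerA": {("1", 100): {"GT": "0/1", "FILTER": "PASS", "MQ": 60.0}},
    "callerB": {
        ("1", 100): {"GT": "1/1", "FILTER": "LowQual", "MQ": 42.0},
        ("1", 200): {"GT": "0/1", "FILTER": "PASS", "MQ": 58.0},
    },
}
merged = merge_prioritized(inputs, priority=["callerA", "callerB"])
```

With this rule the filter status is deterministic: the site present in both inputs keeps callerA's PASS status, and the site found only in callerB keeps callerB's values. Whether annotations such as MQ should instead be recomputed across inputs is exactly the kind of requirement the comment above says still needs to be decided.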
@ldgauthier what do you suggest the course of action should be when users ask for certain features that were in CombineVariants but not in MergeVcfs? Should I just create a new issue for each argument request?
from gatk.
I have asked the users who requested porting CombineVariants which specific features they find useful that do not exist in MergeVcfs or other tools in GATK4.
I will post the results here; that should help us make a decision.
from gatk.
In #2489 @vdauwera mentions that CombineVariants will not be ported, but in this ticket @bhanugandham's comment makes it sound like this decision is not final yet. We would like to use the functionality of CombineVariants in a new pipeline we are implementing. What is our best option? Should we use the old GATK 3?
from gatk.
@pieterlukasse we are not looking into porting CombineVariants. If you have to use CombineVariants, it is available in the old GATK3 version. Unfortunately, we will not be able to provide support for it.
from gatk.
Please do!
from gatk.