Comments (19)
Is there a good reason this should remain a separate tool from CombineVariants (#17)? It sounds like it was mostly separate because of issues with interval handling.
from gatk.
There is likely common code between the two, but we would need to pre-process the inputs to determine whether they are GVCFs: the merging algorithm is different for the two cases even if all input variants at a site are non-reference-blocks. See ReferenceConfidenceVariantContextMerger.merge() vs GATKVariantContextUtils.simpleMerge().
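As a rough illustration of the pre-processing step mentioned above, one heuristic is to look for the `<NON_REF>` symbolic allele that GATK reference-confidence records carry in the ALT column. The sketch below is only that: the class and method names (`GvcfSniffer`, `looksLikeGvcf`) are hypothetical and not part of GATK, it works on raw tab-delimited lines rather than on htsjdk objects, and it assumes that a single `<NON_REF>` ALT allele is enough to treat an input as a GVCF.

```java
import java.util.Arrays;
import java.util.List;

public class GvcfSniffer {
    // Symbolic allele found in the ALT column of reference-confidence (GVCF) records.
    static final String NON_REF = "<NON_REF>";

    /** Returns true if any data line lists <NON_REF> among its ALT alleles. */
    static boolean looksLikeGvcf(List<String> vcfLines) {
        for (String line : vcfLines) {
            // Skip blank lines and header lines (## metadata and the #CHROM line).
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] fields = line.split("\t");
            // A VCF data line needs at least CHROM, POS, ID, REF, ALT.
            if (fields.length < 5) {
                continue;
            }
            // ALT is a comma-separated allele list in column 5.
            if (Arrays.asList(fields[4].split(",")).contains(NON_REF)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> gvcf = List.of(
            "##fileformat=VCFv4.2",
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
            "20\t10000\t.\tT\t<NON_REF>\t.\t.\tEND=10050");
        List<String> vcf = List.of(
            "##fileformat=VCFv4.2",
            "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
            "20\t10000\t.\tT\tC\t50\tPASS\t.");
        System.out.println(looksLikeGvcf(gvcf));
        System.out.println(looksLikeGvcf(vcf));
    }
}
```

A combined tool could use a check like this to choose between the two merge paths, though a real implementation would more likely inspect the parsed header and alleles than raw text.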
from gatk.
Another major difference is that CombineGVCFs assumes distinct samples, whereas CombineVariants has the ability to merge variant calls for the same samples, but it never really works how you expect on the first try.
from gatk.
@ldgauthier What do you mean by never really works on the first try? You get incorrect results?
from gatk.
The important part being "how you expect". The default behavior is a little
confusing. And then I got into a scenario where it took the filtering
status from one set of variants and the genotypes from the other and that
was horrific, so I guess that part is incorrect.
On Fri, Dec 12, 2014 at 11:46 AM, lbergelson wrote:
@ldgauthier What do you mean by never really works on the first try? You get incorrect results?
from gatk.
We definitely need a way to combine vcfs that does the right thing and isn't horrible to use.
from gatk.
I'm in favor of doing a ruthless refactor involving changing some of the
default behavior and prohibiting contradictory options.
On Fri, Dec 12, 2014 at 11:59 AM, lbergelson wrote:
We definitely need a way to combine vcfs that does the right thing and isn't horrible to use.
from gatk.
Yeah, it might be better to redo both of these mostly from scratch.
from gatk.
CombineGVCFs is not a long term solution. It'll be replaced by the likelihood store.
from gatk.
@eitanbanks I believe they're discussing CombineVariants. This just happens to be in the thread of CombineGVCFs bc there was some question of whether the two were redundant or not.
from gatk.
I know. My point was that it's not worth merging them since 1 will be going bye bye.
from gatk.
But what will users do who run GATK on their own machines? Will there be a way for them to set up their own likelihood stores? If not, they'll need CombineGVCFs.
from gatk.
We aren't building Prometheus for people to run on their own machines. If they want to run large projects in the future then they should theoretically do so through the Broad (and not necessarily for $; we are talking about making it free in many cases). This is the "analysis as a service" that Matter was talking about.
Obviously there are many, many details to work out. But rest assured knowing that you will get to help work out those details. :)
from gatk.
We're also not building Prometheus for people who use GATK on bacteria, or plants, or weird cave-dwelling fish, so I expect that the analysis service will not be available to them. Should they turn to another platform? I think it would be a big mistake (re: science, re: public perception and re: competition with other software) to make the new GATK effectively not runnable outside the Broad infrastructure. I understand the need to focus on primary objectives, but there is a real risk of shooting ourselves in the foot if the vision is too narrow.
So yes, I look forward to helping work out these details :)
from gatk.
Great points all around. This is the type of discussion to bring to @mmtrun (e.g. at a future town hall meeting) for him to digest and consider.
from gatk.
@eitanbanks here it is
from gatk.
The requirement is to port the CombineGVCFs tool and the tests (test data needs to be reviewed for sharability and made public if possible)
from gatk.
Feel free to bounce this back if you want to, and we can talk about how to proceed.
from gatk.
This was mismarked as beta. It's an alpha goal.
from gatk.