port CombineGVCFs (gatk, CLOSED)

broadinstitute commented on August 23, 2024
port CombineGVCFs

from gatk.

Comments (19)

lbergelson commented on August 23, 2024

Is there a good reason this should remain a separate tool from CombineVariants (#17)? It sounds like it was kept separate mostly because of issues with interval handling.

jmthibault79 commented on August 23, 2024

There is likely common code between the two, but we would need to pre-process the inputs to determine whether they are GVCFs. The merging algorithm is different for the two cases even if none of the input variants at a site are reference blocks: see ReferenceConfidenceVariantContextMerger.merge() vs. GATKVariantContextUtils.simpleMerge().
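One way to picture the pre-processing step is a header check. The sketch below is only an illustration, not GATK code: the class and method names are hypothetical, and it assumes a GVCF can be recognized by a header line declaring the `<NON_REF>` symbolic allele, which is a heuristic rather than a guarantee.

```java
import java.util.List;

// Hypothetical helper, not part of GATK.
class GvcfSniffer {
    // Assumption: a GVCF header declares the <NON_REF> symbolic ALT allele.
    // Returns true if any header line carries that declaration.
    static boolean looksLikeGvcf(List<String> headerLines) {
        for (String line : headerLines) {
            if (line.startsWith("##ALT=<ID=NON_REF")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> gvcfHeader = List.of(
            "##fileformat=VCFv4.2",
            "##ALT=<ID=NON_REF,Description=\"Represents any possible alternative allele at this location\">");
        List<String> plainHeader = List.of("##fileformat=VCFv4.2");

        System.out.println(looksLikeGvcf(gvcfHeader));
        System.out.println(looksLikeGvcf(plainHeader));
    }
}
```

A combined tool could use a check like this to choose between the two merge paths, or to reject a mix of GVCF and non-GVCF inputs.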

ldgauthier commented on August 23, 2024

Another major difference is that CombineGVCFs assumes distinct samples, whereas CombineVariants has the ability to merge variant calls for the same samples, but it never really works how you expect on the first try.
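The distinct-samples assumption could be checked up front. This is a minimal sketch with hypothetical names, not the actual CombineGVCFs logic; it assumes the sample names of each input have already been read from the VCF headers.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of GATK.
class SampleOverlapCheck {
    // Returns the sample names that occur more than once across all inputs.
    // An empty result means the inputs have distinct samples.
    static Set<String> duplicatedSamples(List<List<String>> samplesPerInput) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (List<String> samples : samplesPerInput) {
            for (String sample : samples) {
                // Set.add returns false when the name was already present.
                if (!seen.add(sample)) {
                    duplicates.add(sample);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<List<String>> inputs = List.of(
            List.of("NA12878", "NA12891"),
            List.of("NA12892", "NA12878"));
        System.out.println(duplicatedSamples(inputs));
    }
}
```

A tool that assumes distinct samples could fail fast when the returned set is non-empty, instead of silently merging calls for the same sample.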

lbergelson commented on August 23, 2024

@ldgauthier What do you mean by never really works on the first try? You get incorrect results?

ldgauthier commented on August 23, 2024

The important part being "how you expect". The default behavior is a little
confusing. And then I got into a scenario where it took the filtering
status from one set of variants and the genotypes from the other and that
was horrific, so I guess that part is incorrect.

On Fri, Dec 12, 2014 at 11:46 AM, lbergelson wrote:

@ldgauthier What do you mean by never really works on the first try? You get incorrect results?

lbergelson commented on August 23, 2024

We definitely need a way to combine VCFs that does the right thing and isn't horrible to use.

ldgauthier commented on August 23, 2024

I'm in favor of doing a ruthless refactor involving changing some of the
default behavior and prohibiting contradictory options.

On Fri, Dec 12, 2014 at 11:59 AM, lbergelson wrote:

We definitely need a way to combine vcfs that does the right thing and isn't horrible to use.

jmthibault79 commented on August 23, 2024

Yeah, it might be better to redo both of these mostly from scratch.

eitanbanks commented on August 23, 2024

CombineGVCFs is not a long-term solution. It'll be replaced by the likelihood store.

vdauwera commented on August 23, 2024

@eitanbanks I believe they're discussing CombineVariants. This just happens to be in the thread of CombineGVCFs bc there was some question of whether the two were redundant or not.

eitanbanks commented on August 23, 2024

I know. My point was that it's not worth merging them since 1 will be going bye bye.

vdauwera commented on August 23, 2024

But what will users do who run GATK on their own machines? Will there be a way for them to set up their own likelihood stores? If not, they'll need CombineGVCFs.

eitanbanks commented on August 23, 2024

We aren't building Prometheus for people to run on their own machines. If they want to run large projects in the future then they should theoretically do so through the Broad (and not necessarily for $; we are talking about making it free in many cases). This is the "analysis as a service" that Matter was talking about.
Obviously there are many, many details to work out. But rest assured knowing that you will get to help work out those details. :)

vdauwera commented on August 23, 2024

We're also not building Prometheus for people who use GATK on bacteria, or plants, or weird cave-dwelling fish, so I expect that the analysis service will not be available to them. Should they turn to another platform? I think it would be a big mistake (re: science, re: public perception and re: competition with other software) to make the new GATK effectively not runnable outside the Broad infrastructure. I understand the need to focus on primary objectives, but there is a real risk of shooting ourselves in the foot if the vision is too narrow.

So yes, I look forward to helping work out these details :)

eitanbanks commented on August 23, 2024

Great points all around. This is the type of discussion to bring to @mmtrun (e.g. at a future town hall meeting) for him to digest and consider.

akiezun commented on August 23, 2024

@eitanbanks here it is

akiezun commented on August 23, 2024

The requirement is to port the CombineGVCFs tool and its tests (the test data needs to be reviewed for shareability and made public if possible).

cmnbroad commented on August 23, 2024

Feel free to bounce this back if you want to, and we can talk about how to proceed.

akiezun commented on August 23, 2024

This was mismarked as beta. It's an alpha goal.
