broadinstitute / gatk
Official code repository for GATK versions 4 and up
Home Page: https://software.broadinstitute.org/gatk
License: Other
The requirement is to port the GenotypeGVCFs tool and tests. Test data should be made public whenever possible.
Similarly to ReadFilters/ReadTransformers, it would be great to have a VariantFilter/VariantTransformer system.
ReadFilter system needs to be ported from GATK. It should be available to tools by "request". The specifics are to be figured out as part of addressing this issue.
If we implement them, CalculateGenotypePosteriors may be a good user for VariantTransformers.
We don't just port it as it is in GATK now.
DepthOfCoverage
DiagnoseTargets
From @vdauwera
It would probably make sense to write one really good tool for coverage analysis to replace these two. DoC is great at providing per-locus coverage counts, and the main output table is straightforward and easy for downstream scripts to consume. The summary results table is also ok. But the functionality for aggregating results over intervals and refseq gene lists is super confusing; intervals and genelists interact in a counterintuitive way, and the refseq format requirements are a little vague. In contrast, DT is great at providing per-interval results that give you a go/no-go for callability (+ a culprit metric e.g. MAPQ0, DP etc. in case of a no-go call), but the output is terrible (a pseudo-VCF, which users don't like) and it does not give any per-site counts. There are related tools that users find somewhat useful like CallableLoci, CompareCallableLoci, QualifyMissingIntervals, FindCoveredIntervals, CoveredByNSamplesSites etc. which have overlapping functionality with the coverage tools, and whose functionalities could perhaps be folded into a pan-coverage tool.
We need a simple abstract map/reduce tool that would just loop over the data and call map and then reduce for each record in sequence. It'll make it easier to migrate walkers that way.
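A minimal sketch of what such an abstract map/reduce tool could look like. All class and method names here (`MapReduceTool`, `reduceInit`, `traverse`, `TotalBasesTool`) are hypothetical and chosen for illustration; they are not taken from the GATK or hellbender code base, and reads are represented as plain strings to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: these names are illustrative, not the GATK/hellbender API.
abstract class MapReduceTool<T, M, R> {
    /** Transforms one input record into an intermediate value. */
    protected abstract M map(T record);

    /** Initial value of the accumulator, before any record is seen. */
    protected abstract R reduceInit();

    /** Folds one mapped value into the running accumulator. */
    protected abstract R reduce(M mapped, R accumulator);

    /** Loops over the data once, calling map and then reduce for each record. */
    public final R traverse(Iterable<T> records) {
        R accumulator = reduceInit();
        for (T record : records) {
            accumulator = reduce(map(record), accumulator);
        }
        return accumulator;
    }
}

// Example "walker": sums the lengths of read sequences given as plain strings.
class TotalBasesTool extends MapReduceTool<String, Integer, Long> {
    @Override protected Integer map(String bases) { return bases.length(); }
    @Override protected Long reduceInit() { return 0L; }
    @Override protected Long reduce(Integer mapped, Long accumulator) {
        return accumulator + mapped;
    }
}

class MapReduceDemo {
    public static void main(String[] args) {
        List<String> reads = Arrays.asList("ACGT", "GG", "TTTAA");
        System.out.println(new TotalBasesTool().traverse(reads));
    }
}
```

Under this shape, migrating a walker would amount to moving its existing map and reduce bodies into a subclass, with the traversal loop owned by the base class.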
Not all walkers will move to hellbender, only those that are most useful and we want to keep supporting. This issue will collect them - they will come from best practices, help forum, talking to dev team members, common sense.
good user of ReadTransformers
this may be merged with GenotypeGVCFs
We need to understand the data access patterns in the existing engines: Picard/GATK/Foghorn
@lbergelson and @kshakir already started doing it. Can you move the list here?
no filters, no -L, just count reads and make a regression test for it (with a small bam)
Can we 'register' a hellbender tool as a picard tool if we have it in a different package?
the requirement is to port the VariantFiltration tool and the tests. Tests use Broad-only data but the data seems sharable (please review when porting) and should be made public.
Additionally, the Picard tool FilterVCF should be removed in favor of this tool.
ReadTransformer system needs to be available to hellbender tools. The mechanism of how it gets enabled needs to be coordinated with ReadFilters (issue #5)
BaseRecalibrator is a read walker, should be easy to port to the CLP paradigm
A BAM file is to be traversed in a sliding window fashion. The 2 parameters are window size and skip length. All reads in the window are then made available to the user. HaplotypeCaller will be the main user of this functionality.
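The traversal described above can be sketched as follows. This is an illustration under stated assumptions, not the planned implementation: "skip length" is assumed to mean the distance between the starts of successive windows, reads are reduced to a hypothetical `Read` class with 1-based inclusive coordinates on a single contig, and the overlap test is a naive scan over all reads rather than an indexed BAM query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SlidingWindowDemo {
    /** Hypothetical stand-in for an aligned read: 1-based inclusive start and end. */
    static final class Read {
        final String name;
        final int start;
        final int end;
        Read(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Returns, for each window, the names of the reads overlapping it.
     * Assumption: skipLength is the distance between successive window starts.
     */
    static List<List<String>> windows(List<Read> reads, int contigLength,
                                      int windowSize, int skipLength) {
        if (windowSize <= 0 || skipLength <= 0) {
            throw new IllegalArgumentException("windowSize and skipLength must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        for (int winStart = 1; winStart <= contigLength; winStart += skipLength) {
            // The last windows are clipped to the end of the contig.
            int winEnd = Math.min(winStart + windowSize - 1, contigLength);
            List<String> inWindow = new ArrayList<>();
            for (Read r : reads) {
                // Two closed intervals overlap when each starts before the other ends.
                if (r.start <= winEnd && r.end >= winStart) {
                    inWindow.add(r.name);
                }
            }
            result.add(inWindow);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Read> reads = Arrays.asList(
                new Read("r1", 1, 5), new Read("r2", 4, 12), new Read("r3", 15, 18));
        for (List<String> window : windows(reads, 20, 10, 5)) {
            System.out.println(window);
        }
    }
}
```

When the skip length is smaller than the window size, consecutive windows overlap and a read can be handed to the user more than once; a real engine would also need to handle multiple contigs and avoid rescanning reads.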
Implement -L system, enable access to it for tools that request it (define how they 'request it' - maybe by implementing an interface or calling a function or overriding some generic hook - part of this issue is to design it).
This should be as simple as PrintReads + VariantFilters, see issue #7
We need to look into Java 8 java.util.stream for accessing Read and Variant data
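As a rough illustration of what stream-based access might look like, the sketch below counts reads that pass a composed filter using only `java.util.stream` and `java.util.function.Predicate` from the Java 8 standard library. The `Read` class, its fields, and the `countPassing` helper are hypothetical placeholders, not GATK or htsjdk types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class StreamReadsDemo {
    /** Hypothetical minimal read record, used only for this example. */
    static final class Read {
        final String name;
        final int mappingQuality;
        final boolean duplicate;
        Read(String name, int mappingQuality, boolean duplicate) {
            this.name = name;
            this.mappingQuality = mappingQuality;
            this.duplicate = duplicate;
        }
    }

    /** Counts the reads accepted by the given filter. */
    static long countPassing(List<Read> reads, Predicate<Read> filter) {
        return reads.stream().filter(filter).count();
    }

    public static void main(String[] args) {
        List<Read> reads = Arrays.asList(
                new Read("r1", 60, false),
                new Read("r2", 10, false),
                new Read("r3", 40, true));

        // Filters are plain predicates, so they compose with and/or/negate.
        Predicate<Read> wellMapped = r -> r.mappingQuality >= 20;
        Predicate<Read> notDuplicate = r -> !r.duplicate;

        System.out.println(countPassing(reads, wellMapped.and(notDuplicate)));
    }
}
```

One attraction of this style is that a read filter becomes an ordinary `Predicate` and a read transformer an ordinary `Function`, so both could be chained with `filter` and `map` without a dedicated plug-in mechanism; whether that fits the engine's data access patterns is the open question of this issue.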