

gnomad_chets

Phase of compound heterozygotes in gnomAD

This repository contains the pipeline, written in Hail 0.2, that we used to infer the phase of rare variants in the gnomAD v2 exomes, as described in our manuscript (see https://www.nature.com/articles/s41588-023-01608-3).

The main components of the pipeline are in “phasing.py”. Briefly, to infer variant phase, we first estimate haplotype frequencies from genotype counts using the expectation-maximization (EM) algorithm (see the “get_em_expressions” function). We then plug these haplotype frequency estimates into a simple equation to calculate the probability that two variants are in trans, i.e., compound heterozygous (“p_chet”; see the “get_em_expr” function).
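The two steps above can be illustrated with a minimal plain-Python sketch. It is not the repository's Hail code: the function names, the genotype ordering (AABB, AABb, AAbb, AaBB, AaBb, Aabb, aaBB, aaBb, aabb, where A/a and B/b are the reference/alternate alleles of the two variants) and the convergence settings are assumptions made for illustration. Only double heterozygotes (AaBb) have ambiguous phase, so EM alternates between estimating the fraction of them carrying the AB/ab (cis) haplotype pair and re-estimating the four haplotype frequencies. For the p_chet step, the sketch assumes the standard two-locus form p_chet = p_Ab * p_aB / (p_AB * p_ab + p_Ab * p_aB), i.e. the relative probability of the Ab/aB (trans) pair among double heterozygotes under Hardy-Weinberg assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Freqs = Tuple[float, float, float, float]  # haplotype order: AB, Ab, aB, ab


def _complete_freqs(fixed: List[int], n_double_het: int, f_cis: float, n_hap: int) -> Freqs:
    """Haplotype frequencies after assigning a fraction f_cis of double hets to AB/ab."""
    return (
        (fixed[0] + n_double_het * f_cis) / n_hap,        # AB
        (fixed[1] + n_double_het * (1 - f_cis)) / n_hap,  # Ab
        (fixed[2] + n_double_het * (1 - f_cis)) / n_hap,  # aB
        (fixed[3] + n_double_het * f_cis) / n_hap,        # ab
    )


def haplotype_freq_em(
    gt_counts: Sequence[int], max_iter: int = 1000, tol: float = 1e-10
) -> Optional[Freqs]:
    """Estimate AB, Ab, aB, ab haplotype frequencies from 9 genotype counts (hypothetical helper)."""
    if len(gt_counts) != 9:
        raise ValueError("expected 9 genotype counts")
    c = gt_counts
    n_hap = 2 * sum(c)
    if n_hap == 0:
        return None
    # Haplotypes that can be counted directly (every genotype except AaBb).
    fixed = [
        2 * c[0] + c[1] + c[3],  # AB
        c[1] + 2 * c[2] + c[5],  # Ab
        c[3] + 2 * c[6] + c[7],  # aB
        c[5] + c[7] + 2 * c[8],  # ab
    ]
    n_double_het = c[4]  # AaBb: either AB/ab (cis) or Ab/aB (trans)

    # Start by splitting double hets evenly between the cis and trans configurations.
    freqs = _complete_freqs(fixed, n_double_het, 0.5, n_hap)
    for _ in range(max_iter):
        # E-step: expected fraction of double hets carrying AB/ab.
        cis, trans = freqs[0] * freqs[3], freqs[1] * freqs[2]
        if cis + trans == 0:
            break  # both phase configurations have zero probability; stop
        f_cis = cis / (cis + trans)
        # M-step: re-estimate frequencies from the completed haplotype counts.
        new_freqs = _complete_freqs(fixed, n_double_het, f_cis, n_hap)
        delta = max(abs(a - b) for a, b in zip(new_freqs, freqs))
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


def p_chet(freqs: Freqs) -> Optional[float]:
    """Probability that a carrier of both variants (AaBb) has them in trans (hypothetical helper)."""
    p_AB, p_Ab, p_aB, p_ab = freqs
    cis, trans = p_AB * p_ab, p_Ab * p_aB
    if cis + trans == 0:
        return None
    return trans / (cis + trans)


# Only double heterozygotes observed: no evidence favoring cis over trans.
print(p_chet(haplotype_freq_em([0, 0, 0, 0, 10, 0, 0, 0, 0])))  # 0.5
```

If many individuals are homozygous for both alternate alleles (aabb), the ab haplotype is common and p_chet for double heterozygotes drops toward 0; if the Ab and aB haplotypes are common instead, p_chet rises above 0.5.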

The remaining scripts in the repository compute the phase of rare variant pairs specifically in the gnomAD and Center for Mendelian Genomics (CMG) rare disease datasets. They also generate the gnomAD variant co-occurrence look-up tool (see https://gnomad.broadinstitute.org/variant-cooccurrence) and the variant co-occurrence counts by gene resource (see https://gnomad.broadinstitute.org/news/2023-03-variant-co-occurrence-counts-by-gene-in-gnomad/). These scripts require access to individual-level data, so they cannot be run outside the gnomAD team; they are provided for reference only.


Contributors

konradjk, lfrancioli, slstenton, gtiao, jkgoodrich


gnomad_chets's Issues

Perform additional analyses to enhance v1 of variant co-occurrence feature

Posted in Slack yesterday by Michael:

  1. Benchmark our phasing accuracy against phasing from programs such as Eagle. Running a single chromosome may be sufficient and would limit the computational burden.
  2. Should we add the gnomAD v2 WGS data to our browser/dataset? Essentially, consider only the coding regions of the WGS (i.e., treat the WGS as exomes) to boost sample size.
  3. Take a deeper dive into the CMG samples that were incorrectly phased and try to better understand why.
  4. Test our accuracy using multinucleotide variants (MNVs) as a truth set. This would be particularly interesting because MNVs are close enough to each other that recombination should not play a role.
  5. Better understand how we perform for singletons, particularly those that are phased as compound hets.
  6. Compile a list of genes with compound heterozygous pLOFs. Can we generate new insights into which genes can't tolerate bi-allelic LOFs?

With an additional request from Heidi:

  7. Consider including variants deeper in the flanking introns.

We'll need to decide which of these analyses we want done for v4.
