Coder Social home page Coder Social logo

xinhe-lab / uk_biobank_gwas Goto Github PK

View Code? Open in Web Editor NEW

This project forked from nealelab/uk_biobank_gwas

0.0 5.0 0.0 109 KB

Overview of the data QC, code, and GWAS summary output from the 2017 UK Biobank data release

Python 91.22% Shell 0.35% R 8.42%

uk_biobank_gwas's Introduction

Table of Contents

Updates

With the re-release of UK Biobank genotype imputation (which we term imputed-v3), we have generated an updated set of GWAS summary statistics for the genetics community.

  • Increased the number of phenotypes with application UKB31063 and addtl. custom curated phenotypes (see imputed-v3 Phenotypes)
  • More liberal inclusion of samples (see imputed-v3 Sample QC)
  • Inclusion of more SNPs (see imputed-v3 Variant QC)
  • Updates to our association model (imputed-v3 Association model) Our largest change is that for all phenotypes, we have run a female-only and male-only GWAS along with the full set.

Information and scripts from the previous round of GWAS are available in the imputed-v2-gwas subdirectory

imputed-v3 Phenotypes

  • 2455 auto-curated phenotypes using PHESANT:

  • 806 ICD10 codes

  • 694 curated phenotypes in collaboration with the FinnGen consortium

  • Summary files:

    • phenotypes.both_sexes.tsv.gz
    • phenotypes.female.tsv.gz
    • phenotypes.male.tsv.gz
    • phenotype - phenotype ID
    • description - short description of phenotype
    • source - PHESANT auto-curation, ICD10, or FinnGen
    • n_controls - number of QC positive samples responding negatively to phenotype designation (NA if quantitative)
    • n_cases - number of QC positive samples responding affirmatively to phenotype designation (NA if quantitative)
    • n_missing - number of missing QC positive samples
    • n_non_missing - number of non-missing QC positive samples

imputed-v3 Sample QC

  • imputed-v3 parameters
    • Used.in.pca.calculation filter (unrelated samples)
    • sex chromosome aneuploidy filter
    • Use provided PCs for European sample selection to determine British ancestry
      • Use 7 standard deviations away from the 1st 6 PCs
      • Further Filter to self-reported 'white-British' / 'Irish' / 'White'
    • QCed sample count: 361194 samples
  • imputed-v2 parameters
    • Used.in.pca.calculation filter (unrelated samples)
    • sex chromosome aneuploidy filter
    • White.british.ancestry filter
    • QCed sample count: 337199 samples

imputed-v3 Variant QC

  • imputed-v3 parameters
    • Autosomes and X chromosome (but not pseudo-autosomal region or XY)
    • SNPs from HRC, UK10K, and 1KG imputation (~90 million)
    • INFO score > 0.8
    • MAF > 0.0001
      • Exception: VEP annotated Missense and PTV MAF > 1e-6
    • HWE p-value > 1e-10
    • QCed SNP count: 13.7 million
  • imputed-v2 parameters
    • Autosomes only
    • SNPs from HRC imputation (~40 million)
    • INFO score > 0.8
    • MAF > 0.0001
    • QCed SNP count: 10.9 million

imputed-v3 Association model

  • imputed-v3 model
    • Linear regression model in Hail (linreg)
    • Three GWAS per phenotype
      • Both sexes
      • Female only
      • Male only
    • Covariates: 1st 20 PCs + sex + age + age^2 + sexage + sexage2
    • Sex-specific covariates: 1st 20 PCs + age + age^2
    • Extra column for variant confidence in case/control phenotypes
      • column name: expected_case_minor_AC
      • Used to filter out false-positive SNPs when case count is low
      • Blog details here
  • imputed-v2 model
    • Linear regression model in Hail (linreg)
    • Covariates: 1st 10 PCs + sex

uk_biobank_gwas's People

Contributors

astheeggeggs avatar howrigan avatar liameabbott avatar rkwalters avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.