pristineliving / uk_biobank_gwas

This project forked from nealelab/uk_biobank_gwas

Overview of the data QC, code, and GWAS summary output from the 2017 UK Biobank data release

Python 89.12% R 10.66% Shell 0.22%

uk_biobank_gwas's Introduction

Updates

With the re-release of UK Biobank genotype imputation (which we term imputed-v3), we have generated an updated set of GWAS summary statistics for the genetics community.

  • Increased the number of phenotypes with application UKB31063 and additional custom-curated phenotypes (see imputed-v3 Phenotypes)
  • More liberal inclusion of samples (see imputed-v3 Sample QC)
  • Inclusion of more SNPs (see imputed-v3 Variant QC)
  • Updates to our association model (see imputed-v3 Association model). Our largest change is that, for all phenotypes, we have run a female-only and a male-only GWAS along with the full set.

Information and scripts from the previous round of GWAS are available in the imputed-v2-gwas subdirectory

Finally, the 0.1 and 0.2 script repositories are named for the version of Hail used to run the GWAS.

Change log

Updates to the Rapid GWAS summary statistics or download Manifest will be recorded here:

  • Oct 17, 2019

    • The 89 summary statistic files affected by the mis-applied low-confidence filter have been updated and uploaded to the public release (File Manifest Release 20180731)
  • Oct 9, 2019

    • Identified summary statistics where the low-confidence filter was mis-applied
    • Issue details here
    • List of files affected (111 files): GWAS_list_low_confidence_filter_update.txt.gz (https://github.com/Nealelab/UK_Biobank_GWAS/blob/master/GWAS_list_low_confidence_filter_update.txt.gz)
    • Of these 111 files, 89 require updating; the other 22 are unchanged by the updated filter
    • File column description:
      • phenotype = phenotype number
      • description = UK Biobank description of phenotype
      • min_category = smallest category defined by PHESANT
      • max_category = largest category defined by PHESANT
      • category_distribution = sample counts across categories split by '|'
      • additive_tsvs_list_name = GWAS summary statistic filename
      • n_missing = number of samples without phenotype information
      • tsv_requires_update = TRUE/FALSE, whether the file requires the low-confidence filter update (phenotypes where min_category < 12500 require updating)
    • R script to update summary statistic files: Rapid_GWAS_low_confidence_filter_update.R (https://github.com/Nealelab/UK_Biobank_GWAS/blob/master/Rapid_GWAS_low_confidence_filter_update.R)
      • Requires data.table 1.12.2 R package
      • Requires GWAS_list_low_confidence_filter_update.txt.gz file (or subsetted version with only files you want to update)
  • Sept 16, 2019

    • GWAS summary statistics of Biomarkers now available
    • 34 biomarker measurements tested
    • Blog details here
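For the Oct 9, 2019 entry above, the affected-file list can be used to pick out which summary statistic files need the filter update. The following is a minimal Python sketch, not the repository's R script. It assumes the list is tab-separated with a header row using the column names described above; the delimiter, the example rows, and the helper names are all hypothetical.

```python
import csv
import gzip
import io


def load_manifest(path):
    """Read the gzipped list as text (assumes a plain gzip text file)."""
    with gzip.open(path, "rt", newline="") as handle:
        return handle.read()


def files_requiring_update(manifest_text):
    """Return summary-statistic filenames whose tsv_requires_update is TRUE.

    Assumes tab-separated columns named as in the column description above.
    """
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    return [
        row["additive_tsvs_list_name"]
        for row in reader
        if row["tsv_requires_update"].strip().upper() == "TRUE"
    ]


# Made-up rows for illustration only; real phenotype IDs and filenames differ.
example = (
    "phenotype\tmin_category\tadditive_tsvs_list_name\ttsv_requires_update\n"
    "pheno_A\t900\tpheno_A.tsv.bgz\tTRUE\n"
    "pheno_B\t20000\tpheno_B.tsv.bgz\tFALSE\n"
)
to_update = files_requiring_update(example)
print(to_update)
```

With the real file, `files_requiring_update(load_manifest("GWAS_list_low_confidence_filter_update.txt.gz"))` would be the equivalent call, provided the format assumptions hold.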

imputed-v3 Phenotypes

  • Auto-curated phenotypes using PHESANT:

  • ICD10 codes (all non-coded individuals treated as controls)

  • Curated phenotypes in collaboration with the FinnGen consortium

  • Phenotypes in both sexes

    • PHESANT: 2891 total (274 continuous / 271 ordinal / 2346 binary)
    • ICD10: 633 binary
    • FinnGen curated: 559
  • Phenotypes in females

    • PHESANT: 2393 total (259 continuous / 257 ordinal / 1877 binary)
    • ICD10: 482 binary
    • FinnGen curated: 412
  • Phenotypes in males

    • PHESANT: 2305 total (262 continuous / 259 ordinal / 1784 binary)
    • ICD10: 439 binary
    • FinnGen curated: 400
  • Unique PHESANT phenotypes: 3011, of which 274 are continuous

  • 4203 total unique phenotypes: 3011 PHESANT + 559 FinnGen + 633 ICD10

  • Summary files:

    • phenotypes.both_sexes.tsv.gz
    • phenotypes.female.tsv.gz
    • phenotypes.male.tsv.gz
    • Column descriptions:
      • phenotype - phenotype ID
      • description - short description of phenotype
      • source - PHESANT auto-curation, ICD10, or FinnGen
      • n_controls - number of QC positive samples responding negatively to phenotype designation (NA if quantitative)
      • n_cases - number of QC positive samples responding affirmatively to phenotype designation (NA if quantitative)
      • n_missing - number of missing QC positive samples
      • n_non_missing - number of non-missing QC positive samples
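The phenotype summary files can be tallied by source with a few lines of Python. This is a sketch only: it assumes the files are tab-separated with a header row using the column names listed above, and the `source` values and rows in the example are made up.

```python
import csv
import io
from collections import Counter


def count_by_source(tsv_text):
    """Count phenotypes per value of the source column."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row["source"] for row in reader)


def case_fraction(row):
    """Return n_cases / (n_cases + n_controls), or None when either is NA.

    The counts are NA for quantitative phenotypes, as described above.
    """
    if row["n_cases"] == "NA" or row["n_controls"] == "NA":
        return None
    n_cases = int(row["n_cases"])
    n_controls = int(row["n_controls"])
    return n_cases / (n_cases + n_controls)


# Made-up rows for illustration; real files hold thousands of phenotypes.
example = (
    "phenotype\tdescription\tsource\tn_controls\tn_cases\n"
    "pheno_A\tExample binary trait\ticd10\t75\t25\n"
    "pheno_B\tExample quantitative trait\tphesant\tNA\tNA\n"
    "pheno_C\tAnother binary trait\tphesant\t90\t10\n"
)
counts = count_by_source(example)
print(dict(counts))
```

To read a real file such as phenotypes.both_sexes.tsv.gz, the text could be obtained with `gzip.open(path, "rt").read()` before passing it to `count_by_source`.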

imputed-v3 Sample QC

  • imputed-v3 parameters
    • Used.in.pca.calculation filter (unrelated samples)
    • sex chromosome aneuploidy filter
    • Use provided PCs for European sample selection to determine British ancestry
      • Keep samples within 7 standard deviations on the 1st 6 PCs
      • Further filter to self-reported 'white-British' / 'Irish' / 'White'
    • QCed sample count: 361194 samples
  • imputed-v2 parameters
    • Used.in.pca.calculation filter (unrelated samples)
    • sex chromosome aneuploidy filter
    • White.british.ancestry filter
    • QCed sample count: 337199 samples
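The PC-based step of the imputed-v3 sample QC can be illustrated with a short Python sketch. It rests on assumptions the text above does not confirm: that the bound is applied per PC, that it is centred on the mean of the supplied samples, and that the population standard deviation is used. The function names and the label strings (copied from the list above) are for illustration only.

```python
from statistics import mean, pstdev

# Label strings copied from the list above; the actual UK Biobank coding may differ.
ALLOWED_SELF_REPORT = {"white-British", "Irish", "White"}


def within_pc_bounds(pcs, n_pcs=6, n_sd=7.0):
    """Flag samples lying within n_sd standard deviations on each of the first n_pcs PCs.

    pcs is a list of per-sample PC vectors. Centre and spread are computed
    from the supplied samples themselves (an assumption of this sketch).
    """
    columns = list(zip(*(sample[:n_pcs] for sample in pcs)))
    centers = [mean(col) for col in columns]
    spreads = [pstdev(col) for col in columns]
    return [
        all(
            abs(value - center) <= n_sd * spread
            for value, center, spread in zip(sample[:n_pcs], centers, spreads)
        )
        for sample in pcs
    ]


def keep_samples(pcs, self_reports):
    """Combine the PC bound with the self-reported ancestry filter."""
    flags = within_pc_bounds(pcs)
    return [
        ok and label in ALLOWED_SELF_REPORT
        for ok, label in zip(flags, self_reports)
    ]


# Synthetic data: 99 tightly clustered samples plus one extreme outlier on PC1.
samples = [[float((-1) ** i)] * 6 for i in range(99)] + [[1000.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
flags = within_pc_bounds(samples)
print(sum(flags), "of", len(samples), "samples pass the PC bound")
```

The related-sample and sex chromosome aneuploidy filters are not shown; they would be applied as separate boolean masks in the same way.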

imputed-v3 Variant QC

  • imputed-v3 parameters
    • Autosomes and X chromosome (including pseudo-autosomal region or XY)
    • SNPs from HRC, UK10K, and 1KG imputation (~90 million)
    • INFO score > 0.8
    • MAF > 0.001
      • Exception: VEP annotated coding (PTV/Missense/Synonymous) MAF > 1e-6
    • HWE p-value > 1e-10
    • QCed SNP count: 13.7 million
  • imputed-v2 parameters
    • Autosomes only
    • SNPs from HRC imputation (~40 million)
    • INFO score > 0.8
    • MAF > 0.001
    • HWE p-value > 1e-10
    • QCed SNP count: 10.9 million
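The imputed-v3 variant thresholds above can be written as a single predicate. The Python sketch below encodes only the listed cutoffs; the function and parameter names are hypothetical, and chromosome selection and VEP annotation are assumed to happen elsewhere.

```python
def passes_variant_qc(info, maf, hwe_p, is_coding=False):
    """Apply the imputed-v3 thresholds listed above to one variant.

    is_coding marks VEP-annotated coding variants (PTV/missense/synonymous),
    which use the relaxed MAF > 1e-6 cutoff instead of MAF > 0.001.
    """
    maf_threshold = 1e-6 if is_coding else 0.001
    return info > 0.8 and maf > maf_threshold and hwe_p > 1e-10


# A rare variant fails the default MAF cutoff but passes if annotated as coding.
print(passes_variant_qc(info=0.9, maf=0.0005, hwe_p=0.5))
print(passes_variant_qc(info=0.9, maf=0.0005, hwe_p=0.5, is_coding=True))
```

Dropping the `is_coding` exception gives the imputed-v2 thresholds, since v2 used the same INFO, MAF, and HWE cutoffs without it.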

imputed-v3 Association model

  • imputed-v3 model
    • Linear regression model in Hail (linreg)
    • Three GWAS per phenotype
      • Both sexes
      • Female only
      • Male only
    • Covariates: 1st 20 PCs + sex + age + age^2 + sex*age + sex*age^2
    • Sex-specific covariates: 1st 20 PCs + age + age^2
    • Extra column for variant confidence in case/control phenotypes
      • column name: expected_case_minor_AC
      • Used to filter out false-positive SNPs when case count is low
      • Blog details here
  • imputed-v2 model
    • Linear regression model in Hail (linreg)
    • Covariates: 1st 10 PCs + sex
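The imputed-v3 covariate sets can be made concrete with a small helper that builds one sample's covariate vector. This Python sketch only illustrates the terms listed above and is not the Hail code used for the GWAS. The 0/1 coding of sex and the function name are assumptions.

```python
def covariate_row(pcs, age, sex=None):
    """Build the covariate vector for one sample.

    With sex given (assumed coded 0/1), returns the both-sexes covariates:
    20 PCs + sex + age + age^2 + sex*age + sex*age^2.
    With sex=None, returns the sex-specific covariates: 20 PCs + age + age^2.
    """
    if len(pcs) != 20:
        raise ValueError("expected the first 20 PCs")
    age_sq = age ** 2
    if sex is None:
        return list(pcs) + [age, age_sq]
    return list(pcs) + [sex, age, age_sq, sex * age, sex * age_sq]


both = covariate_row([0.0] * 20, age=50, sex=1)
sex_specific = covariate_row([0.0] * 20, age=50)
print(len(both), len(sex_specific))
```

The both-sexes vector has 25 entries and the sex-specific one has 22, because sex and its interactions with age are constant within a single-sex analysis and are therefore dropped.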

uk_biobank_gwas's People

Contributors

howrigan, liameabbott, rkwalters, astheeggeggs, lfrancioli, hammer
