ens-lgil / pgsc_calc

This project forked from pgscatalog/pgsc_calc

The Polygenic Score Catalog Calculator is a Nextflow pipeline for polygenic score calculation.

Home Page: https://pgsc-calc.readthedocs.io/en/latest/

License: Apache License 2.0

Shell 1.34% Python 6.87% Groovy 35.80% Makefile 0.84% HTML 2.04% Dockerfile 0.40% Nextflow 52.72%

The Polygenic Score Catalog Calculator (pgsc_calc)

Introduction

pgsc_calc is a bioinformatics best-practice analysis pipeline for calculating polygenic [risk] scores on samples with imputed genotypes using existing scoring files from the Polygenic Score (PGS) Catalog and/or user-defined PGS/PRS.

Pipeline summary

  1. Downloads scoring files from the PGS Catalog API in a specified genome build (GRCh37 or GRCh38).
  2. Reads custom scoring files (and performs a liftover if the genotyping data are in a different build).
  3. Automatically combines and creates scoring files for efficient parallel computation of multiple PGS.
    • Matches variants in the scoring files against variants in the target dataset (in plink bfile/pfile or VCF format).
  4. Calculates PGS for all samples (linear sum of weights and dosages).
  5. Creates a summary report to visualize score distributions and pipeline metadata (variant matching QC).
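
The matching and scoring steps in the summary above can be illustrated with a minimal Python sketch. This is not the pipeline's own code: the function names, the variant ID format, and the toy data are all hypothetical, and the real workflow operates on plink or VCF files rather than in-memory dictionaries. It only shows what "linear sum of weights and dosages" means for a single score.

```python
# Illustrative sketch only; every name and value below is hypothetical
# and is not taken from the pgsc_calc codebase.

def match_variants(scoring, target_variants):
    """Keep scoring-file entries whose variant ID is present in the target data."""
    return {vid: weight for vid, weight in scoring.items() if vid in target_variants}


def polygenic_score(weights, dosages):
    """Linear sum of effect weight times effect-allele dosage for one sample."""
    return sum(weight * dosages[vid] for vid, weight in weights.items())


# Hypothetical scoring file: variant ID -> effect weight.
scoring = {"1:1000:A:G": 0.5, "2:2000:C:T": -0.25, "3:3000:G:A": 1.0}

# Hypothetical target genotypes: sample -> {variant ID -> dosage between 0 and 2}.
samples = {
    "sample1": {"1:1000:A:G": 2.0, "2:2000:C:T": 1.0},
    "sample2": {"1:1000:A:G": 0.0, "2:2000:C:T": 2.0},
}

# The third scoring variant is absent from the target dataset.
target_variants = {"1:1000:A:G", "2:2000:C:T"}

matched = match_variants(scoring, target_variants)
scores = {sid: polygenic_score(matched, dosages) for sid, dosages in samples.items()}

# Fraction of scoring-file variants found in the target data (a simple QC figure).
match_rate = len(matched) / len(scoring)
```

In this toy case two of the three scoring variants are matched, so each sample's score is a sum over those two variants only. A match rate of this kind is the sort of figure a variant matching QC report would be expected to surface.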

Features in development

  • Genetic Ancestry: calculate the similarity of target samples to populations in a reference dataset (e.g. 1000 Genomes (1000G), Human Genome Diversity Project (HGDP)) using principal components analysis (PCA).
  • PGS Normalization: use reference population data and/or PCA projections to report individual-level PGS predictions (e.g. percentiles, z-scores) that account for genetic ancestry.
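
As a rough illustration of the normalization idea, the sketch below expresses one sample's raw score as a z-score and a percentile relative to a reference distribution. It is an assumption-laden simplification, not the planned implementation: the function names and reference values are hypothetical, and it uses a single reference population without any PCA-based ancestry adjustment.

```python
from bisect import bisect_right
from statistics import mean, stdev

# Illustrative sketch only; names and data are hypothetical.

def z_score(score, reference_scores):
    """Standardize a score against the reference mean and sample standard deviation."""
    return (score - mean(reference_scores)) / stdev(reference_scores)


def percentile(score, reference_scores):
    """Percentage of reference scores that are less than or equal to the given score."""
    ordered = sorted(reference_scores)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Hypothetical raw PGS values for a reference population.
reference = [0.0, 1.0, 2.0, 3.0, 4.0]

z = z_score(3.0, reference)
pct = percentile(3.0, reference)
```

An ancestry-aware version would presumably choose or weight the reference distribution according to each sample's position in PCA space, which this sketch does not attempt.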

Quick start

  1. Install Nextflow (>=21.04.0)

  2. Install Docker or Singularity (v3.8.3 minimum) (please only use Conda as a last resort)

  3. Download the pipeline and test it on a minimal dataset with a single command:

    nextflow run pgscatalog/pgsc_calc -profile test,<docker/singularity/conda>

  4. Start running your own analysis!

    nextflow run pgscatalog/pgsc_calc -profile <docker/singularity/conda> --input samplesheet.csv --pgs_id PGS001229

See getting started for more details.

Documentation

Full documentation is available on Read the Docs (https://pgsc-calc.readthedocs.io/en/latest/).

Credits

pgscatalog/pgsc_calc is developed as part of the PGS Catalog project, a collaboration between the University of Cambridge’s Department of Public Health and Primary Care (Michael Inouye, Samuel Lambert) and the European Bioinformatics Institute (Helen Parkinson, Laura Harris).

The pipeline seeks to provide a standardized workflow for PGS calculation and ancestry inference, implemented in Nextflow and derived from an existing set of tools/scripts developed by the Inouye lab (Rodrigo Canovas, Scott Ritchie, Jingqin Wu) and PGS Catalog teams (Samuel Lambert, Laurent Gil).

The adaptation of the codebase, the Nextflow implementation, and the PGS Catalog features were written by Benjamin Wingfield, Samuel Lambert, and Laurent Gil, with additional input from Aoife McMahon (EBI). Development of new features, testing, and code review are ongoing and include Inouye lab members (Rodrigo Canovas, Scott Ritchie) and others. A manuscript describing the tool is in preparation. In the meantime, if you use the tool, we ask you to cite the repo and the paper describing the PGS Catalog resource.

This pipeline is distributed under an Apache License and uses code and infrastructure developed and maintained by the nf-core community (Ewels et al. Nature Biotech (2020) doi:10.1038/s41587-020-0439-x), reused here under the MIT license.

Additional references of open-source tools and data used in this pipeline are described in CITATIONS.md.

This work has received funding from EMBL-EBI core funds, the Baker Institute, the University of Cambridge, Health Data Research UK (HDRUK), and the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016775 INTERVENE.
