This project forked from imb-computational-genomics-lab/scgps

A framework for clustering (CORE) and for estimating the relationship between pairs of clusters (scGPS) from single-cell data

scGPS - Single Cell Global fate Potential of Subpopulations

The scGPS package website is available at: https://imb-computational-genomics-lab.github.io/scGPS/index.html

Usage instructions are available at: https://imb-computational-genomics-lab.github.io/scGPS/articles/vignette.html

scGPS general description

scGPS is a complete single-cell RNA analysis framework that covers both decomposing a mixed population into clusters (SCORE) and analysing the relationships between those clusters (scGPS). scGPS also performs unsupervised selection of predictive genes that define a subpopulation and/or drive the transition between subpopulations.

The package implements two new algorithms: SCORE and scGPS.

Key features of the SCORE clustering algorithm

  • Unsupervised (no prior number of clusters required), stable (stability and resolution parameters are selected automatically by scanning a range of search windows in each run, combined with a bootstrap aggregation approach to determine stable clusters), and fast (implemented with Rcpp)
  • SCORE first builds a reference cluster (the highest resolution) and then runs iterative clustering through 40 (or more) windows in the dendrogram
  • Resolution is quantified as the divergence from the reference, measured with the adjusted Rand index
  • Stability is proportional to the number of consecutive runs in which the Rand index does not change while the cluster search space is varied
  • The optimal resolution is the one that combines stability with high resolution
  • The bagging (bootstrap aggregation) algorithm can detect a rare subpopulation that appears multiple times across different decision tree runs
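
The resolution and stability measures described above can be illustrated with a minimal sketch. It is written in plain Python for illustration and is not the package's Rcpp implementation. The function names `adjusted_rand_index` and `longest_stable_run`, the tolerance `tol`, and the example label vectors are all assumptions introduced here. The sketch computes the adjusted Rand index of each window's clustering against a reference clustering, then counts the longest run of consecutive windows whose index does not change.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two clusterings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        # Fewer than two cells: the clusterings trivially agree.
        return 1.0
    # Pair counts from the contingency table and from its two margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both clusterings use a single cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def longest_stable_run(ari_values, tol=1e-9):
    """Longest run of consecutive windows whose index stays unchanged."""
    if not ari_values:
        return 0
    best = current = 1
    for prev, cur in zip(ari_values, ari_values[1:]):
        current = current + 1 if abs(cur - prev) <= tol else 1
        best = max(best, current)
    return best


# Hypothetical example: a reference clustering and three coarser windows.
reference = [0, 0, 1, 1, 2, 2]
windows = [
    [0, 0, 1, 1, 2, 2],  # identical to the reference
    [0, 0, 1, 1, 1, 1],  # two reference clusters merged
    [0, 0, 1, 1, 1, 1],  # same result again, so the run continues
]
ari_per_window = [adjusted_rand_index(reference, w) for w in windows]
stability = longest_stable_run(ari_per_window) / len(windows)
```

In this toy setup a window is more attractive when it sits inside a long unchanged run (stable) while keeping an index close to the reference (high resolution). How the package weighs the two is not specified here.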

Key features of the scGPS algorithm

  • Estimates transition scores between any two subpopulations
  • The scGPS prediction model is based on the Elastic Net procedure, which makes it possible to select predictive genes and train interpretable models to predict each subpopulation
  • Genes identified by scGPS perform better than known gene markers in predicting cell subpopulations
  • Transition scores are the percentages of target cells classified as belonging to the same class as the original subpopulation
  • For cell subtype comparison, transition scores measure the similarity between two subpopulations
  • The scores are average values from 100 bootstrap runs
  • For comparison, a non-shrinkage procedure with linear discriminant analysis (LDA) is used
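
The definition of the transition score given above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the package's code. The names `transition_score` and `bootstrap_transition_score`, the `predict` callable, and the choice to resample only the target cells with a fixed classifier are all hypothetical simplifications. How scGPS draws its bootstrap samples and retrains its models is not described in this section.

```python
import random
from statistics import mean


def transition_score(predicted_labels, source_label):
    """Percentage of target cells predicted to belong to the source class."""
    if not predicted_labels:
        raise ValueError("no target cells to score")
    hits = sum(1 for label in predicted_labels if label == source_label)
    return 100.0 * hits / len(predicted_labels)


def bootstrap_transition_score(predict, target_cells, source_label,
                               n_runs=100, seed=0):
    """Average the transition score over bootstrap resamples of target cells.

    `predict` is a hypothetical callable that maps a list of cells to a list
    of predicted class labels (for example, a trained classifier's predict).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_runs):
        # Resample the target cells with replacement.
        sample = rng.choices(target_cells, k=len(target_cells))
        scores.append(transition_score(predict(sample), source_label))
    return mean(scores)


# Hypothetical usage: each "cell" is one marker value, and the toy classifier
# assigns a cell to the source class when that value exceeds a threshold.
def toy_predict(cells):
    return ["source" if value > 0.5 else "other" for value in cells]


target = [0.9, 0.7, 0.2, 0.6]
single_run = transition_score(toy_predict(target), "source")
averaged = bootstrap_transition_score(toy_predict, target, "source")
```

A score near 100 means most target cells look like the source subpopulation to the model, and a score near 0 means almost none do.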

scGPS workflow

scGPS takes scRNA expression dataset(s) from one or more unknown samples and finds subpopulations and the relationships between them. The input dataset(s) contain mixed, heterogeneous cells. scGPS first uses SCORE (or CORE V2.0) to identify homogeneous subpopulations. It provides a number of functions to verify the subpopulations identified by SCORE (e.g. functions to compare with results from PCA, tSNE and the imputation method CIDR). It also has options to find gene markers that distinguish a subpopulation from the remaining cells and to perform pathway enrichment analysis to annotate each subpopulation. In the second stage, scGPS applies a machine learning procedure to select optimal gene predictors and to build prediction models that estimate between-subpopulation transition scores, i.e. the probability that cells from one subpopulation transition to the other subpopulation.


Figure 1. scGPS workflow. Yellow boxes show inputs, and green boxes show main scGPS analysis.

Contributors

michaelrthompson98, quanaibn
