dotscore

dotscore is a Python module for easy computation of DoT-scores (Direction of Transition scores), as presented in Kucinski et al. 2020 (link). The DoT-score is a method that aids interpretation of transcriptional changes (e.g. lists of differentially expressed genes) using scRNA-Seq landscapes as a reference. The module is built on top of the scanpy module (https://scanpy.readthedocs.io/en/stable/).

DoT-score concept

In a scRNA-Seq landscape, gene expression values can be scaled to a chosen point of origin. This creates a set of vectors connecting each cell to the point of origin - cell state vectors in the multidimensional gene expression space. Now consider a treatment (gene knockout, chemical perturbation etc.) applied to a given cell population, followed by a transcriptomic readout. The changes in gene expression (e.g. log2(Fold Change) from differential expression) also form a vector - the positive and negative values indicate shifts along the many dimensions of gene expression space.

The DoT-score measures how well the two vectors (cell state change and treatment change) are aligned and enables visualisation of the result on the scRNA-Seq landscape. A positive DoT-score for a cell on the landscape indicates that the treatment causes cells to shift towards that state (starting from the point of origin), while a negative DoT-score indicates a shift away from that state. We provide a diagram below to illustrate this concept and a concrete example here: examples/dotscore_example.ipynb

images/DoT_diagram.png

More specifically, the DoT-score (s) is the dot product between the treatment vector and the scaled gene expression vector of each cell on the landscape; it is proportional to the cosine of the angle between the two vectors: s = X v, where X is a cells x genes matrix of scaled expression values and v is a vector of weights (e.g. log2(Fold Change) for each gene). To provide a measure of statistical significance, we simulate DoT-scores from randomly chosen weights and genes and use them to compute a z-score.
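The calculation above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the module's implementation: the helper names (dot_scores, simulated_zscores), the toy data and the per-cell z-score (observed score minus the mean of the simulated scores, divided by their standard deviation) are all hypothetical, and real data would normally be held in NumPy arrays or an AnnData object.

```python
import random
import statistics


def dot_scores(X, v):
    """Return s = X v: one dot product per cell (row of X)."""
    return [sum(x * w for x, w in zip(row, v)) for row in X]


def simulated_zscores(X, gene_idx, weights, background, simno=200, seed=0):
    """Compare observed scores with scores from random genes and weights."""
    rng = random.Random(seed)

    def columns(idx):
        # Keep only the chosen gene columns of X.
        return [[row[j] for j in idx] for row in X]

    observed = dot_scores(columns(gene_idx), weights)
    sims = []
    for _ in range(simno):
        idx = rng.sample(range(len(X[0])), len(gene_idx))  # random genes
        w = [rng.choice(background) for _ in gene_idx]      # random weights
        sims.append(dot_scores(columns(idx), w))

    zscores = []
    for cell, obs in enumerate(observed):
        null = [s[cell] for s in sims]
        sd = statistics.stdev(null)
        # A constant null distribution gives an undefined z-score.
        zscores.append((obs - statistics.mean(null)) / sd if sd > 0 else float("nan"))
    return zscores


# Toy landscape: 3 cells x 4 genes, already scaled to the point of origin.
X = [
    [1.0, -0.5, 0.0, 2.0],
    [-1.0, 0.5, 1.0, -2.0],
    [0.5, 1.0, -1.0, 0.0],
]
v = [2.0, -1.0, 0.0, 1.0]  # e.g. log2(Fold Change) per gene

scores = dot_scores(X, v)  # positive: shift towards that cell state
zscores = simulated_zscores(
    X, gene_idx=[0, 1, 3], weights=[2.0, -1.0, 1.0],
    background=[-2.0, -1.0, -0.5, 0.5, 1.0, 2.0],
)
```

In this toy example the first cell is aligned with the treatment vector (positive score), the second points the opposite way (negative score) and the third is orthogonal to it (score of zero).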

Usage

Dotscore uses AnnData objects as inputs. Core functions are listed below:

get_DoTscore - calculates the DoT-score for each cell. Arguments:
  • adata - an AnnData object

  • de - pandas DataFrame, which contains one column with gene names and one column with weights (e.g. log2(Fold Change) coming from differential expression analysis)

  • allfolds - numpy array or pandas Series, which contains a set of weights (e.g. log2FoldChanges) used for simulations. This is the background distribution from which weights are drawn.

  • allgenes - numpy array or pandas Series, which contains the names of all genes expressed in the assayed cell population. For instance, if using log2FoldChange, these are all genes used for differential expression analysis.

  • simno - number of simulations to run for z-score estimation (if zscore is True)

  • id_col - name of the column in de, which contains the gene names (default: "target")

  • weight_col - name of the column in de, which contains the weights (default: "log2FoldChange")

    Returns: Pandas Series with the respective DoT-scores per cell.

custom_scale - equivalent to the scaling function from the scanpy package, but scales using a vector of chosen mean expression values, which allows the point of origin to be chosen. Arguments:
  • adata - an AnnData object

  • mean - numpy array with mean gene expression used for scaling. Needs to match the genes in the AnnData object. If None, the global mean will be used.

    Caution: The AnnData object is modified in place.
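As a rough illustration of what scaling to a chosen point of origin means, the sketch below centres each gene on a supplied vector of means and divides by the gene's standard deviation. It is a minimal sketch under assumptions, not the code of custom_scale: the name scale_to_origin is hypothetical, the population standard deviation is an arbitrary choice, and the sketch returns a new list rather than modifying its input in place.

```python
import statistics


def scale_to_origin(X, origin=None):
    """Centre each gene on a chosen origin and divide by its standard deviation."""
    columns = list(zip(*X))  # one tuple of values per gene
    if origin is None:
        # Fall back to the global mean of each gene.
        origin = [statistics.mean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    return [
        # Genes with zero variance are set to 0 to avoid division by zero.
        [(x - m) / sd if sd > 0 else 0.0 for x, m, sd in zip(row, origin, sds)]
        for row in X
    ]


X = [[1.0, 4.0], [3.0, 8.0]]  # 2 cells x 2 genes

global_scaled = scale_to_origin(X)                     # origin = global mean
origin_scaled = scale_to_origin(X, origin=[1.0, 4.0])  # origin = first cell
```

With the first cell as the point of origin, that cell maps to the zero vector and every other cell state vector is expressed relative to it.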

get_genescore_pergroup - computes the contribution of each gene to the DoT-score; summing these contributions gives the DoT-score. To help interpretation, contributions are averaged per group of cells (specified with the group argument). This tool helps identify the genes with the strongest influence on the DoT-score (positive or negative) in chosen areas of the scRNA-Seq landscape. Arguments:
  • adata - an AnnData object

  • de - pandas DataFrame, which contains one column with gene names and one column with weights (e.g. log2(Fold Change) coming from differential expression analysis)

  • id_col - name of the column in de, which contains the gene names (default: "target")

  • weight_col - name of the column in de, which contains the weights (default: "log2FoldChange")

  • group - name of the column in the .obs slot of the AnnData object which contains cell groups (needs to be categorical; default: 'leiden')

  • sortby - name of the cell group by which the values will be sorted, default: '0'

  • gene_symbols - Optional: name of the column in the .var slot, which contains gene symbol annotations.

    Returns: pandas DataFrame with each cell group as a column and genes as rows. Values correspond to the average contribution of each gene to the DoT-score of the respective cell group.
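The idea behind the per-gene contributions can be sketched as follows: each cell's DoT-score is a sum of terms x_ij * v_j, so averaging those terms within a group shows which genes drive the score there. This is a hypothetical, dependency-free illustration (the function name and toy data are assumptions) and it omits the sorting and gene symbol handling described above.

```python
from collections import defaultdict


def gene_contributions_per_group(X, v, groups):
    """Average the per-gene contributions x_ij * v_j within each cell group."""
    sums = defaultdict(lambda: [0.0] * len(v))
    counts = defaultdict(int)
    for row, g in zip(X, groups):
        counts[g] += 1
        for j, (x, w) in enumerate(zip(row, v)):
            sums[g][j] += x * w  # contribution of gene j in this cell
    return {g: [total / counts[g] for total in sums[g]] for g in sums}


X = [[1.0, 2.0], [3.0, 0.0], [-1.0, 1.0]]  # 3 cells x 2 genes, scaled
v = [2.0, -1.0]                            # weights, e.g. log2(Fold Change)
groups = ["0", "0", "1"]                   # cell group label per cell

contributions = gene_contributions_per_group(X, v, groups)
```

Summing the contributions of one group gives the mean DoT-score of its cells: for group "0" the sum is 3.0, the average of the two cell scores 0.0 and 6.0.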

Some convenience functions:
  • cmap_RdBu - creates an asymmetric red/blue color scale for provided values (i.e. white value is fixed at 0)
  • qfilt - returns quantile-filtered values: all values above a given quantile are set to the value of that quantile. Useful for handling outliers in noisy scRNA-Seq data.
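Quantile filtering of the kind described for qfilt can be sketched like this. The function name, the default quantile and the linear interpolation used to find the cut-off are assumptions made for illustration and may differ from the module's own behaviour.

```python
def cap_at_quantile(values, q=0.95):
    """Set every value above the q-th quantile to that quantile."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)          # fractional index of the quantile
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    # Linear interpolation between the two neighbouring order statistics.
    cutoff = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
    return [min(x, cutoff) for x in values]


raw = [1.0, 2.0, 3.0, 4.0, 100.0]  # one extreme outlier
capped = cap_at_quantile(raw, q=0.75)
```

Capping keeps the ordering of the cells but stops a single extreme value from dominating the colour scale when the scores are plotted.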

Installation

Python > 3.4 and pip are required. To install the package:

  1. Clone the repository:
git clone https://github.com/Iwo-K/dotscore
  2. Install the dependencies:
pip install -r ./dotscore/requirements.txt
  3. Install the package:
pip install -e ./dotscore/
