
elviscuihan / scgtm


Single-cell generalized trendmodel (scGTM): a flexible and interpretable model for gene expression along cell pseudotime. This is a Python package for modeling the statistical relationship between pseudotime and gene expression data.

License: MIT License


License: MIT Code style: black

scGTM: Single-cell generalized trend model

scGTM (originally named scKGAM) is short for Single-cell Gene Expression Generalized Trend Model. It is a Python package for modeling the statistical relationship between pseudotime and gene expression data. The paper is published in Bioinformatics and is also available on bioRxiv.

It is intended for bioinformaticians, applied statisticians, and students who prefer to use metaheuristic algorithms to solve their own bioinformatics optimization problems. scGTM provides various marginal gene distributions with interpretable regression functions. Check out more features!

  • Free software: MIT license
  • Python versions: 3.6 and above

Installation

To install the bleeding-edge version of scGTM, clone this repo:

$ git clone git@github.com:ElvisCuiHan/scGTM.git

and then run

$ cd scGTM
$ python run_scGTM.py --model.iter 100 --model.marginal 'ZIP' --model.save_dir "your/path/to/save" --data.dir "your/path/file.csv" --gene.start 3 --gene.end 4

Usage

scGTM provides a high-level implementation of several marginal distributions: Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB). It uses the particle swarm optimization (PSO) algorithm from the pyswarms package to optimize the objective function. It aims to be user-friendly and customizable.
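To make these two ingredients concrete, here is a minimal sketch, not the package's implementation: the textbook ZIP negative log-likelihood with a constant mean, minimized by a small hand-written global-best PSO. The function names, the toy counts, the bounds, and the PSO settings (w, c1, c2) are all assumptions made for illustration. scGTM itself relies on pyswarms and models the mean as a function of pseudotime.

```python
import math
import random


def zip_nll(counts, mu, p):
    """Negative log-likelihood of a zero-inflated Poisson (mean mu, zero-inflation p)."""
    total = 0.0
    for y in counts:
        # Poisson log-pmf: y*log(mu) - mu - log(y!)
        log_pois = y * math.log(mu) - mu - math.lgamma(y + 1)
        if y == 0:
            # A zero comes either from the inflation part or from the Poisson part.
            total += math.log(p + (1.0 - p) * math.exp(log_pois))
        else:
            total += math.log(1.0 - p) + log_pois
    return -total


def pso_minimize(cost, bounds, n_particles=30, n_iter=150, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO; a stand-in for pyswarms, not its API."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [particle[:] for particle in pos]
    pbest_cost = [cost(particle) for particle in pos]
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward the particle's best + pull toward the swarm's best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


# Toy counts for a single gene (illustrative values, not real data).
counts = [0, 0, 3, 0, 5, 2, 0, 4, 0, 1, 0, 6, 3, 0, 2, 0, 4, 0, 3, 5]
bounds = [(0.05, 20.0), (0.001, 0.999)]  # search ranges for (mu, p)
(mu_hat, p_hat), best_nll = pso_minimize(lambda x: zip_nll(counts, x[0], x[1]), bounds)
print(f"mu = {mu_hat:.3f}, p = {p_hat:.3f}, NLL = {best_nll:.3f}")
```

PSO needs only objective values, no gradients, which is convenient for objectives like this one with bounded parameters and a piecewise form.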

The data should be a cell-by-gene matrix whose first column is the pseudotime. A typical data structure has the following form:

Index  Pseudotime  Gene1  Gene2  ...
1      t1          y11    y12    ...
2      t2          y21    y22    ...
3      t3          y31    y32    ...
4      t4          y41    y42    ...
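A file in this layout could be written from Python roughly as follows. This is a sketch under assumptions: the helper name write_scgtm_input, the output file name, and the toy values are hypothetical, and whether the scripts expect exactly these header names or the leading index column should be checked against the demo file Demo/simu_nb_scGTM_input.csv.

```python
import csv


def write_scgtm_input(path, pseudotime, counts, gene_names):
    """Write a cell-by-gene table: index, pseudotime, then one column per gene."""
    if len(pseudotime) != len(counts):
        raise ValueError("need one pseudotime value per cell (row of counts)")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Index", "Pseudotime", *gene_names])
        for i, (t, row) in enumerate(zip(pseudotime, counts), start=1):
            if len(row) != len(gene_names):
                raise ValueError(f"cell {i} has {len(row)} counts, expected {len(gene_names)}")
            writer.writerow([i, t, *row])


# Toy values only, not real data: five cells, two genes.
pseudotime = [0.00, 0.25, 0.50, 0.75, 1.00]
counts = [[0, 3], [1, 5], [4, 2], [2, 0], [0, 1]]
write_scgtm_input("toy_scGTM_input.csv", pseudotime, counts, ["Gene1", "Gene2"])
```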

All-in-one function

Suppose we want to regress Gene 1 on pseudotime using scGTM. We simply run the run_scGTM.py file in a shell:

python run_scGTM.py --model.iter {# of iterations} --model.marginal 'ZIP' --model.save_dir "your/path/to/save" --data.dir "your/path/file.csv" --gene.start {START INDEX} --gene.end {END INDEX} 

and we can replace run_scGTM.py with either run_scGTM_Hill_Only.py or run_scGTM_Valley_Only.py if we are only interested in one of the two trends.

Using the data in our demo folder, the command is:

python run_scGTM_Valley_Only.py --model.iter 120 --model.marginal 'ZIP' --model.save_dir "Demo/Results/" --data.dir "Demo/simu_nb_scGTM_input.csv" --gene.start 1 --gene.end 60

The arguments are:

  • gene_index: The index of the gene that we want to model.
  • model.marginal: The marginal distribution of the gene expression, should be one of ["NB", "ZINB", "Poisson", "ZIP"].
  • model.iter: Number of iterations run by PSO; 150 usually suffices.
  • model.save_dir: The directory to save our results.
  • data.dir: The path to our data file.
  • gene.start: Index of the first gene to fit.
  • gene.end: Index of the last gene to fit.

In the scGTM.py file (and the other two), we can modify the plotting arguments so that the output figure uses user-defined colors.

  • plot_args: A dictionary with keys color and cmap. color is a list of four colors and cmap is a string. For example:
plot_args = {
    'color': ['red', 'tomato', 'orange', 'violet'],
    'cmap': 'Blues',
}

To estimate many genes with different marginals, we can first change the data directory in the function parallel and then run the following command in a terminal:

python run_scGTM_Hill_Only.py  --gene.start {START INDEX} --gene.end {END INDEX} --model.marginal "NB" --model.save_dir "YourTargetPath" --model.iter 150

Note that the data should be in .csv format. The main function produces a .json file and a .png figure.
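When several gene ranges need different marginals, the commands can be assembled in a small script. The sketch below is only an illustration: the helper build_scgtm_command, the job list, and the Results/... directories are hypothetical, and only flags that appear in the commands above are used. It prints the commands rather than running them.

```python
import shlex

ALLOWED_MARGINALS = {"NB", "ZINB", "Poisson", "ZIP"}


def build_scgtm_command(script, marginal, save_dir, gene_start, gene_end, n_iter=150):
    """Return the argument list for one scGTM run (hypothetical helper)."""
    if marginal not in ALLOWED_MARGINALS:
        raise ValueError(f"unknown marginal {marginal!r}")
    return [
        "python", script,
        "--gene.start", str(gene_start),
        "--gene.end", str(gene_end),
        "--model.marginal", marginal,
        "--model.save_dir", save_dir,
        "--model.iter", str(n_iter),
    ]


# Hypothetical plan: which marginal to use for which range of gene indices.
jobs = [("NB", 1, 30), ("ZIP", 31, 60)]
commands = [
    build_scgtm_command("run_scGTM_Hill_Only.py", marginal, f"Results/{marginal}", start, end)
    for marginal, start, end in jobs
]
for cmd in commands:
    # Print a shell-safe version; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(shlex.quote(part) for part in cmd))
```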

Example

The following figure shows a typical output of the main function in scGTM.py.

  • Red line: fitted log mean expression (log(tau_c) in the paper).
  • Blue line: Red line minus -log(1-p_c) so that the zero-inflation part is removed from expectation.
  • Orange vertical line: Estimated t0, i.e., the turning point of the model.
  • Purple line: fitted zero-inflation parameter; see the paper for details.
  • Scatter points: observed log expression values (log(y+1)).

The confidence intervals of {t0, k1, k2, mu} are saved in a .json file in the same directory.
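To inspect these results from Python, something like the following could be used. The internal layout of the .json file is not described here, so this sketch makes no assumption about it and simply parses whatever .json files it finds. The helper name load_results is hypothetical, and "Demo/Results/" is the save directory from the demo command.

```python
import json
from pathlib import Path


def load_results(save_dir):
    """Map each .json file name (without extension) in save_dir to its parsed content."""
    directory = Path(save_dir)
    if not directory.is_dir():
        return {}  # nothing saved yet
    results = {}
    for path in sorted(directory.glob("*.json")):
        with path.open() as handle:
            results[path.stem] = json.load(handle)
    return results


results = load_results("Demo/Results/")
for name, content in results.items():
    # Show only the top-level shape, since the schema is not assumed here.
    print(name, type(content).__name__)
```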
