harissa-framework / harissa

Simulation and inference of gene regulatory networks based on transcriptional bursting

Home Page: https://harissa-framework.github.io/harissa/

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%
mechanistic-modeling gene-regulatory-networks single-cell-rna-seq

harissa's Introduction

Harissa 🌶

This is a Python package for both simulation and inference of gene regulatory networks from single-cell data. Its name comes from ‘HARtree approximation for Inference along with a Stochastic Simulation Algorithm.’ It was implemented in the context of a mechanistic approach to gene regulatory network inference from single-cell data, based upon an underlying stochastic dynamical model driven by the transcriptional bursting phenomenon.

Main functionalities:

  1. Network inference interpreted as calibration of a dynamical model;
  2. Data simulation (typically scRNA-seq) from the same dynamical model.

Other available tools:

  • Basic GRN visualization (directed graphs with positive or negative edge weights);
  • Binarization of scRNA-seq data (using gene-specific thresholds derived from the calibrated dynamical model).

The current version of Harissa has benefited from improvements introduced within Cardamom, which can be seen as an alternative method for the inference part. The two inference methods remain complementary at this stage and may be merged into the same package in the future. They were both evaluated in a recent benchmark.

Installation

Harissa can be installed using pip:

$ pip install harissa

This command will also check for all required dependencies (see below) and install them if necessary. If the installation is successful, all scripts in the tests folder should run smoothly (note that network4.py must be run before test_binarize.py).

Basic usage

from harissa import NetworkModel
model = NetworkModel()

# Inference
model.fit(data)

# Simulation
sim = model.simulate(time)

Here data should be a Dataset built from two arrays: a list of experimental time points (one per cell) and a two-dimensional array of single-cell gene expression counts, where each row represents a cell and each column represents a gene, except for the first column, which contains the stimulus (s in the example). A toy example is:

import numpy as np
from harissa import Dataset

# List of time points
time_points = np.array([0.0, 0.0, 1.0, 1.0, 1.0])

# Matrix of mRNA counts
count_matrix = np.array([
    #s g1 g2 g3
    [0, 4, 1, 0], # Cell 1
    [0, 5, 0, 1], # Cell 2
    [1, 1, 2, 4], # Cell 3
    [1, 2, 0, 8], # Cell 4
    [1, 0, 0, 3], # Cell 5
], dtype=np.uint)

data = Dataset(time_points, count_matrix)

The time argument for simulations is either a single time or a list of time points. For example, a single-cell trajectory (not available from scRNA-seq) from t = 0h to t = 10h can be simulated using:

time = np.linspace(0, 10, 1000)
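Since the time argument may be a single value or a list, it can help to normalize it to a one-dimensional array before building it into a larger script. The helper below is only a sketch: as_time_array is a hypothetical name and is not part of Harissa.

```python
import numpy as np

def as_time_array(time):
    """Return `time` as a 1-D float array (hypothetical helper, not Harissa API)."""
    times = np.atleast_1d(np.asarray(time, dtype=float))
    if times.ndim != 1:
        raise ValueError("time must be a scalar or a one-dimensional sequence")
    return times

single = as_time_array(10.0)                          # one time point
trajectory = as_time_array(np.linspace(0, 10, 1000))  # fine grid from 0h to 10h
```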

The sim output stores mRNA and protein levels as a Simulation.Result object, with attributes sim.time_points, sim.rna_levels and sim.protein_levels (each row is a time point and each column is a gene).
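To illustrate the row and column layout described above without running a simulation, the sketch below uses a stand-in object with the same three attribute names. ResultSketch and all numerical values are placeholders chosen for illustration; they are not produced by Harissa.

```python
from collections import namedtuple

import numpy as np

# Stand-in with the attribute names mentioned above (not the real result class).
ResultSketch = namedtuple(
    "ResultSketch", ["time_points", "rna_levels", "protein_levels"]
)

n_times, n_genes = 5, 3
sim = ResultSketch(
    time_points=np.linspace(0, 10, n_times),
    # Placeholder values: rows are time points, columns are genes.
    rna_levels=np.arange(n_times * n_genes, dtype=float).reshape(n_times, n_genes),
    protein_levels=np.zeros((n_times, n_genes)),
)

gene_trajectory = sim.rna_levels[:, 1]  # one column, followed over all time points
final_state = sim.rna_levels[-1]        # all columns at the last time point
```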

About the data

The inference algorithm specifically exploits time-course data, where single-cell profiling is performed at a number of time points after a stimulus (see this paper for an example with real data). Each group of cells collected at the same experimental time t_k forms a snapshot of the biological heterogeneity at time t_k. Due to the destructive nature of the measurement process, successive snapshots are made of different cells. Such data is therefore different from so-called ‘pseudotime’ trajectories, which attempt to reorder cells according to some smoothness hypotheses.
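The snapshot structure can be made concrete by grouping the rows of a count matrix by time point. This is a minimal sketch using numpy only; split_into_snapshots is a hypothetical helper, and the arrays are the toy values from the basic usage example.

```python
import numpy as np

def split_into_snapshots(time_points, count_matrix):
    """Group cells (rows) by experimental time point (hypothetical helper)."""
    time_points = np.asarray(time_points, dtype=float)
    count_matrix = np.asarray(count_matrix)
    if time_points.shape[0] != count_matrix.shape[0]:
        raise ValueError("one time point is required per cell (row)")
    # One sub-matrix per distinct time point; each holds different cells.
    return {
        float(t): count_matrix[time_points == t]
        for t in np.unique(time_points)
    }

time_points = np.array([0.0, 0.0, 1.0, 1.0, 1.0])
count_matrix = np.array([
    [0, 4, 1, 0],
    [0, 5, 0, 1],
    [1, 1, 2, 4],
    [1, 2, 0, 8],
    [1, 0, 0, 3],
])
snapshots = split_into_snapshots(time_points, count_matrix)
```

Here snapshots[0.0] contains the two cells collected at t = 0 and snapshots[1.0] the three cells collected at t = 1.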

Tutorial

Please see the notebooks for introductory examples, or the tests folder for basic usage scripts. To get an idea of the main features, you can start by running the notebooks in order:

  • Notebook 1: simulate a basic repressilator network with 3 genes;
  • Notebook 2: perform network inference from a small dataset with 4 genes;
  • Notebook 3: compare two branching pathways with 4 genes from both ‘single-cell’ and ‘bulk’ viewpoints.

Dependencies

The package depends on standard scientific libraries numpy and scipy. Optionally, it can load numba for accelerating the inference procedure (used by default) and the simulation procedure (not used by default). It also depends optionally on matplotlib and networkx for network visualization.
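An optional dependency such as numba is commonly handled with a guarded import and a no-op fallback. The sketch below shows that general pattern; it is not Harissa's actual import logic, and HAS_NUMBA and sum_of_squares are illustrative names.

```python
try:
    from numba import njit  # optional accelerator
    HAS_NUMBA = True
except ImportError:
    HAS_NUMBA = False

    def njit(*args, **kwargs):
        """No-op replacement supporting both @njit and @njit(...)."""
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func

@njit
def sum_of_squares(n):
    # Plain loop: compiled when numba is available, interpreted otherwise.
    total = 0
    for i in range(n):
        total += i * i
    return total
```

The decorated function returns the same result in both cases; only the execution speed differs.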

Citation

If you use Harissa in your work, please cite this paper (also available on arXiv).

harissa's People

Contributors

nseyler1, ulysseherbach

harissa's Issues

Simulation with time-dependent stimulus

  • update simulation methods (run) to optionally use a time-dependent stimulus (array of same size as time_points)
  • if not provided, create internally the stimulus array of same size as time_points with constant values (= initial_state) which will keep the current behaviour
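The default described in this issue could look roughly like the sketch below. The function name resolve_stimulus and the parameter initial_stimulus (standing for the stimulus entry of initial_state) are assumptions made for illustration; the issue does not fix a signature.

```python
import numpy as np

def resolve_stimulus(time_points, initial_stimulus, stimulus=None):
    """Return a stimulus array with one value per time point (hypothetical helper)."""
    time_points = np.asarray(time_points, dtype=float)
    if stimulus is None:
        # Not provided: constant stimulus, which keeps the current behaviour.
        return np.full(time_points.shape, float(initial_stimulus))
    stimulus = np.asarray(stimulus, dtype=float)
    if stimulus.shape != time_points.shape:
        raise ValueError("stimulus must have the same size as time_points")
    return stimulus

times = np.array([0.0, 1.0, 2.0, 3.0])
constant = resolve_stimulus(times, initial_stimulus=1.0)
pulse = resolve_stimulus(times, initial_stimulus=1.0, stimulus=[1.0, 1.0, 0.0, 0.0])
```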

Finalize inference run interface

Split "data" into two arguments:

  • time_points : array of float64 (T,) = list of time points
  • count_table : array of int64 (T,G) = expression count table (including stimulus = gene 0)
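As an illustration of the split, the sketch below converts a combined array into the two proposed arguments. It assumes a legacy layout in which the first column stores the time point of each cell, and it assumes the stimulus is off at t = 0 and on afterwards (as in the README toy example); split_legacy_data is a hypothetical name.

```python
import numpy as np

def split_legacy_data(data):
    """Split a combined array into (time_points, count_table) (hypothetical helper)."""
    data = np.asarray(data)
    if data.ndim != 2 or data.shape[1] < 2:
        raise ValueError("data must be 2-D with a time column and at least one gene")
    time_points = data[:, 0].astype(np.float64)
    count_table = data.astype(np.int64)  # astype returns a copy
    # Assumption: column 0 becomes the stimulus (gene 0), switched on for t > 0.
    count_table[:, 0] = time_points > 0
    return time_points, count_table

legacy = np.array([
    [0, 4, 1],  # cell collected at t = 0
    [6, 1, 2],  # cell collected at t = 6
])
time_points, count_table = split_legacy_data(legacy)
```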

Add shortcut attributes to `NetworkParameter`

Edit: In fact backward compatibility was not a good idea: better to keep the new implementation as simple as possible

The new idea is simply to add shortcuts to the NetworkParameter attributes (so not in the NetworkModel class)

  • If param = NetworkParameter(n):
    • param.d[0] = param.degradation_rna
    • param.d[1] = param.degradation_protein
    • param.s[0] = param.creation_rna
    • param.s[1] = param.creation_protein
    • param.k[0] = param.burst_frequency_min
    • param.k[1] = param.burst_frequency_max
    • param.b = param.burst_size_inv
    • param.beta = param.basal
    • param.theta = param.interaction

Note: these correspond to the mathematical notation used in related publications (e.g., Ventre2023 and Herbach2023)
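One possible way to provide such shortcuts is with read-only properties that return references to the long-named arrays. ParameterSketch below is a hypothetical stand-in, not the real NetworkParameter; array shapes and default values are placeholders.

```python
import numpy as np

class ParameterSketch:
    """Hypothetical stand-in showing shortcut properties (not the real class)."""

    def __init__(self, n):
        # Placeholder shapes and values, chosen only for illustration.
        self.degradation_rna = np.ones(n)
        self.degradation_protein = np.ones(n)
        self.creation_rna = np.ones(n)
        self.creation_protein = np.ones(n)
        self.burst_frequency_min = np.zeros(n)
        self.burst_frequency_max = np.ones(n)
        self.burst_size_inv = np.ones(n)
        self.basal = np.zeros(n)
        self.interaction = np.zeros((n, n))

    @property
    def d(self):
        return (self.degradation_rna, self.degradation_protein)

    @property
    def s(self):
        return (self.creation_rna, self.creation_protein)

    @property
    def k(self):
        return (self.burst_frequency_min, self.burst_frequency_max)

    @property
    def b(self):
        return self.burst_size_inv

    @property
    def beta(self):
        return self.basal

    @property
    def theta(self):
        return self.interaction

param = ParameterSketch(3)
param.k[1][0] = 2.5  # in-place edit through the shortcut
```

Because the properties return the underlying arrays rather than copies, in-place edits through a shortcut are visible under the long name as well.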

CLI for template generation

Inference and Simulation templates, for example:

harissa template simulation "path/my_simulation"

→ will generate a file my_simulation.py containing the class MySimulation(Simulation)

This class should work "out of the box" and simply generate a trivial simulation (constant = initial_state)
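A command of this shape could be sketched with argparse as follows. Everything here is an assumption made for illustration: the function names, the generated file contents, and the use of "Inference" as the base class name for inference templates. The import path of the base classes is not stated in the issue, so the generated file only leaves a comment for it.

```python
import argparse
from pathlib import Path

# Generated file contents (placeholder; the import path is left to the user).
TEMPLATE = '''\
# Import {base} from harissa here (import path not stated in the issue).

class {class_name}({base}):
    """Template generated by the CLI sketch; fill in the methods."""
'''

def class_name_from_path(path):
    """Turn 'path/my_simulation' into 'MySimulation'."""
    stem = Path(path).stem
    return "".join(part.capitalize() for part in stem.split("_") if part)

def render_template(kind, path):
    """Return the source text of the template file."""
    return TEMPLATE.format(base=kind.capitalize(),
                           class_name=class_name_from_path(path))

def build_parser():
    parser = argparse.ArgumentParser(prog="harissa")
    commands = parser.add_subparsers(dest="command", required=True)
    template = commands.add_parser("template", help="generate a class template")
    template.add_argument("kind", choices=["simulation", "inference"])
    template.add_argument("path", help="target path without the .py extension")
    return parser

def main(argv=None):
    """Parse arguments and write the template file; return its path."""
    args = build_parser().parse_args(argv)
    target = Path(args.path).with_suffix(".py")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(render_template(args.kind, args.path))
    return target
```

Calling main(["template", "simulation", "path/my_simulation"]) would write path/my_simulation.py containing the class MySimulation(Simulation).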

Set up global tests

Global tests located in the tests folder at the root of harissa (so included in the distribution)
