
theislab / moslin

Code, data and analysis for moslin.

Home Page: https://moscot-tools.org

License: BSD 3-Clause "New" or "Revised" License

cell genomics lineage optimal reproducible-research single tracing transport

moslin's Introduction

Mapping lineage-traced cells across time points with moslin

moslin is an algorithm to map lineage-traced single cells across time points. Our algorithm combines gene expression with lineage information at all time points to reconstruct precise differentiation trajectories in complex biological systems. See the tutorial or read the preprint to learn more.

moslin's key applications

  • Probabilistically map cells across time points in lineage-traced single-cell RNA-sequencing (scRNA-seq) studies.
  • Infer ancestors and descendants of rare or transient cell types or states.
  • Combine with CellRank to compute putative driver genes, expression trends, activation cascades, and much more.

Manuscript

Our manuscript is available as a preprint on bioRxiv.

The moslin algorithm

High-diversity lineage relationships can be recorded using evolving barcoding systems (Wagner and Klein, Nat Rev Genet 2020); when applied in vivo, these record independent lineage relationships in each individual. To infer the molecular identity of putative ancestor states, samples need to be related from early to late time points.

Mapping independent clonal evolution: evolving lineage recording systems, based on, e.g., Cas9-induced genetic scars (Alemany et al., Nature 2018, Raj et al., Nature Biotech 2018, Spanjaard et al., Nature Biotech 2018), record independent clonal evolution in each individual.

In our setting, each individual corresponds to a different time point, and we wish to relate cells across time to infer precise differentiation trajectories (Forrow and Schiebinger, Nature Comms 2021). While gene expression is directly comparable across time points, lineage information is not: individual lineage trees may be reconstructed at each time point (Alemany et al., Nature 2018, Raj et al., Nature Biotech 2018, Spanjaard et al., Nature Biotech 2018, Jones et al., Genome Biology 2020), but these do not uncover the molecular identity of putative ancestors or descendants.

The moslin algorithm: the grey outline represents a simplified state manifold, dots and triangles illustrate early and late cells, respectively, and colors indicate cell states.

Critically, moslin uses two sources of information to map cells across time in an optimal transport (OT) formulation (Peyré and Cuturi, arXiv 2019):

  • gene expression: directly comparable across time points, included in a Wasserstein (W)-term (Schiebinger et al., Cell 2019). The W-term compares individual early and late cells and seeks to minimize the distance cells travel in phenotypic space.
  • lineage information: not directly comparable across time points, included in a Gromov-Wasserstein (GW)-term (Nitzan et al., Nature 2019, Peyré et al., PMLR 2016). The GW-term compares pairwise distances among early cells with pairwise distances among late cells and seeks to maximize lineage concordance.
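To make the two terms concrete, the sketch below evaluates them for a fixed coupling P between early and late cells, assuming a squared-Euclidean expression cost M for the W-term and the square loss for the GW-term. The function names, the toy cells, and the path-shaped lineage distances are hypothetical illustrations, not moslin's or moscot's code.

```python
import numpy as np


def w_term(M, P):
    """W-term: expression-space cost of coupling P, i.e. sum_ij M[i, j] * P[i, j]."""
    return float(np.sum(M * P))


def gw_term(C_early, C_late, P):
    """GW-term, square loss: sum_ijkl (C_early[i, k] - C_late[j, l])**2 * P[i, j] * P[k, l]."""
    # diff[i, j, k, l] = C_early[i, k] - C_late[j, l]
    diff = C_early[:, None, :, None] - C_late[None, :, None, :]
    return float(np.einsum("ijkl,ij,kl->", diff**2, P, P))


# Hypothetical toy data: 3 early and 3 late cells with 1D expression values.
x_early = np.array([0.0, 1.0, 2.0])
x_late = x_early + 0.1
M = (x_early[:, None] - x_late[None, :]) ** 2  # squared Euclidean expression cost

# Hypothetical lineage distances: cells sit on a path, distance = |i - k|.
C = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)

P_match = np.eye(3) / 3            # early cell i -> late cell i
P_swap = np.eye(3)[[0, 2, 1]] / 3  # early cell 1 -> late 2, early cell 2 -> late 1

for name, P in [("match", P_match), ("swap", P_swap)]:
    print(name, w_term(M, P), gw_term(C, C, P))
```

Here the matching coupling scores lower on both terms: cells travel less in expression space, and pairwise lineage distances are preserved, which is what the GW-term rewards.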

We combine both sources of information in a Fused Gromov-Wasserstein (FGW) problem (Vayer et al., Algorithms 2020), a type of OT problem. Additionally, we use entropic regularization (Cuturi 2013) to speed up computations and to improve the statistical properties of the solution (Peyré and Cuturi, arXiv 2019).
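A minimal NumPy sketch of an entropic FGW solver follows, assuming the convention that alpha weights the GW-term and (1 - alpha) the W-term, a square loss, and symmetric lineage distance matrices. It repeatedly linearizes the quadratic term around the current coupling and solves the resulting entropic OT problem with log-domain Sinkhorn iterations, in the spirit of Peyré et al., PMLR 2016. This is an illustration only, not moscot's solver; all names and the toy data are hypothetical.

```python
import numpy as np


def _logsumexp(A, axis):
    """Numerically stable log(sum(exp(A))) along one axis."""
    m = np.max(A, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(A - m), axis=axis))


def sinkhorn_log(a, b, cost, eps, n_iter=500):
    """Entropic OT (Cuturi 2013) via log-domain Sinkhorn; returns the coupling."""
    f, g = np.zeros_like(a), np.zeros_like(b)
    for _ in range(n_iter):
        # Alternately rescale to match the row marginal a and column marginal b.
        f = eps * np.log(a) - eps * _logsumexp((g[None, :] - cost) / eps, axis=1)
        g = eps * np.log(b) - eps * _logsumexp((f[:, None] - cost) / eps, axis=0)
    return np.exp((f[:, None] + g[None, :] - cost) / eps)


def entropic_fgw(M, C1, C2, a, b, alpha=0.5, eps=0.05, n_outer=50):
    """Entropic FGW with square loss: (1 - alpha) * W-term + alpha * GW-term.

    Assumes C1 and C2 are symmetric, so the GW gradient is 2 * (L ⊗ P).
    """
    P = np.outer(a, b)
    # Part of the square-loss tensor product that depends only on the marginals.
    const = (C1**2 @ a)[:, None] + (C2**2 @ b)[None, :]
    for _ in range(n_outer):
        tens = const - 2.0 * C1 @ P @ C2.T  # (L ⊗ P)[i, j]
        grad = (1.0 - alpha) * M + 2.0 * alpha * tens
        P = sinkhorn_log(a, b, grad, eps)  # entropic OT on the linearized cost
    return P


# Hypothetical toy data: 3 early and 3 late cells, uniform marginals.
a = b = np.full(3, 1 / 3)
x_early = np.array([0.0, 1.0, 2.0])
M = (x_early[:, None] - (x_early + 0.1)[None, :]) ** 2
C = np.abs(np.subtract.outer(np.arange(3), np.arange(3))).astype(float)

P = entropic_fgw(M, C, C, a, b, alpha=0.5, eps=0.05)
print(np.round(P, 3))
print(P.argmax(axis=1))
```

Larger eps yields a blurrier, more diffuse coupling and faster Sinkhorn convergence; smaller eps approaches the unregularized FGW solution.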

Code, tutorials and data

Under the hood, moslin is based on moscot to solve the optimal transport problem of mapping lineage-traced cells across time points. Specifically, we implement moslin via the LineageProblem class, demonstrate a use case in our tutorial, and showcase how to work with tree distances in an example. Downstream analysis, like visualizing the inferred cell-cell transitions, is available via moscot's API.
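The GW-term consumes a pairwise lineage distance matrix per time point. When lineage is available as a reconstructed tree, one common choice is the path length between leaves. The sketch below computes such a matrix from a plain child-to-parent map; the tree encoding, function names, and toy tree are hypothetical, and this is not how moscot or the repository's example computes tree distances.

```python
import numpy as np


def path_to_root(node, parent):
    """Nodes from `node` up to the root (inclusive), following child -> parent links."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def tree_distance_matrix(leaves, parent, branch_length=None):
    """Pairwise path lengths between `leaves` on a rooted tree.

    `parent` maps each non-root node to its parent; `branch_length` maps a node
    to the length of the edge above it (default 1.0 for every edge).
    """
    lengths = branch_length or {}

    def depth(node):
        # Sum the edge lengths on the way to the root (the root has no edge above it).
        return sum(lengths.get(n, 1.0) for n in path_to_root(node, parent)[:-1])

    n = len(leaves)
    D = np.zeros((n, n))
    for i in range(n):
        anc_i = path_to_root(leaves[i], parent)
        for j in range(i + 1, n):
            anc_j = set(path_to_root(leaves[j], parent))
            # The lowest common ancestor is the first node on i's path that j shares.
            lca = next(node for node in anc_i if node in anc_j)
            D[i, j] = D[j, i] = depth(leaves[i]) + depth(leaves[j]) - 2.0 * depth(lca)
    return D


# Hypothetical tree: root -> A -> (c1, c2), root -> B -> c3, unit branch lengths.
parent = {"A": "root", "B": "root", "c1": "A", "c2": "A", "c3": "B"}
print(tree_distance_matrix(["c1", "c2", "c3"], parent))
```

Sister cells c1 and c2 end up closer to each other than to c3, which is the kind of structure the GW-term tries to preserve across time points.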

Raw published data is available from the Gene Expression Omnibus (GEO) under accession codes:

Additionally, we simulated data using LineageOT and TedSim. Processed data is available on figshare. To ease reproducibility, our data examples can also be accessed through moscot's dataset interface.

Reproducibility

To ease reproducibility of our preprint results, we've organized this repository along the categories below. Each folder contains the notebooks and scripts necessary to reproduce the corresponding analysis. We read data from the data folder and write figures to the figures folder. Please open an issue should you experience difficulties reproducing any result.

Results

  • Simulated data (Fig. 2): analysis/simulations/
  • C. elegans embryogenesis (Fig. 3): analysis/packer_c_elegans/
  • Zebrafish heart regeneration (Fig. 4): analysis/hu_zebrafish_linnaeus/

The concept figures in this README have been created with BioRender.

moslin's Issues

Update the README

Point to the moslin code, the moslin tutorial, and our figshare collection with the data files. Give an overview of the different analyses in a table, and link to the corresponding folders.

Improve zebrafish analysis documentation

Various small updates that should lead to better pipeline readability:

  • Improved README to explain the full zebrafish analysis, from downloadable (expression + lineage tree) data to paper analysis and results. It guides the reader first through Zoe's scripts, which use moslin to calculate transition matrices, and then through Bastiaan's script, which uses the transition matrices for downstream analysis.
  • Small code fix to the zebrafish analysis script: a boolean that can be set to save datasets and figures.
  • Better commenting in the zebrafish analysis script, essentially echoing the first bullet: which steps come before and which datasets are expected to be present.
