

License: MIT License



cellenium

Cellenium is a FAIR and scalable interactive visual analytics app for scRNA-Seq (single-cell RNA sequencing) data. It allows users to:

  • organize and semantically find scRNA studies with ontologized metadata for tissues and diseases
  • explore cell types and other cell annotations in UMAP space
  • find differentially expressed genes based on clusters of annotated cells
  • view the expression of a single gene (or a few selected genes) in the UMAP plot or as grouped violin plots
  • draw coexpression plots for pairs of genes, explore the cell types contained in the plots
  • add new cell annotations based on plot selections, see differentially expressed genes for a selected group of cells
  • find genes whose expression is highly correlated with a query gene
  • find marker genes in all imported studies and qualitatively compare gene expression across studies

Link to publication: https://doi.org/10.1093/bioinformatics/btad349

Link to showcase: https://youtu.be/U71qIK-Mqlc

UMAP projection cell type plot of the public study example blood_covid.ipynb

System Overview

Cellenium imports scRNA expression data and cell annotations in H5AD format. We provide Jupyter notebooks that download some publicly available scRNA studies, normalize the data where necessary, and calculate differentially expressed genes, a UMAP projection, and other study data needed for Cellenium's features to work.

Cellenium is a web application that accesses a PostgreSQL database through a GraphQL API. Some API features, such as server-side rendered plots, depend on Python stored procedures. The graphql_api_usage folder contains example queries that illustrate the API's capabilities.
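As a sketch of what a call against that API looks like: a GraphQL request is just a JSON document with a "query" key POSTed to the endpoint. The field names below are illustrative only, not the actual cellenium schema, and the endpoint path is an assumption; see the graphql_api_usage folder for real queries.

```python
import json

# Hypothetical query -- field names are illustrative, not the real
# cellenium schema. See graphql_api_usage for actual queries.
query = """
query {
  studies {
    nodes { studyId studyName }
  }
}
"""

# A GraphQL request body is just JSON with a "query" key. It would be
# POSTed to the server, presumably http://localhost:5000/postgraphile/graphql,
# e.g. with urllib.request or the requests library.
payload = json.dumps({"query": query})
print(payload)
```

Because Postgraphile derives the schema from the database, any table or function you add shows up as a new query or mutation without extra server code.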

Cellenium architecture

The setup steps below automate the download and creation of appropriate H5AD files, docker image build, database schema setup and data ingestion.

Setting up

Preparation of CellO data files (workaround for deweylab/CellO#29 ):

mkdir -p scratch/cello_resources
curl -fL https://deweylab.biostat.wisc.edu/cell_type_classification/resources_v2.0.0.tar.gz >scratch/cello_resources/resources_v2.0.0.tar.gz
tar -C scratch/cello_resources -zxf scratch/cello_resources/resources_v2.0.0.tar.gz
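A quick sanity check for the steps above can be scripted; this is only a sketch, and the assumption that the tarball unpacks into a subdirectory next to itself is ours, not documented by CellO.

```python
from pathlib import Path

def cello_resources_ready(scratch_dir):
    """Return True if the CellO tarball was downloaded and something was
    extracted next to it. Layout matches the commands above; that the
    tarball unpacks into a subdirectory is an assumption."""
    base = Path(scratch_dir) / "cello_resources"
    if not base.is_dir():
        return False
    tarball = base / "resources_v2.0.0.tar.gz"
    extracted = [p for p in base.iterdir() if p != tarball]
    return tarball.is_file() and len(extracted) > 0
```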

Cellenium setup, including execution of the study data processing notebooks (the initial run takes a couple of hours):

# builds docker images and runs the whole stack
# until you run the "make reset_database" step below, error messages about the missing "postgraphile" user will pile up; you can ignore them for now
docker compose up
conda env create -f data_import/environment.yml
conda activate cellenium_import
# 'test_studydata' should contain data to cover all application features, but is small enough to be imported in a few minutes
make reset_database test_studydata_import

# 'normal_studydata': real life studies (i.e. with full amount of cells and genes)
make normal_studydata_import

# ATAC example studies
make atac_studydata_import

# CITE-seq example studies
make cite_studydata_import

The GraphQL API explorer is available at http://localhost:5000/postgraphile/graphiql. Postgraphile listens for changes in the database schema, so the updated API becomes visible immediately.

The cellenium webapp's 'production build' static site is hosted in the 'client' container; see http://localhost:6002/. For development, run (cd client && yarn && yarn start) to install the webapp's dependencies and get a hot-reloaded webapp.

Before you process and import the large example study (there are two additional make targets for that), edit the beginning of heart_failure_reichart2022*.ipynb and define the download URL as described in the notebooks.

Manually executing the study data preparation Jupyter notebooks

The notebooks are run in headless mode by make. To create new notebooks and explore datasets:

(cd data_import && PYTHONPATH=$(pwd) jupyter-lab)

cellenium's People

Contributors: andreassteffen, carsten-jahn, danplischke, dependabot[bot], mahmoudibrahim


cellenium's Issues

user-prepared study upload

  • S3 as an h5ad study import source, in addition to the local file system
  • mechanism for granting temporary credentials for an S3 prefix to users, for uploads
  • simple admin UI for maintaining study metadata after initial import and associating user groups for access rights
  • discovery of uploaded S3 files
  • docker container that can run unattended and imports discovered studies

extend gene IDs, symbols, descriptions with further species

Using the Biomart client, we could add data for drosophila, macaca fascicularis, and possibly more species. This could be done by also invoking get_gene_mappings for these Tax IDs, so that data for these species is added to omics_gene and omics_base just like for the other species.

In case a species isn't covered by Biomart and we want an "on the fly" definition of gene IDs, symbols, and gene descriptions from the gene annotation dataframe of an imported h5ad file, the import process needs to be extended. Currently, it would simply raise an error for unknown species or unknown genes.
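Such a fallback could look roughly like this; the column names (gene_id, gene_symbol, description) and the output fields are assumptions for illustration, not the actual cellenium import schema.

```python
def genes_from_var(var_records, tax_id):
    """Derive minimal gene records from an h5ad gene annotation dataframe
    (rows given as dicts), for species not covered by Biomart.
    Column names and output fields are assumptions, not the real schema."""
    genes = []
    for rec in var_records:
        # Accept either of two commonly used ID columns (assumed names).
        gene_id = rec.get("gene_id") or rec.get("ensembl_id")
        if gene_id is None:
            raise ValueError("no recognizable gene ID column in var dataframe")
        genes.append({
            "gene_id": gene_id,
            "symbol": rec.get("gene_symbol", gene_id),
            "description": rec.get("description", ""),
            "tax_id": tax_id,
        })
    return genes
```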

small support python package for study preparation

Users who are preparing h5ad files for cellenium should use the code in h5ad_preparation.py. There should be a python library built, and being installable via PyPi, for this use case. While the library isn't there, users can copy and update h5ad_preparation.py on their own

multiple data layers (e.g. another matrix with imputed values)

Database, API and client already work with study_layer_id to distinguish multiple layers per study. For scRNA data, multiple layers get imported but only the first one is displayed in the UI.

  • UI selection for layers (at least in Distribution Analysis view)
  • for multiple modalities (h5mu files), the layers are imported from all modalities where they exist
  • decide whether multiple modalities can share the same layer; currently they do, which is convenient, but the database column study_layer.omics_type is out of line with this implementation
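The layer-selection part of such a UI could be as simple as mapping layer names to study_layer_id values; this is only a sketch, and the layer names below are invented.

```python
def pick_layer(layers, preferred=None):
    """Pick a study_layer_id by layer name, falling back to the first
    imported layer (mirroring the current UI behavior).
    `layers` maps layer name -> study_layer_id; names are invented."""
    if preferred is not None and preferred in layers:
        return layers[preferred]
    return next(iter(layers.values()))
```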
