
omicsml / dance


DANCE: a deep learning library and benchmark platform for single-cell analysis

Home Page: https://pydance.readthedocs.io

License: BSD 2-Clause "Simplified" License

Python 95.55% Shell 0.21% Dockerfile 0.06% Jupyter Notebook 4.17%
bioinformatics dance data-science deep-learning graph-neural-networks machine-learning multimodality python single-cell spatial-transcriptomics

dance's Introduction



DANCE is a Python toolkit to support deep learning models for analyzing single-cell gene expression at scale. Our goal is to build a deep learning community and benchmark platform for computational models in single-cell analysis. It currently includes three modules:

  1. Single-modality analysis
  2. Single-cell multimodal omics
  3. Spatially resolved transcriptomics

Useful links

OmicsML Homepage: https://omicsml.ai
DANCE Open Source: https://github.com/OmicsML/dance
DANCE Documentation: https://pydance.readthedocs.io/en/latest/
DANCE Tutorials: https://github.com/OmicsML/dance-tutorials
DANCE Package Paper: https://www.biorxiv.org/content/10.1101/2022.10.19.512741v2
Survey Paper: https://arxiv.org/abs/2210.12385

Join the Community

Slack: https://join.slack.com/t/omicsml/shared_invite/zt-1hxdz7op3-E5K~EwWF1xDvhGZFrB9AbA
Twitter: https://twitter.com/OmicsML
WeChat Group Assistant: 736180290
Email: [email protected]

Contributing

Community-wide contribution is key to the sustainable development and continual growth of the DANCE package. We deeply appreciate any contribution that improves the DANCE code base. If you would like to get started, please refer to our brief guidelines on the automated quality controls and on setting up the development environment.

Citation

If you find our work useful in your research, please consider citing our DANCE package or survey paper:

@article{ding2024dance,
  title={DANCE: A deep learning library and benchmark platform for single-cell analysis},
  author={Ding, Jiayuan and Liu, Renming and Wen, Hongzhi and Tang, Wenzhuo and Li, Zhaoheng and Venegas, Julian and Su, Runze and Molho, Dylan and Jin, Wei and Wang, Yixin and others},
  journal={Genome Biology},
  volume={25},
  number={1},
  pages={1--28},
  year={2024},
  publisher={BioMed Central}
}
@article{molho2022deep,
  title={Deep learning in single-cell analysis},
  author={Molho, Dylan and Ding, Jiayuan and Tang, Wenzhuo and Li, Zhaoheng and Wen, Hongzhi and Wang, Yixin and Venegas, Julian and Jin, Wei and Liu, Renming and Su, Runze and others},
  journal={ACM Transactions on Intelligent Systems and Technology},
  year={2022},
  publisher={ACM New York, NY}
}

Usage

Overview

In release 1.0, the main use of DANCE is to provide readily available experiment reproduction (see the detailed reproduced performance below). Users can easily reproduce selected experiments from the original papers of the computational single-cell methods implemented in DANCE; these experiments can be found under examples/.

Motivation

Computational methods for single-cell analysis are emerging quickly, and the field is revolutionizing how single-cell data are used to gain biological insights. A key challenge in continually developing computational single-cell methods that achieve new state-of-the-art performance is reproducing previous benchmarks. Different studies prepare their datasets and perform evaluation differently, and the methods themselves are often mutually incompatible, as they may be written in different languages or depend on incompatible library versions.

DANCE addresses these challenges by providing a unified Python package implementing many popular computational single-cell methods (see Implemented Algorithms), as well as easily reproducible experiments by providing unified tools for

  • Data downloading
  • Data (pre-)processing and transformation (e.g. graph construction)
  • Model training and evaluation

Example: run cell-type annotation benchmark using scDeepSort

  • Step0. Install DANCE (see Installation)
  • Step1. Navigate to the folder containing the corresponding example script. In this case, it is examples/single_modality/cell_type_annotation.
  • Step2. Obtain the command line interface (CLI) options for the particular experiment to reproduce; these are listed at the end of the script. For example, the CLI options for reproducing the Mouse Brain experiment are
    python scdeepsort.py --species mouse --tissue Brain --train_dataset 753 3285 --test_dataset 2695
  • Step3. Wait for the experiment to finish and check results.
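The steps above can also be driven from a script. The sketch below assumes the example folder and CLI flags shown in Step1 and Step2; `build_command` and `run_benchmark` are hypothetical helpers, not part of DANCE, and with `dry_run=True` the command is only printed.

```python
import subprocess
import sys
from pathlib import Path

# Example folder from Step1 (assumed relative to the repository root).
EXAMPLE_DIR = Path("examples/single_modality/cell_type_annotation")


def build_command(species, tissue, train_dataset, test_dataset):
    """Assemble the scdeepsort.py call from Step2 as an argument list (hypothetical helper)."""
    return [
        sys.executable, "scdeepsort.py",
        "--species", species,
        "--tissue", tissue,
        "--train_dataset", *[str(d) for d in train_dataset],
        "--test_dataset", *[str(d) for d in test_dataset],
    ]


def run_benchmark(cmd, repo_root=".", dry_run=True):
    """Print the command; when dry_run is False, run it inside the example folder."""
    print(" ".join(cmd[1:]))  # same text as the Step2 command, minus the interpreter path
    if dry_run:
        return 0
    result = subprocess.run(cmd, cwd=Path(repo_root) / EXAMPLE_DIR, check=False)
    return result.returncode


if __name__ == "__main__":
    # Mouse Brain configuration from Step2.
    run_benchmark(build_command("mouse", "Brain", [753, 3285], [2695]), dry_run=True)
```

Passing an argument list (rather than a shell string) to `subprocess.run` avoids quoting issues when dataset IDs or paths change.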

Installation

Quick install

The full installation process can be tedious and may involve some debugging when using CUDA-enabled packages. Thus, we provide an install.sh script that simplifies the installation process, assuming the user has conda set up on their machine. The installation script creates a conda environment (named dance by default) and installs the DANCE package along with all its dependencies for a specific CUDA version. Currently, two options are accepted: cpu and cu118. For example, to install the DANCE package using CUDA 11.8 in a dance-env conda environment, simply run:

# Clone the repository via SSH
git clone git@github.com:OmicsML/dance.git && cd dance
# Alternatively, use HTTPS if you have not set up SSH
# git clone https://github.com/OmicsML/dance.git  && cd dance

# Run the auto installation script to install DANCE and its dependencies in a conda environment
source install.sh cu118 dance-env

Note: the first argument (CUDA version) is mandatory, while the second argument (conda environment name) is optional (default: dance).

Custom install


Step1. Set up environment

First create a conda environment for dance (optional)

conda create -n dance python=3.11 -y && conda activate dance

Then, install CUDA enabled packages (PyTorch, PyG, DGL):

pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cu118
pip install torch_geometric==2.4.0
pip install dgl==1.1.3 -f https://data.dgl.ai/wheels/cu118/repo.html

Alternatively, install these dependencies for CPU only:

pip install torch==2.1.1 torchvision==0.16.1 --index-url https://download.pytorch.org/whl/cpu
pip install torch_geometric==2.4.0
pip install dgl==1.1.3 -f https://data.dgl.ai/wheels/repo.html

For more information about installation or other CUDA version options, check out the installation pages for the corresponding packages
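The CPU and CUDA 11.8 variants above differ only in the PyTorch index URL and the DGL wheel page. The hypothetical helper below is a sketch, not part of DANCE or install.sh: it returns the matching pip commands for `cpu` or `cu118`. Pinned versions follow the commands above, and the DGL wheel pages are assumed to live under https://data.dgl.ai/wheels/.

```python
# Pinned versions as in the commands above.
TORCH = "torch==2.1.1 torchvision==0.16.1"
PYG = "torch_geometric==2.4.0"
DGL = "dgl==1.1.3"

# variant -> (PyTorch index URL, DGL wheel page)
INDEXES = {
    "cu118": ("https://download.pytorch.org/whl/cu118", "https://data.dgl.ai/wheels/cu118/repo.html"),
    "cpu": ("https://download.pytorch.org/whl/cpu", "https://data.dgl.ai/wheels/repo.html"),
}


def dependency_commands(variant):
    """Return the pip commands for 'cpu' or 'cu118' (hypothetical helper)."""
    if variant not in INDEXES:
        raise ValueError(f"unsupported variant {variant!r}; expected one of {sorted(INDEXES)}")
    torch_index, dgl_repo = INDEXES[variant]
    return [
        f"pip install {TORCH} --index-url {torch_index}",
        f"pip install {PYG}",
        f"pip install {DGL} -f {dgl_repo}",
    ]


if __name__ == "__main__":
    for line in dependency_commands("cpu"):
        print(line)
```

Rejecting unknown variants up front mirrors the two options install.sh accepts, rather than silently producing a command for an index that may not exist.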

Step2. Install DANCE

Install from PyPI

pip install pydance

Or, install the latest dev version from source

git clone https://github.com/OmicsML/dance.git && cd dance
pip install -e .
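After either install route, a quick sanity check is to ask which distributions are present. This sketch uses only the standard library `importlib.metadata`; the distribution names (pydance, torch, torch_geometric, dgl) come from the install commands above, and `installed_versions` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as installed by the steps above.
PACKAGES = ("pydance", "torch", "torch_geometric", "dgl")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:>16}: {ver or 'not installed'}")
```

Reading package metadata avoids importing torch or DGL, so the check still works when a CUDA library fails to load.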

Implemented Algorithms

P1: not covered in the first release

Single Modality Module

1) Imputation

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | GraphSCI | Imputing Single-cell RNA-seq data by combining Graph Convolution and Autoencoder Neural Networks | 2021 | |
| GNN | scGNN (2020) | SCGNN: scRNA-seq Dropout Imputation via Induced Hierarchical Cell Similarity Graph | 2020 | P1 |
| GNN | scGNN (2021) | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | |
| GNN | GNNImpute | An efficient scRNA-seq dropout imputation method using graph attention network | 2021 | P1 |
| Graph Diffusion | MAGIC | MAGIC: A diffusion-based imputation method reveals gene-gene interactions in single-cell RNA-sequencing data | 2018 | P1 |
| Probabilistic Model | scImpute | An accurate and robust imputation method scImpute for single-cell RNA-seq data | 2018 | P1 |
| GAN | scGAIN | scGAIN: Single Cell RNA-seq Data Imputation using Generative Adversarial Networks | 2019 | P1 |
| NN | DeepImpute | DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data | 2019 | |
| NN + TF | Saver-X | Transfer learning in single-cell transcriptomics improves data denoising and pattern discovery | 2019 | P1 |

| Model | Evaluation Metric | Mouse Brain (current/reported) | Mouse Embryo (current/reported) | PBMC (current/reported) |
|---|---|---|---|---|
| DeepImpute | RMSE | 0.87 / N/A | 1.20 / N/A | 2.30 / N/A |
| GraphSCI | RMSE | 1.55 / N/A | 1.81 / N/A | 3.68 / N/A |
| scGNN2.0 | MSE | 1.04 / N/A | 1.12 / N/A | 1.22 / N/A |

Note: scGNN2.0 is evaluated on 2,000 genes with highest variance following the original paper.
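As a rough illustration of the note above, this sketch selects the 2,000 highest-variance genes and computes MSE and RMSE between a reference matrix and an imputed matrix. It assumes dense cells × genes NumPy arrays and plain per-entry errors; DANCE's own evaluation (for example, which entries are masked or held out) may differ, and the helper names are hypothetical.

```python
import numpy as np


def top_variance_genes(X, n_top=2000):
    """Column indices of the n_top highest-variance genes in a cells x genes matrix."""
    variances = X.var(axis=0)
    n_top = min(n_top, X.shape[1])
    # argsort is ascending, so take the last n_top indices and reverse them.
    return np.argsort(variances)[-n_top:][::-1]


def mse(y_true, y_pred):
    """Mean squared error over all entries."""
    return float(np.mean((y_true - y_pred) ** 2))


def rmse(y_true, y_pred):
    """Root mean squared error over all entries."""
    return float(np.sqrt(mse(y_true, y_pred)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = rng.poisson(2.0, size=(100, 3000)).astype(float)
    imputed = truth + rng.normal(0.0, 0.5, size=truth.shape)  # synthetic "imputation"
    genes = top_variance_genes(truth, n_top=2000)
    print("MSE on top-variance genes:", mse(truth[:, genes], imputed[:, genes]))
    print("RMSE on all genes:", rmse(truth, imputed))
```

Note that MSE and RMSE are on different scales (RMSE is the square root of MSE), so the scGNN2.0 row is not directly comparable to the RMSE rows.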

2) Cell Type Annotation

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScDeepsort | Single-cell transcriptomics with weighted GNN | 2021 | |
| Logistic Regression | Celltypist | Cross-tissue immune cell analysis reveals tissue-specific features in humans. | 2021 | |
| Random Forest | singleCellNet | SingleCellNet: a computational tool to classify single cell RNA-Seq data across platforms and across species. | 2019 | |
| Neural Network | ACTINN | ACTINN: automated identification of cell types in single cell RNA sequencing. | 2020 | |
| Hierarchical Clustering | SingleR | Reference-based analysis of lung single-cell sequencing reveals a transitional profibrotic macrophage. | 2019 | P1 |
| SVM | SVM | A comparison of automatic cell identification methods for single-cell RNA sequencing data. | 2018 | |

| Model | Evaluation Metric | Mouse Brain 2695 (current/reported) | Mouse Spleen 1759 (current/reported) | Mouse Kidney 203 (current/reported) |
|---|---|---|---|---|
| scDeepsort | ACC | 0.542 / 0.363 | 0.969 / 0.965 | 0.847 / 0.911 |
| Celltypist | ACC | 0.824 / 0.666 | 0.908 / 0.848 | 0.823 / 0.832 |
| singleCellNet | ACC | 0.693 / 0.803 | 0.975 / 0.975 | 0.795 / 0.842 |
| ACTINN | ACC | 0.727 / 0.778 | 0.657 / 0.236 | 0.762 / 0.798 |
| SVM | ACC | 0.683 / 0.683 | 0.056 / 0.049 | 0.704 / 0.695 |
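ACC in the table above is classification accuracy: the fraction of test cells whose predicted cell type matches the reference annotation. A minimal sketch, assuming predictions and annotations are equal-length label sequences (`accuracy` is a hypothetical helper, not DANCE's scoring function):

```python
import numpy as np


def accuracy(y_true, y_pred):
    """Fraction of cells whose predicted cell type equals the reference label."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same length")
    return float(np.mean(y_true == y_pred))


if __name__ == "__main__":
    truth = ["Neuron", "Astrocyte", "Neuron", "Microglia"]
    pred = ["Neuron", "Neuron", "Neuron", "Microglia"]
    print(accuracy(truth, pred))  # 0.75 (3 of 4 cells correct)
```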

3) Clustering

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | graph-sc | GNN-based embedding for clustering scRNA-seq data | 2022 | |
| GNN | scTAG | ZINB-based Graph Embedding Autoencoder for Single-cell RNA-seq Interpretations | 2022 | |
| GNN | scDSC | Deep structural clustering for single-cell RNA-seq data jointly through autoencoder and graph neural network | 2022 | |
| GNN | scGAC | scGAC: a graph attentional architecture for clustering single-cell RNA-seq data | 2022 | P1 |
| AutoEncoder | scDeepCluster | Clustering single-cell RNA-seq data with a model-based deep learning approach | 2019 | |
| AutoEncoder | scDCC | Model-based deep embedding for constrained clustering analysis of single cell RNA-seq data | 2021 | |
| AutoEncoder | scziDesk | Deep soft K-means clustering with self-training for single-cell RNA sequence data | 2020 | P1 |

| Model | Evaluation Metric | 10x PBMC (current/reported) | Mouse ES (current/reported) | Worm Neuron (current/reported) | Mouse Bladder (current/reported) |
|---|---|---|---|---|---|
| graph-sc | ARI | 0.72 / 0.70 | 0.82 / 0.78 | 0.57 / 0.46 | 0.68 / 0.63 |
| scDCC | ARI | 0.82 / 0.81 | 0.98 / N/A | 0.51 / 0.58 | 0.60 / 0.66 |
| scDeepCluster | ARI | 0.81 / 0.78 | 0.98 / 0.97 | 0.51 / 0.52 | 0.56 / 0.58 |
| scDSC | ARI | 0.72 / 0.78 | 0.84 / N/A | 0.46 / 0.65 | 0.65 / 0.72 |
| scTAG | ARI | 0.77 / N/A | 0.96 / N/A | 0.49 / N/A | 0.69 / N/A |
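ARI (adjusted Rand index), used here and in the joint embedding and spatial domain tables, compares predicted clusters with reference labels. It is invariant to relabeling, equals 1 for identical partitions, and is close to 0 in expectation for random ones. Below is a self-contained sketch of the standard Hubert and Arabie formula; `sklearn.metrics.adjusted_rand_score` computes the same quantity, and we assume (not confirmed here) that DANCE's ARI follows this standard definition.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index computed from the contingency table of two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    pair_counts = Counter(zip(labels_true, labels_pred))  # n_ij
    a = Counter(labels_true)  # row sums a_i
    b = Counter(labels_pred)  # column sums b_j

    sum_ij = sum(comb(c, 2) for c in pair_counts.values())
    sum_a = sum(comb(c, 2) for c in a.values())
    sum_b = sum(comb(c, 2) for c in b.values())
    total = comb(n, 2)

    expected = sum_a * sum_b / total if total else 0.0
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (e.g. a single cluster or all singletons on both sides).
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


if __name__ == "__main__":
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition, permuted labels)
```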

Multimodality Module

1) Modality Prediction

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | |
| GNN | ScMoLP | Link Prediction Variant of ScMoGCN | 2022 | P1 |
| GNN | GRAPE | Handling Missing Data with Graph Representation Learning | 2020 | P1 |
| Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | |
| Auto-encoder | Cross-modal autoencoders | Multi-domain translation between single-cell imaging and sequencing data using autoencoders | 2021 | |
| Auto-encoder | BABEL | BABEL enables cross-modality translation between multiomic profiles at single-cell resolution | 2021 | |

| Model | Evaluation Metric | GEX2ADT (current/reported) | ADT2GEX (current/reported) | GEX2ATAC (current/reported) | ATAC2GEX (current/reported) |
|---|---|---|---|---|---|
| ScMoGCN | RMSE | 0.3885 / 0.3885 | 0.3242 / 0.3242 | 0.1778 / 0.1778 | 0.2315 / 0.2315 |
| SCMM | RMSE | 0.6264 / N/A | 0.4458 / N/A | 0.2163 / N/A | 0.3730 / N/A |
| Cross-modal autoencoders | RMSE | 0.5725 / N/A | 0.3585 / N/A | 0.1917 / N/A | 0.2551 / N/A |
| BABEL | RMSE | 0.4335 / N/A | 0.3673 / N/A | 0.1816 / N/A | 0.2394 / N/A |

2) Modality Matching

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | |
| GNN/Auto-encoder | GLUE | Multi-omics single-cell data integration and regulatory inference with graph-linked embedding | 2021 | P1 |
| Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | |
| Auto-encoder | Cross-modal autoencoders | Multi-domain translation between single-cell imaging and sequencing data using autoencoders | 2021 | |

| Model | Evaluation Metric | GEX2ADT (current/reported) | GEX2ATAC (current/reported) |
|---|---|---|---|
| ScMoGCN | Accuracy | 0.0827 / 0.0810 | 0.0600 / 0.0630 |
| SCMM | Accuracy | 0.005 / N/A | 5e-5 / N/A |
| Cross-modal autoencoders | Accuracy | 0.0002 / N/A | 0.0002 / N/A |

3) Joint Embedding

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | |
| Auto-encoder | scMVAE | Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data | 2020 | |
| Auto-encoder | scDEC | Simultaneous deep generative modelling and clustering of single-cell genomic data | 2021 | |
| GNN/Auto-encoder | GLUE | Multi-omics single-cell data integration and regulatory inference with graph-linked embedding | 2021 | P1 |
| Auto-encoder | DCCA | Deep cross-omics cycle attention model for joint analysis of single-cell multi-omics data | 2021 | |

| Model | Evaluation Metric | GEX2ADT (current/reported) | GEX2ATAC (current/reported) |
|---|---|---|---|
| ScMoGCN | ARI | 0.706 / N/A | 0.702 / N/A |
| ScMoGCNv2 | ARI | 0.734 / N/A | N/A / N/A |
| scMVAE | ARI | 0.499 / N/A | 0.577 / N/A |
| scDEC(JAE) | ARI | 0.705 / N/A | 0.735 / N/A |
| DCCA | ARI | 0.35 / N/A | 0.381 / N/A |

4) Multimodal Imputation

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoLP | Link Prediction Variant of ScMoGCN | 2022 | P1 |
| GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses | 2021 | P1 |
| GNN | GRAPE | Handling Missing Data with Graph Representation Learning | 2020 | P1 |

5) Multimodal Integration

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | ScMoGCN | Graph Neural Networks for Multimodal Single-Cell Data Integration | 2022 | P1 |
| GNN | scGNN | scGNN is a novel graph neural network framework for single-cell RNA-Seq analyses (GCN on Nearest Neighbor graph) | 2021 | P1 |
| Nearest Neighbor | WNN | Integrated analysis of multimodal single-cell data | 2021 | P1 |
| GAN | MAGAN | MAGAN: Aligning Biological Manifolds | 2018 | P1 |
| Auto-encoder | SCIM | SCIM: universal single-cell matching with unpaired feature sets | 2020 | P1 |
| Auto-encoder | MultiMAP | MultiMAP: Dimensionality Reduction and Integration of Multimodal Data | 2021 | P1 |
| Generative Model | SCMM | SCMM: MIXTURE-OF-EXPERTS MULTIMODAL DEEP GENERATIVE MODEL FOR SINGLE-CELL MULTIOMICS DATA ANALYSIS | 2021 | P1 |

Spatial Module

1) Spatial Domain

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | SpaGCN | SpaGCN: Integrating gene expression, spatial location and histology to identify spatial domains and spatially variable genes by graph convolutional network | 2021 | |
| GNN | STAGATE | Deciphering spatial domains from spatially resolved transcriptomics with adaptive graph attention auto-encoder | 2021 | |
| Bayesian | BayesSpace | Spatial transcriptomics at subspot resolution with BayesSpace | 2021 | P1 |
| Pseudo-space-time (PST) Distance | stLearn | stLearn: integrating spatial location, tissue morphology and gene expression to find cell types, cell-cell interactions and spatial trajectories within undissociated tissues | 2020 | |
| Heuristic | Louvain | Fast unfolding of communities in large networks | 2008 | |

| Model | Evaluation Metric | 151673 (current/reported) | 151676 (current/reported) | 151507 (current/reported) |
|---|---|---|---|---|
| SpaGCN | ARI | 0.51 / 0.522 | 0.41 / N/A | 0.45 / N/A |
| STAGATE | ARI | 0.59 / N/A | 0.60 / 0.60 | 0.608 / N/A |
| stLearn | ARI | 0.30 / 0.36 | 0.29 / N/A | 0.31 / N/A |
| Louvain | ARI | 0.31 / 0.33 | 0.2528 / N/A | 0.28 / N/A |

2) Cell Type Deconvolution

| Backbone | Model | Algorithm | Year | CheckIn |
|---|---|---|---|---|
| GNN | DSTG | DSTG: deconvoluting spatial transcriptomics data through graph-based artificial intelligence | 2021 | |
| logNormReg | SpatialDecon | Advances in mixed cell deconvolution enable quantification of cell types in spatial transcriptomic data | 2022 | |
| NNMFreg | SPOTlight | SPOTlight: seeded NMF regression to deconvolute spatial transcriptomics spots with single-cell transcriptomes | 2021 | |
| NN Linear + CAR assumption | CARD | Spatially informed cell-type deconvolution for spatial transcriptomics | 2022 | |

| Model | Evaluation Metric | GSE174746 (current/reported) | CARD Synthetic (current/reported) | SPOTlight Synthetic (current/reported) |
|---|---|---|---|---|
| DSTG | MSE | 0.1722 / N/A | 0.0239 / N/A | 0.0315 / N/A |
| SpatialDecon | MSE | 0.0014 / 0.009 | 0.0077 / N/A | 0.0055 / N/A |
| SPOTlight | MSE | 0.0098 / N/A | 0.0246 / 0.118 | 0.0109 / 0.16 |
| CARD | MSE | 0.0012 / N/A | 0.0078 / 0.0062 | 0.0076 / N/A |

dance's People

Contributors

adiark, dependabot[bot], helloworldlty, jdevenegas, jiayuanding100, pre-commit-ci[bot], remylau, szhorvat, wehos, wenzhuotang, xingzhongyu

dance's Issues

Random `segfault` when running pipeline tuning with `wandb` on certain machines

When running the tuning examples recently introduced in #398 (and #406), there appears to be a random chance of hitting a segfault. The issue was later found to be machine-specific: I have only been getting this random segfault on MSU ICER HPCC (Python 3.8.16). Running the same example script on papermachine does not throw this segfault.

Looking at the core dump file (using pystack), the issue appears to be related to Python's threading, more specifically to calls into sklearn's randomized SVD function (and possibly similar functions in other packages). See the detailed core dump log below.

(dance) bash-4.2$ pystack core core.113134
Using executable found in the core file: /mnt/home/liurenmi/software/anaconda3/envs/dance/bin/python

Core file information:
state: D zombie: True niceness: 0
pid: 113134 ppid: 112816 sid: 112816
uid: 790872 gid: 2362 pgrp: 113134
executable: python arguments: python main.py

The process died due a segmentation fault accessing address: 0xffffffffffffff70
Traceback for thread 114928 [] (most recent call last):
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 890, in _bootstrap
        self._bootstrap_inner()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 300, in check_internal_messages
        self._loop_check_status(
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 251, in _loop_check_status
        join_requested = self._join_event.wait(timeout=wait_time)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 558, in wait
        signaled = self._cond.wait(timeout)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 306, in wait
        gotit = waiter.acquire(True, timeout)

Traceback for thread 114927 [] (most recent call last):
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 890, in _bootstrap
        self._bootstrap_inner()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 268, in check_network_status
        self._loop_check_status(
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 251, in _loop_check_status
        join_requested = self._join_event.wait(timeout=wait_time)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 558, in wait
        signaled = self._cond.wait(timeout)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 306, in wait
        gotit = waiter.acquire(True, timeout)

Traceback for thread 114926 [] (most recent call last):
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 890, in _bootstrap
        self._bootstrap_inner()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 286, in check_stop_status
        self._loop_check_status(
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/wandb_run.py", line 251, in _loop_check_status
        join_requested = self._join_event.wait(timeout=wait_time)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 558, in wait
        signaled = self._cond.wait(timeout)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 306, in wait
        gotit = waiter.acquire(True, timeout)

Traceback for thread 114874 [] (most recent call last):
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 890, in _bootstrap
        self._bootstrap_inner()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/interface/router.py", line 70, in message_loop
        msg = self._read_message()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/interface/router_sock.py", line 27, in _read_message
        resp = self._sock_client.read_server_response(timeout=1)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 285, in read_server_response
        data = self._read_packet_bytes(timeout=timeout)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/sdk/lib/sock_client.py", line 255, in _read_packet_bytes
        data = self._sock.recv(self._bufsize)

Traceback for thread 114845 [Has the GIL] (most recent call last):
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 890, in _bootstrap
        self._bootstrap_inner()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/agents/pyagent.py", line 298, in _run_job
        self._function()
    (Python) File "main.py", line 83, in evaluate_pipeline
        preprocessing_pipeline(data)
    (Python) File "/mnt/ufs18/home-026/liurenmi/repo/dance/dance/pipeline.py", line 238, in __call__
        func(*args, **kwargs)
    (Python) File "/mnt/ufs18/home-026/liurenmi/repo/dance/dance/transforms/cell_feature.py", line 56, in __call__
        gene_feat = gene_pca.fit_transform(feat.T)  # decompose into gene features
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
        data_to_wrap = f(self, X, *args, **kwargs)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/sklearn/base.py", line 1152, in wrapper
        return fit_method(estimator, *args, **kwargs)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 460, in fit_transform
        U, S, Vt = self._fit(X)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 512, in _fit
        return self._fit_truncated(X, n_components, self._fit_svd_solver)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/sklearn/decomposition/_pca.py", line 616, in _fit_truncated
        U, S, Vt = randomized_svd(
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/sklearn/utils/extmath.py", line 449, in randomized_svd
        Q = randomized_range_finder(
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/sklearn/utils/extmath.py", line 277, in randomized_range_finder
        Q, _ = linalg.lu(safe_sparse_dot(A, Q), permute_l=True)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/scipy/linalg/_decomp_lu.py", line 220, in lu
        p, l, u, info = flu(a1, permute_l=permute_l, overwrite_a=overwrite_a)

Traceback for thread 114844 [] (most recent call last):
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 890, in _bootstrap
        self._bootstrap_inner()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 932, in _bootstrap_inner
        self.run()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 870, in run
        self._target(*self._args, **self._kwargs)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/agents/pyagent.py", line 178, in _heartbeat
        time.sleep(5)

Traceback for thread 113134 [] (most recent call last):
    (Python) File "main.py", line 108, in <module>
        wandb.agent(sweep_id, function=evaluate_pipeline, count=3)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/wandb_agent.py", line 581, in agent
        return pyagent(sweep_id, function, entity, project, count)
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/agents/pyagent.py", line 348, in pyagent
        agent.run()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/agents/pyagent.py", line 326, in run
        self._run_jobs_from_queue()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/site-packages/wandb/agents/pyagent.py", line 220, in _run_jobs_from_queue
        thread.join()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 1011, in join
        self._wait_for_tstate_lock()
    (Python) File "/mnt/home/liurenmi/software/anaconda3/envs/dance/lib/python3.8/threading.py", line 1027, in _wait_for_tstate_lock
        elif lock.acquire(block, timeout):

More sysinfo below.

Machine that failed:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Machine that did not fail:

NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Skipping for now but might come back later to fix this issue if it appears to be happening to more users other than myself.
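If the crash does come from native linear-algebra code running across several threads, one commonly tried mitigation (untested for this particular segfault) is to cap the BLAS/OpenMP thread pools before numpy, scipy, or sklearn are first imported; threadpoolctl's `threadpool_limits` is a runtime alternative. The hypothetical helper below only sets the standard environment variables read by those backends.

```python
import os

# Environment variables read by common BLAS/OpenMP backends at initialization.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
)


def limit_native_threads(n_threads=1):
    """Cap native thread pools; only effective if called before numpy/scipy/sklearn load."""
    for var in THREAD_ENV_VARS:
        os.environ[var] = str(n_threads)


if __name__ == "__main__":
    limit_native_threads(1)
    # Import numpy, sklearn, wandb, etc. only after this point.
    print({var: os.environ[var] for var in THREAD_ENV_VARS})
```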

Running scmogcn.py example - fatal error

Hello,

I am trying to reproduce the scmogcn.py example from the DANCE documentation. I ran this script:
[Screenshot 2023-04-19 at 6:47:49 PM: the job script]

in the same folder where scmogcn.py is located. I'm using MSU's HPCC and a conda environment. After submitting the job, my Slurm output file shows this error: "dgl._ffi.base.DGLError: [18:00:11] /opt/dgl/src/random/random.cc:36: Check failed: e == CURAND_STATUS_SUCCESS: CURAND Error: CURAND_STATUS_INITIALIZATION_FAILED at /opt/dgl/src/random/random.cc:36"

Do you know what I might be doing wrong or how I can fix this error so that I can successfully reproduce this example?

Some questions about general wrapper for datasets

Hi, I intend to apply this model to datasets other than the competition datasets, and I wonder whether you provide a general data-loading structure for public datasets. Moreover, is it possible to use a lighter structure than the joint embedding structure if I have already processed the dataset? Thanks.

Release date

This work is quite interesting. Do you have a planned release date for the code?

pip install pydance

When running pip install pydance in a local environment on an M1 MacBook Air, the following exception is raised. I believe the issue stems from "tables":

[Screenshot 2023-04-21 at 3:41:29 PM: the exception raised during installation]

ImportError when directly running the code in the terminal

(base) [zhan2210@gateway-03 ~]$ conda activate dance-env
(dance-env) [zhan2210@gateway-03 ~]$ export LD_LIBRARY_PATH=/mnt/home/zhan2210/ENTER/lib: $LD_LIBRARY_PATH
(dance-env) [zhan2210@gateway-03 ~]$ python
Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:53:32) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from dance.datasets.multimodality import JointEmbeddingNIPSDataset
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/mnt/ufs18/home-249/zhan2210/dance/dance/datasets/__init__.py", line 1, in <module>
    from dance.datasets.multimodality import JointEmbeddingNIPSDataset, ModalityMatchingDataset, ModalityPredictionDataset
  File "/mnt/ufs18/home-249/zhan2210/dance/dance/datasets/multimodality.py", line 12, in <module>
    from dance import logger
ImportError: cannot import name 'logger' from 'dance' (unknown location)
>>>
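Python reports "(unknown location)" when the imported module has no file of its own, which is typical of a namespace package: a directory named dance without an `__init__.py` found on `sys.path`, such as a repository checkout in the current working directory (here, Python was started from the home directory that contains dance/). Whether that is the cause here is an assumption. The hypothetical helper below reports how a name resolves without importing it.

```python
import importlib.util


def describe_package(name):
    """Report where a top-level module/package `name` would be imported from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return f"{name}: not found on sys.path"
    if spec.origin is None:
        # Namespace packages have no __init__.py, so they have no origin file.
        locations = list(spec.submodule_search_locations or [])
        return f"{name}: namespace package (no __init__.py) spanning {locations}"
    return f"{name}: regular module/package at {spec.origin}"


if __name__ == "__main__":
    print(describe_package("dance"))
```

If this reports a namespace package, starting Python from a different directory (or running `python -c` from inside the repository root) is a simple way to test the hypothesis.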

Convention for tensor device

Problem

Currently, there is no convention for which device the data are stored on when a function, e.g., fit(), is called. For example, even when the computation device is cuda, the input graph may not be on the GPU and will need to be transferred later, after subsampling the neighborhoods for mini-batch training. This inconsistency causes many development issues.

One naive solution is to call tensor.to(device) with the correct device every time a computation is performed, which is unsatisfactory and makes the code base less clean.

Solution

  • By default, all data should reside on the CPU.
  • When a compute function, e.g., fit(), is called, perform any necessary device conversion inside that function.
  • The only exceptions are predict() and score(), which can be more flexible since they are used in various places with various configurations, e.g., training on the GPU with mini-batches and evaluating on the CPU with the full batch.
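A minimal sketch of the proposed convention, assuming a PyTorch model. DeviceConventionModel and its method signatures are hypothetical illustrations, not DANCE's actual API. Inputs stay on the CPU, fit() moves the model and data to self.device internally, and predict() accepts an optional device so evaluation can run elsewhere (e.g., full-batch on the CPU after GPU training):

```python
import torch
from torch import nn


class DeviceConventionModel(nn.Module):
    """Hypothetical model illustrating the 'data on CPU, convert inside fit()' rule."""

    def __init__(self, in_dim: int, out_dim: int, device: str = "cpu"):
        super().__init__()
        self.device = torch.device(device)
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)

    def fit(self, x: torch.Tensor, y: torch.Tensor, epochs: int = 10, lr: float = 0.1) -> None:
        # Callers pass CPU tensors; all device conversion happens here.
        self.to(self.device)
        x, y = x.to(self.device), y.to(self.device)
        optimizer = torch.optim.SGD(self.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        self.train()
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = loss_fn(self(x), y)
            loss.backward()
            optimizer.step()

    @torch.no_grad()
    def predict(self, x: torch.Tensor, device=None) -> torch.Tensor:
        # Flexible exception: the caller may choose the evaluation device.
        # Note that this moves the module itself to that device.
        device = self.device if device is None else torch.device(device)
        self.to(device)
        self.eval()
        # Results are returned on the CPU, matching the default convention.
        return self(x.to(device)).argmax(dim=1).cpu()
```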

PR #30 is an example of fixing this issue.

Need to check

grep "\.to(" -r dance | awk -F":" '{print $1}' | sort -u
  • dance/datasets/multimodality.py
  • dance/modules/multi_modality/joint_embedding/dcca.py
  • dance/modules/multi_modality/joint_embedding/jae.py
  • dance/modules/multi_modality/joint_embedding/scmogcn.py
  • dance/modules/multi_modality/joint_embedding/scmogcnv2.py
  • dance/modules/multi_modality/joint_embedding/scmvae.py
  • dance/modules/multi_modality/match_modality/scmm.py
  • dance/modules/multi_modality/match_modality/scmogcn.py
  • dance/modules/multi_modality/predict_modality/babel.py
  • dance/modules/multi_modality/predict_modality/scmm.py
  • dance/modules/multi_modality/predict_modality/scmogcn.py
  • dance/modules/single_modality/cell_type_annotation/actinn.py
  • dance/modules/single_modality/cell_type_annotation/scdeepsort.py
  • dance/modules/single_modality/clustering/graphsc.py
  • dance/modules/single_modality/clustering/scdcc.py
  • dance/modules/single_modality/clustering/scdeepcluster.py
  • dance/modules/single_modality/clustering/scdsc.py
  • dance/modules/single_modality/clustering/sctag.py
  • dance/modules/single_modality/imputation/deepimpute.py
  • dance/modules/single_modality/imputation/graphsci.py
  • dance/modules/single_modality/imputation/scgnn.py
  • dance/modules/spatial/cell_type_deconvo/dstg.py
  • dance/modules/spatial/cell_type_deconvo/spatialdecon.py
  • dance/modules/spatial/cell_type_deconvo/spotlight.py
  • dance/modules/spatial/spatial_domain/spagcn.py
  • dance/modules/spatial/spatial_domain/stagate.py
  • dance/transforms/graph_construct.py
  • dance/transforms/preprocess.py
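Roughly the same search as the grep command above can be scripted in Python, e.g., to rerun it as a check. This sketch is restricted to .py files (unlike grep -r, which scans every file), and files_calling_to is a hypothetical helper name:

```python
from pathlib import Path


def files_calling_to(root: str, pattern: str = ".to(") -> list:
    """Return sorted, de-duplicated paths of .py files under root containing pattern."""
    matches = set()
    for path in Path(root).rglob("*.py"):
        # Plain substring match, equivalent to grep "\.to(" (a literal ".to(").
        if path.is_file() and pattern in path.read_text(encoding="utf-8", errors="ignore"):
            matches.add(path.as_posix())
    return sorted(matches)


if __name__ == "__main__":
    for name in files_calling_to("dance"):
        print(name)
```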

DSTG link graph add connections between real spots?

According to the description of the link graph construction in the paper, the graph should also contain interactions between real spots.


However, in the current implementation (modified from the original implementation, see Su-informatics-lab/DSTG#16), only the cross interactions between pseudo-spots and real-spots are used.

graph = construct_link_graph(pseudo_st_df, real_st_df, k_filter, num_cc)

Should we add the interactions between real spots, as described in the manuscript, to the DANCE implementation? If so, this could be achieved easily by calling construct_link_graph once more with real_st_df passed as both of the first two arguments and combining the resulting edge list with the existing graph.

Caution: the node indices of the new graph may need to be adjusted (alternatively, pass real_st_df and a copy of real_st_df with offset indices).
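If the real-spot interactions are added, the merge step might look like the sketch below. It assumes both edge lists use node indices local to their own dataframe (pseudo spots and real spots each numbered from 0), which may not match what construct_link_graph actually returns; combine_link_edges is a hypothetical helper. Real-spot indices are shifted by the number of pseudo spots so the two node sets do not collide, which is the index adjustment the caution refers to:

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def combine_link_edges(cross_edges: Iterable[Edge], real_edges: Iterable[Edge], n_pseudo: int) -> List[Edge]:
    """Merge pseudo-real and real-real edges into one node index space.

    Pseudo spots keep indices 0..n_pseudo-1; real spots are shifted by n_pseudo.
    Edges are treated as undirected, so (i, j) and (j, i) are stored once.
    """
    combined = set()
    for pseudo_idx, real_idx in cross_edges:
        combined.add(tuple(sorted((pseudo_idx, real_idx + n_pseudo))))
    for real_i, real_j in real_edges:
        if real_i != real_j:  # drop self-loops from the real-real graph
            combined.add(tuple(sorted((real_i + n_pseudo, real_j + n_pseudo))))
    return sorted(combined)
```

For example, with two pseudo spots, cross edges [(0, 0), (1, 1)] and real-real edges [(0, 1), (1, 0)] merge into [(0, 2), (1, 3), (2, 3)].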

Where can I find documentation?

I don't see any place with documentation for how I actually go about using the models in the package. Does any documentation exist?

__init__.py missing from dance/metadata

I work in an HPC environment where conda installation is not possible, and pip installation results in a ModuleNotFoundError for dance.metadata. The directory appears to be missing its __init__.py file. Cloning the repo, adding the file, and pip installing from the local clone worked for me.
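A quick way to spot such directories before packaging, sketched with a hypothetical helper. If setup.py relies on setuptools' find_packages() (an assumption about this repo), any directory without an __init__.py is silently left out of the installed package, which would match the symptom above. Data-only folders may legitimately lack the file, so the output is a list of candidates to review:

```python
from pathlib import Path


def dirs_missing_init(root: str) -> list:
    """List directories under root (including root) that have no __init__.py."""
    root_path = Path(root)
    candidates = [root_path] + [p for p in root_path.rglob("*") if p.is_dir()]
    missing = []
    for directory in sorted(candidates):
        rel_parts = directory.relative_to(root_path).parts
        # Skip bytecode caches and hidden directories.
        if directory.name == "__pycache__" or any(part.startswith(".") for part in rel_parts):
            continue
        if not (directory / "__init__.py").is_file():
            missing.append(directory.as_posix())
    return missing


if __name__ == "__main__":
    print(dirs_missing_init("dance"))
```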

Config support

Use an omegaconf DictConfig object to store and manage configurations, and update the construction of preprocessing pipelines to use config parsing. The config should cover:

  1. Data selection
  2. Preprocessing pipelines
  3. Model params and pipelines
  4. Eval pipelines
  5. (Additional) Results report generation

Add random_state option to PCA transforms

It turns out that the random seed of the SVD solver impacts the PCA loadings (except for the top few) quite a bit, even when the tol option is set to 0. To enable reproducibility, set the random_state option.
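A sketch of the fix, assuming the transform wraps scikit-learn's PCA (an assumption about DANCE's internals; pca_transform is a hypothetical name). With svd_solver="arpack", tol is the solver tolerance, and random_state is used by the "arpack" and "randomized" solvers, so fixing it makes repeated runs agree:

```python
import numpy as np
from sklearn.decomposition import PCA


def pca_transform(x: np.ndarray, n_components: int, random_state=0) -> np.ndarray:
    """Hypothetical PCA transform that exposes random_state for reproducibility."""
    pca = PCA(n_components=n_components, svd_solver="arpack", tol=0.0, random_state=random_state)
    return pca.fit_transform(x)
```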

Make utils for generating `dgl` or `pyg` graphs from processed data

Some current graph transform methods save the processed dgl graphs directly in .uns. The limitations of doing so are:

  • Limits the choice of the framework (dgl vs. pyg) to use in the downstream model.
  • .uns is not a good place to store large data

Solution

  • Create utils that generate dgl or pyg graphs given raw feature, edge data, or adjacency matrix.
  • Pass the raw data to the method's fit function and construct the graph using the appropriate framework within the fit function.
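One possible shape for these utils: keep the processed data framework-agnostic as (src, dst) index arrays and build the dgl graph or PyG Data object only inside fit(). The helper names are hypothetical. dgl.graph((src, dst), num_nodes=...) and torch_geometric.data.Data(x=..., edge_index=...) are the standard constructors of each framework; they are imported lazily so neither is a hard dependency, and the "feat" node-data key is an arbitrary choice:

```python
import numpy as np


def adjacency_to_edges(adj: np.ndarray):
    """Convert a dense adjacency matrix into (src, dst) index arrays."""
    src, dst = np.nonzero(adj)
    return src.astype(np.int64), dst.astype(np.int64)


def to_dgl_graph(src, dst, num_nodes, features=None):
    import dgl  # lazy import: dgl is only needed by dgl-based models
    import torch

    g = dgl.graph((torch.as_tensor(src), torch.as_tensor(dst)), num_nodes=num_nodes)
    if features is not None:
        g.ndata["feat"] = torch.as_tensor(features, dtype=torch.float32)
    return g


def to_pyg_data(src, dst, features=None):
    import torch  # lazy import: torch_geometric is only needed by PyG-based models
    from torch_geometric.data import Data

    edge_index = torch.as_tensor(np.stack([src, dst]), dtype=torch.long)
    x = None if features is None else torch.as_tensor(features, dtype=torch.float32)
    return Data(x=x, edge_index=edge_index)
```

A sparse adjacency matrix could be handled the same way via its COO row and column arrays.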

GraphSCI implementation

For the imputation model GraphSCI, I think the AE part is wrong. The size_factor is cell-specific, so there should be one size factor per cell, but in your implementation the size factors are gene-specific. The input of the AE should be the transpose of the input of the GNN, so its shape should be cell x gene.
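To make the expected shapes concrete, here is a sketch assuming the common library-size normalization used by count autoencoders such as DCA (a cell's total counts divided by the median total count). This is not necessarily GraphSCI's exact formula, and cell_size_factors is a hypothetical name:

```python
import numpy as np


def cell_size_factors(counts: np.ndarray) -> np.ndarray:
    """Return one size factor per cell for a (n_cells, n_genes) count matrix."""
    library_size = counts.sum(axis=1)  # total counts per cell, shape (n_cells,)
    return library_size / np.median(library_size)


counts = np.array([[1, 1], [2, 2], [3, 3]], dtype=float)  # 3 cells x 2 genes
size_factors = cell_size_factors(counts)  # shape (3,): one value per cell
ae_input = counts  # AE input: cells x genes
gnn_input = counts.T  # GNN input: genes x cells (the transpose)
```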

Replace model specific score functions with generic ones?

Currently, each model comes with a .score function for evaluating the model performance. Most of them do the same thing, e.g., compute model accuracy for cell-type annotation tasks. Why not just have one generic function that calculates the accuracy (and similarly for other tasks)?
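A generic version could be a small metric registry keyed by name, with an optional callable for custom metrics. The names below are hypothetical, sketched in pure Python:

```python
from typing import Callable, Dict, Sequence, Union


def accuracy(y_true: Sequence, y_pred: Sequence) -> float:
    """Fraction of correct predictions (e.g., cell-type annotation)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error (e.g., imputation or modality prediction)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


METRICS: Dict[str, Callable[[Sequence, Sequence], float]] = {"acc": accuracy, "mse": mse}


def score(y_true, y_pred, metric: Union[str, Callable] = "acc") -> float:
    """Evaluate predictions with a named metric or a custom callable."""
    metric_fn = METRICS[metric] if isinstance(metric, str) else metric
    return float(metric_fn(y_true, y_pred))
```

Each model's .score could then reduce to calling score(y, self.predict(x), metric) with a task-specific default.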

libcusparse.so.11

I followed the install.sh file to install dance in a conda env and am now following the tutorial. When running
"from dance.datasets.singlemodality import ClusteringDataset"
I get this error:
"OSError: libcusparse.so.11: cannot open shared object file: No such file or directory"
I don't know where the issue is coming from and would appreciate any suggestions.

Unifying base data object

Currently, there are several different dataset objects specialized for each task and model (e.g., CellTypeDataset, ClusteringDataset), each of them takes a variety of specialized arguments that are not directly related to the underlying data, e.g., save path, processing scheme, choice of tissue. This complexity makes it quite hard to maintain the code base and implement new methods/datasets.

To improve this situation, we need to isolate raw dataset objects from transformation/processing methods.

  • Base data object
    • Take AnnData as an input and save it as a private attribute (read-only?).
    • Construct data loaders that load g, x, y, etc., to be passed to the model for training/evaluation.
  • Dataset object
    • Download option
    • Transformation option
    • Dataset from paper (preprocessed) -> used to benchmark the reproducibility of the reimplemented model
  • Transformation
    • Leverage functionalities from scanpy (recall that the base data object now stores an AnnData object as a (private) attribute).
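A minimal sketch of the base data object, assuming only that the wrapped object exposes .X and .obs as AnnData does. BaseData, get_x_y, and batches are hypothetical names. Note that a property without a setter only prevents reassigning the attribute; the AnnData itself can still be modified in place:

```python
class BaseData:
    """Hypothetical base data object wrapping an AnnData-like object."""

    def __init__(self, adata):
        self._data = adata  # private attribute holding the raw data

    @property
    def data(self):
        # Read-only view: there is no setter, so `obj.data = ...` raises AttributeError.
        return self._data

    def get_x_y(self, label_key: str):
        """Return features and labels to be passed to a model."""
        return self._data.X, self._data.obs[label_key]

    def batches(self, label_key: str, batch_size: int):
        """Yield (x, y) mini-batches by row position."""
        x, y = self.get_x_y(label_key)
        for start in range(0, len(y), batch_size):
            yield x[start:start + batch_size], y[start:start + batch_size]
```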

To fix

Single modality

  • examples/single_modality/clustering/scdsc.py (#95)
  • examples/single_modality/clustering/graphsc.py (#95)
  • examples/single_modality/clustering/scdcc.py (#95)
  • examples/single_modality/clustering/sctag.py (#95)
  • examples/single_modality/clustering/scdeepcluster.py (#95)
  • examples/single_modality/imputation/graphsci.py
  • examples/single_modality/imputation/deepimpute.py
  • examples/single_modality/imputation/scgnn.py
  • examples/single_modality/cell_type_annotation/singlecellnet.py (#77)
  • examples/single_modality/cell_type_annotation/celltypist.py (#72)
  • examples/single_modality/cell_type_annotation/scdeepsort.py (#75)
  • examples/single_modality/cell_type_annotation/actinn.py (#63)
  • examples/single_modality/cell_type_annotation/svm.py (#56, #57)

Spatial

  • examples/spatial/spatial_domain/stagate.py (#127)
  • examples/spatial/spatial_domain/louvain.py (#124)
  • examples/spatial/spatial_domain/stlearn.py (#126)
  • examples/spatial/spatial_domain/spagcn.py (#83)
  • examples/spatial/cell_type_deconvo/spotlight.py (#107)
  • examples/spatial/cell_type_deconvo/dstg.py (#103)
  • examples/spatial/cell_type_deconvo/card.py (#93)
  • examples/spatial/cell_type_deconvo/spatialdecon.py (#94)

Multi modality

  • examples/multi_modality/joint_embedding/scmvae.py
  • examples/multi_modality/joint_embedding/dcca.py
  • examples/multi_modality/joint_embedding/jae.py
  • examples/multi_modality/joint_embedding/scmogcnv2.py
  • examples/multi_modality/joint_embedding/scmogcn.py
  • examples/multi_modality/match_modality/cmae.py
  • examples/multi_modality/match_modality/scmm.py
  • examples/multi_modality/match_modality/scmogcn.py
  • examples/multi_modality/predict_modality/babel.py (#89)
  • examples/multi_modality/predict_modality/cmae.py
  • examples/multi_modality/predict_modality/scmm.py
  • examples/multi_modality/predict_modality/scmogcn.py

Model object abstraction and refactoring

  • Model metadata, e.g., description, training info, params
  • Abstract methods (standardize in/out format)
    • fit(*args, **kwargs) -> None
    • predict(*args, **kwargs) -> Union[np.ndarray, torch.Tensor, other?]
    • fit_predict shortcut function combining fit and predict
    • score(*args, **kwargs) -> float (determine the type of metric via metric: Union[str, Callable], with the default setting saved as a class attribute. Abstract class per type of task?)
    • __repr__ (show model metadata and info)
  • Model saving and loading (checkpointing)?
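A sketch of the abstract interface using Python's abc module. BaseModel, its metric registry, and the toy MajorityClassifier are hypothetical illustrations of the outline above, not existing DANCE classes:

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Callable, Dict, Optional, Union


def accuracy(y_true, y_pred) -> float:
    """Fraction of positions where the prediction equals the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class BaseModel(ABC):
    """Hypothetical abstract base class following the outline above."""

    # Class-level defaults; a per-task subclass could change the default metric.
    _DEFAULT_METRIC: str = "acc"
    _METRICS: Dict[str, Callable] = {"acc": accuracy}

    @abstractmethod
    def fit(self, *args, **kwargs) -> None:
        """Train the model in place."""

    @abstractmethod
    def predict(self, *args, **kwargs):
        """Return predictions (e.g., a list, np.ndarray, or torch.Tensor)."""

    def fit_predict(self, x, y=None, **kwargs):
        # Shortcut: train on x (and y, if supervised), then predict on x.
        self.fit(x, y, **kwargs)
        return self.predict(x)

    def score(self, x, y, metric: Optional[Union[str, Callable]] = None) -> float:
        metric = self._DEFAULT_METRIC if metric is None else metric
        metric_fn = self._METRICS[metric] if isinstance(metric, str) else metric
        return float(metric_fn(y, self.predict(x)))

    def __repr__(self) -> str:
        return f"{type(self).__name__}(default_metric={self._DEFAULT_METRIC!r})"


class MajorityClassifier(BaseModel):
    """Toy concrete model: always predicts the most frequent training label."""

    def fit(self, x, y=None) -> None:
        self.label_ = Counter(y).most_common(1)[0][0]

    def predict(self, x):
        return [self.label_] * len(x)
```

Putting the default metric in a class attribute lets an abstract subclass per task type (e.g., annotation vs. imputation) change it once for all of its models.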

TODOs

  • single_modality/cell_type_annotation (#163, #164)
  • single_modality/clustering
  • single_modality/imputation
  • multi_modality/joint_embedding
  • multi_modality/match_modality
  • multi_modality/predict_modality
  • spatial/cell_type_deconvo
  • spatial/spatial_domain

Questions about conda installing

Hi, I have a question about the conda installation process.

I think we should use

conda activate dance

rather than

conda activate dance-env

Is it correct? Thanks.
