
single-cell-transformer-papers's Introduction

Transformers in Single-Cell Omics

Note 🚧 This repository is under construction. This note will disappear as soon as all the single-cell transformer paper tables are added.

This repository accompanies Transformers in Single-Cell Omics: A Review and New Perspectives. Please refer to the manuscript for the details.

We provide a curated list of single-cell transformers and their evaluation results. We skip models that work only on bulk data or on slide images, and those in which transformers are only one part of the model. Models focusing on sequential data, such as DNA or protein sequences, are omitted too. New entries are added at the top of the corresponding table.

We welcome contributions to this repository. Please open a pull request or an issue if you want to add or edit an entry.

Single-cell transformers

| Model | Paper | Code | Omic Modalities | Pre-training Dataset | Input Embedding | Architecture | SSL Tasks | Supervised Tasks | Zero-shot Tasks |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| scMulan | 📝 Bian et al. 2024 | 🔍 Github | scRNA-Seq | 10M / cross-tissue, human (hECA) | Not specified | Decoder | Conditional cell generation | cell type annotation, cell metadata annotation (both also used in training) | Batch integration |
| BioFormers | 📝 Belgadi and Li et al. 2023 | None | scRNA-Seq | 8K / single tissue, human (PBMC, Adamson et al. 2016) | Value categorization: value binning | Encoder | MLM with CE loss | None | Cell clustering, gene expression imputation, genetic perturbation effect prediction, GRN inference |
| Geneformer | 📄 (Nature) Theodoris et al. 2023 | 🛠️ 🤗 | scRNA-Seq | 36M / cross-tissue, human (Genecorpus) | Ordering: rank-based | Encoder | MLM with CE loss, gene ID prediction | Gene function prediction, cell annotation | Cell clustering, GRN inference |
| Universal Cell Embedding | 📝 Rosen et al. 2023 | 🔍 Github | scRNA-Seq | 36M / cross-tissue, cross-species (CELLxGENE and other) | Other: ESM-2 based gene embeddings. Gene embeddings are sampled according to expression levels and order determined by position on chromosomes. | Encoder | Modified MLM, binary CE loss predicting whether a gene is expressed or not. Uses CLS embedding instead of token-embeddings. | Cell annotation | Cell clustering, cross-species integration |
| scGPT | 📄 (Nature Meth) Cui et al. 2024 | 🔍 GitHub | scRNA-Seq, scATAC-Seq, CITE-Seq, Spatial transcriptomics | 33M / cross-tissue, human, non-disease (CELLxGENE) | Value categorization: value binning | Other: attention masking in encoder | Iterative MLM variant with MSE loss, cell token expression prediction, gene expression prediction | Cell type annotation, genetic perturbation effect prediction, reverse perturbation prediction, cell clustering, multimodal embedding, gene function prediction | Cell clustering, GRN inference, simulation, gene expression imputation |
| TOSICA | 📄 (Nature Comms) Chen et al. 2023 | 🛠️ GitHub | scRNA-Seq | None | Value projection | Encoder | None | Cell type annotation | None |
| scMoFormer | 📄 (ACM) Tang et al. 2023 | 🛠️ GitHub | scRNA-Seq, scATAC-Seq, CITE-Seq | None | Other, SVD-based | Encoder and graph transformers | None | Cross-modality prediction | None |
| tGPT | 📄 (Cell iScience) Shen et al. 2023 | 🛠️ GitHub | scRNA-Seq | 22M / cross-tissue, cross-species, disease and non-disease, organoids (list) | Ordering | Decoder | NTP with CE loss, gene ID prediction | None | Cell clustering, trajectory inference |
| SpaFormer | 📝 Wen et al. 2023 | 🛠️ GitHub | Spatial transcriptomics | None | Cells as tokens, value projection | Encoder | Modified MLM with MSE loss, gene expression prediction | Gene expression imputation | Cell clustering |
| scFoundation | 📝 Hao et al. 2023 and Gong et al. 2023 | 🔍 GitHub | scRNA-Seq | 50M / cross-tissue, human, disease and non-disease (GEO, Single Cell Portal, HCA, EMBL-EBI) | Value projection | Other: two encoders | Modified MLM with MSE loss, gene expression prediction | Drug response prediction, genetic perturbation effect prediction | Read depth enhancement, cell clustering |
| CellLM | 📝 Zhao et al. 2023 | 🔍 GitHub | scRNA-Seq | 1.8M / cross-tissue, human, disease and non-disease (PanglaoDB, CancerSCEM) | Value categorization | Encoder | Contrastive loss, MLM with CE loss | Non-disease vs cancer prediction, cell type annotation, drug response prediction | None |
| scCLIP | 📝 Xiong et al. 2023 | 🛠️ GitHub | scRNA-Seq, scATAC-seq | 377k / cross-tissue, human fetal (ATAC, RNA) | Value projection | Encoder | Contrastive loss, CE matching modalities | None | Multimodal embedding |
| GeneCompass | 📝 Yang et al. 2023 | GitHub, no code yet | scRNA-Seq | 126M / cross-tissue, human and mouse, disease and non-disease (GEO, SRA, CELLxGENE, GSA, Single Cell Portal, HCA, EMBL-EBI, 3CA, Cell BLAST, TEDD, and other) | ? | Other: two encoders | MLM with CE and MSE loss, gene ID and expression prediction | Cell type annotation, drug response prediction, gene function prediction | Cross-species integration, genetic perturbation effect prediction, GRN inference |
| CellPLM | 📄 (ICLR) Wen et al. 2023 | Partial 🔍 GitHub | scRNA-Seq, Spatial transcriptomics | 11M / cross-tissue, human, disease and non-disease (HTCA, HCA, GEO) | Cells as tokens, value projection | Encoder | Modified MLM with MSE loss and KL losses, gene expression prediction | Gene expression imputation, cell type annotation, genetic perturbation effect prediction | Cell clustering, scRNA-Seq denoising |
| scMAE | 📝 Kim et al. 2023 | None | single-cell flow cytometry | 6.5M / human, disease and non-disease (source?) | Other, concatenation of values with learnable protein embeddings | Other: two encoders | MLM with MSE loss, protein expression prediction | Cell type annotation, protein expression imputation | None |
| CAN/CGRAN | 📝 Wang et al. 2023 | None | scRNA-Seq | None | Value projection | Encoder | None | Cell type annotation | None |
| scTranslator | 📝 Liu et al. 2023 | 🔍 GitHub | scRNA-Seq, CITE-Seq | None | Value projection | Other: two encoders | None | Cross-modality prediction | (After cross-modality prediction training) GRN inference, cell clustering |
| scTransSort | 📄 (MDPI) Jiao et al. 2023 | 🛠️ GitHub | scRNA-Seq | None | Value projection | Encoder | None | Cell type annotation | None |
| STGRNS | 📄 (OUP) Xu et al. 2023 | 🛠️ GitHub | scRNA-Seq | None | Other | Encoder | None | GRN inference | None |
| CIForm | 📄 (OUP) Xu et al. 2023 | 🛠️ GitHub | scRNA-Seq | None | Value projection | Encoder | None | Cell type annotation | None |
| scFormer | 📝 Cui et al. 2023 | Incomplete GitHub | scRNA-Seq | Task specific | Value categorization: value binning | Encoder | Modified MLM with CE, cell token expression prediction, contrastive loss with cosine similarity, gene expression prediction | Cell type annotation, genetic perturbation effect prediction | Cell clustering |
| Exceiver | 📝 Connell et al. 2022 | 🛠️ GitHub | scRNA-Seq | 0.5M / cross-tissue, human (Tabula Sapiens) | Other: value scaled embeddings | Encoder | Modified MLM with MSE, gene expression prediction | Cell type annotation, drug response prediction | Cell clustering |
| TransCluster | 📄 (Frontiers) Song et al. 2022 | 🛠️ GitHub | scRNA-Seq | None | Value projection with LDA | Encoder | None | Cell type annotation | None |
| scBERT | 📄 (Nature MI) Yang et al. 2022 | 🔍 GitHub | scRNA-Seq | 1M / cross-tissue, human (PanglaoDB) | Value categorization, binning | Encoder | MLM with CE loss, gene expression prediction | Cell type annotation, unseen cell type detection | None |
| iSEEEK | 📄 (OUP) Shen et al. 2022 | 🔍 Github (dataset not public) | scRNA-Seq | 11.9M / cross-tissue, cross-species (list) | Ordering: rank-based | Encoder | MLM with CE loss | Marker gene classification | Cell clustering, pseudotime analysis, GRN inference |
| Multitask learning | 📝 Pang et al. 2020 | None | scRNA-Seq | 160k / brain, mouse (MBA) | Value projection | Other: autoencoder with two transformer encoders (?) | Modified MLM with MSE loss, gene expression prediction | None | Cell clustering |
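
The Input Embedding column relies on three recurring labels: ordering, value categorization (binning), and value projection. The sketch below illustrates what each label usually means for a single cell's expression vector. It is not taken from any listed model's code: the gene names, expression values, equal-width binning rule, and toy weights are all illustrative assumptions, and real models add steps omitted here (normalization, quantile-based bins, learned gene embeddings, special tokens).

```python
from typing import Dict, List

# Toy expression values for one cell (hypothetical numbers).
cell: Dict[str, float] = {"CD3E": 8.0, "MS4A1": 0.0, "LYZ": 2.0, "NKG7": 5.0}


def rank_order(expr: Dict[str, float]) -> List[str]:
    """Ordering: represent the cell as gene names sorted by decreasing expression.

    Unexpressed genes are dropped; ties are broken alphabetically.
    """
    expressed = [g for g, v in expr.items() if v > 0]
    return sorted(expressed, key=lambda g: (-expr[g], g))


def bin_values(expr: Dict[str, float], n_bins: int = 4) -> Dict[str, int]:
    """Value categorization: replace each value with an integer bin index.

    Bin 0 is reserved for zero expression. Non-zero values fall into
    n_bins equal-width bins between 0 and the cell's maximum (an assumed,
    simplified rule).
    """
    max_v = max(expr.values())
    if max_v <= 0:
        return {g: 0 for g in expr}
    return {
        g: 0 if v <= 0 else min(n_bins, 1 + int(v / max_v * n_bins))
        for g, v in expr.items()
    }


def project_value(v: float, weight: List[float], bias: List[float]) -> List[float]:
    """Value projection: map a scalar value to a vector with a linear layer."""
    return [v * w + b for w, b in zip(weight, bias)]


print(rank_order(cell))   # ['CD3E', 'NKG7', 'LYZ']
print(bin_values(cell))   # {'CD3E': 4, 'MS4A1': 0, 'LYZ': 2, 'NKG7': 3}
print(project_value(2.0, [0.5, -1.0], [0.0, 1.0]))  # [1.0, -1.0]
```

In the ordering case the expression magnitude is carried only by token position, whereas binning and projection keep a per-gene value token or vector alongside the gene identity.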

Transformer LLMs for single-cell

| Model | Paper | Code | Omic Modalities | Pre-training Dataset | Input Embedding | Architecture | SSL Tasks | Supervised Tasks | Zero-shot Tasks |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| scInterpreter | 📝 Li et al. 2024 | None | scRNA-Seq | Natural Language GPT-3.5 and Llama-13b | Other: Ordering with embedding of the natural language representation | Decoder, GPT-3.5 and Llama-13b | NTP with CE loss and instruction finetuning (GPT-3.5 closed-source) | None | Cell type annotation (LLMs frozen, only small MLP trained) |
| ChatCell | 📝❌ Fang et al. 2024 | 🛠️ GitHub | scRNA-Seq | Natural Language T5 and natural language instructions | Other: Ordering with embedding as natural language with additional terms | Encoder-Decoder, T5 | NTP with CE loss | None (conditional sequence generation, prompting) | Simulation, cell type annotation, drug sensitivity prediction |
| MarkerGeneBERT | 📝 Cheng et al. 2023 | None | scRNA-Seq | Natural Language, PubMed and PubMed Central | Other: Natural language preprocessed with SciBERT | Encoder | MLM | Named Entity Recognition (NER), cell-biomarker sentence classification | None |
| scELMo | 📝 Liu, Chen and Zheng 2023 | Partial 🔍 GitHub | scRNA-Seq, CITE-Seq | Natural Language, Closed source | Other: NLP model embeddings of features weighted by the feature level in a cell (e.g. expression level) | Closed source (some open) | Closed source (some open) | Cell type annotation, genetic perturbation effect prediction | Cell and gene embeddings in other perturbation models |
| GenePT | 📝 Chen and Zou 2023 | Partial 🔍 GitHub | scRNA-Seq | Natural Language, Closed source | Ordering: embedding as natural language | Closed source | Closed source | Gene function prediction | Cell clustering, GRN inference |
| GPT-4 | 📝 Z. Ji and Hou 2023 | None | scRNA-Seq | Natural Language, Closed source | Ordering: embedding as natural language | Closed source | Closed source | None (conditional sequence generation, prompting) | Cell type annotation |
| Cell2Sentence | 📝 Levine et al. 2023 | 🛠️ GitHub | scRNA-Seq | Natural Language (GPT2) and scRNA-Seq (40k / immune, human) | Ordering: embedding as natural language | Decoder | NTP with CE loss | None | Simulation, cell type annotation |
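
Several entries above describe their input as "Ordering: embedding as natural language". As a rough illustration of that idea, the sketch below turns one cell into a space-separated "sentence" of gene names in decreasing order of expression and wraps it in a prompt. The helper names, the `top_k` cutoff, the toy expression values, and the prompt wording are all hypothetical and do not reproduce any listed model's preprocessing.

```python
from typing import Dict

# Toy expression values for one cell (hypothetical numbers).
cell: Dict[str, float] = {"CD3E": 8.0, "MS4A1": 0.0, "LYZ": 2.0, "NKG7": 5.0}


def cell_to_sentence(expr: Dict[str, float], top_k: int = 3) -> str:
    """Write the top_k expressed genes as text, highest expression first."""
    expressed = [g for g, v in expr.items() if v > 0]
    ranked = sorted(expressed, key=lambda g: (-expr[g], g))
    return " ".join(ranked[:top_k])


def annotation_prompt(sentence: str) -> str:
    """Build an illustrative (assumed) prompt for zero-shot cell type annotation."""
    return (
        "Genes listed in decreasing order of expression in one cell: "
        f"{sentence}. What is the most likely cell type?"
    )


sentence = cell_to_sentence(cell, top_k=2)
print(sentence)  # CD3E NKG7
print(annotation_prompt(sentence))
```

Because the cell becomes ordinary text, it can be passed to a general-purpose language model without a dedicated gene vocabulary, at the cost of discarding exact expression values.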

Single-cell transformer evaluation

| Paper | Code | Omic Modalities | Evaluated Transformers | Tasks | Notes |
| --- | --- | --- | --- | --- | --- |
| 📝 He et al. 2024 | 🛠️ GitHub | scRNA-Seq | scGPT | Cell type annotation | Evaluation of Parameter-Efficient Fine-Tuning (PEFT) for scGPT. Indicates that PEFT is not only more compute-efficient but also yields better cell type prediction. |
| 📄 (Nature MI) Khan et al. 2023 | 🛠️ GitHub | scRNA-Seq | scBERT | Cell type annotation, unseen cell type detection | Focused on imbalanced cell type classification. scBERT is sensitive to class imbalance. scBERT outperforms Seurat. scBERT doesn't perform well in unseen cell type detection. It benefits from SSL pretraining. |
| 📝 Liu et al. 2023 | 🛠️ GitHub | scRNA-Seq, scATAC-Seq, Spatial transcriptomics | scGPT, Geneformer, scBERT, tGPT, CellLM | Cell clustering, cell type annotation, multimodal embedding, GRN inference, gene expression imputation, genetic perturbation effect prediction, simulation, gene function prediction | Models aren't trained on the same datasets. scGPT is positioned as the most versatile in terms of the diversity of tasks it can tackle. Non-transformer models appear to be at least as good as transformers in most tasks. Transformers were shown to be sensitive to the choice of hyperparameters, such as learning rate and number of epochs. |
| 📝 Boiarsky et al. 2023 | 🛠️ GitHub | scRNA-Seq | scBERT, scGPT | Cell type annotation | Logistic regression appears to be as good as transformers in cell type annotation, even in low-data scenarios. |
| 📝 Kedzierska et al. 2023 | 🛠️ GitHub | scRNA-Seq | scGPT, Geneformer | Cell clustering | Zero-shot performance only. Both models appear unreliable. |
| 📝 Alsabbagh et al. 2023 | 🛠️ GitHub | scRNA-Seq | scGPT, Geneformer, scBERT | Cell type annotation | Focused on imbalanced cell type classification. Geneformer appears to be outperformed by scGPT and scBERT, which perform similarly. |

Legend

  • 📝 - Preprint
  • 📄 - Peer-Reviewed Publication
  • 🛠️ - Fully reproducible
  • 🔍 - Code for evaluation only
  • ❌ - Retracted or withdrawn

Citing this work

If you find the data in this repository useful for your work, please cite:

@Article{TBA}

single-cell-transformer-papers's People

Contributors

szalata
