Manifold Alignment of Single-Cell Transcriptomes with Cell Triplets
MAT2 aligns multiple single-cell transcriptome datasets in two steps:
- Manifold alignment based on contrastive learning: For a cell of interest C, MAT2 selects a cell Cp of the same cell type from a different dataset and a cell Cn of a different cell type, forming a cell triplet (C, Cp, Cn). Contrastive learning trains the model so that, in the latent manifold space, the distance between C and Cp is much smaller than the distance between C and Cn, which aligns the single-cell transcriptomes across datasets.
- Reconstruction of gene expression profiles: Neural network decoders reconstruct the consensus gene expression and the batch-specific deviation. The consensus gene expression can be used for downstream analyses such as differential expression analysis and lineage tracing.
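The triplet idea can be illustrated with a small, self-contained sketch. It is not MAT2's implementation: the description above does not specify the loss or the sampling procedure, so this sketch assumes a standard triplet margin loss with Euclidean distance and uniform random sampling. The names `sample_triplet`, `triplet_margin_loss`, the `margin` parameter, and the toy latent coordinates `z` are all hypothetical.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (an assumption, not MAT2's confirmed loss).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def sample_triplet(cells, index, rng):
    """Pick indices (C, Cp, Cn) for the cell at `index`.

    Cp: same cell type, different batch. Cn: different cell type.
    Returns None when no valid Cp or Cn exists.
    """
    c = cells[index]
    positives = [i for i, x in enumerate(cells)
                 if x["type"] == c["type"] and x["batch"] != c["batch"]]
    negatives = [i for i, x in enumerate(cells) if x["type"] != c["type"]]
    if not positives or not negatives:
        return None
    return index, rng.choice(positives), rng.choice(negatives)


# Toy cells with hypothetical 2-D latent coordinates "z".
cells = [
    {"batch": "b1", "type": "T", "z": [0.0, 0.0]},
    {"batch": "b2", "type": "T", "z": [0.3, 0.4]},
    {"batch": "b2", "type": "B", "z": [3.0, 4.0]},
]
c, cp, cn = sample_triplet(cells, 0, random.Random(0))
loss = triplet_margin_loss(cells[c]["z"], cells[cp]["z"], cells[cn]["z"])
print((c, cp, cn), loss)  # (0, 1, 2) 0.0
```

In this toy case C and Cp are already 0.5 apart while C and Cn are 5.0 apart, so the loss is zero; during training, triplets that violate the margin would produce a positive loss and pull Cp closer to C.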
First, clone the MAT2 repository with git.
git clone https://github.com/Zhang-Jinglong/MAT2.git
cd MAT2/
Install the Python packages that MAT2 depends on with conda, then run setup.py on the command line to install MAT2.
conda install --file requirements.txt --yes
python setup.py install
An example Jupyter notebook, demo/test.ipynb, is included in the MAT2 source code and demonstrates how to align single-cell transcriptome datasets with MAT2.
The following is a brief description of the usage of MAT2 in Python:
The test data can be found in the demo/ folder in the MAT2 repository.
import pandas as pd
from MAT2 import BuildMAT2
# MAT2 receives pandas DataFrame as input data.
# Multiple batches of data are concatenated into a matrix of size gene_num * cell_num.
data = pd.read_csv('data.csv', header=0, index_col=0)
# The row name of metadata should correspond to the cell name in data.
# Metadata must contain the 'batch' column, and must also contain the 'type' column when supervised.
metadata = pd.read_csv('metadata.csv', header=0, index_col=0)
# Anchors are needed only when cell type annotations are missing or partial
# (the 'manual' and 'semi-supervised' modes).
# Each record contains two cell numbers (cell in [0,cell_num-1]) and a score (score in [0.0,1.0]).
anchor = pd.read_csv('anchor.csv', header=0, index_col=0)
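Before building a model, it can help to confirm that the inputs satisfy the constraints noted in the comments above. The sketch below is not part of MAT2: `check_inputs` is a hypothetical helper. It assumes that cells are the columns of `data`, that metadata rows must appear in the same order as those columns (stricter than the text requires), and that the first two anchor columns hold cell numbers and the third holds the score. The anchor column names used here are made up.

```python
import pandas as pd


def check_inputs(data, metadata, anchor=None, supervised=False):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    # Assumption: cells are columns of `data`, in the same order as metadata rows.
    if list(data.columns) != list(metadata.index):
        problems.append("metadata row names do not match the cell names in data")
    if "batch" not in metadata.columns:
        problems.append("metadata lacks the 'batch' column")
    if supervised and "type" not in metadata.columns:
        problems.append("metadata lacks the 'type' column")
    if anchor is not None:
        cell_num = data.shape[1]
        # Assumption: columns 0 and 1 are cell numbers, column 2 is the score.
        cell_ids = anchor.iloc[:, :2]
        scores = anchor.iloc[:, 2]
        if ((cell_ids < 0) | (cell_ids >= cell_num)).any().any():
            problems.append("anchor cell numbers fall outside [0, cell_num-1]")
        if ((scores < 0.0) | (scores > 1.0)).any():
            problems.append("anchor scores fall outside [0.0, 1.0]")
    return problems


# Toy inputs: 2 genes x 3 cells, with one deliberately invalid anchor record.
toy_data = pd.DataFrame([[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]],
                        index=["geneA", "geneB"], columns=["c0", "c1", "c2"])
toy_metadata = pd.DataFrame({"batch": ["b1", "b1", "b2"]},
                            index=["c0", "c1", "c2"])
toy_anchor = pd.DataFrame({"cell1": [0, 1], "cell2": [2, 5], "score": [0.9, 0.4]})

for problem in check_inputs(toy_data, toy_metadata, toy_anchor, supervised=True):
    print(problem)
```

Here the helper reports the missing 'type' column (required only because `supervised=True`) and the anchor record that points at cell 5 in a 3-cell matrix.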
When cell type annotations are available, build the model in supervised mode:
model = BuildMAT2(
    data=data,
    metadata=metadata,
    num_workers=2,
    use_gpu=True,
    mode='supervised',
    dropout_rate=0.3)
model.train(epochs=30)
When there are no cell type annotations but anchors are provided, build the model in manual mode:
model = BuildMAT2(
    data=data,
    metadata=metadata,
    anchor=anchor,
    num_workers=2,
    use_gpu=True,
    mode='manual')
model.train(epochs=30)
When only some cells have cell type annotations, run in semi-supervised mode:
model = BuildMAT2(
    data=data,
    metadata=metadata,
    anchor=anchor,
    num_workers=2,
    use_gpu=True,
    mode='semi-supervised')
model.train(epochs=30)
# Here the training data is reused for evaluation; substitute your own data as needed.
test_data = data
# Calculate the reconstructed consensus gene expression.
rec = model.evaluate(test_data)
# Your own downstream analysis.
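As one possible starting point for downstream analysis, the sketch below averages expression per cell type. It rests on an assumption the text does not confirm: that the reconstructed expression is a genes x cells pandas DataFrame whose column names match the row names of the metadata. Check the actual return type of `model.evaluate` against demo/test.ipynb before relying on it. `mean_expression_by_type` is a hypothetical helper, and a toy DataFrame stands in for `rec`.

```python
import pandas as pd


def mean_expression_by_type(rec, metadata):
    """Average each gene over the cells of each type; returns genes x types."""
    # Transpose to cells x genes, group cells by their 'type' label, then transpose back.
    return rec.T.groupby(metadata["type"]).mean().T


# Toy stand-in for the reconstructed expression (assumed genes x cells).
toy_rec = pd.DataFrame([[1.0, 3.0, 10.0], [2.0, 4.0, 0.0]],
                       index=["geneA", "geneB"], columns=["c0", "c1", "c2"])
toy_types = pd.DataFrame({"type": ["T", "T", "B"]}, index=["c0", "c1", "c2"])

mean_by_type = mean_expression_by_type(toy_rec, toy_types)
print(mean_by_type)
```

The resulting table has one column per cell type, which is a convenient input for comparing marker genes between types.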