MAT²

Manifold Alignment of Single-Cell Transcriptomes with Cell Triplets

Overview
Installation
Usage

Overview

MAT² aligns multiple single-cell transcriptome datasets in two steps:

Manifold alignment based on contrastive learning: For a cell of interest C, MAT² selects a cell C_p of the same cell type from a different dataset and a cell C_n of a different cell type to form a cell triplet (C, C_p, C_n). Contrastive learning then trains the model so that the distance between C and C_p in the latent manifold space is much smaller than the distance between C and C_n, which aligns the single-cell transcriptomes across datasets.
Reconstruction of gene expression profiles: Neural network decoders reconstruct the consensus gene expression and the batch-specific deviation. The consensus gene expression can be used for downstream analyses such as differential expression analysis and lineage tracing.
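
The cell triplet idea described above can be illustrated with a small, self-contained sketch. This is not MAT²'s implementation: the toy cell table, the helper names sample_triplet and triplet_margin_loss, the 2-D embeddings, and the margin value are all assumptions made for illustration, and the loss shown is the generic triplet margin loss rather than the exact objective MAT² optimizes.

```python
import math
import random

# Toy cell table (hypothetical): each cell has a batch, a cell type,
# and a 2-D latent embedding "z".
cells = [
    {"name": "c0", "batch": "b1", "type": "T", "z": (0.0, 0.0)},
    {"name": "c1", "batch": "b2", "type": "T", "z": (0.3, 0.4)},
    {"name": "c2", "batch": "b1", "type": "B", "z": (3.0, 4.0)},
    {"name": "c3", "batch": "b2", "type": "B", "z": (3.0, 3.0)},
]


def sample_triplet(cells, anchor_idx, rng):
    """Pick (C, C_p, C_n): C_p shares C's type but not its batch,
    C_n has a different cell type."""
    c = cells[anchor_idx]
    positives = [x for x in cells
                 if x["type"] == c["type"] and x["batch"] != c["batch"]]
    negatives = [x for x in cells if x["type"] != c["type"]]
    if not positives or not negatives:
        raise ValueError("no valid triplet for this cell")
    return c, rng.choice(positives), rng.choice(negatives)


def triplet_margin_loss(c, c_p, c_n, margin=1.0):
    """Generic triplet margin loss: max(0, d(C, C_p) - d(C, C_n) + margin)."""
    d_pos = math.dist(c["z"], c_p["z"])
    d_neg = math.dist(c["z"], c_n["z"])
    return max(0.0, d_pos - d_neg + margin)


rng = random.Random(0)
c, c_p, c_n = sample_triplet(cells, 0, rng)
loss = triplet_margin_loss(c, c_p, c_n)
print(c["name"], c_p["name"], c_n["name"], loss)
```

The loss is zero once C_p is closer to C than C_n by at least the margin, which is the situation the alignment step aims for.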

Installation

First, clone the MAT² repository with git.

git clone https://github.com/Zhang-Jinglong/MAT2.git
cd MAT2/

The Python packages that MAT² depends on can be installed with conda. Then run setup.py from the command line to install MAT².

conda install --file requirements.txt --yes
python setup.py install

Usage

The MAT² source code includes an example Jupyter notebook, demo/test.ipynb, which demonstrates how to align single-cell transcriptome datasets with MAT².

The following is a brief description of the usage of MAT² in Python:

Loading datasets

The test data can be found in the demo/ folder in the MAT² repository.

import pandas as pd
from MAT2 import *

# MAT2 takes pandas DataFrames as input.
# Multiple batches of data are concatenated into one matrix of size gene_num * cell_num.
data = pd.read_csv('data.csv', header=0, index_col=0)

# The row names of metadata should correspond to the cell names in data.
# Metadata must contain a 'batch' column, and must also contain a 'type' column in supervised mode.
metadata = pd.read_csv('metadata.csv', header=0, index_col=0)

# Anchors need to be loaded only when cell type annotations are missing or incomplete.
# Each record contains two cell numbers (each in [0, cell_num-1]) and a score (in [0.0, 1.0]).
anchor = pd.read_csv('anchor.csv', header=0, index_col=0)
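
Before building a model, it can help to check that the three inputs agree with the layout described in the comments above. The following is a minimal sketch under stated assumptions, not part of MAT²: check_inputs is a hypothetical helper, the toy DataFrames stand in for the CSV files, and the anchor column names (cell_a, cell_b, score) and their order (two cell numbers followed by a score) are assumptions.

```python
import pandas as pd

# Toy inputs in the layout described above: genes as rows, cells as columns.
data = pd.DataFrame(
    [[1.0, 0.0, 2.0], [0.5, 3.0, 0.0]],
    index=["geneA", "geneB"],
    columns=["cell0", "cell1", "cell2"],
)
metadata = pd.DataFrame(
    {"batch": ["b1", "b1", "b2"], "type": ["T", "B", "T"]},
    index=["cell0", "cell1", "cell2"],
)
# Hypothetical column names; assumed order is two cell numbers, then a score.
anchor = pd.DataFrame({"cell_a": [0], "cell_b": [2], "score": [0.9]})


def check_inputs(data, metadata, anchor=None, supervised=False):
    """Return a list of problems; an empty list means the inputs look consistent."""
    problems = []
    if set(metadata.index) != set(data.columns):
        problems.append("metadata row names do not match cell names in data")
    if "batch" not in metadata.columns:
        problems.append("metadata lacks a 'batch' column")
    if supervised and "type" not in metadata.columns:
        problems.append("metadata lacks a 'type' column")
    if anchor is not None:
        n_cells = data.shape[1]
        cell_idx = anchor.iloc[:, :2]   # first two columns: cell numbers
        scores = anchor.iloc[:, 2]      # third column: anchor score
        if ((cell_idx < 0) | (cell_idx >= n_cells)).any().any():
            problems.append("anchor cell numbers fall outside [0, cell_num-1]")
        if not scores.between(0.0, 1.0).all():
            problems.append("anchor scores fall outside [0.0, 1.0]")
    return problems


problems = check_inputs(data, metadata, anchor, supervised=True)
print(problems)
```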

Building model & training

When cell type annotations are provided, build the model in supervised mode:

model = BuildMAT2(
    data=data,
    metadata=metadata,
    num_workers=2,
    use_gpu=True,
    mode='supervised',
    dropout_rate=0.3)
model.train(epochs=30)

When there are no cell type annotations but anchors are provided:

model = BuildMAT2(
    data=data,
    metadata=metadata,
    anchor=anchor,
    num_workers=2,
    use_gpu=True,
    mode='manual')
model.train(epochs=30)

When only some of the cells have cell type annotations, run in semi-supervised mode:

model = BuildMAT2(
    data=data,
    metadata=metadata,
    anchor=anchor,
    num_workers=2,
    use_gpu=True,
    mode='semi-supervised')
model.train(epochs=30)
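
The three calls above differ mainly in the mode string and in whether an anchor is passed. As a rough illustration, the choice can be written as a small helper. This is only a sketch: choose_mode is a hypothetical function that is not part of MAT², and marking unannotated cells with None is an assumption, since this README does not state how partial annotations are encoded in the metadata.

```python
def choose_mode(cell_types):
    """Map annotation coverage to one of the mode strings used above.

    cell_types: one entry per cell; None marks an unannotated cell
    (this encoding of missing labels is an assumption of the sketch).
    """
    if not cell_types:
        raise ValueError("cell_types must not be empty")
    annotated = sum(t is not None for t in cell_types)
    if annotated == len(cell_types):
        return "supervised"       # every cell has a type label
    if annotated == 0:
        return "manual"           # no labels: an anchor is required
    return "semi-supervised"      # partial labels: labels plus an anchor


full = choose_mode(["T", "B", "T"])
none = choose_mode([None, None, None])
partial = choose_mode(["T", None, "B"])
print(full, none, partial)
```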

Testing

# In this example, the input data is reused as test data.
test_data = data
# Calculate the reconstructed consensus gene expression.
rec = model.evaluate(test_data)
# Your own downstream analysis.
