caokai1073 / uniport

a unified single-cell data integration framework by optimal transport

License: MIT License

Python 56.58% R 43.42%

uniport's Issues

Error in batch_scale when the cell number can be divided by chunk_size

I get the error 'ValueError: Found array with 0 sample(s) (shape=(0, 2000)) while a minimum of 1 is required by MaxAbsScaler.' when my cell number is an exact multiple of the default chunk_size. In that case the loop runs one extra iteration over an empty chunk. I think line 185 in function.py should be changed from 'for i in range(len(idx)//chunk_size+1):' to 'for i in range(int(np.ceil(len(idx)/chunk_size))):'.
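The off-by-one described above can be reproduced without uniPort. The following is a minimal sketch, not the library's code: `chunk_bounds` is a hypothetical helper, and the cell count and chunk size are illustrative values chosen so that one divides the other exactly.

```python
import numpy as np

def chunk_bounds(n_items, chunk_size, use_ceil):
    """Return the (start, stop) index pairs a chunked loop would visit."""
    if use_ceil:
        # Proposed fix: exactly as many chunks as are needed.
        n_chunks = int(np.ceil(n_items / chunk_size))
    else:
        # Reported expression: one chunk too many when n_items % chunk_size == 0.
        n_chunks = n_items // chunk_size + 1
    return [(i * chunk_size, min((i + 1) * chunk_size, n_items))
            for i in range(n_chunks)]

buggy = chunk_bounds(40000, 20000, use_ceil=False)
fixed = chunk_bounds(40000, 20000, use_ceil=True)
print(buggy)  # last pair is (40000, 40000), i.e. an empty chunk
print(fixed)  # two non-empty chunks
```

When the cell number is not an exact multiple of the chunk size, both expressions give the same number of chunks, so the change only affects the failing case.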

Impute Merfish up to 2000 genes

Hello,

Thank you very much for being super responsive to all github issues.

I successfully followed the Impute genes for MERFISH tutorial. In the tutorial we have 155 genes, use 122 for training, and impute the remaining 33 as a test.

The scRNA-seq data comes with 21043 genes. I was wondering whether uniPort could impute the remaining genes for the MERFISH data, up to 21043? It doesn't have to be all 21043; let's say 2000.

Is there a way to match cells in the MERFISH data with scRNA-seq cells, so that those scRNA-seq cells are assigned spatial coordinates?

Basically, I would like to map each MERFISH cell to the scRNA-seq data, so that I either overcome the limited gene number of MERFISH or add spatial information to the scRNA-seq data.

Thank you very much for your time and patience,

Best,
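One generic way to approach the cell-matching question is a nearest-neighbour lookup in a shared latent space. The sketch below is an assumption about how this could be done, not a uniPort feature: `transfer_coordinates` is a hypothetical helper, and the arrays are toy stand-ins for latent embeddings and MERFISH coordinates.

```python
import numpy as np

def transfer_coordinates(latent_rna, latent_merfish, coords_merfish):
    """Give each scRNA-seq cell the coordinates of its nearest MERFISH cell."""
    # Squared Euclidean distances in latent space, shape (n_rna, n_merfish).
    diff = latent_rna[:, None, :] - latent_merfish[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    # Index of the closest MERFISH cell for every scRNA-seq cell.
    nearest = dist2.argmin(axis=1)
    return coords_merfish[nearest], nearest

# Toy data: three MERFISH cells with known positions, two scRNA-seq cells.
latent_merfish = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
coords_merfish = np.array([[10.0, 20.0], [11.0, 21.0], [50.0, 60.0]])
latent_rna = np.array([[0.1, -0.1], [4.8, 5.2]])

coords_rna, nearest = transfer_coordinates(latent_rna, latent_merfish, coords_merfish)
print(nearest)     # index of the matched MERFISH cell per scRNA-seq cell
print(coords_rna)  # borrowed spatial coordinates
```

Swapping the roles of the two arrays gives each MERFISH cell a matched scRNA-seq cell, whose full expression profile could then stand in for the unmeasured genes. The dense distance matrix is only practical for small inputs; a k-nearest-neighbour index would be needed at scale.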

Bad Installation --> AttributeError: module 'numpy' has no attribute 'warnings'.

Hello.

I am having problems importing uniport. Basically, I created a fresh conda environment with pip and followed the installation procedure. Here is the output.

(uniport-4) $ python
Python 3.10.9 (main, Jan 11 2023, 15:21:40) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import uniport as up
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/synlico-efs/tme/st-sc-integration/uniport/code/uniPort/uniport/__init__.py", line 8, in <module>
    from .function import Run, get_prior, label_reweight, load_file, filter_data, batch_scale, TFIDF_LSI
  File "/home/ec2-user/synlico-efs/tme/st-sc-integration/uniport/code/uniPort/uniport/function.py", line 28, in <module>
    np.warnings.filterwarnings('ignore')
  File "/opt/conda/envs/uniport-4/lib/python3.10/site-packages/numpy/__init__.py", line 320, in __getattr__
    raise AttributeError("module {!r} has no attribute "
AttributeError: module 'numpy' has no attribute 'warnings'. Did you mean: 'hanning'?

Here are the relevant packages with their versions.

numpy                     1.24.4                   pypi_0    pypi
pandas                    2.0.2                    pypi_0    pypi
scanpy                    1.9.3                    pypi_0    pypi

I think this is a numpy version problem. To solve this, I also created a fresh conda env and tried pip install -r requirements from the repo, with the same result. I also tried replacing the ">=" specifiers with exact versions, but pip install failed during installation.

I would appreciate it if you could help me with this. We loved the tool and the paper, and would like to use it in our work.

Best regards,
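The traceback points at np.warnings.filterwarnings('ignore'). The np.warnings name was only an alias of Python's standard warnings module, and NumPy 1.24 removed it, which matches the numpy 1.24.4 listed above. A likely local workaround, assuming nothing else in the package depends on removed NumPy aliases, is to call the standard library directly (or to install a NumPy release older than 1.24). A minimal sketch of the replacement call:

```python
import warnings

# Equivalent of the failing np.warnings.filterwarnings('ignore'),
# using the standard-library module directly.
# catch_warnings is used here only to keep the demo from changing
# the interpreter-wide warning filters.
with warnings.catch_warnings(record=True) as caught:
    warnings.filterwarnings('ignore')
    warnings.warn("this warning is suppressed")

print(len(caught))  # 0, because the 'ignore' filter swallowed the warning
```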

uniPort also does the annotations in the spatial data?

Thanks for making such a beautiful program. I read the paper, and it imputes genes in spatial data using scRNA-seq data. I was reading the following tutorial:
https://uniport.readthedocs.io/en/latest/examples/MERFISH/MERFISH_and_scRNA.html

It creates a latent representation of the combined spatial and sequencing data. I am more interested in the following function,
uniport.metrics.label_transfer(ref, query, rep='latent', label='celltype')
to transfer the annotation from single-cell sequencing to spatial data, but I am a bit confused about how to use label_transfer for my data.
I have the following:
(1) spatial data (adata_merfish): AnnData object with n_obs × n_vars = 7416 × 241
(2) sequencing data (adata_rna with HVG): AnnData object with n_obs × n_vars = 2239 × 2000
(3) adata_cm = adata_merfish.concatenate(adata_rna, join='inner', batch_key='domain_id')
(4) adata = up.Run(adatas=[adata_merfish, adata_rna], adata_cm=adata_cm)
AnnData object with n_obs × n_vars = 9655 × 188

Could you advise what should go in the ref and query arguments of label_transfer, given that adata_merfish and adata_rna have different numbers of genes?
Thank you.
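Label transfer on a latent representation does not depend on the number of genes, because reference and query cells are compared in the shared embedding rather than in gene space. The sketch below illustrates that idea with a plain k-nearest-neighbour majority vote. It is an assumption about the general technique, not uniPort's implementation of label_transfer: `knn_label_transfer` is a hypothetical helper and the arrays are toy data. With real data, one would presumably split the combined output of up.Run by domain_id and pass the scRNA-seq subset as the reference and the MERFISH subset as the query.

```python
import numpy as np
from collections import Counter

def knn_label_transfer(ref_latent, ref_labels, query_latent, k=3):
    """Assign each query cell the majority label of its k nearest reference cells."""
    # Squared Euclidean distances, shape (n_query, n_ref).
    diff = query_latent[:, None, :] - ref_latent[None, :, :]
    dist2 = (diff ** 2).sum(axis=2)
    # Indices of the k closest reference cells per query cell.
    neighbours = np.argsort(dist2, axis=1)[:, :k]
    # Majority vote over the neighbours' labels.
    return np.array([Counter(ref_labels[idx]).most_common(1)[0][0]
                     for idx in neighbours])

# Toy reference (annotated) and query (unannotated) cells in a 2-D latent space.
ref_latent = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                       [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
ref_labels = np.array(['T', 'T', 'T', 'B', 'B', 'B'])
query_latent = np.array([[0.1, 0.1], [5.1, 5.0]])

predicted = knn_label_transfer(ref_latent, ref_labels, query_latent, k=3)
print(predicted)
```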

data_loader issue

Hi uniPort authors,

First, thanks for your great work!

However, when I tried to run uniPort on the sample data provided by scvi-tools (from scvi.data import smfish, cortex), an error was reported.

The error is a broadcasting issue with the sparse matrix, caused by line 247 of data_loader.py:

x = scipy.sparse.vstack((x, scipy.sparse.hstack((x_c, x_s))))

I changed x_s to scipy.sparse.coo_matrix(x_s).

At least for now it runs okay, but I am not sure whether the problem was specific to my setup.

Thanks,
Hongru
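The workaround above amounts to making every block sparse before stacking. The following is a minimal sketch of that pattern with made-up shapes. The names x, x_c and x_s follow the quoted line, but that x_s was a dense array in the failing run is an assumption based on the reported fix.

```python
import numpy as np
import scipy.sparse as sp

# Toy stand-ins: a sparse block of shared features and a dense block of
# dataset-specific features for the same 4 cells.
x_c = sp.random(4, 3, density=0.5, format='csr', random_state=0)
x_s = np.ones((4, 2))

# Rows accumulated so far (2 cells, 5 features).
x = sp.csr_matrix(np.zeros((2, 5)))

# Convert the dense block explicitly so hstack/vstack only see sparse inputs.
row_block = sp.hstack((x_c, sp.coo_matrix(x_s)))
x = sp.vstack((x, row_block)).tocsr()

print(x.shape)  # (6, 5)
```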

Outputting the OT matrix failed

adata, OT = up.Run(adatas=[spot,rna], adata_cm=adata_cm, save_OT=True)

(1) When using the CPU:
The code produced an error with the following message:
IndexError: index 37385 is out of bounds for dimension 0 with size 1

This error indicates that the index 37385 is invalid for the array being accessed. Further investigation is needed to determine the cause of the error and to fix it.

(2) When using a GPU (Colab, Colab Pro, a Linux server, and an RTX 3090 on Windows), every setup reported the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling cublasLtMatmul( ltHandle, computeDesc.descriptor(), &alpha_val, mat1_ptr, Adesc.descriptor(), mat2_ptr, Bdesc.descriptor(), &beta_val, result_ptr, Cdesc.descriptor(), result_ptr, Cdesc.descriptor(), &heuristicResult.algo, workspace.data_ptr(), workspaceSize, at::cuda::getCurrentCUDAStream())

This error suggests a problem with the CUDA implementation, possibly due to a mismatch between the version of CUDA being used and the hardware or software environment. It may be necessary to consult the author of the code for assistance in resolving this issue.
(3) The error was initially thought to be related to memory usage, but changing the batch size to 100 or even 10 did not solve the issue. Therefore, the problem may not be related to memory limitations.

Is a specific CUDA version needed?

Joint analysis of paired and unpaired multiomic data

Good day!

I am interested in using your tool and would like to know whether it is suitable for integrating paired and unpaired multiomic data. In this case, I have samples with only scRNA, samples with only scATAC, and paired multiome (RNA+ATAC) samples.

If it is possible, how should I set up the pipeline to take the three 'batches' into account?

Thanks in advance!

Can't reproduce result of harmony

It is a wonderful computational tool! When I test the PBMC pair dataset (https://uniport.readthedocs.io/en/latest/examples/PBMC/pbmc_integration.html) with Harmony, I can't reproduce the result displayed in your paper.

I ran the following code:

rm(list=ls())
suppressPackageStartupMessages({
  library(SummarizedExperiment)
  library(ggplot2)
  library(Seurat)
  library(SeuratWrappers)
  library(patchwork)
  library(cowplot)
  library(harmony)
})

options(repr.plot.width = 12, repr.plot.height = 5)

data_seurat = readRDS("../../dataset/PBMC_pair/PBMC_pair_raw.rds")

print("===================Normalize SeuratObject=======")
data_seurat <- NormalizeData(data_seurat, verbose = FALSE)
print("===================Find HVG=====================")
data_seurat <- FindVariableFeatures(data_seurat, selection.method = "vst", nfeatures = 2000, verbose = FALSE)
print("===================scale Data===================")
data_seurat <- ScaleData(data_seurat, verbose = FALSE)
print("===================Running PCA===================")
data_seurat <- RunPCA(data_seurat, npcs = 30, verbose = F)

data_seurat <- data_seurat %>% RunHarmony("domain_id", plot_convergence = F,max.iter.harmony=50)

print("========================Running UMAP==================")
data_seurat <- RunUMAP(data_seurat, reduction = "harmony", dims = 1:30, verbose = F)
print("========================Visualize UMAP=================")
p1=DimPlot(data_seurat, reduction = "umap", group.by = "domain_id", label.size = 10)+ggtitle("Integrated Batch")
p2=DimPlot(data_seurat, reduction = "umap", group.by = "cell_type",label.size = 10)+ggtitle("Integrated Celltype")
p= p1 + p2 
print(p)

I get the following result:

The Harmony integration looks very poor, which differs from the result shown in the paper.

I also ran Harmony in Python, and the result is very similar to the Harmony result in R:

import scanpy.external as sce
import scanpy as sc
import pandas as pd 
labels = pd.read_csv('../UniPort/uniPort-main/PBMC/meta.txt', sep='\t')
celltype = labels['cluster'].values
print(labels.shape)
print(celltype.shape)

adata_rna = sc.read('../UniPort/uniPort-main/PBMC/rna.h5ad')
adata_atac = sc.read('../UniPort/uniPort-main/PBMC/atac_meastro.h5ad')
print(adata_rna)
print(adata_atac)

adata_atac.obs['cell_type'] = celltype
adata_atac.obs['domain_id'] = 0
adata_atac.obs['domain_id'] = adata_atac.obs['domain_id'].astype('category')
adata_atac.obs['source'] = 'ATAC'

adata_rna.obs['cell_type'] = celltype
adata_rna.obs['domain_id'] = 1
adata_rna.obs['domain_id'] = adata_rna.obs['domain_id'].astype('category')
adata_rna.obs['source'] = 'RNA'

adata = adata_atac.concatenate(adata_rna, join='inner', batch_key='domain_id')
adata.obs["celltype"]=adata.obs["cell_type"].copy()
adata.obs["BATCH"] = adata.obs["domain_id"].copy()

sc.pp.normalize_total(adata,target_sum=10000)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000, inplace=False, subset=True)
sc.pp.scale(adata)
sc.tl.pca(adata)
# sc.pp.neighbors(adata)
# sc.tl.umap(adata)
# sc.pl.umap(adata,color=["BATCH","celltype"],ncols=1)
# print(adata)
sce.pp.harmony_integrate(adata, 'BATCH')
sc.pp.neighbors(adata,use_rep="X_pca_harmony")
#sc.tl.louvain(adata,resolution=3.0)
sc.tl.umap(adata)
sc.pl.umap(adata,color=['BATCH','celltype'],ncols=1)

The result is below.

I am confused by this problem; could you give me some suggestions?

weird trend for running time when increasing cell number

Hi Kai,

I observed a weird trend in the running time when I applied uniPort to datasets that included 1k, 3k, 5k, 10k, 15k, 20k and 50k cells (both RNA and ATAC). The running time first decreased until the sample size reached 15k and then increased. The longest time was observed when there were only 1k cells. Do you have any explanation for this observation?

Best,

Yuge
