nvidia-genomics-research / rapids-single-cell-examples

Examples of single-cell genomic analysis accelerated with RAPIDS

License: Apache License 2.0


rapids-single-cell-examples's Introduction

GPU-Accelerated Single-Cell Genomics Analysis with RAPIDS

This repository contains example notebooks demonstrating how to use RAPIDS for GPU-accelerated analysis of single-cell sequencing data.

RAPIDS is a suite of open-source Python libraries that can speed up data science workflows using GPU acceleration. Starting from a single-cell count matrix, RAPIDS libraries can be used to perform data processing, dimensionality reduction, clustering, visualization, and comparison of cell clusters.

Several of our examples are inspired by the Scanpy tutorials and based upon the AnnData format. Currently, we provide examples for scRNA-seq and scATAC-seq, and we have scaled up to 1 million cells. We also show how to create GPU-powered interactive, in-browser visualizations to explore single-cell datasets.

Dataset sizes for single-cell genomics studies are increasing, presently reaching millions of cells. With RAPIDS, it becomes easy to analyze large datasets interactively and in real time, enabling faster scientific discoveries.

Installation

Docker container

A container with all dependencies, notebooks and source code is available at https://hub.docker.com/r/claraparabricks/single-cell-examples_rapids_cuda11.0.

Please execute the following commands to start the notebook, then follow the URL in the log to open the Jupyter web application.

docker pull claraparabricks/single-cell-examples_rapids_cuda11.0

docker run --gpus all --rm -v /mnt/data:/data claraparabricks/single-cell-examples_rapids_cuda11.0

conda

All dependencies for these examples can be installed with conda.

conda env create --name rapidgenomics -f conda/rapidgenomics_cuda11.5.yml
conda activate rapidgenomics
python -m ipykernel install --user --display-name "Python (rapidgenomics)"

After installing the necessary dependencies, you can just run jupyter lab. There are a few different conda environment files, each corresponding to different notebooks. In addition to the one listed above, there is one for the CPU notebooks, one for the real-time visualization notebook, and one for the AtacSeq notebook.

Configuration

Unified Virtual Memory (UVM) can be used to oversubscribe your GPU memory so that chunks of data will be automatically offloaded to main memory when necessary. This is a great way to explore data without having to worry about out-of-memory errors, but it does degrade performance in proportion to the amount of oversubscription. UVM is enabled by default in these examples and can be enabled or disabled in any RAPIDS workflow with the following:

import cupy as cp
import rmm

rmm.reinitialize(managed_memory=True)  # allow GPU allocations to spill into host memory
cp.cuda.set_allocator(rmm.rmm_cupy_allocator)  # route CuPy allocations through RMM
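
To disable UVM (for example, when the dataset comfortably fits in GPU memory), the same call works with the flag turned off; a minimal sketch:

import cupy as cp
import rmm

rmm.reinitialize(managed_memory=False)  # allocations stay in device memory; no oversubscription
cp.cuda.set_allocator(rmm.rmm_cupy_allocator)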

RAPIDS provides a GPU Dashboard, which contains useful tools to monitor GPU hardware right in Jupyter.

Citation


If you use this code, please cite our preprint:

Nolet C., Lal A., et al., (2022). Accelerating single-cell genomic analysis with GPUs. bioRxiv.

Example 1: Single-cell RNA-seq of 70,000 Human Lung Cells

We use RAPIDS to accelerate the analysis of a ~70,000-cell single-cell RNA sequencing dataset from human lung cells. This example includes preprocessing, dimension reduction, clustering, visualization and gene ranking.

Example Dataset

The dataset is from Travaglini et al. 2020. If you wish to run the example notebook using the same data, use the following command to download the count matrix for this dataset and store it in the data folder:

wget -P <path to this repository>/data https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/krasnow_hlca_10x.sparse.h5ad

Example Code

Follow this Jupyter notebook for RAPIDS analysis of this dataset. In order for the notebook to run, the file rapids_scanpy_funcs.py needs to be in the same folder as the notebook.

We provide a second notebook with the CPU version of this analysis here.

Acceleration

We report the runtime of these notebooks on various GCP instances below. All runtimes are given in seconds. Acceleration is given in parentheses. Benchmarking was performed on Dec 16, 2020.

| Step | CPU: n1-standard-16 (16 vCPUs) | GPU: n1-standard-16 (T4 16 GB) | GPU: n1-highmem-8 (Tesla V100 16 GB) | GPU: a2-highgpu-1g (Tesla A100 40 GB) |
| --- | --- | --- | --- | --- |
| Preprocessing | 70 | 92 (0.76x) | 53 (1.3x) | 59 (1.2x) |
| PCA | 10.6 | 5.0 (2.1x) | 3.2 (3.3x) | 2.7 (3.9x) |
| t-SNE | 220 | 2.8 (79x) | 1.4 (157x) | 2.2 (100x) |
| k-means (single iteration) | 14.3 | 0.31 (46x) | 0.12 (119x) | 0.08 (179x) |
| KNN | 17.8 | 7.6 (2.3x) | 6.8 (2.6x) | 5.7 (3.1x) |
| UMAP | 97 | 1.1 (88x) | 0.55 (176x) | 0.53 (183x) |
| Louvain clustering | 13.9 | 0.24 (58x) | 0.15 (93x) | 0.11 (126x) |
| Leiden clustering | 12.8 | 0.17 (75x) | 0.09 (142x) | 0.08 (160x) |
| Differential Gene Expression | 153 | 36 (4.3x) | 7.5 (20.4x) | 6.3 (24x) |
| Re-analysis of subgroup | 29 | 7.6 (3.8x) | 3.6 (8x) | 3.5 (8x) |
| End-to-end notebook run | 654 | 166 | 93 | 96 |
| Price ($/hr) | 0.760 | 1.110 | 2.953 | 3.673 |
| Total cost ($) | 0.138 | 0.051 | 0.076 | 0.098 |

Example 2: Single-cell RNA-seq of 1.3 Million Mouse Brain Cells

We demonstrate the use of RAPIDS to accelerate the analysis of single-cell RNA-seq data from 1.3 million cells. This example includes preprocessing, dimension reduction, clustering and visualization.

Compared to the previous example, here we make several adjustments to handle the larger dataset. We perform most of the preprocessing operations (e.g. filtering, normalization) while reading the dataset in batches. Further, we perform a batched PCA by training on a fraction of cells and transforming the data in batches.
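
The sketch below illustrates the batched-PCA idea; it is not the notebook's exact code, and sparse_gpu_array, the training fraction, and the batch size are hypothetical:

import cupy as cp
from cuml.decomposition import PCA

n_cells = sparse_gpu_array.shape[0]

# Fit on a random ~10% of cells
train_idx = cp.random.permutation(n_cells)[:n_cells // 10]
pca = PCA(n_components=50)
pca.fit(sparse_gpu_array[train_idx].todense())

# Transform the full matrix in fixed-size batches
batch = 100_000
parts = []
for start in range(0, n_cells, batch):
    parts.append(pca.transform(sparse_gpu_array[start:start + batch].todense()))
X_pca = cp.concatenate(parts)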

This example relies heavily on UVM. While it should work on any GPU built on the Pascal architecture or newer, you will want to make sure there is enough main memory available. Oversubscribing a GPU by more than a factor of 2x can cause thrashing in UVM, which can ultimately lead to the notebook freezing.

Example Dataset

The dataset was made publicly available by 10X Genomics. Use the following command to download the count matrix for this dataset and store it in the data folder:

wget -P <path to this repository>/data https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/1M_brain_cells_10X.sparse.h5ad

Example Code

Follow this Jupyter notebook for RAPIDS analysis of this dataset. In order for the notebook to run, the files rapids_scanpy_funcs.py and utils.py need to be in the same folder as the notebook.

We provide a second notebook with the CPU version of this analysis here.

Acceleration

We report the runtime of these notebooks on various GCP instances below. All runtimes are given in seconds. Acceleration is given in parentheses. Benchmarking was performed on Dec 16, 2020. Note: this section is out of date and will be revised shortly.

| Step | CPU: n1-highmem-64 (64 vCPUs) | GPU: n1-highmem-16 (T4 16 GB) | GPU: n1-highmem-16 (Tesla V100 16 GB) | GPU: a2-highgpu-1g (Tesla A100 40 GB) |
| --- | --- | --- | --- | --- |
| Data load + Preprocessing | 1120 | 1125 (1x) | 967 (1.2x) | 475 (2.4x) |
| PCA | 44 | 45 (1x) | 43 (1x) | 17.8 (2.5x) |
| t-SNE | 6509 | 196 (33x) | 50 (130x) | 37 (176x) |
| k-means (single iteration) | 148 | 12.7 (12x) | 2.6 (57x) | 2 (74x) |
| KNN | 154 | 141 (1.1x) | 92 (1.7x) | 62 (2.5x) |
| UMAP | 2571 | 146 (18x) | 32 (80x) | 21 (122x) |
| Louvain clustering | 1153 | 6.1 (189x) | 3.9 (296x) | 2.4 (480x) |
| Leiden clustering | 6345 | 5.1 (1244x) | 2.7 (2350x) | 1.7 (3732x) |
| Re-analysis of subgroup | 255 | 19.2 (13x) | 15 (17x) | 17.9 (14.2x) |
| End-to-end notebook run | 18338 | 1759 | 1265 | 686 |
| Price ($/hr) | 3.786 | 1.296 | 5.906 | 3.673 |
| Total cost ($) | 19.285 | 0.633 | 2.075 | 0.700 |

Example 3: GPU-based Interactive Visualization of 70,000 Human Lung Cells (beta version)

Interactive browser Demo

We demonstrate how to use RAPIDS, Scanpy and Plotly Dash to create an interactive dashboard where we visualize a single-cell RNA-sequencing dataset. Within the interactive dashboard, we can cluster, visualize, and compare any selected groups of cells.
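
To give a flavor of the approach, here is a minimal, self-contained sketch of a Dash app serving a scatter plot of UMAP coordinates; it uses random stand-in data and is not the notebook's code:

import numpy as np
import plotly.express as px
from dash import Dash, dcc, html

# Stand-ins for UMAP coordinates and cluster labels computed by the GPU pipeline
umap_coords = np.random.rand(1000, 2)
cluster_labels = np.random.randint(0, 5, 1000).astype(str)

fig = px.scatter(x=umap_coords[:, 0], y=umap_coords[:, 1], color=cluster_labels)

app = Dash(__name__)
app.layout = html.Div([dcc.Graph(figure=fig)])

if __name__ == "__main__":
    app.run_server()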

Installation

Additional dependencies are needed for this example. Follow these instructions for conda installation:

conda env create --name rapidgenomics-viz -f conda/rapidgenomics_cuda11.0.viz.yml
conda activate rapidgenomics-viz
python -m ipykernel install --user --display-name "Python (rapidgenomics-viz)"

After installing the necessary dependencies, you can just run jupyter lab.

Example Dataset

The dataset used here is the same as in example 1.

Example Code

Follow this Jupyter notebook to create the interactive visualization. In order for the notebook to run, the files rapids_scanpy_funcs.py and visualize.py need to be in the same folder as the notebook.

Example 4: Droplet Single-cell ATAC-seq of 60K Bone Marrow Cells

We demonstrate the use of RAPIDS to accelerate the analysis of single-cell ATAC-seq data from 60,495 cells. We start with the peak-cell matrix, then perform peak selection, normalization, dimensionality reduction, clustering, and visualization. We also visualize regulatory activity at marker genes and compute differential peaks.

Example Dataset

The dataset is taken from Lareau et al., Nat Biotech 2019. We processed the dataset to include only cells in the 'Resting' condition and peaks with nonzero coverage. Use the following command to download (1) the processed peak-cell count matrix for this dataset (.h5ad), (2) the set of nonzero peak names (.npy), and (3) the cell metadata (.csv), and store them in the data folder:

wget -P <path to this repository>/data https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/dsci_resting_nonzeropeaks.h5ad; \
wget -P <path to this repository>/data https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/dsci_resting_peaknames_nonzero.npy; \
wget -P <path to this repository>/data https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/dsci_resting_cell_metadata.csv

Example Code

Follow this Jupyter notebook for RAPIDS analysis of this dataset. In order for the notebook to run, the files rapids_scanpy_funcs.py and utils.py need to be in the same folder as the notebook.

We provide a second notebook with the CPU version of this analysis here.

Acceleration

We report the runtime of these notebooks on various GCP instances below. All runtimes are given in seconds. Acceleration is given in parentheses. Benchmarking was performed on Dec 16, 2020.

| Step | CPU: n1-standard-16 (16 vCPUs) | GPU: n1-standard-16 (T4 16 GB) | GPU: n1-highmem-8 (Tesla V100 16 GB) | GPU: a2-highgpu-1g (Tesla A100 40 GB) |
| --- | --- | --- | --- | --- |
| PCA | 149 | 146 (1x) | 68 (2.2x) | 54 (2.8x) |
| KNN | 19.7 | 19.3 (1x) | 5.8 (3.4x) | 5.3 (3.7x) |
| UMAP | 69 | 1.1 (63x) | 0.71 (97x) | 0.69 (100x) |
| Louvain clustering | 13.1 | 0.13 (100x) | 0.11 (119x) | 0.11 (119x) |
| Leiden clustering | 15.7 | 0.08 (196x) | 0.07 (224x) | 0.06 (262x) |
| t-SNE | 258 | 3.2 (81x) | 1.5 (172x) | 2.2 (117x) |
| Differential Peak Analysis | 135 | 59 (2.3x) | 14.8 (9x) | 10.4 (13x) |
| End-to-end notebook run | 682 | 263 | 107 | 92 |
| Price ($/hr) | 0.760 | 1.110 | 2.953 | 3.673 |
| Total cost ($) | 0.144 | 0.081 | 0.110 | 0.094 |

Example 5: Visualizing Chromatin Accessibility in 5,000 PBMCs with RAPIDS and AtacWorks (Beta version)

We analyze single-cell ATAC-seq data from 5,000 PBMCs as in example 4. Additionally, we use cuDF to calculate and visualize cluster-specific chromatin accessibility in selected marker regions. Finally, we use a deep learning model trained with AtacWorks to improve the accuracy of the chromatin accessibility track and to call peaks in individual clusters.

Example Data

The dataset was made publicly available by 10X Genomics. Use the following command to download the peak x cell count matrix and the fragment file for this dataset, and store both in the data folder:

wget -P <path to this repository>/data https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/5k_pbmcs_10X.sparse.h5ad
wget -P <path to this repository>/data https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/5k_pbmcs_10X_fragments.tsv.gz
wget -P <path to this repository>/data https://rapids-single-cell-examples.s3.us-east-2.amazonaws.com/5k_pbmcs_10X_fragments.tsv.gz.tbi

Example Model

We use a pre-trained deep learning model to denoise the chromatin accessibility track and call peaks. This model can be downloaded into the models folder:

wget -P <path to this repository>/models https://api.ngc.nvidia.com/v2/models/nvidia/atac_bulk_lowcov_5m_50m/versions/0.3/files/models/model.pth.tar

Example Code

Follow this Jupyter notebook for GPU analysis of this dataset. In order for the notebook to run, the files utils.py and coverage.py need to be in the same folder as the notebook.

Adapting these examples to another dataset

For our examples, we stored the count matrix in a sparse .h5ad format. To convert a different count matrix into this format, follow the instructions in this notebook.
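
As a minimal sketch of such a conversion (not the notebook's code; the input and output paths here are hypothetical):

import scanpy as sc
from scipy.sparse import csr_matrix

adata = sc.read("my_counts.h5ad")     # or sc.read_mtx / sc.read_csv for other input formats
adata.X = csr_matrix(adata.X)         # store the count matrix in sparse CSR form
adata.write("my_counts.sparse.h5ad")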

rapids-single-cell-examples's People

Contributors

ahendriksen, avantikalal, bradmiro, cjnolet, intron7, rilango


rapids-single-cell-examples's Issues

error in rapids_scanpy_funcs.highly_variable_genes

Hi, when I followed the Jupyter notebook and tried to use rapids_scanpy_funcs.highly_variable_genes, I came across this error:


AttributeError Traceback (most recent call last)
File :1, in

File ~/autodl-tmp/rapids_scanpy_funcs.py:753, in highly_variable_genes(sparse_gpu_array, genes, n_top_genes)
751 mean = sparse_gpu_array.sum(axis=0).flatten() / n_cells
752 mean_sq = sparse_gpu_array.multiply(sparse_gpu_array).sum(axis=0).flatten() / n_cells
--> 753 variable_genes = _cellranger_hvg(mean, mean_sq, genes, n_cells, n_top_genes)
755 return variable_genes

File ~/autodl-tmp/rapids_scanpy_funcs.py:702, in _cellranger_hvg(mean, mean_sq, genes, n_cells, n_top_genes)
700 df = pd.DataFrame()
701 # Note - can be replaced with cudf once 'cut' is added in 21.08
--> 702 df['genes'] = genes.to_array()
703 df['means'] = mean.tolist()
704 df['dispersions'] = dispersion.tolist()

AttributeError: 'Series' object has no attribute 'to_array'

All the commands before this one ran fine.
Any suggestions? Thank you.
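
A possible workaround, assuming genes is a cudf Series and the installed cudf version has dropped to_array(), is to route the conversion through pandas; this is a hypothetical one-line patch to _cellranger_hvg in rapids_scanpy_funcs.py:

df['genes'] = genes.to_pandas().to_numpy()  # works regardless of whether cudf provides to_array()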

Bug in preprocess_in_batches

In the preprocess_in_batches function, we filter genes first, before filtering / subsetting cells.
https://github.com/clara-parabricks/rapids-single-cell-examples/blob/master/notebooks/rapids_scanpy_funcs.py#L779

This leads to two (minor) discrepancies:

  • First, it is different from the CPU version of filtering, where we filter cells first and genes second (see the sketch below).
  • Second, if we are selecting a subset of cells (e.g. 50K, 250K) from the 1M cell dataset, the genes will be filtered based on their occurrence in the whole dataset, instead of only in the subset of selected cells.
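
For reference, the CPU-side ordering in Scanpy looks like this (a minimal sketch; the input path and thresholds are hypothetical):

import scanpy as sc

adata = sc.read("counts.sparse.h5ad")      # hypothetical input
sc.pp.filter_cells(adata, min_genes=200)   # cells first...
sc.pp.filter_genes(adata, min_cells=3)     # ...then genes, so gene counts reflect only retained cells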

gpu memory issue - 8M cell 24G GPU out of memory during umap

Code I used:
sc.tl.umap(ad,min_dist=0.1,method='rapids',neighbors_key='neighbors')

AWS g5.4xlarge, 1 GPU with 24 GB of GPU memory.

During the process, GPU usage is 4 GB/24 GB, but at the end it crashed saying out of memory.

---------------------------------------------------------------------------

MemoryError                               Traceback (most recent call last)
Cell In[11], line 1
----> 1 sc.tl.umap(ad,min_dist=0.1,method='rapids',neighbors_key='neighbors')

File ~/mambaforge/envs/rapids-23.08/lib/python3.10/site-packages/scanpy/tools/_umap.py:237, in umap(adata, min_dist, spread, n_components, maxiter, alpha, gamma, negative_sample_rate, init_pos, random_state, a, b, copy, method, neighbors_key)
    222     X_contiguous = np.ascontiguousarray(X, dtype=np.float32)
    223     umap = UMAP(
    224         n_neighbors=n_neighbors,
    225         n_components=n_components,
   (...)
    235         random_state=random_state,
    236     )
--> 237     X_umap = umap.fit_transform(X_contiguous)
    238 adata.obsm['X_umap'] = X_umap  # annotate samples with UMAP coordinates
    239 logg.info(
    240     '    finished',
    241     time=start,
    242     deep=('added\n' "    'X_umap', UMAP coordinates (adata.obsm)"),
    243 )

File ~/mambaforge/envs/rapids-23.08/lib/python3.10/site-packages/cuml/internals/api_decorators.py:188, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    185     set_api_output_dtype(output_dtype)
    187 if process_return:
--> 188     ret = func(*args, **kwargs)
    189 else:
    190     return func(*args, **kwargs)

File ~/mambaforge/envs/rapids-23.08/lib/python3.10/site-packages/cuml/internals/api_decorators.py:393, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    391 if hasattr(self, "dispatch_func"):
    392     func_name = gpu_func.__name__
--> 393     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    394 else:
    395     return gpu_func(self, *args, **kwargs)

File ~/mambaforge/envs/rapids-23.08/lib/python3.10/site-packages/cuml/internals/api_decorators.py:190, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    188         ret = func(*args, **kwargs)
    189     else:
--> 190         return func(*args, **kwargs)
    192 return cm.process_return(ret)

File base.pyx:665, in cuml.internals.base.UniversalBase.dispatch_func()

File umap.pyx:658, in cuml.manifold.umap.UMAP.fit_transform()

File ~/mambaforge/envs/rapids-23.08/lib/python3.10/site-packages/cuml/internals/api_decorators.py:188, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    185     set_api_output_dtype(output_dtype)
    187 if process_return:
--> 188     ret = func(*args, **kwargs)
    189 else:
    190     return func(*args, **kwargs)

File ~/mambaforge/envs/rapids-23.08/lib/python3.10/site-packages/cuml/internals/api_decorators.py:393, in enable_device_interop.<locals>.dispatch(self, *args, **kwargs)
    391 if hasattr(self, "dispatch_func"):
    392     func_name = gpu_func.__name__
--> 393     return self.dispatch_func(func_name, gpu_func, *args, **kwargs)
    394 else:
    395     return gpu_func(self, *args, **kwargs)

File ~/mambaforge/envs/rapids-23.08/lib/python3.10/site-packages/cuml/internals/api_decorators.py:190, in _make_decorator_function.<locals>.decorator_function.<locals>.decorator_closure.<locals>.wrapper(*args, **kwargs)
    188         ret = func(*args, **kwargs)
    189     else:
--> 190         return func(*args, **kwargs)
    192 return cm.process_return(ret)

File base.pyx:665, in cuml.internals.base.UniversalBase.dispatch_func()

File umap.pyx:595, in cuml.manifold.umap.UMAP.fit()

MemoryError: std::bad_alloc: out_of_memory: CUDA error at: /home/ec2-user/mambaforge/envs/rapids-23.08/include/rmm/mr/device/cuda_memory_resource.hpp

can't run inference with atacworks pretrained model within the container on A100

Hi!
I'm trying to run this notebook https://github.com/NVIDIA-Genomics-Research/rapids-single-cell-examples/blob/master/notebooks/5k_pbmc_coverage_gpu.ipynb within the container https://hub.docker.com/r/claraparabricks/single-cell-examples_rapids_cuda11.0 on A100 GPU.

Everything works until executing this line

atacworks_results = coverage.atacworks_denoise(noisy_coverage, model, gpu, interval_size)

which gives the error:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<timed exec> in <module>

~/run_singlecell_rapids/rapids-single-cell-examples/notebooks/coverage.py in atacworks_denoise(coverage, model, gpu, interval_size, pad)
    353         input_arr = torch.tensor(input_arr, dtype=float)
    354         input_arr = input_arr.unsqueeze(1)
--> 355         input_arr = input_arr.cuda(gpu, non_blocking=True).float()
    356         # Run model inference
    357         pred = model(input_arr)

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I tried to update the pytorch installation in the container by

conda install pytorch torchvision torchaudio pytorch-cuda=11.6 -c pytorch -c nvidia

But it hangs forever at the "solving environment" step. Any ideas? Thanks!

CuSparseError: CUSPARSE_STATUS_ALLOC_FAILED

Getting the error:

CuSparseError Traceback (most recent call last)
in

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csr.py in tocsc(self, copy)
254 """
255 # copy is ignored
--> 256 return cusparse.csr2csc(self)
257
258 def tocsr(self, copy=False):

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cupy/cusparse.py in csr2csc(x)
648 data.data.ptr, indices.data.ptr, indptr.data.ptr,
649 cusparse.CUSPARSE_ACTION_NUMERIC,
--> 650 cusparse.CUSPARSE_INDEX_BASE_ZERO)
651 return cupyx.scipy.sparse.csc_matrix(
652 (data, indices, indptr), shape=x.shape)

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cupy/cusparse.py in _call_cusparse(name, dtype, *args)
61 raise TypeError
62 f = getattr(cusparse, prefix + name)
---> 63 return f(*args)
64
65

cupy/cuda/cusparse.pyx in cupy.cuda.cusparse.scsr2csc()

cupy/cuda/cusparse.pyx in cupy.cuda.cusparse.scsr2csc()

cupy/cuda/cusparse.pyx in cupy.cuda.cusparse.check_status()

CuSparseError: CUSPARSE_STATUS_ALLOC_FAILED

after the block:
%%time
tmp_norm = normalized.tocsc()
ACE2_raw = tmp_norm[:, genes[genes == "ACE2"].index[0]].todense().ravel()
TMPRSS2_raw = tmp_norm[:, genes[genes == "TMPRSS2"].index[0]].todense().ravel()
EPCAM_raw = tmp_norm[:, genes[genes == "EPCAM"].index[0]].todense().ravel()

del tmp_norm

This is using all of the code in the Jupyter notebook. (Got the same error in Spyder and Jupyter notebook.)

A few quick questions that may relate to this:

  1. The max I can set POOL_SIZE_GB to is 6 (1080 Ti). Can you explain a bit more how to set this appropriately?
  2. Is there a reason this code is present twice within 4 blocks of code?

input_file = "krasnow_hlca_10x_UMIs.sparse.h5ad" (I changed this to my local path).

Thanks in advance!

[FEA] Add batching option to `filter_cells` and `filter_genes` in `rapids_scanpy_funcs`

Because the cuSPARSE API uses 32-bit integers to specify the size of the underlying workspaces in GPU memory, and because the SciPy/CuPy sparse APIs use them to specify the size of the underlying matrices, very large datasets run into problems during the filtering of cells and genes. We can get around this constraint in two ways: we can chunk the data across multiple GPUs using Dask, or we can batch the filters on a single GPU.

We should do this specifically for the 1M cells notebook, so that we can remove the on_device argument.

Residual GPU Memory usage

Hi! I am trying to use the Scanpy RAPIDS functions to run multiple parallel operations on a server.

The problem I am running into is that after running any Scanpy function with RAPIDS enabled, there is some residual memory usage after the function call has ended. I am assuming this is either because of a memory leak, or because the result itself is stored on the GPU.

during scanpy.tl.neighbors + scanpy.tl.umap call:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   51C    P0    60W /  70W |   7613MiB / 15109MiB |      100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

post function run:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   51C    P0    35W /  70W |   1564MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

We aren't running any GPU load besides the UMAP function, and idle memory usage is ~75 MiB.

Happy to elaborate more and help find a fix for this. Not sure if I am missing something really easy (maybe a cupy.asnumpy somewhere?), so any info would be super helpful!
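
One possibility, assuming the residual usage comes from CuPy's cached memory pools rather than a leak, is to copy results back to the host and release the pools. A self-contained sketch with a stand-in result array:

import cupy as cp

gpu_result = cp.random.rand(1000, 2)   # stand-in for a result left on the GPU
host_result = cp.asnumpy(gpu_result)   # copy the result back to host memory
del gpu_result                         # drop the device-side reference
cp.get_default_memory_pool().free_all_blocks()         # release cached device memory
cp.get_default_pinned_memory_pool().free_all_blocks()  # release cached pinned host memory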

conda import environment not working

When I try creating a conda env from the provided yml file, I get an error report that ends with the following:

File "C:\Users\tnnan\anaconda3\lib\site-packages\ruamel_yaml\scanner.py", line 651, in fetch_value
raise ScannerError(
ruamel_yaml.scanner.ScannerError: mapping values are not allowed here
in "", line 28, column 66:
... le" content="{"groups": [], "environmentKey" ...
^ (line: 28)

Is this something that has been observed and resolved before?

Illegal Memory Access: 1.3M cells RTX A6000 48GB out of memory on scaling step

Hello! I am currently trying to run the 1.3M mouse brain example notebook on my local server with an RTX A6000. On the scaling step, where the CuPy mean operation is run, the system runs out of VRAM because the sparse_gpu_array is already taking up ~41 GB of VRAM. I was wondering how you got this step to run on 16 GB cards? Is there a way to batch the scaling? Or perhaps offload it to the CPU and load the final sparse matrix back onto the GPU?

Thank you in advance!
Joe

OverflowError: value too large to convert to int

Could I ask if you might have any tips on how to overcome this error?

I'm running your 1M cell code, but I tried it on my own set of 2.8M cells.

Here's my matrix:

sparse_gpu_array.shape
# (2886934, 33567)

sparse_gpu_array.nnz
# 4128695018

Let's try to run this:

sparse_gpu_array, genes = rapids_scanpy_funcs.filter_genes(sparse_gpu_array, genes, min_cells=1000)
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<timed exec> in <module>

~/work/github.com/slowkow/rapids-single-cell-examples/notebooks/rapids_scanpy_funcs.py in filter_genes(sparse_gpu_array, genes_idx, min_cells)
    269         Genes containing a number of cells below this value will be filtered
    270     """
--> 271     thr = np.asarray(sparse_gpu_array.sum(axis=0) >= min_cells).ravel()
    272     filtered_genes = cp.sparse.csr_matrix(sparse_gpu_array[:, thr])
    273     genes_idx = genes_idx[np.where(thr)[0]]

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/base.py in sum(self, axis, dtype, out)
    388 
    389         if axis == 0:
--> 390             ret = self.T.dot(cupy.ones(m, dtype=self.dtype)).reshape(1, n)
    391         else:  # axis == 1
    392             ret = self.dot(cupy.ones(n, dtype=self.dtype)).reshape(m, 1)

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/base.py in dot(self, other)
    307     def dot(self, other):
    308         """Ordinary dot product"""
--> 309         return self * other
    310 
    311     def getH(self):

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csc.py in __mul__(self, other)
    111                 return self._with_data(self.data * other)
    112             elif other.ndim == 1:
--> 113                 self.sum_duplicates()
    114                 if cusparse.check_availability('csrmv'):
    115                     csrmv = cusparse.csrmv

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/compressed.py in sum_duplicates(self)
    333             self._has_canonical_format = True
    334             return
--> 335         coo = self.tocoo()
    336         coo.sum_duplicates()
    337         self.__init__(coo.asformat(self.format))

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csc.py in tocoo(self, copy)
    214 
    215         """
--> 216         return self.T.tocoo(copy).T
    217 
    218     def tocsc(self, copy=None):

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupyx/scipy/sparse/csr.py in tocoo(self, copy)
    268             indices = self.indices
    269 
--> 270         return cusparse.csr2coo(self, data, indices)
    271 
    272     def tocsc(self, copy=False):

~/.conda/envs/rapidgenomics/lib/python3.7/site-packages/cupy/cusparse.py in csr2coo(x, data, indices)
    900     cusparse.xcsr2coo(
    901         handle, x.indptr.data.ptr, nnz, m, row.data.ptr,
--> 902         cusparse.CUSPARSE_INDEX_BASE_ZERO)
    903     # data and indices did not need to be copied already
    904     return cupyx.scipy.sparse.coo_matrix(

cupy/cuda/cusparse.pyx in cupy.cuda.cusparse.xcsr2coo()

OverflowError: value too large to convert to int
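
For context, sparse_gpu_array.nnz here (4,128,695,018) exceeds the 32-bit integer maximum (2,147,483,647) that cuSPARSE uses for its index arrays, which is what overflows. A minimal sketch of one possible workaround, counting cells per gene in row batches so that no single sparse operation sees more than int32-many nonzeros (the function and variable names are hypothetical, and it assumes no explicitly stored zeros):

import cupy as cp

def count_cells_per_gene(mat, batch_rows=200_000):
    # mat: cupyx.scipy.sparse.csr_matrix of shape (n_cells, n_genes)
    counts = cp.zeros(mat.shape[1], dtype=cp.int64)
    for start in range(0, mat.shape[0], batch_rows):
        chunk = mat[start:start + batch_rows]
        # CSR column indices hold one entry per stored (cell, gene) nonzero
        counts += cp.bincount(chunk.indices, minlength=mat.shape[1])
    return counts

thr = count_cells_per_gene(sparse_gpu_array) >= 1000  # boolean mask of genes to keep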

ImportError: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short

Hello RAPIDS,
Thanks for developing this amazing package.
I encountered 'ImportError: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short' when importing packages.

import scanpy as sc
import anndata

import time
import os, wget

import cudf
import cupy as cp

from cuml.decomposition import PCA
from cuml.manifold import TSNE
from cuml.cluster import KMeans
from cuml.preprocessing import StandardScaler

import rapids_scanpy_funcs

import warnings
warnings.filterwarnings('ignore', 'Expected ')
warnings.simplefilter('ignore')

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-40576262d873> in <module>
      5 import os, wget
      6 
----> 7 import cudf
      8 import cupy as cp
      9 

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/__init__.py in <module>
      2 from cudf.utils.gpu_utils import validate_setup
      3 
----> 4 validate_setup()
      5 
      6 import cupy

/opt/conda/envs/rapids/lib/python3.7/site-packages/cudf/utils/gpu_utils.py in validate_setup()
     16     import warnings
     17 
---> 18     from rmm._cuda.gpu import (
     19         CUDARuntimeError,
     20         cudaDeviceAttr,

/opt/conda/envs/rapids/lib/python3.7/site-packages/rmm/__init__.py in <module>
     14 import weakref
     15 
---> 16 from rmm import mr
     17 from rmm._lib.device_buffer import DeviceBuffer
     18 from rmm._version import get_versions

/opt/conda/envs/rapids/lib/python3.7/site-packages/rmm/mr.py in <module>
      1 # Copyright (c) 2020, NVIDIA CORPORATION.
----> 2 from rmm._lib.memory_resource import (
      3     BinningMemoryResource,
      4     CudaAsyncMemoryResource,
      5     CudaMemoryResource,

/opt/conda/envs/rapids/lib/python3.7/site-packages/rmm/_lib/__init__.py in <module>
      1 # Copyright (c) 2019-2020, NVIDIA CORPORATION.
      2 
----> 3 from .device_buffer import DeviceBuffer

ImportError: /usr/lib/x86_64-linux-gnu/libcuda.so.1: file too short

I googled it. Some people recommend nvidia-docker2. I tried it, but it failed.

nvidia-docker run -v /home/hyjforesight/:/data -p 8888:8888 -p 8787:8787 -p 8786:8786 claraparabricks/single-cell-examples_rapids_cuda11.0
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.

Could you please help me with this issue? I'm working in Windows 11 22H2 WSL2 Ubuntu 22.04.
Thanks!
Best,
Yuanjian

cell number limit: data points for 10M and 70M cell datasets

Hardware: GPU with 24 GB of memory.
10M dataset: 30 min for KNN, 30 min for UMAP, 3 min for Leiden.
70M dataset: not sure if it is feasible due to the memory limit; KNN has been running for 15 h and is still going.

Are there any solutions or tips for >10M cell datasets?

error in read_with_filter function

When I filter cells with the read_with_filter function on the krasnow_hlca_10x.sparse.h5ad dataset, no cells are filtered, regardless of the parameters I use. But it works normally when using scanpy.pp.filter_cells.

CUDA MemoryError for loading adata variable names to cudf.Series method

Hi Team,

I'm getting a CUDA memory error when I call the cudf.Series method from the hlca_lung_gpu_analysis-visualization notebook.

Error Log:

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cudf/utils/utils.py in pyarrow_buffer_to_cudf_buffer(arrow_buf, mask_size)
    157     try:
--> 158         arrow_cuda_buf = arrowCudaBuffer.from_buffer(arrow_buf)
    159         buf = Buffer(

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/pyarrow/_cuda.pyx in pyarrow._cuda.CudaBuffer.from_buffer()

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowTypeError: buffer is not backed by a CudaBuffer

During handling of the above exception, another exception occurred:

MemoryError                               Traceback (most recent call last)
<timed exec> in <module>

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cudf/core/series.py in __init__(self, data, index, dtype, name, nan_as_null)
    186 
    187         if not isinstance(data, column.ColumnBase):
--> 188             data = column.as_column(data, nan_as_null=nan_as_null, dtype=dtype)
    189 
    190         if index is not None and not isinstance(index, Index):

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cudf/core/column/column.py in as_column(arbitrary, nan_as_null, dtype, length)
   1552         elif arb_dtype.kind in ("O", "U"):
   1553             data = as_column(
-> 1554                 pa.Array.from_pandas(arbitrary), dtype=arbitrary.dtype
   1555             )
   1556             # There is no cast operation available for pa.Array from int to

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cudf/core/column/column.py in as_column(arbitrary, nan_as_null, dtype, length)
   1352     elif isinstance(arbitrary, pa.Array):
   1353         if isinstance(arbitrary, pa.StringArray):
-> 1354             data = cudf.core.column.StringColumn.from_arrow(arbitrary)
   1355         elif isinstance(arbitrary, pa.NullArray):
   1356             if type(dtype) == str and dtype == "empty":

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cudf/core/column/string.py in from_arrow(cls, array)
   4515     @classmethod
   4516     def from_arrow(cls, array):
-> 4517         pa_size, pa_offset, nbuf, obuf, sbuf = buffers_from_pyarrow(array)
   4518         children = (
   4519             column.build_column(data=obuf, dtype="int32"),

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cudf/utils/utils.py in buffers_from_pyarrow(pa_arr)
    129 
    130     if buffers[1]:
--> 131         padata = pyarrow_buffer_to_cudf_buffer(buffers[1])
    132     else:
    133         padata = Buffer.empty(0)

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cudf/utils/utils.py in pyarrow_buffer_to_cudf_buffer(arrow_buf, mask_size)
    172             dbuf.copy_from_host(np.asarray(arrow_buf).view("u1"))
    173             return Buffer(dbuf)
--> 174         return Buffer(arrow_buf)
    175 
    176 

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cudf/core/buffer.py in __init__(self, data, size, owner)
     55             except TypeError:
     56                 raise TypeError("data must be Buffer, array-like or integer")
---> 57             self._init_from_array_like(np.asarray(data), owner)
     58 
     59     def __len__(self):

~/anaconda3/envs/rapidgenomics/lib/python3.7/site-packages/cudf/core/buffer.py in _init_from_array_like(self, data, owner)
     95                 data.__array_interface__
     96             )
---> 97             dbuf = DeviceBuffer(ptr=ptr, size=size)
     98             self._init_from_array_like(dbuf, owner)
     99         else:

rmm/_lib/device_buffer.pyx in rmm._lib.device_buffer.DeviceBuffer.__cinit__()

MemoryError: std::bad_alloc: CUDA error at: ../include/rmm/mr/device/managed_memory_resource.hpp:72: cudaErrorIllegalAddress an illegal memory access was encountered

Software details:
Conda version: 4.8.5
YAML: rapidgenomics_cuda10.1.yml
CUDA on disk: cuda-10.1
Ubuntu: 18.04

Regards,
Jegathesan S

rapids for pyscenic

Hi, the rapids-single-cell workflow is excellent!
Does your team have any plans to port pySCENIC to RAPIDS in the future?
pySCENIC currently uses the CPU and is slow on an ordinary desktop computer.
I think it would be wonderful to make pySCENIC work with RAPIDS.

RuntimeError: exception occured! when analyzing using multiple GPUs

When I analyze 1 million cells (collected by myself) using multiple GPUs, the dask_sparse_arr.compute_chunk_sizes() step raises an error. The error information is below:

cuml/linear_model/linear_regression.pyx in cuml.linear_model.linear_regression.LinearRegression.fit()
RuntimeError: exception occured! file=_deps/raft-src/cpp/include/raft/linalg/eig.cuh line=144: eig.cuh: eigensolver couldn't converge to a solution. This usually occurs when some of the features do not vary enough.
Obtained 64 stack frames

VRAM and cell numbers

Can you comment on the number of cells that can be processed as a function of VRAM? At 11 GB, it seems like I'm running into memory limits with bigger samples. (The tutorial ran fine.)

What is the max number of cells that you have tested?

fix `rapids_scanpy_funcs.highly_variable_genes`

In /notebooks/hlca_lung_gpu_analysis.ipynb, when running cell 19

%%time
hvg = rapids_scanpy_funcs.highly_variable_genes(sparse_gpu_array, genes, n_top_genes=5000)

it raises the error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<timed exec> in <module>

/workspace/notebooks/rapids_scanpy_funcs.py in highly_variable_genes(sparse_gpu_array, genes, n_top_genes)
    801     mean = sparse_gpu_array.sum(axis=0).flatten() / n_cells
    802     mean_sq = sparse_gpu_array.multiply(sparse_gpu_array).sum(axis=0).flatten() / n_cells
--> 803     variable_genes = _cellranger_hvg(mean, mean_sq, genes, n_cells, n_top_genes)
    804 
    805     return variable_genes

/workspace/notebooks/rapids_scanpy_funcs.py in _cellranger_hvg(mean, mean_sq, genes, n_cells, n_top_genes)
    750     df = pd.DataFrame()
    751     # Note - can be replaced with cudf once 'cut' is added in 21.08
--> 752     df['genes'] = genes.to_numpy()
    753     df['means'] = mean.tolist()
    754     df['dispersions'] = dispersion.tolist()

AttributeError: 'Series' object has no attribute 'to_numpy'

Could you fix rapids_scanpy_funcs.highly_variable_genes?

import rapids_scanpy_funcs

Looking forward to trying this!

I'm well-versed in Python/Scanpy/etc., but I wasn't able to figure out how to install rapids_scanpy_funcs so that it would import in the notebook.

I tried the recommended creation of an environment using the yml file, and that seemed to work fine except for the error with the RAPIDS functions import. I also tried with my normal environment and Scanpy after installing CUDA, RAPIDS, etc. The other imports were fine until this:

import rapids_scanpy_funcs

Thanks in advance, and apologies if I'm overlooking something simple.

weird UMAP results and very inconsistent with CPU results

Sometimes I find the results from RAPIDS KNN and UMAP very weird. The majority of data points are in the center and super small (see below). At the beginning I thought it was a data issue. Today I ran the same data with the same parameters with RAPIDS on and off. It shows totally different results, and without RAPIDS it looks like a normal UMAP. It seems UMAP can break with the RAPIDS version on some data.


RAPIDS implementation of Scanpy rank_genes_groups appears incorrect

I tried running the RAPIDS implementation of rank_genes_groups alongside the Scanpy CPU implementation on the same data matrix, but I'm getting very different results.

Here's my code for the GPU call:

cluster_labels = cudf.Series.from_categorical(adata.obs["louvain"].cat)
var_names = cudf.Series(var_names)
dense_gpu_array = cp.array(adata_raw.X.todense())

scores, names, reference = rapids_scanpy_funcs.rank_genes_groups(
    dense_gpu_array,
    cluster_labels, 
    var_names, 
    n_genes=n_top_diff_peaks, groups='all', reference='rest')

And the CPU call:

adata_raw.obs['louvain'] = adata.obs['louvain'].tolist()
sc.tl.rank_genes_groups(adata_raw, 
                       groupby="louvain", 
                       n_genes=n_top_diff_peaks, 
                       groups='all', 
                       reference='rest',
                       method='logreg'
                       )

When I look at the top differential gene for each cluster, the outputs reported by the GPU and CPU are disjoint. Also, I note that while the CPU output is sorted by score (i.e., the top 50 diff. genes have high scores, and are sorted in decreasing order), the GPU output seems to be unsorted, and some of the scores are very low. My suspicion is that the GPU output isn't actually being properly sorted by logistic regression coefficient, so the output is just some random set of differential genes & their scores instead of the top N.

When I scatterplot the results, the CPU results also seem to make much more sense than the GPU.

TypeError: rank_genes_groups() got multiple values for argument 'groups'

I am running the Example 4: Droplet Single-cell ATAC-seq of 60K Bone Marrow Cells demo data in a GPU cluster on Google Cloud. All the code ran perfectly well around a month ago. Now, I am re-running the exact same code with the exact same demo data (Example 4), but I am encountering TypeError: rank_genes_groups() got multiple values for argument 'groups' during the "Find Differential Peaks" step. Can anybody help explain why this is happening now and how I can clear this error?

CUDARuntimeError in Notebooks

Hello everyone,

I'm having trouble running the notebooks on our institute's server (64-core Epyc and 2x Quadro RTX 6000); however, when I'm at home running them on my personal computer (AMD 5950X and RTX 3090), the notebooks run perfectly. If I run the 1M Brain GPU notebook, it crashes once it reaches the sparse_gpu_array = cp.sparse.csr_matrix(adata.X[:USE_FIRST_N_CELLS], dtype=cp.float32) line

---------------------------------------------------------------------------
CUDARuntimeError                          Traceback (most recent call last)
<timed exec> in <module>

~/conda/rapids-0.18-10/lib/python3.8/site-packages/cupyx/scipy/sparse/compressed.py in __init__(self, arg1, shape, dtype, copy)
    351             x = arg1.asformat(self.format)
    352             data = cupy.array(x.data)
--> 353             indices = cupy.array(x.indices, dtype='i')
    354             indptr = cupy.array(x.indptr, dtype='i')
    355             copy = False

~/conda/rapids-0.18-10/lib/python3.8/site-packages/cupy/_creation/from_data.py in array(obj, dtype, copy, order, subok, ndmin)
     39 
     40     """
---> 41     return core.array(obj, dtype, copy, order, subok, ndmin)
     42 
     43 

cupy/core/core.pyx in cupy.core.core.array()

cupy/core/core.pyx in cupy.core.core.array()

cupy/core/core.pyx in cupy.core.core._send_object_to_gpu()

cupy/core/core.pyx in cupy.core.core._alloc_async_transfer_buffer()

cupy/core/core.pyx in cupy.core.core._alloc_async_transfer_buffer()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.alloc_pinned_memory()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.PinnedMemoryPool.malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory._malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory._malloc()

cupy/cuda/pinned_memory.pyx in cupy.cuda.pinned_memory.PinnedMemory.__init__()

cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.hostAlloc()

cupy_backends/cuda/api/runtime.pyx in cupy_backends.cuda.api.runtime.check_status()

CUDARuntimeError: cudaErrorOperatingSystem: OS call failed or operation not supported on this OS

I can produce the same error when I run the hlca GPU notebook during adata.obsm["X_pca"] = PCA(n_components=n_components, output_type="numpy").fit_transform(adata.X), if I wait some time after the scaling and before the PCA step. If I submit the whole notebook at once, I don't get any issues on the server.
So far I have tested these notebooks with rapids-0.18 for CUDA toolkit 10.1 and 11.0.
What is the issue here and how can I fix it? I am also confused, since both the Quadro RTX 6000 and the RTX 3090 have 24 GB of VRAM. Could this be an issue with memory allocation in RMM? Thank you for your help.

csr_matrix with the .get method will return "Last value of index pointer should be less than the size of index and data arrays"

import scanpy as sc
import anndata
import time
import os,wget
import cudf
import cupy as cp
from cuml.decomposition import PCA
from cuml.manifold import TSNE
from cuml.cluster import KMeans
from cuml.preprocessing import StandardScaler
from matplotlib import pyplot as plt
import warnings
warnings.filterwarnings('ignore', 'Expected ')
warnings.simplefilter('ignore')
import rmm

rmm.reinitialize(
    managed_memory=True, # Allows oversubscription
    pool_allocator=False, # default is False
    devices=0, # GPU device IDs to register. By default registers only GPU 0.
)
cp.cuda.set_allocator(rmm.rmm_cupy_allocator)

MT_GENE_PREFIX = "MT-" # Prefix for mitochondria genes to regress out
markers = ["ACE2", "TMPRSS2", "EPCAM"] # Marker genes for visualization
# filtering cells
min_genes_per_cell = 1 # Filter out cells with fewer genes than this expressed
max_genes_per_cell = 100000 # Filter out cells with more genes than this expressed
pt_max = 1
# filtering genes
min_cells_per_gene = 1 # Filter out genes expressed in fewer cells than this
n_top_genes = 2000 # Number of highly variable genes to retain
# PCA
n_components = 50 # Number of principal components to compute
# t-SNE
tsne_n_pcs = 20 # Number of principal components to use for t-SNE
# KNN
n_neighbors = 15 # Number of nearest neighbors for KNN graph
knn_n_pcs = 50 # Number of principal components to use for finding nearest neighbors
# UMAP
umap_min_dist = 0.3
umap_spread = 1.0
# Gene ranking
ranking_n_top_genes = 50

adata = sc.read('/rapids_clara/c952.diff_PRO.h5ad')

genes = cudf.Series(adata.var_names)
sparse_gpu_array=cp.sparse.csr_matrix(adata.raw.X)
sparse_gpu_array[1350000:1360000].get()

When I use .get() on the csr_matrix (sparse_gpu_array), an error is shown as follows:

ValueError                                Traceback (most recent call last)
Input In [94], in <cell line: 1>()
----> 1 sparse_gpu_array[1350000:1360000].get()

File /opt/conda/envs/rapids/lib/python3.9/site-packages/cupyx/scipy/sparse/csr.py:73, in csr_matrix.get(self, stream)
     71 indices = self.indices.get(stream)
     72 indptr = self.indptr.get(stream)
---> 73 return scipy.sparse.csr_matrix(
     74     (data, indices, indptr), shape=self._shape)

File /opt/conda/envs/rapids/lib/python3.9/site-packages/scipy/sparse/_compressed.py:106, in _cs_matrix.__init__(self, arg1, shape, dtype, copy)
    103 if dtype is not None:
    104     self.data = self.data.astype(dtype, copy=False)
--> 106 self.check_format(full_check=False)

File /opt/conda/envs/rapids/lib/python3.9/site-packages/scipy/sparse/_compressed.py:178, in _cs_matrix.check_format(self, full_check)
    176     raise ValueError("indices and data should have the same size")
    177 if (self.indptr[-1] > len(self.indices)):
--> 178     raise ValueError("Last value of index pointer should be less than "
    179                      "the size of index and data arrays")
    181 self.prune()
    183 if full_check:
    184     # check format validity (more expensive)

ValueError: Last value of index pointer should be less than the size of index and data arrays

However, if I set the interval to [1000:1360000], it does not raise any error. The original h5ad file is about 20 GiB, and the shape of sparse_gpu_array is (1462703, 27610). Why does it show an error for this particular interval?

Nearest neighbors graph computation crashes

Hi,

I am trying the hlca_lung_gpu notebook with my own data to test the dimensionality reduction on GPU. The notebook runs well (data loading, PCA...) until the nearest-neighbors computation.

Here is the traceback:

sc.pp.neighbors(adata, n_neighbors=n_neighbors, n_pcs=knn_n_pcs, method='rapids')
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/errors.py in new_error_context(fmt_, *args, **kwargs)
    743     try:
--> 744         yield
    745     except NumbaError as e:

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/lowering.py in lower_block(self, block)
    229                                    loc=self.loc, errcls_=defaulterrcls):
--> 230                 self.lower_inst(inst)
    231         self.post_block(block)

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/lowering.py in lower_inst(self, inst)
    327             val = self.lower_assign(ty, inst)
--> 328             self.storevar(val, inst.target.name)
    329 

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/lowering.py in storevar(self, value, name)
   1277                                                           name=name)
-> 1278             raise AssertionError(msg)
   1279 

AssertionError: Storing i64 to ptr of i32 ('dim'). FE type int32

During handling of the above exception, another exception occurred:

LoweringError                             Traceback (most recent call last)
<timed eval> in <module>

~/miniconda3/envs/rapids/lib/python3.7/site-packages/scanpy/neighbors/__init__.py in neighbors(adata, n_neighbors, n_pcs, use_rep, knn, random_state, method, metric, metric_kwds, key_added, copy)
    120         n_neighbors=n_neighbors, knn=knn, n_pcs=n_pcs, use_rep=use_rep,
    121         method=method, metric=metric, metric_kwds=metric_kwds,
--> 122         random_state=random_state,
    123     )
    124 

~/miniconda3/envs/rapids/lib/python3.7/site-packages/scanpy/neighbors/__init__.py in compute_neighbors(self, n_neighbors, knn, n_pcs, use_rep, method, random_state, write_knn_indices, metric, metric_kwds)
    744                 knn_distances,
    745                 self._adata.shape[0],
--> 746                 self.n_neighbors,
    747             )
    748         # overwrite the umap connectivities if method is 'gauss'

~/miniconda3/envs/rapids/lib/python3.7/site-packages/scanpy/neighbors/__init__.py in _compute_connectivities_umap(knn_indices, knn_dists, n_obs, n_neighbors, set_op_mix_ratio, local_connectivity)
    345     fuzzy simplicial sets into a global one via a fuzzy union.
    346     """
--> 347     from umap.umap_ import fuzzy_simplicial_set
    348 
    349     X = coo_matrix(([], ([], [])), shape=(n_obs, 1))

~/miniconda3/envs/rapids/lib/python3.7/site-packages/umap/__init__.py in <module>
----> 1 from .umap_ import UMAP
      2 
      3 # Workaround: https://github.com/numba/numba/issues/3341
      4 import numba
      5 

~/miniconda3/envs/rapids/lib/python3.7/site-packages/umap/umap_.py in <module>
     52 from umap.spectral import spectral_layout
     53 from umap.utils import deheap_sort, submatrix
---> 54 from umap.layouts import (
     55     optimize_layout_euclidean,
     56     optimize_layout_generic,

~/miniconda3/envs/rapids/lib/python3.7/site-packages/umap/layouts.py in <module>
     34         "result": numba.types.float32,
     35         "diff": numba.types.float32,
---> 36         "dim": numba.types.int32,
     37     },
     38 )

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/decorators.py in wrapper(func)
    219             with typeinfer.register_dispatcher(disp):
    220                 for sig in sigs:
--> 221                     disp.compile(sig)
    222                 disp.disable_compile()
    223         return disp

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, sig)
    907                 with ev.trigger_event("numba:compile", data=ev_details):
    908                     try:
--> 909                         cres = self._compiler.compile(args, return_type)
    910                     except errors.ForceLiteralArg as e:
    911                         def folded(args, kws):

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, args, return_type)
     77 
     78     def compile(self, args, return_type):
---> 79         status, retval = self._compile_cached(args, return_type)
     80         if status:
     81             return retval

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_cached(self, args, return_type)
     91 
     92         try:
---> 93             retval = self._compile_core(args, return_type)
     94         except errors.TypingError as e:
     95             self._failed_cache[key] = e

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_core(self, args, return_type)
    109                                       args=args, return_type=return_type,
    110                                       flags=flags, locals=self.locals,
--> 111                                       pipeline_class=self.pipeline_class)
    112         # Check typing error if object mode is used
    113         if cres.typing_error is not None and not flags.enable_pyobject:

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
    604     pipeline = pipeline_class(typingctx, targetctx, library,
    605                               args, return_type, flags, locals)
--> 606     return pipeline.compile_extra(func)
    607 
    608 

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler.py in compile_extra(self, func)
    351         self.state.lifted = ()
    352         self.state.lifted_from = None
--> 353         return self._compile_bytecode()
    354 
    355     def compile_ir(self, func_ir, lifted=(), lifted_from=None):

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler.py in _compile_bytecode(self)
    413         """
    414         assert self.state.func_ir is None
--> 415         return self._compile_core()
    416 
    417     def _compile_ir(self):

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler.py in _compile_core(self)
    393                 self.state.status.fail_reason = e
    394                 if is_final_pipeline:
--> 395                     raise e
    396         else:
    397             raise CompilerError("All available pipelines exhausted")

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler.py in _compile_core(self)
    384             res = None
    385             try:
--> 386                 pm.run(self.state)
    387                 if self.state.cr is not None:
    388                     break

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler_machinery.py in run(self, state)
    337                     (self.pipeline_name, pass_desc)
    338                 patched_exception = self._patch_error(msg, e)
--> 339                 raise patched_exception
    340 
    341     def dependency_analysis(self):

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler_machinery.py in run(self, state)
    328                 pass_inst = _pass_registry.get(pss).pass_inst
    329                 if isinstance(pass_inst, CompilerPass):
--> 330                     self._runPass(idx, pass_inst, state)
    331                 else:
    332                     raise BaseException("Legacy pass in use")

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     33         def _acquire_compile_lock(*args, **kwargs):
     34             with self:
---> 35                 return func(*args, **kwargs)
     36         return _acquire_compile_lock
     37 

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler_machinery.py in _runPass(self, index, pss, internal_state)
    287             mutated |= check(pss.run_initialization, internal_state)
    288         with SimpleTimer() as pass_time:
--> 289             mutated |= check(pss.run_pass, internal_state)
    290         with SimpleTimer() as finalize_time:
    291             mutated |= check(pss.run_finalizer, internal_state)

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/compiler_machinery.py in check(func, compiler_state)
    260 
    261         def check(func, compiler_state):
--> 262             mangled = func(compiler_state)
    263             if mangled not in (True, False):
    264                 msg = ("CompilerPass implementations should return True/False. "

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/typed_passes.py in run_pass(self, state)
    461 
    462         # TODO: Pull this out into the pipeline
--> 463         NativeLowering().run_pass(state)
    464         lowered = state['cr']
    465         signature = typing.signature(state.return_type, *state.args)

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/typed_passes.py in run_pass(self, state)
    382                 lower = lowering.Lower(targetctx, library, fndesc, interp,
    383                                        metadata=metadata)
--> 384                 lower.lower()
    385                 if not flags.no_cpython_wrapper:
    386                     lower.create_cpython_wrapper(flags.release_gil)

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/lowering.py in lower(self)
    134         if self.generator_info is None:
    135             self.genlower = None
--> 136             self.lower_normal_function(self.fndesc)
    137         else:
    138             self.genlower = self.GeneratorLower(self)

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/lowering.py in lower_normal_function(self, fndesc)
    188         # Init argument values
    189         self.extract_function_arguments()
--> 190         entry_block_tail = self.lower_function_body()
    191 
    192         # Close tail of entry block

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/lowering.py in lower_function_body(self)
    214             bb = self.blkmap[offset]
    215             self.builder.position_at_end(bb)
--> 216             self.lower_block(block)
    217         self.post_lower()
    218         return entry_block_tail

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/lowering.py in lower_block(self, block)
    228             with new_error_context('lowering "{inst}" at {loc}', inst=inst,
    229                                    loc=self.loc, errcls_=defaulterrcls):
--> 230                 self.lower_inst(inst)
    231         self.post_block(block)
    232 

~/miniconda3/envs/rapids/lib/python3.7/contextlib.py in __exit__(self, type, value, traceback)
    128                 value = type()
    129             try:
--> 130                 self.gen.throw(type, value, traceback)
    131             except StopIteration as exc:
    132                 # Suppress StopIteration *unless* it's the same exception that

~/miniconda3/envs/rapids/lib/python3.7/site-packages/numba/core/errors.py in new_error_context(fmt_, *args, **kwargs)
    749         newerr = errcls(e).add_context(_format_msg(fmt_, args, kwargs))
    750         tb = sys.exc_info()[2] if numba.core.config.FULL_TRACEBACKS else None
--> 751         raise newerr.with_traceback(tb)
    752 
    753 

LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
Storing i64 to ptr of i32 ('dim'). FE type int32

File "../../../../../home/egilson/miniconda3/envs/rapids/lib/python3.7/site-packages/umap/layouts.py", line 52:
def rdist(x, y):
    <source elided>
    result = 0.0
    dim = x.shape[0]
    ^

During: lowering "dim = static_getitem(value=$8load_attr.2, index=0, index_var=$const10.3, fn=<built-in function getitem>)" at /home/egilson/miniconda3/envs/rapids/lib/python3.7/site-packages/umap/layouts.py (52)
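
This LoweringError surfaces while "import umap" compiles umap/layouts.py, which usually indicates that the installed numba release is incompatible with the installed umap-learn (an assumption based on the failing frame, not a confirmed diagnosis). Since importing umap itself fails, a minimal first check is the numba version alone:

# Hedged check: compare the installed numba version against the one pinned in
# this repo's conda/*.yml environment files; recreating the conda environment
# from those files usually restores a numba/umap-learn pair that compiles cleanly.
import numba
print("numba:", numba.__version__)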

Cannot open UCX library: (null) in PCA analysis of 1M_brain_gpu_analysis_multigpu

Dear authors,
Thank you for sharing these valuable pipelines for using GPUs in single-cell genomics analysis. When going through the 1M_brain_gpu_analysis_multigpu pipeline you provide, I run into a problem at the PCA step on the dask_sparse_arr data. Please see the details below.

%%time
from cuml.dask.decomposition import PCA
pca_data = PCA(n_components=50).fit_transform(dask_sparse_arr)
pca_data.compute_chunk_sizes()

Output:

2022-11-12 16:17:05,992 - distributed.worker - WARNING - Run Failed
Function: _func_init_all
args: (b"'{\x19\xf3'\x1dB\x81\x93\xaeya!\x86q6", b'\x02\x00\xb1\x0c\xac\x15\x1d:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', False, {'tcp://127.0.0.1:33644': {'rank': 0}, 'tcp://127.0.0.1:42241': {'rank': 1}}, False, 0)
kwargs: {}
Traceback (most recent call last):
  File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/worker.py", line 3160, in run
    result = await function(*args, **kwargs)
  File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py", line 459, in _func_init_all
    _func_build_handle(sessionId, streams_per_handle, verbose)
  File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py", line 559, in _func_build_handle
    inject_comms_on_handle_coll_only(
  File "comms_utils.pyx", line 264, in raft_dask.common.comms_utils.inject_comms_on_handle_coll_only
RuntimeError: exception occured! file=/project/cpp/include/raft/comms/detail/ucp_helper.hpp line=124: Cannot open UCX library: (null)

2022-11-12 16:17:05,996 - distributed.worker - WARNING - Run Failed
Function: _func_init_all
args: (b"'{\x19\xf3'\x1dB\x81\x93\xaeya!\x86q6", b'\x02\x00\xb1\x0c\xac\x15\x1d:\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00', False, {'tcp://127.0.0.1:33644': {'rank': 0}, 'tcp://127.0.0.1:42241': {'rank': 1}}, False, 0)
kwargs: {}
Traceback (most recent call last):
  File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/worker.py", line 3160, in run
    result = await function(*args, **kwargs)
  File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py", line 459, in _func_init_all
    _func_build_handle(sessionId, streams_per_handle, verbose)
  File "/home/pzchen/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py", line 559, in _func_build_handle
    inject_comms_on_handle_coll_only(
  File "comms_utils.pyx", line 264, in raft_dask.common.comms_utils.inject_comms_on_handle_coll_only
RuntimeError: exception occured! file=/project/cpp/include/raft/comms/detail/ucp_helper.hpp line=124: Cannot open UCX library: (null)


RuntimeError Traceback (most recent call last)
File <timed exec>:2

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/cuml/dask/decomposition/pca.py:177, in PCA.fit_transform(self, X)
165 def fit_transform(self, X):
166 """
167 Fit the model with X and apply the dimensionality reduction on X.
168
(...)
175 X_new : dask cuDF
176 """
--> 177 return self.fit(X).transform(X)

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/cuml/dask/decomposition/pca.py:162, in PCA.fit(self, X)
153 def fit(self, X):
154 """
155 Fit the model with X.
156
(...)
159 X : dask cuDF input
160 """
--> 162 self._fit(X)
163 return self

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/cuml/dask/decomposition/base.py:71, in DecompositionSyncFitMixin._fit(self, X, _transform)
68 else:
69 comms = Comms(comms_p2p=False)
---> 71 comms.init(workers=data.workers)
73 data.calculate_parts_to_sizes(comms)
75 worker_info = comms.worker_info(comms.worker_addresses)

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py:200, in Comms.init(self, workers)
196 worker_info = {w: worker_info[w] for w in self.worker_addresses}
198 self.create_nccl_uniqueid()
--> 200 self.client.run(
201 _func_init_all,
202 self.sessionId,
203 self.uniqueId,
204 self.comms_p2p,
205 worker_info,
206 self.verbose,
207 self.streams_per_handle,
208 workers=self.worker_addresses,
209 wait=True,
210 )
212 self.nccl_initialized = True
214 if self.comms_p2p:

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/client.py:2836, in Client.run(self, function, workers, wait, nanny, on_error, *args, **kwargs)
2753 def run(
2754 self,
2755 function,
(...)
2761 **kwargs,
2762 ):
2763 """
2764 Run a function on all workers outside of task scheduling system
2765
(...)
2834 >>> c.run(print_state, wait=False) # doctest: +SKIP
2835 """
-> 2836 return self.sync(
2837 self._run,
2838 function,
2839 *args,
2840 workers=workers,
2841 wait=wait,
2842 nanny=nanny,
2843 on_error=on_error,
2844 **kwargs,
2845 )

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/utils.py:339, in SyncMethodMixin.sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
337 return future
338 else:
--> 339 return sync(
340 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
341 )

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/utils.py:406, in sync(loop, func, callback_timeout, *args, **kwargs)
404 if error:
405 typ, exc, tb = error
--> 406 raise exc.with_traceback(tb)
407 else:
408 return result

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/utils.py:379, in sync.<locals>.f()
377 future = asyncio.wait_for(future, callback_timeout)
378 future = asyncio.ensure_future(future)
--> 379 result = yield future
380 except Exception:
381 error = sys.exc_info()

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/tornado/gen.py:769, in Runner.run(self)
766 exc_info = None
768 try:
--> 769 value = future.result()
770 except Exception:
771 exc_info = sys.exc_info()

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/distributed/client.py:2741, in Client._run(self, function, nanny, workers, wait, on_error, *args, **kwargs)
2738 continue
2740 if on_error == "raise":
-> 2741 raise exc
2742 elif on_error == "return":
2743 results[key] = exc

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py:459, in _func_init_all()
456 worker.log_event(topic="info", msg="Done building handle.")
458 else:
--> 459 _func_build_handle(sessionId, streams_per_handle, verbose)

File ~/miniconda3/envs/rapidsgenomics/lib/python3.9/site-packages/raft_dask/common/comms.py:559, in _func_build_handle()
556 nWorkers = raft_comm_state["nworkers"]
558 nccl_comm = raft_comm_state["nccl"]
--> 559 inject_comms_on_handle_coll_only(
560 handle, nccl_comm, nWorkers, workerId, verbose
561 )
562 raft_comm_state["handle"] = handle

File comms_utils.pyx:264, in raft_dask.common.comms_utils.inject_comms_on_handle_coll_only()

RuntimeError: exception occured! file=/project/cpp/include/raft/comms/detail/ucp_helper.hpp line=124: Cannot open UCX library: (null)

I am using Python 3.9 and installed the packages with pip.

Thank you very much!
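
For what it's worth, the RuntimeError is raised when raft's comms layer cannot dlopen the UCX library at handle-creation time. One plausible explanation, given the pip install noted above, is that ucx-py is simply absent: pip installs of the RAPIDS stack often omit it, whereas the conda environment files in this repo include it. A quick, hedged check in the failing environment:

# Hedged diagnostic, assuming ucx-py is the missing piece (not confirmed by
# the authors). ucx-py's Python import name is "ucp".
try:
    import ucp
    print("ucx-py found, UCX version:", ucp.get_ucx_version())
except ImportError:
    print("ucx-py is missing - try installing it from the rapidsai conda channel")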

MultiGPU notebook computes differential expression on principal components instead of genes

Hey Nvidia Genomics Team,

In the multi-GPU notebook, the AnnData object created after PCA stores the PCA embedding in .X. As a result, when you later perform the differential gene expression step, you actually compute which principal component is most important for each cluster, not which gene. See the sketch after this message.

The gene-ranking function also still has a couple of bugs. There is a PR with a fix, which additionally introduces some performance improvements.
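
A minimal sketch of that fix, using standard AnnData conventions (the toy matrices below are placeholders for the notebook's data): keep gene expression in .X and store the PCA embedding in .obsm, so gene ranking keeps operating on genes rather than on principal components.

# Hedged sketch, not the repo's exact code: placeholder matrices stand in
# for the notebook's normalized counts and PCA output.
import numpy as np
import anndata

normalized_counts = np.random.rand(100, 2000).astype(np.float32)  # cells x genes
pca_embedding = np.random.rand(100, 50).astype(np.float32)        # cells x 50 PCs

adata = anndata.AnnData(X=normalized_counts)  # gene expression stays in .X
adata.obsm["X_pca"] = pca_embedding           # PCA embedding lives in .obsm
# Downstream differential expression (e.g. rank_genes_groups) then ranks genes.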

Yours Severin
