This is a great question. I admit we have been mostly testing on 32gb GPUs but I can generate some random data with known sparsity to get a feel for how far we can push different GPUs.
What is the ideal target size and sparsity for your GPU? Are there any public datasets with a similar size/sparsity?
from rapids-single-cell-examples.
Good examples would be here: http://mousebrain.org/downloads.html (either the aggregate loom https://storage.googleapis.com/linnarsson-lab-loom/l5_all.agg.loom, or you could see where you hit the limit by concatenating different subsets from http://mousebrain.org/loomfiles_level_L1.html).
For example, I'm currently processing a dataset of 330,000 cells including all of the above and a bunch of our datasets combined with batch correction.
Wanted to provide a small update just to let you know that I have been looking into this. I think the most straightforward solution here might be to use the unified virtual memory allocator. I’ll get an example together.
Ok, thanks! I'll close this.
I've made a modification to the notebook to enable the Unified Virtual Memory manager in RAPIDS & CuPy. Specifically, the change looks like this:
import cupy as cp
import rmm

rmm.reinitialize(
    managed_memory=True,  # allows oversubscription of GPU memory
    devices=0,  # GPU device IDs to register; by default registers only GPU 0
)
cp.cuda.set_allocator(rmm.rmm_cupy_allocator)
This should allow you to oversubscribe your GPU memory to use all available host memory and it will page memory onto the GPU as needed. While swapping pages can slow down the workflow, it does make it much easier to experiment and explore different datasets & sizes without having to think about available GPU memory.
For example, I was able to load the 10x 1.3M neuron dataset pretty easily onto a 32gb GPU and do all sorts of transformations to it without ever encountering an OOM.
If you get a chance, try this feature out and let us know whether it allows you to scale higher than before. We're also very curious to know whether it's still fast for you. I didn't notice much of a performance hit at all, and I wasn't even paying attention to how much memory I was using (I'm sure it was well over 100gb by the time I was done).
I will try the 1.3M neuron dataset myself and some internal datasets at some point today.
This new modification looks excellent so far. Initial results: 54 seconds (GPU) vs. 563 seconds (CPU) on a 32-core/64-thread Threadripper with a 1080 Ti and 128GB of RAM.
Just FYI, the 1.3M neuron dataset crashes the kernel on this line:
filtered = rapids_scanpy_funcs.filter_cells(sparse_gpu_array, min_genes=min_genes_per_cell, max_genes=max_genes_per_cell)
However, my other dataset of ~330K cells gets past this step. It doesn't seem to be a RAM/swap issue as I don't seem to be approaching the ceiling. Perhaps it's a VRAM issue? I'll try again later.
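For reference, the kind of per-cell filtering this step performs can be sketched on the CPU with SciPy. This is a simplified stand-in, not the repository's actual filter_cells implementation:

```python
import numpy as np
from scipy import sparse

def filter_cells_sketch(X, min_genes, max_genes):
    """Keep rows (cells) whose number of detected genes falls in range.

    X: CSR matrix of shape (n_cells, n_genes); a nonzero entry means
    the gene was detected in that cell.
    """
    genes_per_cell = X.getnnz(axis=1)  # nonzeros per row = detected genes
    mask = (genes_per_cell >= min_genes) & (genes_per_cell <= max_genes)
    return X[mask], mask

# Tiny example: 3 cells with 1, 2, and 3 detected genes respectively.
X = sparse.csr_matrix(np.array([[1, 0, 0],
                                [1, 2, 0],
                                [1, 2, 3]]))
filtered, mask = filter_cells_sketch(X, min_genes=2, max_genes=3)
```

Running the same logic on a CPU subsample can help separate a data problem from a GPU-memory problem.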
I meant to respond to you earlier about this. Indeed, as I was playing around with this I noticed the crash as well. It's actually a known bug in cuSPARSE, and we're waiting for them to fix it. I played around a little bit and managed to isolate the bug to entry 1057790 in the input data; the problem is in the conversion from the CPU array to the GPU. If you slice off the first 1M records, or vstack everything up to 1057789 and above 1057791, the filter will work. I was able to run the 1M fairly easily all the way up to regress_out. We're also very close to merging a PR on cuML that will enable sparse inputs for PCA (and doesn't require conversion to dense for the mean centering).
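The workaround described above (dropping the problematic row before moving data to the GPU) can be sketched with SciPy. The helper name is illustrative; in the real workflow the matrix would be the full count matrix and the bad row would be 1057790:

```python
import numpy as np
from scipy import sparse

def drop_row(X, bad_row):
    """Return X with a single (problematic) row removed, keeping CSR format."""
    return sparse.vstack([X[:bad_row], X[bad_row + 1:]], format="csr")

# Tiny stand-in matrix: 4 cells x 3 genes.
X = sparse.csr_matrix(np.arange(12).reshape(4, 3))
Y = drop_row(X, bad_row=1)
```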
Just FYI, I'm consistently crashing the kernel here with your new notebook and rapids_scanpy_funcs.py file:
%%time
sc.pp.highly_variable_genes(adata, n_top_genes=n_top_genes, flavor="cell_ranger")
adata = adata[:, adata.var.highly_variable]
Any suggestions?
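For context, the cell_ranger flavor of highly-variable-gene selection ranks genes by dispersion normalized within bins of similar mean expression. A rough CPU sketch of that idea (a simplification, not scanpy's exact implementation) can be useful for cross-checking a subsample:

```python
import numpy as np

def top_variable_genes_sketch(X, n_top_genes, n_bins=20):
    """Rank genes by dispersion (var/mean), z-scored within mean-expression bins."""
    mean = X.mean(axis=0)
    var = X.var(axis=0)
    dispersion = np.divide(var, mean, out=np.zeros_like(var), where=mean > 0)

    # Bin genes by mean expression and z-score dispersion within each bin.
    bins = np.digitize(mean, np.histogram_bin_edges(mean, bins=n_bins))
    norm_disp = np.zeros_like(dispersion)
    for b in np.unique(bins):
        idx = bins == b
        mu, sd = dispersion[idx].mean(), dispersion[idx].std()
        norm_disp[idx] = (dispersion[idx] - mu) / sd if sd > 0 else 0.0

    return np.argsort(norm_disp)[::-1][:n_top_genes]

# Toy data: 200 cells x 50 genes of Poisson counts.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 50)).astype(float)
top = top_variable_genes_sketch(X, n_top_genes=10)
```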
What is the shape of adata passed into highly_variable_genes? Is it giving you any type of error at all before the kernel crashes?
(989838, 23781) right after this step (adding a line):
%%time
adata = anndata.AnnData(sparse_gpu_array.get())
adata.var_names = genes.to_pandas()
adata.shape
and there is no Python error, just the kernel crash and a system request to send a report.
Let me try shaving off cells to see if it's a memory issue.
I believe we might have hit a similar issue today where our Jupyter kernel crashed without giving any type of useful error information. I’m pretty sure it’s because we were running on a system that didn’t have enough main memory.
While the benefit to using the managed memory option is the ability to oversubscribe the GPU memory, it does now increase the requirement on the amount of main memory needed.
Many of the CPU examples of the 1.3M-cell dataset indicate a requirement of at least 30gb of main memory to do the processing end to end. I think you can get away with a smaller GPU and managed memory, but this comes at the expense of needing more main memory.
I can run it fine with 300K cells (266 seconds), but somewhere a little above that number it fails. At 500K it gets stuck and never finishes, and at 700K or above it seems to crash the kernel. As I mentioned, I have 128 GB of RAM and 628GB set aside for swap, but it doesn't appear to get near that limit, especially with 500K.
We are ordering workstations with 256 GB of RAM, and hopefully I'll add more VRAM with the next generation of video cards.
Update: it's a cell count somewhere between 350K and 400K that causes the kernel crash for me: 350K took 299 seconds to finish, but 400K crashed the kernel.
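Narrowing down a failure threshold like this is essentially an upper-bound binary search over the cell count. A hedged sketch, where run_pipeline is a hypothetical helper that returns True if the notebook completes for a given subsample size:

```python
def largest_passing(lo, hi, run_pipeline):
    """Binary-search the largest n in [lo, hi] for which run_pipeline(n) succeeds.

    Assumes success is monotone: if n cells work, any smaller count works too.
    """
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if run_pipeline(mid):
            lo = mid       # mid works; the threshold is at mid or above
        else:
            hi = mid - 1   # mid fails; the threshold is below mid
    return lo

# Toy stand-in: pretend the pipeline succeeds up to 350,000 cells.
threshold = largest_passing(0, 1_300_000, lambda n: n <= 350_000)
```

Each probe is one notebook run on a subsample, so the threshold can be pinned down in roughly log2(range) runs instead of stepping by 50K at a time.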
Just FYI, I've upgraded to 256 GB of RAM and completely reinstalled the drivers and CUDA (from 10.1 to 10.2), and I still have problems with the code getting "stuck" perpetually in the regression or scaling steps (no kernel crashes lately, but a few CPU cores stay continually engaged by Python with no progress to completion). Have you run this code on a 2080 Ti or other non-Tesla card?
This only happens above 350K cells.
Is there any way to troubleshoot this?
Have you run this code on a 2080 Ti or other non-TESLA card?
Unfortunately, I don't have any 2080 Ti's available to try and reproduce your problem on my end and the 1M cell notebook appears to be working w/ the T4 instances in AWS, which rules out the problem being exclusive to the Turing architecture. This behavior does sound very strange, though.
Is there any way to troubleshoot this?
A lot of times when errors are printed, they end up displaying in the command-line that's running the Jupyter notebook and not in the notebook itself. Do you see any errors on the command-line?
You can set verbose=True in the call to rapids_scanpy_funcs.regress_out, which will print something after every 500 cells are processed. If that's not enough, you can add more prints to the regress_out and scale functions in rapids_scanpy_funcs.
If you have a command-line available, can you also run the nvidia-smi
command? That should at least help us determine if the GPU is being actively utilized when the code gets stuck.
Sorry for the delay. Coming back to this, it seems like it's now hanging at
%%time
sparse_gpu_array, genes = rapids_scanpy_funcs.filter_genes(sparse_gpu_array, genes, min_cells=1)
I'm guessing this is related to issue #53?