Dockerfile install (cuquantum issue, closed)

nvidia commented on June 22, 2024
Dockerfile install

from cuquantum.

Comments (6)

brian-dellabetta commented on June 22, 2024

@mtjrider thank you! The architecture diagram is what I was missing; this is super helpful. I appreciate your help in sanity-checking the image in a working environment. We'll try to reproduce on our end.

I will close and re-open the issue if we have further questions. Thanks again for the help.

leofang commented on June 22, 2024

One more thing:

> though the lib64->lib symlink is needed for it to work

Yes, we have become aware of this issue for building cuQuantum Python from source. We'll push a fix shortly. Thanks for bringing it up, Brian.

mtjrider commented on June 22, 2024

Hi @brian-dellabetta. Thanks for your interest in cuQuantum!

> The image has cupy-cuda115; the conda install of cuquantum-python installs another version of cupy as a dependency, so I uninstall the old one (it will complain during import if both are available). make all builds successfully (though the lib64->lib symlink is needed for it to work), but I am unable to run the Python samples without hitting import errors.

All samples require an NVIDIA GPU to run, specifically one with compute capability 7.0+. Here's a useful table.

> I am running on an Intel-chip Mac, just trying to clear up the import errors before we run this on a cloud instance with an NVIDIA GPU mounted in.

I'm guessing this is the issue. The import statements will fail without a valid driver installation. Without seeing the full error output, I cannot confirm.

> Before posting any stack traces, am I on the right track here? Maybe I should use a different base image that has an equivalent version of cupy. I'm also not sure if the CUDA version is incompatible.

For cuQuantum, as long as your CUDA toolkit version is 11.2+, and CuPy's version is 9.5+, you should be fine. If you have a more specific concern, please include it in your response.
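
The minimums above can be checked before building on an image. This is a minimal sketch, not part of cuQuantum: the helper names (`parse_version`, `meets_minimum`, `installed_cupy_version`) are hypothetical, and the lookup assumes CuPy is installed under a distribution name starting with `cupy` (for example `cupy` or `cupy-cuda115`). It reads package metadata rather than importing CuPy, so it also works on a machine without a driver.

```python
from importlib import metadata

# Minimums quoted in the thread: CuPy 9.5+ and CUDA toolkit 11.2+.
MIN_CUPY = (9, 5)
MIN_CUDA = (11, 2)


def parse_version(text):
    """Turn '10.1.0' into (10, 1, 0); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(version_text, minimum):
    """True if the version is at least the given (major, minor) minimum."""
    return parse_version(version_text)[:len(minimum)] >= minimum


def installed_cupy_version():
    """Look up CuPy's version from package metadata, without importing it."""
    # Assumption: the distribution name starts with "cupy".
    for dist in metadata.distributions():
        name = dist.metadata["Name"] or ""
        if name.lower().startswith("cupy"):
            return dist.version
    return None


if __name__ == "__main__":
    found = installed_cupy_version()
    if found is None:
        print("CuPy is not installed")
    else:
        print("CuPy", found, "ok" if meets_minimum(found, MIN_CUPY) else "too old")
    # The toolkit version string has to come from elsewhere, e.g. the base image tag.
    print("CUDA 11.5 ok:", meets_minimum("11.5", MIN_CUDA))
```

The toolkit version is passed in as a string here because how to discover it depends on the image; "11.5" is only inferred from the cupy-cuda115 package name mentioned in the question.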

> I am happy to submit a PR with the working Dockerfile once we figure this all out :)

Unfortunately, we aren't accepting code contributions at this time.

I'm wondering why you're using wget to acquire the binaries when they are automatically installed by conda in this line:

conda install -c conda-forge cuquantum-python

(e.g.)

conda install -c conda-forge cuquantum-python
...
The following NEW packages will be INSTALLED:

...
  cupy               conda-forge/linux-64::cupy-10.1.0-py310h64c8dd9_1
  cuquantum          conda-forge/linux-64::cuquantum-0.1.0.30-h5c60f85_2
  cuquantum-python   conda-forge/linux-64::cuquantum-python-0.1.0.0-py310h013f86e_3
  cutensor           conda-forge/linux-64::cutensor-1.4.0.6-h7537e88_2
...

All of the samples are also hosted in this repository.

Let us know if you're still having trouble or if you have other questions!

brian-dellabetta commented on June 22, 2024

@mtjrider I'm just trying to make sure the image is valid and has all dependencies before attempting to run on an NVIDIA GPU. This requires an NVIDIA V100 or higher for compute capability 7.0+, corresponding to a p3.2xlarge or higher on AWS, and these get pricey, so I'm trying to tackle as much beforehand as possible.

Here's the error I'm seeing:

>>> import cuquantum
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/cupy/__init__.py", line 18, in <module>
    from cupy import _core  # NOQA
  File "/opt/conda/lib/python3.8/site-packages/cupy/_core/__init__.py", line 1, in <module>
    from cupy._core import core  # NOQA
  File "cupy/_core/core.pyx", line 1, in init cupy._core.core
  File "/opt/conda/lib/python3.8/site-packages/cupy/cuda/__init__.py", line 8, in <module>
    from cupy.cuda import compiler  # NOQA
  File "/opt/conda/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 14, in <module>
    from cupy.cuda import function
  File "cupy/cuda/function.pyx", line 1, in init cupy.cuda.function
  File "cupy/_core/_carray.pyx", line 1, in init cupy._core._carray
  File "cupy/_core/internal.pyx", line 1, in init cupy._core.internal
  File "cupy/cuda/memory.pyx", line 1, in init cupy.cuda.memory
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory

This seems to me more related to the versions of cupy and libcuda than an actual runtime error from the lack of a GPU. I might be mistaken, though: is it that the driver won't live in the Docker image, and that it will need to be installed on the host and mounted into the container? I hope to try on a VM with a GPU later this week and will post updates here.

If not a Dockerfile, will an image be made available at some point on the NGC catalog or elsewhere? I'm sure it would be useful to others.

brian-dellabetta commented on June 22, 2024

Also @mtjrider, the wget on the repo is just to pull in the code samples. I didn't see them in the installed directories:
/opt/conda/lib/python3.8/site-packages/cuquantum_python-0.1.0.0.dist-info
/opt/conda/lib/python3.8/site-packages/cuquantum

Also, thanks for all the help!

mtjrider commented on June 22, 2024

> @mtjrider I'm just trying to make sure the image is valid and has all dependencies before attempting to run on an nvidia GPU. This requires an nvidia V100 or higher for compute capability 7.0+, corresponding to a p3.2xlarge or higher on AWS, and these get pricey, so I'm trying to tackle as much beforehand as possible.

Makes perfect sense. Thanks for this clarification. To be clear, I've tested your Dockerfile on a system with GPUs to compile and run the tests, and it works without issue. When you deploy, please take care to confirm that the driver and compilation toolchain are compatible. The CUDA driver and kernel mode driver compatibility is documented here.

The following error indicates that the CUDA driver library (libcuda.so.1) is missing. The driver is not installed in the container; it has to be provided by the host. Here is an architecture overview.

>>> import cuquantum
...
ImportError: libcuda.so.1: cannot open shared object file: No such file or directory
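
To tell a missing driver apart from a packaging problem before importing cuquantum, one option is to ask the dynamic linker for the library directly. This is a minimal sketch under that assumption; `cuda_driver_available` is a hypothetical helper, not a cuQuantum or CuPy function.

```python
import ctypes


def cuda_driver_available(library="libcuda.so.1"):
    """Return True if the dynamic linker can load the CUDA driver library."""
    try:
        ctypes.CDLL(library)
    except OSError:
        # Same condition that surfaces as the ImportError shown above.
        return False
    return True


if __name__ == "__main__":
    if cuda_driver_available():
        print("libcuda.so.1 found: the driver is visible in this environment")
    else:
        print("libcuda.so.1 not found: no driver is visible in this environment")
```

On a machine without an NVIDIA driver (such as the Mac mentioned earlier) this is expected to report that the library was not found, whichever CuPy or cuQuantum versions are installed.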

> Also @mtjrider the wget on the repo is just to pull in the code samples. I didn't see them in the installed directories
> /opt/conda/lib/python3.8/site-packages/cuquantum_python-0.1.0.0.dist-info
> /opt/conda/lib/python3.8/site-packages/cuquantum

I meant that you may also clone the samples because they are hosted in this repository:

git clone https://github.com/NVIDIA/cuQuantum.git cuquantum && \
  ls -la cuquantum/samples
##  custatevec
##  cutensornet

Note: per this comment, I had to modify the Makefile to rename lib64 to lib (this line). Separately, I had to set LD_LIBRARY_PATH=/opt/conda/lib:$LD_LIBRARY_PATH. The command I used to compile the custatevec samples is:

CUSTATEVEC_ROOT=/opt/conda make

Here, I should note that I removed any wget commands because they are redundant with the conda install command.
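
The two environment settings described above can also be applied from a script, which may be handy in a CI job. This is a minimal sketch: `build_env` and `build_samples` are hypothetical helpers, and the `/opt/conda` prefix is taken from the paths in this thread. It does not apply the lib64-to-lib Makefile change, which still has to be made separately.

```python
import os
import subprocess


def build_env(conda_prefix="/opt/conda", base=None):
    """Copy the environment and add the two settings described in the thread."""
    env = dict(os.environ if base is None else base)
    lib_dir = conda_prefix + "/lib"
    existing = env.get("LD_LIBRARY_PATH", "")
    # Prepend the conda lib directory; skip the trailing ':' when nothing was set.
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    env["CUSTATEVEC_ROOT"] = conda_prefix
    return env


def build_samples(samples_dir, conda_prefix="/opt/conda"):
    """Run make in samples_dir; raises CalledProcessError if the build fails."""
    subprocess.run(["make"], cwd=samples_dir, env=build_env(conda_prefix), check=True)
```

A call such as `build_samples("cuquantum/samples/custatevec")` would correspond to the command above, assuming the Makefile sits in that directory of the cloned repository.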
