nvidia / cuda-python
CUDA Python Low-level Bindings
Home Page: https://nvidia.github.io/cuda-python/
License: Other
The output variable cshareableHandle_ptr is incorrectly set to NULL and is passed to CUDA, thus always returning CUDA_ERROR_INVALID_VALUE.
The following similar APIs are likely not working either, since they too have an output variable of type void*:
We're considering dropping support for Python 3.7 in the next release.
Per NEP 29, the scheduled drop date for Python 3.7 was almost a year ago, and many associated libraries have already dropped it.
Let us know if there are concerns about dropping Python 3.7 in the next release. Thanks!
I am experimenting with Cython 3.0.0 beta 1. It raises the following warning hundreds of times when cimport-ing cuda-python.
warning: /home/bdice/mambaforge/envs/cudf_cython_beta/lib/python3.10/site-packages/cuda/ccudart.pxd:1123:154: The keyword 'nogil' should appear at the end of the function signature line. Placing it before 'except' or 'noexcept' will be disallowed in a future version of Cython.
I think this expects nogil except ?cudaErrorCallRequiresNewerDriver to be replaced by except ?cudaErrorCallRequiresNewerDriver nogil. If we could fix this, it would be a massive improvement to the compiler diagnostics that are shown during builds, which are otherwise overwhelmed by the sheer volume of this message.
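The reordering described above is mechanical, so it can be sketched as a small text rewrite. This is only an illustration: the function name move_nogil_to_end and the regular expression are assumptions made for this sketch, not part of cuda-python or Cython, and the pattern only covers the simple "nogil except ..." and "nogil noexcept" forms.

```python
import re

# Hypothetical helper: matches "nogil" directly followed by an exception clause.
_NOGIL_FIRST = re.compile(r"\bnogil\s+(noexcept|except\s*\??\s*[-\w.*]+)")


def move_nogil_to_end(line):
    """Rewrite 'nogil except ?X' as 'except ?X nogil' within one signature line."""
    return _NOGIL_FIRST.sub(r"\1 nogil", line)


old = "cdef cudaError_t cudaFree(void* devPtr) nogil except ?cudaErrorCallRequiresNewerDriver"
new = move_nogil_to_end(old)
print(new)
```

Lines that already place nogil last are left unchanged, so the rewrite can be applied repeatedly without further effect.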
Hi, I am running into the "Context stack was not empty" error when I try to run a computer vision script on a webcam with a GStreamer pipeline. It works fine when I run it from the terminal, but when I schedule it with cron it hits this error. Any suggestions?
When I ran the example file TensorFlowToTensorRT-NHWC.py, the following error occurred:
Traceback (most recent call last):
File "TensorFlowToTensorRT-NHWC.py", line 161, in
_, inputD0 = cudart.cudaMallocAsync(inputH0.nbytes, stream)
File "cuda/cudart.pyx", line 16938, in cuda.cudart.cudaMallocAsync
File "cuda/ccudart.pyx", line 1210, in cuda.ccudart.cudaMallocAsync
File "cuda/_cuda/ccuda.pyx", line 4970, in cuda._cuda.ccuda._cuMemAllocAsync
RuntimeError: Function "cuMemAllocAsync" not found.
I'm using the NVIDIA NGC image nvcr.io/nvidia/tensorflow:21.12-tf1-py3. The detailed environment is as follows:
GeForce RTX 2080 Ti, Driver Version: 455.23.05, nvcr.io/nvidia/tensorflow:21.12-tf1-py3
Package Version
absl-py 1.0.0
appdirs 1.4.4
argon2-cffi 21.1.0
asgiref 3.4.1
astor 0.8.1
astunparse 1.6.3
attrs 21.2.0
audioread 2.1.9
backcall 0.2.0
bleach 4.1.0
cachetools 4.2.4
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.8
click 8.0.3
cloudpickle 2.0.0
cmake-setuptools 0.1.3
cuda-python 11.7.0
cudf 21.10.0a0+345.ge05bd4bf3c
cugraph 21.10.0a0+102.gab401cad
cuml 21.10.0a0+116.gdc14361ba
cupy-cuda114 9.3.0
cupy-cuda115 9.6.0
cycler 0.11.0
Cython 0.29.24
dask 2021.9.1
dask-cuda 21.10.0
dask-cudf 21.10.0a0+345.ge05bd4bf3c
dask-glm 0.2.0
dask-ml 1.9.0
debugpy 1.5.1
decorator 5.1.0
defusedxml 0.7.1
distributed 2021.9.1
Django 3.2.6
entrypoints 0.3
fastavro 1.4.4
fastrlock 0.8
filelock 3.4.0
flatbuffers 1.12
fsspec 2021.7.0
future 0.18.2
gast 0.3.3
google-pasta 0.2.0
graphsurgeon 0.4.5
grpcio 1.42.0
gunicorn 20.1.0
h11 0.12.0
h5py 2.10.0
HeapDict 1.0.1
horovod 0.22.1
httptools 0.2.0
huggingface-hub 0.0.12
idna 3.3
importlib-metadata 4.8.2
importlib-resources 5.4.0
iniconfig 1.1.1
ipykernel 6.6.0
ipython 7.30.0
ipython-genutils 0.2.0
jedi 0.18.1
Jinja2 3.0.3
joblib 1.1.0
json5 0.9.6
jsonschema 4.2.1
jupyter-client 7.1.0
jupyter-core 4.9.1
jupyter-tensorboard 0.2.0
jupyterlab 2.3.2
jupyterlab-pygments 0.1.2
jupyterlab-server 1.2.0
jupytext 1.13.2
Keras-Applications 1.0.8
Keras-Preprocessing 1.0.5
kiwisolver 1.3.2
librosa 0.9.1
llvmlite 0.36.0
locket 0.2.1
Markdown 3.3.6
markdown-it-py 1.1.0
MarkupSafe 2.0.1
matplotlib 3.4.3
matplotlib-inline 0.1.3
mdit-py-plugins 0.2.8
mistune 0.8.4
mock 3.0.5
msgpack 1.0.3
multipledispatch 0.6.0
nbclient 0.5.9
nbconvert 6.3.0
nbformat 5.1.3
nest-asyncio 1.5.4
networkx 2.6.3
nltk 3.6.4
notebook 6.4.3
numba 0.53.1
numpy 1.22.4
nvidia-dali-cuda110 1.8.0
nvidia-dali-tf-plugin-cuda110 1.8.0
nvidia-dlprofviewer 1.8.0
nvidia-pyindex 1.0.9
nvtx 0.2.3
onnx 1.11.0
onnxruntime-gpu 1.11.1
opencv-python 4.5.5.64
opt-einsum 3.3.0
packaging 21.3
pandas 1.2.5
pandocfilters 1.5.0
parso 0.8.3
partd 1.2.0
pexpect 4.7.0
pickleshare 0.7.5
Pillow 8.4.0
pip 21.3.1
pluggy 1.0.0
polygraphy 0.33.0
pooch 1.6.0
portpicker 1.3.1
prometheus-client 0.12.0
prompt-toolkit 3.0.23
protobuf 3.19.1
psutil 5.7.0
ptyprocess 0.7.0
py 1.11.0
pyarrow 5.0.0
pycparser 2.21
Pygments 2.10.0
pynvml 11.4.1
pyparsing 3.0.6
pypi-kenlm 0.1.20210121
pyrsistent 0.18.0
pytest 6.2.5
python-dateutil 2.8.2
python-dotenv 0.19.2
pytz 2021.3
PyYAML 6.0
pyzmq 22.3.0
regex 2021.11.10
requests 2.26.0
resampy 0.2.2
rmm 21.10.0a0+42.gae27a57
sacremoses 0.0.46
scikit-learn 0.24.0
scipy 1.4.1
Send2Trash 1.8.0
setuptools 59.4.0
six 1.16.0
sortedcontainers 2.4.0
SoundFile 0.10.3.post1
sqlparse 0.4.2
tblib 1.7.0
tensorboard 1.15.0
tensorflow 1.15.5+nv
tensorflow-estimator 1.15.1
tensorrt 8.2.1.8
termcolor 1.1.0
terminado 0.12.1
testpath 0.5.0
tf2onnx 1.10.1
threadpoolctl 3.0.0
tokenizers 0.10.3
toml 0.10.2
toolz 0.11.2
tornado 6.1
tqdm 4.62.3
traitlets 5.1.1
transformers 4.9.1
treelite 2.1.0
treelite-runtime 2.1.0
typing_extensions 4.0.1
ucx-py 0.21.0a0+37.gbfa0450
uff 0.6.9
urllib3 1.26.7
uvicorn 0.15.0
uvloop 0.16.0
watchgod 0.7
wcwidth 0.2.5
webencodings 0.5.1
websockets 10.1
Werkzeug 2.0.2
wheel 0.37.0
whitenoise 5.3.0
wrapt 1.13.3
xgboost 1.4.2
zict 2.0.0
zipp 3.6.0
Great project, guys! I have a single request: instead of exposing the cudaMemoryAlloc etc. APIs and then leaving it up to the programmer to perform garbage collection, could the memory management happen internally?
I can already foresee a plethora of issues where programmers forget to account for threads left on the kernel, along with memory leaks, access violations, segmentation faults, etc. These already happen on operating systems, but now we introduce the same problems to the GPU.
On a side note: do you have any plans with the foundries (AMD, Intel, TSMC, etc.) for a GPU-based Processor-In-Memory (PIM) architecture, or are these experiments exclusive to the RAM developers (Samsung) only?
Regards
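One way to picture the request above is a context manager that pairs every allocation with a guaranteed release. The sketch below is only an illustration under stated assumptions: device_buffer, fake_alloc and fake_free are hypothetical names, and the allocator is a stand-in so the code runs without a GPU. In real use, alloc and free would presumably wrap calls such as cudart.cudaMalloc and cudart.cudaFree after checking their error codes.

```python
import itertools
from contextlib import contextmanager


@contextmanager
def device_buffer(nbytes, alloc, free):
    """Allocate with alloc(nbytes) and always release with free(ptr)."""
    ptr = alloc(nbytes)
    try:
        yield ptr
    finally:
        # Runs on normal exit and when the body raises.
        free(ptr)


# Stand-in allocator (hypothetical) that tracks live "pointers".
live = set()
_next_ptr = itertools.count(1)


def fake_alloc(nbytes):
    ptr = next(_next_ptr)
    live.add(ptr)
    return ptr


def fake_free(ptr):
    live.discard(ptr)


with device_buffer(1024, fake_alloc, fake_free) as ptr:
    inside = ptr in live  # the buffer is live inside the block

print(inside, len(live))
```

The trade-off is that the buffer's lifetime is tied to the with block; code that needs longer-lived allocations would still manage them explicitly.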
Right now the project doesn't have an explicit set of supported Python versions. NEP 29 provides an example of how this can be done:
All minor versions of Python released 42 months prior to the project, and at minimum the two latest minor versions.
Minimum Python ... version support should be adjusted upward on [a] major and minor release, but never on a patch release.
This language also allows forecasting of supported Python versions and, to some degree, of the resources required to maintain the project, because PEP 602 normalizes the release schedule of Python versions.
There are at least two areas this practically impacts:
nvidia channel of conda-forge, this bounds which versions of Python the binaries target.
from cuda import cuda
cuda.cuInit(0)
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Input In [2], in <module>
----> 1 cuda.cuInit(0)
File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/cuda.pyx:8876, in cuda.cuda.cuInit()
File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/ccuda.pyx:17, in cuda.ccuda.cuInit()
File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:3553, in cuda._cuda.ccuda._cuInit()
File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:424, in cuda._cuda.ccuda.cuPythonInit()
RuntimeError: Failed to dlopen libcuda.so
This is because in a WSL environment libcuda.so lives in /usr/lib/wsl/lib, which is not in the default search path of dlopen. For libraries that link against libcuda this isn't a problem, because there's a file at /etc/ld.so.conf.d/ld.wsl.conf which instructs the linker as to where it can find the libraries, but unfortunately dlopen doesn't use this.
As a workaround, adding /usr/lib/wsl/lib to the LD_LIBRARY_PATH environment variable resolves the problem.
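The workaround above can be sketched as a helper that builds the environment for a child process. The dynamic loader reads LD_LIBRARY_PATH when a process starts, so the variable has to be set before the Python interpreter that imports cuda is launched. The helper name env_with_wsl_lib is an assumption made for this sketch.

```python
import os

WSL_LIB_DIR = "/usr/lib/wsl/lib"  # location of libcuda.so reported above


def env_with_wsl_lib(env=None, lib_dir=WSL_LIB_DIR):
    """Return a copy of env with lib_dir prepended to LD_LIBRARY_PATH."""
    new_env = dict(os.environ if env is None else env)
    parts = [p for p in new_env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if lib_dir not in parts:
        parts.insert(0, lib_dir)
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env


print(env_with_wsl_lib({"LD_LIBRARY_PATH": "/opt/lib"})["LD_LIBRARY_PATH"])
```

The returned dictionary could then be passed as the env argument of subprocess.run when starting the script that calls cuda.cuInit.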
Congratulations on the GA release! 🥳
I've been looking forward to the cuda bindings for a while, and was just looking through the docs.
The overview notes an implementation of ASSERT_DRV, which already contains the caveat:
In a future release, this may automatically raise exceptions using a Python object model.
I'm not sure if that means that the errors are going to be subclasses of something like a CUDAError, or if that is to be interpreted some other way, but in any case, I was quite surprised by this choice of exception API.
Why not make the functions raise err by default? Right now, IIUC, every invocation would need to accept an extra err return (and handle it with something like ASSERT_DRV). This seems like a really onerous way to achieve the default behaviour of "fail in case of something unexpected" (and actively choosing where to introduce try... except: handling to continue even if things fail).
It seems like a bad trade-off to me (high verbosity, and easy to forget adding an ASSERT_DRV), but maybe I'm overlooking something?
The reason I'm raising this right now is that this would be a pretty fundamental API change, and if there's any chance at all (assuming it's not already "zero" after GA), it would have to happen ASAP.
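The raise-by-default behaviour asked for above can be approximated today with a thin wrapper around the returned tuple. The sketch below is an illustration only: FakeResult, CUDAError and raise_on_error are hypothetical names, and it assumes each call returns a tuple whose first element is an integer-like error enum with 0 meaning success.

```python
from enum import IntEnum


class FakeResult(IntEnum):
    """Stand-in for an error enum such as CUresult (hypothetical)."""
    SUCCESS = 0
    INVALID_VALUE = 1


class CUDAError(RuntimeError):
    """Hypothetical exception type; not part of cuda-python."""

    def __init__(self, err):
        super().__init__(f"CUDA call failed: {err.name} ({int(err)})")
        self.err = err


def raise_on_error(result):
    """Turn an (err, *values) tuple into its values, raising if err is non-zero."""
    err, *values = result
    if int(err) != 0:
        raise CUDAError(err)
    if not values:
        return None
    return values[0] if len(values) == 1 else tuple(values)


print(raise_on_error((FakeResult.SUCCESS, 7)))
```

With such a wrapper a call site would read like device = raise_on_error(cuda.cuDeviceGet(0)), and failures would propagate as exceptions unless explicitly caught.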
jkh@megamind-> env CUDA_HOME=~/Src/cuda-python python globalToShmemAsyncCopy.py
[globalToShmemAsyncCopy] - Starting...
Traceback (most recent call last):
File "/home/jkh/Src/cuda-python/examples/0_Simple/globalToShmemAsyncCopy.py", line 1054, in
main()
File "/home/jkh/Src/cuda-python/examples/0_Simple/globalToShmemAsyncCopy.py", line 1022, in main
major = checkCudaErrors(cudart.cudaDeviceGetAttribute(cudart.cudaDeviceAttr.cudaDevAttrComputeCapabilityMajor, devID))
File "/home/jkh/Src/cuda-python/examples/common/helper_cuda.py", line 24, in checkCudaErrors
raise RuntimeError("CUDA error code={}({})".format(result[0].value, _cudaGetErrorEnum(result[0])))
Note that I can get some of the other samples to pass, so this does not seem like a generic error.
I'm trying to compile cuda-python in a fairly minimal conda environment (nothing installed but the requirements), with cuda-11.6 installed, and seeing several instances of the following sort of error:
Error compiling Cython file:
------------------------------------------------------------
...
Get memory address of class instance
"""
pass
cdef class CUkernelNodeAttrValue_v1(CUlaunchAttributeValue_union):
^
------------------------------------------------------------
cuda/cuda.pxd:2637:36: First base of 'CUkernelNodeAttrValue_v1' is not an extension type
I do have other CUDA versions installed alongside 11.6, but judging from the output line Parsing headers in "/usr/local/cuda-11.6/include" it seems like it's probably finding the right version? Any advice on how to get past this, or debug it? Thanks!
The Cython example below demonstrates an attempt to use CUDA Python to interact with some external C++ code. Note that the "external code" is included inline in the Cython.
# distutils: language=c++
# distutils: extra_compile_args=-I/usr/local/cuda/include/
from cuda.ccudart cimport cudaMemAllocationHandleType

cdef extern from *:
    """
    #include <cuda_runtime_api.h>
    void foo(cudaMemAllocationHandleType x) {
        return;
    }
    """
    void foo(cudaMemAllocationHandleType x)

foo(cudaMemAllocationHandleType.cudaMemHandleTypeNone)
The external code is a function foo that accepts a cudaMemAllocationHandleType. We attempt to invoke that function from Cython by passing in a cuda.ccudart.cudaMemAllocationHandleType, but this fails with an error like:
error: cannot convert '__pyx_t_4cuda_7ccudart_cudaMemAllocationHandleType' to 'cudaMemAllocationHandleType'
4857 | foo(__pyx_e_4cuda_7ccudart_cudaMemHandleTypeNone);
To reproduce the problem, save the example above to a file foo.pyx, then run cythonize -i foo.pyx.
This is because the function foo expects a cudaMemAllocationHandleType that is defined in the CUDA runtime library. But CUDA Python "rewrites" the runtime library at the Cython layer, and has its own cudaMemAllocationHandleType (which ends up with a mangled name when transpiled from Cython to C++). The two are not interchangeable.
A potential solution, proposed by @leofang in an offline discussion, is to use extern declarations for types in ccudart.pxd, rather than to redefine them. For example:
diff --git a/cuda/ccudart.pxd b/cuda/ccudart.pxd
index 57e1e96..6c0b5d4 100644
--- a/cuda/ccudart.pxd
+++ b/cuda/ccudart.pxd
@@ -678,11 +678,12 @@ cdef enum cudaMemAllocationType:
     cudaMemAllocationTypePinned = 1
     cudaMemAllocationTypeMax = 2147483647
-cdef enum cudaMemAllocationHandleType:
-    cudaMemHandleTypeNone = 0
-    cudaMemHandleTypePosixFileDescriptor = 1
-    cudaMemHandleTypeWin32 = 2
-    cudaMemHandleTypeWin32Kmt = 4
+cdef extern from 'driver_types.h':
+    ctypedef enum cudaMemAllocationHandleType 'cudaMemAllocationHandleType':
+        cudaMemHandleTypeNone = 0
+        cudaMemHandleTypePosixFileDescriptor = 1
+        cudaMemHandleTypeWin32 = 2
+        cudaMemHandleTypeWin32Kmt = 4
 cdef struct cudaMemPoolProps:
     cudaMemAllocationType allocType
diff --git a/setup.py b/setup.py
index 394166e..16fad9f 100644
--- a/setup.py
+++ b/setup.py
@@ -30,6 +30,7 @@ except Exception:
 include_dirs = [
     os.path.dirname(sysconfig.get_path("include")),
+    '/usr/local/cuda-11.4/include',
 ]
 library_dirs = [get_python_lib(), os.path.join(os.sys.prefix, "lib")]
Currently, we ship a single version of CUDA Python that is built with the latest CUDA toolkit, and we expect it to work for older minor versions of the CUDA toolkit by leveraging CUDA enhanced compatibility.
Historically, there have been cases when the runtime API has changed across minor versions of the CUDA toolkit. In particular, the names/ordering of enum members have changed between minor versions. For example, in CUDA 10.1, there was a typo in the enum member cudaErrorDeviceUninitilialized that was fixed in 10.2.
It's not clear how we would handle the situation if something like that were to happen again. In the example above, we would have to have separate extern declarations for 10.1 and 10.2 somehow.
cuda-python 11.6.1
cuda toolkit 11.2
Ubuntu Linux
If you run something like the following on a multi-GPU machine:
from cuda import cuda, cudart

device_num = 5
err, = cuda.cuInit(0)
err, device = cuda.cuDeviceGet(device_num)
err, cuda_context = cuda.cuCtxCreate(0, device)
err, = cudart.cudaSetDevice(device)
The call to cudart.cudaSetDevice will properly set your device to '5', but it will also allocate ~305 MB of memory on device 0 (or whichever is the 0th device in the device list provided by CUDA_VISIBLE_DEVICES). I think this issue (possibly in the underlying CUDA C runtime?) may be the root of many downstream issues in libraries like TensorFlow and PyTorch, which have similar problems where a user selects a device but still gets large allocations on other devices.
This 305 MB may not sound like a lot, but I'm running a program on an NVIDIA DGX with 16 GPUs and 64 worker processes, causing 64 * 305 MB, roughly 19 GB, of unusable space to be allocated on GPU 0, which crashes the program. I cannot simply set CUDA_VISIBLE_DEVICES to correct this problem, because the workers communicate via shared GPU memory (via cuIPCMemHandle) with their parent process, and the parent process needs access to all GPUs. Additionally, the worker processes perform data augmentation on one GPU while writing output to another GPU with a different device ID.
I am trying to investigate a workaround that does not call cudart.cudaSetDevice at all, but when it is not called I cannot properly use the pointer given by cuda.cuMemAlloc to create a PyTorch tensor. When I call cudart.cudaSetDevice, I am able to use the pointer properly.
jkh@megamind-> pytest
============================= test session starts ==============================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/jkh/Src/cuda-python, inifile: pytest.ini
plugins: typeguard-2.11.1
collecting ... Fatal Python error: Segmentation fault
Current thread 0x00007f1f80dc5740 (most recent call first):
File "/home/jkh/.local/lib/python3.8/site-packages/llvmlite/binding/ffi.py", line 113 in call
File "/home/jkh/.local/lib/python3.8/site-packages/llvmlite/binding/targets.py", line 60 in get_host_cpu_features
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 881 in get_host_cpu_features
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 782 in _get_host_cpu_features
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 846 in _customize_tm_features
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 652 in _init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 645 in init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/cpu.py", line 47 in init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 32 in _acquire_compile_lock
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/base.py", line 259 in init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/registry.py", line 31 in _toplevel_target_context
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/utils.py", line 332 in get
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/registry.py", line 47 in target_context
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/dispatcher.py", line 670 in init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/decorators.py", line 187 in wrapper
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/decorators.py", line 173 in jit
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/decorators.py", line 236 in njit
File "/home/jkh/.local/lib/python3.8/site-packages/numba/typed/typeddict.py", line 23 in
File "", line 219 in _call_with_frames_removed
File "", line 783 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "/home/jkh/.local/lib/python3.8/site-packages/numba/typed/init.py", line 1 in
File "", line 219 in _call_with_frames_removed
File "", line 783 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "/home/jkh/.local/lib/python3.8/site-packages/numba/init.py", line 298 in
File "", line 219 in _call_with_frames_removed
File "", line 783 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "/home/jkh/Src/cuda-python/cuda/benchmarks/test_numba.py", line 11 in
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/assertion/rewrite.py", line 152 in exec_module
File "", line 671 in _load_unlocked
File "", line 975 in _find_and_load_unlocked
File "", line 991 in _find_and_load
File "/home/jkh/anaconda3/lib/python3.8/site-packages/py/_path/local.py", line 704 in pyimport
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 511 in _importtestmodule
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 443 in _getobj
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 261 in obj
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 459 in _inject_setup_module_fixture
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 446 in collect
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/runner.py", line 264 in
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/runner.py", line 244 in from_call
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/runner.py", line 264 in pytest_make_collect_report
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in call
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/runner.py", line 382 in collect_one_node
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 681 in genitems
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 684 in genitems
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 490 in _perform_collect
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 452 in perform_collect
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 257 in pytest_collection
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in call
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 246 in _main
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 191 in wrap_session
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 240 in pytest_cmdline_main
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in call
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/config/init.py", line 124 in main
File "/home/jkh/anaconda3/bin/pytest", line 11 in
Segmentation fault (core dumped)
After following the conda installation instructions for cuda-python, I try to run
from cuda import cuda, nvrtc
as in the example, in the PyCharm Python console, but it raises an error:
Traceback (most recent call last):
File "D:\Anaconda\envs\hierot\lib\code.py", line 90, in runcode
exec(code, self.locals)
File "<input>", line 1, in <module>
File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
File "cuda\cuda.pyx", line 1, in init cuda.cuda
# Copyright 2021-2022 NVIDIA Corporation. All rights reserved.
File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'cuda._lib'; 'cuda' is not a package
But the code above runs successfully in the terminal:
(hierot) D:\Projects\SimPlatform>python
Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from cuda import cuda, nvrtc
>>>
Please help me with the problem, thanks in advance. I can provide further information on request.
I searched for
ModuleNotFoundError: No module named 'xxx'
The solutions suggest configuring the correct Python interpreter, but I believe my interpreter is already properly configured.
I also searched for
No module named 'xxx'; 'yyy' is not a package
Some say the cause is that the name cuda is shadowed by the package name cuda. I think this might be the problem. Please check this.
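A quick way to test the shadowing theory is to ask the import system where the name cuda resolves, without importing it, and to run the same check in both the PyCharm console and the terminal. The helper name describe_module is an assumption made for this sketch; it relies only on the standard importlib.util.find_spec.

```python
import importlib.util


def describe_module(name):
    """Report where 'import name' would resolve, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return {"found": False, "origin": None, "is_package": False}
    return {
        "found": True,
        "origin": spec.origin,
        # A package has search locations; a plain module (e.g. a stray cuda.py) has none.
        "is_package": spec.submodule_search_locations is not None,
    }


if __name__ == "__main__":
    print(describe_module("cuda"))
```

If the two environments report different origins, or is_package is False in one of them, something other than the installed cuda-python package is being picked up first on sys.path.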
I am currently stuck on trying to get cudaMemcpy2D working. It reports the error cudaErrorInvalidMemcpyDirection when supplying the function with any value from MemcpyKind. I have reinstalled and double-checked my CUDA and toolkit installs multiple times and have tested on my own and a colleague's machine, so I'm positive they are fine.
A minimal example that shows the problem I am facing is as follows:
from cuda import cuda, cudart
from cuda_utils import checkCudaErrors
import numpy as np

checkCudaErrors(cuda.cuInit(0))
device_id = checkCudaErrors(cuda.cuDeviceGet(0))
context = checkCudaErrors(cuda.cuCtxCreate(0, device_id))

def allocate_np_array(array: np.ndarray):
    rows, cols = array.shape
    device_ptr, pitch = checkCudaErrors(cudart.cudaMallocPitch(cols * array.itemsize, rows))
    checkCudaErrors(
        cudart.cudaMemcpy2D(
            device_ptr,
            pitch,
            array.ctypes.data,
            array.shape[1] * array.itemsize,
            array.shape[1] * array.itemsize,
            array.shape[0],
            cudart.cudaMemcpyKind.cudaMemcpyHostToDevice
        ))

if __name__ == "__main__":
    arr = np.ones((40, 40), dtype=np.float32)
    allocate_np_array(arr)
The checkCudaErrors function is the same one used in this repo's examples.
The output looks as follows (with the file path removed):
Traceback (most recent call last):
File "test.py", line 27, in <module>
allocate_np_array(arr)
File "test.py", line 13, in allocate_np_array
cuda_check_errors(
File "/.../cuda_utils.py", line 22, in checkCudaErrors
raise RuntimeError("CUDA Error code: {} ({})".format(result[0].value, cuda_get_error_enum(result[0])))
RuntimeError: CUDA Error code: 21 (cudaErrorInvalidMemcpyDirection)
It might be important to note that I am new to CUDA and this is my first real issue post, so don't be afraid to point out my errors in either. Thanks in advance!
Using an environment with:
mamba create -n testing -c nvidia -c conda-forge python=3.9 'cuda-toolkit>=11.7' 'cuda-python>=11.7'
from cuda import cudart
print(cudart.cudaSetDevice(2))
print(cudart.cudaGetDevice())
=>
(<cudaError_t.cudaSuccess: 0>,)
(<cudaError_t.cudaSuccess: 0>, 0)
Expected result: the cudaGetDevice() call should return device 2, not device 0.
The problem appears to be that cudaSetDevice only calls ccudart.utils.lazyInitGlobal, whereas cudaGetDevice calls ccudart.utils.lazyInit (which calls lazyInitDevice(0)).
I think that cudaGetDevice just needs to not call lazyInit (the case of no context being in place is handled by the branch that calls cudaSetDevice(0)).
https://github.com/NVIDIA/cuda-python/blob/main/cuda/_lib/ccudart/ccudart.pyx#L1039-L1045
Plausibly a patch like this?
diff --git a/cuda/_lib/ccudart/ccudart.pyx b/cuda/_lib/ccudart/ccudart.pyx
index d42d594..d7f3602 100644
--- a/cuda/_lib/ccudart/ccudart.pyx
+++ b/cuda/_lib/ccudart/ccudart.pyx
@@ -1032,9 +1032,6 @@ cdef cudaError_t _cudaGetDevice(int* device) nogil except ?cudaErrorCallRequires
cdef cudaError_t err
cdef ccuda.CUresult err_driver
cdef ccuda.CUcontext context
- err = m_global.lazyInit()
- if err != cudaSuccess:
- return err
err_driver = ccuda._cuCtxGetCurrent(&context)
if err_driver == ccuda.cudaError_enum.CUDA_ERROR_INVALID_CONTEXT or (err_driver == ccuda.cudaError_enum.CUDA_SUCCESS and context == NULL):
@@ -1045,14 +1042,16 @@ cdef cudaError_t _cudaGetDevice(int* device) nogil except ?cudaErrorCallRequires
err_driver = ccuda._cuCtxGetCurrent(&context)
if err_driver != ccuda.cudaError_enum.CUDA_SUCCESS:
- _setLastError(err)
- return err
+ _setLastError(<cudaError_t>err_driver)
+ return <cudaError_t>err
found = False
for deviceOrdinal in range(m_global._numDevices):
if m_global._driverContext[deviceOrdinal] == context:
found = True
break
+ else:
+ return cudaErrorDeviceUninitialized
device[0] = deviceOrdinal if found else 0
return cudaSuccess
Note this has two other fixes:
If err_driver != CUDA_SUCCESS, actually return the error code
If no device context matches, return cudaErrorDeviceUninitialized (not sure if this is the correct error code)
It seems to be missing cudaLaunchKernel.
>>> from cuda import cudart
>>> print(cudart.cudaLaunchKernel)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'cuda.cudart' has no attribute 'cudaLaunchKernel'
Hi,
I got the following strncat stringop-overflow warning when compiling.
I think it is better to add an appropriate check for the string length (for example, kumattau@f442d65)
gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I./cuda -I./cuda/_cuda -I/usr/local/python/include/python3.8 -c cuda/_cuda/loader.cpp -o build/temp.linux-x86_64-3.8/cuda/_cuda/loader.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3
In file included from /usr/include/string.h:495,
from /usr/include/c++/9/cstring:42,
from cuda/_cuda/loader.cpp:10:
In function ‘char* strncat(char*, const char*, size_t)’,
inlined from ‘char* replaceSystemPath(char*)’ at cuda/_cuda/loader.cpp:219:12,
inlined from ‘int dxcore_check_adapter(dxcore_lib*, char*, dxcore_adapterInfo*)’ at cuda/_cuda/loader.cpp:246:43,
inlined from ‘int dxcore_enum_adapters(dxcore_lib*, char*)’ at cuda/_cuda/loader.cpp:290:34,
inlined from ‘int getCUDALibraryPath(char*, bool)’ at cuda/_cuda/loader.cpp:345:29:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:136:34: warning: ‘char* __builtin___strncat_chk(char*, const char*, long unsigned int, long unsigned int)’ specified bound 260 equals destination size [-Wstringop-overflow=]
136 | return __builtin___strncat_chk (__dest, __src, __len, __bos (__dest));
| ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
python setup.py ... in favor of pip install ... .
To prevent errors, keep the system continuously tested, and make sure the CUDA API bindings run smoothly, the Python functions need doctests.
Doctests help avoid crashes and bugs, and the result would be a more reliable system.
References:
https://dev.to/perigk/doctests-the-shy-giant-of-testing-modules-3g74
https://pymotw.com/2/doctest/
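As a small illustration of the request above, a doctest lives in the docstring and doubles as documentation and as a test. The function format_cuda_error is a hypothetical helper written for this sketch; its message format mirrors the "CUDA error code={}({})" string used by the checkCudaErrors helper quoted earlier on this page.

```python
import doctest


def format_cuda_error(code, name):
    """Build the message used when a CUDA call fails.

    >>> format_cuda_error(21, "cudaErrorInvalidMemcpyDirection")
    'CUDA error code=21(cudaErrorInvalidMemcpyDirection)'
    """
    return "CUDA error code={}({})".format(code, name)


if __name__ == "__main__":
    # Runs every example found in this module's docstrings.
    print(doctest.testmod())
```

Functions that need a GPU are harder to cover this way, so doctests would likely fit pure helpers and documentation examples best.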
Hello, I'm running into the following issue any time I try installing cuda-python via poetry. I also tried specifying
cuda-python = {url = "https://files.pythonhosted.org/packages/bb/3f/0c38c8716a3a15d71c94696fd43290ed4d4f0361d36409f68ffb15478593/cuda_python-12.3.0-cp311-cp311-win_amd64.whl"}
but no luck. I would greatly appreciate any advice on this.
Package operations: 1 install, 3 updates, 0 removals
• Updating urllib3 (1.26.18 -> 2.0.7)
• Updating protobuf (3.20.3 -> 4.24.4)
• Updating types-requests (2.31.0.6 -> 2.31.0.10)
• Installing cuda-python (12.3.0): Failed
RuntimeError
Unable to find installation candidates for cuda-python (12.3.0)
at /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/poetry/installation/chooser.py:109 in choose_for
105│
106│ links.append(link)
107│
108│ if not links:
→ 109│ raise RuntimeError(f"Unable to find installation candidates for {package}")
110│
111│ # Get the best link
112│ chosen = max(links, key=lambda link: self._sort_key(package, link))
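One thing that may be worth checking here is whether the wheel's tags match the interpreter poetry is using. The wheel named above is tagged cp311 and win_amd64, while the log shows a Python 3.10 installation under /Library/Frameworks. This is only a possible explanation, not a confirmed diagnosis. The sketch below splits a wheel filename into its components following the wheel filename convention; parse_wheel_tags is a hypothetical helper, not a poetry or pip API.

```python
def parse_wheel_tags(filename):
    """Split a wheel filename into name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # name-version[-build]-python-abi-platform
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


tags = parse_wheel_tags("cuda_python-12.3.0-cp311-cp311-win_amd64.whl")
print(tags)
```

Comparing these tags with the output of sys.version_info and sys.platform in the poetry environment shows whether any published wheel could be a candidate for that interpreter.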
Capturing these conversations from a few different places in the soon-to-be-public repo. The top-level package is currently cudapython; however, we would like to rename it to cuda, since having python in the package name is redundant.
from cudapython import cuda, nvrtc
becomes...
from cuda import cuda, nvrtc
The pip and conda packages will be cuda.
cc @jakirkham
The function signatures of many APIs in the docs appear to return a 2-tuple like (status, None), but the actual return value is a 1-tuple like (status,).
For example:
Docs link: https://nvidia.github.io/cuda-python/api.html#cuda.cudart.cudaSetDevice
Source:
Lines 7903 to 7910 in d7a354d
In some cases, there is a second element like a string or memory pool in the tuple, but the documented type indicates it as None.
For example:
Lines 7164 to 7193 in d7a354d
Lines 7641 to 7670 in d7a354d
Let me know if you'd like me to work on this issue. I would be happy to contribute a pull request to improve the docs. 👍 I see PRs are not currently accepted.
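The difference between the documented and actual shapes matters in practice because tuple unpacking has to match the length exactly. The sketch below uses fake_set_device, a hypothetical stand-in that returns a 1-tuple like the one described above, so it runs without a GPU.

```python
def fake_set_device(device):
    """Hypothetical stand-in that returns a 1-tuple like (status,)."""
    return (0,)


# Matches the actual shape: the trailing comma unpacks a 1-tuple.
status, = fake_set_device(0)

# Matches the documented shape (status, None): this fails on a 1-tuple.
try:
    status2, nothing = fake_set_device(0)
    unpack_error = None
except ValueError as exc:
    unpack_error = str(exc)

print(status, unpack_error)
```

Code written against the documented signature would therefore raise at the call site rather than silently receiving None.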
Hi all,
As the title suggests, is there a way to limit the total amount of memory that a process can allocate on a single CUDA device?
Perhaps, even by using pyNVML?
This issue is related to the following discussions:
What are the cons of sharing the resources of a single CUDA device among different processes competing for access?
The current implementation of cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version rather than querying the runtime for it. This yields an incorrect result whenever the runtime version differs from the version of cuda-python.
cuda-python/cuda/_lib/ccudart/ccudart.pyx
Lines 79 to 82 in 746b773
cuda-python/cuda/_lib/ccudart/utils.pyx
Line 37 in 746b773
A workaround used in rapidsai/rmm#946 is to use numba's API for this instead:
import numba.cuda

def cudaRuntimeGetVersion():
    major, minor = numba.cuda.runtime.get_version()
    return major * 1000 + minor * 10
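The workaround packs the version as major * 1000 + minor * 10, so 11.2 becomes 11020. A small sketch of both directions of that encoding follows. The function names are made up for illustration, and it needs neither numba nor a GPU.

```python
def encode_runtime_version(major: int, minor: int) -> int:
    """Pack (major, minor) into the integer form used by the workaround."""
    return major * 1000 + minor * 10


def decode_runtime_version(version: int) -> tuple:
    """Unpack an integer such as 11020 back into (major, minor)."""
    major, rest = divmod(version, 1000)
    return major, rest // 10


print(decode_runtime_version(11020))  # (11, 2)
print(encode_runtime_version(11, 7))  # 11070
```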
Hi Developers,
As a newcomer to cuda-python, I'm eager to learn how to implement CUDA-aware MPI transmission. I'd appreciate any guidance or resources on this topic. Thank you!
Hi,
I checked out the package and tried to build it on Amazon Linux, but it fails to compile. Please see the build output below. I also tried all the other commands mentioned in the installation guide, but they all failed with the same issue.
CUDA: 11.2
GCC: 9.3
$ python setup.py build
Compiling cuda/_cuda/ccuda.pyx because it changed.
Compiling cuda/_cuda/cnvrtc.pyx because it changed.
[1/2] Cythonizing cuda/_cuda/ccuda.pyx
[2/2] Cythonizing cuda/_cuda/cnvrtc.pyx
Compiling cuda/_lib/utils.pyx because it changed.
[1/1] Cythonizing cuda/_lib/utils.pyx
Compiling cuda/_lib/ccudart/ccudart.pyx because it changed.
Compiling cuda/_lib/ccudart/utils.pyx because it changed.
[1/2] Cythonizing cuda/_lib/ccudart/ccudart.pyx
[2/2] Cythonizing cuda/_lib/ccudart/utils.pyx
Compiling cuda/ccuda.pyx because it changed.
Compiling cuda/ccudart.pyx because it changed.
Compiling cuda/cnvrtc.pyx because it changed.
Compiling cuda/cuda.pyx because it changed.
Compiling cuda/cudart.pyx because it changed.
Compiling cuda/nvrtc.pyx because it changed.
[1/6] Cythonizing cuda/ccuda.pyx
[2/6] Cythonizing cuda/ccudart.pyx
[3/6] Cythonizing cuda/cnvrtc.pyx
[4/6] Cythonizing cuda/cuda.pyx
[5/6] Cythonizing cuda/cudart.pyx
[6/6] Cythonizing cuda/nvrtc.pyx
Compiling cuda/tests/test_ccuda.pyx because it changed.
Compiling cuda/tests/test_ccudart.pyx because it changed.
Compiling cuda/tests/test_interoperability_cython.pyx because it changed.
[1/3] Cythonizing cuda/tests/test_ccuda.pyx
[2/3] Cythonizing cuda/tests/test_ccudart.pyx
[3/3] Cythonizing cuda/tests/test_interoperability_cython.pyx
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/cuda
copying cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda
copying cuda/_version.py -> build/lib.linux-x86_64-3.8/cuda
creating build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_cuda
creating build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib
creating build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/__init__.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/kernels.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/perf_test_utils.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/test_cupy.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/test_launch_latency.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/test_numba.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/test_pointer_attributes.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
creating build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/__init__.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_cuda.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_cudart.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_cython.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_interoperability.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_kernelParams.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_nvrtc.py -> build/lib.linux-x86_64-3.8/cuda/tests
creating build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/__init__.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cuda.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cudart.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/nvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cuda.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cudart.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/nvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cuda.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cudart.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/nvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/_cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/loader.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/loader.h -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/loader.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_lib/dlfcn.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/param_packer.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/param_packer.h -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/param_packer.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/tests/test_ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_interoperability_cython.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_interoperability_cython.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/_lib/ccudart/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
UPDATING build/lib.linux-x86_64-3.8/cuda/_version.py
set build/lib.linux-x86_64-3.8/cuda/_version.py to '11.7.1'
running build_ext
building 'cuda._cuda.ccuda' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cuda
creating build/temp.linux-x86_64-3.8/cuda/_cuda
/home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -fPIC -I./cuda -I./cuda/_cuda -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include -I/usr/local/cuda-11.2/include -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include/python3.8 -c cuda/_cuda/ccuda.cpp -o build/temp.linux-x86_64-3.8/cuda/_cuda/ccuda.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cuda/_cuda/ccuda.cpp: In function 'int __pyx_f_4cuda_5_cuda_5ccuda_cuPythonInit()':
cuda/_cuda/ccuda.cpp:4202:138: error: 'CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM' was not declared in this scope
4202 | __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0x1B58, CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 836, __pyx_L4_error)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:4924:137: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
4924 | __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0xFA0, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 917, __pyx_L4_error)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:5637:152: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
5637 | __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuGetErrorString"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuGetErrorString), 0x1770, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 997, __pyx_L4_error)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:15609:73: error: 'CUflushGPUDirectRDMAWritesTarget' was not declared in this scope
15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:15609:122: error: 'CUflushGPUDirectRDMAWritesScope' was not declared in this scope
15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:15609:167: warning: expression list treated as compound expression in initializer [-fpermissive]
15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
| ^
cuda/_cuda/ccuda.cpp:16977:94: error: 'CUexecAffinityType' has not been declared
16977 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int *__pyx_v_pi, CUexecAffinityType __pyx_v_typename, CUdevice __pyx_v_dev) {
| ^~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int*, int, CUdevice)':
cuda/_cuda/ccuda.cpp:17082:30: error: expected primary-expression before '(' token
17082 | __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
| ^
cuda/_cuda/ccuda.cpp:17082:32: error: expected primary-expression before ')' token
17082 | __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
| ^
cuda/_cuda/ccuda.cpp:17082:34: error: expected primary-expression before 'int'
17082 | __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
| ^~~
cuda/_cuda/ccuda.cpp:17082:41: error: 'CUexecAffinityType' was not declared in this scope
17082 | __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
| ^~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:17082:69: error: expected primary-expression before ')' token
17082 | __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
| ^
cuda/_cuda/ccuda.cpp:17082:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport'
17082 | __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:17319:86: error: 'CUexecAffinityParam' has not been declared
17319 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUcontext *__pyx_v_pctx, CUexecAffinityParam *__pyx_v_paramsArray, int __pyx_v_numParams, unsigned int __pyx_v_flags, CUdevice __pyx_v_dev) {
| ^~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUctx_st**, int*, int, unsigned int, CUdevice)':
cuda/_cuda/ccuda.cpp:17424:30: error: expected primary-expression before '(' token
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ^
cuda/_cuda/ccuda.cpp:17424:32: error: expected primary-expression before ')' token
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ^
cuda/_cuda/ccuda.cpp:17424:44: error: expected primary-expression before '*' token
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ^
cuda/_cuda/ccuda.cpp:17424:45: error: expected primary-expression before ',' token
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ^
cuda/_cuda/ccuda.cpp:17424:47: error: 'CUexecAffinityParam' was not declared in this scope
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ^~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:17424:68: error: expected primary-expression before ',' token
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ^
cuda/_cuda/ccuda.cpp:17424:70: error: expected primary-expression before 'int'
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ^~~
cuda/_cuda/ccuda.cpp:17424:75: error: expected primary-expression before 'unsigned'
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ^~~~~~~~
cuda/_cuda/ccuda.cpp:17424:97: error: expected primary-expression before ')' token
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ^
cuda/_cuda/ccuda.cpp:17424:99: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3'
17424 | __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:20397:67: error: 'CUexecAffinityParam' was not declared in this scope
20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
| ^~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:20397:88: error: '__pyx_v_pExecAffinity' was not declared in this scope
20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
| ^~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:20397:111: error: 'CUexecAffinityType' was not declared in this scope
20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
| ^~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:20397:146: warning: expression list treated as compound expression in initializer [-fpermissive]
20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
| ^
cuda/_cuda/ccuda.cpp:33564:75: error: 'CUDA_ARRAY_MEMORY_REQUIREMENTS' was not declared in this scope
33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:33564:107: error: '__pyx_v_memoryRequirements' was not declared in this scope
33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:33564:143: error: expected primary-expression before '__pyx_v_array'
33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
| ^~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:33564:167: error: expected primary-expression before '__pyx_v_device'
33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
---
truncated due to git issue limit
---
cuda/_cuda/ccuda.cpp:58806:44: error: 'CUgraphMem_attribute' was not declared in this scope
58806 | __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
| ^~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:58806:66: error: expected primary-expression before 'void'
58806 | __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
| ^~~~
cuda/_cuda/ccuda.cpp:58806:74: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute'
58806 | __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:64515:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
| ^~~~~~~~~~~~
| CUsurfObject
cuda/_cuda/ccuda.cpp:64515:79: error: '__pyx_v_object_out' was not declared in this scope
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
| ^~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:64515:99: error: expected primary-expression before 'void'
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
| ^~~~
cuda/_cuda/ccuda.cpp:64515:127: error: expected primary-expression before '__pyx_v_destroy'
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
| ^~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:64515:144: error: expected primary-expression before 'unsigned'
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
| ^~~~~~~~
cuda/_cuda/ccuda.cpp:64515:182: error: expected primary-expression before 'unsigned'
64515 | atic CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
| ^~~~~~~~
cuda/_cuda/ccuda.cpp:64515:208: warning: expression list treated as compound expression in initializer [-fpermissive]
64515 | a_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
| ^
cuda/_cuda/ccuda.cpp:64686:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
| ^~~~~~~~~~~~
| CUsurfObject
cuda/_cuda/ccuda.cpp:64686:94: error: expected primary-expression before 'unsigned'
64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
| ^~~~~~~~
cuda/_cuda/ccuda.cpp:64686:120: warning: expression list treated as compound expression in initializer [-fpermissive]
64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
| ^
cuda/_cuda/ccuda.cpp:64857:66: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
| ^~~~~~~~~~~~
| CUsurfObject
cuda/_cuda/ccuda.cpp:64857:95: error: expected primary-expression before 'unsigned'
64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
| ^~~~~~~~
cuda/_cuda/ccuda.cpp:64857:121: warning: expression list treated as compound expression in initializer [-fpermissive]
64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
| ^
cuda/_cuda/ccuda.cpp:65028:93: error: 'CUuserObject' has not been declared
65028 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count, unsigned int __pyx_v_flags) {
| ^~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph, int, unsigned int, unsigned int)':
cuda/_cuda/ccuda.cpp:65133:30: error: expected primary-expression before '(' token
65133 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
| ^
cuda/_cuda/ccuda.cpp:65133:32: error: expected primary-expression before ')' token
65133 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
| ^
cuda/_cuda/ccuda.cpp:65133:41: error: expected primary-expression before ',' token
65133 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
| ^
cuda/_cuda/ccuda.cpp:65133:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
65133 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
| ^~~~~~~~~~~~
| CUsurfObject
cuda/_cuda/ccuda.cpp:65133:57: error: expected primary-expression before 'unsigned'
65133 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
| ^~~~~~~~
cuda/_cuda/ccuda.cpp:65133:71: error: expected primary-expression before 'unsigned'
65133 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
| ^~~~~~~~
cuda/_cuda/ccuda.cpp:65133:85: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject'
65133 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:65199:94: error: 'CUuserObject' has not been declared
65199 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
| ^~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph, int, unsigned int)':
cuda/_cuda/ccuda.cpp:65304:30: error: expected primary-expression before '(' token
65304 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
| ^
cuda/_cuda/ccuda.cpp:65304:32: error: expected primary-expression before ')' token
65304 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
| ^
cuda/_cuda/ccuda.cpp:65304:41: error: expected primary-expression before ',' token
65304 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
| ^
cuda/_cuda/ccuda.cpp:65304:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
65304 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
| ^~~~~~~~~~~~
| CUsurfObject
cuda/_cuda/ccuda.cpp:65304:57: error: expected primary-expression before 'unsigned'
65304 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
| ^~~~~~~~
cuda/_cuda/ccuda.cpp:65304:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject'
65304 | __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:74604:69: error: 'CUmoduleLoadingMode' was not declared in this scope
74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
| ^~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:74604:90: error: '__pyx_v_mode' was not declared in this scope; did you mean '__pyx_k_name'?
74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
| ^~~~~~~~~~~~
| __pyx_k_name
cuda/_cuda/ccuda.cpp:74775:145: error: 'CUmemRangeHandleType' has not been declared
74775 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void *__pyx_v_handle, CUdeviceptr __pyx_v_dptr, size_t __pyx_v_size, CUmemRangeHandleType __pyx_v_handleType, unsigned PY_LONG_LONG __pyx_v_flags) {
| ^~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void*, CUdeviceptr, size_t, int, long long unsigned int)':
cuda/_cuda/ccuda.cpp:74880:30: error: expected primary-expression before '(' token
74880 | __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
| ^
cuda/_cuda/ccuda.cpp:74880:32: error: expected primary-expression before ')' token
74880 | __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
| ^
cuda/_cuda/ccuda.cpp:74880:34: error: expected primary-expression before 'void'
74880 | __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
| ^~~~
cuda/_cuda/ccuda.cpp:74880:53: error: expected primary-expression before ',' token
74880 | __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
| ^
cuda/_cuda/ccuda.cpp:74880:61: error: expected primary-expression before ',' token
74880 | __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
| ^
cuda/_cuda/ccuda.cpp:74880:63: error: 'CUmemRangeHandleType' was not declared in this scope; did you mean 'CUmemHandleType'?
74880 | __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
| ^~~~~~~~~~~~~~~~~~~~
| CUmemHandleType
cuda/_cuda/ccuda.cpp:74880:85: error: expected primary-expression before 'unsigned'
74880 | __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
| ^~~~~~~~
cuda/_cuda/ccuda.cpp:74880:108: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange'
74880 | __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
| ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| )
error: command '/home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc' failed with exit status 1
It would be great if it was possible to link against NVRTC statically.
(cython) nyck33@nyck33-IdeaPad-Gaming-3-15ACH6:~/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python$ pip install cuda-python==11.7
Requirement already satisfied: cuda-python==11.7 in /home/nyck33/anaconda3/envs/cython/lib/python3.10/site-packages (11.7.0)
Requirement already satisfied: cython in /home/nyck33/anaconda3/envs/cython/lib/python3.10/site-packages (from cuda-python==11.7) (3.0.0a11)
(cython) nyck33@nyck33-IdeaPad-Gaming-3-15ACH6:~/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python$ python -m pytest
================================ test session starts =================================
platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python, configfile: pytest.ini
plugins: benchmark-4.0.0
collected 12 items / 21 errors
======================================= ERRORS =======================================
______________ ERROR collecting cuda/benchmarks/test_launch_latency.py _______________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/benchmarks/test_launch_latency.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
cuda/benchmarks/test_launch_latency.py:9: in <module>
from cuda import cuda
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
____________ ERROR collecting cuda/benchmarks/test_pointer_attributes.py _____________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/benchmarks/test_pointer_attributes.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
cuda/benchmarks/test_pointer_attributes.py:9: in <module>
from cuda import cuda
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
______________________ ERROR collecting cuda/tests/test_cuda.py ______________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_cuda.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_cuda.py:10: in <module>
import cuda.cuda as cuda
E ModuleNotFoundError: No module named 'cuda.cuda'
_____________________ ERROR collecting cuda/tests/test_cudart.py _____________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_cudart.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_cudart.py:10: in <module>
import cuda.cudart as cudart
E ModuleNotFoundError: No module named 'cuda.cudart'
_____________________ ERROR collecting cuda/tests/test_cython.py _____________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_cython.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_cython.py:35: in <module>
mod = importlib.import_module(mod)
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
E ModuleNotFoundError: No module named 'cuda.tests.test_ccuda'
________________ ERROR collecting cuda/tests/test_interoperability.py ________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_interoperability.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_interoperability.py:9: in <module>
import cuda.cuda as cuda
E ModuleNotFoundError: No module named 'cuda.cuda'
__________________ ERROR collecting cuda/tests/test_kernelParams.py __________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_kernelParams.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_kernelParams.py:9: in <module>
from cuda import cuda, cudart, nvrtc
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_____________________ ERROR collecting cuda/tests/test_nvrtc.py ______________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_nvrtc.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_nvrtc.py:9: in <module>
from cuda import nvrtc
E ImportError: cannot import name 'nvrtc' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
____________ ERROR collecting examples/0_Introduction/clock_nvrtc_test.py ____________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/clock_nvrtc_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/clock_nvrtc_test.py:9: in <module>
from cuda import cuda
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_______ ERROR collecting examples/0_Introduction/simpleCubemapTexture_test.py ________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/simpleCubemapTexture_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/simpleCubemapTexture_test.py:13: in <module>
from cuda import cuda, cudart
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_____________ ERROR collecting examples/0_Introduction/simpleP2P_test.py _____________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/simpleP2P_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/simpleP2P_test.py:11: in <module>
from cuda import cuda, cudart
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
__________ ERROR collecting examples/0_Introduction/simpleZeroCopy_test.py ___________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/simpleZeroCopy_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/simpleZeroCopy_test.py:13: in <module>
from cuda import cuda, cudart
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_________ ERROR collecting examples/0_Introduction/systemWideAtomics_test.py _________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/systemWideAtomics_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/systemWideAtomics_test.py:12: in <module>
from cuda import cuda, cudart
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
___________ ERROR collecting examples/0_Introduction/vectorAddDrv_test.py ____________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/vectorAddDrv_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/vectorAddDrv_test.py:11: in <module>
from cuda import cuda
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
___________ ERROR collecting examples/0_Introduction/vectorAddMMAP_test.py ___________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/vectorAddMMAP_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/vectorAddMMAP_test.py:12: in <module>
from cuda import cuda
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_ ERROR collecting examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py _
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py:13: in <module>
from cuda import cuda, cudart
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
______ ERROR collecting examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py ______
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py:13: in <module>
from cuda import cuda, cudart
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_________ ERROR collecting examples/3_CUDA_Features/simpleCudaGraphs_test.py _________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/3_CUDA_Features/simpleCudaGraphs_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/3_CUDA_Features/simpleCudaGraphs_test.py:11: in <module>
from cuda import cuda, cudart
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
__ ERROR collecting examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py __
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py:12: in <module>
from cuda import cuda, cudart
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_______________ ERROR collecting examples/extra/isoFDModelling_test.py _______________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/extra/isoFDModelling_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/extra/isoFDModelling_test.py:10: in <module>
from cuda import cuda, cudart
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
________________ ERROR collecting examples/extra/jit_program_test.py _________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/extra/jit_program_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
examples/extra/jit_program_test.py:10: in <module>
from cuda import cuda, nvrtc
E ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
============================== short test summary info ===============================
ERROR cuda/benchmarks/test_launch_latency.py
ERROR cuda/benchmarks/test_pointer_attributes.py
ERROR cuda/tests/test_cuda.py
ERROR cuda/tests/test_cudart.py
ERROR cuda/tests/test_cython.py
ERROR cuda/tests/test_interoperability.py
ERROR cuda/tests/test_kernelParams.py
ERROR cuda/tests/test_nvrtc.py
ERROR examples/0_Introduction/clock_nvrtc_test.py
ERROR examples/0_Introduction/simpleCubemapTexture_test.py
ERROR examples/0_Introduction/simpleP2P_test.py
ERROR examples/0_Introduction/simpleZeroCopy_test.py
ERROR examples/0_Introduction/systemWideAtomics_test.py
ERROR examples/0_Introduction/vectorAddDrv_test.py
ERROR examples/0_Introduction/vectorAddMMAP_test.py
ERROR examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py
ERROR examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py
ERROR examples/3_CUDA_Features/simpleCudaGraphs_test.py
ERROR examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py
ERROR examples/extra/isoFDModelling_test.py
ERROR examples/extra/jit_program_test.py
!!!!!!!!!!!!!!!!!!!!!! Interrupted: 21 errors during collection !!!!!!!!!!!!!!!!!!!!!!
================================= 21 errors in 0.62s =================================
(cython) nyck33@nyck33-IdeaPad-Gaming-3-15ACH6:~/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python$
Line 104 in 6782a64
This line assumes that pyparsing has been imported, but it is not imported in the setup.py script.
This appears to be a transitive dependency of pyclibrary, and cuda-python's setup.py expects version 2.4.7. However, the latest version of pyclibrary has a much looser pinning on pyparsing>=2.3.1,<4: https://github.com/MatthieuDartiailh/pyclibrary/blob/1d4dbfc207afee3fd80b72e94c60d47d8263d49a/setup.py#L48
As of this writing, the latest pyparsing is 3.0.9. https://github.com/pyparsing/pyparsing/releases/tag/pyparsing_3.0.9
I'm not sure how to reconcile this. Is the error message out of date? Does a dependency on pyparsing need to be added? Is pyparsing 3.0.9 allowable or not?
Here is an example traceback I encountered while attempting to package cuda-python for CUDA 12 on conda-forge (work in progress: conda-forge/cuda-python-feedstock#33).
Processing $SRC_DIR
Added file://$SRC_DIR to build tracker '/tmp/pip-build-tracker-8n7yddr7'
Running setup.py (path:$SRC_DIR/setup.py) egg_info for package from file://$SRC_DIR
Created temporary directory: /tmp/pip-pip-egg-info-3mflzan5
Preparing metadata (setup.py): started
Running command python setup.py egg_info
Parsing headers in "/home/conda/feedstock_root/build_artifacts/cuda-python_1682019216274/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/targets/x86_64-linux/include" (Caching False)
Missing header cuda.h
Missing header cudaProfiler.h
Missing header cudaEGL.h
Missing header cudaGL.h
Missing header cudaVDPAU.h
Parsing driver headers
Missing header driver_types.h
Missing header vector_types.h
Missing header cuda_runtime.h
Missing header surface_types.h
Missing header texture_types.h
Missing header library_types.h
Missing header cuda_runtime_api.h
Missing header device_types.h
Missing header driver_functions.h
Missing header cuda_profiler_api.h
Missing header cuda_egl_interop.h
Missing header cuda_gl_interop.h
Missing header cuda_vdpau_interop.h
Parsing runtime headers
Missing header nvrtc.h
Parsing nvrtc headers
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/home/conda/feedstock_root/build_artifacts/cuda-python_1682019216274/work/setup.py", line 103, in <module>
if pyparsing.__version__ != '2.4.7':
NameError: name 'pyparsing' is not defined
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.
CI link (this link will eventually expire, so the relevant portion is copied above): https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=693718&view=logs&j=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&t=841356e0-85bb-57d8-dbbc-852e683d1642
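The NameError in the traceback above arises because pyparsing.__version__ is compared without pyparsing having been imported. A minimal sketch of a guarded check follows, assuming the installed version can be read from package metadata. The helper name check_pinned_version is hypothetical and is not taken from cuda-python's setup.py; the pin 2.4.7 is the value quoted in the traceback.

```python
from importlib import metadata


def check_pinned_version(package, expected):
    """Return a warning string if `package` is missing or not at `expected`, else None.

    Hypothetical helper for illustration; not part of cuda-python's setup.py.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        # The package is absent: report it instead of raising NameError later.
        return f"{package} is not installed (expected {expected})"
    if installed != expected:
        return f"{package} {installed} found, expected {expected}"
    return None


# 2.4.7 is the version the setup.py check in the traceback compares against.
message = check_pinned_version("pyparsing", "2.4.7")
if message is not None:
    print("Warning:", message)
```

Whether the pin should stay at 2.4.7 or be relaxed is the open question of this report; the sketch only shows how the check could fail with a readable message.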
Currently cuda-python relies on all binaries (like nvcc), all headers, and all libraries to live in a single directory (specified by $CUDA_HOME or similar).
However there are use cases (like cross-compilation, as with conda-build) where the build tools may live in one location (and perform builds on that architecture) whereas the headers and libraries may live in a different location (and target a different architecture). In this case not everything lives in $CUDA_HOME.
It would be helpful to have a way of specifying where these different components come from. Here are some options:
$NVCC for the nvcc location
$CUDA_BIN (if specified) to get build tool directory
$CUDA_HOME
Maybe there are other reasonable options worth considering?
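The lookup order proposed above could be sketched as follows. This is only an illustration of the idea, not how cuda-python's build currently works: the function name resolve_cuda_locations is hypothetical, and the final fallback to searching PATH is an added assumption that the request does not mention.

```python
import os
import shutil


def resolve_cuda_locations(env=None):
    """Pick the nvcc path and the header/library root from separate variables."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")

    # 1. An explicit compiler path ($NVCC) wins.
    nvcc = env.get("NVCC")

    # 2. Otherwise look in a dedicated build-tool directory ($CUDA_BIN),
    #    falling back to $CUDA_HOME/bin when only CUDA_HOME is set.
    if nvcc is None:
        bin_dir = env.get("CUDA_BIN")
        if bin_dir is None and cuda_home is not None:
            bin_dir = os.path.join(cuda_home, "bin")
        if bin_dir is not None:
            nvcc = os.path.join(bin_dir, "nvcc")

    # 3. Assumption: as a last resort, search PATH for nvcc.
    if nvcc is None:
        nvcc = shutil.which("nvcc", path=env.get("PATH"))

    # Headers and libraries for the target architecture still come from CUDA_HOME.
    return {"nvcc": nvcc, "target_root": cuda_home}


# Cross-compilation style setup: build tools and target files in different places.
print(resolve_cuda_locations({"NVCC": "/opt/build/bin/nvcc", "CUDA_HOME": "/opt/target"}))
```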
Installing on Windows:
python -m pip install cuda-python
Then from python:
from cuda import cuda
Fails with
File "cuda\cuda.pyx", line 1, in init cuda.cuda
File "cuda\ccuda.pyx", line 1, in init cuda.ccuda
File "cuda\_cuda\ccuda.pyx", line 8, in init cuda._cuda.ccuda
ModuleNotFoundError: No module named 'win32api'
I can fix this by installing pypiwin32 manually. But I think it should be listed in requirements.txt if platform_system is Windows.
Thanks
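A conditional dependency of this kind can be expressed with a PEP 508 environment marker. The sketch below shows the form it could take in a setup script. It uses pywin32, the distribution that provides win32api, on the assumption that it is an acceptable substitute for the pypiwin32 name above; the list itself is illustrative and is not cuda-python's actual requirement list.

```python
# Illustrative requirement list; "cython" is the dependency visible in the pip
# logs elsewhere in this thread.
install_requires = ["cython"]

# PEP 508 environment marker: pip installs pywin32 only where
# platform_system evaluates to "Windows".
install_requires.append('pywin32; platform_system == "Windows"')

print(install_requires)
```

The same marker string can be placed on a line of its own in requirements.txt.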
venv "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\Python.exe"
Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]
Commit hash:
Installing torch and torchvision
Traceback (most recent call last):
File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 227, in
prepare_enviroment()
File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 150, in prepare_enviroment
run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch")
File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 33, in run
raise RuntimeError(message)
RuntimeError: Couldn't install torch.
Command: "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\python.exe" -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
Error code: 1
stdout: Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113
stderr: ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)
ERROR: No matching distribution found for torch==1.12.1+cu113
Press any key to continue . . .
Dear sir/madam:
When I run inference on a deep learning model (the SlowFast model), one specific layer of my Python program seems to take more time in the CUDA environment than on the CPU. It is not the whole model that is slower, only this one layer. I am confused and hope someone can help me with it. Here are the details.
The specific layer is the "slowway-conv1" layer, shown in the picture below of the SlowFast model structure.
My confusing results are as follows: the first is for CUDA and the second is for CPU.
In the CUDA environment, the processing time of "conv1" (0.97s) accounts for most of the processing time of the whole model (1.04s), while in the CPU environment the processing time of "conv1" (0.07s) accounts for only a very small proportion of the processing time of the whole model (4.43s). I reckon that the proportion in the CPU environment is reasonable considering the amount of computation.
Is my method of time measurement mistaken? I used the following code to measure the time cost.
If the confusing result is caused by my own mistake, please kindly point it out, or give me some ideas to help me solve this problem. Thank you very much!
Yours, Koala
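One common source of such numbers is that CUDA work is launched asynchronously with respect to the host, so a host-side timer around a single layer can absorb one-time initialization or the wait for earlier work unless the device is synchronized before the clock is started and again before it is stopped. The sketch below is not the poster's measurement code (which is not shown); it is a generic helper under the assumption that the framework offers a blocking synchronize call, such as torch.cuda.synchronize if the model runs in PyTorch.

```python
import time


def timed(fn, *args, sync=lambda: None, warmup=1, repeats=5):
    """Return the mean wall-clock seconds per call of fn(*args).

    `sync` should block until all queued device work has finished. The default
    is a no-op, which is appropriate for CPU-only code.
    """
    for _ in range(warmup):   # the first calls may include one-time setup costs
        fn(*args)
    sync()                    # drain pending work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    sync()                    # wait until the timed work has actually finished
    return (time.perf_counter() - start) / repeats


# CPU-only usage. On a GPU one would pass the framework's synchronize function
# as `sync` (an assumption; the post does not name the framework).
elapsed = timed(sum, range(100_000))
print(f"{elapsed:.6f} s per call")
```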
Was looking at https://nvidia.github.io/cuda-python/motivation.html and noticed a broken link.
The "Numba" link sends you to https://numpy.org/ instead of https://numba.pydata.org/
On the API reference, there is no documentation about cudaStreamCreateWithFlags, despite other stream functions mentioning it (as here).
Also, it was not clear to me how to specify flags for a stream: for example, what are the values of CU_STREAM_DEFAULT or CU_STREAM_NON_BLOCKING, and where can I find them?
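For orientation, the CUDA headers define CU_STREAM_DEFAULT as 0x0 and CU_STREAM_NON_BLOCKING as 0x1. The sketch below records those values and shows one way a non-blocking stream might be created through the runtime bindings. It assumes that cudart exposes cudaStreamCreateWithFlags and the cudaStreamNonBlocking constant as in the C runtime API, and it needs cuda-python and a CUDA-capable device to run. The function name create_non_blocking_stream is hypothetical.

```python
# Numeric values as defined in the CUDA headers.
CU_STREAM_DEFAULT = 0x0
CU_STREAM_NON_BLOCKING = 0x1


def create_non_blocking_stream():
    """Create a non-blocking stream through the runtime bindings.

    cuda-python is imported lazily so this file loads without it installed.
    """
    from cuda import cudart

    # cuda-python calls return a tuple whose first element is the error code.
    err, stream = cudart.cudaStreamCreateWithFlags(cudart.cudaStreamNonBlocking)
    if err != cudart.cudaError_t.cudaSuccess:
        raise RuntimeError(f"cudaStreamCreateWithFlags failed: {err}")
    return stream
```

In the driver bindings the corresponding flags are expected to live in an enum named after the C type (CUstream_flags), though that should be checked against the installed version.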
I'm looking forward to migrating from pycuda to cuda-python, great to see this effort!
We're considering dropping support for Python 3.8 in the next release.
Per NEP 29, Python 3.8 was dropped on Apr 14th 2023.
Let us know if there are any concerns about dropping Python 3.8 in the next release. Thanks!
We're considering dropping package releases for ppc64le on PyPI and the conda nvidia channel in the next release. Source builds will continue to work and testing will continue.
Let us know if there are any concerns. Thanks!
~/cuda-python$ pip install -e .
Obtaining file:///home/vinuj/cuda-python
Requirement already satisfied: cython in /home/vinuj/anaconda3/lib/python3.9/site-packages (from cuda-python==11.7.1) (0.29.28)
Installing collected packages: cuda-python
Attempting uninstall: cuda-python
Found existing installation: cuda-python 11.7.1
Uninstalling cuda-python-11.7.1:
Successfully uninstalled cuda-python-11.7.1
Running setup.py develop for cuda-python
from cuda import cuda, cudart
ImportError: /home/vinuj/cuda-python/cuda/cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv
jkh@megamind-> pytest
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --benchmark-skip
inifile: /home/jkh/Src/cuda-python/pytest.ini
rootdir: /home/jkh/Src/cuda-python
Python version is 3.9.5
I am doing some internal tests using CUDA Python, and I find the Python API Reference hardly readable. Currently all APIs (functions, classes (= C structs), attributes, etc.) are dumped on the same page. Furthermore, the docs for all modules (cuda, cudart, nvrtc) are also coalesced on that same page, making the situation even worse. This is a screenshot of the gargantuan page:
What I'd expect:
What we'll achieve by doing so:
https://nvidia.github.io/cuda-python/install.html#requirements
https://github.com/NVIDIA/cuda-python/blob/main/README.md#requirements
These two sections leave incorrect impressions:
Dear Developers,
I find that, in my project, if I initialize the cuda-python package with cuInit(0) and then use PyTorch to train a neural network, the torch package raises an error claiming that its C++ backend engine can't find the right CUDA stream. I've checked the other parts thoroughly, and things only work out when I separate the usage of these two packages into two separate functions, i.e. isolate the different streams.
I'm wondering if there is a good solution for this issue, or some misuse that should be avoided to eliminate it.
PS: I'm also wondering whether there will be further Python wrapper support for cuSOLVER and cuBLAS, like this user-friendly and highly productive package?
Looking forward to hearing from you! Great thanks for your attention!
Best regards,
Mingran
They seem similar, except that pycuda is not open source and does not support CUDA graphs. This is just a question because I am a little confused.
I change directories to try to run some examples.
(cython) nyck33@nyck33-IdeaPad-Gaming-3-15ACH6:~/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction$ python clock_nvrtc_test.py
Traceback (most recent call last):
File "/home/nyck33/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction/clock_nvrtc_test.py", line 10, in <module>
from examples.common import common
ModuleNotFoundError: No module named 'examples'
What am I doing wrong?
I am looking at a PyPI package called absolufy-imports to try to get this going.
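The import in the traceback, from examples.common import common, resolves only if the repository root (the directory that contains examples/) is on sys.path. Running the script from inside examples/0_Introduction puts that subdirectory on the path instead. A minimal sketch of a workaround follows, assuming the script sits two levels below the repository root; the helper name add_repo_root_to_path and the /repo path are hypothetical.

```python
import os
import sys


def add_repo_root_to_path(script_path, levels_up=2):
    """Put the directory `levels_up` above the script's folder on sys.path."""
    root = os.path.abspath(os.path.dirname(script_path))
    for _ in range(levels_up):
        root = os.path.dirname(root)
    if root not in sys.path:
        sys.path.insert(0, root)
    return root


# For <repo>/examples/0_Introduction/clock_nvrtc_test.py the repository root is
# two levels above the script's directory. "/repo" stands in for the checkout.
root = add_repo_root_to_path("/repo/examples/0_Introduction/clock_nvrtc_test.py")
print(root)
```

Setting PYTHONPATH to the repository root before running the script has the same effect without modifying the script.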
Dear developers,
I found out that calling NVRTC for compilation changes the preferred encoding of the current Python instance.
For more details and to reproduce the issue, please refer to this StackOverflow question.
Do you have any idea why this happens, and how the preferred encoding can be reverted to its original setting?
Thank you in advance
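If the change is made through the process-wide C locale (an assumption; the report does not establish the mechanism), one way to revert it is to save the locale before the compilation call and restore it afterwards. The sketch below does this with a context manager. The name preserved_locale is hypothetical, and the explicit setlocale call inside the with block only stands in for a library call that alters the locale.

```python
import contextlib
import locale


@contextlib.contextmanager
def preserved_locale():
    """Restore the process-wide locale on exit, whatever the body changed."""
    saved = locale.setlocale(locale.LC_ALL)  # query only, changes nothing
    try:
        yield
    finally:
        locale.setlocale(locale.LC_ALL, saved)


saved_before = locale.setlocale(locale.LC_ALL)
with preserved_locale():
    # Placeholder for the NVRTC compilation call; the locale is changed by
    # hand here to imitate a library altering it.
    locale.setlocale(locale.LC_ALL, "C")
restored = locale.setlocale(locale.LC_ALL)
print(restored == saved_before)
```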
Could you please update the README to explain how to compile it on Jetson AGX Xavier?
As per NEP 29, CUDA Python should support Python 3.11.
Conda packages released for cuda-python to the nvidia channel depend on cuda-toolkit, i.e. the entire CUDA Toolkit.
https://anaconda.org/nvidia/cuda-python/files?version=12.0.0
The cuda-python package should declare dependencies only on components that are actually used, which might be a more limited subset like these, which I found by reading extern from declarations:
cuda.h, cuda_runtime.h, driver_types.h, and other headers in https://github.com/NVIDIA/cuda-python/blob/main/cuda/ccudart.pxd.in
nvrtc.h
cuda-python/cuda/cnvrtc.pxd.in
Line 11 in 9ac2d31
cudaProfiler.h
Line 1868 in 9ac2d31