nvidia / cuda-python

CUDA Python Low-level Bindings

Home Page: https://nvidia.github.io/cuda-python/

License: Other

Python 82.52% C 0.42% C++ 7.88% Cython 6.28% Makefile 0.29% Batchfile 0.35% Shell 2.27%

cuda-python's Issues

cuMemExportToShareableHandle always returns CUDA_ERROR_INVALID_VALUE

The output variable cshareableHandle_ptr is incorrectly set to NULL before being passed to CUDA, so the call always returns CUDA_ERROR_INVALID_VALUE.

The following similar APIs are likely broken as well, since they also take a void* output parameter:

cuMemExportToShareableHandle
cuDeviceGetNvSciSyncAttributes
cudaDeviceGetNvSciSyncAttributes
cuMemPoolExportToShareableHandle
cudaMemPoolExportToShareableHandle
cudaGraphMemFreeNodeGetParams
cuMemGetHandleForAddressRange
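The failure follows from the usual C out-parameter contract: the caller must supply the address of writable storage, and an API that receives NULL rejects the call. A pure-ctypes sketch of that contract is below; `export_handle` and the two status constants are hypothetical stand-ins, not cuda-python's actual implementation, and only illustrate why a NULL `void*` output always fails while a pointer to allocated storage succeeds.

```python
import ctypes

# Hypothetical status codes standing in for CUDA_SUCCESS / CUDA_ERROR_INVALID_VALUE.
SUCCESS, INVALID_VALUE = 0, 1

def export_handle(out_ptr) -> int:
    """Hypothetical C-style API that writes a handle through a void** out-parameter."""
    if not out_ptr:  # NULL ctypes pointers are falsy
        return INVALID_VALUE
    out_ptr[0] = 0x1234  # store the result in caller-owned memory
    return SUCCESS

def buggy_call():
    # Mirrors the reported bug: the out-parameter is left as NULL.
    null_ptr = ctypes.POINTER(ctypes.c_void_p)()
    return export_handle(null_ptr), None

def fixed_call():
    # Allocate storage for the result and pass its address instead.
    handle = ctypes.c_void_p()
    err = export_handle(ctypes.pointer(handle))
    return err, handle.value
```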

Dropping Python 3.7

We're considering dropping support for Python 3.7 for the next release.
Per NEP 29, Python 3.7 was scheduled to be dropped almost a year ago, and many associated libraries have already dropped it.

Let us know if you have any concerns about dropping Python 3.7 in the next release. Thanks!

Cython 3: nogil must be at the end of the function signature line

I am experimenting with Cython 3.0.0 beta 1. It raises the following warning hundreds of times when cimport-ing cuda-python.

warning: /home/bdice/mambaforge/envs/cudf_cython_beta/lib/python3.10/site-packages/cuda/ccudart.pxd:1123:154: The keyword 'nogil' should appear at the end of the function signature line. Placing it before 'except' or 'noexcept' will be disallowed in a future version of Cython.

I think this expects nogil except ?cudaErrorCallRequiresNewerDriver to be replaced by except ?cudaErrorCallRequiresNewerDriver nogil. If we could fix this, it would be a massive improvement to the compiler diagnostics that are shown during builds, which are otherwise overwhelmed by the sheer volume of this message.
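Because the fix only moves `nogil` after the exception clause, a mechanical rewrite of the generated declarations would likely be enough. A minimal sketch, assuming one declaration per line (as the `ccudart.pxd:1123` location in the warning suggests) and an exception clause that contains no colon; `move_nogil_to_end` and `rewrite_source` are hypothetical helpers, not part of the project's code generator.

```python
import re

# "nogil" followed by an except/noexcept clause, optionally ending in ":" (for .pyx definitions).
_NOGIL_BEFORE_EXCEPT = re.compile(r"\bnogil\s+((?:no)?except\b[^\n:]*?)(\s*:?)\s*$")

def move_nogil_to_end(line: str) -> str:
    """Rewrite '... nogil except ?X' as '... except ?X nogil'; other lines are returned unchanged."""
    return _NOGIL_BEFORE_EXCEPT.sub(r"\1 nogil\2", line)

def rewrite_source(text: str) -> str:
    # Apply the rewrite to every line, keeping the original newlines.
    return "\n".join(move_nogil_to_end(line) for line in text.split("\n"))
```

Lines such as `with nogil:` do not match, because `nogil` there is not followed by an exception clause.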

PyCuda Error: The context stack was not empty upon module cleanup. (When running cron)

Hi, I am running into the "context stack was not empty" error when I try to run a computer vision script on a webcam with a GStreamer pipeline. It works fine when I run it from the terminal, but when I schedule it with cron it runs into this error. Any suggestions?

RuntimeError: Function "cuMemAllocAsync" not found

When I ran the example file TensorFlowToTensorRT-NHWC.py, the following error occurred:
Traceback (most recent call last):
  File "TensorFlowToTensorRT-NHWC.py", line 161, in <module>
    _, inputD0 = cudart.cudaMallocAsync(inputH0.nbytes, stream)
  File "cuda/cudart.pyx", line 16938, in cuda.cudart.cudaMallocAsync
  File "cuda/ccudart.pyx", line 1210, in cuda.ccudart.cudaMallocAsync
  File "cuda/_cuda/ccuda.pyx", line 4970, in cuda._cuda.ccuda._cuMemAllocAsync
RuntimeError: Function "cuMemAllocAsync" not found.

I'm using the NVIDIA NGC container nvcr.io/nvidia/tensorflow:21.12-tf1-py3. The detailed environment is as follows:
GeForce RTX 2080 Ti, Driver Version: 455.23.05, nvcr.io/nvidia/tensorflow:21.12-tf1-py3
Package Version

absl-py 1.0.0
appdirs 1.4.4
argon2-cffi 21.1.0
asgiref 3.4.1
astor 0.8.1
astunparse 1.6.3
attrs 21.2.0
audioread 2.1.9
backcall 0.2.0
bleach 4.1.0
cachetools 4.2.4
certifi 2021.10.8
cffi 1.15.0
charset-normalizer 2.0.8
click 8.0.3
cloudpickle 2.0.0
cmake-setuptools 0.1.3
cuda-python 11.7.0
cudf 21.10.0a0+345.ge05bd4bf3c
cugraph 21.10.0a0+102.gab401cad
cuml 21.10.0a0+116.gdc14361ba
cupy-cuda114 9.3.0
cupy-cuda115 9.6.0
cycler 0.11.0
Cython 0.29.24
dask 2021.9.1
dask-cuda 21.10.0
dask-cudf 21.10.0a0+345.ge05bd4bf3c
dask-glm 0.2.0
dask-ml 1.9.0
debugpy 1.5.1
decorator 5.1.0
defusedxml 0.7.1
distributed 2021.9.1
Django 3.2.6
entrypoints 0.3
fastavro 1.4.4
fastrlock 0.8
filelock 3.4.0
flatbuffers 1.12
fsspec 2021.7.0
future 0.18.2
gast 0.3.3
google-pasta 0.2.0
graphsurgeon 0.4.5
grpcio 1.42.0
gunicorn 20.1.0
h11 0.12.0
h5py 2.10.0
HeapDict 1.0.1
horovod 0.22.1
httptools 0.2.0
huggingface-hub 0.0.12
idna 3.3
importlib-metadata 4.8.2
importlib-resources 5.4.0
iniconfig 1.1.1
ipykernel 6.6.0
ipython 7.30.0
ipython-genutils 0.2.0
jedi 0.18.1
Jinja2 3.0.3
joblib 1.1.0
json5 0.9.6
jsonschema 4.2.1
jupyter-client 7.1.0
jupyter-core 4.9.1
jupyter-tensorboard 0.2.0
jupyterlab 2.3.2
jupyterlab-pygments 0.1.2
jupyterlab-server 1.2.0
jupytext 1.13.2
Keras-Applications 1.0.8
Keras-Preprocessing 1.0.5
kiwisolver 1.3.2
librosa 0.9.1
llvmlite 0.36.0
locket 0.2.1
Markdown 3.3.6
markdown-it-py 1.1.0
MarkupSafe 2.0.1
matplotlib 3.4.3
matplotlib-inline 0.1.3
mdit-py-plugins 0.2.8
mistune 0.8.4
mock 3.0.5
msgpack 1.0.3
multipledispatch 0.6.0
nbclient 0.5.9
nbconvert 6.3.0
nbformat 5.1.3
nest-asyncio 1.5.4
networkx 2.6.3
nltk 3.6.4
notebook 6.4.3
numba 0.53.1
numpy 1.22.4
nvidia-dali-cuda110 1.8.0
nvidia-dali-tf-plugin-cuda110 1.8.0
nvidia-dlprofviewer 1.8.0
nvidia-pyindex 1.0.9
nvtx 0.2.3
onnx 1.11.0
onnxruntime-gpu 1.11.1
opencv-python 4.5.5.64
opt-einsum 3.3.0
packaging 21.3
pandas 1.2.5
pandocfilters 1.5.0
parso 0.8.3
partd 1.2.0
pexpect 4.7.0
pickleshare 0.7.5
Pillow 8.4.0
pip 21.3.1
pluggy 1.0.0
polygraphy 0.33.0
pooch 1.6.0
portpicker 1.3.1
prometheus-client 0.12.0
prompt-toolkit 3.0.23
protobuf 3.19.1
psutil 5.7.0
ptyprocess 0.7.0
py 1.11.0
pyarrow 5.0.0
pycparser 2.21
Pygments 2.10.0
pynvml 11.4.1
pyparsing 3.0.6
pypi-kenlm 0.1.20210121
pyrsistent 0.18.0
pytest 6.2.5
python-dateutil 2.8.2
python-dotenv 0.19.2
pytz 2021.3
PyYAML 6.0
pyzmq 22.3.0
regex 2021.11.10
requests 2.26.0
resampy 0.2.2
rmm 21.10.0a0+42.gae27a57
sacremoses 0.0.46
scikit-learn 0.24.0
scipy 1.4.1
Send2Trash 1.8.0
setuptools 59.4.0
six 1.16.0
sortedcontainers 2.4.0
SoundFile 0.10.3.post1
sqlparse 0.4.2
tblib 1.7.0
tensorboard 1.15.0
tensorflow 1.15.5+nv
tensorflow-estimator 1.15.1
tensorrt 8.2.1.8
termcolor 1.1.0
terminado 0.12.1
testpath 0.5.0
tf2onnx 1.10.1
threadpoolctl 3.0.0
tokenizers 0.10.3
toml 0.10.2
toolz 0.11.2
tornado 6.1
tqdm 4.62.3
traitlets 5.1.1
transformers 4.9.1
treelite 2.1.0
treelite-runtime 2.1.0
typing_extensions 4.0.1
ucx-py 0.21.0a0+37.gbfa0450
uff 0.6.9
urllib3 1.26.7
uvicorn 0.15.0
uvloop 0.16.0
watchgod 0.7
wcwidth 0.2.5
webencodings 0.5.1
websockets 10.1
Werkzeug 2.0.2
wheel 0.37.0
whitenoise 5.3.0
wrapt 1.13.3
xgboost 1.4.2
zict 2.0.0
zipp 3.6.0
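A plausible but unconfirmed reading of this traceback: `cudaMallocAsync` is forwarded to the driver's `cuMemAllocAsync`, which belongs to the stream-ordered allocator introduced in CUDA 11.2, whereas a 455.23 driver corresponds to CUDA 11.1. A minimal pre-flight check is sketched below; it assumes the integer reported by `cudart.cudaDriverGetVersion()` uses the usual `1000 * major + 10 * minor` encoding, and the helper names are hypothetical.

```python
from typing import Tuple

# Assumption: cuMemAllocAsync / cudaMallocAsync need a driver supporting CUDA 11.2 or newer.
STREAM_ORDERED_ALLOC_MIN: Tuple[int, int] = (11, 2)

def decode_cuda_version(version: int) -> Tuple[int, int]:
    """Split a CUDA version integer such as 11010 into (major, minor) = (11, 1)."""
    return version // 1000, (version % 1000) // 10

def supports_stream_ordered_alloc(driver_version: int) -> bool:
    """True if the driver reports a CUDA version new enough for cuMemAllocAsync."""
    return decode_cuda_version(driver_version) >= STREAM_ORDERED_ALLOC_MIN
```

In the sample, one would call `err, version = cudart.cudaDriverGetVersion()` first and fall back to the synchronous `cudart.cudaMalloc` when the check fails.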

Internal memory allocation management and garbage collection

Great project, guys! I have a single request: instead of exposing the cudaMemoryAlloc etc. APIs and leaving garbage collection up to the programmer, could memory management happen internally?

I can already foresee a plethora of issues where programmers forget to account for threads still running in kernels, along with memory leaks, access violations, segmentation faults, etc. These already happen on operating systems, and now we would be introducing the same hazards on the GPU.

On a side note: do you have any plans with the foundries (AMD, Intel, TSMC, etc.) for GPU-based processing-in-memory (PIM) architectures, or are these experiments exclusive to the RAM developers (Samsung)?

Regards
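One common way for a Python layer to take over freeing is to tie each allocation's lifetime to an owning object. The sketch below assumes nothing about cuda-python's API: `alloc` and `free` are hypothetical callables (for example, thin wrappers around `cudart.cudaMalloc` and `cudart.cudaFree` that unpack the error tuple), and `weakref.finalize` guarantees `free` runs at most once, whether the buffer is closed explicitly, leaves a `with` block, or is garbage-collected.

```python
import weakref
from typing import Callable

class ManagedAllocation:
    """Owns one device allocation and frees it exactly once."""

    def __init__(self, alloc: Callable[[int], int], free: Callable[[int], None], nbytes: int):
        self.nbytes = nbytes
        self.ptr = alloc(nbytes)  # hypothetical allocator returning a raw pointer as int
        # The finalizer holds `free` and the pointer, not `self`, so it does not keep the object alive.
        self._finalizer = weakref.finalize(self, free, self.ptr)

    @property
    def closed(self) -> bool:
        return not self._finalizer.alive

    def close(self) -> None:
        # Calling a finalizer is idempotent: `free` runs only on the first call.
        self._finalizer()

    def __enter__(self) -> "ManagedAllocation":
        return self

    def __exit__(self, *exc_info) -> None:
        self.close()
```

Relying only on garbage collection is risky on the GPU: a free can run late, or after its CUDA context is gone, so the explicit `close()`/`with` path remains the safer default.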

Adopting a set of "supported" python versions

Right now the project doesn't have an explicitly supported set of Python versions. NEP 29 provides an example of how this can be done:

All minor versions of Python released 42 months prior to the project, and at minimum the two latest minor versions.

Minimum Python ... version support should be adjusted upward on [a] major and minor release, but never on a patch release.

This language also allows forecasting which Python versions will be supported and, to some degree, the resources required to maintain the project, because PEP 602 normalizes the Python release schedule.

There are at least two areas this practically impacts:

Support for version-specific issues. Having a specified set of supported versions allows version-specific issues to be ruled in or out of scope and prioritized appropriately.
Binary distributions. These are currently made available on PyPI and the nvidia channel of conda-forge, and the supported set bounds which Python versions the binaries target.
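The quoted rule is mechanical enough to compute. A minimal sketch, assuming the caller supplies the release dates (the example dates are the first-release dates of CPython 3.7 through 3.11) and reading "42 months prior" as 42 calendar months before a given date; `nep29_supported` is a hypothetical helper, not code from NEP 29 or this project.

```python
import calendar
from datetime import date
from typing import Dict, List

def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) + months
    year, month = divmod(total, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def nep29_supported(releases: Dict[str, date], today: date) -> List[str]:
    """Minor versions released in the last 42 months, and at minimum the two latest."""
    released = sorted((v for v, d in releases.items() if d <= today), key=releases.__getitem__)
    cutoff = add_months(today, -42)
    recent = {v for v in released if releases[v] >= cutoff}
    keep = recent | set(released[-2:])
    return [v for v in released if v in keep]

PYTHON_RELEASES = {
    "3.7": date(2018, 6, 27),
    "3.8": date(2019, 10, 14),
    "3.9": date(2020, 10, 5),
    "3.10": date(2021, 10, 4),
    "3.11": date(2022, 10, 24),
}
```

With these dates, the helper drops 3.7 from late December 2021 onward.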

Failed to dlopen libcuda.so in WSL environment

from cuda import cuda
cuda.cuInit(0)

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [2], in <module>
----> 1 cuda.cuInit(0)

File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/cuda.pyx:8876, in cuda.cuda.cuInit()

File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/ccuda.pyx:17, in cuda.ccuda.cuInit()

File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:3553, in cuda._cuda.ccuda._cuInit()

File ~/miniconda3/envs/dev/lib/python3.8/site-packages/cuda/_cuda/ccuda.pyx:424, in cuda._cuda.ccuda.cuPythonInit()

RuntimeError: Failed to dlopen libcuda.so

This is because in a WSL environment libcuda.so lives in /usr/lib/wsl/lib which is not in the default search path of dlopen. For libraries that link against libcuda this isn't a problem because there's a file at /etc/ld.so.conf.d/ld.wsl.conf which instructs the linker as to where it can find the libraries, but unfortunately dlopen doesn't use this.

As a workaround, adding /usr/lib/wsl/lib to the LD_LIBRARY_PATH environment variable resolves the problem.
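Because the dynamic loader typically reads `LD_LIBRARY_PATH` only at process start-up, assigning to `os.environ` inside an interpreter that is already running generally does not help; the variable has to be set for the process that imports `cuda`. A minimal sketch, assuming the script is launched from a small wrapper (the helper names are hypothetical):

```python
import os
import subprocess
import sys
from typing import Dict, Mapping

WSL_LIB_DIR = "/usr/lib/wsl/lib"  # where WSL exposes libcuda.so

def with_wsl_lib_path(env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of env with WSL_LIB_DIR prepended to LD_LIBRARY_PATH (only once)."""
    new_env = dict(env)
    parts = [p for p in new_env.get("LD_LIBRARY_PATH", "").split(":") if p]
    if WSL_LIB_DIR not in parts:
        parts.insert(0, WSL_LIB_DIR)
    new_env["LD_LIBRARY_PATH"] = ":".join(parts)
    return new_env

def run_with_wsl_libs(script: str) -> int:
    """Run `script` in a child interpreter whose loader can find libcuda.so."""
    return subprocess.call([sys.executable, script], env=with_wsl_lib_path(os.environ))
```

The shell equivalent is `export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH` before starting Python.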

Use python exceptions instead of `err, ... =`

Congratulations on the GA release! 🥳

I've been looking forward to the cuda bindings for a while, and was just looking through the docs.

The overview notes an implementation of ASSERT_DRV, which already contains the caveat:

In a future release, this may automatically raise exceptions using a Python object model.

I'm not sure if that means that the errors are going to be subclasses of something like a CUDAError, or if that is to be interpreted some other way, but in any case, I was quite surprised about this choice of exception API

Why not make the functions raise err by default? Right now, IIUC, every invocation would need to accept an extra err-return (and handle it with something like ASSERT_DRV). This seems like a really onerous task to achieve the default behaviour of "fail in case of something unexpected" (and actively choosing where to introduce try... except: handling to continue even if things fail).

It seems like a bad trade-off for me (high verbosity, and easy to forget adding an ASSERT_DRV), but maybe I'm overlooking something?

The reason I'm raising this now is that it would be a pretty fundamental API change, so if there's any chance of it happening at all (assuming it's not already zero after GA), it would need to happen ASAP.
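Until something like this lands in the bindings, exception-raising behaviour can be layered on top. A minimal sketch, assuming (as the samples' `checkCudaErrors` helper does) that each binding returns a tuple whose first element is an error code whose value 0 means success; `CudaAPIError` and `raise_on_error` are hypothetical names, not part of cuda-python.

```python
import functools
from typing import Any, Callable

class CudaAPIError(RuntimeError):
    """Raised when a wrapped binding returns a non-zero error code."""

    def __init__(self, err: Any):
        code = getattr(err, "value", err)  # enum member or plain int
        name = getattr(err, "name", str(err))
        super().__init__("CUDA error code={}({})".format(code, name))
        self.err = err

def raise_on_error(fn: Callable[..., tuple]) -> Callable[..., Any]:
    """Turn an `(err, *values)` binding into one that raises on error and returns the values."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        err, *values = fn(*args, **kwargs)
        if getattr(err, "value", err) != 0:  # cudaSuccess and CUDA_SUCCESS are both 0
            raise CudaAPIError(err)
        if not values:
            return None
        return values[0] if len(values) == 1 else tuple(values)

    return wrapper
```

For example, `raise_on_error(cuda.cuInit)(0)` would either return `None` or raise, and `raise_on_error(cuda.cuDeviceGet)(0)` would return the device directly.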

Sample code execution failure: globalToShmemAsyncCopy.py

jkh@megamind-> env CUDA_HOME=~/Src/cuda-python python globalToShmemAsyncCopy.py
[globalToShmemAsyncCopy] - Starting...
Traceback (most recent call last):
  File "/home/jkh/Src/cuda-python/examples/0_Simple/globalToShmemAsyncCopy.py", line 1054, in <module>
    main()
  File "/home/jkh/Src/cuda-python/examples/0_Simple/globalToShmemAsyncCopy.py", line 1022, in main
    major = checkCudaErrors(cudart.cudaDeviceGetAttribute(cudart.cudaDeviceAttr.cudaDevAttrComputeCapabilityMajor, devID))
  File "/home/jkh/Src/cuda-python/examples/common/helper_cuda.py", line 24, in checkCudaErrors
    raise RuntimeError("CUDA error code={}({})".format(result[0].value, _cudaGetErrorEnum(result[0])))

Note that I can get some of the other samples to pass, so this does not seem like a generic error.

First base of 'CUkernelNodeAttrValue_v1' is not an extension type

I'm trying to compile cuda-python in a fairly minimal conda environment (nothing installed but the requirements), with cuda-11.6 installed, and seeing several instances of the following sort of error:

      Error compiling Cython file:
      ------------------------------------------------------------
      ...
              Get memory address of class instance

          """
          pass

      cdef class CUkernelNodeAttrValue_v1(CUlaunchAttributeValue_union):
                                         ^
      ------------------------------------------------------------

      cuda/cuda.pxd:2637:36: First base of 'CUkernelNodeAttrValue_v1' is not an extension type

I do have other cuda versions installed alongside 11.6 but judging from the output of Parsing headers in "/usr/local/cuda-11.6/include" it seems like it's probably finding the right version? Any advice on how to get past this, or debug it? Thanks!

Cannot use CUDA Python to interact with external C/C++ code

The problem

The Cython example below demonstrates an attempt to use CUDA Python to interact with some external C++ code. Note that the "external code" is included inline in the Cython.

# distutils: language=c++
# distutils: extra_compile_args=-I/usr/local/cuda/include/

from cuda.ccudart cimport cudaMemAllocationHandleType

cdef extern from *:
    """
    #include <cuda_runtime_api.h>
    
    void foo(cudaMemAllocationHandleType x) {
        return;
    }
    """
    void foo(cudaMemAllocationHandleType x)

foo(cudaMemAllocationHandleType.cudaMemHandleTypeNone)

The external code is a function foo that accepts a cudaMemAllocationHandleType. We attempt to invoke that function from Cython by passing in a cuda.ccudart.cudaMemAllocationHandleType, but this fails with an error like:

error: cannot convert '__pyx_t_4cuda_7ccudart_cudaMemAllocationHandleType' to 'cudaMemAllocationHandleType'
 4857 |   foo(__pyx_e_4cuda_7ccudart_cudaMemHandleTypeNone);

To reproduce the problem, save the example above to a file foo.pyx, then run cythonize -i foo.pyx.

Why this happens

This is because the function foo expects a cudaMemAllocationHandleType that is defined in the CUDA runtime library. But CUDA Python "rewrites" the runtime library at the Cython layer, and has its own cudaMemAllocationHandleType (which ends up with a mangled name when transpiled from Cython to C++). The two are not interchangeable.

A potential solution

A potential solution, proposed by @leofang in an offline discussion, is to use extern declarations for types in ccudart.pxd, rather than to redefine them. For example:

diff --git a/cuda/ccudart.pxd b/cuda/ccudart.pxd
index 57e1e96..6c0b5d4 100644
--- a/cuda/ccudart.pxd
+++ b/cuda/ccudart.pxd
@@ -678,11 +678,12 @@ cdef enum cudaMemAllocationType:
     cudaMemAllocationTypePinned = 1
     cudaMemAllocationTypeMax = 2147483647
 
-cdef enum cudaMemAllocationHandleType:
-    cudaMemHandleTypeNone = 0
-    cudaMemHandleTypePosixFileDescriptor = 1
-    cudaMemHandleTypeWin32 = 2
-    cudaMemHandleTypeWin32Kmt = 4
+cdef extern from 'driver_types.h':
+    ctypedef enum cudaMemAllocationHandleType 'cudaMemAllocationHandleType':
+        cudaMemHandleTypeNone = 0
+        cudaMemHandleTypePosixFileDescriptor = 1
+        cudaMemHandleTypeWin32 = 2
+        cudaMemHandleTypeWin32Kmt = 4
 
 cdef struct cudaMemPoolProps:
     cudaMemAllocationType allocType
diff --git a/setup.py b/setup.py
index 394166e..16fad9f 100644
--- a/setup.py
+++ b/setup.py
@@ -30,6 +30,7 @@ except Exception:
 
 include_dirs = [
     os.path.dirname(sysconfig.get_path("include")),
+    '/usr/local/cuda-11.4/include',
 ]
 
 library_dirs = [get_python_lib(), os.path.join(os.sys.prefix, "lib")]

Gotcha

Currently, we ship a single version of CUDA Python that is built with the latest CUDA toolkit, and we expect it to work for older minor versions of the CUDA toolkit by leveraging CUDA enhanced compatibility.

Historically, there have been cases when the runtime API has changed across minor versions of the CUDA toolkit. In particular, the names/ordering of enum members have changed between minor versions. For example, in CUDA 10.1, there was a typo in the enum member cudaErrorDeviceUninitilialized that was fixed in 10.2.

It's not clear how we would handle the situation if something like that were to happen again. In the example above, we would have to have separate extern declarations for 10.1 and 10.2 somehow.

cudart.cudaSetDevice allocates memory on GPU other than target

cuda-python 11.6.1
cuda toolkit 11.2
Ubuntu Linux

If you run something like the following on a multi-GPU machine

from cuda import cuda, cudart

device_num = 5
err, = cuda.cuInit(0)
err, device = cuda.cuDeviceGet(device_num)
err, cuda_context = cuda.cuCtxCreate(0, device)
err, = cudart.cudaSetDevice(device)

The call to cudart.cudaSetDevice will properly set your device to 5, but it will also allocate ~305 MB of memory on device 0 (or whichever device is first in the list given by CUDA_VISIBLE_DEVICES). I suspect this issue (possibly in the underlying CUDA C runtime?) may be the root of many downstream issues in libraries like TensorFlow and PyTorch, where users select a device but still see large allocations on other devices. 305 MB may not sound like a lot, but I'm running a program on an NVIDIA DGX with 16 GPUs and 64 worker processes, so 64 × 305 MB ≈ 19.5 GB of unusable memory is allocated on GPU 0, which crashes the program. I cannot simply set CUDA_VISIBLE_DEVICES to work around this, because the workers communicate with their parent process through shared GPU memory (via CUipcMemHandle), and the parent process needs access to all GPUs. Additionally, each worker performs data augmentation on one GPU while writing output to another GPU with a different device ID.

I am trying to investigate a workaround to not call 'cudart.cudaSetDevice' at all, but when it is not called I cannot properly use the pointer given by cuda.cuMemAlloc to create a PyTorch tensor. When I call cudart.cudaSetDevice, I am able to use the pointer properly.

pytest crashes

jkh@megamind-> pytest
============================= test session starts ==============================
platform linux -- Python 3.8.3, pytest-5.4.3, py-1.9.0, pluggy-0.13.1
rootdir: /home/jkh/Src/cuda-python, inifile: pytest.ini
plugins: typeguard-2.11.1
collecting ... Fatal Python error: Segmentation fault

Current thread 0x00007f1f80dc5740 (most recent call first):
File "/home/jkh/.local/lib/python3.8/site-packages/llvmlite/binding/ffi.py", line 113 in call
File "/home/jkh/.local/lib/python3.8/site-packages/llvmlite/binding/targets.py", line 60 in get_host_cpu_features
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 881 in get_host_cpu_features
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 782 in _get_host_cpu_features
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 846 in _customize_tm_features
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 652 in _init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/codegen.py", line 645 in init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/cpu.py", line 47 in init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/compiler_lock.py", line 32 in _acquire_compile_lock
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/base.py", line 259 in init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/registry.py", line 31 in _toplevel_target_context
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/utils.py", line 332 in get
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/registry.py", line 47 in target_context
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/dispatcher.py", line 670 in init
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/decorators.py", line 187 in wrapper
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/decorators.py", line 173 in jit
File "/home/jkh/.local/lib/python3.8/site-packages/numba/core/decorators.py", line 236 in njit
File "/home/jkh/.local/lib/python3.8/site-packages/numba/typed/typeddict.py", line 23 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 783 in exec_module
File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 991 in _find_and_load
File "/home/jkh/.local/lib/python3.8/site-packages/numba/typed/__init__.py", line 1 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 783 in exec_module
File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 991 in _find_and_load
File "/home/jkh/.local/lib/python3.8/site-packages/numba/__init__.py", line 298 in <module>
File "<frozen importlib._bootstrap>", line 219 in _call_with_frames_removed
File "<frozen importlib._bootstrap_external>", line 783 in exec_module
File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 991 in _find_and_load
File "/home/jkh/Src/cuda-python/cuda/benchmarks/test_numba.py", line 11 in <module>
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/assertion/rewrite.py", line 152 in exec_module
File "<frozen importlib._bootstrap>", line 671 in _load_unlocked
File "<frozen importlib._bootstrap>", line 975 in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 991 in _find_and_load
File "/home/jkh/anaconda3/lib/python3.8/site-packages/py/_path/local.py", line 704 in pyimport
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 511 in _importtestmodule
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 443 in _getobj
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 261 in obj
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 459 in _inject_setup_module_fixture
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/python.py", line 446 in collect
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/runner.py", line 264 in <lambda>
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/runner.py", line 244 in from_call
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/runner.py", line 264 in pytest_make_collect_report
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in call
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/runner.py", line 382 in collect_one_node
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 681 in genitems
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 684 in genitems
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 490 in _perform_collect
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 452 in perform_collect
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 257 in pytest_collection
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in call
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 246 in _main
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 191 in wrap_session
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/main.py", line 240 in pytest_cmdline_main
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/callers.py", line 187 in _multicall
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 84 in <lambda>
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/manager.py", line 93 in _hookexec
File "/home/jkh/anaconda3/lib/python3.8/site-packages/pluggy/hooks.py", line 286 in call
File "/home/jkh/anaconda3/lib/python3.8/site-packages/_pytest/config/__init__.py", line 124 in main
File "/home/jkh/anaconda3/bin/pytest", line 11 in <module>
Segmentation fault (core dumped)

No module named 'cuda._lib'; 'cuda' is not a package

After installing cuda-python by following the conda installation instructions, I tried to run

from cuda import cuda, nvrtc

as in the example, in the PyCharm Python console, but it raises an error:

Traceback (most recent call last):
  File "D:\Anaconda\envs\hierot\lib\code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
  File "cuda\cuda.pyx", line 1, in init cuda.cuda
    # Copyright 2021-2022 NVIDIA Corporation.  All rights reserved.
  File "D:\PyCharm Community Edition 2022.1.3\plugins\python-ce\helpers\pydev\_pydev_bundle\pydev_import_hook.py", line 21, in do_import
    module = self._system_import(name, *args, **kwargs)
ModuleNotFoundError: No module named 'cuda._lib'; 'cuda' is not a package

But the same code runs successfully in the terminal:

(hierot) D:\Projects\SimPlatform>python
Python 3.9.13 (main, Aug 25 2022, 23:51:50) [MSC v.1916 64 bit (AMD64)] :: Anaconda, Inc. on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from cuda import cuda, nvrtc
>>>

Please help me with the problem, thanks in advance. Further information provided on request.

I searched with

ModuleNotFoundError: No module named 'xxx'

The suggested solutions are to configure the correct Python interpreter, but I believe my interpreter is already configured properly.

And search with

No module named 'xxx'; 'yyy' is not a package

Some answers say the cause is that the module name cuda is shadowed by the package name cuda; I think this might be the problem. Please check this.
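One way to test the shadowing theory is to ask the import system what `cuda` resolves to in each environment and compare the PyCharm console with the terminal. The error message itself means that, in the failing environment, `cuda` resolved to something without submodule search locations, i.e. a plain module rather than a package. A minimal sketch using only the standard library (`describe_import` is a hypothetical helper):

```python
import importlib.util
import sys
from typing import Any, Dict, Optional

def describe_import(name: str) -> Optional[Dict[str, Any]]:
    """Report where `name` resolves from and whether it is a package (None if not found)."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return {
        "origin": spec.origin,
        "is_package": spec.submodule_search_locations is not None,
        "already_imported": name in sys.modules,
    }
```

Running `print(describe_import("cuda"))` in both places should typically show the same site-packages origin and `is_package` true; a different origin, or `is_package` false, in the PyCharm console would point to shadowing.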

cudaErrorInvalidMemcpyDirection when supplying value from MemcpyKind to cudaMemcpy2D

I am currently stuck trying to get cudaMemcpy2D working. It reports the error cudaErrorInvalidMemcpyDirection whenever I pass any value from cudaMemcpyKind. I have reinstalled and double-checked my CUDA driver and toolkit installs multiple times and have tested on my own and a colleague's machine, so I'm confident they are fine.

A minimal example that shows the problem I am facing is as follows:

from cuda import cuda, cudart
from cuda_utils import checkCudaErrors
import numpy as np

checkCudaErrors(cuda.cuInit(0))
device_id = checkCudaErrors(cuda.cuDeviceGet(0))
context = checkCudaErrors(cuda.cuCtxCreate(0, device_id ))

def allocate_np_array(array: np.ndarray):
    rows, cols = array.shape
    device_ptr, pitch = checkCudaErrors(cudart.cudaMallocPitch(cols * array.itemsize, rows)) 

    checkCudaErrors(
        cudart.cudaMemcpy2D(
            device_ptr,
            pitch,
            array.ctypes.data,
            array.shape[1] * array.itemsize,
            array.shape[1] * array.itemsize,
            array.shape[0],
            cudart.cudaMemcpyKind.cudaMemcpyHostToDevice
        ))


if __name__ == "__main__":
    arr = np.ones((40, 40), dtype=np.float32)
    allocate_np_array(arr)

The checkCudaErrors function is the same one used in this repo's examples.
The output looks as follows (with filepath removed):

Traceback (most recent call last):
  File "test.py", line 27, in <module>
    allocate_np_array(arr)
  File "test.py", line 13, in allocate_np_array
    cuda_check_errors(
  File "/.../cuda_utils.py", line 22, in checkCudaErrors
    raise RuntimeError("CUDA Error code: {} ({})".format(result[0].value, cuda_get_error_enum(result[0])))
RuntimeError: CUDA Error code: 21 (cudaErrorInvalidMemcpyDirection)

It might be important to note that I am new to CUDA and this is my first real issue post, so don't be afraid to point out my errors in either. Thanks in advance!

`cudart.cudaSetDevice` before `cudart.cudaGetDevice` produces invalid results

Using an environment with:

mamba create -n testing -c nvidia -c conda-forge python=3.9 'cuda-toolkit>=11.7' 'cuda-python>=11.7'

from cuda import cudart

print(cudart.cudaSetDevice(2))
print(cudart.cudaGetDevice())

(<cudaError_t.cudaSuccess: 0>,)
(<cudaError_t.cudaSuccess: 0>, 0)

Expected result: the cudaGetDevice() call should return device 2, not device 0.

The problem appears to be because cudaSetDevice only calls ccudart.utils.lazyInitGlobal, whereas cudaGetDevice calls ccudart.utils.lazyInit (which calls lazyInitDevice(0)).

I think cudaGetDevice just needs to not call lazyInit (the case where no context is current is already handled by the branch that calls cudaSetDevice(0)).

https://github.com/NVIDIA/cuda-python/blob/main/cuda/_lib/ccudart/ccudart.pyx#L1039-L1045

Plausibly a patch like this?

diff --git a/cuda/_lib/ccudart/ccudart.pyx b/cuda/_lib/ccudart/ccudart.pyx
index d42d594..d7f3602 100644
--- a/cuda/_lib/ccudart/ccudart.pyx
+++ b/cuda/_lib/ccudart/ccudart.pyx
@@ -1032,9 +1032,6 @@ cdef cudaError_t _cudaGetDevice(int* device) nogil except ?cudaErrorCallRequires
     cdef cudaError_t err
     cdef ccuda.CUresult err_driver
     cdef ccuda.CUcontext context
-    err = m_global.lazyInit()
-    if err != cudaSuccess:
-        return err
 
     err_driver = ccuda._cuCtxGetCurrent(&context)
     if err_driver == ccuda.cudaError_enum.CUDA_ERROR_INVALID_CONTEXT or (err_driver == ccuda.cudaError_enum.CUDA_SUCCESS and context == NULL):
@@ -1045,14 +1042,16 @@ cdef cudaError_t _cudaGetDevice(int* device) nogil except ?cudaErrorCallRequires
         err_driver = ccuda._cuCtxGetCurrent(&context)
 
     if err_driver != ccuda.cudaError_enum.CUDA_SUCCESS:
-        _setLastError(err)
-        return err
+        _setLastError(<cudaError_t>err_driver)
+        return <cudaError_t>err_driver
 
     found = False
     for deviceOrdinal in range(m_global._numDevices):
         if m_global._driverContext[deviceOrdinal] == context:
             found = True
             break
+    else:
+        return cudaErrorDeviceUninitialized
     device[0] = deviceOrdinal if found else 0
     return cudaSuccess

Note that this patch includes two other fixes:

In the case where err_driver != CUDA_SUCCESS, actually return the error code.
If, after all this, we still can't find a matching context, return cudaErrorDeviceUninitialized (not sure whether this is the correct error code).

Missing cudaLaunchKernel

It seems to be missing cudaLaunchKernel.

>>> from cuda import cudart
>>> print(cudart.cudaLaunchKernel)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'cuda.cudart' has no attribute 'cudaLaunchKernel'

strncat stringop-overflow warning

Hi,
I got the following strncat stringop-overflow warning when compiling.
I think it would be better to add an appropriate check on the string length (for example, kumattau@f442d65).

gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I./cuda -I./cuda/_cuda -I/usr/local/python/include/python3.8 -c cuda/_cuda/loader.cpp -o build/temp.linux-x86_64-3.8/cuda/_cuda/loader.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3
In file included from /usr/include/string.h:495,
                 from /usr/include/c++/9/cstring:42,
                 from cuda/_cuda/loader.cpp:10:
In function ‘char* strncat(char*, const char*, size_t)’,
    inlined from ‘char* replaceSystemPath(char*)’ at cuda/_cuda/loader.cpp:219:12,
    inlined from ‘int dxcore_check_adapter(dxcore_lib*, char*, dxcore_adapterInfo*)’ at cuda/_cuda/loader.cpp:246:43,
    inlined from ‘int dxcore_enum_adapters(dxcore_lib*, char*)’ at cuda/_cuda/loader.cpp:290:34,
    inlined from ‘int getCUDALibraryPath(char*, bool)’ at cuda/_cuda/loader.cpp:345:29:
/usr/include/x86_64-linux-gnu/bits/string_fortified.h:136:34: warning: ‘char* __builtin___strncat_chk(char*, const char*, long unsigned int, long unsigned int)’ specified bound 260 equals destination size [-Wstringop-overflow=]
  136 |   return __builtin___strncat_chk (__dest, __src, __len, __bos (__dest));
      |          ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Docs Updates

The logo should have a transparent background instead of a white one
Regarding installation: Avoid python setup.py ... in favor of pip install ....

I think the Python code needs doctests

To prevent errors, keep the system continuously tested, and keep the CUDA API bindings efficient and running smoothly, the Python functions need doctests:

The code is complex and deals with different paradigms.
Doctests would improve code quality and documentation standards, and would help ensure that all functionality is correct and up to date.
The expected output of each example is shown in the docs and checked when the tests run.

This would help avoid crashes and bugs, resulting in a more reliable system.

References:

https://dev.to/perigk/doctests-the-shy-giant-of-testing-modules-3g74
https://pymotw.com/2/doctest/
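As a sketch of what the proposal could look like (the function below is hypothetical and not part of cuda-python), a doctest embeds interactive examples in a docstring; the standard-library doctest module runs them and compares the actual output, including raised exceptions, with the text written after each >>> prompt:

```python
import doctest


def parse_version(text):
    """Split a dotted version string into a tuple of integers.

    >>> parse_version("11.7.1")
    (11, 7, 1)
    >>> parse_version("12.3")
    (12, 3)
    >>> parse_version("11.x")
    Traceback (most recent call last):
        ...
    ValueError: invalid literal for int() with base 10: 'x'
    """
    return tuple(int(part) for part in text.split("."))


if __name__ == "__main__":
    # Runs every docstring example in this module; prints nothing when all pass.
    doctest.testmod()
```

Running the file directly (or python -m doctest on it) executes the examples, so the documentation and the behavior cannot silently drift apart.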

poetry unable to find package

Hello, I'm running into the following error whenever I try to install cuda-python via Poetry. I also tried pinning the wheel URL directly:

cuda-python = {url = "https://files.pythonhosted.org/packages/bb/3f/0c38c8716a3a15d71c94696fd43290ed4d4f0361d36409f68ffb15478593/cuda_python-12.3.0-cp311-cp311-win_amd64.whl"}

but had no luck. I would greatly appreciate any advice on this.

Package operations: 1 install, 3 updates, 0 removals

  • Updating urllib3 (1.26.18 -> 2.0.7)
  • Updating protobuf (3.20.3 -> 4.24.4)
  • Updating types-requests (2.31.0.6 -> 2.31.0.10)
  • Installing cuda-python (12.3.0): Failed

  RuntimeError

  Unable to find installation candidates for cuda-python (12.3.0)

  at /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/poetry/installation/chooser.py:109 in choose_for
      105│ 
      106│             links.append(link)
      107│ 
      108│         if not links:
    → 109│             raise RuntimeError(f"Unable to find installation candidates for {package}")
      110│ 
      111│         # Get the best link
      112│         chosen = max(links, key=lambda link: self._sort_key(package, link))
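The traceback path (/Library/Frameworks/Python.framework/Versions/3.10/...) points to a CPython 3.10 interpreter on macOS, while the pinned URL is a wheel tagged cp311 and win_amd64. Under the wheel filename convention (PEP 427), those tags are encoded in the filename itself, so a quick check shows the pinned file cannot match that interpreter. If no published wheel carries compatible tags, an installer has no candidate to choose, which is consistent with the error above. A minimal sketch (hypothetical helpers; it ignores the ABI tag and the full tag-priority rules real installers apply):

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 fields.

    Layout: {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def wheel_matches(filename, python_tag, platform_tag):
    """Check one interpreter tag and one platform tag against a wheel's tags.

    Tag fields may contain several dot-separated values (compressed tag sets).
    """
    info = parse_wheel_filename(filename)
    pythons = info["python_tag"].split(".")
    platforms = info["platform_tag"].split(".")
    return python_tag in pythons and (platform_tag in platforms or "any" in platforms)


url = ("https://files.pythonhosted.org/packages/bb/3f/0c38c8716a3a15d71c94696fd43290ed4d4f0361d36409f68ffb15478593/"
       "cuda_python-12.3.0-cp311-cp311-win_amd64.whl")
wheel_name = url.rsplit("/", 1)[-1]  # keep only the filename part of the URL
```

For a CPython 3.10 interpreter the Python tag is cp310, so wheel_matches(wheel_name, "cp310", platform) is False for any platform; the macOS platform tag used in the checks is only an illustrative value.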

Renaming

Capturing this conversation from a few different places in the soon-to-be-public repo. The top-level package is currently cudapython; however, we would like to rename it to cuda, since having python in the package name is redundant.

from cudapython import cuda, nvrtc

becomes...

from cuda import cuda, nvrtc

The pip and conda packages will be named cuda.

cc @jakirkham

Documented return types in cudart.pyx are inconsistent with the implementation.

Many APIs are documented as returning a 2-tuple like (status, None), but the actual return value is a 1-tuple like (status,).

For example:
Docs link: https://nvidia.github.io/cuda-python/api.html#cuda.cudart.cudaSetDevice
Source: cuda-python/cuda/cudart.pyx, lines 7903 to 7910 in d7a354d:

    Returns
    -------
    cudaError_t
        cudaSuccess
        cudaErrorInvalidDevice
        cudaErrorDeviceAlreadyInUse
    None
        None

Rendered docs:

In other cases, the tuple does contain a second element, such as a string or a memory pool, but its documented type is None.

For example:

cuda-python/cuda/cudart.pyx, lines 7164 to 7193 in d7a354d:

def cudaGetErrorName(error not None : cudaError_t):
    """ Returns the string representation of an error code enum name.

    Returns a string containing the name of an error code in the enum. If
    the error code is not recognized, "unrecognized error code" is
    returned.

    Parameters
    ----------
    error : cudaError_t
        Error code to convert to string

    Returns
    -------
    cudaError_t
        `char*` pointer to a NULL-terminated string
    None
        None

    See Also
    --------
    cudaGetErrorString
    cudaGetLastError
    cudaPeekAtLastError
    cudaError
    cuGetErrorName
    """
    cdef ccudart.cudaError_t cerror = error.value
    err = ccudart.cudaGetErrorName(cerror)
    return (cudaError_t.cudaSuccess, err)

cuda-python/cuda/cudart.pyx, lines 7641 to 7670 in d7a354d:

def cudaDeviceGetDefaultMemPool(int device):
    """ Returns the default mempool of a device.

    The default mempool of a device contains device memory from that
    device.

    Returns
    -------
    cudaError_t
        cudaSuccess
        cudaErrorInvalidDevice
        cudaErrorInvalidValue
        cudaErrorNotSupported
    None
        None

    See Also
    --------
    cuDeviceGetDefaultMemPool
    cudaMallocAsync
    cudaMemPoolTrimTo
    cudaMemPoolGetAttribute
    cudaDeviceSetMemPool
    cudaMemPoolSetAttribute
    cudaMemPoolSetAccess
    """
    cdef cudaMemPool_t memPool = cudaMemPool_t()
    with nogil:
        err = ccudart.cudaDeviceGetDefaultMemPool(<ccudart.cudaMemPool_t*>memPool._ptr, device)
    return (cudaError_t(err), memPool)

(Edit: I had offered to contribute a pull request to improve the docs, but I see PRs are not currently accepted.)
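Until the docs and implementation agree, a caller can cope with both tuple shapes shown above by treating the first element as the status and everything after it as outputs. The helper below is a hypothetical sketch, not part of cuda-python; it assumes the status enum converts with int() and that success is 0, which is the value of cudaSuccess.

```python
def unpack_result(result, success=0):
    """Unwrap a tuple whose first element is a status code.

    Accepts both (status,) and (status, value, ...) shapes.
    Raises RuntimeError if the status differs from `success`.
    """
    status, *outputs = result
    if int(status) != int(success):
        raise RuntimeError(f"call failed with status {status!r}")
    if not outputs:
        return None  # 1-tuple such as (status,)
    if len(outputs) == 1:
        return outputs[0]  # the common (status, value) case
    return tuple(outputs)  # several outputs after the status
```

If the calls behave as the implementation quoted above suggests, unpack_result(cudart.cudaSetDevice(0)) would return None and unpack_result(cudart.cudaDeviceGetDefaultMemPool(0)) would return the memory pool object, regardless of what the rendered docs say.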

limit the amount of memory a process can allocate on a single CUDA device

Hi all,

As the title suggests, is there a way to limit the total amount of memory that a process can allocate on a single CUDA device?

Perhaps even by using pyNVML?

This issue is related to the following discussions:

What are the cons of sharing the resources of a single CUDA device among different processes competing for access?

cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version, rather than querying the runtime

The current implementation of cuda.cudart.cudaRuntimeGetVersion() hard-codes the runtime version instead of querying the runtime for it. As a result, it reports the wrong version whenever the installed runtime differs from the version hard-coded in cuda-python.

cuda-python/cuda/_lib/ccudart/ccudart.pyx, lines 79 to 82 in 746b773:

cdef cudaError_t _cudaRuntimeGetVersion(int* runtimeVersion) nogil except ?cudaErrorCallRequiresNewerDriver:
    cdef cudaError_t err
    runtimeVersion[0] = m_global.CUDART_VERSION
    return cudaSuccess

cuda-python/cuda/_lib/ccudart/utils.pyx, line 37 in 746b773:

self.CUDART_VERSION = 11060

Additional context

A workaround used in rapidsai/rmm#946 is to use numba's API for this instead:

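import numba.cuda

def cudaRuntimeGetVersion():
    # numba reports the CUDA runtime version as a (major, minor) pair.
    major, minor = numba.cuda.runtime.get_version()
    # Encode it the same way CUDA does, e.g. 11.6 -> 11060.
    return major * 1000 + minor * 10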
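The workaround encodes (major, minor) as major * 1000 + minor * 10, so the hard-coded 11060 corresponds to 11.6. A small pure-Python sketch (hypothetical helper names, no GPU needed) that decodes such values and reports when the version claimed by the bindings disagrees with the one the runtime reports:

```python
def encode_cuda_version(major, minor):
    """Encode (major, minor) the way CUDA reports versions, e.g. 11.6 -> 11060."""
    return major * 1000 + minor * 10


def decode_cuda_version(version):
    """Invert encode_cuda_version, e.g. 11060 -> (11, 6)."""
    major, remainder = divmod(version, 1000)
    return major, remainder // 10


def runtime_version_mismatch(reported, actual):
    """Return a description if two encoded versions differ, otherwise None."""
    if reported == actual:
        return None
    r_major, r_minor = decode_cuda_version(reported)
    a_major, a_minor = decode_cuda_version(actual)
    return f"bindings report {r_major}.{r_minor}, runtime is {a_major}.{a_minor}"
```

Here reported would come from cuda.cudart.cudaRuntimeGetVersion() (assumed to return (err, version)) and actual from the numba-based function above.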

Exploring CUDA-Aware MPI Transmission in cuda-python

Hi Developers,

As a newcomer to cuda-python, I'm eager to learn how to implement CUDA-aware MPI transmission. I'd appreciate any guidance or resources on this topic. Thank you!

Fails to build on AmazonLinux

Hi,

I checked out the package and tried to build it on Amazon Linux, but it fails to compile; the build output is below. I also tried all the other commands mentioned in the installation guide, and they all failed with the same errors.

CUDA: 11.2
GCC: 9.3
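In the build output that follows, the bindings report version 11.7.1 while the include path is /usr/local/cuda-11.2/include, and the undeclared names (CU_GET_PROC_ADDRESS_DEFAULT, CUexecAffinityType, CUuserObject, CUDA_ARRAY_MEMORY_REQUIREMENTS, ...) appear to be driver API symbols that are newer than CUDA 11.2. One plausible diagnosis is therefore that the headers are older than the bindings expect. A hedged sketch of a pre-build check, assuming the bindings need headers at least as new as their own major.minor version; the helper names are hypothetical, and it relies on cuda.h defining CUDA_VERSION (e.g. 11020 for 11.2):

```python
import re


def cuda_header_version(header_text):
    """Return (major, minor) from a '#define CUDA_VERSION NNNNN' line in cuda.h."""
    match = re.search(r"^\s*#define\s+CUDA_VERSION\s+(\d+)", header_text, re.MULTILINE)
    if match is None:
        raise ValueError("CUDA_VERSION not found in header text")
    major, remainder = divmod(int(match.group(1)), 1000)
    return major, remainder // 10


def headers_too_old(header_text, bindings_version):
    """True if the headers' major.minor is older than the bindings' major.minor."""
    wanted = tuple(int(part) for part in bindings_version.split(".")[:2])
    return cuda_header_version(header_text) < wanted
```

For example, reading /usr/local/cuda-11.2/include/cuda.h and calling headers_too_old(text, "11.7.1") would flag the mismatch before compilation starts.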

$ python setup.py build
Compiling cuda/_cuda/ccuda.pyx because it changed.
Compiling cuda/_cuda/cnvrtc.pyx because it changed.
[1/2] Cythonizing cuda/_cuda/ccuda.pyx
[2/2] Cythonizing cuda/_cuda/cnvrtc.pyx
Compiling cuda/_lib/utils.pyx because it changed.
[1/1] Cythonizing cuda/_lib/utils.pyx
Compiling cuda/_lib/ccudart/ccudart.pyx because it changed.
Compiling cuda/_lib/ccudart/utils.pyx because it changed.
[1/2] Cythonizing cuda/_lib/ccudart/ccudart.pyx
[2/2] Cythonizing cuda/_lib/ccudart/utils.pyx
Compiling cuda/ccuda.pyx because it changed.
Compiling cuda/ccudart.pyx because it changed.
Compiling cuda/cnvrtc.pyx because it changed.
Compiling cuda/cuda.pyx because it changed.
Compiling cuda/cudart.pyx because it changed.
Compiling cuda/nvrtc.pyx because it changed.
[1/6] Cythonizing cuda/ccuda.pyx
[2/6] Cythonizing cuda/ccudart.pyx
[3/6] Cythonizing cuda/cnvrtc.pyx
[4/6] Cythonizing cuda/cuda.pyx
[5/6] Cythonizing cuda/cudart.pyx
[6/6] Cythonizing cuda/nvrtc.pyx
Compiling cuda/tests/test_ccuda.pyx because it changed.
Compiling cuda/tests/test_ccudart.pyx because it changed.
Compiling cuda/tests/test_interoperability_cython.pyx because it changed.
[1/3] Cythonizing cuda/tests/test_ccuda.pyx
[2/3] Cythonizing cuda/tests/test_ccudart.pyx
[3/3] Cythonizing cuda/tests/test_interoperability_cython.pyx
running build
running build_py
creating build
creating build/lib.linux-x86_64-3.8
creating build/lib.linux-x86_64-3.8/cuda
copying cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda
copying cuda/_version.py -> build/lib.linux-x86_64-3.8/cuda
creating build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_cuda
creating build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib
creating build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/__init__.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/kernels.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/perf_test_utils.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/test_cupy.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/test_launch_latency.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/test_numba.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
copying cuda/benchmarks/test_pointer_attributes.py -> build/lib.linux-x86_64-3.8/cuda/benchmarks
creating build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/__init__.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_cuda.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_cudart.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_cython.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_interoperability.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_kernelParams.py -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_nvrtc.py -> build/lib.linux-x86_64-3.8/cuda/tests
creating build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/__init__.py -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/__init__.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cuda.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cudart.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/nvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cuda.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cudart.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/nvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cuda.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/cudart.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/nvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda
copying cuda/_cuda/ccuda.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/cnvrtc.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/loader.pxd -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/cnvrtc.pyx -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/loader.h -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/loader.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_cuda/cnvrtc.cpp -> build/lib.linux-x86_64-3.8/cuda/_cuda
copying cuda/_lib/dlfcn.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/param_packer.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/param_packer.h -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/param_packer.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/_lib/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib
copying cuda/tests/test_ccuda.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_interoperability_cython.pyx -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_ccuda.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/tests/test_interoperability_cython.cpp -> build/lib.linux-x86_64-3.8/cuda/tests
copying cuda/_lib/ccudart/ccudart.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/utils.pxd -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/ccudart.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/utils.pyx -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/ccudart.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
copying cuda/_lib/ccudart/utils.cpp -> build/lib.linux-x86_64-3.8/cuda/_lib/ccudart
UPDATING build/lib.linux-x86_64-3.8/cuda/_version.py
set build/lib.linux-x86_64-3.8/cuda/_version.py to '11.7.1'
running build_ext
building 'cuda._cuda.ccuda' extension
creating build/temp.linux-x86_64-3.8
creating build/temp.linux-x86_64-3.8/cuda
creating build/temp.linux-x86_64-3.8/cuda/_cuda
/home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc -Wno-unused-result -Wsign-compare -DNDEBUG -fwrapv -O2 -Wall -Wstrict-prototypes -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -pipe -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -DNDEBUG -D_FORTIFY_SOURCE=2 -O2 -isystem /home/ec2-user/anaconda3/envs/tensorflow2_p38/include -fPIC -I./cuda -I./cuda/_cuda -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include -I/usr/local/cuda-11.2/include -I/home/ec2-user/anaconda3/envs/tensorflow2_p38/include/python3.8 -c cuda/_cuda/ccuda.cpp -o build/temp.linux-x86_64-3.8/cuda/_cuda/ccuda.o -std=c++14 -fpermissive -Wno-deprecated-declarations -D _GLIBCXX_ASSERTIONS -fno-var-tracking-assignments -O3
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++
cuda/_cuda/ccuda.cpp: In function 'int __pyx_f_4cuda_5_cuda_5ccuda_cuPythonInit()':
cuda/_cuda/ccuda.cpp:4202:138: error: 'CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM' was not declared in this scope
 4202 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0x1B58, CU_GET_PROC_ADDRESS_PER_THREAD_DEFAULT_STREAM); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 836, __pyx_L4_error)
      |                                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:4924:137: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
 4924 |         __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuMemcpy"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuMemcpy), 0xFA0, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 917, __pyx_L4_error)
      |                                                                                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:5637:152: error: 'CU_GET_PROC_ADDRESS_DEFAULT' was not declared in this scope
 5637 |       __pyx_t_8 = __pyx_f_4cuda_5ccuda_cuGetProcAddress(((char const *)"cuGetErrorString"), (&__pyx_v_4cuda_5_cuda_5ccuda___cuGetErrorString), 0x1770, CU_GET_PROC_ADDRESS_DEFAULT); if (unlikely(__pyx_t_8 == ((CUresult)CUDA_ERROR_NOT_FOUND) && __Pyx_ErrOccurredWithGIL())) __PYX_ERR(0, 997, __pyx_L4_error)
      |                                                                                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:15609:73: error: 'CUflushGPUDirectRDMAWritesTarget' was not declared in this scope
15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
      |                                                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:15609:122: error: 'CUflushGPUDirectRDMAWritesScope' was not declared in this scope
15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
      |                                                                                                                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:15609:167: warning: expression list treated as compound expression in initializer [-fpermissive]
15609 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuFlushGPUDirectRDMAWrites(CUflushGPUDirectRDMAWritesTarget __pyx_v_target, CUflushGPUDirectRDMAWritesScope __pyx_v_scope) {
      |                                                                                                                                                                       ^
cuda/_cuda/ccuda.cpp:16977:94: error: 'CUexecAffinityType' has not been declared
16977 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int *__pyx_v_pi, CUexecAffinityType __pyx_v_typename, CUdevice __pyx_v_dev) {
      |                                                                                              ^~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuDeviceGetExecAffinitySupport(int*, int, CUdevice)':
cuda/_cuda/ccuda.cpp:17082:30: error: expected primary-expression before '(' token
17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
      |                              ^
cuda/_cuda/ccuda.cpp:17082:32: error: expected primary-expression before ')' token
17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
      |                                ^
cuda/_cuda/ccuda.cpp:17082:34: error: expected primary-expression before 'int'
17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
      |                                  ^~~
cuda/_cuda/ccuda.cpp:17082:41: error: 'CUexecAffinityType' was not declared in this scope
17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
      |                                         ^~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:17082:69: error: expected primary-expression before ')' token
17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
      |                                                                     ^
cuda/_cuda/ccuda.cpp:17082:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport'
17082 |     __pyx_v_err = ((CUresult (*)(int *, CUexecAffinityType, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceGetExecAffinitySupport)(__pyx_v_pi, __pyx_v_typename, __pyx_v_dev);
      |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                       )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:17319:86: error: 'CUexecAffinityParam' has not been declared
17319 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUcontext *__pyx_v_pctx, CUexecAffinityParam *__pyx_v_paramsArray, int __pyx_v_numParams, unsigned int __pyx_v_flags, CUdevice __pyx_v_dev) {
      |                                                                                      ^~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxCreate_v3(CUctx_st**, int*, int, unsigned int, CUdevice)':
cuda/_cuda/ccuda.cpp:17424:30: error: expected primary-expression before '(' token
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                              ^
cuda/_cuda/ccuda.cpp:17424:32: error: expected primary-expression before ')' token
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                                ^
cuda/_cuda/ccuda.cpp:17424:44: error: expected primary-expression before '*' token
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                                            ^
cuda/_cuda/ccuda.cpp:17424:45: error: expected primary-expression before ',' token
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                                             ^
cuda/_cuda/ccuda.cpp:17424:47: error: 'CUexecAffinityParam' was not declared in this scope
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                                               ^~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:17424:68: error: expected primary-expression before ',' token
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                                                                    ^
cuda/_cuda/ccuda.cpp:17424:70: error: expected primary-expression before 'int'
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                                                                      ^~~
cuda/_cuda/ccuda.cpp:17424:75: error: expected primary-expression before 'unsigned'
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                                                                           ^~~~~~~~
cuda/_cuda/ccuda.cpp:17424:97: error: expected primary-expression before ')' token
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                                                                                                 ^
cuda/_cuda/ccuda.cpp:17424:99: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3'
17424 |     __pyx_v_err = ((CUresult (*)(CUcontext *, CUexecAffinityParam *, int, unsigned int, CUdevice))__pyx_v_4cuda_5_cuda_5ccuda___cuCtxCreate_v3)(__pyx_v_pctx, __pyx_v_paramsArray, __pyx_v_numParams, __pyx_v_flags, __pyx_v_dev);
      |                   ~                                                                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                                                   )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:20397:67: error: 'CUexecAffinityParam' was not declared in this scope
20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
      |                                                                   ^~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:20397:88: error: '__pyx_v_pExecAffinity' was not declared in this scope
20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
      |                                                                                        ^~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:20397:111: error: 'CUexecAffinityType' was not declared in this scope
20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
      |                                                                                                               ^~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:20397:146: warning: expression list treated as compound expression in initializer [-fpermissive]
20397 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuCtxGetExecAffinity(CUexecAffinityParam *__pyx_v_pExecAffinity, CUexecAffinityType __pyx_v_typename) {
      |                                                                                                                                                  ^
cuda/_cuda/ccuda.cpp:33564:75: error: 'CUDA_ARRAY_MEMORY_REQUIREMENTS' was not declared in this scope
33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
      |                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:33564:107: error: '__pyx_v_memoryRequirements' was not declared in this scope
33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
      |                                                                                                           ^~~~~~~~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:33564:143: error: expected primary-expression before '__pyx_v_array'
33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
      |                                                                                                                                               ^~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:33564:167: error: expected primary-expression before '__pyx_v_device'
33564 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuArrayGetMemoryRequirements(CUDA_ARRAY_MEMORY_REQUIREMENTS *__pyx_v_memoryRequirements, CUarray __pyx_v_array, CUdevice __pyx_v_device) {
---
truncated due to git issue limit
---
cuda/_cuda/ccuda.cpp:58806:44: error: 'CUgraphMem_attribute' was not declared in this scope
58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
      |                                            ^~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:58806:66: error: expected primary-expression before 'void'
58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
      |                                                                  ^~~~
cuda/_cuda/ccuda.cpp:58806:74: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute'
58806 |     __pyx_v_err = ((CUresult (*)(CUdevice, CUgraphMem_attribute, void *))__pyx_v_4cuda_5_cuda_5ccuda___cuDeviceSetGraphMemAttribute)(__pyx_v_device, __pyx_v_attr, __pyx_v_value);
      |                   ~                                                      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                          )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:64515:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
      |                                                                 ^~~~~~~~~~~~
      |                                                                 CUsurfObject
cuda/_cuda/ccuda.cpp:64515:79: error: '__pyx_v_object_out' was not declared in this scope
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
      |                                                                               ^~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:64515:99: error: expected primary-expression before 'void'
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
      |                                                                                                   ^~~~
cuda/_cuda/ccuda.cpp:64515:127: error: expected primary-expression before '__pyx_v_destroy'
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
      |                                                                                                                               ^~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:64515:144: error: expected primary-expression before 'unsigned'
64515 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
      |                                                                                                                                                ^~~~~~~~
cuda/_cuda/ccuda.cpp:64515:182: error: expected primary-expression before 'unsigned'
64515 | atic CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
      |                                                                                                                                                                                    ^~~~~~~~

cuda/_cuda/ccuda.cpp:64515:208: warning: expression list treated as compound expression in initializer [-fpermissive]
64515 | a_5_cuda_5ccuda__cuUserObjectCreate(CUuserObject *__pyx_v_object_out, void *__pyx_v_ptr, CUhostFn __pyx_v_destroy, unsigned int __pyx_v_initialRefcount, unsigned int __pyx_v_flags) {
      |                                                                                                                                                                                    ^

cuda/_cuda/ccuda.cpp:64686:65: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
      |                                                                 ^~~~~~~~~~~~
      |                                                                 CUsurfObject
cuda/_cuda/ccuda.cpp:64686:94: error: expected primary-expression before 'unsigned'
64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
      |                                                                                              ^~~~~~~~
cuda/_cuda/ccuda.cpp:64686:120: warning: expression list treated as compound expression in initializer [-fpermissive]
64686 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRetain(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
      |                                                                                                                        ^
cuda/_cuda/ccuda.cpp:64857:66: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
      |                                                                  ^~~~~~~~~~~~
      |                                                                  CUsurfObject
cuda/_cuda/ccuda.cpp:64857:95: error: expected primary-expression before 'unsigned'
64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
      |                                                                                               ^~~~~~~~
cuda/_cuda/ccuda.cpp:64857:121: warning: expression list treated as compound expression in initializer [-fpermissive]
64857 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuUserObjectRelease(CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
      |                                                                                                                         ^
cuda/_cuda/ccuda.cpp:65028:93: error: 'CUuserObject' has not been declared
65028 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count, unsigned int __pyx_v_flags) {
      |                                                                                             ^~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphRetainUserObject(CUgraph, int, unsigned int, unsigned int)':
cuda/_cuda/ccuda.cpp:65133:30: error: expected primary-expression before '(' token
65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
      |                              ^
cuda/_cuda/ccuda.cpp:65133:32: error: expected primary-expression before ')' token
65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
      |                                ^
cuda/_cuda/ccuda.cpp:65133:41: error: expected primary-expression before ',' token
65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
      |                                         ^
cuda/_cuda/ccuda.cpp:65133:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
      |                                           ^~~~~~~~~~~~
      |                                           CUsurfObject
cuda/_cuda/ccuda.cpp:65133:57: error: expected primary-expression before 'unsigned'
65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
      |                                                         ^~~~~~~~
cuda/_cuda/ccuda.cpp:65133:71: error: expected primary-expression before 'unsigned'
65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
      |                                                                       ^~~~~~~~
cuda/_cuda/ccuda.cpp:65133:85: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject'
65133 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphRetainUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count, __pyx_v_flags);
      |                   ~                                                                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                                     )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:65199:94: error: 'CUuserObject' has not been declared
65199 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph __pyx_v_graph, CUuserObject __pyx_v_object, unsigned int __pyx_v_count) {
      |                                                                                              ^~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuGraphReleaseUserObject(CUgraph, int, unsigned int)':
cuda/_cuda/ccuda.cpp:65304:30: error: expected primary-expression before '(' token
65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
      |                              ^
cuda/_cuda/ccuda.cpp:65304:32: error: expected primary-expression before ')' token
65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
      |                                ^
cuda/_cuda/ccuda.cpp:65304:41: error: expected primary-expression before ',' token
65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
      |                                         ^
cuda/_cuda/ccuda.cpp:65304:43: error: 'CUuserObject' was not declared in this scope; did you mean 'CUsurfObject'?
65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
      |                                           ^~~~~~~~~~~~
      |                                           CUsurfObject
cuda/_cuda/ccuda.cpp:65304:57: error: expected primary-expression before 'unsigned'
65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
      |                                                         ^~~~~~~~
cuda/_cuda/ccuda.cpp:65304:71: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject'
65304 |     __pyx_v_err = ((CUresult (*)(CUgraph, CUuserObject, unsigned int))__pyx_v_4cuda_5_cuda_5ccuda___cuGraphReleaseUserObject)(__pyx_v_graph, __pyx_v_object, __pyx_v_count);
      |                   ~                                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                       )
cuda/_cuda/ccuda.cpp: At global scope:
cuda/_cuda/ccuda.cpp:74604:69: error: 'CUmoduleLoadingMode' was not declared in this scope
74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
      |                                                                     ^~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp:74604:90: error: '__pyx_v_mode' was not declared in this scope; did you mean '__pyx_k_name'?
74604 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuModuleGetLoadingMode(CUmoduleLoadingMode *__pyx_v_mode) {
      |                                                                                          ^~~~~~~~~~~~
      |                                                                                          __pyx_k_name
cuda/_cuda/ccuda.cpp:74775:145: error: 'CUmemRangeHandleType' has not been declared
74775 | static CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void *__pyx_v_handle, CUdeviceptr __pyx_v_dptr, size_t __pyx_v_size, CUmemRangeHandleType __pyx_v_handleType, unsigned PY_LONG_LONG __pyx_v_flags) {
      |                                                                                                                                                 ^~~~~~~~~~~~~~~~~~~~
cuda/_cuda/ccuda.cpp: In function 'CUresult __pyx_f_4cuda_5_cuda_5ccuda__cuMemGetHandleForAddressRange(void*, CUdeviceptr, size_t, int, long long unsigned int)':
cuda/_cuda/ccuda.cpp:74880:30: error: expected primary-expression before '(' token
74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
      |                              ^
cuda/_cuda/ccuda.cpp:74880:32: error: expected primary-expression before ')' token
74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
      |                                ^
cuda/_cuda/ccuda.cpp:74880:34: error: expected primary-expression before 'void'
74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
      |                                  ^~~~
cuda/_cuda/ccuda.cpp:74880:53: error: expected primary-expression before ',' token
74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
      |                                                     ^
cuda/_cuda/ccuda.cpp:74880:61: error: expected primary-expression before ',' token
74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
      |                                                             ^
cuda/_cuda/ccuda.cpp:74880:63: error: 'CUmemRangeHandleType' was not declared in this scope; did you mean 'CUmemHandleType'?
74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
      |                                                               ^~~~~~~~~~~~~~~~~~~~
      |                                                               CUmemHandleType
cuda/_cuda/ccuda.cpp:74880:85: error: expected primary-expression before 'unsigned'
74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
      |                                                                                     ^~~~~~~~
cuda/_cuda/ccuda.cpp:74880:108: error: expected ')' before '__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange'
74880 |     __pyx_v_err = ((CUresult (*)(void *, CUdeviceptr, size_t, CUmemRangeHandleType, unsigned PY_LONG_LONG))__pyx_v_4cuda_5_cuda_5ccuda___cuMemGetHandleForAddressRange)(__pyx_v_handle, __pyx_v_dptr, __pyx_v_size, __pyx_v_handleType, __pyx_v_flags);
      |                   ~                                                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      |                                                                                                            )
error: command '/home/ec2-user/anaconda3/envs/tensorflow2_p38/bin/x86_64-conda-linux-gnu-cc' failed with exit status 1
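Every undeclared identifier in this log (`CUuserObject`, `CUmoduleLoadingMode`, `CUmemRangeHandleType`) is a driver-API type from a newer CUDA toolkit. That suggests the compiler found an older `cuda.h` than the one the bindings were generated against, although the report does not confirm this. `cuda.h` records its toolkit version in the `CUDA_VERSION` macro (`1000 * major + 10 * minor`, so `11070` means 11.7). The helper below reads that macro so you can compare it with the version you expect. It is only a sketch. The function names are hypothetical. The default `(11, 7)` threshold is also an assumption: as far as we know, the loading-mode and address-range handle APIs first appeared in CUDA 11.7.

```python
import re
from pathlib import Path

# Matches e.g. "#define CUDA_VERSION 11070" (1000 * major + 10 * minor).
_CUDA_VERSION_RE = re.compile(r"^\s*#\s*define\s+CUDA_VERSION\s+(\d+)", re.MULTILINE)


def parse_cuda_version(header_text):
    """Return (major, minor) from the text of cuda.h, or None if the macro is absent."""
    match = _CUDA_VERSION_RE.search(header_text)
    if match is None:
        return None
    value = int(match.group(1))
    return value // 1000, (value % 1000) // 10


def check_cuda_header(header_path, required=(11, 7)):
    """Hypothetical helper: report the header's version and whether it meets `required`."""
    found = parse_cuda_version(Path(header_path).read_text(errors="replace"))
    if found is None:
        raise ValueError(f"no CUDA_VERSION define found in {header_path}")
    return found, found >= required
```

For example, `check_cuda_header("/usr/local/cuda/include/cuda.h")` returns a pair such as `((11, 2), False)` when the header is older than the threshold. The path is illustrative. Point it at whichever include directory the failing compiler actually uses.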

Option to link statically against `libnvrtc_static.a`.

It would be great if it were possible to link against NVRTC statically.

21 errors during collection

(cython) nyck33@nyck33-IdeaPad-Gaming-3-15ACH6:~/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python$ pip install cuda-python==11.7
Requirement already satisfied: cuda-python==11.7 in /home/nyck33/anaconda3/envs/cython/lib/python3.10/site-packages (11.7.0)
Requirement already satisfied: cython in /home/nyck33/anaconda3/envs/cython/lib/python3.10/site-packages (from cuda-python==11.7) (3.0.0a11)
(cython) nyck33@nyck33-IdeaPad-Gaming-3-15ACH6:~/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python$ python -m pytest
================================ test session starts =================================
platform linux -- Python 3.10.8, pytest-7.2.0, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python, configfile: pytest.ini
plugins: benchmark-4.0.0
collected 12 items / 21 errors                                                       

======================================= ERRORS =======================================
______________ ERROR collecting cuda/benchmarks/test_launch_latency.py _______________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/benchmarks/test_launch_latency.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
cuda/benchmarks/test_launch_latency.py:9: in <module>
    from cuda import cuda
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
____________ ERROR collecting cuda/benchmarks/test_pointer_attributes.py _____________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/benchmarks/test_pointer_attributes.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
cuda/benchmarks/test_pointer_attributes.py:9: in <module>
    from cuda import cuda
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
______________________ ERROR collecting cuda/tests/test_cuda.py ______________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_cuda.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_cuda.py:10: in <module>
    import cuda.cuda as cuda
E   ModuleNotFoundError: No module named 'cuda.cuda'
_____________________ ERROR collecting cuda/tests/test_cudart.py _____________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_cudart.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_cudart.py:10: in <module>
    import cuda.cudart as cudart
E   ModuleNotFoundError: No module named 'cuda.cudart'
_____________________ ERROR collecting cuda/tests/test_cython.py _____________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_cython.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_cython.py:35: in <module>
    mod = importlib.import_module(mod)
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
E   ModuleNotFoundError: No module named 'cuda.tests.test_ccuda'
________________ ERROR collecting cuda/tests/test_interoperability.py ________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_interoperability.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_interoperability.py:9: in <module>
    import cuda.cuda as cuda
E   ModuleNotFoundError: No module named 'cuda.cuda'
__________________ ERROR collecting cuda/tests/test_kernelParams.py __________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_kernelParams.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_kernelParams.py:9: in <module>
    from cuda import cuda, cudart, nvrtc
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_____________________ ERROR collecting cuda/tests/test_nvrtc.py ______________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/tests/test_nvrtc.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
cuda/tests/test_nvrtc.py:9: in <module>
    from cuda import nvrtc
E   ImportError: cannot import name 'nvrtc' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
____________ ERROR collecting examples/0_Introduction/clock_nvrtc_test.py ____________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/clock_nvrtc_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/clock_nvrtc_test.py:9: in <module>
    from cuda import cuda
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_______ ERROR collecting examples/0_Introduction/simpleCubemapTexture_test.py ________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/simpleCubemapTexture_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/simpleCubemapTexture_test.py:13: in <module>
    from cuda import cuda, cudart
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_____________ ERROR collecting examples/0_Introduction/simpleP2P_test.py _____________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/simpleP2P_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/simpleP2P_test.py:11: in <module>
    from cuda import cuda, cudart
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
__________ ERROR collecting examples/0_Introduction/simpleZeroCopy_test.py ___________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/simpleZeroCopy_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/simpleZeroCopy_test.py:13: in <module>
    from cuda import cuda, cudart
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_________ ERROR collecting examples/0_Introduction/systemWideAtomics_test.py _________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/systemWideAtomics_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/systemWideAtomics_test.py:12: in <module>
    from cuda import cuda, cudart
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
___________ ERROR collecting examples/0_Introduction/vectorAddDrv_test.py ____________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/vectorAddDrv_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/vectorAddDrv_test.py:11: in <module>
    from cuda import cuda
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
___________ ERROR collecting examples/0_Introduction/vectorAddMMAP_test.py ___________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/0_Introduction/vectorAddMMAP_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/0_Introduction/vectorAddMMAP_test.py:12: in <module>
    from cuda import cuda
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_ ERROR collecting examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py _
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py:13: in <module>
    from cuda import cuda, cudart
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
______ ERROR collecting examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py ______
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py:13: in <module>
    from cuda import cuda, cudart
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_________ ERROR collecting examples/3_CUDA_Features/simpleCudaGraphs_test.py _________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/3_CUDA_Features/simpleCudaGraphs_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/3_CUDA_Features/simpleCudaGraphs_test.py:11: in <module>
    from cuda import cuda, cudart
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
__ ERROR collecting examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py __
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py:12: in <module>
    from cuda import cuda, cudart
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
_______________ ERROR collecting examples/extra/isoFDModelling_test.py _______________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/extra/isoFDModelling_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/extra/isoFDModelling_test.py:10: in <module>
    from cuda import cuda, cudart
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
________________ ERROR collecting examples/extra/jit_program_test.py _________________
ImportError while importing test module '/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/examples/extra/jit_program_test.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../../../anaconda3/envs/cython/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
examples/extra/jit_program_test.py:10: in <module>
    from cuda import cuda, nvrtc
E   ImportError: cannot import name 'cuda' from 'cuda' (/home/nyck33/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python/cuda/__init__.py)
============================== short test summary info ===============================
ERROR cuda/benchmarks/test_launch_latency.py
ERROR cuda/benchmarks/test_pointer_attributes.py
ERROR cuda/tests/test_cuda.py
ERROR cuda/tests/test_cudart.py
ERROR cuda/tests/test_cython.py
ERROR cuda/tests/test_interoperability.py
ERROR cuda/tests/test_kernelParams.py
ERROR cuda/tests/test_nvrtc.py
ERROR examples/0_Introduction/clock_nvrtc_test.py
ERROR examples/0_Introduction/simpleCubemapTexture_test.py
ERROR examples/0_Introduction/simpleP2P_test.py
ERROR examples/0_Introduction/simpleZeroCopy_test.py
ERROR examples/0_Introduction/systemWideAtomics_test.py
ERROR examples/0_Introduction/vectorAddDrv_test.py
ERROR examples/0_Introduction/vectorAddMMAP_test.py
ERROR examples/2_Concepts_and_Techniques/streamOrderedAllocation_test.py
ERROR examples/3_CUDA_Features/globalToShmemAsyncCopy_test.py
ERROR examples/3_CUDA_Features/simpleCudaGraphs_test.py
ERROR examples/4_CUDA_Libraries/conjugateGradientMultiBlockCG_test.py
ERROR examples/extra/isoFDModelling_test.py
ERROR examples/extra/jit_program_test.py
!!!!!!!!!!!!!!!!!!!!!! Interrupted: 21 errors during collection !!!!!!!!!!!!!!!!!!!!!!
================================= 21 errors in 0.62s =================================
(cython) nyck33@nyck33-IdeaPad-Gaming-3-15ACH6:~/Documents/cuda-start-dec2022/cuda-python11_7/cuda-python$

Error with pyparsing when no CUDA is found

cuda-python/setup.py

Line 104 in 6782a64

if pyparsing.__version__ != '2.4.7':

This line assumes that pyparsing has been imported, but it is not imported in the setup.py script.

pyparsing appears to be a transitive dependency of pyclibrary, and cuda-python's setup.py expects version 2.4.7. However, the latest version of pyclibrary has a much looser pin, pyparsing>=2.3.1,<4: https://github.com/MatthieuDartiailh/pyclibrary/blob/1d4dbfc207afee3fd80b72e94c60d47d8263d49a/setup.py#L48

As of this writing, the latest pyparsing is 3.0.9. https://github.com/pyparsing/pyparsing/releases/tag/pyparsing_3.0.9

I'm not sure how to reconcile this. Is the error message out of date? Does a dependency on pyparsing need to be added? Is pyparsing 3.0.9 allowable or not?

Here is an example traceback I encountered while attempting to package cuda-python for CUDA 12 on conda-forge (work in progress: conda-forge/cuda-python-feedstock#33).

Processing $SRC_DIR
  Added file://$SRC_DIR to build tracker '/tmp/pip-build-tracker-8n7yddr7'
  Running setup.py (path:$SRC_DIR/setup.py) egg_info for package from file://$SRC_DIR
  Created temporary directory: /tmp/pip-pip-egg-info-3mflzan5
  Preparing metadata (setup.py): started
  Running command python setup.py egg_info
  Parsing headers in "/home/conda/feedstock_root/build_artifacts/cuda-python_1682019216274/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold/targets/x86_64-linux/include" (Caching False)
  Missing header cuda.h
  Missing header cudaProfiler.h
  Missing header cudaEGL.h
  Missing header cudaGL.h
  Missing header cudaVDPAU.h
  Parsing driver headers
  Missing header driver_types.h
  Missing header vector_types.h
  Missing header cuda_runtime.h
  Missing header surface_types.h
  Missing header texture_types.h
  Missing header library_types.h
  Missing header cuda_runtime_api.h
  Missing header device_types.h
  Missing header driver_functions.h
  Missing header cuda_profiler_api.h
  Missing header cuda_egl_interop.h
  Missing header cuda_gl_interop.h
  Missing header cuda_vdpau_interop.h
  Parsing runtime headers
  Missing header nvrtc.h
  Parsing nvrtc headers
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/home/conda/feedstock_root/build_artifacts/cuda-python_1682019216274/work/setup.py", line 103, in <module>
      if pyparsing.__version__ != '2.4.7':
  NameError: name 'pyparsing' is not defined
  error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

CI link (this link will eventually expire, so the relevant portion is copied above): https://dev.azure.com/conda-forge/feedstock-builds/_build/results?buildId=693718&view=logs&j=d0d954b5-f111-5dc4-4d76-03b6c9d0cf7e&t=841356e0-85bb-57d8-dbbc-852e683d1642
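The NameError above occurs because the name pyparsing is never bound in setup.py. Below is a minimal sketch of a guarded check. It assumes the acceptable range should follow pyclibrary's own pin (pyparsing>=2.3.1,<4) instead of the exact '2.4.7'. The helper names are hypothetical and not part of cuda-python.

```python
# Bounds copied from pyclibrary's pin (pyparsing>=2.3.1,<4). Treating this
# range as acceptable for cuda-python is an assumption, not a project decision.
PYPARSING_MIN = (2, 3, 1)
PYPARSING_BELOW = (4,)


def parse_version(version: str) -> tuple:
    """Turn '3.0.9' into (3, 0, 9), stopping at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def pyparsing_version_ok(version: str) -> bool:
    """Return True if version lies in [PYPARSING_MIN, PYPARSING_BELOW)."""
    return PYPARSING_MIN <= parse_version(version) < PYPARSING_BELOW


def check_pyparsing() -> None:
    """Hypothetical setup.py helper: import pyparsing explicitly before using it."""
    try:
        import pyparsing
    except ImportError as exc:
        raise RuntimeError("pyparsing is required to parse the CUDA headers") from exc
    if not pyparsing_version_ok(pyparsing.__version__):
        raise RuntimeError(
            f"unsupported pyparsing {pyparsing.__version__}; expected >=2.3.1,<4"
        )
```

In setup.py, check_pyparsing() would run before the header-parsing step that needs it, so a missing or out-of-range pyparsing produces a readable error instead of a NameError.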

Splayed layout support

Currently cuda-python expects all binaries (like nvcc), headers, and libraries to live in a single directory (specified by $CUDA_HOME or similar).

However, there are use cases (like cross-compilation, as with conda-build) where the build tools live in one location (and run on the build machine's architecture) while the headers and libraries live in a different location (and target a different architecture). In these cases, not everything lives in $CUDA_HOME.

It would be helpful to have a way of specifying where these different components come from. Here are some options:

Check $NVCC for the nvcc location
Use $CUDA_BIN (if specified) to get build tool directory
Support a list of directories in $CUDA_HOME
?

Maybe there are other reasonable options worth considering?
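To make the options above concrete, here is a minimal sketch of how a build script might resolve components. It assumes two hypothetical conventions taken from the list: $NVCC points directly at the compiler, and $CUDA_HOME may hold an os.pathsep-separated list of roots searched in order. Neither convention exists in cuda-python today. find_nvcc and find_header are illustrative names.

```python
import os
import shutil
from pathlib import Path
from typing import Mapping, Optional


def cuda_roots(env: Mapping[str, str]) -> list:
    """Split a (hypothetical) multi-root $CUDA_HOME into individual paths."""
    value = env.get("CUDA_HOME", "")
    return [Path(p) for p in value.split(os.pathsep) if p]


def find_nvcc(env: Mapping[str, str]) -> Optional[Path]:
    """Prefer $NVCC, then $CUDA_BIN, then <root>/bin for each root, then PATH."""
    if env.get("NVCC"):
        return Path(env["NVCC"])
    candidates = []
    if env.get("CUDA_BIN"):
        candidates.append(Path(env["CUDA_BIN"]))
    candidates += [root / "bin" for root in cuda_roots(env)]
    for directory in candidates:
        nvcc = directory / "nvcc"  # POSIX name only; Windows would need nvcc.exe
        if nvcc.is_file():
            return nvcc
    found = shutil.which("nvcc")
    return Path(found) if found else None


def find_header(name: str, env: Mapping[str, str]) -> Optional[Path]:
    """Return the first <root>/include/<name> that exists, searching roots in order."""
    for root in cuda_roots(env):
        header = root / "include" / name
        if header.is_file():
            return header
    return None
```

With this split, a cross-compiling build could list the host toolkit first (for nvcc) and the target toolkit second (for headers and libraries), and each lookup stops at the first root that actually contains the file.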

Windows: ModuleNotFoundError: No module named 'win32api'

Installing on Windows:

python -m pip install cuda-python

Then from python:

from cuda import cuda

Fails with

    File "cuda\cuda.pyx", line 1, in init cuda.cuda

    File "cuda\ccuda.pyx", line 1, in init cuda.ccuda

    File "cuda\_cuda\ccuda.pyx", line 8, in init cuda._cuda.ccuda

  ModuleNotFoundError: No module named 'win32api'

I can work around this by installing pypiwin32 manually, but I think it should be listed in requirements.txt for platforms where platform_system is Windows.

Thanks
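A platform-conditional dependency can be expressed with a PEP 508 environment marker, so that non-Windows installs are unaffected. This is a minimal sketch of a hypothetical setup.py excerpt, not cuda-python's actual packaging. It assumes pywin32 is the right package to require: pywin32 provides the win32api module, and pypiwin32 (which the reporter installed) is, as far as I know, a thin wrapper around it.

```python
# Hypothetical setup.py excerpt: the marker after ";" restricts the
# requirement to Windows. The same string works as a requirements.txt line.
install_requires = [
    'pywin32; platform_system == "Windows"',
]
```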

ERROR Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)

venv "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\Python.exe"
Python 3.11.0 (main, Oct 24 2022, 18:26:48) [MSC v.1933 64 bit (AMD64)]
Commit hash:
Installing torch and torchvision
Traceback (most recent call last):
File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 227, in
prepare_enviroment()
File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 150, in prepare_enviroment
run(f'"{python}" -m {torch_command}', "Installing torch and torchvision", "Couldn't install torch")
File "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\launch.py", line 33, in run
raise RuntimeError(message)
RuntimeError: Couldn't install torch.
Command: "C:\stable-diffusion-webui-master\stable-diffusion-webui-master\venv\Scripts\python.exe" -m pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
Error code: 1
stdout: Looking in indexes: https://pypi.org/simple, https://download.pytorch.org/whl/cu113

stderr: ERROR: Could not find a version that satisfies the requirement torch==1.12.1+cu113 (from versions: none)
ERROR: No matching distribution found for torch==1.12.1+cu113

Press any key to continue . . .

more inference time in cuda env compared to cpu (occurred only for one layer)

Dear sir/madam:
When running inference with a deep learning model (SlowFast), I'm facing a problem: my Python program seems to take more inference time in a CUDA environment than on the CPU. It is not the whole model; one specific layer takes more time with CUDA than on the CPU. I'm quite confused and hope someone can help. Here are the details.
The specific layer is the "slowway-conv1" layer, as shown in the picture below of the SlowFast model structure.

My confusing results are as follows: the first is for CUDA and the second for CPU.

In the CUDA environment, the processing time of "conv1" (0.97 s) accounts for almost all of the processing time of the whole model (1.04 s), while in the CPU environment, the processing time of "conv1" (0.07 s) is only a very small proportion of the whole model's processing time (4.43 s). I think the proportion in the CPU environment is reasonable given the computational cost.
Is my method of time measurement mistaken? I used the following code to measure the time cost.

If the confusing result is my own mistake, please kindly point it out, or give me some ideas to help solve this problem. Thank you very much!
Yours, Koala
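CUDA kernel launches are asynchronous with respect to the host, so timing GPU work with a host clock and no synchronization can charge queued work, plus one-time setup, to whichever call happens to block first. Because the timing code is not shown, it is only a guess that this explains why the first convolution dominates. Below is a minimal, framework-agnostic sketch of the usual remedy: warm up first, then bracket the timed region with a device synchronization. time_call is a hypothetical helper. With PyTorch on a GPU, you would pass torch.cuda.synchronize as sync.

```python
import time
from typing import Callable, Optional


def time_call(fn: Callable[[], object],
              sync: Optional[Callable[[], None]] = None,
              warmup: int = 3,
              repeats: int = 10) -> float:
    """Return the mean wall-clock time of fn() in seconds.

    sync should block until all queued device work has finished. Without it,
    an asynchronous launch returns immediately and its cost shows up in
    whichever later call happens to wait for the GPU.
    """
    def barrier() -> None:
        if sync is not None:
            sync()

    # Warm-up runs absorb one-time costs (e.g. library initialization).
    for _ in range(warmup):
        fn()
    barrier()  # make sure warm-up work is not counted below

    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    barrier()  # wait for the timed work before reading the clock
    return (time.perf_counter() - start) / repeats
```

Timing each layer this way, with the same warm-up and synchronization on both devices, gives per-layer numbers that can be compared fairly between CUDA and CPU.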

Numba link bug

I was looking at https://nvidia.github.io/cuda-python/motivation.html and noticed a broken link.

The "Numba" link sends you to https://numpy.org/ instead of https://numba.pydata.org/

Missing docs for cudaStreamCreateWithFlags

The API reference has no documentation for cudaStreamCreateWithFlags, even though other stream functions mention it (as here).

Also, it was not clear to me how to specify stream flags; for example, what their values are or where to find CU_STREAM_DEFAULT or CU_STREAM_NON_BLOCKING.

I'm looking forward to migrating from pycuda to cuda-python. Great to see this effort!
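For the runtime API, the flags are the C macros cudaStreamDefault (0x00) and cudaStreamNonBlocking (0x01). Below is a minimal sketch of creating a stream with explicit flags. It assumes cuda-python exposes these macros as module-level constants of cudart, and it follows the package's convention of returning an (error, result) tuple. For the driver API, the counterpart is assumed to be cuda.cuStreamCreate with cuda.CUstream_flags.CU_STREAM_NON_BLOCKING. create_stream is a hypothetical helper.

```python
try:
    from cuda import cudart
except ImportError:  # cuda-python is not installed; the helper below will refuse to run
    cudart = None


def create_stream(non_blocking: bool = True):
    """Create a CUDA runtime stream with explicit flags.

    cudaStreamNonBlocking: the stream does not synchronize with the legacy
    NULL stream. cudaStreamDefault: ordinary blocking behaviour.
    """
    if cudart is None:
        raise RuntimeError("cuda-python is not installed")
    flags = cudart.cudaStreamNonBlocking if non_blocking else cudart.cudaStreamDefault
    err, stream = cudart.cudaStreamCreateWithFlags(flags)
    if err != cudart.cudaError_t.cudaSuccess:
        raise RuntimeError(f"cudaStreamCreateWithFlags failed: {err}")
    return stream
```

When you are done with the stream, release it with cudart.cudaStreamDestroy(stream), which also returns a tuple whose first element is the error code.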

Dropping Python 3.8

We're considering dropping support for Python 3.8 for the next release.
Per NEP 29, support for Python 3.8 could be dropped as of Apr 14, 2023.

Let us know if there's concerns in having Python 3.8 dropped next release. Thanks!

Dropping package releases for ppc64le on PyPI and conda-nvidia channel

We're considering dropping package releases for ppc64le on PyPI and the conda-nvidia channel in the next release. Source builds will continue to work, and testing will continue.

Let us know if there's any concerns. Thanks!

_ZSt28__throw_bad_array_new_lengthv

~/cuda-python$ pip install -e .
Obtaining file:///home/vinuj/cuda-python
Requirement already satisfied: cython in /home/vinuj/anaconda3/lib/python3.9/site-packages (from cuda-python==11.7.1) (0.29.28)
Installing collected packages: cuda-python
  Attempting uninstall: cuda-python
    Found existing installation: cuda-python 11.7.1
    Uninstalling cuda-python-11.7.1:
      Successfully uninstalled cuda-python-11.7.1
  Running setup.py develop for cuda-python

    from cuda import cuda, cudart
ImportError: /home/vinuj/cuda-python/cuda/cuda.cpython-39-x86_64-linux-gnu.so: undefined symbol: _ZSt28__throw_bad_array_new_lengthv

pytest seems to use an unsupported argument

jkh@megamind-> pytest
ERROR: usage: pytest [options] [file_or_dir] [file_or_dir] [...]
pytest: error: unrecognized arguments: --benchmark-skip
inifile: /home/jkh/Src/cuda-python/pytest.ini
rootdir: /home/jkh/Src/cuda-python

Python version is 3.9.5

API Reference lacks logical organization and is hard to read

I am doing some internal tests using CUDA Python, and I find the Python API Reference hard to read. Currently all APIs (functions, classes (=C structs), attributes, etc.) are dumped on the same page. Furthermore, the docs for all modules (cuda, cudart, nvrtc) are coalesced into that same page as well, making the situation even worse. This is a screenshot of the gargantuan page:

What I'd expect:

Under the entry "CUDA Python API Reference" of the ToC on the left, we list the 3 modules as sub-entries
Under each module, we further list 3 (or more) sub-sub-entries: "Functions", "Classes", "Attributes" (maybe defines/typedefs/enums can all be combined as Attributes)
In each of the "Functions" sub-sub-entry, we organize the contents into sub-sub-sub-entries based on their purpose. For example, for the CUDA Runtime APIs we can follow how the parent page does it: https://docs.nvidia.com/cuda/cuda-runtime-api/modules.html#modules

What we'll achieve by doing so:

Match the way CUDA programmers search the CUDA documentations
Provide a more user-friendly, logical organization for the docs

Docs for Building and Installation requirements are actually execution requirements

https://nvidia.github.io/cuda-python/install.html#requirements
https://github.com/NVIDIA/cuda-python/blob/main/README.md#requirements

These two sections leave incorrect impressions:

You don't actually need the driver to build/install CUDA Python
The listed CTK range is actually only for execution, and source builds have their own CTK requirement

Compatibility with pytorch when backward propagation is required

Dear Developers,

I find that, in my project, if I initialize the cuda-python package with cuInit(0), then when I use PyTorch to train a neural network, torch raises an error claiming that its C++ backend engine can't find the right CUDA stream. I've checked the other parts thoroughly, and things only work out when I separate the usage of the two packages into two separate functions, i.e. isolating the different streams.

I'm wondering whether there's a good solution for this issue, or some misuse that needs to be avoided to eliminate it.

PS: I'm also wondering whether there will be further Python wrapper support for cuSOLVER and cuBLAS, like this user-friendly and highly productive package?

Looking forward to hearing from you! Great thanks for your attention!

Best regards,
Mingran
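The issue shows no cuda-python code beyond cuInit(0), so the following is a guess. If the code then calls cuCtxCreate, the driver creates a new context and makes it current, while the CUDA runtime (and therefore PyTorch) works in the device's primary context. Below is a minimal sketch that retains and activates the primary context instead, so both libraries share one context. use_primary_context and _check are hypothetical helpers. The calls follow cuda-python's convention of returning a tuple whose first element is a CUresult.

```python
try:
    from cuda import cuda
except ImportError:  # cuda-python is not installed; the helper below will refuse to run
    cuda = None


def _check(result):
    """Raise on a failed driver call and return the remaining outputs."""
    err, *rest = result
    if err != cuda.CUresult.CUDA_SUCCESS:
        raise RuntimeError(f"CUDA driver call failed: {err}")
    return rest


def use_primary_context(device_ordinal: int = 0):
    """Make the device's primary context current instead of creating a new one."""
    if cuda is None:
        raise RuntimeError("cuda-python is not installed")
    _check(cuda.cuInit(0))
    (device,) = _check(cuda.cuDeviceGet(device_ordinal))
    (context,) = _check(cuda.cuDevicePrimaryCtxRetain(device))
    _check(cuda.cuCtxSetCurrent(context))
    return device, context
```

Each retain should eventually be balanced by cuda.cuDevicePrimaryCtxRelease(device), and it is an open question whether this addresses the specific stream error reported above.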

What is the relationship between this and pycuda ?

They seem similar, except that pycuda is not open source and does not support CUDA graphs. I'm only asking because I am a little confused.

No module named 'examples'

I changed directories to try to run some examples.

(cython) nyck33@nyck33-IdeaPad-Gaming-3-15ACH6:~/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction$ python clock_nvrtc_test.py
Traceback (most recent call last):
  File "/home/nyck33/Documents/cuda-start-dec2022/cuda-python/examples/0_Introduction/clock_nvrtc_test.py", line 10, in <module>
    from examples.common import common
ModuleNotFoundError: No module named 'examples'

What am I doing wrong?
I am looking at the PyPI package absolufy-imports to try to get this working.
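The test files import examples.common, which only resolves if the repository root is on sys.path. Running python clock_nvrtc_test.py from inside examples/0_Introduction puts only that directory on sys.path. The simplest fix is usually to run from the repository root, e.g. PYTHONPATH=. python examples/0_Introduction/clock_nvrtc_test.py. If a script must be runnable from anywhere, a minimal sketch of a hypothetical helper could walk up from the script to the first directory containing examples/ and prepend it to sys.path.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(script, marker: str = "examples") -> Path:
    """Prepend the first ancestor of `script` that contains a `marker` directory.

    Hypothetical helper: it assumes the repository root is the directory that
    holds the examples/ package, as in the cuda-python checkout above.
    """
    for parent in Path(script).resolve().parents:
        if (parent / marker).is_dir():
            if str(parent) not in sys.path:
                sys.path.insert(0, str(parent))
            return parent
    raise FileNotFoundError(f"no ancestor of {script} contains {marker!r}")
```

A script would call add_repo_root_to_path(__file__) before from examples.common import common.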

nvrtc.nvrtcCompileProgram is changing the preferred encoding from UTF-8 to ANSI_X3.4-1968

Dear developers,

I found out that calling the NVRTC for compilation is changing the preferred encoding for the current Python instance.

For more details and to reproduce the issue, please refer to this StackOverflow question.

Do you have any idea why this happens, and how the preferred encoding can be reverted to its original setting?

Thank you in advance
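On POSIX systems, locale.getpreferredencoding() follows the LC_CTYPE locale category, and on glibc the "C" locale reports its codeset as ANSI_X3.4-1968. That suggests that something in the process resets the C locale during compilation. This is an assumption: the issue does not establish which component calls setlocale. Below is a minimal workaround sketch that saves LC_CTYPE before the call and restores it afterwards. preserve_lc_ctype is a hypothetical helper.

```python
import contextlib
import locale


@contextlib.contextmanager
def preserve_lc_ctype():
    """Restore the process LC_CTYPE locale after the wrapped block runs."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query only: no second argument
    try:
        yield
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)
```

Usage would look like wrapping the compile call, e.g. with preserve_lc_ctype(): err, = nvrtc.nvrtcCompileProgram(prog, len(opts), opts), where prog and opts are whatever the caller already passes.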

The cuda-python package should declare dependencies only on the components it actually uses, which might be a more limited subset like the following, found by reading the cdef extern from declarations:

cuda-cudart-dev for cuda.h, cuda_runtime.h, driver_types.h, and other headers in https://github.com/NVIDIA/cuda-python/blob/main/cuda/ccudart.pxd.in
cuda-nvrtc-dev for nvrtc.h

cuda-python/cuda/cnvrtc.pxd.in

Line 11 in 9ac2d31

cdef extern from "nvrtc.h":
cuda-profiler-api for cudaProfiler.h

cuda-python/cuda/ccuda.pxd.in

Line 1868 in 9ac2d31

cdef extern from "cudaProfiler.h":
(Others?)
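The survey above was done by reading the cdef extern from declarations by hand. Below is a minimal sketch that automates the header half of it. It assumes the declarations live in .pxd and .pxd.in files under the repository's cuda/ directory, as the links suggest. Mapping each header to a conda package name (cuda-cudart-dev, cuda-nvrtc-dev, ...) would still be a manual step.

```python
import re
from pathlib import Path
from typing import Dict, List

# Matches lines such as:  cdef extern from "nvrtc.h":
# and also:               cdef extern from "cuda.h" nogil:
EXTERN_FROM = re.compile(r'^\s*cdef\s+extern\s+from\s+"([^"]+)"', re.MULTILINE)


def extern_headers(source: str) -> List[str]:
    """Return the sorted, de-duplicated headers named in cdef extern from blocks."""
    return sorted(set(EXTERN_FROM.findall(source)))


def scan_tree(root: Path) -> Dict[str, List[str]]:
    """Map each header to the .pxd / .pxd.in files under root that declare it."""
    users: Dict[str, List[str]] = {}
    for path in sorted(root.rglob("*.pxd*")):
        if not path.is_file():
            continue
        for header in extern_headers(path.read_text(encoding="utf-8")):
            users.setdefault(header, []).append(str(path.relative_to(root)))
    return users
```

Running scan_tree(Path("cuda")) in a checkout would list every header the bindings pull in, which is the set the package dependencies need to cover.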

    Returns
    -------
    cudaError_t
        cudaSuccess, cudaErrorInvalidDevice, cudaErrorDeviceAlreadyInUse
    None
        None

def cudaGetErrorName(error not None : cudaError_t):
    """ Returns the string representation of an error code enum name.

    Returns a string containing the name of an error code in the enum. If
    the error code is not recognized, "unrecognized error code" is
    returned.

    Parameters
    ----------
    error : cudaError_t
        Error code to convert to string

    Returns
    -------
    cudaError_t
        `char*` pointer to a NULL-terminated string
    None
        None

    See Also
    --------
    cudaGetErrorString
    cudaGetLastError
    cudaPeekAtLastError
    cudaError
    cuGetErrorName
    """
    cdef ccudart.cudaError_t cerror = error.value
    err = ccudart.cudaGetErrorName(cerror)
    return (cudaError_t.cudaSuccess, err)

def cudaDeviceGetDefaultMemPool(int device):
    """ Returns the default mempool of a device.

    The default mempool of a device contains device memory from that
    device.

    Returns
    -------
    cudaError_t
        cudaSuccess, cudaErrorInvalidDevice, cudaErrorInvalidValue, cudaErrorNotSupported
    None
        None

    See Also
    --------
    cuDeviceGetDefaultMemPool
    cudaMallocAsync
    cudaMemPoolTrimTo
    cudaMemPoolGetAttribute
    cudaDeviceSetMemPool
    cudaMemPoolSetAttribute
    cudaMemPoolSetAccess
    """
    cdef cudaMemPool_t memPool = cudaMemPool_t()
    with nogil:
        err = ccudart.cudaDeviceGetDefaultMemPool(<ccudart.cudaMemPool_t*>memPool._ptr, device)
    return (cudaError_t(err), memPool)

cdef cudaError_t _cudaRuntimeGetVersion(int* runtimeVersion) except ?cudaErrorCallRequiresNewerDriver nogil:
    cdef cudaError_t err
    runtimeVersion[0] = m_global.CUDART_VERSION
    return cudaSuccess
