Comments (18)
I haven't tried running on AWS, so I'm not sure what might be going on, and the error reported is not very informative ):
Just as a sanity check, can you try compiling and running the deviceQueryDrv program from the CUDA samples?
from cuda.
It compiles fine but gives the same error:
root@7d05fbb8d8d5:/opt/accelerate-llvm# git clone https://github.com/tmcdonell/cuda.git
Cloning into 'cuda'...
remote: Counting objects: 4345, done.
remote: Compressing objects: 100% (9/9), done.
remote: Total 4345 (delta 1), reused 6 (delta 1), pack-reused 4335
Receiving objects: 100% (4345/4345), 1.79 MiB | 1.20 MiB/s, done.
Resolving deltas: 100% (2384/2384), done.
Checking connectivity... done.
root@7d05fbb8d8d5:/opt/accelerate-llvm# stack ghc cuda/examples/src/deviceQueryDrv/DeviceQuery.hs
[1 of 1] Compiling Main ( cuda/examples/src/deviceQueryDrv/DeviceQuery.hs, cuda/examples/src/deviceQueryDrv/DeviceQuery.o )
Linking cuda/examples/src/deviceQueryDrv/DeviceQuery ...
root@7d05fbb8d8d5:/opt/accelerate-llvm# cuda/examples/src/deviceQueryDrv/DeviceQuery
DeviceQuery: Status.toEnum: Cannot match -1
CallStack (from HasCallStack):
error, called at src/Foreign/CUDA/Driver/Error.chs:372:22 in cuda-0.10.0.0-Lq313TS76CJ6ufZOzm0zPz:Foreign.CUDA.Driver.Error
from cuda.
Oh, I meant the one which ships from NVIDIA as part of the CUDA toolkit. It probably lives in /usr/local/cuda/samples/1_Utilities/deviceQueryDrv?
from cuda.
I can't find the sample you mentioned anywhere on the filesystem, or in https://github.com/NVIDIA/cuda-samples.git, but that repository does have deviceQuery, which runs but reports a failure:
root@7d05fbb8d8d5:/tmp/cuda-samples# bin/x86_64/linux/release/deviceQuery
bin/x86_64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL
from cuda.
I found a version in https://github.com/zchee/cuda-sample.git, and it compiles with some minor tweaking (commenting out line 202: GENCODE_FLAGS += -gencode arch=compute_20,code=compute_20), but its output is also a failure:
root@7d05fbb8d8d5:/tmp/cuda-sample/1_Utilities/deviceQueryDrv# ./deviceQueryDrv
./deviceQueryDrv Starting...
CUDA Device Query (Driver API) statically linked version
cuInit(0) returned -1
-> (null)
Result = FAIL
from cuda.
Okay, I think that CUDA is not installed correctly on this system. It looks like somebody installed a new version of the CUDA toolkit but did not update the device driver at the same time to match. Try reinstalling / updating the driver?
from cuda.
@tmcdonell I can confirm this. I managed to fix this by adding the NVIDIA driver PPA and upgrading my NVIDIA driver. I suspect what happened was that when I installed CUDA it installed its own driver, which apparently causes issues. See NVIDIA/nvidia-docker#802. The instructions for nvidia-docker AWS provisioning are not up to date, and I didn't realise this before installing nvidia-docker version 1, so I had to remove it and install version 2, which probably left some cruft on my system. Closing this now.
from cuda.
Actually, I only got it to work with the nvidia/cuda image, which is on CUDA version 9.0.176, but I still get the error on tmcdonell/accelerate-llvm, which is on CUDA version 9.2.148.
from cuda.
For clarity, on the host machine I have version 396.54 of the NVIDIA driver, and no cuda-toolkit installed (this is how nvidia-docker recommends the machine is set up). I suspect perhaps I just need to install the version of the NVIDIA driver that the cuda-toolkit in the tmcdonell/accelerate-llvm image expects, but I don't know how to ascertain that.
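One way to check this is to compare the installed driver version against the minimum that NVIDIA's CUDA Toolkit release notes list for the toolkit version in the image. The following is a minimal sketch, not something taken from the thread: the helper names (version_ge, host_driver_version, check_driver) are hypothetical, it assumes GNU `sort -V` is available, and the required version has to be looked up in the release notes by hand.

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= B.
# Relies on GNU `sort -V` (version sort) being available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# host_driver_version: print the installed driver version.
# Needs nvidia-smi on PATH; not called automatically by this sketch.
host_driver_version() {
    nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1
}

# check_driver INSTALLED REQUIRED: print a one-line verdict.
# REQUIRED is the minimum driver version taken from the release notes.
check_driver() {
    if version_ge "$1" "$2"; then
        echo "ok: driver $1 >= required $2"
    else
        echo "too old: driver $1 < required $2"
    fi
}
```

For example, `check_driver "$(host_driver_version)" 396.37` would compare the host driver against 396.37, which is the minimum I recall the release notes giving for CUDA 9.2.148 on Linux; treat that number as an assumption and verify it before relying on it.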
from cuda.
Funnily enough, it also works fine on nvidia/cuda:9.2-devel-ubuntu16.04. Perhaps this image was generated more recently than tmcdonell/accelerate-llvm, and is compatible with the NVIDIA 396.54 driver. I'll try rebuilding tmcdonell/accelerate-llvm and see if it works, but I wonder why this seems so unstable.
EDIT: This didn't seem to fix anything.
from cuda.
Hm, interesting. I haven't played with the docker images in a while. If you manage to fix it and could send a patch that would be awesome. Otherwise, I'll see about setting up an aws account and trying it out on a p3.2xlarge.
from cuda.
https://github.com/tmcdonell/accelerate-llvm/blob/6400e4fc20f2091c3a928eb5678a4e3f8166a4c5/Dockerfile#L14 seems to be the culprit; when this line is removed, I can compile and run deviceQueryDrv fine.
from cuda.
Unfortunately, even though that works now, when I try to run my accelerate program, I get the following error:
*** Warning: Unknown CUDA device compute capability: 7.0
*** Please submit a bug report at https://github.com/tmcdonell/cuda/issues
from cuda.
Ah, I guess that the missing libcuda.so.1 link has been fixed with the newer CUDA release (or is not needed anymore?)
Oh, I only recently added the device properties for compute 7.x to the cuda package, but it looks like I did not update the stack files for accelerate-llvm to point to it yet.
I pushed patches containing both these changes just now; the new image is building and should be done soon...
from cuda.
Hmm, looks like removing that line causes the Docker build to fail at the stage of compiling accelerate - what I did was I removed it in the existing image, and allowed the linker to find
libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f37643c3000)
in my compiled binaries.
I don't know where each of these shared libraries comes from, but they are different:
root@36d65cc14b9d:/opt/accelerate-llvm/kmeans# md5sum /usr/local/cuda/lib64/stubs/libcuda.so
1725f80d0ef5e44dc61c8d81f02da761 /usr/local/cuda/lib64/stubs/libcuda.so
root@36d65cc14b9d:/opt/accelerate-llvm/kmeans# md5sum /usr/lib/x86_64-linux-gnu/libcuda.so.1
0161af92fdca2cec1bd72b0ade604f05 /usr/lib/x86_64-linux-gnu/libcuda.so.1
Maybe the easy fix is to symlink to the second one instead?
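To see which of the two a given binary picks up, the `ldd` output can be filtered and the resulting path followed through any symlinks. This is a rough sketch under assumptions: the function names (libcuda_path, is_stub) are hypothetical, it relies on `readlink -f`, and it assumes the toolkit's stub library sits in a directory named "stubs", as in the md5sum output above.

```shell
#!/bin/sh
# libcuda_path: read `ldd` output on stdin and print the path that
# libcuda.so.1 resolves to. Prints nothing when it is "not found".
libcuda_path() {
    awk '$1 == "libcuda.so.1" && $2 == "=>" && $3 != "not" { print $3; exit }'
}

# is_stub PATH: succeed when PATH, after following symlinks, lives in a
# directory called "stubs" (assumed to hold the toolkit's link-time stub).
is_stub() {
    case "$(readlink -f "$1")" in
        */stubs/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A possible use would be `p="$(ldd ./DeviceQuery | libcuda_path)"; is_stub "$p" && echo "stub: $p"`, where `./DeviceQuery` stands for whichever compiled binary is being checked.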
from cuda.
I have confirmed that with the latest image, if I simply rm /usr/local/cuda/lib64/libcuda.so.1, the linker manages to find libcuda.so.1 as above and my accelerate programs run correctly.
from cuda.
It is strange that building accelerate-llvm-ptx does not automatically find the library at /usr/lib/x86_64-linux-gnu/libcuda.so.1 like it does when running the programs. Maybe you are right, and the correct solution is to instead create the symlink to that point, rather than the one in /usr/local/cuda... as I did.
from cuda.
> @tmcdonell I can confirm this. I managed to fix this by adding the nvidia driver ppa, and upgrading my nvidia driver.
@NickHu to clarify, did you install the nvidia-396 package inside the docker image, or on your host machine?
from cuda.