Comments (8)
I can't replicate this: I ran test1() 120 times in a row without a single error. It also doesn't really make sense to me, because all the finalizers should run at the end of test1() --- when the device's context closes, it runs device_reset (https://github.com/JuliaGPU/CUDArt.jl/blob/35e9e4922eba7ed4bd3154b520e6716f345195c1/src/device.jl#L90-L97). So that should be the equivalent of running gc().
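For reference, this is the pattern I assume test1 follows (a sketch only; the array shape and devlist handling are made up, since I haven't seen the full test script):

using CUDArt

function test1()
    devices(dev->true, nmax=1) do devlist
        device(devlist[1])                      # make the device active
        d = CudaArray(rand(Float32, 801, 802))  # allocation tracked in CUDArt.cuda_ptrs
        h = to_host(d)                          # device-to-host copy
    end
    # leaving the do-block closes the context via device_reset,
    # which should also finalize any remaining device pointers
end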
julia> versioninfo()
Julia Version 0.4.0-dev+2954
Commit 8e1e310* (2015-01-28 14:07 UTC)
Platform Info:
System: Linux (x86_64-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
Things to try:
- examine the contents of CUDArt.cuda_ptrs between runs (see the sketch after this list)
- set debugMemory = true (it's been long enough that I don't exactly remember what this does, but I remember it being useful): https://github.com/JuliaGPU/CUDArt.jl/blob/35e9e4922eba7ed4bd3154b520e6716f345195c1/src/arrays.jl#L15
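For the first check, something like this (test1 is your function; the dict should be empty after each successful run):

for i in 1:10
    test1()
    @show CUDArt.cuda_ptrs   # expected: no entries left after every iteration
end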
I find the error reliably reproducible, but it disappears when I use copy! instead of to_host. Before the failure the state looks like this (I printed cuda_ptrs and the copy parameters inside cudacopy!):
Finished iteration 2.
CUDArt.cuda_ptrs = Dict{Any,Int64}(Ptr{Void} @0x0000004200100000=>0,Ptr{Void} @0x0000004200c00000=>0,Ptr{Void} @0x0000004200940000=>0,Ptr{Void} @0x00000042003c0000=>0,Ptr{Void} @0x0000004201180000=>0,Ptr{Void} @0x0000004201440000=>0,Ptr{Void} @0x0000004201700000=>0,Ptr{Void} @0x00000042019c0000=>0,Ptr{Void} @0x0000004200680000=>0,Ptr{Void} @0x0000004200ec0000=>0)
params = [rt.cudaMemcpy3DParms(C_NULL,srcpos,pitchedptr(src),C_NULL,dstpos,pitchedptr(dst),ext,cudamemcpykind(dst,src))] = [CUDArt.CUDArt_gen.cudaMemcpy3DParms(Ptr{Void} @0x0000000000000000,CUDArt.CUDArt_gen.cudaPos(0x0000000000000000,0x0000000000000000,0x0000000000000000),CUDArt.CUDArt_gen.cudaPitchedPtr(Ptr{Void} @0x0000004200100000,0x0000000000000e00,0x0000000000000321,0x0000000000000322),Ptr{Void} @0x0000000000000000,CUDArt.CUDArt_gen.cudaPos(0x0000000000000000,0x0000000000000000,0x0000000000000000),CUDArt.CUDArt_gen.cudaPitchedPtr(Ptr{Void} @0x000000000d4a8ff0,0x0000000000000c84,0x0000000000000321,0x0000000000000322),CUDArt.CUDArt_gen.cudaExtent(0x0000000000000c84,0x0000000000000322,0x0000000000000001),0x00000002)]
WARNING: CUDA error triggered from:
in checkerror at /***/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:15
in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:100
in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:288
in copy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:282
in to_host at /***/.julia/v0.4/CUDArt/src/arrays.jl:87
in anonymous at /***/test.jl:53
in devices at /***/.julia/v0.4/CUDArt/src/device.jl:67
in devices at /***/.julia/v0.4/CUDArt/src/device.jl:59
in test1 at /***/test.jl:7
ERROR: "invalid argument"
in checkerror at /***/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:16
in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:100
in cudacopy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:288
in copy! at /***/.julia/v0.4/CUDArt/src/arrays.jl:282
in to_host at /***/.julia/v0.4/CUDArt/src/arrays.jl:87
in anonymous at /***/test.jl:53
in devices at /***/.julia/v0.4/CUDArt/src/device.jl:67
in devices at /***/.julia/v0.4/CUDArt/src/device.jl:59
in test1 at /***/test.jl:7
If I surround to_host with gc_disable()/gc_enable(), there is also no crash, so this probably isn't related to finalizers.
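That workaround looks like this (d stands for the device array in the test):

gc_disable()     # keep finalizers from firing during the copy
h = to_host(d)   # the device-to-host copy that otherwise fails intermittently
gc_enable()      # restore normal garbage collection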
Very strange. I get this:
julia> CUDArt.cuda_ptrs
Dict{Any,Int64} with 0 entries
That is what I'd expect, because the cleanup code should remove all the entries from the dict. Why isn't that happening in your case?
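Roughly, the expected bookkeeping is (a paraphrase for illustration, not the actual CUDArt source):

# hypothetical sketch: each allocation registers its pointer in cuda_ptrs,
# and the free path (run by the finalizer) untracks it again
function free_tracked(p)
    if haskey(CUDArt.cuda_ptrs, p)
        delete!(CUDArt.cuda_ptrs, p)   # untrack the pointer...
        CUDArt.rt.cudaFree(p)          # ...then release the device memory
    end
end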
The message gets printed before the error in to_host, so that's cuda_ptrs just before the crash.
Ah, I misunderstood when you were running it. That's informative too (but also worth checking: you should have an empty dict right after you successfully complete test1()).
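Something like this, immediately after a clean run:

julia> test1()

julia> isempty(CUDArt.cuda_ptrs)   # expected if all finalizers ran
true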
I can't see anything that looks wrong with that output. I'm pretty baffled overall. Does Pkg.test("CUDArt") pass for you? What does nvcc --version say? How about the deviceQuery test?
julia> Pkg.test("CUDArt")
INFO: Testing CUDArt
juliarc = "/***/.juliarc.jl"
INFO: CUDArt tests passed
INFO: No packages to install, update or remove
~ $ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Thu_Mar_13_11:58:58_PDT_2014
Cuda compilation tools, release 6.0, V6.0.1
~ $ ./deviceQuery
Detected 4 CUDA Capable device(s)
Device 0: "Tesla M2090"
CUDA Driver Version / Runtime Version 6.0 / 6.0
CUDA Capability Major/Minor version number: 2.0
<snip...>
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 4, Device0 = Tesla M2090, Device1 = Tesla M2090, Device2 = Tesla M2090, Device3 = Tesla M2090
Result = PASS
I should point out that I don't have direct control over the machine here, it's more like a departmental server that I can use.
I notice the runtime version is 6.0, while the wrappers were generated for 6.5. This is probably not the reason, because the relevant parts of the API should be the same.
At first I thought it might be a pointer-alignment issue (the host pointer is not divisible by 128 = 0x80), but the other host pointers are also unaligned and do not cause errors.
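For the record, the check on the host pointer from the params dump above:

julia> Int(0x000000000d4a8ff0 % 0x80)   # remainder mod 128; nonzero means unaligned
112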
In that example, the device is still initialized (i.e. cudaDeviceReset was not called, which is a real problem):
function isInitialized(dev)
    device(dev)                          # make dev the active device
    try
        # cudaSetDeviceFlags errors if the device already has a live context
        CUDArt.rt.cudaSetDeviceFlags(0)
        return false
    catch ex
        @show ex
        return true
    end
end
julia> Test.isInitialized(0)
WARNING: CUDA error triggered from:
in checkerror at /***/.julia/v0.4/CUDArt/src/libcudart-6.5.jl:15
in isInitialized at /***/test.jl:8
ex = "cannot set while device is active in this process"
true