Comments (11)
I have just updated the gist (file unifiedarray.jl); I didn't realize that was still the old version, sorry.
from cudart.jl.
@cdsousa Done, see https://gist.github.com/barche/9cc583ad85dd2d02782642af04f44dd7#file-add_cuda-jl
To confirm, I tried on my GTX 1060 and it still worked.
FYI: CUDArt is somewhat unmaintained; @timholy, @vchuravy and I occasionally check in small compatibility fixes and tag new releases, but my own time is spent on CUDAdrv...
That said...
Where does this difference in performance come from, and is it possible to keep the array abstraction and have it perform as well as the pointer version?
In your first example, you pass a literal pointer (to global memory) to the kernel. This pointer itself is a bitstype (primitive type in modern nomenclature), which means it is passed by value and resides in parameter space, a constant memory that doesn't require synchronization for thread accesses. Dereferencing the pointer, however, does need synchronization, as it points to global memory.
Your second example passes a UnifiedArray, which is not a primitive type but an aggregate type (LLVM nomenclature), which Julia passes by pointer. This means there is an extra indirection: first to dereference the pointer to the UnifiedArray, then to access the underlying pointer.
However, this should have been fixed recently in JuliaGPU/CUDAnative.jl#78, so I presume you were using an older version of CUDAnative?
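A CPU-side C analogy may help picture the extra indirection described above. It is only a sketch: the struct name UnifiedArray mirrors the thread, but its fields (ptr, len) and the functions read_by_pointer, read_by_wrapper and check_access_paths are hypothetical, and GPU specifics (parameter space, how Julia/LLVM lower the kernel arguments) are not modeled. The first function receives the data pointer by value, so reading element i takes one load; the second receives the aggregate by pointer, so it must first load the wrapper's ptr field before loading the element.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the thread's UnifiedArray: a wrapper around a data pointer. */
typedef struct {
    float *ptr;   /* points at the elements ("global memory" in the GPU case) */
    size_t len;   /* number of elements */
} UnifiedArray;

/* Pointer passed by value: the callee already holds the data address,
   so reading element i is a single load through it. */
float read_by_pointer(const float *data, size_t i)
{
    return data[i];
}

/* Aggregate passed by pointer: the callee first loads a->ptr from the
   wrapper (the extra indirection), then loads the element through it. */
float read_by_wrapper(const UnifiedArray *a, size_t i)
{
    const float *data = a->ptr;  /* first load: fetch the underlying pointer */
    return data[i];              /* second load: fetch the element itself */
}

/* Self-check: both access paths must observe the same elements. */
void check_access_paths(void)
{
    float xs[3] = {1.0f, 2.5f, -4.0f};
    UnifiedArray a = { xs, 3 };
    for (size_t i = 0; i < a.len; ++i)
        assert(read_by_pointer(xs, i) == read_by_wrapper(&a, i));
}
```

Both paths return the same value; the only difference is the additional load of the wrapper, which is what costs time when that wrapper sits in slower memory on every access.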
Are there any plans to add an array based on the Unified Memory model?
No, because until very recently there was no benefit except programmability, which was already pretty seamless thanks to automatic conversion at the @cuda boundary. However, recent GPUs use page faulting plus "speculative" execution to avoid transferring memory right away, so it might now be beneficial. I won't be spending time on it though (priorities...); maybe you will?
Are there any plans to wrap the CUDA 8 functions, such as cudaMemPrefetchAsync?
No, although I don't think it would be much work, at least for CUDAdrv (are you using CUDArt for a specific reason?).
Ah, yes, it was because of an old version; both examples run at the same speed now. I was using the runtime API because that's what the NVIDIA beginner's tutorial proposed, but I have now converted it to the driver API, which was in fact very easy using @apicall.
I'm not sure I'm the one to implement a new GPU array type at this point, considering I'm still taking baby steps with CUDA here :)
@barche, can you share the version using CUDAdrv.jl, please?
Thanks @barche, but I assumed you had converted the "Unified Memory example" to CUDAdrv...
I have been trying it myself, but I'm getting errors when calling cuMemPrefetchAsync from the driver API using @apicall. That was what I hoped you had already solved.
Doesn't it work by just changing the :cuda calls to their :cu counterparts, using @apicall instead of CUDArt.rt.checkerror(ccall(...))? Also try using cuda-memcheck for possibly better error messages.
If you can't get it to work, I can have a quick look. I won't have time for designing proper abstractions anytime soon though, but at least some working code would be a good first step.
Ah, thank you very much @barche, that's exactly what I was looking for.
Unfortunately, it is more or less what I was trying, and it throws the same error (ERROR_INVALID_VALUE) on the cuMemPrefetchAsync call.
I will follow @maleadt's suggestion and try to understand what's going on. Maybe there is something special about the platform I'm experimenting with, a Jetson TX1. I had already used unified memory successfully, but that was with C++ on a Jetson TX2 using the runtime API...
I'll post additional questions on Discourse.
And if I have time, I would like to develop further abstractions for unified memory and propose them for CUDAdrv.jl.
OK, it is probably because "Maxwell architectures [..] support a more limited form of Unified Memory".
I'll try on a Jetson TX2.
Thank you both.
I'll test tonight on my machine at home to confirm it still works, it has been a while since I tried this.