Comments (10)
Nicely documented! Made it very easy to see what you're doing and what you are concerned about.
I am not certain what's going on, but here's my hunch: is it possible that CUBLAS.gemm!
is asynchronous? Meaning, it returns immediately but the GPU is not yet done with the computation? And then your copy-back-to-CPU instruction blocks until it's done---therefore the copy-back is being blamed for time that's actually spent computing the product.
Also, a couple of possible efficiency improvements:
d_A = CudaArray(A)
already copies the data, no need to callcopy!
thereafter. See https://github.com/JuliaGPU/CUDArt.jl/blob/c7954d7a931fcf48e017a943e4ce4cd13ae6cc48/src/arrays.jl#L103.- Do you need to initialize
d_C
? If so, can you just callfill!(d_C, 0)
instead?
from cudart.jl.
A wild guess, since I literally started looking at CUDA+Julia about an hour ago, and this is a hunch from memory of old times spent in CUDA+C.
Typically, what you observe is exactly what one observes when not synchronizing device and host before and after timing. Without synchronizing, a lot of CUDA methods just dispatch the work and return almost immediately, the only ones that block and wait for a result are, well, those getting the result back. So it may very well be that your timing of d_C->C
actually times transfers to, computations on, and transfer back from the device all together.
Since I just started looking at Julia+CUDA, I don't know for sure if that's what's happening here, but the symptoms totally look like it.
from cudart.jl.
haha 1s @timholy
from cudart.jl.
Man, our timing was down to the second.
from cudart.jl.
Oh, you beat me on the second one.
from cudart.jl.
Though none of us has suggested a function to call to manually sync the device/stream yet. 3..2..1..
from cudart.jl.
You probably have to change your timing lines from @time copy!(d_B,B)
into something like
device_synchronize() ; @time (copy!(d_B,B) ; device_synchronize())
Edit: Just confirmed that this is indeed the case.
PS: I would be grateful of a copy of the full notebook (maybe as a gist) if possible, it looks like a good starting point for experimentation!
from cudart.jl.
Thanks @timholy and @lucasb-eyer for your well-timed responses :) I'll try out the tests you suggest to see if it is really just that GPU is dispatching and returning on the BLAS call and then blocking the copy. That certainly makes the most sense to me.
@lucasb-eyer : Here is a link to the notebook.
from cudart.jl.
Cheers!
from cudart.jl.
Oh, you're looking to use cuDNN? Me too, we should share experiences somehow. Send me an e-mail if you're interested.
from cudart.jl.
Related Issues (20)
- Tests fail on Windows with 0.6 HOT 1
- Info about upcoming removal of packages in the General registry
- Support for ptx modules with external functions HOT 2
- Does CUDArt support cuda 8.0? HOT 1
- triggering gc based on gpu memory
- CUDArt assumptions not robust
- Precompile Error HOT 1
- Intermittent GC-related test failure (`isempty(cuda_ptrs)`) HOT 2
- New tag HOT 2
- Updated build script for visual studio 17 but get compile errors HOT 2
- error could not load library "libnvidia-ml" HOT 4
- Makefile needs to select correct gcc compiler HOT 1
- Rename types to CuArray, CuMatrix and so forth for consistency with CUDAdrv?
- CUDArt should not rely on `nvidia-smi` or `nvml` on Mac OSX HOT 31
- CUDArt fails to build when no CUDA device is present
- gcc5.4.0 support HOT 1
- OOB during package build HOT 7
- No method matching reset(::Cudadrv.CuPrimaryContext) HOT 3
- GCC Version On CUDA 8.0 HOT 3
- Unified Memory support HOT 11
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cudart.jl.