Comments (10)

timholy commented on July 20, 2024

Nicely documented! Made it very easy to see what you're doing and what you are concerned about.

I am not certain what's going on, but here's my hunch: is it possible that CUBLAS.gemm! is asynchronous? Meaning, it returns immediately but the GPU is not yet done with the computation? Your copy-back-to-CPU instruction would then block until the computation is done, so the copy-back is being blamed for time that's actually spent computing the product.

Also, a couple of possible efficiency improvements:

from cudart.jl.

lucasb-eyer commented on July 20, 2024

A wild guess, since I literally started looking at CUDA+Julia about an hour ago, and this is a hunch from memory of old times spent in CUDA+C.

What you observe is exactly what one typically sees when the device and host are not synchronized before and after timing. Without synchronization, a lot of CUDA methods just dispatch the work and return almost immediately. The only ones that block and wait for a result are, well, those getting the result back. So it may very well be that your timing of d_C->C actually times the transfers to the device, the computations on it, and the transfer back, all together.
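The misattribution described above can be reproduced without a GPU. The following is only a sketch in plain Julia: `launch_async`, `blocking_copy`, and `naive_timings` are hypothetical stand-ins (not part of CUDArt or CUBLAS), with a background task playing the role of the device and `wait` playing the role of a blocking copy-back.

```julia
# Hypothetical stand-in for an asynchronous GPU call:
# it schedules the work on a background task and returns at once.
launch_async(seconds) = @async sleep(seconds)

# Hypothetical stand-in for a blocking call such as a device-to-host copy:
# it cannot finish until the previously launched work has finished.
blocking_copy(task) = wait(task)

# Time the "compute" call and the "copy" call separately, without any
# synchronization in between, the way a naive benchmark would.
function naive_timings(work_seconds)
    t0 = time()
    task = launch_async(work_seconds)   # returns almost immediately
    t_compute = time() - t0

    t1 = time()
    blocking_copy(task)                 # absorbs the outstanding work
    t_copy = time() - t1

    return (compute = t_compute, copy = t_copy)
end

r = naive_timings(0.2)
println("apparent compute time: ", r.compute, " s")
println("apparent copy time:    ", r.copy, " s")
```

In this simulation nearly all of the elapsed time shows up under the "copy" label, even though the copy itself does no work, which matches the symptom reported in the notebook.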

Since I just started looking at Julia+CUDA, I don't know for sure if that's what's happening here, but the symptoms totally look like it.

lucasb-eyer commented on July 20, 2024

haha 1s @timholy 😄

timholy commented on July 20, 2024

Man, our timing was down to the second.

timholy commented on July 20, 2024

Oh, you beat me on the second one.

lucasb-eyer commented on July 20, 2024

Though none of us has suggested a function to call to manually sync the device/stream yet. 3..2..1..

lucasb-eyer commented on July 20, 2024

You probably have to change your timing lines from @time copy!(d_B,B) into something like

device_synchronize() ; @time (copy!(d_B,B) ; device_synchronize())

Edit: Just confirmed that this is indeed the case.
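The synchronize-before, synchronize-inside-the-timer pattern from the line above can be wrapped in a small helper. This is a sketch only: `timed_sync` is a hypothetical name, not something CUDArt provides, and the synchronization function is passed in as an argument so the helper itself does not depend on any GPU package.

```julia
# Hypothetical helper: time `work` so that only its own cost is measured.
# `sync` is whatever call blocks until the device is idle
# (device_synchronize in the snippet above).
function timed_sync(work, sync)
    sync()              # drain anything that was launched earlier
    t0 = time()
    work()              # the operation being measured
    sync()              # wait until that operation has really finished
    return time() - t0
end

# Intended use with the names from this thread (requires a CUDA setup,
# so it is left as a comment here):
#   t = timed_sync(() -> copy!(d_B, B), device_synchronize)
```

The first `sync()` keeps earlier asynchronous work from leaking into the measurement, and the second one keeps the measured operation from leaking into whatever is timed next.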

PS: I would be grateful for a copy of the full notebook (maybe as a gist) if possible; it looks like a good starting point for experimentation!

eric-tramel commented on July 20, 2024

Thanks @timholy and @lucasb-eyer for your well-timed responses :) I'll try out the tests you suggest to see if it is really just that the GPU is dispatching and returning on the BLAS call and then blocking on the copy. That certainly makes the most sense to me.

@lucasb-eyer : Here is a link to the notebook.

lucasb-eyer commented on July 20, 2024

Cheers!

lucasb-eyer commented on July 20, 2024

Oh, you're looking to use cuDNN? Me too, we should share experiences somehow. Send me an e-mail if you're interested.
