
Comments (3)

edbennett commented on July 19, 2024

Further digging shows that at least part of this is user error: I was compiling with --disable-accelerator-cshift, which unsurprisingly pushes the cshift done within the staples onto the host, causing a lot of data transfer. I still see 35% of CUDA time in memory operations, with 65% in kernels; I will dig further to find out where that remaining time goes.

from grid.

edbennett commented on July 19, 2024

Some more notes: After the CG, there are two distinct phases visible in the profile (which can be lined up with the log file):

  1. The force calculation on the fermion field. This shows a lot of device->host copying (51% of the time, with 20% in host->device copies and only 29% in kernels), which appears to occur within Grid::SchurDifferentiableOperator. This phase is particularly long for the RHMC, but is also present in the plain (non-rational) HMC case.
  2. The force calculations on the gauge field for successive momentum updates. These do run on the GPU, but do not come close to fully occupying it. In the fundamental RHMC case there is almost no communication between host and device here (13%, against 87% compute), but in the adjoint HMC case the communication is still substantial (38%, against 62% compute), albeit lower than in the fermion force calculation. I need to check how this behaves for adjoint RHMC; I suspect it will be similar to adjoint HMC. Removing this bottleneck may not speed things up by much, however, as the fundamental RHMC case doesn't saturate the GPU much more than the adjoint HMC case does.
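For reference, splits like the ones quoted above can be derived by summing the time a profile attributes to each category and dividing by the total. The following is only a minimal sketch of that arithmetic: the struct and function names are hypothetical, and it does not read any real profiler output.

```cpp
#include <cassert>

// Hypothetical container: time summed per category from a profile, in seconds.
struct CudaTimes {
    double deviceToHost;
    double hostToDevice;
    double kernels;
};

// The same three categories expressed as percentages of their sum.
struct CudaShares {
    double deviceToHost;
    double hostToDevice;
    double kernels;
};

// Convert summed times into percentage shares of the combined total.
CudaShares toShares(const CudaTimes& t) {
    const double total = t.deviceToHost + t.hostToDevice + t.kernels;
    assert(total > 0.0); // an empty profile has no meaningful split
    return {100.0 * t.deviceToHost / total,
            100.0 * t.hostToDevice / total,
            100.0 * t.kernels / total};
}
```

As an illustration with made-up timings, `toShares({5.1, 2.0, 2.9})` gives roughly the 51% / 20% / 29% split described in point 1; the timings themselves are invented for the example and are not measurements.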

Still to do:

  • Dig more to see if I've missed another configure option that will avoid point 1 above
  • Test the adjoint RHMC
  • Try and see why the GPU isn't saturated in point 2 above, possibly using ncu.


edbennett commented on July 19, 2024

On "Dig more to see if I've missed another configure option that will avoid point 1 above":

Looking more closely, the wait time appears to be in calls to setCheckerboard from MpcDeriv and MpcDagDeriv. There is an alternative function, acceleratorSetCheckerboard, defined in Lattice_transfer.h alongside setCheckerboard, but git grep indicates it is never used anywhere in the repository. Could it be used here instead, with a configure parameter similar to --enable-accelerator-cshift (the counterpart of the --disable-accelerator-cshift flag mentioned above)? Or is the function not working, or unsuitable for other reasons?
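To make the operation in question concrete: setting a checkerboard amounts to scattering a half-lattice (single-parity) field into the matching sites of a full-lattice field. The sketch below is a simplified, self-contained illustration on a 2D lattice and is not Grid's implementation; the function names and the storage order are assumptions made for the example, and Grid's actual data layout is different. It shows only that each site is written independently, which is why a device-side version seems plausible in principle.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parity of a site on a 2D lattice: 0 = even, 1 = odd.
inline int siteParity(std::size_t x, std::size_t y) {
    return static_cast<int>((x + y) % 2);
}

// Scatter a half-lattice field into the sites of `full` whose parity is `cb`.
// Assumption (illustrative only): the half field stores its sites in the order
// they are met when walking the full lattice row by row, x running fastest.
void scatterCheckerboard(std::vector<double>& full,
                         const std::vector<double>& half,
                         std::size_t nx, std::size_t ny, int cb) {
    assert(cb == 0 || cb == 1);
    assert(full.size() == nx * ny);
    assert(half.size() * 2 == full.size()); // requires an even number of sites

    std::size_t h = 0; // running index into the half field
    for (std::size_t y = 0; y < ny; ++y) {
        for (std::size_t x = 0; x < nx; ++x) {
            if (siteParity(x, y) == cb) {
                full[y * nx + x] = half[h++]; // sites of other parity untouched
            }
        }
    }
}
```

If a loop of this kind runs on the host while the fields live on the device, the data has to cross the bus in both directions, which would be consistent with the copy-dominated profile described above; whether that is what actually happens inside Grid here is exactly the open question.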

On "Test the adjoint RHMC":

As expected, adjoint RHMC behaves the same as adjoint HMC here.

On "Try and see why the GPU isn't saturated in point 2 above, possibly using ncu":

Still to do: on the first attempt, my laptop couldn't open the ncu output because it is too large, so I need to work out how to filter it.

