Comments (3)
Further digging shows that at least part of this is user error: I was compiling with --disable-accelerator-cshift, which unsurprisingly pushes the cshift done within the staples onto the host, meaning a lot of data transfer. I still see 35% of CUDA time in memory operations, with 65% in kernels; I will dig further to try and see where that remaining time is.
from grid.
Some more notes: After the CG, there are two distinct phases visible in the profile (which can be lined up with the log file):
- The force calculation on the fermion field. This shows a lot of device->host copying (51%, with 20% host->device and only 29% in kernels), which appears to be within Grid::SchurDifferentiableOperator. This is particularly long for the RHMC, but is also present for the non-rational plain HMC case.
- The force calculations on the gauge field for successive momentum updates. These do run on the GPU, but do not come close to fully occupying it. In the fundamental RHMC case there is almost no communication between the host and device here (13%; 87% compute), but for the adjoint HMC case the communication is still substantial (38%; 62% compute), albeit not as high as in the force calculation. I need to check how this behaves for adjoint RHMC; I suspect it will be similar to adjoint HMC. Removing this bottleneck may not speed things up by much, however, as the fundamental RHMC case doesn't saturate the GPU much more than the adjoint HMC case does.
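As a rough sanity check on the percentages above, here is a minimal sketch of the best-case gain from eliminating the copies in each phase. It assumes that copies and kernels never overlap, that the quoted percentages are fractions of the same total CUDA time for that phase, and that the copies could be removed entirely; none of this is confirmed by the profile. The function name is hypothetical and not part of Grid.

```cpp
#include <cassert>

// Upper bound on the speedup of the GPU-busy time of one phase if every
// host<->device copy were removed.
// ASSUMPTIONS (not taken from the profile): copies and kernels are fully
// serialised, and kernel_fraction is the kernel share of total CUDA time.
double max_speedup_without_copies(double kernel_fraction)
{
    assert(kernel_fraction > 0.0 && kernel_fraction <= 1.0);
    // With the copies gone, only the kernel share of the time remains.
    return 1.0 / kernel_fraction;
}
```

Under those assumptions, the fermion force (29% in kernels) could gain at most about 3.4x, the adjoint HMC gauge force (62%) about 1.6x, and the fundamental RHMC gauge force (87%) about 1.15x. These bounds apply to GPU-busy time only; since the GPU is not saturated in the second phase, the wall-clock gain there could be smaller still.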
Still to do:
- Dig more to see if I've missed another configure option that will avoid point 1 above
- Test the adjoint RHMC
- Try and see why the GPU isn't saturated in point 2 above, possibly using ncu.
Dig more to see if I've missed another configure option that will avoid point 1 above
Looking more closely, the wait time appears to be in calls to setCheckerboard from MpcDeriv and MpcDagDeriv. There is an alternative function, acceleratorSetCheckerboard, defined in Lattice_transfer.h along with setCheckerboard, but git grep indicates it is never used anywhere in the repository. Could this be used instead here, with a configure parameter similar to --enable-accelerator-cshift mentioned above? Or is the function not working, or unsuitable for other reasons?
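For readers unfamiliar with the operation, the following is a minimal, self-contained sketch of what setting a checkerboard means conceptually: scattering a half-lattice field (even or odd sites only) back into a full lattice. It is not Grid's implementation. The 2D layout, the double-valued sites, and the names site_parity and set_checkerboard_sketch are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Parity of a site on a hypothetical 2D lattice: 0 = even, 1 = odd.
inline int site_parity(std::size_t x, std::size_t y)
{
    return static_cast<int>((x + y) % 2);
}

// Copy a half-lattice field (one value per site of parity cb, stored in
// row-major order of those sites) into the matching sites of the full lattice.
// Sites of the other parity are left untouched.
void set_checkerboard_sketch(std::vector<double>& full,
                             const std::vector<double>& half,
                             std::size_t nx, std::size_t ny, int cb)
{
    assert(full.size() == nx * ny);
    assert(half.size() * 2 == full.size());
    std::size_t h = 0;  // running index into the half-lattice field
    for (std::size_t y = 0; y < ny; ++y) {
        for (std::size_t x = 0; x < nx; ++x) {
            if (site_parity(x, y) == cb) {
                full[y * nx + x] = half[h++];
            }
        }
    }
    assert(h == half.size());  // every half-lattice value was placed
}
```

Each destination site is written independently of the others, so in principle the loop body maps onto one device thread per site (with the half-lattice index computed from the coordinates rather than a running counter). If the operation runs on the host instead, the fields have to be brought to the host and back, which would be consistent with the copies seen in the profile.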
Test the adjoint RHMC
As expected this behaves the same in adjoint RHMC as in adjoint HMC.
Try and see why the GPU isn't saturated in point 2 above, possibly using ncu.
Still to do: on the first attempt, my laptop couldn't open the ncu output as it's too big, so I need to work out how to filter it.