Comments (5)
yes... it is incorrect at present. Thanks.
Hasn't been tested for some time, and fails under OpenSHMEM, and you may have saved me a lot of time debugging this.
Also -- perhaps you can help me. I'm really annoyed that the OpenSHMEM and CraySHMEM argument ordering for shmem_align is reversed. Any comments?
Peter
from grid.
I'm really annoyed that the OpenSHMEM and CraySHMEM argument ordering for shmem_align is reversed.
It was a bug, and it is fixed now. The fix will be available from CraySHMEM/7.5.1.
This bug went unnoticed because shmem_align is one of the least used routines in CraySHMEM, and there are some fundamental functional differences between OpenSHMEM and CraySHMEM in this routine. By default, the maximum alignment value CraySHMEM allows in shmem_align is 64 bytes. We prefer that users not self-align to anything more than 64 bytes, and attempting a larger value errors out. This is because supporting alignment values greater than 64 bytes would waste too much memory.
That said, if there are any actual use cases which show some performance benefit for alignments greater than 64 bytes, we can always look for ways to implement it. Let me know your shmem_align usage and I can look at it.
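To make the memory-wastage concern concrete, here is a minimal sketch. It assumes the waste comes from rounding each allocation up to a multiple of the requested alignment; the thread does not describe how the Cray SHMEM allocator works internally, so treat this as an illustration only. `round_up` and `padding_bytes` are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of alignment.
   Assumes alignment is a non-zero power of two. */
static size_t round_up(size_t size, size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Bytes lost to padding if an allocation of `size` bytes is
   rounded up to a multiple of `alignment` (illustrative model only). */
static size_t padding_bytes(size_t size, size_t alignment) {
    return round_up(size, alignment) - size;
}
```

Under this model the worst-case padding per allocation is `alignment - 1` bytes, so many small allocations with a large alignment pad far more than they store.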
The alignment is normally the L2 line size, which on Intel is 64 bytes, but I would prefer to support 128B on Power, for example.
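A rough sketch of what a line-size-aligned allocation could look like, assuming a 128-byte line on Power and 64 bytes elsewhere. The build flag `GRID_EXAMPLE_USE_SHMEM`, the macro `EXAMPLE_LINE_BYTES`, and both function names are hypothetical and not taken from Grid. The `shmem_align` call uses the OpenSHMEM argument order (alignment first, then size); without the flag, the code falls back to C11 `aligned_alloc` so it can be tried without a SHMEM installation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#ifdef GRID_EXAMPLE_USE_SHMEM   /* hypothetical build flag */
#include <shmem.h>
#endif

/* Assumed line size: 128 bytes on 64-bit Power, 64 bytes otherwise. */
#if defined(__powerpc64__) || defined(__PPC64__)
#define EXAMPLE_LINE_BYTES 128
#else
#define EXAMPLE_LINE_BYTES 64
#endif

/* Allocate `bytes` rounded up to whole lines, aligned to one line. */
static void *example_line_aligned_alloc(size_t bytes) {
    size_t align = EXAMPLE_LINE_BYTES;
    size_t padded = (bytes + align - 1) / align * align;
    if (padded == 0) padded = align;   /* avoid a zero-size request */
#ifdef GRID_EXAMPLE_USE_SHMEM
    /* OpenSHMEM order: alignment first, then size. */
    return shmem_align(align, padded);
#else
    /* Stand-in for local testing: C11 aligned_alloc(alignment, size). */
    return aligned_alloc(align, padded);
#endif
}

/* Return 1 if pointer p is a multiple of align, 0 otherwise. */
static int example_is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p % align) == 0;
}
```

Memory from the `shmem_align` path would be released with `shmem_free`, and memory from the fallback path with `free`.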
I see no need to go to a page size for alignment, though, despite prefetches not crossing page boundaries. Rather, page sizes and cache line sizes should both go up after around 45 years...
I honestly think Intel should move to a larger L2 line size, as L2 prefetch overhead gets suppressed by the line size. Issuing an L2 prefetch, an L1 prefetch and a load for each individual 512-bit vector is now ridiculous.
Clearly if the vectors went to 1024 bits, they and you would need >128B align, but... surely the above argument about line touching is true.
Or to put this in more "sexy" Hennessy and Patterson language, there is no way to obtain gain from spatial locality of reference in the memory system when the vector size is equal to the cache line size. :)
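The line-touching argument can be checked with simple arithmetic. The sketch below counts how many whole vectors fit in one cache line; `vectors_per_line` is a hypothetical helper written for this illustration, not code from Grid.

```c
#include <assert.h>

/* Number of whole vectors that fit in one cache line.
   Returns 0 when a single vector is larger than the line. */
static unsigned vectors_per_line(unsigned line_bytes, unsigned vector_bits) {
    assert(vector_bits > 0 && vector_bits % 8 == 0);
    unsigned vector_bytes = vector_bits / 8;
    return line_bytes / vector_bytes;
}
```

With a 64-byte line and 512-bit vectors the result is 1, so every vector load touches a new line and no second load reuses a line already fetched. A 128-byte line would give 2 vectors per line.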
That said, if there are any actual use cases which show some performance benefit for alignments greater than 64 bytes, we can always look for ways to implement it. Let me know your shmem_align usage and I can look at it.
FYI, shmem_align() in Cray SHMEM is fixed. We can use any value for alignment size.
Just a small comment -- Chris Kelly cleaned up the SHMEM comms recently and made it all work again. Closing, thanks.