Comments (3)
That problem goes away with smaller local volumes but I think the default choice for this buffer is a bit too small.
You should be able to increase it with the
--shm 512
flag as indicated in the message:
ShmBufferMalloc exceeded shared heap size -- try increasing with ** --shm MB ** flag
Parameter specified in units of MB (megabytes)
Current value is 128
Agreed, the default of 128 is a bit small; I calculate about 400MB for 32^4 with a back-of-the-envelope estimate,
which is prone to error.
I'm afraid this ugliness is forced on us by the discovery that both Cray and OPA interconnects
give more bandwidth when using two ranks per node, but run intra-node MPI fairly poorly.
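For the 32^4 figure, here is a minimal sketch of how an estimate of that order could come out. It is not the calculation used in the comment above, which is not shown in the thread. Everything in it is an assumption chosen for illustration: the function name `halo_buffer_bytes`, a fifth dimension of Ls = 16, half spinors of 6 complex numbers per site, double precision, and one buffer for each of the eight faces of the local volume.

```python
def halo_buffer_bytes(local_dims, ls, complex_per_site, bytes_per_real):
    """Bytes needed to hold one copy of every face of a 4D local volume.

    Hypothetical helper for illustration only; it is not part of Grid.
    """
    volume = 1
    for d in local_dims:
        volume *= d

    # One complex number is two reals.
    site_bytes = complex_per_site * 2 * bytes_per_real

    total = 0
    for d in local_dims:
        face_sites = volume // d                   # sites on one face normal to this direction
        total += 2 * face_sites * ls * site_bytes  # forward and backward face
    return total


MIB = 1024 * 1024

# Assumed parameters (not stated in the thread): Ls = 16, half spinors, double precision.
estimate = halo_buffer_bytes((32, 32, 32, 32), ls=16, complex_per_site=6, bytes_per_real=8)
print(f"{estimate / MIB:.0f} MiB")  # 384 MiB, i.e. roughly 400 MB
```

Under these assumptions the total is well above the 128 MB default, which is consistent with the suggestion to pass a larger value such as `--shm 512`.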
What definitely helps to cure both issues to some extent is using thread-level comms to saturate bandwidth. However, it seems that Aries does that for you automatically when you leave physical cores, or even hyperthreads, free. We ran some tests yesterday with Slings qphix, and it seems to have automatic message progression. I also used UMT (a radiation transport code) at large scale yesterday for benchmarks, and even though I asked for 64 threads per node, the system thread utilization for large runs stayed at about 260 threads the whole time. So maybe that works quite well. Using core specialization should help further, but I have not tried that yet (and it is a Slurm-specific feature).
On 04.11.2016 at 23:21, Peter Boyle [email protected] wrote:
You should be able to increase it with the
--shm 512 flag
as indicated in the message:
ShmBufferMalloc exceeded shared heap size -- try increasing with --shm flag
Parameter specified in units of MB (megabytes)
Current value is 128
Agree the default of 128 is a bit small; I calculate 400MB for 32^4 with back of the envelope
which is prone to error.
I'm afraid this ugliness is forced on us by discovering both Cray and OPA interconnects
give more bandwidth when using two ranks per node, but run intra node MPI fairly poorly.