
Comments (3)

azrael417 commented on July 18, 2024

That problem goes away with smaller local volumes, but I think the default choice for this buffer is a bit too small.


paboyle commented on July 18, 2024

You should be able to increase it with the --shm 512 flag, as indicated in the message:

ShmBufferMalloc exceeded shared heap size -- try increasing with ** --shm MB ** flag
Parameter specified in units of MB (megabytes)
Current value is 128
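
For concreteness, a launch line with the larger buffer might look like the following (an illustrative sketch only; apart from --shm, the binary name and the other flags here are placeholders rather than anything quoted in this thread):

mpirun -np 2 ./Benchmark_dwf --grid 32.32.32.32 --shm 512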

I agree the default of 128 MB is a bit small; a back-of-the-envelope calculation gives roughly 400 MB for a 32^4 local volume, though that estimate is prone to error.
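
For what it is worth, one way to land in that ballpark (my own arithmetic, not necessarily the calculation Peter had in mind): a double-precision fermion field on a 32^4 local volume takes 32^4 sites x 4 spins x 3 colours x 2 (complex) x 8 bytes, which is about 200 MB, so two such communication buffers already come to roughly 400 MB.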

I'm afraid this ugliness is forced on us by the discovery that both Cray and OPA interconnects give more bandwidth when using two ranks per node, but handle intra-node MPI fairly poorly.


azrael417 commented on July 18, 2024

What definitely helps to cure both issues to some extent is using thread-level comms to saturate the bandwidth. However, it seems that Aries does that for you automatically when you leave physical cores, and even hyperthreads, free. We did some tests yesterday with QPhiX and it seems to have automatic message progression. I also ran UMT (a radiation transport code) at large scale yesterday for benchmarks, and even though I asked for 64 threads per node, the system-level thread utilization for large runs stayed at about 260 threads the whole time. So maybe that works quite well. Using core specialization should help further, but I have not tried that yet (it is a Slurm-specific feature).
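
To make the thread-level comms idea concrete, here is a minimal sketch (my own illustration in plain MPI C, not code from Grid or QPhiX) of requesting MPI_THREAD_MULTIPLE so that several threads can drive communications concurrently:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* Ask the MPI library for full multi-threaded support. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        /* Fall back: only one thread may make MPI calls at a time. */
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available (provided=%d)\n", provided);
    }
    /* ... each worker thread could now post its own MPI_Isend/MPI_Irecv
       to help saturate the network bandwidth ... */
    MPI_Finalize();
    return 0;
}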


