
Comments (3)

azrael417 commented on July 18, 2024

That problem goes away with smaller local volumes, but I think the default choice for this buffer is a bit too small.


paboyle commented on July 18, 2024

You should be able to increase it with the --shm 512 flag, as indicated in the message:

ShmBufferMalloc exceeded shared heap size -- try increasing with ** --shm MB ** flag
Parameter specified in units of MB (megabytes)
Current value is 128
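
For concreteness, a launch line with the larger buffer might look like the following (an illustrative sketch only; apart from --shm, the binary name and the other flags here are placeholders rather than anything quoted in this thread):

mpirun -np 2 ./Benchmark_dwf --grid 32.32.32.32 --shm 512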

I agree the default of 128 MB is a bit small; a back-of-the-envelope calculation gives roughly 400 MB for a 32^4 local volume, though that estimate is prone to error.
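
For what it is worth, one way to land in that ballpark (my own arithmetic, not necessarily the calculation Peter had in mind): a double-precision fermion field on a 32^4 local volume takes 32^4 sites x 4 spins x 3 colours x 2 (complex) x 8 bytes, which is about 200 MB, so two such communication buffers already come to roughly 400 MB.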

I'm afraid this ugliness is forced on us by the discovery that both Cray and OPA interconnects give more bandwidth when using two ranks per node, but handle intra-node MPI fairly poorly.


azrael417 commented on July 18, 2024

What definitely helps to cure both issues to some extent is using thread-level comms to saturate the bandwidth. However, it seems that Aries does that for you automatically when you leave physical cores, and even hyperthreads, free. We did some tests yesterday with QPhiX and it seems to have automatic message progression. I also ran UMT (a radiation transport code) at large scale yesterday for benchmarks, and even though I asked for 64 threads per node, the system-level thread utilization for large runs stayed at about 260 threads the whole time. So maybe that works quite well. Using core specialization should help further, but I have not tried that yet (it is a Slurm-specific feature).
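
To make the thread-level comms idea concrete, here is a minimal sketch (my own illustration in plain MPI C, not code from Grid or QPhiX) of requesting MPI_THREAD_MULTIPLE so that several threads can drive communications concurrently:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    /* Ask the MPI library for full multi-threaded support. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        /* Fall back: only one thread may make MPI calls at a time. */
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available (provided=%d)\n", provided);
    }
    /* ... each worker thread could now post its own MPI_Isend/MPI_Irecv
       to help saturate the network bandwidth ... */
    MPI_Finalize();
    return 0;
}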


