Multiple workshops contain source code for the heat equation example that ostensibly d


Homogenize source code used for stencil (heat equation) example (gpu-programming, 11 comments, closed)

enccs commented on July 18, 2024

Homogenize source code used for stencil (heat equation) example

from gpu-programming.

Comments (11)

stepas-toliautas commented on July 18, 2024

I'm not sure how deeply we want to (or are able to) explore these things in a relatively short course

This part would be half-day at most and likely less, so the answer is "not that deeply". I usually tend to be over-prepared, to be able to entertain questions from different levels. It's like with children's books; best of them have substance under the straight presentation.

and what is the expected background of users.

One point to discuss / check back: so far, I assume basic knowledge of CPU-parallel code, so at least #pragma omp parallel for ... or for (i = rowLen * workerId / numWorkers; i < rowLen * (workerId+1) / numWorkers; ++i) ... type of expressions. Do we just replicate that on GPU frameworks? Or do we try to include at least hints about the information presented during the lesson, on conceptual / hardware differences and so on? GoL would ostensibly be simpler, while heat transfer molds better to the use-cases. @wikfeldt @qianglise what would you say?
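As a minimal sketch of the manual work split quoted above (the for-loop bounds built from workerId and numWorkers), the same arithmetic can be written in Python. The function name worker_range and the sizes used are hypothetical and chosen only for illustration; integer division mirrors what the C expression does.

```python
def worker_range(row_len, worker_id, num_workers):
    """Half-open index range [start, stop) owned by one worker."""
    start = row_len * worker_id // num_workers
    stop = row_len * (worker_id + 1) // num_workers
    return range(start, stop)


# Hypothetical sizes: 10 rows shared among 4 workers.
row_len, num_workers = 10, 4
chunks = [list(worker_range(row_len, w, num_workers)) for w in range(num_workers)]

# Every index is covered exactly once, and chunk sizes differ by at most one.
covered = [i for chunk in chunks for i in chunk]
print(chunks)
```

The point of the expression is that neighbouring ranges share their boundary value, so no index is skipped or processed twice even when the row length is not divisible by the number of workers.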

But understanding what they are doing might make it more engaging.

It would be best to understand both what the user is doing and what the program does. The latter part is my previously-mentioned concern with GoL. After maybe half a page on physics and PDEs, the participant is left with a clear mental image of how the 2D temperature map will look after 100 -- or a million -- iterations; the purpose of the algorithm is to take them there. On the other hand, a single update of a 2D GoL grid is probably too inexpensive to demonstrate (no point in parallelizing if it takes 0.0 s anyway), while both end-of-life and 3D simulations are not as visually straightforward.

I am actually inclined to hack up a first-iteration, simple-as-possible GoL algorithm as an alternative to heat transfer and to compare how it presents; maybe it is indeed enough. The former would not carry the preconceptions baked into the existing example, while the latter would tie in better with the other, "intermediate-level" ENCCS lessons. I'll create a task for that, then.


al42and commented on July 18, 2024

Might be too late to change the topic, but what about using Conway's Game of Life instead of a heat equation? The parallelization is almost identical (except we have integers instead of floats), but the idea is much easier to explain to an audience lacking a background in differential equations.
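To make the "almost identical parallelization" claim concrete, here is a rough side-by-side sketch in plain Python. Both functions are illustrative assumptions, not code from any of the workshops: heat_step assumes an explicit update with equal grid spacing and a precomputed coefficient, life_step applies the standard Conway rules, and both leave the outermost cells unchanged as a fixed boundary.

```python
def heat_step(u, coeff):
    """One explicit heat-equation update of interior cells (5-point stencil).

    coeff stands for a * dt / dx**2 with equal spacing in x and y (assumption).
    """
    n, m = len(u), len(u[0])
    new = [row[:] for row in u]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            new[i][j] = u[i][j] + coeff * (
                u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                - 4.0 * u[i][j]
            )
    return new


def life_step(g):
    """One Game of Life update of interior cells (8-neighbour stencil)."""
    n, m = len(g), len(g[0])
    new = [row[:] for row in g]
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            neighbours = sum(
                g[i + di][j + dj] for di in (-1, 0, 1) for dj in (-1, 0, 1)
            ) - g[i][j]
            alive = neighbours == 3 or (g[i][j] == 1 and neighbours == 2)
            new[i][j] = 1 if alive else 0
    return new
```

In both cases each interior cell reads only its immediate neighbours from the old grid and writes one value to the new grid, so the double loop is what would be handed to the parallel framework; only the per-cell arithmetic and the data type differ.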


qianglise commented on July 18, 2024

https://github.com/csc-training/hpc-python/blob/master/numpy/heat-equation/heat_main.py


wikfeldt commented on July 18, 2024

Julia version (HeatEquation.jl) should be adapted
Python version (to be written) should closely match the template version


qianglise commented on July 18, 2024

python heat equation
https://github.com/csc-training/hpc-python/blob/master/mpi/heat-equation/solution/heat-p2p.py


stepas-toliautas commented on July 18, 2024

python heat equation https://github.com/csc-training/hpc-python/blob/master/mpi/heat-equation/solution/heat-p2p.py

Thanks for the heads-up -- I had another, split-up source file version from CSC.
This is one of the reasons / motivations for homogenizing: examples from several ENCCS lessons can be traced to CSC codes dating back to 2014 (!), and all of them are slightly different now. To be certain that we show GPU-porting techniques and not the quirks of 5 languages, I'm going to extract an algorithm / checklist for heat equation code that can be used by anyone while porting or adapting ported examples. To not reinvent the wheel, I will start from OpenMP off-loading + SYCL, since these seem to be maintained / validated in recent workshops.


wikfeldt commented on July 18, 2024

Interesting idea! I like a lot that GoL would be much more discipline-neutral. The only drawback AFAICT is that it'd require more work to write OpenMP, CUDA, HIP etc. versions compared to heat-equation, which has the CSC repo. On the other hand, Stepas was going to homogenise all the heat-equation versions anyway, and that takes time too. What do you think @stepas-toliautas ?


stepas-toliautas commented on July 18, 2024

Might be too late to change the topic, but what about using Conway's Game of Life instead of a heat equation? The parallelization is almost identical (except we have integers instead of floats), but the idea is much easier to explain to an audience lacking a background in differential equations.

The GoL concept is indeed simpler, but for that same reason it limits the potential or motivation to show possible optimizations / use-cases: there is one rule, one scale, just different board sizes. And one is basically limited to a single matrix update to show effects in an obvious way. (Also, to nitpick: if you have gliders and no periodic or fixed boundaries, the problem requires sparse data handling, and then "simplicity" goes out the window. But I digress.) In differential equations, you have slow vs. fast (compared to time-step) transfer and numerical stability details, which (not dumped on the users, but considered by us) may help showcase common GPU porting issues. Or, in other words, I find it harder to answer a question like: "OK, now it runs on GPU. Why should I care for it to run fast / efficient?"
Of course, I am coming from physics education background, which might mean that I overthink the problem without even realizing.
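One example of the "numerical stability details" mentioned above, as a hedged sketch: for the standard explicit (forward-time, centered-space) discretization of the 2D heat equation, the time step has an upper bound that depends on the grid spacing and the diffusion constant. The thread does not say which scheme the examples use, so the scheme, the function name max_stable_dt, and the sample values below are all assumptions.

```python
def max_stable_dt(a, dx, dy):
    """Largest stable time step for the explicit 2D scheme (assumed FTCS).

    a is the diffusion constant; dx and dy are the grid spacings.
    """
    dx2, dy2 = dx * dx, dy * dy
    return dx2 * dy2 / (2.0 * a * (dx2 + dy2))


# Hypothetical values: halving the spacing cuts the allowed step by four.
coarse = max_stable_dt(0.5, 0.01, 0.01)
fine = max_stable_dt(0.5, 0.005, 0.005)
print(coarse, fine)
```

This is the kind of detail that can stay out of the participants' way while still motivating performance: a finer grid means both more cells per update and more updates to reach the same simulated time.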


al42and commented on July 18, 2024

@stepas-toliautas, I agree with most of the points raised. It's a toy problem with limited potential for extension.

But I'm not sure how deeply we want to (or are able to) explore these things in a relatively short course, and what the expected background of users is. Most people can copy-paste the expression from a serial example of the code, so they can do the parallelization even without understanding the physics. But understanding what they are doing might make it more engaging.

there is one rule, one scale, just different board sizes

There are variations, and on 3D grids, there are several variations of neighborhoods used (common side / common edge / common vertex). Kinda equivalent to different stencils.
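The three 3D neighbourhoods named above can be enumerated with a few lines of Python. This is only a sketch: the function name and the string labels are hypothetical, and it assumes each neighbourhood includes the smaller ones (cells sharing at least a side, at least an edge, or at least a vertex).

```python
from itertools import product


def neighbourhood_3d(kind):
    """Offsets of neighbouring cells on a 3D grid for the given neighbourhood."""
    # Number of coordinates allowed to differ from the centre cell.
    max_nonzero = {"side": 1, "edge": 2, "vertex": 3}[kind]
    return [
        offset
        for offset in product((-1, 0, 1), repeat=3)
        if 0 < sum(1 for c in offset if c != 0) <= max_nonzero
    ]


for kind in ("side", "edge", "vertex"):
    print(kind, len(neighbourhood_3d(kind)))
```

Under that assumption the counts are 6, 18 and 26 neighbours, which is the same choice one makes when picking a 7-point or a larger stencil for a 3D finite-difference update.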


stepas-toliautas commented on July 18, 2024

Current situation and guidelines

Code taken from OpenMP-offload / SYCL workshops and slightly adapted for clarity.
- Suggestion is to keep general structure from main():
  - that is, solution + periodic PNG snapshots + simple timings,
  - using generated initial conditions (as in setup -> field_generate()),
  - allowing for two run modes, default (./heat) and w/ provided x, y, t (./heat 2000 2000 2000),
- and do evolution by separate function (evolve()),
- while keeping everything else out of the way.
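The two run modes from the list above could look roughly like this in a Python variant. This is a sketch under assumptions: the function name parse_args and the default sizes are hypothetical (the thread does not state the defaults), and only the argument handling is shown, not setup, evolve() or image output.

```python
# Hypothetical defaults; the actual values used by ./heat are not given here.
DEFAULT_NX, DEFAULT_NY, DEFAULT_NSTEPS = 2000, 2000, 500


def parse_args(argv):
    """Return (nx, ny, nsteps) for `heat` or `heat <x> <y> <t>`."""
    if len(argv) == 1:
        return DEFAULT_NX, DEFAULT_NY, DEFAULT_NSTEPS
    if len(argv) == 4:
        nx, ny, nsteps = (int(value) for value in argv[1:])
        return nx, ny, nsteps
    raise SystemExit(f"usage: {argv[0]} [nx ny nsteps]")


# Equivalent of running ./heat 2000 2000 2000
print(parse_args(["heat", "2000", "2000", "2000"]))
```

Keeping this parsing in one small function leaves main() free to read as setup, time loop with evolve(), and reporting, which is the structure the guidelines ask for.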
There's room for improvement still.
- Currently used code (from earlier workshops) branches at compile time on the availability of libpng, but even without snapshots it still generates data to write. It would be cleaner to run and benchmark the program if field_write() were a no-op after turning off image writing; I did that as a second -D, but it could really be collapsed into a single decision.
- Tracking time is a bit awkward between OpenMP and SYCL now. I have no strong opinion on what would be better:
  - to define functions get_start(), get_end(), get_elapsed() and hide them away in utilities or where have you (this would look cleaner and allow for the same view in different languages, including C!)
  - or to not bother and use a single C++ way (so... <chrono>?) for all C++ variants and idiomatic ways for other languages.
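For the first timing option, a Python variant of the helper trio might look like the sketch below. The function names come from the comment above, but their signatures are assumptions, and time.perf_counter() is just one reasonable clock choice for wall-time measurements.

```python
import time


def get_start():
    """Return a timestamp taken before the timed region."""
    return time.perf_counter()


def get_end():
    """Return a timestamp taken after the timed region."""
    return time.perf_counter()


def get_elapsed(start, end):
    """Elapsed wall time in seconds between two timestamps."""
    return end - start


start = get_start()
total = sum(i * i for i in range(100_000))  # stand-in for the evolve() loop
end = get_end()
print(f"Iteration took {get_elapsed(start, end):.3f} seconds.")
```

With wrappers like these the timed section of main() reads the same in every language variant, at the cost of one more utility file per variant.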

I will go for (CPU-)Python next, and hopefully that will result in enough generalization to quickly write the rest of the variants as needed / wanted.


stepas-toliautas commented on July 18, 2024

Closing as >50% done :) in favor of focused enhancements to an episode.

