Comments (11)

stepas-toliautas commented on July 18, 2024

I'm not sure how deeply we want to, or will be able to, explore these things in a relatively short course

This part would take half a day at most, and likely less, so the answer is "not that deeply". I tend to over-prepare so that I can field questions from different levels. It's like children's books: the best of them have substance beneath the straightforward presentation.

and what is the expected background of users.

One point to discuss / check back on: so far, I assume basic knowledge of CPU-parallel code, so at least #pragma omp parallel for ... or for (i = rowLen * workerId / numWorkers; i < rowLen * (workerId+1) / numWorkers; ++i) ... type of expressions. Do we J-word replicate that on GPU frameworks? Or do we try to include at least hints at the information presented during the lesson, such as conceptual / hardware differences? GoL would ostensibly be simpler, while heat transfer maps better onto the use cases. @wikfeldt @qianglise what would you say?
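For reference, the loop bounds quoted above amount to a static block split of the index range among workers. A minimal Python sketch of that arithmetic follows; the function name block_range and the sample sizes are made up for illustration and are not taken from any of the lesson codes.

```python
def block_range(row_len: int, worker_id: int, num_workers: int) -> range:
    """Indices handled by one worker under a static block split.

    Mirrors the C-style bounds
    row_len * worker_id / num_workers .. row_len * (worker_id + 1) / num_workers
    using integer division.
    """
    start = row_len * worker_id // num_workers
    stop = row_len * (worker_id + 1) // num_workers
    return range(start, stop)


# Hypothetical sizes: 10 rows shared among 4 workers.
row_len, num_workers = 10, 4
chunks = [list(block_range(row_len, w, num_workers)) for w in range(num_workers)]
print(chunks)  # [[0, 1], [2, 3, 4], [5, 6], [7, 8, 9]]
```

The chunks differ by at most one element and together cover every index exactly once, which is the property a participant needs to recognise before the same split is delegated to a GPU framework.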

But understanding what they are doing might make it more engaging.

It would be best to understand both what the user is doing and what the program does. The latter part is my previously mentioned concern with GoL. After maybe half a page on physics and PDEs, the participant is left with a clear mental image of how the 2D temperature map will look after 100 -- or a million -- iterations; the purpose of the algorithm is to take them there. On the other hand, a single update of a 2D GoL grid is probably too inexpensive to demonstrate (no point in parallelizing if it takes 0.0 s anyway), while both end-of-life and 3D simulations are not as visually straightforward.
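To make the "what the program does" part concrete, here is a minimal pure-Python sketch of one explicit time step of the 2D heat equation on a grid with fixed boundary values. It is not the CSC or ENCCS code: the name evolve() follows the function mentioned later in this thread, but its signature, the parameters a, dt, dx, dy and the 8x8 test field are assumptions chosen for illustration.

```python
def evolve(curr, a, dt, dx, dy):
    """Return the field after one explicit finite-difference step.

    Boundary cells are copied unchanged (fixed-temperature boundaries).
    """
    nx, ny = len(curr), len(curr[0])
    nxt = [row[:] for row in curr]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            # Second differences along each grid direction.
            d2x = (curr[i - 1][j] - 2.0 * curr[i][j] + curr[i + 1][j]) / dx**2
            d2y = (curr[i][j - 1] - 2.0 * curr[i][j] + curr[i][j + 1]) / dy**2
            nxt[i][j] = curr[i][j] + a * dt * (d2x + d2y)
    return nxt


# Hypothetical setup: cold 8x8 plate with a hot edge along column 0.
n = 8
field = [[0.0] * n for _ in range(n)]
for row in field:
    row[0] = 100.0

first = evolve(field, a=0.5, dt=0.25, dx=1.0, dy=1.0)
result = field
for _ in range(100):
    result = evolve(result, a=0.5, dt=0.25, dx=1.0, dy=1.0)

print(first[4][1])                 # 12.5
print(result[4][1] > result[4][6])  # True: warmer next to the hot edge
```

Each interior cell depends only on its four neighbours from the previous step, so every cell of one update can be computed independently; that is the part that moves to the GPU.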

I am actually inclined to hack up a first-iteration, simple-as-possible GoL algorithm as an alternative to heat transfer and to compare how it presents; maybe it is indeed enough. The former would not carry the preconceptions baked into the existing example, while the latter would tie in better with the other, "intermediate-level" ENCCS lessons. I'll create a task for that, then.

from gpu-programming.

al42and commented on July 18, 2024

Might be too late to change the topic, but what about using Conway's Game of Life instead of a heat equation? The parallelization is almost identical (except we have integers instead of floats), but the idea is much easier to explain to an audience lacking a background in differential equations.
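As a rough illustration of how similar the access pattern is, a single Game of Life update can be sketched in plain Python as below. This is an assumption-laden toy, not code from any lesson: the name life_step, the choice of treating everything outside the grid as dead, and the 5x5 blinker board are all picked for brevity.

```python
def life_step(grid):
    """Return the next Game of Life generation (cells outside the grid count as dead)."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count live cells among the up-to-8 in-bounds neighbours.
            alive = sum(
                grid[i + di][j + dj]
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0)
                and 0 <= i + di < rows
                and 0 <= j + dj < cols
            )
            # Birth on exactly 3 neighbours, survival on 2 or 3.
            nxt[i][j] = 1 if alive == 3 or (grid[i][j] == 1 and alive == 2) else 0
    return nxt


# A "blinker": three cells in a column that flip to a row and back.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
print(life_step(blinker)[2])                      # [0, 1, 1, 1, 0]
print(life_step(life_step(blinker)) == blinker)   # True
```

As with a heat-equation stencil, each new cell reads only its neighbours from the previous generation and writes one integer, so the parallel decomposition carries over unchanged.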

qianglise commented on July 18, 2024

https://github.com/csc-training/hpc-python/blob/master/numpy/heat-equation/heat_main.py

wikfeldt commented on July 18, 2024

Julia version (HeatEquation.jl) should be adapted
Python version (to be written) should closely match the template version

qianglise commented on July 18, 2024

python heat equation
https://github.com/csc-training/hpc-python/blob/master/mpi/heat-equation/solution/heat-p2p.py

stepas-toliautas commented on July 18, 2024

python heat equation https://github.com/csc-training/hpc-python/blob/master/mpi/heat-equation/solution/heat-p2p.py

Thanks for the heads-up -- I had another, split-up source file version from CSC.
This is one of the reasons / motivations for homogenizing: examples from several ENCCS lessons can be traced to CSC codes dating back to 2014 (!), and all of them are slightly different now. To be certain that we show GPU-porting techniques and not the quirks of 5 languages, I'm going to extract an algorithm / checklist for the heat equation code that anyone can use while porting or adapting ported examples. To avoid reinventing the wheel, I will start from OpenMP offloading + SYCL, since these seem to be maintained / validated in recent workshops.

wikfeldt commented on July 18, 2024

Interesting idea! I like a lot that GoL would be much more discipline-neutral. The only drawback AFAICT is that it'd require more work to write OpenMP, CUDA, HIP, etc. versions compared to the heat equation, which has the CSC repo. On the other hand, Stepas was going to homogenise all the heat-equation versions anyway, and that takes time too. What do you think, @stepas-toliautas?

stepas-toliautas commented on July 18, 2024

Might be too late to change the topic, but what about using Conway's Game of Life instead of a heat equation? The parallelization is almost identical (except we have integers instead of floats), but the idea is much easier to explain to an audience lacking a background in differential equations.

The GoL concept is indeed simpler, but for that same reason it limits the potential, or the motivation, to show possible optimizations / use-cases: there is one rule, one scale, just different board sizes. And one is basically limited to a single matrix update to show effects in an obvious way. (Also, to nitpick: if you have gliders and no periodic or fixed boundaries, the problem requires sparse data handling, and then "simplicity" goes out the window. But I digress.) In differential equations, you have slow vs. fast (compared to the time step) transfer and numerical stability details, which (not dumped on the users, but considered by us) may help showcase common GPU porting issues. Or, in other words, I find it harder to answer a question like: "OK, now it runs on GPU. Why should I care for it to run fast / efficiently?"
Of course, I am coming from a physics education background, which might mean that I overthink the problem without even realizing it.
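One way to put a number on the "numerical stability details": for the standard explicit (forward-time, centred-space) scheme in 2D, the time step must satisfy dt <= dx^2 dy^2 / (2 a (dx^2 + dy^2)). The sketch below evaluates that bound; the function name max_stable_dt and the sample values of a, dx, dy are assumptions for illustration, and other schemes have different limits.

```python
import math


def max_stable_dt(a, dx, dy):
    """Largest stable time step for the explicit 2D heat-equation scheme."""
    dx2, dy2 = dx * dx, dy * dy
    return dx2 * dy2 / (2.0 * a * (dx2 + dy2))


# Hypothetical diffusion constant and grid spacings.
coarse = max_stable_dt(a=0.5, dx=0.01, dy=0.01)
fine = max_stable_dt(a=0.5, dx=0.005, dy=0.005)

print(math.isclose(coarse, 5e-5))        # True
print(math.isclose(coarse / fine, 4.0))  # True: half the spacing, a quarter of the step
```

Halving the grid spacing quadruples the number of cells and also quadruples the number of steps needed to reach the same physical time, which gives one possible answer to "why should I care for it to run fast".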

al42and commented on July 18, 2024

@stepas-toliautas, I agree with most of the points raised. It's a toy problem with limited potential for extension.

But I'm not sure how deeply we want to, or will be able to, explore these things in a relatively short course, and what the expected background of users is. Most people can copy-paste the expression from a serial example of the code, so they can do the parallelization even without understanding the physics. But understanding what they are doing might make it more engaging.

there is one rule, one scale, just different board sizes

There are rule variations, and on 3D grids several neighborhood definitions are in use (cells sharing a side / an edge / a vertex). That is kind of equivalent to using different stencils.
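The three 3D neighborhoods can be enumerated as offset lists, which makes the stencil analogy explicit. A small sketch, where the helper name neighborhood and its max_nonzero parameter are invented for this example:

```python
from itertools import product


def neighborhood(max_nonzero):
    """Offsets to 3D neighbours that differ in at most `max_nonzero` coordinates."""
    return [
        offset
        for offset in product((-1, 0, 1), repeat=3)
        if 1 <= sum(c != 0 for c in offset) <= max_nonzero
    ]


faces = neighborhood(1)     # cells sharing a side
edges = neighborhood(2)     # ... or at least an edge
vertices = neighborhood(3)  # ... or at least a vertex

print(len(faces), len(edges), len(vertices))  # 6 18 26
```

Swapping the offset list changes the rule's footprint without touching the surrounding loop, much like swapping a 7-point stencil for a wider one.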

stepas-toliautas commented on July 18, 2024

Current situation and guidelines

  1. Code taken from OpenMP-offload / SYCL workshops and slightly adapted for clarity.
    • Suggestion is to keep general structure from main():
      • that is, solution + periodic PNG snapshots + simple timings,
      • using generated initial conditions (as in setup -> field_generate()),
      • allowing for two run modes, default (./heat) and w/ provided x, y, t (./heat 2000 2000 2000),
    • and do evolution by separate function (evolve()),
    • while keeping everything else out of the way.
  2. There's room for improvement still.
    • The code currently in use (from earlier workshops) branches at compile time on the availability of libpng, but even without snapshots it still generates data to write. The program would be cleaner to run and benchmark if field_write() were a no-op once image writing is turned off; I did that as a second -D, but it could really be collapsed into a single decision.
    • Tracking time is a bit awkward between OpenMP and SYCL now. I have no strong opinion on which would be better:
      • to define functions get_start(), get_end(), get_elapsed() and hide them away in utilities or wherever (this would look cleaner and allow for the same view in different languages, including C!)
      • or not to bother and use a single C++ way (so... <chrono>?) for all C++ variants and idiomatic ways for other languages.
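A minimal Python sketch of the two ideas in item 2 could look like the following. The function names get_start(), get_end(), get_elapsed() and field_write() come from the list above, but their signatures, the WRITE_IMAGES switch, the output filename pattern, and the plain-text output (instead of PNG) are assumptions made only to keep the example self-contained.

```python
import time

# Hypothetical single switch replacing the two compile-time -D decisions.
WRITE_IMAGES = False


def get_start():
    """Take a timestamp before the timed region."""
    return time.perf_counter()


def get_end():
    """Take a timestamp after the timed region."""
    return time.perf_counter()


def get_elapsed(start, end):
    """Elapsed wall-clock time in seconds."""
    return end - start


def field_write(field, iteration):
    """Write a snapshot, or do nothing at all when output is disabled."""
    if not WRITE_IMAGES:
        return None  # no data is prepared, so benchmarks are not skewed
    filename = f"heat_{iteration:04d}.txt"  # text stand-in for the PNG snapshot
    with open(filename, "w") as out:
        for row in field:
            out.write(" ".join(f"{v:.3f}" for v in row) + "\n")
    return filename


start = get_start()
written = field_write([[0.0, 1.0], [2.0, 3.0]], iteration=0)
elapsed = get_elapsed(start, get_end())
print(written, elapsed >= 0.0)  # None True
```

Hiding the clock behind three small functions keeps the main loop looking the same across languages, at the cost of one extra utility file per variant.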

I will go for (CPU-)Python next, and hopefully that will result in enough generalization to quickly write the rest of the variants as needed / wanted.
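For the Python variant, the two run modes from item 1 could be handled roughly as sketched below. The default sizes and the name parse_args are placeholders invented here; only the two invocation forms (no arguments, or x, y, t) come from the guidelines above.

```python
import sys

# Hypothetical defaults: grid columns, grid rows, number of time steps.
DEFAULTS = (2000, 2000, 500)


def parse_args(argv):
    """Return (x, y, t) from either `heat` or `heat X Y T`."""
    if len(argv) == 1:
        return DEFAULTS
    if len(argv) == 4:
        return tuple(int(value) for value in argv[1:])
    raise SystemExit(f"usage: {argv[0]} [x y t]")


# Examples with explicit argument lists; a real script would pass sys.argv.
print(parse_args(["heat"]))                          # (2000, 2000, 500)
print(parse_args(["heat", "2000", "2000", "2000"]))  # (2000, 2000, 2000)
```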

stepas-toliautas commented on July 18, 2024

Closing as >50% done :) in favor of focused enhancements to an episode.
