
enccs / gpu-programming

49 stars · 8 watchers · 19 forks · 8.73 MB

Meta-GPU lesson covering general aspects of GPU programming as well as specific frameworks

Home Page: https://enccs.github.io/gpu-programming/

License: Creative Commons Attribution 4.0 International

Languages: Fortran 30.84%, C++ 24.67%, Cuda 17.27%, C 8.80%, Python 6.24%, Shell 4.22%, Julia 3.36%, Makefile 3.31%, CSS 1.00%, Batchfile 0.28%

gpu-programming's People

Contributors

al42and, code4yonglei, csccva, dennispan, heikonenj, hichamagueny, hokkanen, linguist89, pojeda, qianglise, rkdarst, stepas-toliautas, weilipenguin, wikfeldt


gpu-programming's Issues

episode 4: GPU programming concepts

  • Confusing point: why kernels? Why no explicit loops?
  • Drive home the indexing logic with thread IDs etc.
  • Memory: you can't expect non-contiguous arrays to perform well.
  • Don't expect easy performance! A first attempt may well be slower than the CPU version.
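
To drive home the first two points, here is a minimal pure-Python sketch (all names are illustrative, not from the lesson code) of how an explicit loop becomes a kernel: the loop over elements disappears, and each "thread" recovers its element from block and thread indices, i.e. the `blockIdx * blockDim + threadIdx` arithmetic of CUDA/HIP.

```python
# CPU-side simulation (pure Python, hypothetical names) of the kernel idea:
# no explicit loop in the kernel body; each "thread" computes one element
# from its global index, as in CUDA/HIP.

def axpy_kernel(thread_idx, block_idx, block_dim, a, x, y, out):
    """Body that a single GPU thread would execute."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA
    if i < len(x):                          # guard: the grid may overshoot n
        out[i] = a * x[i] + y[i]

def launch(kernel, n, block_dim, *args):
    """Simulate a grid launch by iterating over all (block, thread) pairs."""
    num_blocks = (n + block_dim - 1) // block_dim  # ceiling division
    for b in range(num_blocks):
        for t in range(block_dim):
            kernel(t, b, block_dim, *args)

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(x)
launch(axpy_kernel, len(x), 2, 2.0, x, y, out)  # out[i] = 2*x[i] + y[i]
```

Note the bounds guard: because the grid is rounded up to whole blocks, some threads fall past the end of the array, which is also why non-contiguous layouts hurt, since neighbouring threads should touch neighbouring memory.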

Game of Life as a stencil demo example

The idea is to code an understandable, simple example of CPU/GPU parallelization from scratch, without carrying over mental load from the earlier, already GPU-ported examples; the suggestion is Game of Life.
Sketching OpenMP-offload, CUDA or SYCL, and Python or Julia variants should be enough to see whether the concept presents itself well across different types of GPU frameworks.
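
As a possible starting point, a serial pure-Python sketch of one Game of Life step (names and the non-periodic boundary handling are just one choice): each cell update reads only the 3x3 stencil around it, so it maps naturally to one GPU thread per cell.

```python
# Serial reference for a Game of Life step; the GPU ports would assign
# one thread per cell, since each update depends only on 8 neighbours.

def life_step(grid):
    """One Game of Life generation on a 2D list of 0/1, non-periodic borders."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # count live neighbours in the 3x3 stencil around (i, j)
            alive = sum(
                grid[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols
            )
            new[i][j] = 1 if alive == 3 or (grid[i][j] and alive == 2) else 0
    return new

# A "blinker" oscillates between a horizontal and a vertical bar of three cells.
blinker = [
    [0, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
]
```

The blinker makes a convenient correctness check for every port: after two steps the grid must return to its initial state.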

HIP-python with a bit more detail

The HIP-python part will complement the section "High-level language support" and will be covered in a bit more detail. Some initial thoughts:

  • Accessing the GPU's properties from HIP-python
  • Managing/allocating memory from HIP-python
  • Compiling kernels from HIP-python
  • Launching kernels from HIP-python
  • Creating streams and events from HIP-python
  • Using HIP-python's libraries, e.g. hipBLAS
  • ...

Examples Ep: timings and comparison / discussion

To be completed after running example codes on the host cluster.

One question so far: is it acceptable/enough to show a successful GPU load without a necessarily observable speedup?
At least for OpenMP offload (and possibly Numba-CUDA), explicit data placement is also needed to achieve wall times better than the CPU counterparts. Default data-mapping rules (i.e., less extra code) work and engage the GPU, but are slower.

LICENSE file is a concatenation of several licenses

Apparently, the templating engine should have chosen one of the licenses, but it did not do anything:

https://github.com/ENCCS/gpu-programming/blob/bfbc1fd6c21788bcaebf877fa745c6a47e06ed64/LICENSE

{% if cookiecutter.lesson_license == 'CC-BY-4.0' -%}
Attribution 4.0 International
[....]

Creative Commons may be contacted at creativecommons.org.
{% elif cookiecutter.lesson_license == 'CC-BY-SA-4.0' %}
Attribution-ShareAlike 4.0 International
....
....

Creative Commons may be contacted at creativecommons.org.
{% endif %}

Same with LICENSE.code

Examples Ep: add CUDA variant

If someone is enthusiastic about Kokkos, that could be done as well, but so far there are

  • a directive-based implementation (OpenMP offloading),
  • a language-based one (Python-numba, in progress),
  • a "portable" kernel-based one (SYCL),

but no CUDA or HIP. So a CUDA variant would be useful.

episodes 8-N: Diving into the frameworks

(prototype outline for each framework, each episode is one framework or group of frameworks)

  • introducing two example problems: heat equation and reduction
  • showing example problems in all different frameworks, with detailed explanation of steps
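
As a rough illustration of what the two example problems involve, here is a pure-Python sketch (illustrative names and parameters, not the lesson's actual code) of one explicit step of the 1D heat equation and of the pairwise "tree" summation pattern that GPU reductions use.

```python
# CPU-side sketches of the two example problems: an explicit finite-difference
# step of u_t = alpha * u_xx, and a tree-style reduction with log2(n) passes,
# the pattern a GPU performs in shared memory.

def heat_step(u, alpha=0.1):
    """One explicit Euler step of the 1D heat equation, fixed boundaries."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new

def tree_reduce(values):
    """Pairwise (tree) sum: each pass folds the upper half onto the lower."""
    vals = list(values)
    while len(vals) > 1:
        half = (len(vals) + 1) // 2
        for i in range(len(vals) - half):
            vals[i] += vals[i + half]
        vals = vals[:half]
    return vals[0]
```

The heat step is embarrassingly parallel over interior points (one thread per point), while the reduction shows the coordination step that most frameworks otherwise hide behind a library call.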

Examples Ep: add Julia variant

Is there enough material by now to "translate" to Julia without much complication?
(GPU-ifying the Python variant may be needed first.)

Code examples for Julia programming

  • For the Mac M2 GPU, if the Float64 type does not work, switch it to Float32.
  • For the "write your own kernels" part, the grid within one block should be specified as groups.

episode 7: GPU programming options

  • low-level, directive-based, kernel-based, ... options.
  • low-level: CUDA, HIP
  • directive-based: OpenMP, OpenACC (incremental approach)
  • language-based: Python-numba, Julia-GPU, SYCL, TensorFlow/Pytorch

Issue in GPU concepts, NVIDIA architectures and branch divergence in kernel

On 4-gpu-concepts.rst:124:

On some architectures, all members of a :abbr:`warp` have to execute the 
same instruction, so-called "lock-step" execution. This is done to achieve 
higher performance, but there are some drawbacks. If a an **if** statement 
is present inside a warp will cause the warp to be executed more than once, 
one time for each branch. On architectures without lock-step execution, such 
as NVIDIA Volta (e.g., GeForce 16xx-series) or newer, warp divergence is less costly.

To my understanding, the GeForce 16xx series is not an example of NVIDIA Volta or newer. This might need to be verified and potentially modified. I would also clarify the claim about the if statement: from what I've understood, branch divergence occurs only if the if statement is evaluated at runtime (not a templated branch) and multiple threads within a single warp actually execute different branches of the if statement.
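
The runtime-branch point can be illustrated with a toy pure-Python model of lock-step execution (entirely illustrative, not how real hardware is programmed): the warp makes one masked pass per branch actually taken, so a data-dependent if costs an extra pass only when lanes diverge.

```python
# Toy simulation of lock-step "warp" execution: when lanes disagree on a
# runtime if-condition, the warp makes one pass per taken branch, masking
# out inactive lanes. If all lanes agree, a single pass suffices -- which is
# why only *divergent* runtime branches cost extra, not uniform ones.

def run_warp(lane_values):
    """Execute `x = x*2 if x is even else x+1` over a warp; count passes."""
    mask = [v % 2 == 0 for v in lane_values]
    passes = 0
    out = list(lane_values)
    if any(mask):                 # pass for lanes taking the 'then' branch
        passes += 1
        out = [v * 2 if m else v for v, m in zip(out, mask)]
    if not all(mask):             # pass for lanes taking the 'else' branch
        passes += 1
        out = [v if m else v + 1 for v, m in zip(out, mask)]
    return out, passes
```

With all-even lane values only one pass runs; with mixed parity both passes run, each with half the lanes idle.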

Update first exercise from Directives ep.

The "Exercise: Change the levels of parallelism" in the "Directive-based models" episode should be updated to state more clearly what users might want to change, where, and what the expected output (or change in output) is, because many participants get confused here.

Examples Ep: adapt Numba-Python for GPU

The current code is accelerated by Numba JIT (to almost C++ performance) but is not offloaded to the GPU yet.
Also, an alternative version of the main loop sits in a comment: one form is faster with the (CPU) JIT, the other without it (using NumPy vectorization), and I'm not yet sure which form the GPU kernel will take.
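
A pure-Python stand-in for the two candidate forms of the main loop (no Numba required here; function names and the update formula are illustrative): checking that both forms agree makes it safe to pick either as the basis for the GPU version, and the explicit-loop form is the one that maps directly onto a one-thread-per-element kernel.

```python
# Two equivalent forms of an element-wise update. The explicit loop is the
# shape a Numba-JIT (CPU or CUDA) kernel body would take per index; the
# second mimics the whole-slice "vectorized" style of NumPy, here written
# with a comprehension so the sketch stays dependency-free.

def update_loop(u, alpha):
    """Explicit loop over interior points, boundaries left untouched."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new

def update_vectorized(u, alpha):
    """Whole-slice form, analogous to u[1:-1] + alpha * (...) in NumPy."""
    interior = [
        u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
        for i in range(1, len(u) - 1)
    ]
    return [u[0]] + interior + [u[-1]]
```

Asserting both forms produce identical output on a few test inputs is a cheap guard before committing to one form for the kernel.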

LUMI accounts

A lesson learned, not about the lesson material but about managing workshop projects on LUMI:
it will probably be better to send direct, individual Puhuri invite links to participants, because the automatic emails sent via Puhuri often get lost, are overlooked by participants, get stuck in spam filters, etc.

Homogenize source code used for stencil (heat equation) example

Multiple workshops contain source code for the heat equation example that ostensibly does the same thing (even with the same implementation of the visualization, i.e., the PNG writer) but very likely already differs slightly in each case.
Many of the variants have their roots in C, but some or all of them are actually interoperable with C++. So I'd like to get an overall feel for what makes the most sense: maintaining the base/serial code in C (as it was written in 2014 or earlier), in C++ (which offers useful programming paradigms not available in C), or both.
