
ashvardanian / parallelreductionsbenchmark


Thrust, CUB, TBB, AVX2, CUDA, OpenCL, OpenMP, SyCL - all it takes to sum a lot of numbers fast!

Home Page: https://ashvardanian.com/posts/cuda-parallel-reductions/

C++ 95.34% C 3.81% Metal 0.09% CMake 0.77%
gpu-computing halide sycl vulkan glsl opencl gpu cuda thrust stl

parallelreductionsbenchmark's Introduction

Parallel Reductions on CPUs & GPUs

This repo contains educational examples and benchmarks of CPU and GPU backends. Older versions also included data-parallel operations and dense matrix multiplications (GEMM), but now it covers only fast parallel reductions. Aside from the baseline std::accumulate, it compares:

  • AVX2 single-threaded, but SIMD-parallel code.
  • OpenMP reduction clause.
  • Thrust with its thrust::reduce.
  • CUDA kernels with warp-reductions.
  • OpenCL kernels, eight of them.
  • Parallel STL <algorithm>s in GCC with Intel oneTBB.
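
To make the comparison concrete, here is a minimal sketch of two of the approaches listed above: the std::accumulate baseline and an OpenMP reduction clause. The function names sum_serial and sum_openmp are hypothetical and are not taken from the repository; the actual benchmark code may be structured differently.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical name: serial baseline, one left-to-right pass over the array.
float sum_serial(std::vector<float> const &values) {
    return std::accumulate(values.begin(), values.end(), 0.0f);
}

// Hypothetical name: OpenMP variant. Each thread accumulates a private copy
// of `total`, and the copies are combined when the loop ends. Without
// OpenMP enabled at compile time the pragma is ignored and the loop runs
// serially, so the result stays correct either way.
float sum_openmp(std::vector<float> const &values) {
    float total = 0.0f;
    float const *data = values.data();
    std::ptrdiff_t const count = static_cast<std::ptrdiff_t>(values.size());
#pragma omp parallel for reduction(+ : total)
    for (std::ptrdiff_t i = 0; i < count; ++i) total += data[i];
    return total;
}
```

With GCC, the OpenMP path is typically enabled by passing -fopenmp. Because floating-point addition is not associative, the two functions can return slightly different results on arbitrary inputs.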

Previously, it also compared ArrayFire, Halide, Vulkan queues for SPIR-V kernels, and SyCL. The examples were collected from the early 2010s until 2019, and later updated in 2022.

Build & Run in 1 Line

The following script will, by default, generate a 1 GB array of numbers and reduce it using every available backend. All the classic Google Benchmark arguments are supported, including --benchmark_filter=opencl. All required library dependencies are fetched automatically: GTest, GBench, Intel oneTBB, FMT, and Thrust with CUB. You are expected to build this on an x86 machine with CUDA drivers installed.

cmake -B ./build_release
cmake --build ./build_release --config Release
./build_release/reduce_bench # To run all available benchmarks on default array size
./build_release/reduce_bench --benchmark_filter="" # Control Google Benchmark params
PARALLEL_REDUCTIONS_LENGTH=1000 ./build_release/reduce_bench # Try different array size
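
As an illustration of how an override like PARALLEL_REDUCTIONS_LENGTH could be consumed, here is a minimal sketch. It is not the repository's code: parse_length and array_length_from_env are hypothetical helpers, and both the float element type and the reading of the variable as an element count (rather than bytes) are assumptions.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: turn the text of an environment variable into an
// element count, falling back to a default when it is missing or invalid.
std::size_t parse_length(char const *text, std::size_t fallback) {
    if (text == nullptr) return fallback;
    // Require a leading digit so that signs and whitespace are rejected.
    if (!std::isdigit(static_cast<unsigned char>(*text))) return fallback;
    char *end = nullptr;
    unsigned long long parsed = std::strtoull(text, &end, 10);
    // Reject trailing garbage such as "12x" and a zero-length array.
    if (*end != '\0' || parsed == 0) return fallback;
    return static_cast<std::size_t>(parsed);
}

// Hypothetical helper: the default is 1 GB worth of floats (an assumption).
std::size_t array_length_from_env() {
    std::size_t const default_length = (std::size_t(1) << 30) / sizeof(float);
    return parse_length(std::getenv("PARALLEL_REDUCTIONS_LENGTH"), default_length);
}
```

Keeping the parsing separate from the std::getenv call makes the fallback behavior easy to check without touching the process environment.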

Need more fine-grained control, for example to run only the CUDA-based backends?

cmake -DCMAKE_CUDA_COMPILER=nvcc -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -B ./build_release
cmake --build ./build_release --config Release
./build_release/reduce_bench --benchmark_filter=cuda

To debug or introspect, the procedure is similar:

cmake -DCMAKE_CUDA_COMPILER=nvcc -DCMAKE_C_COMPILER=gcc -DCMAKE_CXX_COMPILER=g++ -DCMAKE_BUILD_TYPE=Debug -B ./build_debug
cmake --build ./build_debug --config Debug

And then run your favorite debugger.

Optional backends:

  • To enable Intel OpenCL on CPUs: apt-get install intel-opencl-icd.
  • To run on integrated Intel GPU, follow this guide.

parallelreductionsbenchmark's People

Contributors

ashvardanian, darvinharutyunyan, ishkhan42


parallelreductionsbenchmark's Issues

Add support for partial builds

On some architectures, we won't have access to CUDA.
On others, OpenCL might be an issue.
Furthermore, the compiler flags for non-GCC & non-NVCC builds have to be properly configured.
That would require a complete rewrite of CMakeLists.txt.
