diwaspandey / mmul

This project forked from coffeebeforearch/mmul

Serial and parallel implementations of matrix multiplication

License: GNU General Public License v3.0

C++ 73.96% Makefile 5.82% Cuda 20.22%

Matrix Multiplication (MMul) Benchmarks

This repository contains a number of serial and parallel benchmarks for matrix multiplication in C++. Matrix multiplication is a wonderful first operation to try your hand at optimizing for the following reasons:

  • It is a very common operation in popular fields (e.g., ML)
  • The optimizations are fairly easy to understand (they primarily deal with simple access patterns)
  • The optimizations are composable (they work better together!)
  • It is fairly easy to parallelize

And many more!

The benchmarks in this repository were written using Google Benchmark. For simplicity, all benchmarks assume square matrices of dimension N x N, where N is 384, 768, or 1152.

Benchmarks

The following section breaks down the benchmarks contained in each subdirectory.

Baseline

  • serial_mmul_bench
    • Baseline serial mmul implementation (using the classical triply-nested for loop)
  • parallel_mmul_bench
    • Baseline parallel mmul implementation (splits rows of output matrix across threads)
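
As a rough illustration of the two baselines, the sketch below shows a triply-nested loop and a row-wise split across `std::thread` workers. It is not taken from the repository: the function names (`mmul_rows`, `serial_mmul`, `parallel_mmul`), the flat row-major `std::vector<float>` layout, and the `num_threads` parameter are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Classical triply-nested loop over output rows [row_begin, row_end).
// Matrices are N x N, row-major, stored in flat vectors (assumed layout).
void mmul_rows(const std::vector<float>& A, const std::vector<float>& B,
               std::vector<float>& C, std::size_t N,
               std::size_t row_begin, std::size_t row_end) {
  for (std::size_t row = row_begin; row < row_end; ++row) {
    for (std::size_t col = 0; col < N; ++col) {
      float acc = 0.0f;
      for (std::size_t k = 0; k < N; ++k) {
        acc += A[row * N + k] * B[k * N + col];
      }
      C[row * N + col] = acc;
    }
  }
}

// Serial baseline: a single call that covers every output row.
void serial_mmul(const std::vector<float>& A, const std::vector<float>& B,
                 std::vector<float>& C, std::size_t N) {
  assert(A.size() == N * N && B.size() == N * N && C.size() == N * N);
  mmul_rows(A, B, C, N, 0, N);
}

// Parallel baseline: each thread owns a contiguous band of output rows,
// so threads never write the same element and only a final join is needed.
void parallel_mmul(const std::vector<float>& A, const std::vector<float>& B,
                   std::vector<float>& C, std::size_t N,
                   unsigned num_threads) {
  assert(A.size() == N * N && B.size() == N * N && C.size() == N * N);
  assert(num_threads > 0);
  const std::size_t rows_per_thread = (N + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (std::size_t first = 0; first < N; first += rows_per_thread) {
    const std::size_t last = std::min(N, first + rows_per_thread);
    workers.emplace_back([&A, &B, &C, N, first, last] {
      mmul_rows(A, B, C, N, first, last);
    });
  }
  for (auto& w : workers) {
    w.join();
  }
}
```

Because every output element is still accumulated in the same order, the parallel version in this sketch produces bit-identical results to the serial one.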

Blocked

  • blocked_mmul_bench
    • A serial mmul implementation which processes a block of elements at a time to exploit locality in the B matrix
  • blocked_aligned_mmul_bench
    • Same as blocked_mmul_bench, but uses 64-byte-aligned allocations so that the rows of a block do not straddle cache lines
  • parallel_blocked_mmul_bench
    • A parallel blocked mmul implementation (splits rows of output matrix across threads)
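
The blocking and alignment ideas above can be sketched as follows. This is an illustrative version under stated assumptions, not the repository's code: the block size `kBlock = 16` (16 floats, or 64 bytes, matching a typical cache line), the helper `aligned_floats`, and the kernel name `blocked_mmul` are all hypothetical. `std::aligned_alloc` is C++17 and is not available on every toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Assumed block size: 16 floats = 64 bytes, one typical cache line.
constexpr std::size_t kBlock = 16;

// Deleter so memory from std::aligned_alloc is released with std::free.
struct FreeDeleter {
  void operator()(float* p) const { std::free(p); }
};
using AlignedBuf = std::unique_ptr<float[], FreeDeleter>;

// Allocates n floats on a 64-byte boundary.
// std::aligned_alloc requires the byte size to be a multiple of the alignment.
AlignedBuf aligned_floats(std::size_t n) {
  assert((n * sizeof(float)) % 64 == 0);
  return AlignedBuf(
      static_cast<float*>(std::aligned_alloc(64, n * sizeof(float))));
}

// Blocked kernel: for each output row, a kBlock-wide strip of C is updated
// from one kBlock x kBlock block of B at a time, so the block of B stays
// cache-resident while it is being used.
// C must be zero-initialised and N must be a multiple of kBlock.
void blocked_mmul(const float* A, const float* B, float* C, std::size_t N) {
  assert(N % kBlock == 0);
  for (std::size_t row = 0; row < N; ++row) {
    for (std::size_t col_blk = 0; col_blk < N; col_blk += kBlock) {
      for (std::size_t k_blk = 0; k_blk < N; k_blk += kBlock) {
        for (std::size_t k = k_blk; k < k_blk + kBlock; ++k) {
          for (std::size_t col = col_blk; col < col_blk + kBlock; ++col) {
            C[row * N + col] += A[row * N + k] * B[k * N + col];
          }
        }
      }
    }
  }
}
```

With a 64-byte-aligned base pointer and N a multiple of 16, each 16-float row segment of a block starts on a cache-line boundary, which is the effect the aligned variant is after.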

Blocked Column

  • blocked_column_aligned_mmul_bench
    • A serial mmul implementation which processes a block of elements at a time, but traverses output blocks of elements in column-major order to exploit locality in the columns of B between blocks of output elements
  • parallel_blocked_column_mmul_bench
    • A parallel blocked column implementation (splits columns between threads) where work is statically mapped
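
One way to read the column-major traversal and the static column split is sketched below. The names (`blocked_column_strips`, `blocked_column_mmul`, `parallel_blocked_column_mmul`), the `kBlock = 16` strip width, and the contiguous assignment of strips to threads are assumptions for illustration and may differ from the repository.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kBlock = 16;  // assumed strip width (16 floats)

// Computes output columns [col_begin, col_end), one kBlock-wide strip at a
// time. Within a strip, output rows are visited top to bottom, so the same
// N x kBlock strip of B is reused for every row before moving to the next.
// C must be zero-initialised; N and the column bounds are multiples of kBlock.
void blocked_column_strips(const std::vector<float>& A,
                           const std::vector<float>& B,
                           std::vector<float>& C, std::size_t N,
                           std::size_t col_begin, std::size_t col_end) {
  for (std::size_t col_blk = col_begin; col_blk < col_end; col_blk += kBlock) {
    for (std::size_t row = 0; row < N; ++row) {
      for (std::size_t k_blk = 0; k_blk < N; k_blk += kBlock) {
        for (std::size_t k = k_blk; k < k_blk + kBlock; ++k) {
          for (std::size_t col = col_blk; col < col_blk + kBlock; ++col) {
            C[row * N + col] += A[row * N + k] * B[k * N + col];
          }
        }
      }
    }
  }
}

// Serial version: all column strips in one call.
void blocked_column_mmul(const std::vector<float>& A,
                         const std::vector<float>& B,
                         std::vector<float>& C, std::size_t N) {
  assert(N % kBlock == 0);
  blocked_column_strips(A, B, C, N, 0, N);
}

// Parallel version with a static mapping: the column strips are divided into
// contiguous chunks up front, and each thread processes exactly one chunk.
void parallel_blocked_column_mmul(const std::vector<float>& A,
                                  const std::vector<float>& B,
                                  std::vector<float>& C, std::size_t N,
                                  unsigned num_threads) {
  assert(N % kBlock == 0 && num_threads > 0);
  const std::size_t strips = N / kBlock;
  const std::size_t per_thread = (strips + num_threads - 1) / num_threads;
  std::vector<std::thread> workers;
  for (std::size_t first = 0; first < strips; first += per_thread) {
    const std::size_t last = std::min(strips, first + per_thread);
    workers.emplace_back([&A, &B, &C, N, first, last] {
      blocked_column_strips(A, B, C, N, first * kBlock, last * kBlock);
    });
  }
  for (auto& w : workers) {
    w.join();
  }
}
```

Compared with the row-major blocked kernel, the only change in this sketch is that the column-strip loop moves outside the row loop; the arithmetic per output element is identical.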

Blocked Column Multi Output

  • blocked_column_multi_output_aligned_mmul_bench
    • A serial mmul implementation which processes a tile of output elements at a time, exploiting locality from each tile of B across output tile elements
  • parallel_blocked_column_multi_output_mmul_bench
    • A parallel blocked column multi output implementation (splits columns between threads) where work is statically mapped
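
The multi-output idea can be sketched as a kernel that updates a whole `kBlock` x `kBlock` tile of C from each tile of B, rather than a single row strip. The function name `tiled_mmul` and the tile size are hypothetical, and the loop order is one plausible arrangement rather than a copy of the repository's kernel. A parallel variant could split the column strips across threads in the same way as the blocked column version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBlock = 16;  // assumed tile edge (16 floats)

// For each kBlock x kBlock tile of C, walk the matching tiles of B. Each row
// of a B tile is reused across all kBlock output rows of the C tile before
// the next row of B is touched.
// C must be zero-initialised and N must be a multiple of kBlock.
void tiled_mmul(const std::vector<float>& A, const std::vector<float>& B,
                std::vector<float>& C, std::size_t N) {
  assert(N % kBlock == 0);
  for (std::size_t col_blk = 0; col_blk < N; col_blk += kBlock) {
    for (std::size_t row_blk = 0; row_blk < N; row_blk += kBlock) {
      for (std::size_t k_blk = 0; k_blk < N; k_blk += kBlock) {
        for (std::size_t k = k_blk; k < k_blk + kBlock; ++k) {
          for (std::size_t row = row_blk; row < row_blk + kBlock; ++row) {
            for (std::size_t col = col_blk; col < col_blk + kBlock; ++col) {
              C[row * N + col] += A[row * N + k] * B[k * N + col];
            }
          }
        }
      }
    }
  }
}
```

In this arrangement a tile of B is read once per output tile instead of once per output row, which is where the extra locality comes from.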

GPU Implementations (CUDA)

  • baseline_cuda_mmul
    • A naive mmul implementation for NVIDIA GPUs written in CUDA
  • shmem_cuda_mmul
    • A cache-tiled mmul implementation for NVIDIA GPUs using shared memory

Contact Information

Contributors

  • coffeebeforearch