hankbo / matrix-matrix-multiply

This project forked from romz-pl/matrix-matrix-multiply

Algorithms for matrix matrix multiplication, dgemm, only for AVX-256

License: MIT License

C++ 96.41% C 0.67% CMake 2.92%

matrix-matrix-multiply's Introduction

Algorithms for matrix matrix multiplication, dgemm

The algorithms are taken from the books:

  1. David A. Patterson, John L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface, RISC-V Edition"
  2. David A. Patterson, John L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface, MIPS Edition"

The following algorithms are implemented:

  1. Basic, unoptimized, see src/basic.cpp
  2. Using AVX with 256-bit intrinsics, see src/avx256.cpp
  3. Using AVX with 512-bit intrinsics, see src/avx512.cpp
  4. Using AVX with 512-bit intrinsics and loop unrolling, see src/avx512_subword_parallel.cpp
  5. Basic, unoptimized, with blocking (the matrices are processed in blocks), see src/basic_blocked.cpp
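
As a rough illustration of what the basic, AVX 256-bit, and blocked variants in the list above do, here is a minimal sketch. It is not the code from this repository. The function names, the `Matrix` alias, the `block` parameter, and the column-major layout (borrowed from the textbook examples) are all assumptions made for this example. The AVX path is compiled only when the compiler defines `__AVX__` (for instance with `-mavx`); otherwise it falls back to the scalar loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>
#if defined(__AVX__)
#include <immintrin.h>
#endif

// Hypothetical storage: n*n doubles in column-major order (assumption).
using Matrix = std::vector<double>;

// C += A * B with the plain triple loop.
void dgemm_basic_sketch(std::size_t n, const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double cij = C[i + j * n];
            for (std::size_t k = 0; k < n; ++k)
                cij += A[i + k * n] * B[k + j * n];
            C[i + j * n] = cij;
        }
}

// C += A * B, four rows of C at a time with 256-bit AVX registers.
// Requires n to be a multiple of 4; uses unaligned loads and stores.
void dgemm_avx256_sketch(std::size_t n, const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(n % 4 == 0);
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
#if defined(__AVX__)
    for (std::size_t i = 0; i < n; i += 4)
        for (std::size_t j = 0; j < n; ++j) {
            __m256d c0 = _mm256_loadu_pd(&C[i + j * n]);
            for (std::size_t k = 0; k < n; ++k) {
                const __m256d a = _mm256_loadu_pd(&A[i + k * n]);     // 4 rows of column k
                const __m256d b = _mm256_broadcast_sd(&B[k + j * n]); // one element, replicated
                c0 = _mm256_add_pd(c0, _mm256_mul_pd(a, b));
            }
            _mm256_storeu_pd(&C[i + j * n], c0);
        }
#else
    dgemm_basic_sketch(n, A, B, C);  // scalar fallback when AVX is not enabled
#endif
}

// C += A * B computed tile by tile, so that the tiles being worked on
// have a better chance of staying in cache. `block` is the tile edge length.
void dgemm_blocked_sketch(std::size_t n, std::size_t block,
                          const Matrix& A, const Matrix& B, Matrix& C)
{
    assert(block > 0);
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t sj = 0; sj < n; sj += block)
        for (std::size_t si = 0; si < n; si += block)
            for (std::size_t sk = 0; sk < n; sk += block) {
                const std::size_t i_end = std::min(si + block, n);
                const std::size_t j_end = std::min(sj + block, n);
                const std::size_t k_end = std::min(sk + block, n);
                for (std::size_t i = si; i < i_end; ++i)
                    for (std::size_t j = sj; j < j_end; ++j) {
                        double cij = C[i + j * n];
                        for (std::size_t k = sk; k < k_end; ++k)
                            cij += A[i + k * n] * B[k + j * n];
                        C[i + j * n] = cij;
                    }
            }
}
```

All three functions accumulate into `C`, so the caller is expected to zero `C` first when a plain product is wanted. The block size that performs best depends on the cache sizes of the machine and is left as a parameter here.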

How to build?

To build the system, execute the following commands:

  1. git clone https://github.com/romz-pl/matrix-matrix-multiply
  2. cd matrix-matrix-multiply
  3. mkdir build
  4. cd build
  5. cmake ..
  6. make
  7. ./src/dgemm

The last command, ./src/dgemm, runs the program.

Results

  1. For a Core i7 CPU with matrix size 128, I obtained the following results, averaged over 1000 randomly generated matrices:
         dgemm_basic:  elapsed-time=      1661
 dgemm_basic_blocked:  elapsed-time=      1260     speed-up=   1.31825
        dgemm_avx256:  elapsed-time=       443     speed-up=   3.74944
        dgemm_avx512:  elapsed-time=       233     speed-up=   7.12876
      dgemm_unrolled:  elapsed-time=       106     speed-up=   15.6698
       dgemm_blocked:  elapsed-time=       100     speed-up=     16.61
  2. For a Core i7 CPU with matrix size 640, I obtained the following results, averaged over 10 randomly generated matrices:
         dgemm_basic:  elapsed-time=    241958
 dgemm_basic_blocked:  elapsed-time=    162224     speed-up=   1.49151
        dgemm_avx256:  elapsed-time=     66246     speed-up=   3.65242
        dgemm_avx512:  elapsed-time=     35604     speed-up=   6.79581
      dgemm_unrolled:  elapsed-time=     16634     speed-up=    14.546
       dgemm_blocked:  elapsed-time=     12981     speed-up=   18.6394
  3. For a Core i7 CPU with matrix size 1280, I obtained the following results, averaged over 5 randomly generated matrices:
         dgemm_basic:  elapsed-time=   4592295
 dgemm_basic_blocked:  elapsed-time=   1626700     speed-up=   2.82307
        dgemm_avx256:  elapsed-time=   1227037     speed-up=   3.74259
        dgemm_avx512:  elapsed-time=    637091     speed-up=   7.20822
      dgemm_unrolled:  elapsed-time=    558080     speed-up=   8.22874
       dgemm_blocked:  elapsed-time=    181634     speed-up=   25.2832
  4. For a Core i7 CPU with matrix size 2560, I obtained the following results for a single randomly generated matrix:
         dgemm_basic:  elapsed-time=  62731813
 dgemm_basic_blocked:  elapsed-time=  16474759     speed-up=   3.80775
        dgemm_avx256:  elapsed-time=  17050012     speed-up=   3.67928
        dgemm_avx512:  elapsed-time=   9012450     speed-up=   6.96057
      dgemm_unrolled:  elapsed-time=   5958033     speed-up=   10.5289
       dgemm_blocked:  elapsed-time=   1837494     speed-up=   34.1399
  5. For a Core i7 CPU with matrix size 5120, I obtained the following results for a single randomly generated matrix:
         dgemm_basic:  elapsed-time=1154120417
 dgemm_basic_blocked:  elapsed-time= 137582063     speed-up=    8.3886
        dgemm_avx256:  elapsed-time= 297156247     speed-up=   3.88388
        dgemm_avx512:  elapsed-time= 144941094     speed-up=   7.96269
      dgemm_unrolled:  elapsed-time=  97428303     speed-up=   11.8458
       dgemm_blocked:  elapsed-time=  18558107     speed-up=   62.1896
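
The speed-up values above are consistent with dividing the dgemm_basic elapsed time by the elapsed time of each variant, for example 1661 / 1260 ≈ 1.31825. The sketch below shows that arithmetic together with one possible way of timing a call. The helper names are hypothetical, and the microsecond unit is an assumption, since the output above does not state the unit of elapsed-time.

```cpp
#include <chrono>
#include <cstdint>

// Ratio of the baseline time to the variant time, as in the "speed-up" column.
double speed_up(std::int64_t baseline_elapsed, std::int64_t variant_elapsed)
{
    return static_cast<double>(baseline_elapsed) /
           static_cast<double>(variant_elapsed);
}

// Hypothetical helper: wall-clock duration of one call to f, in microseconds
// (the unit is an assumption made for this example).
template <typename F>
std::int64_t elapsed_us(F&& f)
{
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

A steady (monotonic) clock is used so that system clock adjustments during a long run do not distort the measurement.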

matrix-matrix-multiply's People

Contributors

romz-pl
