Coder Social home page Coder Social logo

anonymousywl / tsm2x-imp Goto Github PK

View Code? Open in Web Editor NEW

This project forked from codyjrivera/tsm2x-imp

0.0 0.0 0.0 3.87 MB

Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA

License: MIT License

Makefile 2.23% Cuda 84.58% C++ 6.49% Python 6.70%

tsm2x-imp's Introduction

TSM2X: High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on GPUs

by Cody Rivera [[email protected]], Jieyang Chen [[email protected]], and Dingwen Tao [[email protected]]

This repository contains an implementation of two irregular-shape matrix-matrix multiplication algorithms, TSM2R and TSM2L. TSM2R is designed to efficiently multiply a large square (or near-square) matrix by a tall-and-skinny matrix, or more specifically, an (m * k) and (k * n) matrix-matrix multiplication where m and k are approximately equal, and n is much smaller than k. TSM2L is designed to efficiently multiply a tall-and-skinny matrix by a small square matrix, or more specifically, an (m * k) and (k * n) matrix-matrix multiplication where k is much smaller than m, and k and n are approximately equal.

We propose TSM2R and TSM2L in our preprint, "TSM2X: High-Performance Tall-and-Skinny Matrix-MatrixMultiplication on GPUs." [1]. Our work extends an ICS conference paper [2], which introduces TSM2R, by expanding its techniques for different matrix sizes as well as porting the algorithm to the Nvidia Tesla V100.

We have implemented the kernels as templates, with the parameters t1, t2, and t3 as template variables [1]. The program will select an optimal kernel depending on the size of the input matrices. This repository currently provides a set of optimal kernels for the Nvidia V100 GPU only.

Instructions:

This implementation is designed for Unix platforms, and can be built using make. The usage of this program is: ./multiply [-d] [-i] a.mtx b.mtx c.mtx, where a.mtx and b.mtx are input matrices and c.mtx is an output matrix. -d indicates that the matrices are double-precision, while -i indicates that TSM2L (instead of TSM2R) is to be used.

The format of the matrices is binary, with a structure as follows:

template <typename FloatType>
struct matrixFormat {
    uint32_t rows, cols;
    FloatType values[rows * cols];
};

The matrix is stored in column-major format. All multibyte values are little-endian.

You may use the provided gen.cpp program to generate input matrices. The usage is ./gen [-d] -r ROW_COUNT -c COL_COUNT file, where -d signifies double precision.

You may also use the provided print.cpp program to print matrices. The usage is ./print [-d] file.

To evaluate performance across a range of inputs, a Python3 script test.py is provided. The script can be invoked with python3 test.py. The program requires that ../multiply and ../gen exist, and writes its output to CSV files.

Notes:

[1] Cody Rivera, Jieyang Chen, Nan Xiong, Shuaiwen Leon Song, and Dingwen Tao. "TSM2X: High-Performance Tall-and-Skinny Matrix-MatrixMultiplication on GPUs." 2020. arXiv:2002.03258 [cs.DC].

[2] Jieyang Chen, Nan Xiong, Xin Liang, Dingwen Tao, Sihuan Li, Kaiming Ouyang, Kai Zhao, Nathan DeBardeleben, Qiang Guan, and Zizhong Chen. "TSM2: optimizing tall-and-skinny matrix-matrix multiplication on GPUs." In Proceedings of the ACM International Conference on Supercomputing (ICS), pp. 106-116. ACM, 2019. https://doi.org/10.1145/3330345.3330355

tsm2x-imp's People

Contributors

codyjrivera avatar dingwentao avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.