superscientificsoftwarelaboratory / tilespgemm

Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, and Weifeng Liu.

Makefile 0.16% C 70.29% C++ 27.27% Cuda 2.12% Shell 0.16%

tilespgemm's Issues

run tilespgemm in cuda 12.2

Hi,
I am a student who wants to use your TileSpGEMM code.

Unfortunately, I am getting this error:

nvcc -O3 -w -arch=compute_61 -code=sm_86 -gencode=arch=compute_61,code=sm_86 -Xcompiler -fopenmp -Xcompiler -mfma main.cu -o test -I/usr/local/cuda-12.2/include -L/usr/local/cuda-12.2/lib64 -lcudart -lcusparse -D VALUE_TYPE=double
/usr/local/cuda-12.2/include/cuda_bf16.hpp(575): error: no instance of overloaded function "__half::__half" matches the specified type
__attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) __half::__half(const __nv_bfloat16 f)
^

1 error detected in the compilation of "main.cu".
make: *** [Makefile:29: make] Error 2

Can you please guide me? I am running Ubuntu 22.04 with a GTX 1660 Ti and CUDA Toolkit 12.2.

Thanks.

error: identifier "__builtin_ia32_rndscaless_round" is undefined

My environment:
OS: Ubuntu 20.04.1
CUDA: 11.5
GCC: 9.4.0
However, I run into AVX-512 errors when I run "make":
`sh-5.0$ make
nvcc -O3 -w -arch=compute_61 -code=sm_86 -gencode=arch=compute_61,code=sm_86 -Xcompiler -fopenmp main.cu -o test -I/usr/local/cuda-11.5/include -L/usr/local/cuda-11.5/lib64 -lcudart -lcusparse -D VALUE_TYPE=double
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9146): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9155): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14797): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14806): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1365): error: identifier "__builtin_ia32_fpclassss" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1372): error: identifier "__builtin_ia32_fpclasssd" is undefined`

error when compiling

Hi Niu!
You've done great work on SpGEMM using a tiling strategy. When I tried to run some experiments in my own environment, I ran into errors after running 'make'. I'm testing on an NVIDIA GeForce RTX 2080 Ti (compute capability 7.5) on Ubuntu 20.04 with CUDA 11.4.
Error output:

@Server:~/TileSpGEMM/src$ make
nvcc -O3 -w -arch=compute_61 -code=sm_75 -gencode=arch=compute_61,code=sm_75 -Xcompiler -fopenmp -Xcompiler -mfma main.cu -o test -I/home/zhanglx/cuda-11.4/include -L/home/zhanglx/cuda-11.4/lib64  -lcudart  -lcusparse   -D VALUE_TYPE=double
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9146): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9155): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14797): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14806): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1365): error: identifier "__builtin_ia32_fpclassss" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1372): error: identifier "__builtin_ia32_fpclasssd" is undefined

6 errors detected in the compilation of "main.cu".
make: *** [Makefile:29: make] Error 1

1. Maybe my GPU's compute capability does not match yours.
2. Maybe my gcc version does not match yours.
It would be highly appreciated if you could provide more information about your environment or give me some suggestions for resolving the compilation errors.
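
On the first guess above: the build line pairs the virtual architecture compute_61 with the real target sm_75. As a minimal sketch, the snippet below composes an nvcc -gencode flag in which both parts match a given compute capability. The helper name gencode_flag is hypothetical and is not part of the repository's Makefile, and nothing here establishes that the architecture flags are related to the AVX-512 header errors shown in the log.

```python
def gencode_flag(major: int, minor: int) -> str:
    """Build an nvcc -gencode flag for a single compute capability.

    Hypothetical helper for illustration only; not taken from the
    TileSpGEMM Makefile.
    """
    cc = f"{major}{minor}"
    # compute_XY names the virtual (PTX) architecture,
    # sm_XY names the real GPU architecture the binary is built for.
    return f"-gencode=arch=compute_{cc},code=sm_{cc}"


# RTX 2080 Ti, compute capability 7.5 as stated in the report above.
print(gencode_flag(7, 5))
```

The printed flag has the same form as the -gencode option already present in the build line, so it could be compared against that line directly.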

Failed internal tests

On this version of the code: #2

and (at least) on the following matrices from https://sparse.tamu.edu/Williams:

cant/cant.mtx
pdb1HYS/pdb1HYS.mtx

with the flag -D CHECK_RESULT=1, the code produced the following output, which shows that the tests failed:

Input:

./test -d 0 -aat 0 cant/cant.mtx

Output:

--------------------------------!!!!!!!!------------------------------------
device_id = 0
---------------------------------------------------------------
Device [ 0 ] GeForce GTX 1650 Ti @ 1485.00 MHz
MAT: -------------- cant/cant.mtx --------------
input matrix A: ( 62451, 62451 ) nnz = 4007383
 loadfile time    = 0.67493 sec
the tilesize = 16
SpGEMM nnzCub = 269486473
CSR to Tile conversion uses 28.78 ms
tile space overhead = 37.74 MB
step1 ----Calculate the number and tile-column index of tiles of matrixC---
step1 ---------------------- Runtime is  0.37 ms-------------------------

step2 --------Calculate the number of nonzeros of each tile of matrixC-----
step2 ---------------------- Runtime is  4.06 ms-------------------------

step3 ---------Calculate the val&col of nonzeros of matrixC-------------
step3 ---------------------- Runtime is  48.40 ms------------------------

-----------------------Malloc uses 0.71 ms-------------------------------
Non-empty tiles of C = 194910
nnzC = 17440029
CUDA  TileSpGEMM runtime is 53.63 ms, gflops = 10.05
-------------------------------check----------------------------------------
tile to CSR conversion complete!

--------------- SpGEMM (using cuSPARSE) ---------------
 - cuda SpGEMM start! Benchmark runs 1 times.
 - cuda SpGEMM completed!

nnzC = 0, nnzCub = 269486473, Compression rate =  inf
CUDA  cuSPARSE SpGEMM runtime is 1.3550 ms, GFlops = 397.7660
cuSPARSE failed!
---------------------------------------------------------------
---------------------------------------------------------------

Input:

./test -d 0 -aat 0 pdb1HYS/pdb1HYS.mtx

Output:

--------------------------------!!!!!!!!------------------------------------
device_id = 0
---------------------------------------------------------------
Device [ 0 ] GeForce GTX 1650 Ti @ 1485.00 MHz
MAT: -------------- pdb1HYS/pdb1HYS.mtx --------------
input matrix A: ( 36417, 36417 ) nnz = 4344765
 loadfile time    = 0.69516 sec
the tilesize = 16
SpGEMM nnzCub = 555322659
CSR to Tile conversion uses 33.98 ms
tile space overhead = 40.01 MB
step1 ----Calculate the number and tile-column index of tiles of matrixC---
step1 ---------------------- Runtime is  0.34 ms-------------------------

step2 --------Calculate the number of nonzeros of each tile of matrixC-----
step2 ---------------------- Runtime is  6.93 ms-------------------------

step3 ---------Calculate the val&col of nonzeros of matrixC-------------
step3 ---------------------- Runtime is  93.50 ms------------------------

-----------------------Malloc uses 0.95 ms-------------------------------
Non-empty tiles of C = 221571
nnzC = 19594581
CUDA  TileSpGEMM runtime is 101.79 ms, gflops = 10.91
-------------------------------check----------------------------------------
tile to CSR conversion complete!

--------------- SpGEMM (using cuSPARSE) ---------------
 - cuda SpGEMM start! Benchmark runs 1 times.
 - cuda SpGEMM completed!

nnzC = 0, nnzCub = 555322659, Compression rate =  inf
CUDA  cuSPARSE SpGEMM runtime is 1.3250 ms, GFlops = 838.2229
cuSPARSE failed!
---------------------------------------------------------------
---------------------------------------------------------------
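
As a cross-check on the figures in the two logs above, the sketch below recomputes the throughput and compression rate from the printed counts. It assumes the program counts 2 flops (one multiply, one add) per intermediate product in nnzCub and defines the compression rate as nnzCub / nnzC. Both formulas are inferred from the logged numbers, not taken from the source code, and the function names are made up for illustration.

```python
import math


def gflops(nnz_cub: int, runtime_ms: float) -> float:
    """Throughput in GFlops, assuming 2 flops per intermediate product."""
    assert runtime_ms > 0
    return 2.0 * nnz_cub / (runtime_ms * 1e-3) / 1e9


def compression_rate(nnz_cub: int, nnz_c: int) -> float:
    """Intermediate products per stored nonzero of C; infinite if C is empty."""
    return math.inf if nnz_c == 0 else nnz_cub / nnz_c


# Figures copied from the cant/cant.mtx log above.
nnz_cub = 269486473
print(f"TileSpGEMM gflops = {gflops(nnz_cub, 53.63):.2f}")
print(f"cuSPARSE   GFlops = {gflops(nnz_cub, 1.3550):.4f}")
print(f"compression rate  = {compression_rate(nnz_cub, 0)}")
```

Under these assumptions the first two lines match the logged 10.05 and 397.7660, and the third gives inf. The cuSPARSE figure divides the same flop count by a 1.355 ms run that reported nnzC = 0, so it most likely reflects the failed call and not real throughput.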

However, when run against https://sparse.tamu.edu/SNAP/CollegeMsg,

Input:

./test -d 0 -aat 0 CollegeMsg/CollegeMsg.mtx

Output:

--------------------------------!!!!!!!!------------------------------------
device_id = 0
---------------------------------------------------------------
Device [ 0 ] GeForce GTX 1650 Ti @ 1485.00 MHz
MAT: -------------- /home/elvircrn/tug/thesis/repo/matrices/CollegeMsg/CollegeMsg.mtx --------------
input matrix A: ( 1899, 1899 ) nnz = 20296
 loadfile time    = 0.00273 sec
the tilesize = 16
SpGEMM nnzCub = 744395
CSR to Tile conversion uses 1.14 ms
tile space overhead = 0.61 MB
step1 ----Calculate the number and tile-column index of tiles of matrixC---
step1 ---------------------- Runtime is  0.20 ms-------------------------

step2 --------Calculate the number of nonzeros of each tile of matrixC-----
step2 ---------------------- Runtime is  0.90 ms-------------------------

step3 ---------Calculate the val&col of nonzeros of matrixC-------------
step3 ---------------------- Runtime is  3.51 ms------------------------

-----------------------Malloc uses 0.46 ms-------------------------------
Non-empty tiles of C = 14154
nnzC = 407071
CUDA  TileSpGEMM runtime is 5.17 ms, gflops = 0.29
-------------------------------check----------------------------------------
tile to CSR conversion complete!

--------------- SpGEMM (using cuSPARSE) ---------------
 - cuda SpGEMM start! Benchmark runs 1 times.
 - cuda SpGEMM completed!

nnzC = 407071, nnzCub = 744395, Compression rate = 1.83
CUDA  cuSPARSE SpGEMM runtime is 1.7550 ms, GFlops = 0.8483

Validating results...
[PASSED] nnzC = 407071
[PASSED] row_pointer
[PASSED] column_index & value
---------------------------------------------------------------
---------------------------------------------------------------

the code passes its own tests.
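
The three [PASSED] lines in the log above name the checks: nnzC, row_pointer, and column_index & value. The sketch below shows one way such a comparison between two CSR results could be written. It is not the repository's checking code. The function name, the tuple layout (row_ptr, col_idx, val), the per-row sort, and the tolerance are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed layout for this sketch: (row_ptr, col_idx, val)
CSR = Tuple[Sequence[int], Sequence[int], Sequence[float]]


def validate_csr(ref: CSR, out: CSR, rel_tol: float = 1e-9) -> List[str]:
    """Return the names of the failed checks; an empty list means all passed."""
    ref_ptr, ref_col, ref_val = ref
    out_ptr, out_col, out_val = out
    failed = []

    # Check 1: total number of stored nonzeros.
    if len(ref_col) != len(out_col):
        failed.append("nnzC")

    # Check 2: row pointers, i.e. the number of nonzeros in each row.
    if list(ref_ptr) != list(out_ptr):
        failed.append("row_pointer")

    # Check 3: per-row (column, value) pairs, only if the layout matches.
    if not failed:
        for r in range(len(ref_ptr) - 1):
            lo, hi = ref_ptr[r], ref_ptr[r + 1]
            # Sort within the row so that column order does not matter.
            a = sorted(zip(ref_col[lo:hi], ref_val[lo:hi]))
            b = sorted(zip(out_col[lo:hi], out_val[lo:hi]))
            same = all(
                ca == cb and abs(va - vb) <= rel_tol * max(1.0, abs(va))
                for (ca, va), (cb, vb) in zip(a, b)
            )
            if not same:
                failed.append("column_index & value")
                break
    return failed


ref = ([0, 2, 3], [0, 1, 1], [1.0, 2.0, 3.0])
out = ([0, 2, 3], [1, 0, 1], [2.0, 1.0, 3.0])  # same matrix, row 0 unsorted
print(validate_csr(ref, out))  # []
```

With a reference result that is empty (nnzC = 0, as in the two failing runs), a check of this kind would fail at the first step regardless of whether the tiled result is correct.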

Given the failures above, I was unable to reproduce the results from the paper with this setup. Please let me know if more information is needed or if I have made an error at some point.

Thanks,
Elvir
