superscientificsoftwarelaboratory / tilespgemm

Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, and Weifeng Liu.

Makefile 0.16% C 70.29% C++ 27.27% Cuda 2.12% Shell 0.16%

tilespgemm's People

Contributors

luzy0726, yuyaoniu

tilespgemm's Issues

run tilespgemm in cuda 12.2

Hi,
I am a student who wants to use your TileSpGEMM code.

Unfortunately, I am running into this error:

nvcc -O3 -w -arch=compute_61 -code=sm_86 -gencode=arch=compute_61,code=sm_86 -Xcompiler -fopenmp -Xcompiler -mfma main.cu -o test -I/usr/local/cuda-12.2/include -L/usr/local/cuda-12.2/lib64 -lcudart -lcusparse -D VALUE_TYPE=double
/usr/local/cuda-12.2/include/cuda_bf16.hpp(575): error: no instance of overloaded function "__half::__half" matches the specified type
__attribute__((host)) __attribute__((device)) inline __attribute__((always_inline)) __half::__half(const __nv_bfloat16 f)
^

1 error detected in the compilation of "main.cu".
make: *** [Makefile:29: make] Error 2

Can you please guide me? I am running Ubuntu 22.04 with a GTX 1660 Ti GPU and CUDA Toolkit 12.2.

Thanks.

error: identifier "__builtin_ia32_rndscaless_round" is undefined

My environment is:
OS: Ubuntu 20.04.1
CUDA: 11.5
gcc: 9.4.0
However, I run into a problem with the AVX-512 headers when I run "make":
`sh-5.0$ make
nvcc -O3 -w -arch=compute_61 -code=sm_86 -gencode=arch=compute_61,code=sm_86 -Xcompiler -fopenmp main.cu -o test -I/usr/local/cuda-11.5/include -L/usr/local/cuda-11.5/lib64 -lcudart -lcusparse -D VALUE_TYPE=double
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9146): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9155): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14797): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14806): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1365): error: identifier "__builtin_ia32_fpclassss" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1372): error: identifier "__builtin_ia32_fpclasssd" is undefined`

error when compiling

Hi Niu!
You've done great work on SpGEMM using the tiling strategy. When I tried to run some experiments in my own environment, I hit some errors after running 'make'. I'm testing on an NVIDIA GeForce RTX 2080 Ti with compute capability 7.5, on Ubuntu 20.04 with CUDA 11.4.
Error output:

@Server:~/TileSpGEMM/src$ make
nvcc -O3 -w -arch=compute_61 -code=sm_75 -gencode=arch=compute_61,code=sm_75 -Xcompiler -fopenmp -Xcompiler -mfma main.cu -o test -I/home/zhanglx/cuda-11.4/include -L/home/zhanglx/cuda-11.4/lib64  -lcudart  -lcusparse   -D VALUE_TYPE=double
/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9146): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(9155): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14797): error: identifier "__builtin_ia32_rndscaless_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512fintrin.h(14806): error: identifier "__builtin_ia32_rndscalesd_round" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1365): error: identifier "__builtin_ia32_fpclassss" is undefined

/usr/lib/gcc/x86_64-linux-gnu/9/include/avx512dqintrin.h(1372): error: identifier "__builtin_ia32_fpclasssd" is undefined

6 errors detected in the compilation of "main.cu".
make: *** [Makefile:29: make] Error 1

Two possible causes come to mind:
1. The compute capability of my GPU may not match yours.
2. My gcc version may not match yours.
It would be highly appreciated if you could provide further information about your environment or give me some suggestions for solving the compilation error.
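
For reference, the -gencode option in the build commands above pairs a virtual architecture (compute_XY) with a real one (sm_XY), where XY is the GPU's compute capability. The sketch below is a hypothetical helper, not part of the TileSpGEMM Makefile, that builds the option for a given capability. For the RTX 2080 Ti mentioned above (compute capability 7.5) it would produce compute_75 and sm_75. Whether changing this flag has any effect on the header errors shown above is not established here.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (an illustration only, not taken from the repository):
// builds an nvcc -gencode option for a GPU with compute capability
// major.minor, e.g. 7.5 -> compute_75 / sm_75.
std::string gencode_flag(int major, int minor) {
    assert(major >= 1 && minor >= 0 && minor <= 9);
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode=arch=compute_" + cc + ",code=sm_" + cc;
}
```

Calling gencode_flag(7, 5) returns the string "-gencode=arch=compute_75,code=sm_75", which could then be substituted for the corresponding option in the nvcc command line.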

Failed internal tests

On this version of the code: #2

and (at least) on the following matrices from https://sparse.tamu.edu/Williams:

  • cant/cant.mtx
  • pdb1HYS/pdb1HYS.mtx

with the flag -D CHECK_RESULT=1, the code produced the following output, noting that the tests have failed:

Input:

./test -d 0 -aat 0 cant/cant.mtx

Output:

--------------------------------!!!!!!!!------------------------------------
device_id = 0
---------------------------------------------------------------
Device [ 0 ] GeForce GTX 1650 Ti @ 1485.00 MHz
MAT: -------------- cant/cant.mtx --------------
input matrix A: ( 62451, 62451 ) nnz = 4007383
 loadfile time    = 0.67493 sec
the tilesize = 16
SpGEMM nnzCub = 269486473
CSR to Tile conversion uses 28.78 ms
tile space overhead = 37.74 MB
step1 ----Calculate the number and tile-column index of tiles of matrixC---
step1 ---------------------- Runtime is  0.37 ms-------------------------

step2 --------Calculate the number of nonzeros of each tile of matrixC-----
step2 ---------------------- Runtime is  4.06 ms-------------------------

step3 ---------Calculate the val&col of nonzeros of matrixC-------------
step3 ---------------------- Runtime is  48.40 ms------------------------

-----------------------Malloc uses 0.71 ms-------------------------------
Non-empty tiles of C = 194910
nnzC = 17440029
CUDA  TileSpGEMM runtime is 53.63 ms, gflops = 10.05
-------------------------------check----------------------------------------
tile to CSR conversion complete!

--------------- SpGEMM (using cuSPARSE) ---------------
 - cuda SpGEMM start! Benchmark runs 1 times.
 - cuda SpGEMM completed!

nnzC = 0, nnzCub = 269486473, Compression rate =  inf
CUDA  cuSPARSE SpGEMM runtime is 1.3550 ms, GFlops = 397.7660
cuSPARSE failed!
---------------------------------------------------------------
---------------------------------------------------------------

Input:

./test -d 0 -aat 0 pdb1HYS/pdb1HYS.mtx

Output:

--------------------------------!!!!!!!!------------------------------------
device_id = 0
---------------------------------------------------------------
Device [ 0 ] GeForce GTX 1650 Ti @ 1485.00 MHz
MAT: -------------- pdb1HYS/pdb1HYS.mtx --------------
input matrix A: ( 36417, 36417 ) nnz = 4344765
 loadfile time    = 0.69516 sec
the tilesize = 16
SpGEMM nnzCub = 555322659
CSR to Tile conversion uses 33.98 ms
tile space overhead = 40.01 MB
step1 ----Calculate the number and tile-column index of tiles of matrixC---
step1 ---------------------- Runtime is  0.34 ms-------------------------

step2 --------Calculate the number of nonzeros of each tile of matrixC-----
step2 ---------------------- Runtime is  6.93 ms-------------------------

step3 ---------Calculate the val&col of nonzeros of matrixC-------------
step3 ---------------------- Runtime is  93.50 ms------------------------

-----------------------Malloc uses 0.95 ms-------------------------------
Non-empty tiles of C = 221571
nnzC = 19594581
CUDA  TileSpGEMM runtime is 101.79 ms, gflops = 10.91
-------------------------------check----------------------------------------
tile to CSR conversion complete!

--------------- SpGEMM (using cuSPARSE) ---------------
 - cuda SpGEMM start! Benchmark runs 1 times.
 - cuda SpGEMM completed!

nnzC = 0, nnzCub = 555322659, Compression rate =  inf
CUDA  cuSPARSE SpGEMM runtime is 1.3250 ms, GFlops = 838.2229
cuSPARSE failed!
---------------------------------------------------------------
---------------------------------------------------------------

However, when run against https://sparse.tamu.edu/SNAP/CollegeMsg,

Input:

./test -d 0 -aat 0 CollegeMsg/CollegeMsg.mtx

Output:

--------------------------------!!!!!!!!------------------------------------
device_id = 0
---------------------------------------------------------------
Device [ 0 ] GeForce GTX 1650 Ti @ 1485.00 MHz
MAT: -------------- /home/elvircrn/tug/thesis/repo/matrices/CollegeMsg/CollegeMsg.mtx --------------
input matrix A: ( 1899, 1899 ) nnz = 20296
 loadfile time    = 0.00273 sec
the tilesize = 16
SpGEMM nnzCub = 744395
CSR to Tile conversion uses 1.14 ms
tile space overhead = 0.61 MB
step1 ----Calculate the number and tile-column index of tiles of matrixC---
step1 ---------------------- Runtime is  0.20 ms-------------------------

step2 --------Calculate the number of nonzeros of each tile of matrixC-----
step2 ---------------------- Runtime is  0.90 ms-------------------------

step3 ---------Calculate the val&col of nonzeros of matrixC-------------
step3 ---------------------- Runtime is  3.51 ms------------------------

-----------------------Malloc uses 0.46 ms-------------------------------
Non-empty tiles of C = 14154
nnzC = 407071
CUDA  TileSpGEMM runtime is 5.17 ms, gflops = 0.29
-------------------------------check----------------------------------------
tile to CSR conversion complete!

--------------- SpGEMM (using cuSPARSE) ---------------
 - cuda SpGEMM start! Benchmark runs 1 times.
 - cuda SpGEMM completed!

nnzC = 407071, nnzCub = 744395, Compression rate = 1.83
CUDA  cuSPARSE SpGEMM runtime is 1.7550 ms, GFlops = 0.8483

Validating results...
[PASSED] nnzC = 407071
[PASSED] row_pointer
[PASSED] column_index & value
---------------------------------------------------------------
---------------------------------------------------------------

the code passes its own tests.

As a result, I was unable to reproduce the results from the paper with this setup. Please let me know if I have made an error at some point, or if more information is needed.
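
The GFlops and compression-rate figures in the logs above can be reproduced with simple arithmetic. The sketch below assumes, based only on the printed numbers and not on the source code, that GFlops is computed as 2 * nnzCub / runtime and that the compression rate is nnzCub / nnzC. Under that assumption, a cuSPARSE result with nnzC = 0 yields the "inf" compression rate seen in the failing runs. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumed formula, inferred from the logs: each intermediate product counts
// as one multiply plus one add, so flops = 2 * nnzCub.
double gflops(long long nnzCub, double runtime_ms) {
    assert(runtime_ms > 0.0);
    return 2.0 * static_cast<double>(nnzCub) / (runtime_ms * 1e6);
}

// Assumed formula: intermediate products per nonzero of C. A reference
// result with nnzC == 0 gives infinity, mirroring the "inf" in the logs.
double compression_rate(long long nnzCub, long long nnzC) {
    if (nnzC == 0) {
        return std::numeric_limits<double>::infinity();
    }
    return static_cast<double>(nnzCub) / static_cast<double>(nnzC);
}
```

With the cant.mtx values, gflops(269486473, 53.63) is about 10.05 and gflops(269486473, 1.355) is about 397.77, matching the TileSpGEMM and cuSPARSE lines in the log. For CollegeMsg.mtx, compression_rate(744395, 407071) is about 1.83.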

Thanks,
Elvir
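
When the cuSPARSE reference reports nnzC = 0, as in the failing runs above, one way to cross-check the TileSpGEMM counts independently is a small CPU-side symbolic pass. The sketch below is a row-by-row (Gustavson-style) count and is not code from the repository. It assumes that nnzCub denotes the number of intermediate products and that nnzC counts structural nonzeros (entries that cancel numerically to zero are still counted). The Csr and SpgemmCounts types and the count_spgemm function are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CSR pattern container (hypothetical; values are not needed here).
struct Csr {
    int rows = 0;
    int cols = 0;
    std::vector<int> row_ptr;  // size rows + 1
    std::vector<int> col_idx;  // size nnz
};

struct SpgemmCounts {
    long long nnzCub = 0;  // number of intermediate products
    long long nnzC = 0;    // structural nonzeros of C = A * B
};

// Symbolic pass over C = A * B: for each row i of A, visit every row k of B
// selected by a nonzero A(i, k) and mark the columns it touches.
SpgemmCounts count_spgemm(const Csr& A, const Csr& B) {
    assert(A.cols == B.rows);
    assert(A.row_ptr.size() == static_cast<std::size_t>(A.rows) + 1);
    assert(B.row_ptr.size() == static_cast<std::size_t>(B.rows) + 1);

    SpgemmCounts out;
    // last_seen[j] holds the most recent row of C that produced column j.
    std::vector<int> last_seen(static_cast<std::size_t>(B.cols), -1);
    for (int i = 0; i < A.rows; ++i) {
        for (int p = A.row_ptr[i]; p < A.row_ptr[i + 1]; ++p) {
            const int k = A.col_idx[p];
            for (int q = B.row_ptr[k]; q < B.row_ptr[k + 1]; ++q) {
                ++out.nnzCub;
                const int j = B.col_idx[q];
                if (last_seen[j] != i) {  // first time column j appears in row i
                    last_seen[j] = i;
                    ++out.nnzC;
                }
            }
        }
    }
    return out;
}
```

Loading a matrix such as cant.mtx into this structure and calling count_spgemm(A, A) would give counts to compare against the "SpGEMM nnzCub" and "nnzC" lines printed by TileSpGEMM, assuming the "-aat 0" runs above compute C = A * A.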
