On this version of the code: #2
and (at least) on the following matrices from https://sparse.tamu.edu/Williams:
- cant/cant.mtx
- pdb1HYS/pdb1HYS.mtx
with the flag -D CHECK_RESULT=1, the code produced the following output, in which the result check fails ("cuSPARSE failed!"):
Input:
./test -d 0 -aat 0 cant/cant.mtx
Output:
--------------------------------!!!!!!!!------------------------------------
device_id = 0
---------------------------------------------------------------
Device [ 0 ] GeForce GTX 1650 Ti @ 1485.00 MHz
MAT: -------------- cant/cant.mtx --------------
input matrix A: ( 62451, 62451 ) nnz = 4007383
loadfile time = 0.67493 sec
the tilesize = 16
SpGEMM nnzCub = 269486473
CSR to Tile conversion uses 28.78 ms
tile space overhead = 37.74 MB
step1 ----Calculate the number and tile-column index of tiles of matrixC---
step1 ---------------------- Runtime is 0.37 ms-------------------------
step2 --------Calculate the number of nonzeros of each tile of matrixC-----
step2 ---------------------- Runtime is 4.06 ms-------------------------
step3 ---------Calculate the val&col of nonzeros of matrixC-------------
step3 ---------------------- Runtime is 48.40 ms------------------------
-----------------------Malloc uses 0.71 ms-------------------------------
Non-empty tiles of C = 194910
nnzC = 17440029
CUDA TileSpGEMM runtime is 53.63 ms, gflops = 10.05
-------------------------------check----------------------------------------
tile to CSR conversion complete!
--------------- SpGEMM (using cuSPARSE) ---------------
- cuda SpGEMM start! Benchmark runs 1 times.
- cuda SpGEMM completed!
nnzC = 0, nnzCub = 269486473, Compression rate = inf
CUDA cuSPARSE SpGEMM runtime is 1.3550 ms, GFlops = 397.7660
cuSPARSE failed!
---------------------------------------------------------------
---------------------------------------------------------------
Input:
./test -d 0 -aat 0 pdb1HYS/pdb1HYS.mtx
Output:
--------------------------------!!!!!!!!------------------------------------
device_id = 0
---------------------------------------------------------------
Device [ 0 ] GeForce GTX 1650 Ti @ 1485.00 MHz
MAT: -------------- pdb1HYS/pdb1HYS.mtx --------------
input matrix A: ( 36417, 36417 ) nnz = 4344765
loadfile time = 0.69516 sec
the tilesize = 16
SpGEMM nnzCub = 555322659
CSR to Tile conversion uses 33.98 ms
tile space overhead = 40.01 MB
step1 ----Calculate the number and tile-column index of tiles of matrixC---
step1 ---------------------- Runtime is 0.34 ms-------------------------
step2 --------Calculate the number of nonzeros of each tile of matrixC-----
step2 ---------------------- Runtime is 6.93 ms-------------------------
step3 ---------Calculate the val&col of nonzeros of matrixC-------------
step3 ---------------------- Runtime is 93.50 ms------------------------
-----------------------Malloc uses 0.95 ms-------------------------------
Non-empty tiles of C = 221571
nnzC = 19594581
CUDA TileSpGEMM runtime is 101.79 ms, gflops = 10.91
-------------------------------check----------------------------------------
tile to CSR conversion complete!
--------------- SpGEMM (using cuSPARSE) ---------------
- cuda SpGEMM start! Benchmark runs 1 times.
- cuda SpGEMM completed!
nnzC = 0, nnzCub = 555322659, Compression rate = inf
CUDA cuSPARSE SpGEMM runtime is 1.3250 ms, GFlops = 838.2229
cuSPARSE failed!
---------------------------------------------------------------
---------------------------------------------------------------
However, when run against https://sparse.tamu.edu/SNAP/CollegeMsg,
Input:
./test -d 0 -aat 0 CollegeMsg/CollegeMsg.mtx
Output:
--------------------------------!!!!!!!!------------------------------------
device_id = 0
---------------------------------------------------------------
Device [ 0 ] GeForce GTX 1650 Ti @ 1485.00 MHz
MAT: -------------- /home/elvircrn/tug/thesis/repo/matrices/CollegeMsg/CollegeMsg.mtx --------------
input matrix A: ( 1899, 1899 ) nnz = 20296
loadfile time = 0.00273 sec
the tilesize = 16
SpGEMM nnzCub = 744395
CSR to Tile conversion uses 1.14 ms
tile space overhead = 0.61 MB
step1 ----Calculate the number and tile-column index of tiles of matrixC---
step1 ---------------------- Runtime is 0.20 ms-------------------------
step2 --------Calculate the number of nonzeros of each tile of matrixC-----
step2 ---------------------- Runtime is 0.90 ms-------------------------
step3 ---------Calculate the val&col of nonzeros of matrixC-------------
step3 ---------------------- Runtime is 3.51 ms------------------------
-----------------------Malloc uses 0.46 ms-------------------------------
Non-empty tiles of C = 14154
nnzC = 407071
CUDA TileSpGEMM runtime is 5.17 ms, gflops = 0.29
-------------------------------check----------------------------------------
tile to CSR conversion complete!
--------------- SpGEMM (using cuSPARSE) ---------------
- cuda SpGEMM start! Benchmark runs 1 times.
- cuda SpGEMM completed!
nnzC = 407071, nnzCub = 744395, Compression rate = 1.83
CUDA cuSPARSE SpGEMM runtime is 1.7550 ms, GFlops = 0.8483
Validating results...
[PASSED] nnzC = 407071
[PASSED] row_pointer
[PASSED] column_index & value
---------------------------------------------------------------
---------------------------------------------------------------
the code passes its own tests.
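For reference, the figures in the three logs appear consistent with GFlops = 2 * nnzCub / runtime and Compression rate = nnzCub / nnzC. I inferred these formulas from the printed numbers, not from the source, so the formulas and the helper names below (gflops, compression_rate) are assumptions. If they hold, the cuSPARSE GFlops values for cant and pdb1HYS are probably not meaningful, because cuSPARSE reports nnzC = 0 in those runs. A minimal sketch that recomputes the logged values:

```python
import math


def gflops(nnz_cub, runtime_ms):
    """Assumed formula: 2 flops (multiply + add) per intermediate product."""
    return 2.0 * nnz_cub / (runtime_ms * 1e6)


def compression_rate(nnz_cub, nnz_c):
    """Assumed formula: intermediate products per output nonzero."""
    return nnz_cub / nnz_c if nnz_c else math.inf


# Values copied from the logs above:
# (nnzCub, TileSpGEMM ms, cuSPARSE ms, nnzC reported by cuSPARSE)
runs = {
    "cant": (269486473, 53.63, 1.3550, 0),
    "pdb1HYS": (555322659, 101.79, 1.3250, 0),
    "CollegeMsg": (744395, 5.17, 1.7550, 407071),
}

for name, (nnz_cub, tile_ms, cusparse_ms, cusparse_nnz_c) in runs.items():
    print(
        f"{name}: tile gflops = {gflops(nnz_cub, tile_ms):.2f}, "
        f"cuSPARSE gflops = {gflops(nnz_cub, cusparse_ms):.4f}, "
        f"compression rate = {compression_rate(nnz_cub, cusparse_nnz_c):.2f}"
    )
```

For all three matrices the recomputed values match the logs to the printed precision, including "Compression rate = inf" where nnzC = 0.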
As a result, I was unable to reproduce the results from the paper with this setup. Please let me know if I have made an error at some point, or if more information is needed.
Thanks,
Elvir