Comments (12)

gau-nernst commented on June 9, 2024

4070Ti Super, running Ubuntu 22.04.
torch==2.4.0.dev20240426+cu121
bfloat16, cutlass

Fixed k

|    | m | n | k | sparse_latency (ms) | dense_latency (ms) | speedup (d/s) |
|---|---|---|---|---|---|---|
| 0 | 3072 | 3072 | 10240 | 1.10574 | 2.131 | 1.92722 |
| 1 | 4096 | 4096 | 10240 | 1.9605 | 3.73044 | 1.9028 |
| 2 | 5120 | 5120 | 10240 | 3.12083 | 6.10269 | 1.95547 |
| 3 | 6144 | 6144 | 10240 | 4.74411 | 8.79509 | 1.8539 |
| 4 | 7168 | 7168 | 10240 | 7.29741 | 11.9486 | 1.63738 |
| 5 | 8192 | 8192 | 10240 | 10.6073 | 15.4296 | 1.45462 |
| 6 | 9216 | 9216 | 10240 | 13.6835 | 19.1741 | 1.40125 |
| 7 | 10240 | 10240 | 10240 | 16.8367 | 23.4461 | 1.39256 |
| 8 | 11264 | 11264 | 10240 | 20.37 | 28.2801 | 1.38832 |
| 9 | 12288 | 12288 | 10240 | 24.1402 | 33.545 | 1.38959 |
| 10 | 13312 | 13312 | 10240 | 28.4292 | 39.2493 | 1.3806 |
| 11 | 14336 | 14336 | 10240 | 32.851 | 45.5614 | 1.38691 |
| 12 | 15360 | 15360 | 10240 | 37.7906 | 54.6426 | 1.44593 |
| 13 | 16384 | 16384 | 10240 | 42.789 | 63.5041 | 1.48412 |
| 14 | 17408 | 17408 | 10240 | 48.5377 | 69.684 | 1.43567 |
| 15 | 18432 | 18432 | 10240 | 54.2561 | 77.7116 | 1.43231 |
| 16 | 19456 | 19456 | 10240 | 60.3411 | 85.183 | 1.41169 |
| 17 | 20480 | 20480 | 10240 | 66.7151 | 97.5466 | 1.46214 |

Fixed mn

|    | m | n | k | sparse_latency (ms) | dense_latency (ms) | speedup (d/s) |
|---|---|---|---|---|---|---|
| 0 | 10240 | 10240 | 2560 | 3.12135 | 6.23817 | 1.99855 |
| 1 | 10240 | 10240 | 3840 | 4.59394 | 9.28166 | 2.02041 |
| 2 | 10240 | 10240 | 5120 | 7.15086 | 12.251 | 1.71322 |
| 3 | 10240 | 10240 | 6400 | 10.5324 | 14.7059 | 1.39625 |
| 4 | 10240 | 10240 | 7680 | 13.0499 | 18.0573 | 1.38372 |
| 5 | 10240 | 10240 | 8960 | 15.3995 | 20.6897 | 1.34353 |
| 6 | 10240 | 10240 | 10240 | 16.8406 | 23.4697 | 1.39364 |
| 7 | 10240 | 10240 | 11520 | 19.2673 | 26.2984 | 1.36493 |
| 8 | 10240 | 10240 | 12800 | 20.9322 | 29.0503 | 1.38782 |
| 9 | 10240 | 10240 | 14080 | 23.14 | 31.9612 | 1.38121 |
| 10 | 10240 | 10240 | 15360 | 25.6844 | 34.6865 | 1.35049 |
| 11 | 10240 | 10240 | 16640 | 26.2421 | 37.4893 | 1.42859 |
| 12 | 10240 | 10240 | 17920 | 30.1967 | 40.3297 | 1.33556 |
| 13 | 10240 | 10240 | 19200 | 32.4673 | 43.1666 | 1.32954 |
| 14 | 10240 | 10240 | 20480 | 33.5382 | 46.002 | 1.37163 |

SAM ViT-B shapes

|    | m | n | k | sparse_latency (ms) | dense_latency (ms) | speedup (d/s) |
|---|---|---|---|---|---|---|
| 0 | 32768 | 768 | 3072 | 1.22253 | 1.7901 | 1.46426 |
| 1 | 32768 | 2304 | 768 | 0.787232 | 1.33425 | 1.69486 |
| 2 | 32768 | 3072 | 768 | 1.04701 | 1.74003 | 1.66191 |
| 3 | 32768 | 768 | 768 | 0.271155 | 0.437884 | 1.61488 |
| 4 | 39200 | 2304 | 768 | 0.948154 | 1.5765 | 1.66271 |
| 5 | 39200 | 768 | 768 | 0.324627 | 0.510302 | 1.57196 |

I omitted some redundant columns from the saved CSV file. The correct and contiguous columns are all True.
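For anyone cross-checking the numbers: the speedup (d/s) column is the dense latency divided by the sparse latency. A minimal sketch of that check, using three rows copied from the tables above (the helper name `speedup` is made up for illustration, and the reported latencies are rounded, so the match is only approximate):

```python
# Rows copied from the tables above:
# (m, n, k, sparse_latency_ms, dense_latency_ms, reported_speedup)
rows = [
    (3072, 3072, 10240, 1.10574, 2.131, 1.92722),     # fixed k, row 0
    (10240, 10240, 2560, 3.12135, 6.23817, 1.99855),  # fixed mn, row 0
    (32768, 768, 3072, 1.22253, 1.7901, 1.46426),     # SAM ViT-B, row 0
]


def speedup(sparse_ms, dense_ms):
    """Dense latency over sparse latency; values above 1 mean sparse is faster."""
    return dense_ms / sparse_ms


for m, n, k, sparse_ms, dense_ms, reported in rows:
    computed = speedup(sparse_ms, dense_ms)
    print(f"{m}x{n}x{k}: computed {computed:.4f}, reported {reported}")
```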

from ao.

jcaip commented on June 9, 2024

@philipbutler as a sanity check - can you run using the 2.3 release instead of the nightlies?

I think this might be an issue with Windows, but I'm not sure.

msaroufim commented on June 9, 2024

Nice work @gau-nernst, pretty cool to see results that seem uniformly faster.
@philipbutler I would highly recommend using WSL or dual booting (I personally dual boot); getting Windows and CUDA to work is just not worth it.

jcaip commented on June 9, 2024

@gau-nernst 💯 Thanks for running these - that's awesome! For others reading, I'd like to collect these somewhere, together with our A100 results. So please contribute and I'll collate them in a nice doc. We can also collect block sparse microbenchmarks; I know @cpuhrsch is interested in those.

@philipbutler Thank you for giving it a shot, and your edits were super helpful too :) . Yeah, I think I agree with Mark that dual booting Linux is probably the easiest solution - but could you open an issue in pytorch for tracking purposes (feel free to tag me) about the lack of Windows support for semi-structured sparsity?

philipbutler commented on June 9, 2024

I had to set up this PC with a clean Python install, and noticed that neither pandas nor tqdm is in requirements.txt.

philipbutler commented on June 9, 2024

The benchmark command should use --dtype bf16

philipbutler commented on June 9, 2024

Ran into RuntimeError: sparse_semi_structured_mad_op : CUTLASS not supported

Consider adding "install CUDA 12.1" and the CUTLASS Quickstart to the steps.
Running through it now!

(I'm confused rn)

philipbutler commented on June 9, 2024

Actually, @jcaip, does it make sense that to_sparse_semi_structured(torch.ones(256, 256).half().cuda()) works, but running the first benchmark script shows RuntimeError: sparse_semi_structured_mad_op : CUTLASS not supported ?
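One way to narrow this down is to run the conversion and the sparse matmul as separate steps and see which one raises. A minimal sketch, assuming (this is not confirmed anywhere in the thread) that the error comes from the matmul step rather than from conversion. `probe_semi_structured` is a made-up helper name; `to_sparse_semi_structured` and `torch.mm` are used the same way as in the PyTorch semi-structured sparsity docs. The function returns early if torch or CUDA is unavailable.

```python
def probe_semi_structured():
    """Report which step (conversion or matmul) fails, if any."""
    result = {"torch": False, "cuda": False, "convert": None, "matmul": None}
    try:
        import torch
        from torch.sparse import to_sparse_semi_structured
    except ImportError:
        return result
    result["torch"] = True

    if not torch.cuda.is_available():
        return result
    result["cuda"] = True

    # 2:4 pattern: two zeros in every group of four values along a row.
    dense = torch.tensor([0, 0, 1, 1], dtype=torch.float16).tile((256, 64)).cuda()

    # Step 1: conversion only.
    try:
        sparse = to_sparse_semi_structured(dense)
        result["convert"] = "ok"
    except Exception as exc:
        result["convert"] = f"{type(exc).__name__}: {exc}"
        return result

    # Step 2: sparse @ dense matmul, which is what a benchmark would exercise.
    other = torch.rand(256, 256, dtype=torch.float16, device="cuda")
    try:
        torch.mm(sparse, other)
        result["matmul"] = "ok"
    except Exception as exc:
        result["matmul"] = f"{type(exc).__name__}: {exc}"
    return result


print(probe_semi_structured())
```

If `convert` reports "ok" while `matmul` carries the CUTLASS message, that would be consistent with the behavior described above.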

jcaip commented on June 9, 2024

That's strange to me, @philipbutler. Let me think for a bit.

Can you open PowerShell, run nvidia-smi, and screenshot the results?
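A text version of the same information can also be collected from Python, which is easier to paste into an issue than a screenshot. A rough sketch; `environment_report` is a made-up helper name, and it only reports what the installed torch build sees, so it complements nvidia-smi rather than replacing it:

```python
import platform


def environment_report():
    """Collect OS, Python, torch, CUDA and GPU details as a plain dict."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "torch": None,
        "torch_cuda": None,
        "gpu": None,
        "compute_capability": None,
    }
    try:
        import torch
    except ImportError:
        return report

    report["torch"] = torch.__version__
    # CUDA version torch was built against (None for CPU-only builds).
    report["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```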

philipbutler commented on June 9, 2024

@jcaip
(screenshot attached)

philipbutler commented on June 9, 2024

@jcaip Just to make this as easy as possible for future benchmarking: step 2 should say

import torch
from torch.sparse import to_sparse_semi_structured
to_sparse_semi_structured(torch.ones(256, 256).half().cuda())

philipbutler commented on June 9, 2024

> @philipbutler as a sanity check - can you run using the 2.3 release instead of the nightlies?
>
> I think this might be an issue with Windows, but I'm not sure.

@jcaip Same error with the 2.3 release
