

skykongkong8 commented on September 23, 2024

@s-debadri As we discussed, please get started from here :) Thanks a lot!

from nntrainer.

skykongkong8 commented on September 23, 2024

Current Status: 08.04.2024

Unit test output on a Galaxy S23 with #2541

GEMM dimension | fp32 | prev 8x8 | ... | f16-f32 8x16 | full-f16
4096 square | 2087 ms | 7172 ms | ... | 1964 ms | 1452 ms
2048 square | 260 ms | 413 ms | ... | 250 ms | 185 ms
1024 square | 34 ms | 52 ms | ... | 30 ms | 103 ms
768 square | 13 ms | 18 ms | ... | 11 ms | 10 ms
256x1440x256 | 2869 µs | 3807 µs | ... | 2544 µs | 2055 µs
256x256x1440 | 2929 µs | 3950 µs | ... | 2467 µs | 2523 µs
8x1440x8 | 5 µs | 5 µs | ... | 10 µs
8x8x1440 | 5 µs | 4 µs | ... | 8 µs
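To compare shapes of very different sizes, latency can be converted into effective throughput. A minimal sketch, assuming the usual count of 2·M·N·K floating-point operations per GEMM (one multiply and one add per term); `gemm_gflops` is a hypothetical helper, not part of nntrainer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: effective throughput in GFLOP/s of an M x K x N GEMM,
// counting one multiply and one add per inner-product term (2 * M * N * K FLOPs).
// latency_us is the measured wall-clock latency in microseconds.
double gemm_gflops(std::int64_t m, std::int64_t k, std::int64_t n,
                   double latency_us) {
  assert(m > 0 && k > 0 && n > 0 && latency_us > 0.0);
  const double flops = 2.0 * static_cast<double>(m) *
                       static_cast<double>(k) * static_cast<double>(n);
  return flops / (latency_us * 1e-6) / 1e9;  // FLOP/s -> GFLOP/s
}
```

For example, the 1024 square fp32 entry (34 ms, i.e. 34000 microseconds) corresponds to roughly 63 GFLOP/s. Because the operation count is symmetric in M, K and N, the 256x1440x256 and 256x256x1440 rows perform the same amount of arithmetic, so their latency difference reflects the shape rather than the workload size.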


skykongkong8 commented on September 23, 2024

Status Update: 24.04.2024

  • Macro-style kernel
  • Adaptive loops for macros
  • More digits per loop

Unit test output on a Galaxy S23 with a local commit (TBA)

Latency

Mean latency with TC = 100

dim | KERNEL_8x16_ACC16 | KERNEL_8x16_ACC8 | cblas fp32
1024 | 23 ms | 30 ms | 32 ms
768 | 9 ms | 12.8 ms | 13.6 ms
256x1440x256 | 2054 µs | 2664 µs | 2701 µs
256x256x1440 | 2359 µs | 2965 µs | 3104 µs

MSE w.r.t. sgemm

dim | KERNEL_8x16_ACC16 | KERNEL_8x16_ACC8
1024 | 0.00608169 | 0.00226737
768 | 0.00310214 | 0.0017091
256x1440x256 | 0.0149112 | 0.00518965
256x256x1440 | 0.00119428 | 0.000306849
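The MSE figures compare each kernel's output with the fp32 sgemm result. Assuming a plain element-wise mean over the output matrix (the thread does not state the exact definition), the metric can be sketched as below; `mse` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: mean squared error of a candidate GEMM output against a
// reference output (e.g. the fp32 sgemm result), both stored as flat arrays.
// Differences are accumulated in double so the metric itself adds little error.
double mse(const std::vector<float>& out, const std::vector<float>& ref) {
  assert(out.size() == ref.size() && !ref.empty());
  double sum = 0.0;
  for (std::size_t i = 0; i < ref.size(); ++i) {
    const double d = static_cast<double>(out[i]) - static_cast<double>(ref[i]);
    sum += d * d;
  }
  return sum / static_cast<double>(ref.size());
}
```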
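  • Overall, f16-f32 (KERNEL_8x16_ACC16) reaches up to about 1.5x the speed of cblas fp32 (roughly 1.3x to 1.5x across the measured shapes).
  • Considering that moving from f32 to f16 doubles the number of elements per vector register, and that the kernel uses partial accumulation, the result above seems reasonable.
  • However, the speedup costs some accuracy: the MSE w.r.t. sgemm is higher for ACC16 than for ACC8 on every tested shape. This should be checked once more against model output.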
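Why a longer accumulation run costs accuracy can be illustrated with a small emulation. This is a minimal sketch, not nntrainer's kernel: it assumes that products are summed into an f16 partial sum which is flushed into an f32 accumulator every `acc` terms (8 or 16, matching the kernel names). f16 rounding is emulated in software, and `round_to_f16` and `dot_partial_acc` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Round a value to the nearest IEEE-754 binary16 (f16) value with
// round-to-nearest-even, returned widened to float. Software emulation so the
// sketch runs on hosts without native f16 arithmetic.
float round_to_f16(double x) {
  if (std::isnan(x) || std::isinf(x)) return static_cast<float>(x);
  const double ax = std::fabs(x);
  if (ax >= 65520.0)  // rounds past the largest f16 (65504) to infinity
    return static_cast<float>(
        std::copysign(std::numeric_limits<double>::infinity(), x));
  int exp = 0;
  std::frexp(ax, &exp);  // ax = m * 2^exp with m in [0.5, 1)
  int e = exp - 11;      // f16 keeps 11 significant bits
  if (e < -24) e = -24;  // subnormal spacing is fixed at 2^-24
  // nearbyint uses the default rounding mode (round-to-nearest-even)
  const double q = std::ldexp(std::nearbyint(std::ldexp(ax, -e)), e);
  return static_cast<float>(std::copysign(q, x));
}

// Hypothetical partial accumulation: each multiply-add is rounded to f16
// (inputs are assumed to be f16-representable already), and every `acc` terms
// the f16 partial sum is flushed into an f32 accumulator.
float dot_partial_acc(const std::vector<float>& a, const std::vector<float>& b,
                      std::size_t acc) {
  assert(a.size() == b.size() && acc > 0);
  float total = 0.0f;    // f32 accumulator
  float partial = 0.0f;  // emulated f16 accumulator
  for (std::size_t k = 0; k < a.size(); ++k) {
    partial = round_to_f16(static_cast<double>(a[k]) * b[k] + partial);
    if ((k + 1) % acc == 0) {  // flush every `acc` terms
      total += partial;
      partial = 0.0f;
    }
  }
  return total + partial;  // flush any remainder
}
```

With 16 terms of 255 × 1, flushing every 8 terms keeps each partial sum at or below 2040, which f16 represents exactly, so the result is the exact 4080. Flushing every 16 terms lets the partial sum grow past 2048, where the f16 spacing becomes 2, and the result drifts to 4088. This is consistent with, though far simpler than, the higher MSE measured for ACC16.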


taos-ci commented on September 23, 2024

:octocat: cibot: Thank you for posting issue #2488. The person in charge will reply soon.


skykongkong8 commented on September 23, 2024

If you want to comment on or review the WIP branch, please leave it here or let me know! :)


skykongkong8 commented on September 23, 2024

This issue is temporarily resolved; further discussion can continue in other issues.

