Comments (6)
@s-debadri As we discussed, please get started from here :) Thanks a lot!
from nntrainer.
Current Status : 08.04.2024
Unittest output using Galaxy S23 with #2541
GEMM dimension | fp32 | prev | 8x8 | f16-f32 8x16 | full-f16 |
---|---|---|---|---|---|
4096 square | 2087 ms | 7172 ms | ... | 1964 ms | 1452 ms |
2048 square | 260 ms | 413 ms | ... | 250 ms | 185 ms |
1024 square | 34 ms | 52 ms | ... | 30 ms | 103 ms |
768 square | 13 ms | 18 ms | ... | 11 ms | 10 ms |
256X1440X256 | 2869 µs | 3807 µs | ... | 2544 µs | 2055 µs |
256X256X1440 | 2929 µs | 3950 µs | ... | 2467 µs | 2523 µs |
8X1440X8 | 5 µs | 5 µs | ... | 10 µs | |
8X8X1440 | 5 µs | 4 µs | ... | 8 µs | |
Status Update: 24.04.2024
- Macro style kernel
- Adaptive loops for macros
- More digits per loop
Unittest output using Galaxy S23 with local commit (TBA)
Latency
mean latency with TC = 100
dim | KERNEL_8x16_ACC16 | KERNEL_8x16_ACC8 | cblas fp32 |
---|---|---|---|
1024 | 23 ms | 30 ms | 32 ms |
768 | 9 ms | 12.8 ms | 13.6 ms |
256x1440x256 | 2054 µs | 2664 µs | 2701 µs |
256x256x1440 | 2359 µs | 2965 µs | 3104 µs |
mse w.r.t. sgemm
dim | KERNEL_8x16_ACC16 | KERNEL_8x16_ACC8 |
---|---|---|
1024 | 0.00608169 | 0.00226737 |
768 | 0.00310214 | 0.0017091 |
256x1440x256 | 0.0149112 | 0.00518965 |
256x256x1440 | 0.00119428 | 0.000306849 |
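For anyone reproducing these numbers: the MSE above is presumably the element-wise mean squared error between the half-precision GEMM output and the fp32 sgemm output. A minimal sketch of that metric, assuming both outputs are flattened into plain sequences of equal length. `mse` is a hypothetical helper for illustration, not an nntrainer function.

```python
def mse(result, reference):
    """Mean squared error between two equally sized flat sequences.

    `result` would be the HGEMM output and `reference` the fp32 sgemm
    output (an assumption about how the table above was produced).
    """
    if len(result) != len(reference):
        raise ValueError("result and reference must have the same length")
    if not reference:
        raise ValueError("inputs must not be empty")
    return sum((r - g) ** 2 for r, g in zip(result, reference)) / len(reference)


if __name__ == "__main__":
    # Two elements, one of them off by 2.0: (0^2 + 2^2) / 2 = 2.0
    print(mse([1.0, 2.0], [1.0, 4.0]))
```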
- Overall, the f16-f32 kernel (KERNEL_8x16_ACC16) is about 1.3x to 1.5x faster than cblas fp32 across these shapes (for example, 23 ms vs. 32 ms at 1024).
- Given the doubled lane count per vector when moving from f32 to f16, plus partial accumulation, the result above seems reasonable.
- However, this speed comes at the cost of a small accuracy loss, which should be checked once more against model output.
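To illustrate the speed/accuracy trade-off of partial accumulation, here is a rough sketch in plain Python. It assumes that the `ACC8` / `ACC16` suffixes mean "sum that many products in fp16 before flushing into an fp32 accumulator"; that reading is a guess from the kernel names, not something confirmed in this thread. The helpers `to_f16`, `to_f32`, and `dot_partial_acc` are hypothetical and only emulate the rounding; they are not the NEON kernel.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def dot_partial_acc(a, b, acc_len):
    """Dot product with fp16 partial sums flushed to an fp32 total.

    Every `acc_len` products are accumulated in emulated fp16, then the
    partial sum is added to an emulated fp32 accumulator. Inputs are
    assumed to be fp16-representable and small enough not to overflow fp16.
    """
    total = 0.0  # emulated fp32 accumulator
    for start in range(0, len(a), acc_len):
        partial = 0.0  # emulated fp16 accumulator
        for x, y in zip(a[start:start + acc_len], b[start:start + acc_len]):
            # Rounding once per step approximates a fused multiply-add in fp16.
            partial = to_f16(partial + x * y)
        total = to_f32(total + partial)
    return total


if __name__ == "__main__":
    ones = [1.0] * 4096
    for acc_len in (8, 16, 4096):
        print(acc_len, dot_partial_acc(ones, ones, acc_len))
```

With all-ones vectors of length 4096, flushing every 8 or 16 steps stays exact, while keeping the whole sum in fp16 stalls at 2048 because 2049 is not representable in half precision. Longer fp16 runs save conversions but lose more precision, which matches the direction of the ACC16 vs. ACC8 MSE figures above.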
cibot: Thank you for posting issue #2488. The person in charge will reply soon.
If you want to comment on or review the WIP branch, please leave your feedback here, or let me know! :)
This issue is temporarily resolved; further discussion can continue in other issues.