Comments (11)
OpenBLAS takes cache control and reuse, job sharing, and multithreading into account; those three factors are what speed up its GEMM.
from paddle-lite.
Is the 'loss' you mentioned the accuracy loss caused by quantization?
from paddle-lite.
I mean that if the input matrix needs to be transposed, this function gives a wrong result, which may make the loss bigger. Do you know the Caffe function `caffe_cpu_gemm`?
void caffe_cpu_gemm(const CBLAS_TRANSPOSE TransA,
    const CBLAS_TRANSPOSE TransB, const int M, const int N, const int K,
    const float alpha, const float* A, const float* B, const float beta,
    float* C) {
  LOG(INFO) << "Running for caffe_cpu_gemm trans start";
  int lda = (TransA == CblasNoTrans) ? K : M;
  int ldb = (TransB == CblasNoTrans) ? N : K;
  cblas_sgemm(CblasRowMajor, TransA, TransB, M, N, K, alpha, A, lda, B,
      ldb, beta, C, N);
}
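For readers comparing results against `caffe_cpu_gemm`, a plain reference implementation can help check whether a transposed input is being read correctly. The following is a minimal sketch, not code from Caffe or Paddle-Lite: the name `ref_gemm` and its `bool` transpose flags are hypothetical, and it only reuses the leading-dimension rule shown above (`lda` and `ldb` are the row strides of the matrices as stored).

```cpp
#include <cassert>

// Hypothetical reference GEMM (not part of Caffe or Paddle-Lite):
//   C = alpha * op(A) * op(B) + beta * C, all matrices row-major,
// where op(X) is X or its transpose depending on the flag.
void ref_gemm(bool trans_a, bool trans_b, int M, int N, int K,
              float alpha, const float* A, const float* B,
              float beta, float* C) {
    assert(M >= 0 && N >= 0 && K >= 0);
    // Same rule as caffe_cpu_gemm: the leading dimension is the row stride
    // of the matrix as it is stored, not as it is used.
    const int lda = trans_a ? M : K;
    const int ldb = trans_b ? K : N;
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                // op(A)(i, k) and op(B)(k, j), read from the stored layout.
                const float a = trans_a ? A[k * lda + i] : A[i * lda + k];
                const float b = trans_b ? B[j * ldb + k] : B[k * ldb + j];
                acc += a * b;
            }
            // Like BLAS, do not read C when beta is zero.
            const float prev = (beta == 0.0f) ? 0.0f : beta * C[i * N + j];
            C[i * N + j] = alpha * acc + prev;
        }
    }
}
```

Running an optimized GEMM and this loop on the same small inputs, once with and once without the transpose flags, should show whether a wrong result comes from the transpose handling.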
from paddle-lite.
Indeed, our Gemmer does not support transposed matrices. We do the transpose during model conversion instead, in order to speed up matrix operations. @victorygogogo
from paddle-lite.
`/**
 * Transpose the matrix in advance.
 *
 * @param data  row-major m x n input matrix
 * @param shape {m, n}
 * @return newly allocated n x m transposed matrix (caller must delete[])
 */
float *trans_matrix(const float *data, vector<int> shape) {
    int m = shape[0];
    int n = shape[1];
    float *trans = new float[m * n];
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < m; ++j) {
            trans[i * m + j] = data[j * n + i];
        }
    }
    return trans;
}` in caffe2mdl.cpp @victorygogogo
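As an illustration of the same index mapping as `trans_matrix`, here is a small sketch that returns a `std::vector` instead of a raw `new[]` buffer, so the caller does not have to free anything. The name `transpose_rowmajor` and its signature are hypothetical and do not appear in caffe2mdl.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector-based variant of trans_matrix: returns the n x m
// transpose of a row-major m x n matrix.
std::vector<float> transpose_rowmajor(const std::vector<float>& data,
                                      std::size_t m, std::size_t n) {
    assert(data.size() == m * n);
    std::vector<float> out(data.size());
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < m; ++j) {
            // Element (j, i) of the input becomes element (i, j) of the output.
            out[i * m + j] = data[j * n + i];
        }
    }
    return out;
}
```

Because the weights are transposed once when the model is converted, the runtime GEMM only ever has to handle the no-transpose case.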
from paddle-lite.
OK, thank you.
from paddle-lite.
You're welcome.
from paddle-lite.
I found that this function is slower than the GEMM in the OpenBLAS library. By the way, I am using NEON on a phone.
from paddle-lite.
You found our gemm is slower than openblas?
from paddle-lite.
@cocodark Yes. I use NEON, but some matrices need to be transposed, so I did the transpose inside this function to match the `cblas_sgemm` interface. Even without the transpose, 30 runs take about 6.7 s, whereas with `cblas_sgemm` they take about 2.7 s.
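A comparison like the 30-run measurement above could be reproduced with a small timing helper along these lines. This is only a sketch: `time_runs` is a hypothetical name, and the untimed warm-up call is an assumption about how one might measure, not a description of how the numbers in this thread were obtained.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: calls run_once() `runs` times and returns the total
// wall-clock time in seconds. One extra untimed call warms up caches first.
template <typename F>
double time_runs(F&& run_once, int runs) {
    assert(runs > 0);
    run_once();  // warm-up, not included in the measurement
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < runs; ++i) {
        run_once();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Passing one lambda that wraps each GEMM under test, with `runs` set to 30 and identical inputs, gives two totals that can be compared directly. The output matrix should be read afterwards so the compiler cannot discard the work.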
from paddle-lite.
@victorygogogo, excellent research work. Our GEMM is currently accelerated with NEON; we'll try the tricks mentioned by @wangshankun, such as cache control and reuse and multithreading, to make it faster. If you are interested in GEMM optimization, code contributions would be appreciated.
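To make the "cache control and reuse" idea concrete, here is a minimal sketch of a cache-blocked (tiled) GEMM. It is not OpenBLAS's algorithm and not the Paddle-Lite Gemmer: the name `blocked_gemm` and the default tile size of 64 are assumptions chosen for illustration, and a real kernel would add NEON intrinsics and packing on top.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical tiled GEMM: C (M x N) += A (M x K) * B (K x N),
// all row-major, no transposes. Working on tile x tile blocks keeps the
// touched parts of A, B and C small enough to be reused from cache.
void blocked_gemm(int M, int N, int K,
                  const float* A, const float* B, float* C, int tile = 64) {
    assert(tile > 0);
    for (int i0 = 0; i0 < M; i0 += tile) {
        const int i1 = std::min(i0 + tile, M);
        for (int k0 = 0; k0 < K; k0 += tile) {
            const int k1 = std::min(k0 + tile, K);
            for (int j0 = 0; j0 < N; j0 += tile) {
                const int j1 = std::min(j0 + tile, N);
                // Multiply one block of A by one block of B.
                for (int i = i0; i < i1; ++i) {
                    for (int k = k0; k < k1; ++k) {
                        const float a = A[i * K + k];
                        for (int j = j0; j < j1; ++j) {
                            C[i * N + j] += a * B[k * N + j];
                        }
                    }
                }
            }
        }
    }
}
```

Different values of `i0` write to disjoint rows of `C`, so the outermost loop is also a natural place to split the work across threads.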
from paddle-lite.