Comments (8)
Can you show the trace result? Alternatively, you could comment out the thread::yield() call and trace again.
from inferllm.
I'll share it later; I've modified the base code a lot. The trace looks like this:
main | add_task | add_task
thread1 | matmul_int8
thread2 | ................. matmul_int8 (this is the problematic part; if it could start much earlier, the total time would drop a lot)
...
threadx | ..matmul_int8
from inferllm.
It seems that when a task is dispatched, not all threads are scheduled immediately, because some of them have yielded. If so, commenting out thread::yield should make it faster, but when I tested that, I got the same result.
from inferllm.
That could be an advantage of x64 processors; they may simply be more responsive.
And yes, removing the yield works, but not always. Sometimes it keeps the processors too busy with atomic operations, and the resulting contention slows down the whole system. I am still experimenting with how to improve the loop.
I've got a pretty good result so far. With some other new optimizations, I got 5 tokens/s.
from inferllm.
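Editor's sketch of the trade-off discussed above, not code from InferLLM: a waiting thread can spin on a read-only atomic load for a bounded number of iterations and only then fall back to `std::this_thread::yield()`. The names `wait_for_flag`, `run_workers`, and `spin_limit` are hypothetical, and the right spin limit would have to be measured on the target machine.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: block until `flag` becomes true.
// Spins on a plain load (no read-modify-write) up to `spin_limit` times,
// then yields to the OS scheduler and starts spinning again.
inline void wait_for_flag(const std::atomic<bool>& flag, int spin_limit) {
    assert(spin_limit > 0);
    int spins = 0;
    while (!flag.load(std::memory_order_acquire)) {
        if (++spins >= spin_limit) {
            std::this_thread::yield();  // give up the time slice after a bounded spin
            spins = 0;
        }
    }
}

// Hypothetical demo: start `n_workers` threads that wait on a shared flag,
// release them all at once, and return how many of them ran their task.
inline int run_workers(int n_workers, int spin_limit) {
    std::atomic<bool> go{false};
    std::atomic<int> done{0};
    std::vector<std::thread> workers;
    workers.reserve(n_workers);
    for (int i = 0; i < n_workers; ++i) {
        workers.emplace_back([&go, &done, spin_limit] {
            wait_for_flag(go, spin_limit);
            done.fetch_add(1, std::memory_order_relaxed);  // stand-in for the real task
        });
    }
    go.store(true, std::memory_order_release);  // "dispatch" the task
    for (auto& t : workers) t.join();
    return done.load();
}
```

A small `spin_limit` behaves like the original yield-heavy loop, while a very large one approaches a pure busy-wait, so the value is a tuning knob rather than a fixed answer.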
Great! Looking forward to your optimizations.
from inferllm.
Check out this demo. Adding ZoneScopedNS() inside the matmul lambda will give you the trace result.
from inferllm.
@chenqy4933 Got 4.4 tokens/s with master. I guess the optimization works, though I did not get as high as 5 tokens/s.
from inferllm.
You can keep optimizing it; I only optimized it with a CPU-level yield.
from inferllm.
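Editor's sketch of what a "CPU-level yield" can look like, under the assumption that it means a spin-wait hint instruction rather than a call into the OS scheduler. This is not the code merged into InferLLM. The helper names `cpu_relax` and `spin_until_ready` are hypothetical; `_mm_pause()` is the x86 intrinsic, the AArch64 branch uses the `yield` instruction via GCC/Clang inline assembly, and any other platform falls back to `std::this_thread::yield()`.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>  // _mm_pause
#endif

// Hypothetical helper: tell the core we are in a spin-wait loop
// without handing the time slice back to the OS scheduler.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
    _mm_pause();
#elif defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ __volatile__("yield" ::: "memory");
#else
    std::this_thread::yield();  // portable fallback, this one does go through the OS
#endif
}

// Hypothetical demo: a producer thread publishes a value, and the caller
// spins with cpu_relax() until the value is visible.
inline int spin_until_ready() {
    std::atomic<bool> ready{false};
    int value = 0;
    std::thread producer([&ready, &value] {
        value = 42;                                    // written before the release store
        ready.store(true, std::memory_order_release);
    });
    while (!ready.load(std::memory_order_acquire)) {
        cpu_relax();
    }
    producer.join();
    return value;  // the acquire load guarantees the write to `value` is visible
}
```

The hint keeps the waiting thread on its core, so it can react quickly when work arrives, at the cost of occupying that core while it waits.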