Thread wake-up may be a bottleneck for systems with many cores

megengine commented on June 24, 2024

Thread wake-up may be a bottleneck on systems with many cores.

from inferllm.

Comments (8)

chenqy4933 commented on June 24, 2024

Can you show the trace result? Alternatively, you can comment out the thread::yield() call and trace again.


xhebox commented on June 24, 2024

I'll post it later. I've modified the code base a lot. The trace looks roughly like this:

main    | add_task                                         | add_task
thread1 | matmul_int8
thread2 | ................. matmul_int8 (this is the problematic part: if it could start much earlier, the total time would be reduced a lot)
...
threadx | ..matmul_int8
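
The gap shown for thread2 can also be measured without a profiler. The following is a minimal sketch, not inferllm's thread pool: `measure_wakeup_delay_us` is a hypothetical helper, and it assumes workers wait in a yield loop on a single atomic flag. It reports, per worker, how long after the task was published that worker actually started.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Hypothetical helper (not part of inferllm): for each worker, return the
// delay in microseconds between the main thread publishing a task and the
// worker observing it.
std::vector<double> measure_wakeup_delay_us(unsigned num_workers) {
    using clock = std::chrono::steady_clock;
    std::atomic<bool> task_ready{false};
    std::atomic<unsigned> waiting{0};
    std::vector<clock::time_point> started(num_workers);
    std::vector<std::thread> workers;

    for (unsigned i = 0; i < num_workers; ++i) {
        workers.emplace_back([&, i] {
            waiting.fetch_add(1, std::memory_order_release);
            // Assumed wait strategy: poll the flag and yield between polls.
            while (!task_ready.load(std::memory_order_acquire)) {
                std::this_thread::yield();
            }
            started[i] = clock::now();  // each worker writes only its own slot
        });
    }

    // Make sure every worker has reached its wait loop before publishing.
    while (waiting.load(std::memory_order_acquire) != num_workers) {
        std::this_thread::yield();
    }

    const auto published = clock::now();
    task_ready.store(true, std::memory_order_release);
    for (auto& t : workers) t.join();

    std::vector<double> delays;
    for (const auto& s : started) {
        delays.push_back(
            std::chrono::duration<double, std::micro>(s - published).count());
    }
    return delays;
}
```

Calling `measure_wakeup_delay_us(n)` with `n` close to the core count and looking at the largest value gives a rough idea of how late the slowest worker starts. The numbers depend heavily on the OS scheduler and the machine, so treat them as indicative only.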


chenqy4933 commented on June 24, 2024

It seems that when a task is dispatched, not all threads are scheduled right away, because some of them have yielded. If so, comment out the thread::yield call and test again; it should be faster. However, I tested this and got the same result.


xhebox commented on June 24, 2024

That could be an advantage of x64 processors; I mean these processors may simply be more responsive.

And yes, removing the yield works, but not always. Sometimes it keeps the processors too busy with atomic operations, and that contention slows down the whole system. I am still experimenting with how to improve the loop.

I've got a pretty good result so far. With some other new optimizations, I got 5 token/s.
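
One common middle ground between "always yield" and "never yield" is to spin for a bounded number of iterations and only then start yielding. The sketch below illustrates that idea under stated assumptions: `wait_for_generation`, the generation counter, and `spin_limit` are hypothetical names and are not taken from inferllm. The wait loop uses only a plain atomic load, which tends to cause less cache-line contention than read-modify-write operations issued by many cores at once.

```cpp
#include <atomic>
#include <thread>

// Hypothetical wait helper (not inferllm's API): block until `generation`
// differs from `last_seen`. Spin up to `spin_limit` times first, then fall
// back to yielding the time slice. Returns how many times it yielded.
inline unsigned wait_for_generation(const std::atomic<unsigned>& generation,
                                    unsigned last_seen,
                                    unsigned spin_limit) {
    unsigned spins = 0;
    unsigned yields = 0;
    // Read-only polling: no compare-exchange or fetch_add while waiting.
    while (generation.load(std::memory_order_acquire) == last_seen) {
        if (spins < spin_limit) {
            ++spins;                    // short busy-wait phase
        } else {
            std::this_thread::yield();  // back off once the budget is spent
            ++yields;
        }
    }
    return yields;
}
```

A worker would call `wait_for_generation(gen, seen, limit)` before picking up the next task, while the main thread increments `gen` after publishing work. A suitable `spin_limit` is workload and hardware dependent and would need to be tuned by measurement.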


chenqy4933 commented on June 24, 2024

Great!!! Looking forward to your optimizations.


xhebox commented on June 24, 2024

#62

Check out this demo. Adding ZoneScopedNS() to the lambda function of matmul will give you the trace result.


xhebox commented on June 24, 2024

@chenqy4933 Got 4.4 token/s with master. I guess the optimization works, though I did not get as high as 5 token/s.


chenqy4933 commented on June 24, 2024

You can continue optimizing it; I only optimized it with a CPU-level yield.
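
For readers unfamiliar with the term, a CPU-level yield usually means a spin-wait hint instruction that keeps the thread on its core instead of handing the time slice back to the OS scheduler. The sketch below is an assumption about what such a change could look like, not the code that was merged into inferllm; `cpu_relax` and `spin_until_set` are hypothetical names. It uses `_mm_pause()` on x86 and the `yield` instruction on AArch64, and falls back to `std::this_thread::yield()` elsewhere.

```cpp
#include <atomic>
#include <thread>
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>
#endif

// Hypothetical helper: hint to the CPU that this thread is in a spin-wait
// loop, without giving up the time slice.
inline void cpu_relax() {
#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
    _mm_pause();                           // x86 PAUSE instruction
#elif defined(__aarch64__)
    __asm__ __volatile__("yield" ::: "memory");  // AArch64 YIELD hint
#else
    std::this_thread::yield();             // portable fallback: OS-level yield
#endif
}

// Hypothetical wait loop: poll the flag and relax the core between polls.
inline void spin_until_set(const std::atomic<bool>& flag) {
    while (!flag.load(std::memory_order_acquire)) {
        cpu_relax();
    }
}
```

Because the thread never leaves the core, it can react to a newly published task quickly, at the cost of keeping that core occupied while it waits.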
