Ah, I just remembered that I fixed this in <a class="commit-link" data-hovercard-type=

Spinlock exponential backoff has no cap — can hang all threads that touch the lock about flume HOT 5 CLOSED

zesterer commented on August 15, 2024

Spinlock exponential backoff has no cap — can hang all threads that touch the lock

from flume.

Comments (5)

zesterer commented on August 15, 2024

As the now-maintainer of the spin crate and (surprisingly enough) a frequent advocate for not using it whenever possible, spinlocks are something I'm generally uncomfortable about using, but the benchmarks speak for themselves: mutexes perform worse on every benchmark I've been able to come up with on *nix systems.

I am aware that this is something that is likely to resolve itself in time, but as things stand today, Flume experiences something like a 30% slowdown when using a mutex in these scenarios during heavy load. You might say that the benchmarks are unrepresentative, but the benchmarks that we and crossbeam-channel both use are rather extensive and cover a lot of cases that look quite similar to code I've previously written using MPMCs. For all my distrust of spinlocks in the wild, this is a sufficient enough body of evidence that they are effective in this specific case.

I will cap the exponential backoff to some reasonable number, however.

from flume.

zesterer commented on August 15, 2024

Ah, I just remembered that I fixed this in 3f8a925.

from flume.

thomcc commented on August 15, 2024

Someone reminded me of this issue, and I had missed your comment on benchmarks justifying this.

mutexes perform worse on every benchmark I've been able to come up with on *nix systems.

Relying on benchmark numbers to justify use of spinlocks is a common trap for benchmarking concurrency. Spinlocks (especially ones like you're using here) are really great in benchmarking-style scenarios, but much worse behavior in real-world scenarios.

The reason for this is because in a spinlock like the one you have, where you handle contention by yielding to another thread, in a benchmark case, you have all of your doing the same thing (after all, this is the thing you're benchmarking), and so you're very likely to switch to a thread which is either:

Already holds the lock, in which case it's going to make progress and release the lock. This is unlikely to happen on the first try if you have several threads.
Waiting for the in a similar spin loop (probably even for the same lock), in which case it's going to yield almost immediately. (Giving the kernel another shot at guessing which thread is the one with the lock, and reduces the cost of it picking wrong substantially).

Unfortunately, this style of threading behavior is not typically true for real applications, while it's true of virtually all benchmarks (as having threads which are performing tasks unrelated to what you're benchmarking is virtually guaranteed to make your benchmarks... noisy, to say the least).

IME in real applications, you usually have a lot of threads doing different things, often with various priorities, and so it's unlikely in real-world code that switching to a random thread will happen to pick one which will lead to progress being made on the lock.

This means the cost of the scheduler "guessing wrong" about which thread to switch to in order to make progress on the lock is far higher, and you spend far more time waiting for a lock than you need to.

from flume.

awused commented on August 15, 2024

It should be possible to design a benchmark to at least partially expose the problem. I'd imagine saturating the CPU with unrelated CPU bound threads, potentially multiple per core, would start to show spinlocks in a worse light, especially if priorities are mixed. You may not get useful average numbers, they'll be too noisy, but the tail latencies should increase more with spinlocks compared to regular mutexes. Spinlocks will also probably vary a lot more based on the scheduler used.

I ended up here because of the recent change from spinning_top to spin prompted me to take another look since I was surprised anything was pulling in spinlocks at all.

from flume.

zesterer commented on August 15, 2024

Just to make clear: I'm not ignoring this thread. I just need to find the time to address it properly.

I've done some testing locally and the performance regression of using Mutex in benchmarks is not quite as severe as previously thought.

I'd imagine saturating the CPU with unrelated CPU bound threads, potentially multiple per core, would start to show spinlocks in a worse light

I've been testing this (saturating the machine with many other tasks) and I've yet to find significant evidence for this, which strikes me as quite strange since, intuitively, intelligent scheduling should do better than what effectively amounts to round robin or worse with spinning or yielding. I'm wondering whether Linux is being a bit too clever and has somehow worked out that my fake tasks are doing little more than spinning.

from flume.

Spinlock exponential backoff has no cap — can hang all threads that touch the lock about flume HOT 5 CLOSED

Comments (5)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent