Coder Social home page Coder Social logo

Comments (5)

zesterer avatar zesterer commented on August 15, 2024

As the now-maintainer of the spin crate and (surprisingly enough) a frequent advocate for not using it whenever possible, spinlocks are something I'm generally uncomfortable about using, but the benchmarks speak for themselves: mutexes perform worse on every benchmark I've been able to come up with on *nix systems.

I am aware that this is something that is likely to resolve itself in time, but as things stand today, Flume experiences something like a 30% slowdown when using a mutex in these scenarios during heavy load. You might say that the benchmarks are unrepresentative, but the benchmarks that we and crossbeam-channel both use are rather extensive and cover a lot of cases that look quite similar to code I've previously written using MPMCs. For all my distrust of spinlocks in the wild, this is a sufficient enough body of evidence that they are effective in this specific case.

I will cap the exponential backoff to some reasonable number, however.

from flume.

zesterer avatar zesterer commented on August 15, 2024

Ah, I just remembered that I fixed this in 3f8a925.

from flume.

thomcc avatar thomcc commented on August 15, 2024

Someone reminded me of this issue, and I had missed your comment on benchmarks justifying this.

mutexes perform worse on every benchmark I've been able to come up with on *nix systems.

Relying on benchmark numbers to justify use of spinlocks is a common trap for benchmarking concurrency. Spinlocks (especially ones like you're using here) are really great in benchmarking-style scenarios, but much worse behavior in real-world scenarios.

The reason for this is because in a spinlock like the one you have, where you handle contention by yielding to another thread, in a benchmark case, you have all of your doing the same thing (after all, this is the thing you're benchmarking), and so you're very likely to switch to a thread which is either:

  1. Already holds the lock, in which case it's going to make progress and release the lock. This is unlikely to happen on the first try if you have several threads.
  2. Waiting for the in a similar spin loop (probably even for the same lock), in which case it's going to yield almost immediately. (Giving the kernel another shot at guessing which thread is the one with the lock, and reduces the cost of it picking wrong substantially).

Unfortunately, this style of threading behavior is not typically true for real applications, while it's true of virtually all benchmarks (as having threads which are performing tasks unrelated to what you're benchmarking is virtually guaranteed to make your benchmarks... noisy, to say the least).

IME in real applications, you usually have a lot of threads doing different things, often with various priorities, and so it's unlikely in real-world code that switching to a random thread will happen to pick one which will lead to progress being made on the lock.

This means the cost of the scheduler "guessing wrong" about which thread to switch to in order to make progress on the lock is far higher, and you spend far more time waiting for a lock than you need to.

from flume.

awused avatar awused commented on August 15, 2024

It should be possible to design a benchmark to at least partially expose the problem. I'd imagine saturating the CPU with unrelated CPU bound threads, potentially multiple per core, would start to show spinlocks in a worse light, especially if priorities are mixed. You may not get useful average numbers, they'll be too noisy, but the tail latencies should increase more with spinlocks compared to regular mutexes. Spinlocks will also probably vary a lot more based on the scheduler used.

I ended up here because of the recent change from spinning_top to spin prompted me to take another look since I was surprised anything was pulling in spinlocks at all.

from flume.

zesterer avatar zesterer commented on August 15, 2024

Just to make clear: I'm not ignoring this thread. I just need to find the time to address it properly.

I've done some testing locally and the performance regression of using Mutex in benchmarks is not quite as severe as previously thought.

I'd imagine saturating the CPU with unrelated CPU bound threads, potentially multiple per core, would start to show spinlocks in a worse light

I've been testing this (saturating the machine with many other tasks) and I've yet to find significant evidence for this, which strikes me as quite strange since, intuitively, intelligent scheduling should do better than what effectively amounts to round robin or worse with spinning or yielding. I'm wondering whether Linux is being a bit too clever and has somehow worked out that my fake tasks are doing little more than spinning.

from flume.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.