Coder Social home page Coder Social logo

Comments (14)

ben-manes avatar ben-manes commented on June 18, 2024

Thanks. I won't be able to take a look at this until tonight.

Most of the unit tests use a Ticker to manipulate time. Can you verify that it fails with that, too? If not then it would indicate an unexpected visibility problem.

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

Oh, this is the async cache. Sorry, I need to play with the code to get a sense of what is going on. I just reproduced it failing and see it runs if I use a direct executor but fails on a single-thread executor. So definitely some race, but not sure what yet.

from caffeine.

ybayk avatar ybayk commented on June 18, 2024

Yes, that's async cache. Not sure how it behaves on a blocking cache as we do not use it.
Also whether the future is immediate or real one does not make any difference as we have the same issue with true futures.
So as far as I understood you do not need ticker based test anymore?

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

Nope, that was my misreading. Since it runs fine with a direct executor and fails with a single-threaded, its a race. I have to verify your code is valid because sleeping and waking are not exact time durations. The big question is whether the race is valid or not.

So my first step will be to strengthen the visibility to see if it passes. Right now reads and writes to the timestamp piggy back on other volatiles generating our memory barriers to avoid excessive cpu stalls.

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

I took a look and here's my current guess. I haven't been able to validate it yet, so hopefully I'm not too far off base and will have a patch released by the end of the weekend.

When a write to key1 occurs, it triggers a cache maintenance cycle (on the executor) to evict/expire/reorder entries in the policy's linked lists. The expiration check is racy, as key2 could have expired but then be resurrected by a concurrent update. The atomic removal (via computeIfPresent) doesn't validate that the entry is still expired, though it probably should. This race was assumed to be rare, so one or two extra evictions wouldn't be noticeable. That may have been a bad assumption if this happened often enough for you to investigate and write a test that triggered the race reliably.

I verified that the tests pass if the maintenance cycle is disabled. I haven't been able to get a hack that resolves the race to work reliably yet, which would let me then fix it properly. So I might either be tired or there is a different issue at play. Regardless I hope to figure it out, fix it, and have a release by the end of the weekend.

from caffeine.

ybayk avatar ybayk commented on June 18, 2024

Thank you for looking at it promptly. Yes, the issue seems to be common at least with our use case, where we deal with a few keys used side by side within a short period of time. We noticed it by observing our Cassandra queries being way too more frequent than expected, which immediately raised a question about cache layer not working properly.

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

I was exhausted last night and flipped a gt / lt sign which is why my quick hack of a fix didn't work. It was Yom Kippur and I was out late when we broke the fast.

So the above rational is correct. If I hack BoundedLocalCache#evictEntry to use the following your tests pass. I should be able to clean this up, wrap your code into an integration test, and have a release out in the timeframe I promised.

Thanks for doing the detective work to isolate the bug and report it.

data.computeIfPresent(node.getKeyReference(), (k, n) -> {
  if (actualCause == RemovalCause.EXPIRED) {
    long now = expirationTicker().read();
    if (expiresAfterWrite()) {
      long expirationTime = now - expiresAfterWriteNanos();
      if (n.getWriteTime() >= expirationTime) {
        resurrect[0] = true;
        return n;
      }
    }
  }
  if (n == node) {
    writer.delete(key, value, actualCause);
    removed[0] = true;
    return null;
  }
  return n;
});
if (resurrect[0]) {
  return;
}

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

I have the fix ready and refactored your tests a little. Because this relies on exact timings it can't run reliably as part of the build / CI yet. There is enough other work (concurrent tests, shared CI) that the timing thresholds can be exceeded. I'll fix this later by isolating the test in its own build step, though that will probably only run locally because the CI is too heavily loaded.

I'll get the next release put together tonight.

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

v1.3.3 released

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

Oh, also note that the git history looks a little weird since I had to rebase master on top. That was to isolate it as a direct patch to the previous release. The current work has been on v2 which will introduce an new eviction policy. I didn't want that to go out prematurely

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

Maven Central sync is broken MVNCENTRAL-840...

from caffeine.

ybayk avatar ybayk commented on June 18, 2024

all good. cache is now behaving ok with v1.3.3.

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

great! Thanks for the update

from caffeine.

ben-manes avatar ben-manes commented on June 18, 2024

@yurgis2
If you're using size based eviction, too, it would be interesting to hear your feedback when upgrading to v2's W-TinyLfu policy.

from caffeine.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.