Coder Social home page Coder Social logo

Comments (9)

anton-povarov avatar anton-povarov commented on September 22, 2024

tested with https://github.com/spaolacci/murmur3

result

entries: 12417, evacuated: 0, overwritten: 0
avg_value_len: 43.480773
bytes per item: 84.44680679713296 - 43.480773 - 16 = 24.966033797132965

and is about (non-scientifically) 20% faster than md5

from freecache.

anton-povarov avatar anton-povarov commented on September 22, 2024

Hm... actually tested it some more. And while a better hash function does solve the issue in single-threaded case, it's still very bad in multithreaded case.
With 10 goroutines, all inserting same keys, but slightly different values (so a zillion of evacuates happens) - overhead per item still grows to ridiculous values.

for example:
same code as above gist, but changed the following params

const cache_size = 10 * 1024 * 1024 // was 1 megabyte
n_keys := 10 * 1000 * 1000 // was 1 million

here's the result with: const n_routines = 10

entries: 42078, evacuated: 120957842, overwritten: 77988935
avg_value_len: 43.49973858
bytes per item: 249.1981558058843 - 43.49973858 - 16 = 189.69841722588433

vs const n_routines = 1

entries: 125323, evacuated: 0, overwritten: 0
avg_value_len: 43.4929523
bytes per item: 83.6698770377345 - 43.4929523 - 16 = 24.176924737734495

from freecache.

coocood avatar coocood commented on September 22, 2024

Thank you for your intensive testing on it.
So it seems like the memory utilization issue is not caused by hash function.
The difference of entry overhead seems too big for the difference of hash function distribution quality.

from freecache.

coocood avatar coocood commented on September 22, 2024

I think this is normal.

If you set n_routines to one, then every key is new, there is no overwrites, the entry length is exactly (24+keyLen+valueLen).

If you set n_routines to 10, the entry can be overwritten.

If the new value is larger than the old entry's capacity, it won't be updated in place, then the old entry's space is not utilized.

If the new value is smaller than the old entry's capacity, then there would be an in-place update, (capacity - valueLen) of space is not utilized.

Whenever the capacity is smaller than the new value, the capacity is doubled, so when you keep increasing the size of the new value, it won't keep writing new entries, most of the update would be in place update.

from freecache.

anton-povarov avatar anton-povarov commented on September 22, 2024

Even in single threaded case with fnvhash - utilization is poor. With zero overwrites/evictions. might be wrong stats i guess ? since with avg value of 43 bytes and key 16 bytes, we should be able to fit all the values into 1 meg.

from freecache.

coocood avatar coocood commented on September 22, 2024

I will try to find the answer tomorrow, you can print out every segment's entry count to see how it is distributed .

from freecache.

anton-povarov avatar anton-povarov commented on September 22, 2024

hacked it together for 1 mil items, 1 goroutine, 1 meg cache
data gathered with

        n := atomic.LoadInt64(&cache.segments[i].entryCount)
        fmt.Printf("%d %d\n", i, n)

fnvhash

antoxa@antoxa-suse:~/_Dev/go/src/antoxa> cat 1.txt | awk '{ print $2 }' | sort -n | uniq -c
    384 0
      2 470
      4 479
      6 480
      4 481
      2 482
     10 483
      2 484
      6 485
      4 486
      6 487
     14 488
     12 489
      6 490
     12 491
      2 492
      8 494
      4 495
      4 496
      6 497
      4 499
      4 501
      4 503
      2 508

murmur3

antoxa@antoxa-suse:~/_Dev/go/src/antoxa> cat 1.txt | awk '{ print $2 }' | sort -n | uniq -c
      4 471
      2 473
      2 475
      6 477
      8 478
     12 479
     12 480
     20 481
      6 482
     26 483
     14 484
     20 485
     24 486
     24 487
     18 488
     40 489
     30 490
     24 491
     18 492
     44 493
     44 494
     18 495
     18 496
     14 497
     20 498
     10 499
     12 500
      2 501
     10 502
      4 503
      2 504
      2 505
      2 508

from freecache.

coocood avatar coocood commented on September 22, 2024

Thank you for the great work to address this issue. The hash function definitely need to change.

from freecache.

coocood avatar coocood commented on September 22, 2024

Changed to MD5 in the latest commit 28070c0

from freecache.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.