Coder Social home page Coder Social logo

Comments (8)

patrickxb avatar patrickxb commented on August 19, 2024

Do you have a test that confirms this? If so, could you add it to consistent_test.go? Thanks!

from consistent.

or-else avatar or-else commented on August 19, 2024

Here are a few colliding strings:

"abear|17", "solidiform|0"
"abear|16", "solidiform|1"
"abear|15", "solidiform|2"
"abear|14", "solidiform|3"
"abear|13", "solidiform|4"
"abear|12", "solidiform|5"
"abear|11", "solidiform|6"
"abear|10", "solidiform|7"

I won't be able to do a pull request until some time next week.

Here is a simple script to find a bunch more if you care

import binascii

sample = {}

wcounter = 0
counter = 0
words = open('/usr/share/dict/words')
for w in words:
  wcounter += 1

  for i in range(20):
    counter += 1
    s = w.rstrip() + "|" + str(i)
    crc = binascii.crc32(s)
    if crc in sample:
      print "FOUND", sample[crc], s
    elif counter < 10000:
      sample[crc] = s

print "done checking:", wcounter, counter

from consistent.

edsrzf avatar edsrzf commented on August 19, 2024

Below is a Go test that currently fails.

@or-else, are you working on a pull request or would it be helpful if I made one?

func TestAddCollision(t *testing.T) {
    // These two strings produce several crc32 collisions after "|i" is
    // appended added by Consistent.eltKey.
    const s1 = "abear"
    const s2 = "solidiform"
    x := New()
    x.Add(s1)
    x.Add(s2)
    elt1, err := x.Get("abear")
    if err != nil {
        t.Fatal("unexpected error:", err)
    }

    y := New()
    // add elements in opposite order
    y.Add(s2)
    y.Add(s1)
    elt2, err := y.Get(s1)
    if err != nil {
        t.Fatal("unexpected error:", err)
    }

    if elt1 != elt2 {
        t.Error(elt1, "and", elt2, "should be equal")
    }
}

from consistent.

or-else avatar or-else commented on August 19, 2024

Hi @edsrzf

I thought about adding a test along the lines of what you suggested but then decided against it. It would not make the code more reliable because the test would be too brittle. Change the "|" to ":" or 0..20 to A..T and it would stop failing. A more reliable test is to have the hash function substituted with a dummy one which returns just 4 or 8 different values, like

func dummyCrc2(input []byte) int {
   return int(input[0] & 0x3)
}

But then what's the point of testing for it when it's easier to get rid of the problem all together either by getting rid of map[] or by using md5 or sha1 as a hash function. Collisions would still be possible in the latter case but extremely unlikely.

Original Tom White's implementation also had this problem then everybody copied it.
https://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html

from consistent.

edsrzf avatar edsrzf commented on August 19, 2024

That's a fair criticism, but I still think it's worth having a test, even if my test above is too brittle.

from consistent.

patrickxb avatar patrickxb commented on August 19, 2024

@or-else it sounds like you are proposing a substantial rewrite of this package. Since in its current state, it has been working fine for us for a long time, we're not going to spend any time rewriting it. If you'd like to submit a pull request with changes and show that the performance and memory benchmarks are equal or better, I'd be happy to merge it in.

I chose crc32 because it is fast. I just looked quickly at Brad Fitzpatrick's consistenthash implementation in github.com/golang/groupcache. He uses crc32 and a map. It looks quite similar.

While collisions are possible, based on our qualitative observations, the distribution is consistent enough for our needs.

And I agree with @edsrzf that a new test should be included with your changes to show that you fixed an issue.

Thanks.

from consistent.

or-else avatar or-else commented on August 19, 2024

@patrickxb, the package is just 250 lines. It's not a big deal.

Regarding groupcache and bradfitz, no one is immune from making mistakes. Tom White introduced the original bug then everybody copied it.

Regarding your observations, the probability of a collision in crc32 is on the scale of 1e-7 for 10 nodes. The fact that you have not hit the bug is not surprising. Nonetheless it's there.

from consistent.

imkira avatar imkira commented on August 19, 2024

I agree with @or-else on this one.
I mean, there will be obviously possibility of collisions (no matter how remote they are) but I believe that no matter the order in which you call Add, the resulting circle should be the same for the sake of the rule of least surprise. As proposed by @or-else , and as implemented in other consistent hashing libraries (eg., libketama) a sorted array of points viewed as a circle is widely adopted. I believe having an array that performs O(log n) for searches that has "consistent" (pun intended) behaviour far outperforms the benefits of O(1) of map.
Either that or people will have to be aware of this limitation and rather than dynamically adding a node to the hash ring any time they want, they will have to rebuild it with the sorted list of all nodes.

from consistent.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.