Coder Social home page Coder Social logo

bloom's Issues

Multiply Overflow in Fingerprint

The Issue:

bloom/bloom.go

Line 201 in 691ea61

hn = (hn * g) % m

produces the an multiply overflow.

	for i := uint64(0); i < s.k; i++ {
		hn = (hn * g) % m
		fingerprint[i] = uint64(hn % s.m)
	}

hn * g overflows in the unit tests. I am not sure if this is really an issue or if this behaviour is intended.

To reproduce:

Add a overflow check function like (as taken
from https://www.programming-idioms.org/idiom/86/check-if-integer-multiplication-will-overflow)

func multiplyWillOverflow(x, y uint64) bool {
   if x <= 1 || y <= 1 {
     return false
   }
   d := x * y
   return d/y != x
}

Add the following after
hn = (hn * g) %m:

if multiplyWillOverflow(hn, g) {
  panic("Multiply overflow occured")
}

And run go test ./..

Bug for bit sizes being a power of two (and maybe other values)

Implementing the following test wouldn't make the test passing anymore.

--- FAIL: TestBugFalsePositives (1.24s)
    bloom_test.go:330: False positive probability is too high at  20.18335038131724 % vs  0.602097140729787 %
FAIL
exit status 1
FAIL	github.com/DCSO/bloom	1.763s
func TestBugFalsePositives(t *testing.T) {
	// this capacity + p would produce a power of 2 bit size
	capacity := uint64(109397)
	p := float64(0.01)
	fillingFactor := 0.9
	N := uint64(float64(capacity) * fillingFactor)
	filter, _ := GenerateExampleFilter(capacity, p, N)
	pAcceptable := math.Pow(1-math.Exp(-float64(filter.k)*float64(N)/float64(filter.m)), float64(filter.k))
	fingerprint := make([]uint64, filter.k)
	cnt := 0.0
	matches := 0.0
	for {
		cnt++
		value := GenerateTestValue(100)
		filter.Fingerprint(value, fingerprint)
		if filter.CheckFingerprint(fingerprint) {
			matches++
		}
		if cnt > float64(capacity)*10 {
			break
		}
	}
	//this might still fail sometimes...
	//we allow for a probability that is two times higher than the normally acceptable probability
	if matches/cnt > pAcceptable*2 {
		t.Error("False positive probability is too high at ", matches/cnt*100, "% vs ", pAcceptable*100, "%")
	}
}

After analysis, it is possible there is a flaw in the fingerprint generation.
Here is my theory (not verified mathematically). Your algorithm generates fingerprints with:

	for i := uint64(0); i < s.k; i++ {
		hn = (hn * g) % m
		fingerprint[i] = uint64(hn % s.m)

However the value of m used (i.e. 0xffffffffffffffc5) is so big that what the code does is equivalent for any x = (hn * g); x < m (which is very likely as we are working with uint64) to the following

	for i := uint64(0); i < s.k; i++ {
		hn = hn * g
		fingerprint[i] = uint64(hn % s.m)

So under those conditions hn is always a multiple of h0 (fnv hash) multiplied by g power i, very likely creating lots of collisions for very specific cases, such as this one.

NB: I did not verify if other bit sizes create the same behavior

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.