Comments (5)
See my comment from Jan 5, "low load fill and missing items".
The TL;DR is that this code has a fundamental design flaw in the remodel/kick routine that causes items to be lost, I proved it if you look at my comments and analysis, and data, so unless you want to write a proper remodeler and cyclic detection, this code is useless.
One way around this might be to just use more indexes, but once the remodel events happen, you will loose items with this code in it's current state. This flaw is also is the reason you can't get good load/fill levels and it crashes well below the expected 95% fill.
Also, to answer your other question, this code has a HUGH memory footprint 7x larger than necessary because it does not use pointers or static sized types (eg. uses type fingerprint []byte instead of [2]byte) so the fingerprint is 16 in memory even if it only uses 2 for a fingerprint len()=2 and cap()=16, plus every manipulation causes memory allocation and causes copies, designed this way, so there is no memory reuse so you have to wait for garbage collector, so tons of memory usage. I use activity monitor on my Mac to see how much ram is used bt the script, rough but works.
I wound up writing my own, as I said, but it is architected differently with a counter, so still cookoo hash, but counting cookoo hash, for 5MM entries hash+int64 counter consumes 80MG, not 5GB, so I can't really give you much help with this code other than you can try incorporating my cyclic node detection code, but you have to write a proper remodel loop to not loose anything during remodel and fix the hash methods:
// cyclic detects node repeats up to max iterations
func cyclic(max int) func(i int64, b []byte) bool {
type cyclic struct {
i int64
h string
}
hits := make(map[cyclic]int)
visited := func(i int64, b []byte) bool {
node := cyclic{i, string(b)}
hits[node]++
if len(hits) > max || hits[node] > 1 {
return false
}
return true
}
return visited
}
use like so:
// note: I suppose this could be a recursion, ... if you wanted to do it that way
for {
cy := cyclic(maxCyclic)
for {
... select
if !cy(i, k[:]) { break }
...swap
for {
...insert
}
}
}
from cfilter.
so unless you want to write a proper remodeler and cyclic detection, this code is useless.
I'm sorry you feel this way though I can sympathize, I wrote this when initially getting started with Go about a year ago and have not had occasion to revisit it since. The large memory footprint & "fundamental" design flaw (this isn't apparent to me as yet; not that I've had a chance to investigate) is evidence of this, yes.
@ajaybodhe: In light of this I encourage you to instead try https://github.com/seiflotfy/cuckoofilter in the interim, or perhaps @g-w-p's implementation if he/she is willing to share. Keeping this and #5 open for when I do get around to addressing them.
from cfilter.
I wound up writing my own, as I said, but it is architected differently with counters, but still cookoo hash style, so I can't really give you much help with this code other than you can try to fix it, but you have to write a proper remodel loop to not loose anything during remodel and fix the hash methods ... that was where the problem was when I was playing with this.
from cfilter.
This code had potential, but needs work. These were my analysis tools, what I used when I was working on my cookoo hash architecture, might help you if you choose to revisit. Dump the image and dump the data to catch the remodel events and you will or should find the error in the code.
/*
kind of handy and cool debug
*/
// Img creates a hash map to inspect the hashed data container randomness of distrobution
// Note: the width*depth must equal container size, it doesn't calculate this
func (c *Cache) Img(width, height int) {
img := image.NewRGBA(image.Rectangle{image.Point{0, 0}, image.Point{width, height}})
var w1, h1 int
for ib := range c.data {
for ik := range c.data[ib] {
if c.data[ib][ik] != *Empty {
img.Set(w1, h1, color.RGBA{255, 0, 0, 255})
}
w1++
if w1 > width {
w1 = 0
h1++
}
}
}
f, _ := os.Create("data/hash.png")
defer f.Close()
png.Encode(f, img)
}
/* debug purpose code */
func (c *Cache) Dump() {
for ib := range c.data {
fmt.Printf("%6d ", ib)
for ik := range c.data[ib] {
fmt.Print(hex.EncodeToString(c.data[ib][ik][:2]), " ")
}
fmt.Println()
//if ib > 24 {
// fmt.Println("...")
// break
//}
}
}
from cfilter.
No longer maintaining this.
from cfilter.
Related Issues (8)
- optimisations, memory and cpu HOT 1
- bug in bucket.go, rand bucket selection HOT 1
- low load fill and missing items HOT 7
- Italian spelling mistake HOT 1
- feature request: add a way to persist/load the filter HOT 8
- Algorithmic Improvements for Fast Concurrent Cuckoo Hashing HOT 3
- Insert fails when entering rebalance kick loop HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cfilter.