Comments (3)
I wanted the implementation of the BF math to follow the original paper. Basically, it is based on Google's Guava code. I think the effect of that change would be negligible: the waste is just 1 byte, whereas in the common use case a BF takes tens of MBs or more. It may matter for very small BFs. Do you think it's worth changing? What's your use case?
from bloom-filter-scala.
As I see it, the waste can be up to 63 bits (almost 8 bytes), since UnsafeBitArray allocates and accesses memory Long-wise (in 8-byte chunks). I think it would be possible to replace getLong and putLong with getByte and putByte, in which case, yes, the waste would be at most 7 bits. Memory allocation and access more fine-grained than byte-wise (ideally bit-wise) is unfortunately not possible, so there will always be superfluous bits whenever numberOfBits is not a multiple of 8.
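The 63-bit and 7-bit figures above follow from rounding the bit count up to a whole number of storage chunks. A minimal sketch of that arithmetic, where the object and method names are hypothetical and not part of bloom-filter-scala:

```scala
object PaddingSketch {
  // Number of unused bits when `numberOfBits` is stored in whole chunks
  // of `chunkBits` bits each (64 for Long-wise, 8 for byte-wise storage).
  def wastedBits(numberOfBits: Long, chunkBits: Int): Long = {
    require(numberOfBits >= 0 && chunkBits > 0, "arguments must be non-negative / positive")
    val chunks = (numberOfBits + chunkBits - 1) / chunkBits // ceiling division
    chunks * chunkBits - numberOfBits
  }
}
```

For example, a filter of 65 bits wastes 63 bits with Long-wise storage but only 7 bits with byte-wise storage.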
My situation is the following: I am working with relatively small bloom filters and need to pack them (among other information) into a single UDP packet. That is why, in my case, it is necessary to be very careful with space.
There is also another minor flaw regarding (de)serialization: restoring a bloom filter should, in my opinion, accept arbitrary bytes fitting the length of the UnsafeBitArray. If there are superfluous bits (or even bytes) at the end of the allocated memory, it must be guaranteed that these are always 0.
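One way such a guarantee could be enforced when restoring from raw bytes is to mask off everything past numberOfBits. This is only a sketch on a plain Array[Byte]: the object and method names are hypothetical, and the bit layout (bit i in byte i / 8, least significant bit first) is an assumption that may not match the library's actual layout.

```scala
object TrailingBitsSketch {
  // Returns a copy of `bytes` in which every bit at index >= numberOfBits is 0.
  // Assumed layout: bit i lives in byte i / 8 at position i % 8 (LSB first).
  def clearSuperfluousBits(bytes: Array[Byte], numberOfBits: Long): Array[Byte] = {
    require(numberOfBits >= 0 && numberOfBits <= bytes.length.toLong * 8,
      "numberOfBits must fit into the given bytes")
    val out = bytes.clone()
    val fullBytes = (numberOfBits / 8).toInt // bytes that are used completely
    val remainder = (numberOfBits % 8).toInt // used bits in the partial byte
    var firstUnused = fullBytes
    if (remainder != 0) {
      val mask = (1 << remainder) - 1        // keeps only the low `remainder` bits
      out(fullBytes) = (out(fullBytes) & mask).toByte
      firstUnused = fullBytes + 1
    }
    // Zero any whole bytes that lie entirely past the last used bit.
    java.util.Arrays.fill(out, firstUnused, out.length, 0.toByte)
    out
  }
}
```

With this in place, arbitrary input bytes of the right length can be accepted, since the unused tail is normalized to 0 before the filter is used.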
from bloom-filter-scala.
@cmarxer I've implemented a byte-based bit array (instead of the Long-based one). Please take a look at #21. It seems to be a bit faster.
I'm reluctant to change the logic that calculates the optimal number of bits; I want to keep it close to the original paper. I would suggest calling the BloomFilter constructor directly instead of apply(), like this:
var nb = BloomFilter.optimalNumberOfBits(numberOfItems, falsePositiveRate)
// round nb to your requirements, e.g. to a multiple of 8 bits
nb = ...
val nh = BloomFilter.optimalNumberOfHashes(numberOfItems, nb)
val bf = new BloomFilter[T](nb, nh)
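The rounding step is left open in the snippet above. As one possible illustration for the byte-aligned case, a helper along these lines could be used; the name roundUpToMultiple is hypothetical and not part of bloom-filter-scala, and rounding down instead may be preferable when the filter must fit a fixed size budget.

```scala
object RoundingSketch {
  // Hypothetical helper: smallest multiple of `multiple` that is >= `n`.
  def roundUpToMultiple(n: Long, multiple: Long): Long = {
    require(n >= 0 && multiple > 0, "n must be non-negative and multiple positive")
    ((n + multiple - 1) / multiple) * multiple
  }
}
```

Rounding up to a multiple of 8 means the byte-based array has no superfluous bits, at the cost of at most 7 extra bits.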
from bloom-filter-scala.