Coder Social home page Coder Social logo

rust-bloom-filter's Introduction

bloomfilter

Crates.io docs.rs License: ISC

A simple but fast implementation of the Bloom filter in Rust. The Bloom filter is a a space-efficient probabilistic data structure supporting dynamic set membership queries with false positives. It was introduced by Burton H. Bloom in 1970 (Bloom, 1970) and have since been increasingly used in computing applications and bioinformatics.

Documentation

Library documentation with examples is available on docs.rs.

Usage

Add this to your Cargo.toml:

[dependencies]
bloomfilter = "1"

Here is a simple example for creating a bloom filter with a false positive rate of 0.001 and query for presence of some numbers.

use bloomfilter::Bloom;

let num_items = 100000;
let fp_rate = 0.001;

let mut bloom = Bloom::new_for_fp_rate(num_items, fp_rate);
bloom.set(&10);   // insert 10 in the bloom filter
bloom.check(&10); // return true
bloom.check(&20); // return false

License

This project is licensed under the ISC license (LICENSE or https://opensource.org/licenses/ISC).

rust-bloom-filter's People

Contributors

carllerche avatar chris-ha458 avatar cosmichorrordev avatar dependabot-preview[bot] avatar derekchiang avatar ebedthan avatar jedisct1 avatar k12ish avatar ketralnis avatar matthiasbeyer avatar spacemeowx2 avatar thijsc avatar zonyitoo avatar zslayton avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

rust-bloom-filter's Issues

Improve README

Hi @jedisct1,

I believe the crate will benefit from a more detailed README. I can propose a PR if you agree.

load and save

how to save the created filter to a file for reuse?
and how to download it later?

Make API more standards conform like HashSet

While evaluating different bloom/cuckoo/whatever filters I realized just how different your API is :)

Most other libs closely follow the API of std HashSet

So adding an item is achieved through insert rather than set and existence is checked through contains rather than check. There are probably more functions that could be renamed or implemented to make it more standards conform.

This is not really a bug but it makes it some much easier to use and not misuse an API if it has a familiar face. As this is a breaking change and in the end just a personal choice you might not share, I did not prepare a PR and if starting with breaking changes you might want to add other stuff as well.

Provide `bloom.is_empty()` and `bloom.fill()`

Thank you for maintaining this library!

I'm solving a puzzle with code that looks like this:

let mut vec = vec![true; 10_000];
let mut items = vec![Item::new(); 10_000];
while vec.iter().any() {
    vec = items
        .enumerate()
        .filter(|&(index, _)| vec[index])
        .map(Item::recompute)
        .collect()
}

I'd like to use a bloom filter instead of bloom, could I implement two functions that help mimic this behaviour:

fn is_empty(&self) -> bool

  • Returns true if every element in the underlying bit vector is false

fn fill(&mut self)

  • Sets every element in the underlying bit vector to true

Cleaner syntax for `compute_bitmap_size`.

It works when using this syntax let bitmap_size: usize = Bloom::<()>::compute_bitmap_size(1000, 0.1);. However the syntax took me several attempts to get right.

A more intuitive syntax could be something like this

let bitmap_size: usize = Bloom::compute_bitmap_size(1000, 0.1);

However this doesn't compile

type annotations needed

cannot infer type for type parameter `T` declared on the struct `Bloom`rustc(E0282)
bloom_experiments.rs(26, 34): cannot infer type for type parameter `T` declared on the struct `Bloom`

The version of rust-bloom-filter on crates.io does not build on stable.

Compiling bloomfilter v0.0.9
/Users/kieran/.cargo/registry/src/github.com-1ecc6299db9ec823/bloomfilter-0.0.9/src/bloomfilter/lib.rs:21:5: 21:20 error: unresolved import `std::num::Float`. There is no `Float` in `std::num`
/Users/kieran/.cargo/registry/src/github.com-1ecc6299db9ec823/bloomfilter-0.0.9/src/bloomfilter/lib.rs:21 use std::num::Float;
                                                                                                              ^~~~~~~~~~~~~~~
/Users/kieran/.cargo/registry/src/github.com-1ecc6299db9ec823/bloomfilter-0.0.9/src/bloomfilter/lib.rs:22:5: 22:22 error: unresolved import `collections::bitv`. There is no `bitv` in `collections`
/Users/kieran/.cargo/registry/src/github.com-1ecc6299db9ec823/bloomfilter-0.0.9/src/bloomfilter/lib.rs:22 use collections::bitv;
                                                                                                              ^~~~~~~~~~~~~~~~~
error: aborting due to 2 previous errors
Could not compile `bloomfilter`.

To learn more, run the command again with --verbose.

Compiling straight from this git repo works fine though.

Compilation failes

I tried to install the holochain package from cargo which seems to depend on this package.
However it failes to compile.

Compiling bloomfilter v1.0.8
error[E0432]: unresolved import `serde`
  --> /home/feldi/.cargo/registry/src/github.com-1ecc6299db9ec823/bloomfilter-1.0.8/src/lib.rs:29:13
   |
29 |     pub use serde;
   |             ^^^^^ no external crate `serde`

error: aborting due to previous error

Rust version:

$ rustc --version
rustc 1.54.0 (a178d0322 2021-07-26)
$ cargo --version
cargo 1.54.0 (5ae8d74b3 2021-06-22)

Bloom::from_existing() is slow

I was hoping to be able to cheaply clone these objects, but there was no impl Clone, so instead I tried this:

fn clone_bloom<T>(bloom: &Bloom<T>) -> Bloom<T> {
    Bloom::from_existing(&bloom.bitmap(), bloom.number_of_bits(), bloom.number_of_hash_functions(), bloom.sip_keys())
}

It turns out that bloom.bitmap() is extremely expensive โ€“ I believe because it clones itself by iterating over every individual bit. Here's a flamegraph:

Screen Shot 2020-08-14 at 9 57 15 AM

Feature request: ability to accept "raw" pre-hashed values

In trying to construct a large bloomfilter I'm finding that the actual hashing is a significant bottleneck that can't easily be parallelised when access to the bloomfilter must be serialised.

It would be nice if there were a way to hash the structure myself before passing into the bloomfilter. In that case, hashing and modifying the bloom filter could be done in different threads. Today this isn't possible because even if I did hash them myself then bloomfilter would just to it again itself.

If you'd accept a patch that implements this I may be able to have a go of it myself and send a PR

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.