hashlookup / fleur
Fleur implements a Bloom Filter library in C that is fully compatible with DCSO's Go and Python implementations.
License: BSD 3-Clause "New" or "Revised" License
DCSO only uses FNV1, so
void FNV1a64Update(FNV164_CTX *context, const unsigned char *input, unsigned int inputLen)
can be removed, as well as the alternate switch in fnv_64_buf.
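For context, the two variants differ only in the order of the multiply and the XOR. The following is a minimal sketch of 64-bit FNV-1 using the published offset basis and prime; the function name example_fnv1_64 is hypothetical and is not part of fleur.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_FNV64_OFFSET_BASIS 14695981039346656037ULL
#define EXAMPLE_FNV64_PRIME        1099511628211ULL

/* 64-bit FNV-1: multiply by the prime first, then XOR in the next byte.
 * (FNV-1a does the same two steps in the opposite order.)
 * The name example_fnv1_64 is an assumption made for this sketch. */
uint64_t example_fnv1_64(const unsigned char *buf, size_t len)
{
    uint64_t h = EXAMPLE_FNV64_OFFSET_BASIS;
    for (size_t i = 0; i < len; i++) {
        h *= EXAMPLE_FNV64_PRIME; /* wraps modulo 2^64 */
        h ^= (uint64_t)buf[i];
    }
    return h;
}
```

If only this variant is needed, the FNV-1a update function and the switch that selects between the two become dead code.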
fleur allocates large amounts of memory on insertion when given corrupted bloom filter files.
See the attached gzipped files:
hang0_insert.bin.gz
hang1_insert.bin.gz
hang2_insert.bin.gz
This can be tested using the following command:
echo bla | ./fleurcli -c insert hang0_insert.bin
The last one, hang2_insert.bin, is fixed by #9.
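One possible mitigation is to check the header against the actual file size before allocating anything. The sketch below assumes a simplified header with only m (bits) and k (hash functions); the struct, its field names, and example_header_is_sane are hypothetical and do not reflect fleur's real on-disk layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of a filter header; field names are assumptions. */
typedef struct {
    uint64_t m; /* number of bits in the filter */
    uint64_t k; /* number of hash functions */
} example_header;

/* Returns 1 if the header is plausible for a file that has payload_bytes
 * bytes after the header, 0 otherwise. */
int example_header_is_sane(const example_header *h, uint64_t payload_bytes)
{
    if (h == NULL || h->m == 0 || h->k == 0)
        return 0;
    /* Number of 64-bit words needed: ceil(m / 64), without overflow. */
    uint64_t words = h->m / 64 + (h->m % 64 != 0);
    /* Each word takes 8 bytes; compare by division to avoid overflow. */
    if (words > payload_bytes / 8)
        return 0;
    return 1;
}
```

A loader that rejects files failing such a check never asks the allocator for more memory than the file could possibly fill.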
Using a good prefix (or even a suffix) is a must for a C library. Come up with something unique ("fleur" is kind of cool, so it is probably a good idea to prefix with it), and then the library can be used without any current or future symbol clashes.
Could be faster still if it didn't calloc/free a "Fingerprint" buffer each time you add or query an entry. Since its individual BloomFilters are not thread-safe anyway, it could allocate the buffer once at BloomFilter creation time and be done with it.
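A minimal sketch of that idea, assuming the fingerprint is an array of k 64-bit values: the buffer is allocated once alongside the filter and reused by every add or query. The struct and function names here are hypothetical and much simpler than fleur's real BloomFilter.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified filter; the real struct has more fields. */
typedef struct {
    uint64_t k;            /* number of hash functions */
    uint64_t *fingerprint; /* scratch buffer of k entries, reused per call */
} example_filter;

/* Allocate the scratch buffer once, together with the filter. */
example_filter *example_filter_new(uint64_t k)
{
    if (k == 0 || k > SIZE_MAX / sizeof(uint64_t))
        return NULL;
    example_filter *bf = calloc(1, sizeof *bf);
    if (bf == NULL)
        return NULL;
    bf->fingerprint = calloc((size_t)k, sizeof *bf->fingerprint);
    if (bf->fingerprint == NULL) {
        free(bf);
        return NULL;
    }
    bf->k = k;
    return bf;
}

/* Release the filter and its scratch buffer. */
void example_filter_free(example_filter *bf)
{
    if (bf == NULL)
        return;
    free(bf->fingerprint);
    free(bf);
}
```

Add and query functions would then write into bf->fingerprint instead of calling calloc and free on every call, at the cost of the filter not being shareable across threads without locking (which is already the case).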
The use of static variables makes the BloomFilter code unnecessarily dangerous when multiple threads are involved, even if every thread just wants to allocate and use totally separate bloom filters. It could even cause issues with two bloom filters created one after the other, since even Initialize just returns a pointer to a single static BloomFilter :o I can't come up with a reason why you'd consider making anything in there static.
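One way to avoid the shared state is to have the initializer fill a struct that the caller owns. This is only a sketch under assumed names (example_bloom, example_bloom_init) and a reduced set of fields; it is not fleur's actual API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced filter state; field names are assumptions. */
typedef struct {
    uint64_t m; /* number of bits */
    uint64_t k; /* number of hash functions */
    uint64_t n; /* number of inserted elements */
} example_bloom;

/* Fill a caller-provided struct instead of returning a pointer to a
 * single static instance. Returns 0 on success, -1 on invalid arguments. */
int example_bloom_init(example_bloom *out, uint64_t m, uint64_t k)
{
    if (out == NULL || m == 0 || k == 0)
        return -1;
    memset(out, 0, sizeof *out);
    out->m = m;
    out->k = k;
    return 0;
}
```

Because each call writes only to the caller's storage, two filters created back to back, or in two different threads, cannot overwrite each other.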
fleur segfaults on insertion with the following corrupted (gzipped) bloom filter files:
crash0_insert.bin.gz
crash1_insert.bin.gz
The command used to reproduce the crashes is echo bla | ./fleurcli -c insert crash0_insert.bin
Valgrind output:
[...]
==1424957== Invalid read of size 8
==1424957== at 0x403D94: Add (fleur.c:116)
==1424957== by 0x40221E: main (fleurcli.c:147)
==1424957== Address 0x233413bc23a9218 is not stack'd, malloc'd or (recently) free'd
[...]
fleur should be able to merge two bloom filters of the same dimensions, following the description in DCSO's bloom:
// Join adds the items of another Bloom filter with identical dimensions to
// the receiver. That is, all elements that are described in the
// second filter will also described by the receiver, and the number of elements
// of the receiver will grow by the number of elements in the added filter.
// Note that it is implicitly assumed that both filters are disjoint! Otherwise
// the number of elements in the joined filter must _only_ be considered an
// upper bound and not an exact value!
// Joining two differently dimensioned filters may yield unexpected results and
// hence is not allowed. An error will be returned in this case, and the
// receiver will be left unaltered.
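A C version of the behaviour described in that comment might look like the sketch below. It assumes the bit array is stored as 64-bit words and that "identical dimensions" means equal m and k; the struct layout and the name example_join are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filter layout; field names are assumptions. */
typedef struct {
    uint64_t m;  /* number of bits */
    uint64_t k;  /* number of hash functions */
    uint64_t n;  /* number of inserted elements */
    uint64_t *v; /* bit array, ceil(m / 64) words */
} example_joinable;

/* OR the bits of src into dst. Returns 0 on success, or -1 (leaving dst
 * unaltered) if the filters do not have the same dimensions. */
int example_join(example_joinable *dst, const example_joinable *src)
{
    if (dst == NULL || src == NULL)
        return -1;
    if (dst->m != src->m || dst->k != src->k)
        return -1;
    uint64_t words = dst->m / 64 + (dst->m % 64 != 0);
    for (uint64_t i = 0; i < words; i++)
        dst->v[i] |= src->v[i];
    /* Exact only if the two filters are disjoint; otherwise an upper bound. */
    dst->n += src->n;
    return 0;
}
```

All dimension checks happen before the first write, which is what keeps the receiver unaltered on error.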
There is also strange code, like this loop https://github.com/hashlookup/fleur/blob/4ee2644a850381d928a..., that caught my eye.
Oh and btw, I believe bloom_path can cause memory safety issues because it is not terminated with a null byte after strncpy.
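The underlying problem is that strncpy does not write a terminating null byte when the source is at least as long as the size limit. A minimal sketch of a safer copy follows; the helper name example_copy_path is hypothetical and the buffer size is up to the caller.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst (capacity dst_size bytes) and always NUL-terminate.
 * Returns 0 if src fit completely, -1 if it was truncated or the
 * arguments were invalid. */
int example_copy_path(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;
    strncpy(dst, src, dst_size - 1); /* leave room for the terminator */
    dst[dst_size - 1] = '\0';        /* strncpy does not guarantee this */
    return strlen(src) < dst_size ? 0 : -1;
}
```

Returning an error on truncation lets the caller refuse an overlong path instead of silently opening a different file.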