Coder Social home page Coder Social logo

Comments (10)

dnbaker avatar dnbaker commented on August 28, 2024 1

Here's the code I had. It's only valid for POD (std::is_trivial and std::is_standard_layout). Sorry I took a while to respond - I had to go find and extract it.

First, there's a couple of dummy structs which are defined, and namespacing:

namespace {
typedef struct {
    size_t index;
    uint32_t tag;
    bool used;
} SerialVictim;
struct dummytable {
    void *p;
    size_t n;
};
struct dummyfilter {
    void *p;
    size_t n;
    SerialVictim vc;
};
}

using namespace cuckoofilter;

Writing to disk:

size_t write_cuckoo_filter(const CuckooFilter<uint64_t, bits_per_item, SingleTable> *filter,
                           const char *path) {
    dummyfilter dfilter;
    dummytable tbl;
    static const size_t tags_per_bucket(4);
    const size_t bktsz((bits_per_item * tags_per_bucket + 7) >> 3); // I haz a bkt.
    static_assert(sizeof(dfilter) == sizeof(*filter), "Dummy table is the wrong size!\n");
    static_assert(sizeof(tbl) == sizeof(SingleTable<bits_per_item>), "Dummy table is the wrong size!\n");
    // Copy info to dummy structs.
    memcpy(&dfilter, filter, sizeof(*filter));
    memcpy(&tbl, dfilter.p, sizeof(tbl));

    FILE *fp(fopen(path, "wb"));
    const int used(dfilter.vc.used);
    size_t ret(fwrite(&dfilter.n, 1, sizeof(dfilter.n), fp));
    ret += fwrite(&dfilter.vc.index, 1, sizeof(dfilter.vc.index), fp);
    ret += fwrite(&dfilter.vc.tag, 1, sizeof(dfilter.vc.tag), fp);
    ret += fwrite(&used, 1, sizeof(used), fp);
    ret += fwrite(&tbl.n, 1, sizeof(tbl.n), fp);
    ret += fwrite(tbl.p, 1, bktsz * tbl.n, fp);
    fclose(fp);
    return ret;
}

Reading:

template <size_t bits_per_item>
CuckooFilter<uint64_t, bits_per_item, SingleTable> *
load_cuckoo_filter(const char *path) {
    FILE *fp(fopen(path, "rb"));
    const size_t tags_per_bucket(4uL);
    const size_t bktsz((bits_per_item * tags_per_bucket + 7) >> 3); // I haz a bkt.
    fseek(fp, 0, SEEK_END);
    const size_t fsize = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    size_t num_bktz, num_items, index;
    uint32_t tag;
    int used;
    fscanf(fp, "%llu", &num_items);
    fscanf(fp, "%llu", &index);
    fscanf(fp, "%u", &tag);
    fscanf(fp, "%i", &used);
    fscanf(fp, "%llu", &num_bktz);
    size_t num_bktz_found((fsize - (sizeof(size_t) * 3) - sizeof(uint32_t) - sizeof(int)) / bktsz);
    if((fsize - (sizeof(size_t) * 3) - sizeof(uint32_t) - sizeof(int)) % bktsz) {
       fprintf(stderr, "Found %" PRIu64 " buckets. Expected %" PRIu64 ". fsize: %" PRIu64 ". bktsz: %" PRIu64 ". They be stealin' my bkt :'(....\n",
               num_bktz_found, num_bktz, fsize, bktsz);
       exit(EXIT_FAILURE);
    }
    void *buf = (uint8_t *)malloc(fsize - 1);
    size_t nread;
    if((nread = fread(buf, 1, num_bktz * bktsz, fp)) != num_bktz * bktsz) {
       fprintf(stderr, "Couldn't read buffer from file '%s' Bytes read: %" PRIu64 ". Expected: %" PRIu64 ".\n", path, nread, num_bktz * bktsz);
       exit(EXIT_FAILURE);
    }
    dummytable *table((dummytable *)malloc(sizeof(SingleTable<bits_per_item>)));
    *table = {buf, num_bktz};
    static_assert(sizeof(*table) == sizeof(uint64_t[2]), "Wrong size!\n");

    dummyfilter *ret((dummyfilter *)malloc(sizeof(dummyfilter)));
    *ret = {table, num_items, {index, tag, !!used}};
    fclose(fp);
    return (CuckooFilter<uint64_t, bits_per_item, SingleTable> *)(ret);
}

At destruction, you'll need to manually call your destructor (since it's a pointer), and std::free(ret->table).
It's pretty clunky, but it at least worked for my purposes. Let me know if you have any problems or need some help!

from cuckoofilter.

nextgens avatar nextgens commented on August 28, 2024

I guess it's a feature request more than anything else... but I concur; it'd be super-convenient if it was part of the API

from cuckoofilter.

dnbaker avatar dnbaker commented on August 28, 2024

I used some dummy structs and manual memcpy to do so in a project a while back. It's gross and bad form but works. Just make sure the layout's the same. (Until it's actually supported, of course.)

from cuckoofilter.

nextgens avatar nextgens commented on August 28, 2024

Would you mind sharing the code ?

from cuckoofilter.

dnbaker avatar dnbaker commented on August 28, 2024

I can pull that out of a private repo. I’ll look it over tomorrow and see if a pull request is easier.

If you’re doing sequence analysis, which I’m guessing based on your name, I could release the kmer database code around it if it meets your needs.

from cuckoofilter.

nextgens avatar nextgens commented on August 28, 2024

Thanks!

No, this time it's not for sequence analysis :)

from cuckoofilter.

roniemartinez avatar roniemartinez commented on August 28, 2024

👍 to this! Serializing/deserializing seems more efficient than rebuilding the filter. @dnbaker hope it is possible to share the code.

from cuckoofilter.

d4tocchini avatar d4tocchini commented on August 28, 2024

@dnbaker, @efficient

this is viable, yes? if so, something to point me to?

from cuckoofilter.

d4tocchini avatar d4tocchini commented on August 28, 2024

assuming a surgically placed memcpy, but not feeling confident whereabout

from cuckoofilter.

amiller-gh avatar amiller-gh commented on August 28, 2024

@dnbaker resurrecting an old thread :)

I'm also interested in serialization. Any chance you can dig up that old code, or do you have recommendations for another cuckoo filter library?

from cuckoofilter.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.