
Comments (5)

lemire commented on June 8, 2024

Do you get good results with gzip?

I have no experience compressing roaring bitmaps with generic codecs... assuredly, the impact will be data specific...

Related:

Compressing JSON: gzip vs zstd
https://lemire.me/blog/2021/06/30/compressing-json-gzip-vs-zstd/

from roaringformatspec.

toien commented on June 8, 2024

Thanks for the fast reply!

I wrote a test (Java/Go) that populates a RoaringBitmap with random uint32 values, serializes it, and also compresses the serialized bytes with gzip. It turns out the data is barely compressed at all.

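// imports the snippet relies on
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.Random;
import java.util.zip.GZIPOutputStream;
import org.roaringbitmap.RoaringBitmap;

// populate with random data (size, filepath and compressedFilepath are not shown in this snippet)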
Random r = new Random();
RoaringBitmap rbm = new RoaringBitmap();

for (int i = 0; i < size; i++) {
  long rValue = r.nextLong() & 0xffffffffL;
  int casted = (int) rValue;
  rbm.add(casted);
}

// dump to disk
ByteBuffer buffer = ByteBuffer.allocate(rbm.serializedSizeInBytes());
rbm.serialize(buffer);

Path path = Paths.get(filepath);

Files.write(path, buffer.array());

// compress and dump
Path cpath = Paths.get(compressedFilepath);

Files.write(cpath, new byte[0], StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);

try (GZIPOutputStream gos = new GZIPOutputStream(new FileOutputStream(cpath.toFile()))) {
  gos.write(buffer.array());
}

Here is the result:

> ls -alht
-rw-r--r--   1 worker  staff    19M Mar 21 15:59 random-1000w.bin.gz
-rw-r--r--   1 worker  staff    20M Mar 21 15:59 random-1000w.bin
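
The 20M size of the uncompressed file is roughly what the Roaring serialization layout predicts for this kind of input. A back-of-the-envelope sketch, assuming that `1000w` in the file name means 10 million values (an assumption, since `size` is not shown) and that uniformly random 32-bit values fill all 65,536 possible containers sparsely enough that each one is stored as an array container. The class and method names are made up for illustration, and duplicates among the random values are ignored:

```java
public class RoaringSizeEstimate {
    // Rough serialized size when every container is an array container
    // (no run containers): 8 bytes for the cookie and container count,
    // 8 bytes per container for the key/cardinality pair plus the offset,
    // and 2 bytes per stored value (its low 16 bits).
    static long estimateBytes(long distinctValues, long containers) {
        return 8 + 8 * containers + 2 * distinctValues;
    }

    public static void main(String[] args) {
        long values = 10_000_000L;   // assumed from the "1000w" file name
        long containers = 65_536L;   // one container per distinct high 16 bits
        System.out.println(estimateBytes(values, containers) + " bytes");
    }
}
```

The estimate comes to about 20.5 million bytes, almost all of it the 2 bytes stored per value. For random input those 2 bytes are the random low 16 bits of each value, which leaves a generic codec very little to exploit.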

I am trying zstd now.


toien commented on June 8, 2024

It seems that generic compression is not a good fit for RoaringBitmap, at least on this random data:

INFO: lz4 decompressed len: 20501162, compressed len:20574769
INFO: zstd decompressed len: 20501162, compressed len:20415315
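
To put the logged sizes in perspective, here is a small sketch that turns them into ratios. The class and method names are illustrative only; the numbers are copied from the log lines above.

```java
public class CompressionRatio {
    // Compressed size divided by original size.
    // A value above 1.0 means the codec made the data larger.
    static double ratio(long compressedLen, long originalLen) {
        return (double) compressedLen / originalLen;
    }

    public static void main(String[] args) {
        long original = 20_501_162L; // serialized bitmap, from the log
        long lz4 = 20_574_769L;      // lz4 output, from the log
        long zstd = 20_415_315L;     // zstd output, from the log

        System.out.printf("lz4 ratio:  %.4f%n", ratio(lz4, original));
        System.out.printf("zstd ratio: %.4f%n", ratio(zstd, original));
    }
}
```

In other words, lz4 grew the data by roughly 0.4% and zstd shrank it by roughly 0.4%, so neither codec changed the size in any meaningful way.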


lemire commented on June 8, 2024

Interestingly, it looks like lz4 makes things worse in your test!


derlaft commented on June 8, 2024

RoaringFormatSpec : specification of the compressed-bitmap Roaring formats

Roaring bitmaps are already a form of compression. The entropy of the serialized data should therefore already be rather close to the maximum (you can plot an entropy graph, for example with binwalk), so compressing it once more won't yield a significant gain.
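
The entropy argument can also be checked without binwalk. Below is a minimal sketch using only the Java standard library: it computes the Shannon entropy of a buffer's byte histogram and compares it with what gzip achieves. The class and method names are hypothetical, and random bytes stand in for a serialized bitmap of random values. A byte histogram is only a zeroth-order measure, so treat it as a quick indicator, not as proof that data is incompressible.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class EntropyCheck {
    // Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0).
    static double bitsPerByte(byte[] data) {
        if (data.length == 0) {
            return 0.0;
        }
        long[] counts = new long[256];
        for (byte b : data) {
            counts[b & 0xff]++;
        }
        double entropy = 0.0;
        for (long c : counts) {
            if (c == 0) {
                continue; // unused byte values contribute nothing
            }
            double p = (double) c / data.length;
            entropy -= p * (Math.log(p) / Math.log(2));
        }
        return entropy;
    }

    // Size in bytes of the gzip-compressed form of the input.
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gos = new GZIPOutputStream(bos)) {
            gos.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.size();
    }

    public static void main(String[] args) {
        byte[] random = new byte[1 << 20];
        new Random(42).nextBytes(random); // stand-in for high-entropy serialized data
        byte[] zeros = new byte[1 << 20]; // highly redundant data for contrast

        System.out.printf("random: %.3f bits/byte, gzip %d -> %d bytes%n",
                bitsPerByte(random), random.length, gzipSize(random));
        System.out.printf("zeros:  %.3f bits/byte, gzip %d -> %d bytes%n",
                bitsPerByte(zeros), zeros.length, gzipSize(zeros));
    }
}
```

To run the same check on real data, read the serialized file into a byte array (for example with `Files.readAllBytes`) and pass it to both methods. A result close to 8 bits per byte, together with a gzip size close to the input size, matches what the test in this thread showed.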

