quickwit-oss / bitpacking

SIMD algorithms for integer compression via bitpacking. This crate is a port of a C library called simdcomp.

License: MIT License

Rust 97.76% PowerShell 0.43% Shell 1.81%
compression rust simd

bitpacking's Issues

BitPacker4x::num_bits_sorted SIGSEGV on empty arrays

When called with an empty array, num_bits_sorted triggers a segfault. It is not specified anywhere that the array must not be empty (or did I miss it?).

fn num_bits_sorted(&self, initial: u32, decompressed: &[u32]) -> u8 {
     unsafe { scalar::UnsafeBitPackerImpl::num_bits_sorted(initial, decompressed) }
}

If you want to easily reproduce the bug, just create an empty vec![] and pass it to BitPacker4x::num_bits_sorted.
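
A minimal repro sketch, assuming the crate's published BitPacker / BitPacker4x API; on affected versions this segfaults rather than returning or panicking:

// Minimal repro sketch for the report above.
use bitpacking::{BitPacker, BitPacker4x};

fn main() {
    let empty: Vec<u32> = vec![];
    let bitpacker = BitPacker4x::new();
    // SIGSEGV happens inside this call when the slice is empty.
    let num_bits = bitpacker.num_bits_sorted(0, &empty);
    println!("num_bits = {}", num_bits);
}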

Have a light scalar implementation

The current implementations generate large binaries because they have one specialized implementation for each bitwidth, and do loop unrolling.

Add a flag-enabled implementation that uses a more compact scalar implementation. This would be useful for WebAssembly, for instance.
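
For illustration, a rough sketch of the kind of compact, non-unrolled scalar packer this could use; this is not the crate's implementation, just one generic loop that handles any bit width:

// Sketch of a compact scalar packer: one generic loop instead of 32 unrolled,
// bitwidth-specialized routines.
fn pack_scalar(values: &[u32], out: &mut Vec<u8>, num_bits: u8) {
    let mask: u64 = if num_bits >= 32 { u32::MAX as u64 } else { (1u64 << num_bits) - 1 };
    let mut acc: u64 = 0;    // bit accumulator
    let mut filled: u32 = 0; // number of valid bits in `acc`
    for &v in values {
        acc |= (v as u64 & mask) << filled;
        filled += num_bits as u32;
        while filled >= 8 {
            out.push(acc as u8);
            acc >>= 8;
            filled -= 8;
        }
    }
    if filled > 0 {
        out.push(acc as u8); // flush the remaining partial byte
    }
}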

please use runtime dispatch via target_feature

It would be great if this crate didn't require any compilation flags at all, and instead configured itself automatically at runtime based on what the running CPU supports. This process is documented in the std::arch module docs.
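
A minimal sketch of that runtime-dispatch pattern, with hypothetical num_bits_sse3 / num_bits_scalar helpers standing in for the real code paths (only is_x86_feature_detected! comes from std):

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse3")]
unsafe fn num_bits_sse3(decompressed: &[u32]) -> u8 {
    // Placeholder body: real code would use SSE3 intrinsics here.
    num_bits_scalar(decompressed)
}

fn num_bits_scalar(decompressed: &[u32]) -> u8 {
    decompressed.iter().map(|&v| 32 - v.leading_zeros() as u8).max().unwrap_or(0)
}

fn num_bits(decompressed: &[u32]) -> u8 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("sse3") {
            // The runtime check makes calling the target_feature fn sound.
            return unsafe { num_bits_sse3(decompressed) };
        }
    }
    num_bits_scalar(decompressed)
}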

Some other nits:

  • The README has no link to docs.
  • The README does not make it clear that using this crate requires Rust nightly.
  • It would be great if the API of this crate were much simpler. I don't think a trait is needed at all. You could just provide the current trait's methods as either top-level functions or methods of a single type. With runtime dispatch, you could fall back to a scalar implementation, which would let you support stable Rust pretty easily, I think.

Remove simd bitpacking implementations for high bit width

For sorted data, having a super fast implementation is not necessary above bitwidth = 8; we can fall back to the scalar implementation.
For unsorted data, I think we can also restrict ourselves to bitwidth <= 8, although this is a trade-off.

This crate vs tantivy-bitpacker

I noticed that there is a separate crate under the Tantivy project that also does bitpacking. So far as I can tell, the other crate does 64 bit compression and does not use SIMD, and the Tantivy project uses both crates. Is this correct?

Can I ask about the future of this crate? Will you be consolidating the two? What's the plan?

Also, it appears that the master branch is at version 0.8.3, but crates.io has 0.8.4. Is there some other repo somewhere I've missed?

64 bit version?

I'm using BitPacker4x, and it appears to compress only 32 bit integers. Is there a way to compress an array of 64 bit integers?

Example for compressing/decompressing an arbitrary number of u32 int values

I have been able to compress and decompress using BitPacker4x, and the compression looks quite good on sorted u32 values. I have a large number of u32 values to compress (up to a few thousand), and to compress them I looped over chunks of 128 u32 values. I need to store the final compressed result, and I did this by appending each compressed chunk to a result Vec<u8> which I can then store. This works fine.

Now I need to take the result Vec<u8> and decompress each chunk. At this point I no longer have the compressed_len for each chunk of bytes. Is there a way to do this so that I can decompress each compressed chunk and rebuild the original Vec<u32>? The original C++ library from Daniel Lemire supports any arbitrary number of int values; that would be really nice here too. But if you can guide me on how to use it properly, that will be very helpful.
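
One way to make each chunk self-describing is to write the block's num_bits as a one-byte header in front of its compressed bytes, so the compressed length can be recomputed when decoding. A minimal sketch, assuming the crate's BitPacker / BitPacker4x API and an input length that is a multiple of 128 (a real encoder would also handle the tail):

use bitpacking::{BitPacker, BitPacker4x};

fn compress_all(values: &[u32]) -> Vec<u8> {
    let bitpacker = BitPacker4x::new();
    let mut out = Vec::new();
    let mut buf = vec![0u8; 4 * BitPacker4x::BLOCK_LEN]; // worst case: 32 bits per value
    for chunk in values.chunks_exact(BitPacker4x::BLOCK_LEN) {
        let num_bits = bitpacker.num_bits(chunk);
        let written = bitpacker.compress(chunk, &mut buf, num_bits);
        out.push(num_bits); // 1-byte header per block
        out.extend_from_slice(&buf[..written]);
    }
    out
}

fn decompress_all(mut bytes: &[u8]) -> Vec<u32> {
    let bitpacker = BitPacker4x::new();
    let mut out = Vec::new();
    while !bytes.is_empty() {
        let num_bits = bytes[0];
        let block_size = BitPacker4x::compressed_block_size(num_bits);
        let mut block = vec![0u32; BitPacker4x::BLOCK_LEN];
        bitpacker.decompress(&bytes[1..1 + block_size], &mut block, num_bits);
        out.extend_from_slice(&block);
        bytes = &bytes[1 + block_size..];
    }
    out
}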

Compilation requires a heavy amount of memory

Building bitpacking has a peak memory usage of 1.9 GiB on my system, and this caused issues in CI and for some users building on lower-end hardware (see Plume-org/Plume#632).
This seems to be directly related to the heavy use of macros for code expansion. Once the code is expanded, I measured a bit more than 330k lines of code, distributed approximately as 1/5 bitpacker1x, 2/5 bitpacker4x, and 2/5 bitpacker8x.
Commenting out mod bitpacker8x; reduced peak memory usage to 1.2 GiB, so it might be interesting to have features to enable/disable individual bitpackers depending on what is required. For instance, Tantivy seems to use only 4x, so it is probably possible to go below 1 GiB just by disabling the unneeded bitpackers.
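
The gating could look roughly like this in lib.rs, with hypothetical per-bitpacker Cargo features (the feature names here are illustrative, not existing ones):

// lib.rs sketch: compile each bitpacker module only when its (hypothetical)
// Cargo feature is enabled, so e.g. Tantivy could build with only bitpacker4x.
#[cfg(feature = "bitpacker1x")]
mod bitpacker1x;
#[cfg(feature = "bitpacker4x")]
mod bitpacker4x;
#[cfg(feature = "bitpacker8x")]
mod bitpacker8x;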

Preparing for rust 1.27

Rust 1.27 will include support for SIMD intrinsics.
bitpacking will then compile on stable, so we can remove all the SIMD-related files and the README explanations.

BitPacker4x en/decoding problem

I have made a playground test where I show what I have done so far to encode/decode a list of ordered u32s. The list of integers I use in this example is not encoded/decoded correctly: it seems there is a difference of exactly 1024 (2^10) between the original numbers 19984 and 21086 (which changed into 20062).

Where do you think I could have made a mistake, is it related to #23 (comment) ?

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=a11e9600e3c3f8e1d4420527668673d2

(Unfortunately, bitpacking and zerocopy are not supported by the playground, so you will have to copy/paste it; sorry for the inconvenience.)

Delta for AVX2

As suggested by `` on reddit, compute_delta, as well as the delta integration on AVX2, would be more efficient if it relied on the `_mm256_alignr_epi8` instruction.

Copy-pasting the incomplete snippet from reddit:


use std::arch::x86_64::{__m256i, _mm256_alignr_epi8, _mm256_permute2x128_si256, _mm256_sub_epi32};

type DataType = __m256i; // the AVX2 register type used by this code path

fn compute_delta(curr: DataType, prev: DataType) -> DataType {
    unsafe {
        // Build [prev[4..8], curr[0..4]], then shift by 12 bytes so each lane
        // holds the element that precedes the corresponding lane of `curr`.
        let shifted = _mm256_alignr_epi8(curr, _mm256_permute2x128_si256(prev, curr, 0x21), 12);
        _mm256_sub_epi32(curr, shifted)
    }
}

BitPacker4x incorrectly encodes bit width 32 sorted blocks

Easiest way to explain this is through a repro:

#[test]
fn test_delta_bit_width_32() {
    use bitpacking::BitPacker4x;
    let values = vec![i32::max_value() as u32 + 1; BitPacker4x::BLOCK_LEN];
    let bit_packer = BitPacker4x::new();
    let bit_width = bit_packer.num_bits_sorted(0, &values);
    assert_eq!(bit_width, 32);

    let mut block = vec![0u8; BitPacker4x::compressed_block_size(bit_width)];
    bit_packer.compress_sorted(0, &values, &mut block, bit_width);

    let mut decoded_values = vec![0x10101010; BitPacker4x::BLOCK_LEN];
    bit_packer.decompress_sorted(0, &block, &mut decoded_values, bit_width);

    assert_eq!(values, decoded_values);
}

fails with:

thread 'page::bitpacked::tests::test_delta_bit_width_32' panicked at 'assertion failed: `(left == right)`
  left: `[2147483648, 2147483648, …]` (128 copies of 2147483648),
 right: `[0, 0, …]` (128 zeros)', data/src/page/bitpacked.rs:296:9

Help / Question: is this applicable to parquet's RLE encoding?

Hi,

I am considering re-using this awesome crate to read parquet files. Parquet uses an RLE/bit-packing encoding that seems to be equivalent to using BitPacker1x.

However, I am not 100% sure, and would really appreciate any guidance here.

Parquet accepts blocks larger or smaller than 32 items. Do you handle that in Tantivy? E.g. do you loop over the chunks of 32 items? I imagine that some operation at the boundaries is needed. The example I have is:

    #[test]
    fn test_rle() {
        use bitpacking::{BitPacker, BitPacker1x};
        // Test data: 0-7 with bit width 3
        // 0: 000
        // 1: 001
        // 2: 010
        // 3: 011
        // 4: 100
        // 5: 101
        // 6: 110
        // 7: 111
        let num_bits = 3;
        let length = 8;
        // encoded: 0b10001000u8, 0b11000110, 0b11111010
        // we need to pad it with zeros up to a full 32-value block (12 bytes).
        let data = vec![0b10001000u8, 0b11000110, 0b11111010, 0, 0, 0, 0, 0, 0, 0, 0, 0];

        let bitpacker = BitPacker1x::new();
        let mut decompressed = vec![0u32; BitPacker1x::BLOCK_LEN];
        bitpacker.decompress(&data, &mut decompressed[..], num_bits);
        decompressed.truncate(length); // truncate to recover the original 8 values.

        assert_eq!(decompressed, vec![0, 1, 2, 3, 4, 5, 6, 7]);
    }

I wonder how to generalize this for data whose length is larger than 12 bytes. Essentially, I was hoping to have

fn decode(values: &[u8], length: u32, num_bits: u8) -> Vec<u32>

in which values does not have to be a multiple of 12 bytes. Any pointers in this direction?
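
A sketch of roughly that helper, assuming BitPacker1x from the crate (BLOCK_LEN = 32 values): decompress full blocks, zero-padding the trailing partial block, then truncate to the real length:

use bitpacking::{BitPacker, BitPacker1x};

fn decode(values: &[u8], length: u32, num_bits: u8) -> Vec<u32> {
    let bitpacker = BitPacker1x::new();
    // 12 bytes per 32-value block when num_bits = 3.
    let block_bytes = BitPacker1x::compressed_block_size(num_bits);
    let mut out = Vec::with_capacity(length as usize);
    let mut buf = vec![0u32; BitPacker1x::BLOCK_LEN];
    for chunk in values.chunks(block_bytes) {
        // Zero-pad a trailing partial chunk so it forms a full block.
        let mut padded = chunk.to_vec();
        padded.resize(block_bytes, 0);
        bitpacker.decompress(&padded, &mut buf, num_bits);
        out.extend_from_slice(&buf);
    }
    out.truncate(length as usize); // drop values that came only from the padding
    out
}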

Alternative data types for output?

This crate is super useful, thanks for writing/maintaining it!

I don't know if this is feasible, but it would be nice if this crate could output to types other than u32. I needed to output to u8 to save space, and I ended up rewriting a lot of stuff for my purposes (I am only decompressing and always have the same width, so I could simplify things).

Having done so, I think it'd be possible to add another level of macro magic here to output to u8 and u16 when those types are needed. I'm not too familiar with writing Rust macros, though; it might create a huge binary if it weren't done carefully.

Just thought I'd put it out there as a potential enhancement. Thanks again for the crate.

'unstable' Cargo feature does not exist

I was curious what benchmark numbers I would get, guided by the README...

❯ cargo +nightly bench --features unstable
error: Package `bitpacking v0.8.3 (/tmp/bitpacking)` does not have the feature `unstable`

Is this the current way:

cargo +nightly bench
