Comments (9)

fulmicoton commented on June 3, 2024

The format is different.
As far as I know, there is no trick to use SIMD instructions in a simple way for the format used in Parquet.
(Of course, something can probably be done for some specific bit widths.)

from bitpacking.

jorgecarleitao commented on June 3, 2024

Never mind, it was a mistake on my side. FWIW, here is the working code that decodes arbitrary byte slices:

/// Decodes bitpacked bytes of `num_bits` into `u32`.
/// Returns the number of decoded values.
/// # Panics
/// Panics if `decompressed` has fewer than [`required_capacity`] slots.
pub fn decode(compressed: &[u8], num_bits: u8, decompressed: &mut [u32]) -> usize {
    let bitpacker = BitPacker1x::new();

    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;

    let compressed_chunks = compressed.chunks_exact(compressed_block_size);
    let decompressed_chunks = decompressed.chunks_exact_mut(BitPacker1x::BLOCK_LEN);

    let remainder = compressed_chunks.remainder();
    let last_compressed_chunk = if remainder.is_empty() {
        vec![]
    } else {
        let mut last_compressed_chunk = remainder.to_vec();
        last_compressed_chunk
            .extend(std::iter::repeat(0).take(compressed_block_size - remainder.len()));
        last_compressed_chunk
    };
    let last_compressed_chunks = last_compressed_chunk.chunks_exact(compressed_block_size);
    debug_assert_eq!(last_compressed_chunks.remainder().len(), 0);

    compressed_chunks
        .chain(last_compressed_chunks)
        .zip(decompressed_chunks)
        .for_each(|(c_chunk, d_chunk)| {
            bitpacker.decompress(c_chunk, d_chunk, num_bits);
        });
    compressed.len() * 8 / num_bits as usize
}

/// Encodes `u32` values into a buffer using `num_bits`.
pub fn encode(decompressed: &[u32], num_bits: u8, compressed: &mut [u8]) -> usize {
    let bitpacker = BitPacker1x::new();

    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;

    let chunks = decompressed.chunks_exact(BitPacker1x::BLOCK_LEN);

    let remainder = chunks.remainder();

    // Pad the trailing partial block with zeros so it can be compressed whole.
    let mut last_chunk = remainder.to_vec();
    let trailing = BitPacker1x::BLOCK_LEN - remainder.len();
    last_chunk.extend(std::iter::repeat(0).take(trailing));

    let mut compressed_len = 0;
    chunks
        // Skip the padded block entirely when the input is block-aligned.
        .chain(if remainder.is_empty() {
            None
        } else {
            Some(last_chunk.as_slice())
        })
        .for_each(|chunk| {
            let chunk_compressed =
                &mut compressed[compressed_len..compressed_len + compressed_block_size];
            compressed_len += bitpacker.compress(chunk, chunk_compressed, num_bits);
        });
    decompressed.len() * num_bits as usize / 8
}

I am using this library to read parquet files here: https://github.com/jorgecarleitao/parquet2 . Will close this issue as "yes".
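The size bookkeeping both functions rely on can be sanity-checked with plain arithmetic. A std-only sketch (the constant mirrors `BitPacker1x::BLOCK_LEN`; the helper names are mine):

```rust
// BitPacker1x works on fixed blocks of 32 u32 values.
const BLOCK_LEN: usize = 32; // mirrors BitPacker1x::BLOCK_LEN

/// Bytes one compressed block occupies for a given bit width.
fn compressed_block_size(num_bits: usize) -> usize {
    BLOCK_LEN * num_bits / 8
}

/// `u32` slots `decode` needs: `length` rounded up to whole blocks.
fn required_capacity(length: usize) -> usize {
    (length + BLOCK_LEN - 1) / BLOCK_LEN * BLOCK_LEN
}
```

Because the block length is 32, `compressed_block_size` is always a whole number of bytes for any `num_bits` from 1 to 32.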

fulmicoton commented on June 3, 2024

Tantivy only uses Bitpacker4x, which uses SIMD instructions. It is limited to blocks of 128 × u32.

Bitpacker1x only works with blocks of 32 integers. That's because it uses u32 as the vessel for the bitpacking:
packs of 32 ensure that, regardless of the bit width, we end up with a whole number of u32 in the output.

Parquet works at the byte level and therefore allows packs of multiples of 8.
You could probably use bitpacking and either:

  • use your padding and truncation trick for the last block
  • use bitpacking for (run-len / 32) and complete with a different encoder.
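That "different encoder" for the tail can be as simple as a scalar loop. A std-only sketch (the function name is mine; it assumes the LSB-first bit order of Parquet's bit-packed encoding), useful for cross-checking block decoders:

```rust
/// Decode `length` values of `num_bits` each from an LSB-first bitpacked buffer.
fn decode_scalar(values: &[u8], length: usize, num_bits: usize) -> Vec<u32> {
    (0..length)
        .map(|i| {
            let start = i * num_bits; // absolute bit offset of value `i`
            (0..num_bits).fold(0u32, |v, b| {
                let bit = start + b;
                // Pull out bit `bit` of the buffer and place it at position `b`.
                v | ((values[bit / 8] as u32 >> (bit % 8) & 1) << b)
            })
        })
        .collect()
}
```

Decoding full blocks with bitpacking and the tail with a loop like this avoids padding the input at all, at the cost of a slow path for at most 31 values.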

I'd love to see what the performance looks like.

fn decode(values: &[u8], length: u32, num_bits: u8) -> Vec<u32>

Are you interested in compression and decompression?

jorgecarleitao commented on June 3, 2024

Thanks a lot for your quick answer, @fulmicoton .

Tantivy only uses Bitpacker4x, which uses SIMD instructions. It is limited to blocks of 128 × u32.

So, by design Tantivy always stores in blocks and thus has no problem with the boundaries. Got it.

You could probably use bitpacking and either:

  • use your padding and truncation trick for the last block
  • use bitpacking for (run-len / 32) and complete with a different encoder.

Sweet. What I am uncertain of: is it sufficient to run Bitpacker1x on each chunk and concatenate the results? I.e. something like this:

fn decode(values: &[u8], length: usize, num_bits: u8) -> Vec<u32> {
    let bitpacker = BitPacker1x::new();
    let mut decompressed = vec![0u32; length];
    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;
    let chunks = values.chunks_exact(compressed_block_size);
    chunks.enumerate().for_each(|(i, chunk)| {
        bitpacker.decompress(
            chunk,
            &mut decompressed[i * BitPacker1x::BLOCK_LEN..(i + 1) * BitPacker1x::BLOCK_LEN],
            num_bits,
        );
    });
    // todo: handle remainder
    decompressed
}

or do we need to perform some shifts and `&`s so that we carry over some of the information between blocks?

Are you interested in compression and decompression?

Yes, compression and decompression (as we want to read and write to parquet).

fulmicoton commented on June 3, 2024

Yes, it should be correct, but with this you are lacking the last 7 values.

fulmicoton commented on June 3, 2024

@jorgecarleitao contrary to what you suggested in this ticket https://issues.apache.org/jira/browse/ARROW-11897?jql=text%20~%20%22tantivy%22 , Bitpacker1x does not use SIMD instructions. The entire decoding/encoding loops are inlined, however.

jorgecarleitao commented on June 3, 2024

@fulmicoton, perfect! I will then start from there.

Is there a reason why we can't use BitPacker4x with the same chunking procedure as above? When I wrote that comment I was thinking of using BitPacker4x. I thought about starting with BitPacker1x just because it is easier to reason about. Or do they represent different encodings altogether?

jorgecarleitao commented on June 3, 2024

Hi, I have been trying to make this work, but I am stuck and you may know why.

I have the following:

use bitpacking::BitPacker;
use bitpacking::BitPacker1x;


pub fn required_capacity(length: u32) -> usize {
    let chunks_n = (length as usize + BitPacker1x::BLOCK_LEN - 1) / BitPacker1x::BLOCK_LEN;
    chunks_n * BitPacker1x::BLOCK_LEN
}

pub fn decode(compressed: &[u8], num_bits: u8, decompressed: &mut [u32]) -> usize {
    let bitpacker = BitPacker1x::new();

    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;

    let compressed_chunks = compressed.chunks_exact(compressed_block_size);
    let decompressed_chunks = decompressed.chunks_exact_mut(BitPacker1x::BLOCK_LEN);

    let remainder = compressed_chunks.remainder();
    let last_compressed_chunk = if remainder.is_empty() {
        vec![]
    } else {
        let mut last_compressed_chunk = compressed.to_vec();
        last_compressed_chunk
            .extend(std::iter::repeat(0).take(compressed_block_size - remainder.len()));
        last_compressed_chunk
    };
    let last_compressed_chunks = last_compressed_chunk.chunks_exact(compressed_block_size);
    debug_assert_eq!(last_compressed_chunks.remainder().len(), 0);

    compressed_chunks
        .chain(last_compressed_chunks)
        .zip(decompressed_chunks)
        .for_each(|(c_chunk, d_chunk)| {
            bitpacker.decompress(&c_chunk, d_chunk, num_bits);
        });
    compressed.len() / num_bits as usize * 8
}

But it is not able to decode correctly:


#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn aaaa() {
        let length = 74;
        let expected = (0..length as u32).map(|x| x % 7).collect::<Vec<_>>();
        let data = &[
            0b10001000,
            0b11000110,
            0b00011010,
            0b11010001,
            0b01011000,
            0b00100011,
            0b00011010,
            0b01101011,
            0b01000100,
            0b01100011,
            0b10001101,
            0b01101000,
            0b10101100,
            0b00010001,
            0b10001101,
            0b00110101,
            0b10100010,
            0b10110001,
            0b01000110,
            0b00110100,
            0b11010110,
            0b10001000,
            0b11000110,
            0b00011010,
            0b11010001,
            0b01011000,
            0b00000011,
        ];

        let num_bits = 3;
        let mut decoded = vec![0; required_capacity(length)];
        decode(data, num_bits, &mut decoded);
        decoded.truncate(length as usize);
        assert_eq!(decoded, expected);
    }
}

I believe that this is related to the fact that around block boundaries there are carry-over bit offsets that may be discarded when the next block is decompressed.

from bitpacking.

fulmicoton commented on June 3, 2024

Let me know if you run some benchmarks with the previous method.
It is looking good. I wonder whether the allocation you need to handle the remainder can be harmful for some payloads.
