Comments (9)

fulmicoton commented on June 3, 2024

The format is different.
As far as I know, there is no trick to use SIMD instructions in a simple way for the format used in Parquet.
(Of course, something can probably be done for some specific bit widths.)

from bitpacking.

jorgecarleitao commented on June 3, 2024

Never mind, it was a mistake on my side. FWIW, here is the working code that decodes arbitrary byte slices:

/// Decodes bitpacked bytes of `num_bits` into `u32`.
/// Returns the number of decoded values.
/// # Panics
/// Panics if `decompressed` has fewer than [`required_capacity`] slots.
pub fn decode(compressed: &[u8], num_bits: u8, decompressed: &mut [u32]) -> usize {
    let bitpacker = BitPacker1x::new();

    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;

    let compressed_chunks = compressed.chunks_exact(compressed_block_size);
    let decompressed_chunks = decompressed.chunks_exact_mut(BitPacker1x::BLOCK_LEN);

    let remainder = compressed_chunks.remainder();
    let last_compressed_chunk = if remainder.is_empty() {
        vec![]
    } else {
        let mut last_compressed_chunk = remainder.to_vec();
        last_compressed_chunk
            .extend(std::iter::repeat(0).take(compressed_block_size - remainder.len()));
        last_compressed_chunk
    };
    let last_compressed_chunks = last_compressed_chunk.chunks_exact(compressed_block_size);
    debug_assert_eq!(last_compressed_chunks.remainder().len(), 0);

    compressed_chunks
        .chain(last_compressed_chunks)
        .zip(decompressed_chunks)
        .for_each(|(c_chunk, d_chunk)| {
            bitpacker.decompress(c_chunk, d_chunk, num_bits);
        });
    compressed.len() * 8 / num_bits as usize
}

/// Encodes `u32` values into a buffer using `num_bits`.
pub fn encode(decompressed: &[u32], num_bits: u8, compressed: &mut [u8]) -> usize {
    let bitpacker = BitPacker1x::new();

    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;

    let chunks = decompressed.chunks_exact(BitPacker1x::BLOCK_LEN);

    let remainder = chunks.remainder();

    // Pad the trailing partial block with zeros so it can be compressed whole.
    let mut last_chunk = remainder.to_vec();
    let trailing = BitPacker1x::BLOCK_LEN - remainder.len();
    last_chunk.extend(std::iter::repeat(0).take(trailing));

    let mut compressed_len = 0;
    chunks
        // Skip the padded block entirely when the input is block-aligned.
        .chain(if remainder.is_empty() {
            None
        } else {
            Some(last_chunk.as_slice())
        })
        .for_each(|chunk| {
            let chunk_compressed =
                &mut compressed[compressed_len..compressed_len + compressed_block_size];
            compressed_len += bitpacker.compress(chunk, chunk_compressed, num_bits);
        });
    decompressed.len() * num_bits as usize / 8
}

I am using this library to read parquet files here: https://github.com/jorgecarleitao/parquet2 . Will close this issue as "yes".
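The size bookkeeping both functions rely on can be sanity-checked with plain arithmetic. A std-only sketch (the constant mirrors `BitPacker1x::BLOCK_LEN`; the helper names are mine):

```rust
// BitPacker1x works on fixed blocks of 32 u32 values.
const BLOCK_LEN: usize = 32; // mirrors BitPacker1x::BLOCK_LEN

/// Bytes one compressed block occupies for a given bit width.
fn compressed_block_size(num_bits: usize) -> usize {
    BLOCK_LEN * num_bits / 8
}

/// `u32` slots `decode` needs: `length` rounded up to whole blocks.
fn required_capacity(length: usize) -> usize {
    (length + BLOCK_LEN - 1) / BLOCK_LEN * BLOCK_LEN
}
```

Because the block length is 32, `compressed_block_size` is always a whole number of bytes for any `num_bits` from 1 to 32.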

fulmicoton commented on June 3, 2024

Tantivy only uses Bitpacker4x, which uses SIMD instructions. It is limited to blocks of 128 × u32.

Bitpacker1x only works with blocks of 32 integers. That's because it uses u32 as the vessel for the bitpacking:
packs of 32 ensure that, regardless of the bit width, we end up with a whole number of u32 in the output.

Parquet works at the byte level and therefore allows packs of multiples of 8.
You could probably use bitpacking and either:

  • use your padding and truncation trick for the last block
  • use bitpacking for (run-len / 32) and complete with a different encoder.
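That "different encoder" for the tail can be as simple as a scalar loop. A std-only sketch (the function name is mine; it assumes the LSB-first bit order of Parquet's bit-packed encoding), useful for cross-checking block decoders:

```rust
/// Decode `length` values of `num_bits` each from an LSB-first bitpacked buffer.
fn decode_scalar(values: &[u8], length: usize, num_bits: usize) -> Vec<u32> {
    (0..length)
        .map(|i| {
            let start = i * num_bits; // absolute bit offset of value `i`
            (0..num_bits).fold(0u32, |v, b| {
                let bit = start + b;
                // Pull out bit `bit` of the buffer and place it at position `b`.
                v | ((values[bit / 8] as u32 >> (bit % 8) & 1) << b)
            })
        })
        .collect()
}
```

Decoding full blocks with bitpacking and the tail with a loop like this avoids padding the input at all, at the cost of a slow path for at most 31 values.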

I'd love to see what the performance looks like.

fn decode(values: &[u8], length: u32, num_bits: u8) -> Vec<u32>

Are you interested in compression and decompression?

jorgecarleitao commented on June 3, 2024

Thanks a lot for your quick answer, @fulmicoton .

Tantivy only uses Bitpacker4x, which uses SIMD instructions. It is limited to blocks of 128 × u32.

So, by design Tantivy always stores in blocks and thus has no problem with the boundaries. Got it.

You could probably use bitpacking and either:

  • use your padding and truncation trick for the last block
  • use bitpacking for (run-len / 32) and complete with a different encoder.

Sweet. What I am uncertain of: is it sufficient to run Bitpacker1x on each chunk and concatenate the results? I.e. something like this:

fn decode(values: &[u8], length: usize, num_bits: u8) -> Vec<u32> {
    let bitpacker = BitPacker1x::new();
    let mut decompressed = vec![0u32; length];
    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;
    let chunks = values.chunks_exact(compressed_block_size);
    chunks.enumerate().for_each(|(i, chunk)| {
        bitpacker.decompress(
            chunk,
            &mut decompressed[i * BitPacker1x::BLOCK_LEN..(i + 1) * BitPacker1x::BLOCK_LEN],
            num_bits,
        );
    });
    // todo: handle remainder
    decompressed
}

or do we need to perform some shifts and `&`s so that we carry over some of the information between blocks?

Are you interested in compression and decompression?

Yes, compression and decompression (as we want to read and write to parquet).

fulmicoton commented on June 3, 2024

Yes, it should be correct, but with this you are lacking the last 7 values.

fulmicoton commented on June 3, 2024

@jorgecarleitao contrary to what you suggested in this ticket https://issues.apache.org/jira/browse/ARROW-11897?jql=text%20~%20%22tantivy%22 , Bitpacker1x does not use SIMD instructions. The entire decoding/encoding loops are inlined, however.

jorgecarleitao commented on June 3, 2024

@fulmicoton, perfect! I will then start from there.

Is there a reason why we can't use BitPacker4x with the same chunking procedure as above? When I wrote that comment I was thinking of using BitPacker4x. I thought about starting with BitPacker1x just because it is easier to reason about. Or do they represent different encodings altogether?

jorgecarleitao commented on June 3, 2024

Hi, I have been trying to make this work, but I am stuck and you may know why.

I have the following:

use bitpacking::BitPacker;
use bitpacking::BitPacker1x;


pub fn required_capacity(length: u32) -> usize {
    let chunks_n = (length as usize + BitPacker1x::BLOCK_LEN - 1) / BitPacker1x::BLOCK_LEN;
    chunks_n * BitPacker1x::BLOCK_LEN
}

pub fn decode(compressed: &[u8], num_bits: u8, decompressed: &mut [u32]) -> usize {
    let bitpacker = BitPacker1x::new();

    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;

    let compressed_chunks = compressed.chunks_exact(compressed_block_size);
    let decompressed_chunks = decompressed.chunks_exact_mut(BitPacker1x::BLOCK_LEN);

    let remainder = compressed_chunks.remainder();
    let last_compressed_chunk = if remainder.is_empty() {
        vec![]
    } else {
        let mut last_compressed_chunk = compressed.to_vec();
        last_compressed_chunk
            .extend(std::iter::repeat(0).take(compressed_block_size - remainder.len()));
        last_compressed_chunk
    };
    let last_compressed_chunks = last_compressed_chunk.chunks_exact(compressed_block_size);
    debug_assert_eq!(last_compressed_chunks.remainder().len(), 0);

    compressed_chunks
        .chain(last_compressed_chunks)
        .zip(decompressed_chunks)
        .for_each(|(c_chunk, d_chunk)| {
            bitpacker.decompress(&c_chunk, d_chunk, num_bits);
        });
    compressed.len() / num_bits as usize * 8
}

But it is not able to decode correctly:


#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn aaaa() {
        let length = 74;
        let expected = (0..length as u32).map(|x| x % 7).collect::<Vec<_>>();
        let data = &[
            0b10001000,
            0b11000110,
            0b00011010,
            0b11010001,
            0b01011000,
            0b00100011,
            0b00011010,
            0b01101011,
            0b01000100,
            0b01100011,
            0b10001101,
            0b01101000,
            0b10101100,
            0b00010001,
            0b10001101,
            0b00110101,
            0b10100010,
            0b10110001,
            0b01000110,
            0b00110100,
            0b11010110,
            0b10001000,
            0b11000110,
            0b00011010,
            0b11010001,
            0b01011000,
            0b00000011,
        ];

        let num_bits = 3;
        let mut decoded = vec![0; required_capacity(length)];
        decode(data, num_bits, &mut decoded);
        decoded.truncate(length as usize);
        assert_eq!(decoded, expected);
    }
}

I believe that this is related to the fact that around block boundaries there are carry-over bit offsets that may be discarded when the next block is decompressed.

from bitpacking.

fulmicoton commented on June 3, 2024

Let me know if you run some benchmarks with the previous method.
It is looking good. I wonder whether the allocation you need to handle the remainder can be harmful for some payloads.
