Comments (9)
The format is different.
As far as I know, there is no trick to use SIMD instructions in a simple way for the format used in Parquet.
(Of course, some things can probably be done for some specific bit widths.)
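For reference, the byte-level layout in question can at least be decoded without any SIMD. A minimal scalar sketch (the function name and structure are mine, assuming Parquet's LSB-first bit-packed layout):

```rust
/// Scalar (no SIMD) decoder for Parquet-style LSB-first bit-packing.
/// `num_bits` must be in 1..=32 and `compressed` must hold at least
/// `length * num_bits` bits.
fn decode_scalar(compressed: &[u8], num_bits: u8, length: usize) -> Vec<u32> {
    let num_bits = num_bits as u64;
    let mask = (1u64 << num_bits) - 1;
    (0..length)
        .map(|i| {
            let bit_pos = i as u64 * num_bits;
            let first_byte = (bit_pos / 8) as usize;
            let shift = bit_pos % 8;
            // Gather up to 8 bytes into a little-endian u64 window;
            // missing trailing bytes behave as zeros.
            let mut window = 0u64;
            for (j, &byte) in compressed[first_byte..].iter().take(8).enumerate() {
                window |= (byte as u64) << (8 * j);
            }
            ((window >> shift) & mask) as u32
        })
        .collect()
}
```

Because it addresses individual bits, this version has no block-size constraint at all, at the cost of being much slower than a SIMD kernel.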
from bitpacking.
nvmd, it was a mistake on my side. FWIW, the working code that decodes arbitrary byte slices:
/// Decodes bitpacked bytes of `num_bits` into `u32`.
/// Returns the number of decoded values.
/// # Panics
/// Panics if `decompressed` has less than [`required_capacity`] slots.
pub fn decode(compressed: &[u8], num_bits: u8, decompressed: &mut [u32]) -> usize {
    let bitpacker = BitPacker1x::new();
    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;

    let compressed_chunks = compressed.chunks_exact(compressed_block_size);
    let decompressed_chunks = decompressed.chunks_exact_mut(BitPacker1x::BLOCK_LEN);

    // Zero-pad the trailing partial block to a full compressed block.
    let remainder = compressed_chunks.remainder();
    let last_compressed_chunk = if remainder.is_empty() {
        vec![]
    } else {
        let mut last_compressed_chunk = remainder.to_vec();
        last_compressed_chunk
            .extend(std::iter::repeat(0).take(compressed_block_size - remainder.len()));
        last_compressed_chunk
    };
    let last_compressed_chunks = last_compressed_chunk.chunks_exact(compressed_block_size);
    debug_assert_eq!(last_compressed_chunks.remainder().len(), 0);

    compressed_chunks
        .chain(last_compressed_chunks)
        .zip(decompressed_chunks)
        .for_each(|(c_chunk, d_chunk)| {
            bitpacker.decompress(c_chunk, d_chunk, num_bits);
        });

    // Multiply by 8 *before* dividing, so the integer division does not
    // round away the trailing values.
    compressed.len() * 8 / num_bits as usize
}
/// Encodes `u32` values into a buffer using `num_bits` per value.
/// Returns the number of bytes holding the encoded values.
/// `compressed` must have room for a whole number of (padded) blocks.
pub fn encode(decompressed: &[u32], num_bits: u8, compressed: &mut [u8]) -> usize {
    let bitpacker = BitPacker1x::new();
    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;

    let chunks = decompressed.chunks_exact(BitPacker1x::BLOCK_LEN);
    let remainder = chunks.remainder();

    // Zero-pad the trailing values to a full block (skipped when the
    // input is already a whole number of blocks).
    let last_chunk = if remainder.is_empty() {
        None
    } else {
        let mut last_chunk = remainder.to_vec();
        let trailing = BitPacker1x::BLOCK_LEN - remainder.len();
        last_chunk.extend(std::iter::repeat(0).take(trailing));
        Some(last_chunk)
    };

    let mut compressed_len = 0;
    chunks
        .chain(last_chunk.iter().map(|chunk| chunk.as_slice()))
        .for_each(|chunk| {
            // A compressed block is `BLOCK_LEN * num_bits / 8` bytes,
            // not `BLOCK_LEN` bytes (they only coincide for num_bits == 8).
            let chunk_compressed =
                &mut compressed[compressed_len..compressed_len + compressed_block_size];
            compressed_len += bitpacker.compress(chunk, chunk_compressed, num_bits);
        });

    // Round up so the trailing bits are counted.
    (decompressed.len() * num_bits as usize + 7) / 8
}
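For comparison, the same padding-and-truncation idea collapses to nothing if the packing is done at the byte level. This scalar sketch (my own, not part of the bitpacking crate) packs LSB-first the way Parquet lays values out, with no block constraint:

```rust
/// Scalar LSB-first bit-packing of `values` into
/// `ceil(len * num_bits / 8)` bytes.
fn encode_scalar(values: &[u32], num_bits: u8) -> Vec<u8> {
    let num_bits = num_bits as usize;
    let mut out = vec![0u8; (values.len() * num_bits + 7) / 8];
    for (i, &value) in values.iter().enumerate() {
        let bit_pos = i * num_bits;
        for b in 0..num_bits {
            // Copy bit `b` of `value` into absolute bit `bit_pos + b`.
            if (value >> b) & 1 == 1 {
                out[(bit_pos + b) / 8] |= 1 << ((bit_pos + b) % 8);
            }
        }
    }
    out
}
```

It is useful as a reference implementation to test the block-based encoder against, since the two should agree byte for byte on whole blocks.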
I am using this library to read parquet files here: https://github.com/jorgecarleitao/parquet2 . Will close this issue as "yes".
Tantivy only uses Bitpacker4x, which uses SIMD instructions. It is limited to blocks of 128 x u32.
Bitpacker1x only works with blocks of 32 integers. That's because it manipulates u32 as the vessel for this bitpacking.
Packs of 32 ensure that, regardless of the bit width, we end up with a round multiple of u32 in the output.
Parquet works at the byte level and therefore allows packs of any multiple of 8.
You could probably use bitpacking and either:
- use your padding and truncation trick for the last block
- use bitpacking for (run-len / 32) and complete with a different encoder.
I'd love to see what the performance looks like.
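To make the suggested split concrete, here is the arithmetic as a sketch (helper name and signature are mine): with a block length of 32, a run of 74 three-bit values splits into 2 full blocks of 12 compressed bytes each, plus a 10-value remainder for the fallback encoder.

```rust
const BLOCK_LEN: usize = 32; // mirrors BitPacker1x::BLOCK_LEN

/// Returns (full blocks, leftover values, compressed bytes per block).
fn split_plan(length: usize, num_bits: u8) -> (usize, usize, usize) {
    let full_blocks = length / BLOCK_LEN;
    let leftover = length % BLOCK_LEN;
    let bytes_per_block = BLOCK_LEN * num_bits as usize / 8;
    (full_blocks, leftover, bytes_per_block)
}
```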
fn decode(values: &[u8], length: u32, num_bits: u8) -> Vec<u32>
Are you interested in compression and decompression?
Thanks a lot for your quick answer, @fulmicoton .
Tantivy only uses Bitpacker4x, which uses SIMD instructions. It is limited to blocks of 128 x u32.
So, by design Tantivy always stores in blocks and thus has no problem with the boundaries. Got it.
You could probably use the bitpacking and either:
- use your padding and truncation trick for the last block
- use bitpacking for (run-len / 32) and complete with a different encoder.
Sweet. What I am uncertain of: is it sufficient to run Bitpacker1x on each chunk and concatenate the results? I.e. something like this:
fn decode(values: &[u8], length: usize, num_bits: u8) -> Vec<u32> {
    let bitpacker = BitPacker1x::new();
    let mut decompressed = vec![0u32; length];
    let chunks = values.chunks_exact(BitPacker1x::BLOCK_LEN);
    chunks.enumerate().for_each(|(i, chunk)| {
        bitpacker.decompress(
            chunk,
            &mut decompressed[i * BitPacker1x::BLOCK_LEN..(i + 1) * BitPacker1x::BLOCK_LEN],
            num_bits,
        );
    });
    // todo: handle remainder
    decompressed
}
or do we need to perform some shifts and `&` masks so that we carry over some of the information between blocks?
Are you interested in compression and decompression?
Yes, compression and decompression (as we want to read and write parquet).
Yes, it should be correct, but with this you are missing the last 7 values.
@jorgecarleitao, contrary to what you suggested in this ticket https://issues.apache.org/jira/browse/ARROW-11897?jql=text%20~%20%22tantivy%22 , Bitpacker1x does not use SIMD instructions. The entire decoding/encoding loops are inlined, however.
@fulmicoton Perfect! I will then start from there.
Is there a reason why we can't use BitPacker4x with the same chunking procedure as above? When I wrote that comment I was thinking of using BitPacker4x; I thought about starting with BitPacker1x just because it is easier to reason about. Or do they represent different encodings altogether?
Hi, I have been trying to make this work, but I am stuck and you may know why.
I have the following:
use bitpacking::BitPacker;
use bitpacking::BitPacker1x;

pub fn required_capacity(length: u32) -> usize {
    let chunks_n = (length as usize + BitPacker1x::BLOCK_LEN - 1) / BitPacker1x::BLOCK_LEN;
    chunks_n * BitPacker1x::BLOCK_LEN
}

pub fn decode(compressed: &[u8], num_bits: u8, decompressed: &mut [u32]) -> usize {
    let bitpacker = BitPacker1x::new();
    let compressed_block_size = BitPacker1x::BLOCK_LEN * num_bits as usize / 8;
    let compressed_chunks = compressed.chunks_exact(compressed_block_size);
    let decompressed_chunks = decompressed.chunks_exact_mut(BitPacker1x::BLOCK_LEN);
    let remainder = compressed_chunks.remainder();
    let last_compressed_chunk = if remainder.is_empty() {
        vec![]
    } else {
        let mut last_compressed_chunk = compressed.to_vec();
        last_compressed_chunk
            .extend(std::iter::repeat(0).take(compressed_block_size - remainder.len()));
        last_compressed_chunk
    };
    let last_compressed_chunks = last_compressed_chunk.chunks_exact(compressed_block_size);
    debug_assert_eq!(last_compressed_chunks.remainder().len(), 0);
    compressed_chunks
        .chain(last_compressed_chunks)
        .zip(decompressed_chunks)
        .for_each(|(c_chunk, d_chunk)| {
            bitpacker.decompress(&c_chunk, d_chunk, num_bits);
        });
    compressed.len() / num_bits as usize * 8
}
But it is not able to decode correctly:
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn aaaa() {
        let length = 74;
        let expected = (0..length as u32).map(|x| x % 7).collect::<Vec<_>>();
        let data = &[
let data = &[
0b10001000,
0b11000110,
0b00011010,
0b11010001,
0b01011000,
0b00100011,
0b00011010,
0b01101011,
0b01000100,
0b01100011,
0b10001101,
0b01101000,
0b10101100,
0b00010001,
0b10001101,
0b00110101,
0b10100010,
0b10110001,
0b01000110,
0b00110100,
0b11010110,
0b10001000,
0b11000110,
0b00011010,
0b11010001,
0b01011000,
0b00000011,
        ];
        let num_bits = 3;
        let mut decoded = vec![0; required_capacity(length)];
        decode(data, num_bits, &mut decoded);
        decoded.truncate(length as usize);
        assert_eq!(decoded, expected);
    }
}
I believe that this is related to the fact that around block boundaries there are carry-over bit offsets that may be discarded when the next block is decompressed.
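One thing worth checking before suspecting carry-over: a 32-value block always compresses to a whole number of bytes, since 32 * num_bits is a multiple of 8 for every bit width, so blocks are byte-aligned and can be decoded independently with no bits spilling across boundaries. A quick sanity check (helper names are mine):

```rust
/// Compressed size in bytes of one 32-value block.
fn block_byte_len(num_bits: u8) -> usize {
    32 * num_bits as usize / 8
}

/// Every block ends exactly on a byte boundary, for every bit width.
fn blocks_are_byte_aligned() -> bool {
    (1..=32u8).all(|num_bits| (32 * num_bits as usize) % 8 == 0)
}
```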
Let me know if you run some benchmarks with the previous method.
It is looking good. I wonder whether the allocation you need to handle the remainder can be harmful for some payloads.
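If that allocation ever shows up in profiles, one option (a sketch of mine, not part of the crate) is to pad the trailing partial block into a fixed-size stack buffer; 128 bytes covers the worst case of 32 values at 32 bits each:

```rust
/// Copies `remainder` into a zero-filled 128-byte stack buffer, the
/// worst-case compressed size of one block (32 values * 32 bits / 8).
fn pad_last_block(remainder: &[u8], block_size: usize) -> [u8; 128] {
    assert!(remainder.len() <= block_size && block_size <= 128);
    let mut buf = [0u8; 128];
    buf[..remainder.len()].copy_from_slice(remainder);
    buf
}
```

The padded block can then be passed as `&buf[..block_size]` to the decompression call without touching the heap.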