- Fastest transpose/shuffle
- Byte/Nibble transpose/shuffle for improving compression of binary data (ex. floating point data)
- Scalar/SIMD Transpose/Shuffle 8,16,32,64,... bits
- Dynamic CPU detection and JIT scalar/sse/avx2 switching
- 100% C (C++ headers), usage as simple as memcpy
- Byte Transpose
  - Fastest byte transpose
- Nibble Transpose
  - nearly as fast as byte transpose
  - more efficient on most binary data files, and up to 6 times faster than Bitshuffle
  - more robust in the worst case than Bitshuffle
- Scalar and SIMD Transform
  - Delta encoding for sorted lists
  - Zigzag encoding for unsorted lists
  - Xor encoding
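
The three transforms listed above can be sketched in scalar C as follows. This is a minimal illustration for 32-bit elements, not the library's implementation: all function names are hypothetical, and it assumes that "zigzag encoding" means zigzag-mapping the signed difference between neighbouring values.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative scalar helpers; the names are hypothetical, not library API. */

/* Zigzag: map signed to unsigned so small magnitudes become small codes
   (0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4). */
static uint32_t zigzag_enc32(int32_t x) {
    uint32_t u = (uint32_t)x;
    return (u << 1) ^ (0u - (u >> 31));
}

static int32_t zigzag_dec32(uint32_t u) {
    /* Converting values above INT32_MAX is implementation-defined in C,
       but wraps as expected on two's-complement targets. */
    return (int32_t)((u >> 1) ^ (0u - (u & 1u)));
}

/* Delta: store the difference to the previous value (sorted input gives
   small non-negative differences). */
void delta_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) { out[i] = in[i] - prev; prev = in[i]; }
}

void delta_dec32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) { prev += in[i]; out[i] = prev; }
}

/* Zigzag delta: differences may be negative for unsorted input, so each
   difference is zigzag-mapped before being stored. */
void zigzag_delta_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = zigzag_enc32((int32_t)(in[i] - prev));
        prev = in[i];
    }
}

void zigzag_delta_dec32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += (uint32_t)zigzag_dec32(in[i]);
        out[i] = prev;
    }
}

/* Xor: store the bitwise difference to the previous value. */
void xor_enc32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) { out[i] = in[i] ^ prev; prev = in[i]; }
}

void xor_dec32(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) { prev ^= in[i]; out[i] = prev; }
}
```

For the sorted input {3, 7, 7, 10, 100}, `delta_enc32` produces {3, 4, 0, 3, 90}. The sketch uses separate input and output buffers and does not support encoding in place.
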
- CPU: Skylake i7-6700 3.4GHz gcc 6.2 single thread
BOLD = pareto frontier.
c/t: cycles per 1000 bytes. E:Encode, D:Decode
./tpbench -s# file -B16K (# = 8,4,2)
Size | E Time c/t | D Time c/t | Transpose 64 bits AVX2 |
---|---|---|---|
16.000 | 199 | 134 | tpbyte 8 |
16.000 | 326 | 201 | Blosc_shuffle 8 |
16.000 | 394 | 260 | tpnibble 8 |
16.000 | 848 | 478 | Bitshuffle 8 |
Size | E Time c/t | D Time c/t | Transpose 32 bits AVX2 |
---|---|---|---|
16.000 | 121 | 102 | tpbyte 4 |
16.000 | 451 | 139 | Blosc_shuffle 4 |
16.000 | 345 | 229 | tpnibble 4 |
16.000 | 773 | 476 | Bitshuffle 4 |
Size | E Time c/t | D Time c/t | Transpose 16 bits AVX2 |
---|---|---|---|
16.000 | 95 | 71 | tpbyte 2 |
16.000 | 640 | 108 | Blosc_shuffle 2 |
16.000 | 329 | 198 | tpnibble 2 |
16.000 | 758 | 1177 | Bitshuffle 2 |
16.000 | 67 | 67 | memcpy |
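
To compare the c/t figures above with throughput figures in MB/s, the unit conversion can be sketched as below. The helper name is hypothetical, and the result depends on the clock rate you assume: the nominal 3.4 GHz stated for the test CPU may differ from the frequency the core actually ran at.

```c
/* Convert "cycles per 1000 bytes" into MB/s (1 MB = 1,000,000 bytes)
   for an assumed clock rate in GHz. Hypothetical helper for illustration. */
double ct_to_mbs(double cycles_per_1000_bytes, double ghz) {
    double bytes_per_cycle = 1000.0 / cycles_per_1000_bytes;
    double bytes_per_second = bytes_per_cycle * ghz * 1e9;
    return bytes_per_second / 1e6;
}
```

Under the nominal-clock assumption, `ct_to_mbs(199, 3.4)` evaluates to roughly 17,000 MB/s for the tpbyte 8 encode row.
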
MB/s: 1.000.000 bytes/second
./tpbench -s# file (# = 8,4,2)
Size | E Speed MB/s | D Speed MB/s | Transpose 64 bits AVX2 |
---|---|---|---|
100.000.000 | 8387 | 9408 | tpbyte 8 |
100.000.000 | 8134 | 8598 | Blosc_shuffle 8 |
100.000.000 | 7797 | 9145 | tpnibble 8 |
100.000.000 | 3548 | 3459 | Bitshuffle 8 |
100.000.000 | 13366 | 13366 | memcpy |
Size | E Speed MB/s | D Speed MB/s | Transpose 32 bits AVX2 |
---|---|---|---|
100.000.000 | 8398 | 9533 | tpbyte 4 |
100.000.000 | 8198 | 9307 | tpnibble 4 |
100.000.000 | 8193 | 8796 | Blosc_shuffle 4 |
100.000.000 | 3679 | 3666 | Bitshuffle 4 |
Size | E Speed MB/s | D Speed MB/s | Transpose 16 bits AVX2 |
---|---|---|---|
100.000.000 | 7878 | 9542 | tpbyte 2 |
100.000.000 | 8987 | 9412 | tpnibble 2 |
100.000.000 | 7739 | 9404 | Blosc_shuffle 2 |
100.000.000 | 3879 | 2547 | Bitshuffle 2 |
- Scientific IEEE 754 64-Bit Double-Precision Floating-Point Datasets
./tpbench -s8 -z *.trace
File | File size | lz4 only % | TpByte % | TpNibble % | Bitshuffle % |
---|---|---|---|---|---|
msg_bt | 266.389.432 | 94.5 | 77.2 | 76.5 | 81.6 |
msg_lu | 194.118.968 | 100.4 | 82.7 | 81.0 | 83.7 |
msg_sp | 290.105.856 | 100.4 | 79.2 | 77.5 | 80.2 |
msg_sppm | 278.995.864 | 18.9 | 14.5 | 14.9 | 19.5 |
msg_sweep3d | 125.731.224 | 98.7 | 50.7 | 36.7 | 80.4 |
num_brain | 141.840.000 | 100.4 | 82.6 | 81.1 | 84.5 |
num_comet | 107.347.968 | 92.8 | 83.3 | 78.8 | 76.3 |
num_control | 159.504.744 | 99.6 | 92.2 | 90.9 | 89.4 |
num_plasma | 35.089.600 | 75.2 | 0.7 | 0.7 | 84.5 |
obs_error | 62.160.816 | 78.7 | 81.0 | 77.5 | 84.4 |
obs_info | 18.930.528 | 92.3 | 75.4 | 70.6 | 82.4 |
obs_spitzer | 198.180.864 | 95.4 | 93.2 | 93.7 | 86.4 |
obs_temp | 39.934.272 | 100.4 | 93.1 | 93.8 | 91.7 |
git clone git://github.com/powturbo/TurboTranspose.git
cd TurboTranspose
make
or
make AVX2=1
nmake /f makefile.vs
or
nmake AVX2=1 /f makefile.vs
- benchmark with other libraries
download or clone bitshuffle or blosc and type: make AVX2=1 BLOSC=1 or make AVX2=1 BITSHUFFLE=1
- benchmark "transpose" functions
./tpbench [-s#] [-z] file
-s# = element size, # = 2,4,8,16,... (default 4)
-z = only lz77 compression benchmark (bitshuffle package mandatory)
Byte transpose:
void tpenc( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
void tpdec( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
in : input buffer
n : number of bytes
out : output buffer
esize : element size in bytes (2,4,8,...)
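
To make concrete what a byte transpose does to a buffer, here is a minimal scalar sketch with the same parameter order as the prototypes above. `ref_tpenc` and `ref_tpdec` are hypothetical names, not the library's functions. The output layout (all first bytes, then all second bytes, and so on) is the conventional byte-shuffle layout, and copying leftover tail bytes unchanged is an assumption; the library's exact layout may differ.

```c
#include <assert.h>

/* Hypothetical scalar reference, not the library's tpenc: gather byte b of
   every element into plane b, so the output holds all first bytes, then all
   second bytes, and so on. */
void ref_tpenc(const unsigned char *in, unsigned n, unsigned char *out, unsigned esize) {
    assert(esize > 0);
    unsigned cnt = n / esize;                 /* number of whole elements */
    for (unsigned i = 0; i < cnt; i++)
        for (unsigned b = 0; b < esize; b++)
            out[b * cnt + i] = in[i * esize + b];
    /* Assumption: bytes that do not fill a whole element are copied as-is. */
    for (unsigned k = cnt * esize; k < n; k++)
        out[k] = in[k];
}

/* Inverse of ref_tpenc: scatter each byte plane back into its elements. */
void ref_tpdec(const unsigned char *in, unsigned n, unsigned char *out, unsigned esize) {
    assert(esize > 0);
    unsigned cnt = n / esize;
    for (unsigned i = 0; i < cnt; i++)
        for (unsigned b = 0; b < esize; b++)
            out[i * esize + b] = in[b * cnt + i];
    for (unsigned k = cnt * esize; k < n; k++)
        out[k] = in[k];
}
```

With `esize` = 4, the nine input bytes 1,2,3,4,5,6,7,8,9 become 1,5,2,6,3,7,4,8,9. Grouping bytes of equal significance this way tends to produce longer runs of similar bytes for a following compressor.
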
Nibble transpose:
void tp4enc( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
void tp4dec( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
in : input buffer
n : number of bytes
out : output buffer
esize : element size in bytes (2,4,8,...)
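
The idea behind going one level below bytes can be illustrated with a nibble split: the low and high 4-bit halves of the bytes are gathered into separate streams. The sketch below shows only that step, with hypothetical names and an illustrative layout. It is not the layout used by tp4enc and tp4dec, which is not documented here, and in a full nibble transpose such a step would presumably be applied to byte-transposed data.

```c
/* Hypothetical nibble split, for illustration only. Each pair of input bytes
   (a, b) contributes one byte to the low-nibble stream and one byte to the
   high-nibble stream. An odd trailing byte is copied unchanged. */
void ref_nibble_split(const unsigned char *in, unsigned n, unsigned char *out) {
    unsigned pairs = n / 2;
    unsigned char *lo = out;            /* low nibbles:  out[0 .. pairs)       */
    unsigned char *hi = out + pairs;    /* high nibbles: out[pairs .. 2*pairs) */
    for (unsigned i = 0; i < pairs; i++) {
        unsigned char a = in[2 * i], b = in[2 * i + 1];
        lo[i] = (unsigned char)((a & 0x0F) | ((b & 0x0F) << 4));
        hi[i] = (unsigned char)((a >> 4) | (b & 0xF0));
    }
    if (n & 1)
        out[n - 1] = in[n - 1];
}

/* Inverse of ref_nibble_split: rebuild each byte pair from both streams. */
void ref_nibble_merge(const unsigned char *in, unsigned n, unsigned char *out) {
    unsigned pairs = n / 2;
    const unsigned char *lo = in;
    const unsigned char *hi = in + pairs;
    for (unsigned i = 0; i < pairs; i++) {
        out[2 * i]     = (unsigned char)((lo[i] & 0x0F) | ((hi[i] & 0x0F) << 4));
        out[2 * i + 1] = (unsigned char)((lo[i] >> 4) | (hi[i] & 0xF0));
    }
    if (n & 1)
        out[n - 1] = in[n - 1];
}
```

For the input bytes 0x12, 0x34, 0xAB, 0xCD, 0x7F the split yields 0x42, 0xDB (low nibbles), 0x31, 0xCA (high nibbles), and the unchanged tail byte 0x7F.
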
- Linux: GNU GCC (>=4.6)
- clang (>=3.2)
- Windows: MinGW-w64
- Windows: Visual C++ (>=VS2008)
- All TurboTranspose functions are thread safe
Last update: 01 JUL 2017