powturbo / turbo-transpose

Transpose: SIMD Integer+Floating Point Compression Filter

turbo-transpose's Introduction

Integer + Floating Point Compression Filter

  • Fastest transpose/shuffle
    • 🆕 (2019.11) ALL TurboTranspose functions are now available on 64-bit ARMv8, including NEON SIMD.
    • Byte/Nibble transpose/shuffle for improving compression of binary data (ex. floating point data)
    • ✨ Scalar/SIMD Transpose/Shuffle 8,16,32,64,... bits
    • 👍 Dynamic CPU detection and JIT scalar/sse/avx2 switching
    • 100% C (C++ headers), usage as simple as memcpy
  • Byte Transpose
    • Fastest byte transpose
    • 🆕 (2019.11) 2D,3D,4D transpose
  • Nibble Transpose
    • nearly as fast as byte transpose
    • more efficient, up to 10 times faster than Bitshuffle
    • 🆕 better compression (w/ lz77) and
      10 times faster than SPDP, one of the best floating-point compressors
    • can compress/decompress (w/ lz77) better and faster than other domain specific floating point compressors
  • Scalar and SIMD Transform
    • Delta encoding for sorted lists
    • Zigzag encoding for unsorted lists
    • Xor encoding
    • 🆕 lossy floating point compression with user-defined error
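
As a rough illustration of the delta, zigzag and xor transforms listed above, here is a minimal scalar sketch for 32-bit integers. The function names (`delta32_enc`, `delta32_dec`, `zigzag32_enc`, `zigzag32_dec`, `xor32_enc`) are hypothetical and are not TurboTranspose's API; the start value of 0 is also an assumption, and the library's own scalar/SIMD versions may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Delta: store the difference to the previous value (small for sorted lists).
   Unsigned arithmetic wraps, so the transform is always reversible. */
static void delta32_enc(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;                       /* assumed start value */
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] - prev;
        prev = in[i];
    }
}

static void delta32_dec(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        prev += in[i];                       /* running sum restores the value */
        out[i] = prev;
    }
}

/* Zigzag: map signed values to unsigned so that small magnitudes stay small
   (0, -1, 1, -2, ... become 0, 1, 2, 3, ...). Useful for unsorted lists,
   where the differences can be negative. */
static uint32_t zigzag32_enc(int32_t v) {
    uint32_t u = (uint32_t)v;
    return (u << 1) ^ (0u - (u >> 31));
}

static int32_t zigzag32_dec(uint32_t u) {
    return (int32_t)((u >> 1) ^ (0u - (u & 1u)));
}

/* Xor: store each value xored with its predecessor; equal high bits cancel. */
static void xor32_enc(const uint32_t *in, size_t n, uint32_t *out) {
    uint32_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        out[i] = in[i] ^ prev;
        prev = in[i];
    }
}
```

For an unsorted list, a zigzag-delta step would apply `zigzag32_enc` to each difference `(int32_t)(in[i] - prev)` before the transpose and lz77 stages.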

Transpose Benchmark:

  • Benchmark Intel CPU: Skylake i7-6700 3.4GHz gcc 9.2 single thread
  • Benchmark ARM: ARMv8 A73-ODROID-N2 1.8GHz

- Speed test

Benchmark w/ 16k buffer

BOLD = pareto frontier.
E:Encode, D:Decode

    ./tpbench -s# file -B16K   (# = 8,4,2)
E cycles/byte  D cycles/byte  Transpose 64 bits AVX2
.199           .134           TurboTranspose Byte
.326           .201           Blosc byteshuffle
.394           .260           TurboTranspose Nibble
.848           .478           Bitshuffle 8

E cycles/byte  D cycles/byte  Transpose 32 bits AVX2
.121           .102           TurboTranspose Byte
.451           .139           Blosc byteshuffle
.345           .229           TurboTranspose Nibble
.773           .476           Bitshuffle

E cycles/byte  D cycles/byte  Transpose 16 bits AVX2
.095           .071           TurboTranspose Byte
.640           .108           Blosc byteshuffle
.329           .198           TurboTranspose Nibble
.758           1.177          Bitshuffle 2
.067           .067           memcpy

E MB/s  D MB/s  16 bits ARM 2019.11
8192    16384   TurboTranspose Byte
8192    8192    blosc byteshuffle
1638    2341    TurboTranspose Nibble
356     287     blosc bitshuffle
16384   16384   memcpy

E MB/s  D MB/s  32 bits ARM 2019.11
8192    8192    TurboTranspose Byte
8192    8192    blosc byteshuffle
1820    2341    TurboTranspose Nibble
372     252     blosc bitshuffle

E MB/s  D MB/s  64 bits ARM 2019.11
4096    8192    TurboTranspose Byte
5461    5461    blosc byteshuffle
1490    1490    TurboTranspose Nibble
372     260     blosc bitshuffle

Transpose/Shuffle benchmark w/ large files (100MB).

MB/s: 1,000,000 bytes/second

    ./tpbench -s# file  (# = 8,4,2)
E MB/s  D MB/s  Transpose 16 bits AVX2 2019.11
9208    9795    TurboTranspose Byte
8382    7689    Blosc byteshuffle
9377    9584    TurboTranspose Nibble
2750    2530    Blosc bitshuffle
13725   13900   memcpy

E MB/s  D MB/s  Transpose 32 bits AVX2 2019.11
9718    9713    TurboTranspose Byte
9181    9030    Blosc byteshuffle
8750    9472    TurboTranspose Nibble
2767    2942    Blosc bitshuffle 4

E MB/s  D MB/s  Transpose 64 bits AVX2 2019.11
8998    9573    TurboTranspose Byte
8721    8586    Blosc byteshuffle 2
8252    9222    TurboTranspose Nibble
2711    2053    Blosc bitshuffle 2

E MB/s  D MB/s  16 bits ARM 2019.11
872     3998    TurboTranspose Byte
678     3852    blosc byteshuffle
1365    2195    TurboTranspose Nibble
357     280     blosc bitshuffle
3921    3913    memcpy

E MB/s  D MB/s  32 bits ARM 2019.11
1828    3768    TurboTranspose Byte
1769    3713    blosc byteshuffle
1456    2299    TurboTranspose Nibble
374     243     blosc bitshuffle

E MB/s  D MB/s  64 bits ARM 2019.11
1793    3572    TurboTranspose Byte
1784    3544    blosc byteshuffle
1176    1267    TurboTranspose Nibble
331     203     blosc bitshuffle

- Compression test (transpose/shuffle+lz4)

🆕 Download IcApp, a new benchmark for TurboPFor+TurboTranspose,
for testing almost all integer and floating point file types.
Note: Lossy compression benchmark with icapp only.

- Speed test (file msg_sweep3d)
C size      ratio %  C MB/s  D MB/s  Name AVX2
11,348,554  18.1     2276    4425    TurboTranspose Nibble+lz
22,489,691  35.8     1670    3881    TurboTranspose Byte+lz
43,471,376  69.2     348     402     SPDP
44,626,407  71.0     1065    2101    bitshuffle+lz
62,865,612  100.0    13300   13300   memcpy
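
The "ratio %" column in the speed test above appears to be the compressed size as a percentage of the original file size (62,865,612 bytes for msg_sweep3d). A minimal sketch of that arithmetic, with a hypothetical helper name that is not part of tpbench:

```c
#include <assert.h>

/* Hypothetical helper: compressed size as a percentage of the original size. */
static double ratio_percent(unsigned long long compressed,
                            unsigned long long original) {
    assert(original > 0);
    return 100.0 * (double)compressed / (double)original;
}
```

With the figures from the table, `ratio_percent(11348554ULL, 62865612ULL)` gives roughly 18.05, which matches the 18.1 reported for TurboTranspose Nibble+lz after rounding.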
    ./tpbench -s4 -z *.sp
File         File size  lz %   Tp8lz  Tp4lz  BSlz   spdp1  spdp9  Tp4lzt eTp4lzt
msg_bt       133194716  94.3   70.4   66.4   73.9   70.0   67.4   54.7   32.4
msg_lu       97059484   100.4  77.1   70.4   75.4   76.8   74.0   61.0   42.2
msg_sppm     139497932  11.7   11.6   12.6   15.4   14.4   13.7   9.0    5.6
msg_sp       145052928  100.3  68.8   63.7   68.1   67.9   65.3   52.6   24.9
msg_sweep3d  62865612   98.7   35.8   18.1   71.0   69.6   13.7   9.8    3.8
num_brain    70920000   100.4  76.5   71.1   77.4   79.1   73.9   63.4   32.6
num_comet    53673984   92.4   79.0   77.6   82.1   84.5   84.6   70.1   41.7
num_control  79752372   99.4   89.5   90.7   88.1   98.3   98.5   81.4   51.2
num_plasma   17544800   100.4  0.7    0.7    75.5   30.7   2.9    0.3    0.2
obs_error    31080408   89.2   73.1   70.0   76.9   78.3   49.4   20.5   12.2
obs_info     9465264    93.6   70.2   61.9   72.9   62.4   43.8   27.3   15.1
obs_spitzer  99090432   98.3   90.4   95.6   93.6   100.1  100.7  80.2   52.3
obs_temp     19967136   100.4  89.5   92.4   91.0   99.4   100.1  84.0   55.8

Tp8 = Byte transpose, Tp4 = Nibble transpose, lz = lz4
eTp4lzt = lossy compression with lzturbo and allowed error = 0.0001 (1e-4)
Slow but best compression: SPDP9 and lzt = lzturbo,39

File         File size  lz %   Tp8lz  Tp4lz  BSlz   spdp1  spdp9  Tp4lzt eTp4lzt
msg_bt       266389432  94.5   77.2   76.5   81.6   77.9   75.4   69.9   16.0
msg_lu       194118968  100.4  82.7   81.0   83.7   83.3   79.6   75.5   21.0
msg_sppm     278995864  18.9   14.5   14.9   19.5   21.5   19.8   11.2   2.8
msg_sp       290105856  100.4  79.2   77.5   80.2   78.8   77.1   71.3   12.4
msg_sweep3d  125731224  98.7   50.7   36.7   80.4   76.2   33.2   27.3   1.9
num_brain    141840000  100.4  82.6   81.1   84.5   87.8   83.3   77.0   16.3
num_comet    107347968  92.8   83.3   78.8   76.3   86.5   86.0   69.8   21.2
num_control  159504744  99.6   92.2   90.9   89.4   97.6   98.9   85.5   25.8
num_plasma   35089600   75.2   0.7    0.7    84.5   77.3   3.0    0.3    0.1
obs_error    62160816   78.7   81.0   77.5   84.4   87.9   62.3   23.4   6.3
obs_info     18930528   92.3   75.4   70.6   82.4   81.7   51.2   33.1   7.7
obs_spitzer  198180864  95.4   93.2   93.7   86.4   100.1  102.4  78.0   26.9
obs_temp     39934272   100.4  93.1   93.8   91.7   98.0   97.4   88.2   28.8

eTp4lzt = lossy compression with allowed error = 0.0001
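
The README does not say how the allowed error is applied, so the following is only a sketch of one common approach, not the library's method: clear low mantissa bits of each IEEE-754 32-bit float so that the relative error stays at or below the bound. The helper names `lossy_drop_bits` and `lossy_truncate` are hypothetical, and treating the error as relative is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: how many low mantissa bits of a 32-bit float can be
   cleared while keeping the relative error at or below max_rel_err.
   Clearing b bits changes a normal float by less than 2^(b-23) times its
   magnitude. */
static unsigned lossy_drop_bits(double max_rel_err) {
    double bound = 1.0 / 8388608.0;          /* 2^-23 */
    unsigned bits = 0;
    while (bits < 23 && bound * 2.0 <= max_rel_err) {
        bound *= 2.0;
        bits++;
    }
    return bits;
}

/* Clears the low mantissa bits (truncation toward zero).
   Sketch only: NaN, infinity and subnormal inputs are not handled. */
static float lossy_truncate(float x, double max_rel_err) {
    uint32_t u;
    unsigned bits = lossy_drop_bits(max_rel_err);
    memcpy(&u, &x, sizeof u);                /* type-pun without aliasing issues */
    u &= ~((UINT32_C(1) << bits) - 1u);
    memcpy(&x, &u, sizeof x);
    return x;
}
```

Under these assumptions an allowed error of 1e-4 clears 9 mantissa bits. The zeroed bits then form long runs after a byte or nibble transpose, which is the kind of redundancy an lz77 stage can exploit.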

Compile:

    git clone https://github.com/powturbo/TurboTranspose.git
    cd TurboTranspose
Linux + Windows MinGW
    make
    or
    make AVX2=1
Windows Visual C++
    nmake /f makefile.vs
    or
    nmake AVX2=1 /f makefile.vs
  • benchmark with other libraries
    download or clone bitshuffle or blosc and type

      make AVX2=1 BLOSC=1
      or
      make AVX2=1 BITSHUFFLE=1
    

Testing:

  • benchmark "transpose" functions

    ./tpbench [-s#] [-z] file
    -s# = element size in bytes, # = 2,4,8,16,... (default 4)
    -z  = lz77 compression benchmark only (bitshuffle package mandatory)
    

Function usage:

Byte transpose:

void tpenc( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
void tpdec( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);

in : input buffer
n : number of bytes
out : output buffer
esize : element size in bytes (2,4,8,...)
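
To make the effect of a byte transpose concrete, here is a minimal scalar reference sketch. It is not the library's SIMD code: the names `ref_byte_transpose_enc`, `ref_byte_transpose_dec` and `ref_byte_transpose_demo` are hypothetical, and the sketch assumes that n is a multiple of esize and that byte k of every element is gathered into stream k. The actual output layout of `tpenc` may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Gather byte k of every element into stream k.
   Assumes n is a multiple of esize and in/out do not overlap. */
static void ref_byte_transpose_enc(const unsigned char *in, size_t n,
                                   unsigned char *out, size_t esize) {
    size_t count = n / esize;                /* number of whole elements */
    for (size_t i = 0; i < count; i++)
        for (size_t k = 0; k < esize; k++)
            out[k * count + i] = in[i * esize + k];
}

/* Inverse: scatter each stream back into the elements. */
static void ref_byte_transpose_dec(const unsigned char *in, size_t n,
                                   unsigned char *out, size_t esize) {
    size_t count = n / esize;
    for (size_t i = 0; i < count; i++)
        for (size_t k = 0; k < esize; k++)
            out[i * esize + k] = in[k * count + i];
}

/* Round-trip check on three 4-byte elements; returns 1 on success. */
static int ref_byte_transpose_demo(void) {
    unsigned char src[12] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
    unsigned char enc[12], dec[12];
    ref_byte_transpose_enc(src, sizeof src, enc, 4);
    /* enc now holds {1,5,9, 2,6,10, 3,7,11, 4,8,12} */
    ref_byte_transpose_dec(enc, sizeof enc, dec, 4);
    return memcmp(src, dec, sizeof src) == 0;
}
```

For floating point data, the sign/exponent bytes of neighbouring values are often similar, so grouping them into one stream tends to give the lz77 stage longer matches.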

Nibble transpose:

void tp4enc( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);
void tp4dec( unsigned char *in, unsigned n, unsigned char *out, unsigned esize);

in : input buffer
n : number of bytes
out : output buffer
esize : element size in bytes (2,4,8,...)
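
A nibble transpose goes one step finer and groups 4-bit halves instead of whole bytes. The scalar sketch below shows the idea under stated assumptions; it is not the library's implementation, and the plane order and packing used by `tp4enc` may differ. The names `ref_nibble_transpose_enc` and `ref_nibble_transpose_dec` are hypothetical, and the sketch requires n to be a multiple of esize with an even number of elements.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: for byte k of each element, plane 2k holds the low nibbles
   and plane 2k+1 the high nibbles. Each plane packs two nibbles per byte
   (element i goes to byte i/2, low half if i is even, high half if odd). */
static void ref_nibble_transpose_enc(const unsigned char *in, size_t n,
                                     unsigned char *out, size_t esize) {
    size_t count = n / esize;                /* number of elements (even) */
    size_t plane = count / 2;                /* bytes per nibble plane */
    memset(out, 0, n);
    for (size_t i = 0; i < count; i++) {
        unsigned shift = (i & 1) ? 4u : 0u;
        for (size_t k = 0; k < esize; k++) {
            unsigned b = in[i * esize + k];
            out[(2 * k) * plane + i / 2]     |= (unsigned char)((b & 0x0Fu) << shift);
            out[(2 * k + 1) * plane + i / 2] |= (unsigned char)((b >> 4) << shift);
        }
    }
}

/* Inverse: rebuild each byte from its low and high nibble planes. */
static void ref_nibble_transpose_dec(const unsigned char *in, size_t n,
                                     unsigned char *out, size_t esize) {
    size_t count = n / esize;
    size_t plane = count / 2;
    for (size_t i = 0; i < count; i++) {
        unsigned shift = (i & 1) ? 4u : 0u;
        for (size_t k = 0; k < esize; k++) {
            unsigned lo = (in[(2 * k) * plane + i / 2] >> shift) & 0x0Fu;
            unsigned hi = (in[(2 * k + 1) * plane + i / 2] >> shift) & 0x0Fu;
            out[i * esize + k] = (unsigned char)((hi << 4) | lo);
        }
    }
}
```

The output has the same size as the input, so the transform can sit in front of any byte-oriented compressor, as with the byte transpose.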

Environment:

OS/Compiler (64 bits):
  • Linux: GNU GCC (>=4.6)
  • Linux: Clang (>=3.2)
  • Windows: MinGW-w64 makefile
  • Windows: Visual C++ (>=VS2008) - makefile.vs (for nmake)
  • Windows: Visual Studio project file - vs/vs2017 - Thanks to PavelP
  • Linux ARM: 64 bits aarch64 ARMv8: gcc (>=6.3)
  • Linux ARM: 64 bits aarch64 ARMv8: clang
Multithreading:
  • All TurboTranspose functions are thread safe

Last update: 25 Oct 2019

turbo-transpose's Issues

License issue

Hi,

We are interested in using your libraries in our project: https://github.com/ClickHouse/ClickHouse
Specifically, in our column compression codecs: https://github.com/ClickHouse/ClickHouse/blob/master/src/Compression/ICompressionCodec.h

TurboPFor-Integer-Compression is interesting in itself, and Turbo-Transpose is interesting as an optimisation of the transpose functions here:
https://github.com/ClickHouse/ClickHouse/blob/master/src/Compression/CompressionCodecT64.cpp#L168

The problem is that your libraries are under the GPL, so we cannot use them directly. Could we find a way to change the license restrictions for the libraries?
