Coder Social home page Coder Social logo

powturbo / turbo-range-coder Goto Github PK

View Code? Open in Web Editor NEW
65.0 7.0 5.0 1.16 MB

TurboRC - Fastest Range Coder + Arithmetic Coding / Fastest Asymmetric Numeral Systems

License: GNU General Public License v3.0

C 94.84% Makefile 0.51% C++ 4.65%
range-coder entropy-coding arithmetic-coding entropy-coder arithmetic-coder encoding decoding huffman-coding asymmetric-numeral-systems data-compression

turbo-range-coder's Introduction

TurboRC: Turbo Range Coder + rANS Asymmetric Numeral Systems

Build ubuntu

======================================

  • Fastest (Branchless) Range Coder / Arithmetic Coder
    • 100% C (C++ headers).
    • OS/Arch: Linux amd/intel, arm, PowerPC, s390x, MacOs+Apple M1. Windows: Mingw, visual c++
    • No other Range Coder / Arithmetic Coder encode or decode faster with better compression
    • Up to 3 times faster than the next fastest range coder with similar compression ratio
    • Can work as bitwise or/and as multisymbol range coder
    • 32 or 64 bits range coder. Big + Little endian
    • Renormalization output 8,16 or 32 bits
    • Easy connection to bit, nibble or byte predictors.
    • Several built-in predictors: simple, dual speed, fsm
    • Built-in order0, order1, order2, Sliding Context, Context mixing,
      - Run Length Encoding, Gamma Coding, Rice Coding,
      - Bit entropy coding,
      - Turbo VLC: novel Variable Length Coding for large integers,
      - MTF (Move-To-Front) / QLFC (Quantized Local Frequency Coding)
    • Fast full 8/16 bits BWT: Burrows-Wheeler compression/decompression w/
      • preprocessing : lzp and utf-8
      • postprocessing: QLFC (Quantized Local Frequency Coding),
        RLE and Bit entropy coder
    • BWT :libsais + optimized libdivsufsort + optimized inverse bwt included
    • static + adaptive CDF - cumulative distribution functions
    • stdin/stdout file compressor included
    • TurboRC App for benchmarking all the functions and test allmost all byte, integer, floating point, date and timestamp file types.
      • read and convert text, csv or binary files to 8/16/32 bits before processing
      • set predictor and parameters at the command line
    • Preprocessing:
      • ๐Ÿ†• Linear Quantization for 16/32/64 floating point to 8/16/32/64 integer (2023.08)
  • Asymmetric Numeral Systems
    • ๐Ÿ†• Fastest Adaptive CDF rANS Asymmetric Numeral Systems / SSE, AVX2. (2023.05)
    • ๐Ÿ†• bitwise ANS (2023.05)
    • No other adaptive ANS encode or decode faster with better compression

LICENSE

  • GPL 3.0
  • A commercial license is available. Contact us at powturbo [AT] gmail.com for more information.

Usage examples

    ./turborc -e0   file           " benchmark all basic functions using the default simple predictor
    ./turborc -e20  file           " byte gamma coding + rc
    ./turborc -e1,2 file
    ./turborc -e0 -pss -r47 file   " use dual speed predictor with parameters 4 and 7
    ./turborc -e0 -psf -r1 file    " use FSM predictor with filename "FSM1.txt"
    ./turborc -e0 file -Os         " raw 16 bits input
    ./turborc -e0 file -Ou         " raw 32 bits input
    ./turborc -e0 file -Ft         " text file (one integer/line) 
    ./turborc -e0 file -Fc         " text file with multiple integer entries (separated by non-digits characters ex. 456,32,54)
    ./turborc -e0 file -Fc -v5     " like prev., display the first 100 values read
    ./turborc -e0 file -Fcf        " text file with multiple floating-point entries (separated by non-digits characters ex. 456.56,32.1,54)
    ./turborc -e0 file -Fru -Ob    " convert raw 32 bits input to bytes before processing possibly truncating large values
    ./turborc -e0 file -Ft -K3 -Ou " convert column 3 of a csv text file to 32 bits integers
    ./turborc -e0 file -pss -r47   " benchmark all basic functions using the dual speed predictor with paramters 4 and 7
    ./turborc -e0 file -psf -r1    " benchmark all basic functions using the fsm predictor with the paramter file FSM1.txt

Benchmark

see also Entropy Coder Benchmark

File: enwik8bwt generated BWT (wikipedia XML 100MB )
    > turborc -e0 enwik8bwt
C Size ratio% C MB/s D MB/s Name Description
23334248 23.33% 88.20 88.54 1:rc o0
22394444 22.39% 82.46 86.35 2:rcc o1
23116048 23.12% 74.11 79.26 3:rcc2 o2
22500640 22.50% 64.96 67.81 4:rcx o8b =o1 context slide
23213968 23.21% 55.70 61.45 5:rcx2 o16b=o2 context slide
21605020 21.61% 25.48 26.98 9:rcms o1 mixer/sse
21550184 21.55% 21.82 22.74 10:rcm2 o2 mixer/sse
20814372 20.81% 23.06 24.77 11:rcmr o2 8b mixer/sse run
20789560 20.79% 22.68 24.70 12:rcmrr o2 8b mixer/sse run > 2
23170048 23.17% 156.61 129.94 13:rcrle RLE o0
22004856 22.00% 128.43 114.98 14:rcrle1 RLE o1
23412436 23.41% 73.08 70.74 17:rcu3 varint8 3/5/8 bits
21088368 21.09% 79.78 93.89 18:rcqlfc QLFC
22275484 22.28% 91.81 96.60 19:bec Bit EC
32703468 32.70% 54.48 58.27 26:rcg-8 gamma
32271396 32.27% 124.84 110.15 27:rcgz-8 gamma zigzag
34195068 34.20% 66.13 65.23 28:rcr-8 rice
36864024 36.86% 78.31 70.00 29:rcrz-8 rice zigzag
63541712 63.54% 552.28 87.84 42:cdfsb static/decode search
63541712 63.54% 552.38 115.42 43:cdfsv static/decode division
63976686 63.98% 479.38 104.09 44:cdfsm static/decode division lut
63541720 63.54% 628.18 92.24 45:cdfsb static interlv/dec. search
24811052 24.81% 177.39 104.30 46:cdf byte adaptive
24811060 24.81% 191.80 98.96 47:cdfi byte adaptive interleaved
31004892 31.00% 158.06 72.18 48:cdf-8 vnibble
31004896 31.00% 159.56 73.53 49:cdfi-8 vnibble interleaved
24848864 24.85% 116.76 202.27 56:ans auto
24848864 24.85% 126.57 175.43 57:ans sse
23068372 23.07% 128.06 83.57 64:ans auto o1
23521656 23.52% 50.43 82.32 66:ansb bitwise ans
100000012 100.00% 16495.29 16050.82 79:memcpy

BWT Benchmark: TurboRC vs the best BWT compressors (2023.04)

- enwik8 - 100.000.000 bytes EN Wikipedia

(bold = pareto) MB=1.000.000

C Size ratio% C MB/s D MB/s Name
20698282 20.7 9.02 16.04 TurboRC 20e9
20749619 20.7 10.70 9.19 bzip3
20786596 20.8 13.72 19.26 bsc 0e2
20920306 20.9 16.92 29.48 bsc 0e1
21002082 21.0 17.42 36.21 TurboRC 20e8
21224212 21.2 19.03 37.14 bsc 0e0
21824818 21.8 17.89 38.13 TurboRC 20e6
22011302 22.0 19.39 40.86 TurboRC 20e5
29008758 29.0 20.72 43.39 bzip2

- Silesia - Compression Corpus (211 MB mixed binary + text)

C Size ratio% C MB/s D MB/s Name
48400486 22.8 9.63 16.08 TurboRC 20e9
48621296 22.9 14.51 18.05 bsc 0e2
48754005 23.0 12.49 11.73 bzip3
49142246 23.2 18.47 28.62 bsc 0e1
49589166 23.4 18.64 34.69 TurboRC 20e8
50110576 23.6 20.98 35.99 bsc 0e3
54592210 25.8 18.22 52.14 bzip2
C Size ratio% C MB/s D MB/s Name
18720206 17.9 10.89 19.07 TurboRC 20e9
18739661 17.9 12.69 11.12 bzip3
19080950 18.2 20.59 40.81 TurboRC 20e8
19255056 18.4 15.83 21.37 bsc 0e2
19371264 18.5 19.68 33.01 bsc 0e1
19614180 18.7 22.22 41.67 bsc 0e0
19673386 18.8 20.91 42.48 TurboRC 20e6
19809790 18.9 22.90 45.62 TurboRC 20e5
29433182 28.1 19.65 41.49 bzip2

- html8 : 100MB random html pages from Alexa 1m Top sites

C Size ratio% C MB/s D MB/s Name
13203250 13.2 15.85 26.16 TurboRC 20e9
13301850 13.3 17.61 16.32 bzip3
13442610 13.4 30.55 56.65 TurboRC 20e8
13601922 13.6 19.12 26.78 bsc 0e2
13688478 13.7 22.77 39.10 bsc 0e1
13918372 13.9 25.63 46.95 bsc 0e0
18162609 18.2 21.16 67.90 bzip2

- enwik9 - 1GB EN Wikipedia

C Size ratio% C MB/s D MB/s Name
163656130 16.4 8.03 15.75 TurboRC 20e9
163883906 16.4 12.92 22.30 bsc 0e2
164960746 16.5 15.18 34.07 bsc 0e1
165206106 16.5 15.06 37.90 TurboRC 20e8
167071950 16.7 16.54 42.30 bsc 0e0
169984250 17.0 11.27 9.89 bzip3
253977891 25.4 19.90 46.46 bzip2

- test1.txt - 1GB ZH (chineese) Wikipedia from GDCC2021

C Size ratio% C MB/s D MB/s Name
234873322 23.5 11.82 18.08 bsc 0e2
235628874 23.6 14.88 35.27 TurboRC 20e8
236077752 23.6 13.79 28.76 bsc 0e1
236200554 23.6 8.94 17.03 TurboRC 20e9
239463880 23.9 15.37 37.21 bsc 0e3
245841481 24.6 8.85 8.51 bzip3
254748922 25.5 15.69 39.18 TurboRC 20e6
257419802 25.7 18.03 41.98 TurboRC 20e5
359610522 36.0 21.51 42.38 bzip2

- Text log file:NASA access log 200MB

C Size ratio% C MB/s D MB/s Name
9082122 4.4 38.69 106.37 TurboRC 20e8m32
9138588 4.5 14.52 13.66 bzip3
9438810 4.6 19.41 43.66 bsc 0e2
9503102 4.6 20.74 53.42 bsc 0e1
9529266 4.6 20.24 65.88 TurboRC 20e8
9639310 4.7 20.81 38.94 TurboRC 20e9m32
9647322 4.7 21.45 58.95 bsc 0e3
9710206 4.7 8.73 18.14 TurboRC 20e9
9812018 4.8 20.35 66.12 TurboRC 20e6
9817630 4.8 20.87 68.17 TurboRC 20e5
11960479 5.8 15.71 83.16 bzip2

File Compression

Range Coder

    ./turborc -1 inputfile outputfile         "order 0 simple
    ./turborc -2 inputfile outputfile         "order 1 simple

Range Coder + RLE

    ./turborc -1 inputfile outputfile         "order 0 simple
    ./turborc -2 inputfile outputfile         "order 1 simple
    ./turborc -d inputfile outputfile          "decompress

BWT (Burrows-Wheeler) + QLFC (Quantized Local Frequency Coding) + TurboRC

    ./turborc -20e# inputfile outputfile -l# [-Os]  "bwt compression 
	           #:0:store, 2:bit ec, 3/4:RLE, 5/6:RLE o1, 7/8:QLFC, 9:Max
    ./turborc -d inputfile outputfile             "decompress

Compile:

    Download or clone TurboRC
	git clone --recursive https://github.com/powturbo/Turbo-Range-Coder.git
	cd Turbo-Range-Coder
Linux, MacOS, Windows (MingW), Clang,... (see also makefile)
	make
or
	make AVX2=1                                "compile for recent architectures >= haswell
Windows visual c++
	nmake /f makefile.vs
Windows visual studio c++
	project files in directory vs/vs2022

Function usage:

See examples in "turborc.c"

Environment:

OS/Compiler (32 + 64 bits):
  • Windows: Visual C++ (2022)
  • Windows: MinGW-w64 makefile
  • Linux amd/intel: GNU GCC (>=4.6)
  • Linux amd/intel: Clang (>=3.2)
  • Linux arm: aarch64
  • MaxOS: XCode (>=9) + Apple M1
  • PowerPC ppc64le
  • IBM s390x

References:

Last update: 06 AUG 2023

turbo-range-coder's People

Contributors

powturbo avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar

turbo-range-coder's Issues

Level 25 fails decompression on enwik8

./turborc -25 /tmp/enwik8 /tmp/enwik8.rc
ll /tmp/enwik8.rc
-rw-rw-r-- 1 fred fred 58290080 Mar 21 13:03 /tmp/enwik8.rc
rm /tmp/enwik8.rc.bak
./turborc -d /tmp/enwik8.rc /tmp/enwik8.rc.bak
ll /tmp/enwik8.rc.bak
-rw-rw-r-- 1 fred fred 100000000 Mar 21 13:04 /tmp/enwik8.rc.bak
md5sum /tmp/enwik8.rc.bak /tmp/enwik8
0f86d7c5a6180cf9584c1d21144d85b0 /tmp/enwik8.rc.bak
a1fa5ffddb56f4953e226637dabbb36a /tmp/enwik8

Benchmark: Turbo-Range-Coder - i9-13900KS, DDR5-7800MHz

File enwik8 (uniform distribution)

turborc -e0 enwik8

      size   ratio     E MB/s   D MB/s function prdid='s(5)'
  61250092  61.25%      95.08    75.61  1:rc        o0                                  
  45924756  45.92%      97.13    76.44  2:rcc       o1                                  
  36861388  36.86%      87.43    65.74  3:rcc2      o2                                  
  44876688  44.88%      88.30    68.08  4:rcx       o8b =o1 context slide               
  36829776  36.83%      74.57    58.56  5:rcx2      o16b=o2 context slide               
  45655424  45.66%      42.86    30.10  9:rcms      o1 mixer/sse                        
  35905340  35.91%      33.53    26.71 10:rcm2      o2 mixer/sse                        
  49386012  49.39%      36.93    26.86 11:rcmr      o2 8b mixer/sse run                 
  49813368  49.81%      35.81    25.64 12:rcmrr     o2 8b mixer/sse run > 2             
  61488516  61.49%      90.75    73.53 13:rcrle     RLE o0                              
  45280544  45.28%      90.07    70.55 14:rcrle1    RLE o1                              
  61542888  61.54%      80.60    66.28 17:rcu3      varint8 3/5/8 bits                  
  57813600  57.81%      43.44    44.13 18:rcqlfc    QLFC                                
  65292242  65.29%      53.71    54.82 18:bec       Bit EC                              
  21009342  21.01%      29.78    65.58 20:bwt                                           
  71659272  71.66%      65.51    56.15 26:rcg-8     gamma                               
  84138152  84.14%      66.53    52.24 27:rcgz-8    gamma zigzag                        
  74083592  74.08%      83.35    62.92 28:rcr-8     rice                                
  86625880  86.63%      75.46    48.54 29:rcrz-8    rice zigzag                         
  63541692  63.54%     604.88    77.71 42:cdfsb     static/decode search                
  63541692  63.54%     605.85    82.23 43:cdfsv     static/decode division              
  63948046  63.95%     468.79    72.88 44:cdfsm     static/decode division lut          
  63541700  63.54%     711.95    74.37 45:cdfsb     static interlv/dec. search          
  61250312  61.25%     254.42    89.73 46:cdf       byte   adaptive                     
  61250320  61.25%     248.56    89.34 47:cdfi      byte   adaptive interleaved         
  69876372  69.88%     183.35    70.31 48:cdf-8     vnibble                             
  69876376  69.88%     179.62    71.14 49:cdfi-8    vnibble interleaved                 
  61342502  61.34%     171.38   253.53 56:ans       rANS interleaved                                      
 100000000 100.00%   22959.30 23141.54 59:memcpy                                        

Does not compile on MacOSX

Hi, this does not compile on MacOSX for me.

OSX: 13.2 (22D49)
Apple clang version 14.0.3 (clang-1403.0.22.14.1)
Target: arm64-apple-darwin22.3.0
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

I have tried to fix it, but some issues remain.

If you like, I can make a fork and pull request for the fork? it contains some fixes I made to let it compile better. Just some missing headers. For example, it could not find malloc. But even after those fixes, it has issues I could not fix. Here below:

turborc.c:390:1: error: call to undeclared function 'rcrzsenc8'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
RCGEN2(rcrz, 8)
^
turborc.c:339:31: note: expanded from macro 'RCGEN2'
case RC_PRD_S : return p##senc##s( in, inlen, out);
^
:246:1: note: expanded from here
rcrzsenc8
^
turborc.c:390:1: error: call to undeclared function 'rcrzssenc8'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
turborc.c:340:31: note: expanded from macro 'RCGEN2'
case RC_PRD_SS: return p##ssenc##s(in, inlen, out, prm1,prm2);
^
:248:1: note: expanded from here
rcrzssenc8
^
turborc.c:497:70: error: call to undeclared function 'rccdfsmenc'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
case 44: TM("44:cdfsm static/decode division lut ",l=rccdfsmenc(in,n,out,cdf,m+1),n,l, CCPY:(m<16?rccdfsmldec(out,n,cpy, cdf, m+1):rccdfsmbdec(out,n,cpy, cdf, m+1))); break;

Issues compiling for arm64 macOS

I'm trying to compile it for my machine and having some issues:

  • Some per-source file flags in the makefile seem tied to Intel:
 $(L)anscdfs.o: $(L)anscdf.c $(L)anscdf_.h
-       $(CC) -c -O3 $(CFLAGS) -march=corei7-avx -mtune=corei7-avx -mno-aes -falign-loops=32 $(L)anscdf.c -o $(L)anscdfs.o  
+       $(CC) -c -O3 $(CFLAGS) -mno-aes -falign-loops=32 $(L)anscdf.c -o $(L)anscdfs.o  
 
 $(L)anscdfx.o: $(L)anscdf.c $(L)anscdf_.h
-       $(CC) -c -O3 $(CFLAGS) -march=haswell -falign-loops=32 $(L)anscdf.c -o $(L)anscdfx.o 
+       $(CC) -c -O3 $(CFLAGS) -falign-loops=32 $(L)anscdf.c -o $(L)anscdfx.o 
  • Architecture detection seems to look for aarch64 but not arm64, which is what the machine returns for uname -s
  • Frame pointers shouldn't be omitted for any macOS binaries
  • There are many system headers that needed to be included to different files, or else add CFLAGS+=-Wno-implicit-function-declaration
  • Issue with sse_neon.h:
./include_/sse_neon.h:232:85: error: invalid conversion between vector type 'uint64x2_t' (vector of 2 'uint64_t' values) and 'uint8x8_t' (vector of 8 'uint8_t' values) of different size
static ALWAYS_INLINE uint64_t  mm_movemask4_epu8(__m128i v) { return vgetq_lane_u64((uint64x2_t)vshrn_n_u16((uint8x16_t)v, 4), 0); } //uint8x16_t
  • Issue of using __m128i for arm64:
./include_/bitutil_.h:128:77: error: returning 'int' from a function with incompatible result type 'uint32x4_t' (vector of 4 'uint32_t' values)
static ALWAYS_INLINE __m128i mm_delta_epi64(__m128i v, __m128i sv) { return _mm_sub_epi64(v, _mm_alignr_epi8(v, sv,  8)); }

e.g. here:

  #if defined(__SSSE3__) || defined(__ARM_NEON)
#define mm_srai_epi64_63(_v_, _s_) _mm_srai_epi32(_mm_shuffle_epi32(_v_, _MM_SHUFFLE(3, 3, 1, 1)), 31)
  
static ALWAYS_INLINE __m128i mm_zzage_epi16(__m128i v) { return _mm_xor_si128( mm_slli_epi16(v,1),  mm_srai_epi16(   v,15)); }

Those last couple issues are the main ones preventing me from compiling it, the other ones were easy to patch

Two iota

TRC is lovely.

  1. these encoders can write past end of buffer. Is there a maximum over-run?

  2. This seems to fix over runs:
    #define OVERFLOW(in, inlen) if(op >= out+inlen-16) { memcpy(out, in, inlen); return inlen; }

  3. It seems cleaner to specify the outlen as a parameter (inlen for decompress).
    outlen might be reduced externally for application reasons and internally for maximum overrun.
    Omitting the memcpy()
    #define OVERFLOW(in, inlen) if(op >= outbound)return 0;

Thanks!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.