Coder Social home page Coder Social logo

arthas1121 / asymmetricnumeralsystemstoolkit Goto Github PK

View Code? Open in Web Editor NEW

This project forked from jarekduda/asymmetricnumeralsystemstoolkit

0.0 1.0 0.0 28 KB

Testing various methods for choosing tANS entropy coding automata

License: GNU General Public License v2.0

C++ 100.00%

asymmetricnumeralsystemstoolkit's Introduction

AsymmetricNumeralSystemsToolkit

Testing various methods for choosing tANS entropy coding automata

ANS is new approach to entropy coding, which adds fractional bits into consideration into Huffman-like decoder, combining its speed with accuracy of arithmetic coding, like in implementation of Yann Collet. Another advantage in comparison to Huffman coding is that we choose the size of coding tables (L) here, corresponding to 2^depth of Huffman tree, and that there is no need to sort symbol probabilities (linear initialization).

The choice of such finite L state entropy coding automaton consists of 2-3 parts (which generally can be merged):

  • quantization of symbol probability distribution as p[s] ~ q[s]/L fractions (q is a natural number)
  • spreading these symbols in range [0, L-1], such that symbol s appears q[s] times
  • eventual scrambler to simultaneously encrypt encoded message: disturb the found spread in a pseudoranom way accordingly to cryptographic key.

This toolkit contains various choices of these functions, allows to test obtained compression rates, compare with Huffman. Currently it allows to choose betwen:

  1. Symbol probability distributions:
  • init_binary(p, n): n binary variables (2^n alphabet size)
  • init_power(rho,m): p[i] is proportional to rho^i
  • init_rand_unif(m): simple random distribution;
  1. quantizer (L=2^R):
  • quantize_fast(R): first q[s] = round(p[s]*L), then modify q for the most probable symbol to fulfill nomalization (linear complexity),
  • quantize_prec(R): find the quantization which minimizes sum_s (p[s] - q[s]/L)^2 / p[s] approximation of Kullback_Leibler distance (n log n complexity).
  1. symbol spread:
  • spread_range_i(), spread_range_d(), : Huffman coding can be seen as a special case of tANS: if spreading in size 2^k ranges - these spreads take it to general frequencies (in increasing or decreasing order),
  • spread_fast(): basic Yann Collet's random spread - very fast,
  • spread_prec(): very good using only q - wants to distributie symbols in 1/q[s] distance (still linear, but slower),
  • spread_tuned(): uses both q and p - wants to get close to 1/x stationary probability of states (still linear, best compression rate),
  • spread_tuned_s(): uses sort instead of buckets - can be slighlty better, but slower,
  • spread_tuned_p(): uses 1/i approximation of 1/(p*ln(1+1/i)) formula for tuned spread,
  • spread_uABS(): available arithmetic coding/decoding formulas for binary alphabet, e.g. used in Matt Mahoney's fpaqc.
  1. scramblers:
  • scrambler0(): for each 2i-1 and 2i positions, with some probability switch symbols - accordingly to PRNG initialized with cryptographic key (the current version assumes at most 256 size alphabet),
  • scrambler1(): takes blocks of four symbols and randomly rotate them cyclically (also 256 size alphabet limit) - faster and stronger disturbtion.

For example for 4 symbol and L=16 states: p=(0.04 0.16 0.16 0.64), q/L=(0.0625 0.1875 0.125 0.625).

methodsymbol spreaddH/H rate losscomment
- - ~0.011penalty of quantizer itself
Huffman 0011222233333333 ~0.080would give Huffman decoder
spread_range_i()0111223333333333~0.059 analogous to Huffman
spread_range_d()3333333333221110~0.022 decreasing order
spread_fast()0233233133133133~0.020 fast
spread_prec()3313233103332133~0.015generally close to quantization dH/H
spread_tuned()3233321333313310~0.0046better than quantization dH/H due to using also p
spread_tuned_s()2333312333313310~0.0040L log L complexity (sort)
spread_tuned_p()2331233330133331~0.0058testing 1/(p ln(1+1/i)) ~ i/p approximation

Tuning shifts symbols right when q[s]/L > p[s] and left otherwise, getting better agreement and so compression rate.

Some sources: article, slides, discussion, list of implementations of ANS.

Feel free to add new probability distributions, better quantizers and spreads.

Jarek Duda, July 2014

asymmetricnumeralsystemstoolkit's People

Contributors

aconverse avatar jarekduda avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.