Coder Social home page Coder Social logo

fglazov / python-ranscoder Goto Github PK

View Code? Open in Web Editor NEW
17.0 2.0 3.0 38 KB

A lightweight rANSCoder meant for rapid prototyping.

License: Creative Commons Zero v1.0 Universal

Python 100.00%
data-compression asymmetric-numeral-systems python ans entropy-coding compression compression-library

python-ranscoder's Introduction

Introduction

This is a lightweight, pure python version of a rANSCoder. A rANSCoder is an entropy coder, which was first described in Jarek Duda's paper (http://arxiv.org/abs/1311.2540). This variant of rANS works on 64 bits and emits 32-bit words at the same time. It is a port of Fabien Giesen 64 bit rANSCoder to python. It should be compatible with the functions rANS64XXXXXX in Giesen's project - i.e. you could theoretically encode or decode with this project, and do the other step via Giesen's code.

Installation

This repository is available via pip. You can install it as follows.

pip install py_rans

Purpose

The focus of this repository is not performance, but rather rapid prototyping. This entropy coder will provide compresion performance which is close to entropy at a reasonable speed. If you want the same compression performance and higher compression speed you should look at a C/C++ library fufilling those needs - I can recommend either Giesen's or Bonfield's implementation of rANS. The following are the Github links to their two projects:

API by simple example

The API is deliberately kept very simple, with a single way to use the rANSCoder. The interface consists of two classes "Encoder" and Decoder".

To encode something, you would do the following:

import rans.rANSCoder as rANS
import random
import numpy as np

encoder = rANS.Encoder()
probs = np.array([0.1, 0.2, 0.3, 0.05, 0.05, 0.15, 0.05, 0.1], dtype=np.float32)
data = np.array([random.randint(0,7) for i in range(10000)], dtype=np.int32)
# Use numpy arrays for better performance! Numba works better with numpy arrays than vanilla python arrays.

for symbol in data:
    encoder.encode_symbol(probs, symbol)
encoded = encoder.get_encoded()
print("The number of bytes needed is: " + str(len(encoded) * 4))

# Encoded is a list of 32-bit words.
# Do things afterwards with encoded - e.g. save it into a file or send it somewhere.

And later, to decode:

import rans.rANSCoder as rANS

# Given: length_decoded, encoded, probs
# Assuming the previous code snippet was how it was encoded, then:
# length_decoded = len(data) 
# encoded = encoded
# probs = probs

decoder = rANS.Decoder(encoded)
decoded_data = []
for _ in range(length_decoded):
    decoded_data.append(decoder.decode_symbol(probs))
    
# decoded_data is the original data, in REVERSE ORDER.

If you run the two code snippets in order, then you will need to call decoded_data.reverse() before running the check data == decoded_data.

API more formally

This python package consists of two classes, Encoder and Decoder. There are also a few helper functions outside of the two classes - but they are not meant to be used as part of the public API - this package is meant to be interfaced with solely through the two classes. All of the code is wrapped in numba wrappers, which compiles the code just in time before use. The first call to any method will take longer than any calls following it.

The Encoder is responsible for taking the uncompressed data and compressing it. It is an entropy coder, so it expects data and a corresponding model (i.e. probability distribution which says how frequent a given symbol is). The Encoder's constructor takes no arguments, so you may simply construct it via encoder = rANS.Encoder().

There are two methods:

Encoder.encode_symbol(probs, symb)

This method takes a symbol and adds it to the state inside of the Encoder which represents the currently compressed data. In order to compress it, the encoder requires a probability distribution which says how likely a given symb will occur. So, for example, given probs = [0.15, 0.8, 0.05], the symbol 0 occurs with 15% probability, 1 occurs with 80% probability, and 2 with 5% probability. So note that symb is a number with the bounds 0 โ‰ค symb < len(probs).

For this method to work properly, sum(probs) should be close to 1 - i.e. you need to pass a proper probability distribution. Note that you may use different probs for each succesive symbol you encode in an Encoder. This is allowed and intentional, as your model might change while you're encoding.

Encoder.get_encoded()

This method flushes the state in the encoder, and returns an array of 32-bit words which represents the compressed data. Note that you should not use the Encoder after calling this method - if you want to encode more data later on, you should then create a new Encoder.

The Decoder is responsible for taking the compressed data generated by an Encoder, and transforming it back into the original data. Note that the data is output in reversed order. The constructor expects the encoded data as input, i.e. you may construct the Decoder as decoder = rANS.Decoder(encoded_data).

The Decoder only has one method:

Decoder.decode_symbol(probs)

This method takes the probability distribution that the corresponding symbol was encoded with, and retrieves this symbol. The next time this method is called, the next symbol is decoded. Notice that these symbols are decoded in reverse order - i.e. this coder follows a FILO principle - the last symbol encoded is the first symbol decoded.

Further note that the Decoder does not know how many symbols are left - you will either need to have messages/data of fixed length, or you will have to store the length with the compressed data in order to later be able to decompress it.

python-ranscoder's People

Contributors

fglazov avatar mlaugharn avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

python-ranscoder's Issues

Cannot install for py3.11

Hi, could you please update the numba requirements? The version in setup.py is not available for 3.11, a newer version is.

Thanks, great project!

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.