bamler-lab / constriction

Entropy coders for research and production in Python and Rust.

Home Page: https://bamler-lab.github.io/constriction/

License: Apache License 2.0

Entropy Coders for Research and Production


The constriction library provides a set of composable entropy coding algorithms with a focus on correctness, versatility, ease of use, compression performance, and computational efficiency. The goals of constriction are three-fold:

  1. to facilitate research on novel lossless and lossy compression methods by providing a composable set of primitives (e.g., you can easily switch out a Range Coder for an ANS coder without having to find a new library or change how you represent exactly invertible entropy models);
  2. to simplify the transition from research code to deployed software by providing similar APIs and binary compatible entropy coders for both Python (for rapid prototyping on research code) and Rust (for turning successful prototypes into standalone binaries, libraries, or WebAssembly modules); and
  3. to serve as a teaching resource by providing a variety of entropy coding primitives within a single consistent framework. Check out our additional teaching material from a university course on data compression, which contains some problem sets where you use constriction (with solutions).

More Information: project website

Live demo: here's a web app that started out as a machine-learning research project in Python and was later turned into a web app by using constriction in a WebAssembly module.

Project Status

We currently provide implementations of the following entropy coding algorithms (see also benchmarks below):

  • Asymmetric Numeral Systems (ANS): a fast modern entropy coder with near-optimal compression effectiveness that supports advanced use cases like bits-back coding.
  • Range Coding: a computationally efficient variant of Arithmetic Coding that has essentially the same compression effectiveness as ANS Coding but operates as a queue ("first in first out"), which makes it preferable for autoregressive models.
  • Chain Coding: an experimental new entropy coder that combines the (net) effectiveness of stream codes with the locality of symbol codes (for details, see Section 4.3 in this paper); it admits experimental new compression techniques that perform joint inference, quantization, and bits-back coding in an end-to-end optimization. This experimental coder is mainly provided to prove to ourselves that the API for encoding and decoding, which is shared across all stream coders, is flexible enough to express complex novel tasks.
  • Huffman Coding: a well-known symbol code, mainly provided here for teaching purposes; you'll usually want to use a stream code like ANS or Range Coding instead since symbol codes can have considerable overhead on the bit rate, especially in the regime of low entropy per symbol, which is common in machine-learning based compression methods.
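To make the last bullet point concrete, here is a short back-of-the-envelope calculation (plain Python, independent of constriction) showing why symbol codes are wasteful at low entropy per symbol:

```python
import math

# A highly predictable binary source, as is common in machine-learning
# based compression: symbol 0 occurs with probability 0.99.
p = [0.99, 0.01]

# Information content (Shannon entropy) in bits per symbol:
entropy = -sum(q * math.log2(q) for q in p)  # ≈ 0.081 bits/symbol

# A symbol code such as Huffman coding must assign every symbol a code
# word of at least one full bit:
huffman_rate = 1.0  # bits per symbol, at best

print(f"entropy: {entropy:.4f} bits/symbol")
print(f"symbol-code overhead factor: {huffman_rate / entropy:.1f}")  # > 12
```

A stream code like ANS or Range Coding amortizes information over many symbols and therefore approaches the entropy, while any symbol code pays at least one bit per symbol.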

Further, constriction provides implementations of common probability distributions in fixed-point arithmetic, which can be used as entropy models in either of the above stream codes. The library also provides adapters for turning custom probability distributions into exactly invertible fixed-point arithmetic.
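For intuition, here is a deliberately simplified sketch of what such a fixed-point ("leaky") quantization has to accomplish; the helper below is hypothetical and only illustrative, as constriction's real adapters use a more careful construction:

```python
def leaky_quantize(pmf, precision=12):
    """Map a float PMF to integer weights that sum exactly to 2**precision.

    Hypothetical helper for illustration only. Every symbol gets a nonzero
    weight ("leaky") so that every symbol remains encodable, and the exact
    integer sum guarantees that encoder and decoder agree bit for bit.
    """
    total = 1 << precision
    # Give every symbol at least weight 1, then round the rest:
    weights = [max(1, round(p * total)) for p in pmf]
    # Absorb the rounding drift into the largest weight:
    weights[weights.index(max(weights))] += total - sum(weights)
    return weights

weights = leaky_quantize([0.15, 0.69, 0.15, 0.01], precision=12)
print(weights)  # → [614, 2827, 614, 41]; sums to 4096 = 2**12
```

Because all arithmetic downstream of this step is on integers, the quantized model behaves identically across platforms and across the Python and Rust implementations.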

The provided implementations of entropy coding algorithms and probability distributions are continuously and extensively tested. We consider updates that can affect the encoder or decoder output in existing code as breaking changes that necessitate a bump in the leading nonzero number of the version string (this is a stronger guarantee than SemVer in that we apply it even to 0.y.z versions). Please file an issue if you find a bug, are missing a particular feature, or run into a scenario where the current APIs are confusing or unnecessarily limit what you can achieve with constriction.

Quick Start Guides And Examples in Python and Rust

Python

Install constriction for Python:

pip install constriction~=0.3.5

Then go ahead and encode and decode some data:

import constriction
import numpy as np

message = np.array([6, 10, -4, 2, 5, 2, 1, 0, 2], dtype=np.int32)

# Define an i.i.d. entropy model (see links below for more complex models):
entropy_model = constriction.stream.model.QuantizedGaussian(-50, 50, 3.2, 9.6)

# Let's use an ANS coder in this example (see links below for Range Coding examples).
encoder = constriction.stream.stack.AnsCoder()
encoder.encode_reverse(message, entropy_model)

compressed = encoder.get_compressed()
print(f"compressed representation: {compressed}")
print(f"(in binary: {[bin(word) for word in compressed]})")

decoder = constriction.stream.stack.AnsCoder(compressed)
decoded = decoder.decode(entropy_model, 9) # (decodes 9 symbols)
assert np.all(decoded == message) # (verifies correctness)

There's a lot more you can do with constriction's Python API. Please check out the Python API Documentation or our example jupyter notebooks.

Rust

Add this line to your Cargo.toml:

[dependencies]
constriction = "0.3.5"
probability = "0.17" # Not strictly required but used in many code examples.

If you compile in no_std mode then you have to deactivate constriction's default features (and you can't use the probability crate):

[dependencies]
constriction = {version = "0.3.5", default-features = false} # for `no_std` mode

Then go ahead and encode and decode some data:

use constriction::stream::{model::DefaultLeakyQuantizer, stack::DefaultAnsCoder, Decode};

// Let's use an ANS Coder in this example. Constriction also provides a Range
// Coder, a Huffman Coder, and an experimental new "Chain Coder".
let mut coder = DefaultAnsCoder::new();
 
// Define some data and a sequence of entropy models. We use quantized Gaussians here,
// but `constriction` also provides other models and allows you to implement your own.
let symbols = [23i32, -15, 78, 43, -69];
let quantizer = DefaultLeakyQuantizer::new(-100..=100);
let means = [35.2f64, -1.7, 30.1, 71.2, -75.1];
let stds = [10.1f64, 25.3, 23.8, 35.4, 3.9];
let models = means.iter().zip(&stds).map(
    |(&mean, &std)| quantizer.quantize(probability::distribution::Gaussian::new(mean, std))
);

// Encode symbols (in *reverse* order, because ANS Coding operates as a stack).
coder.encode_symbols_reverse(symbols.iter().zip(models.clone())).unwrap();

// Obtain temporary shared access to the compressed bit string. If you want ownership of the
// compressed bit string, call `.into_compressed()` instead of `.get_compressed()`.
println!("Encoded into {} bits: {:?}", coder.num_bits(), &*coder.get_compressed().unwrap());

// Decode the symbols and verify correctness.
let reconstructed = coder.decode_symbols(models).collect::<Result<Vec<_>, _>>().unwrap();
assert_eq!(reconstructed, symbols);

There's a lot more you can do with constriction's Rust API. Please check out the Rust API Documentation.

Benchmarks

The following table and diagrams show empirical bit rates and run-time performances of the two main entropy coders provided by constriction: Range Coding (RC) and Asymmetric Numeral Systems (ANS). We compare both to Arithmetic Coding (AC), as implemented in the arcode crate. The reported results are from experiments with data that came up in a real-world application. In each experiment, we compressed a message that consists of 3 million symbols, which we modeled as i.i.d. within each message. The messages span a wide range of entropies, from about 0.001 to 10 bits per symbol. Reported run times for encoding and decoding were observed on an Intel Core i7-7500U CPU (2.70 GHz) using constriction's Rust API (in any real-world scenario, the run time of constriction's Python bindings will almost certainly be dwarfed by the surrounding Python operations). More experimental details are explained in Section 5.2 of this paper and in the benchmarking code.

Aggregated Benchmark Results

The table below shows bit rates and run times for each tested entropy coder, aggregated over all tested messages. For RC and ANS, the numbers in brackets after the entropy coder name denote advanced coder settings that are only exposed in constriction's Rust API (see documentation). The most relevant settings are the ones labeled as "default" (bold). These settings are the only ones exposed by constriction's Python API, and they are generally recommended for prototyping. The table reports bit rates as relative overhead over the information content. Thus, e.g., the 0.02 % overhead reported for Range Coding (RC) means that constriction's range coder compresses the entire benchmark data to a bit string that is 1.0002 times as long as the smallest bit string that a hypothetical optimal lossless compression code could achieve.
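To make the overhead figure concrete, a quick back-of-the-envelope calculation with illustrative numbers (not taken from the table):

```python
# Suppose the data's total information content is 10 million bits and the
# coder has a relative overhead of 0.02 %:
information_content = 10_000_000   # bits; size of an optimal compression
relative_overhead = 0.0002         # 0.02 %

compressed_bits = information_content * (1 + relative_overhead)
excess_bits = compressed_bits - information_content

print(f"compressed size: {compressed_bits:.0f} bits")  # ≈ 10,002,000
print(f"wasted bits:     {excess_bits:.0f}")           # ≈ 2,000
```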

Entropy Coder (precision / word size / state size) | bit rate overhead | encoder / decoder run time
ANS (24/32/64) ("default") | 0.0015 % | 24.2 / 6.1 ns/symbol
ANS (32/32/64) | 0.0593 % | 24.2 / 6.9 ns/symbol
ANS (16/16/32) | 0.2402 % | 19.8 / 6.4 ns/symbol
ANS (12/16/32) ("small") | 3.9567 % | 19.8 / 6.9 ns/symbol
RC (24/32/64) ("default") | 0.0237 % | 16.6 / 14.3 ns/symbol
RC (32/32/64) | 1.6089 % | 16.7 / 14.8 ns/symbol
RC (16/16/32) | 3.4950 % | 16.9 / 9.4 ns/symbol
RC (12/16/32) ("small") | 4.5807 % | 16.8 / 9.4 ns/symbol
Arithmetic Coding (AC; for comparison, using arcode crate) | 0.0004 % | 43.2 / 85.6 ns/symbol

We observe that the "default" ANS and RC coders, as well as the Arithmetic Coding baseline, all essentially achieve the optimal bit rate (all with less than 0.1 % overhead). When choosing an entropy coder for a practical application, the decision should therefore typically not be based on the bit rate but rather on run-time performance and ergonomics. Concerning run time, constriction's ANS and RC coders are both considerably faster than AC. When comparing ANS to RC, the main difference is in ergonomics: ANS operates as a stack ("last in first out"), which is good for bits-back coding with latent variable models, while RC operates as a queue ("first in first out"), which is good for autoregressive models.

Detailed Benchmark Results

The plots below break down each coder's performance as a function of the information content of the message that we compress. Each data point corresponds to a single message (consisting of 3 million symbols each), and the horizontal axis shows the information content of the message.

The most important results are again for entropy coders with the "default" settings (red plus signs), which are the ones that are also exposed in the Python API. Note the logarithmic scale on both axes.

Bit Rates

[plot: empirical bit rates]

Run Times

[plot: empirical run times]

Citing

I'd appreciate attribution if you use constriction in your scientific work. You can cite the following paper, which announces constriction (Section 5.1) and analyzes its compression performance and runtime efficiency (Section 5.2):

  • R. Bamler, Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective, arXiv preprint arXiv:2201.01741.

BibTeX:

@article{bamler2022constriction,
  title   = {Understanding Entropy Coding With Asymmetric Numeral Systems (ANS): a Statistician's Perspective},
  author  = {Bamler, Robert},
  journal = {arXiv preprint arXiv:2201.01741},
  year    = {2022}
}

Compiling From Source

Users of constriction typically don't need to manually compile the library from source. Just install constriction via pip or cargo as described in the above quick start guides.

Contributors can compile constriction manually as follows:

  1. Prepare your system:
    • If you don't have a Rust toolchain, install one as described on https://rustup.rs
    • If you already have a Rust toolchain, make sure it's on version 1.51 or later. Run rustc --version to check your version, and rustup update stable if you need to update.
  2. git clone the repository and cd into it.
  3. To compile the Rust library:
    • compile in development mode and execute all tests: cargo test
    • compile in release mode (i.e., with optimizations) and run the benchmarks: cargo bench
  4. If you want to compile the Python module:
    • install poetry.
    • install Python dependencies: cd into the repository and run poetry install
    • build the Python module: poetry run maturin develop --features pybindings --release
    • run Python unit tests: poetry run pytest tests/python
    • start a Python REPL that sees the compiled Python module: poetry run ipython

Contributing

Pull requests and issue reports are welcome. Unless contributors explicitly state otherwise at the time of contributing, all contributions will be assumed to be licensed under either one of MIT license, Apache License Version 2.0, or Boost Software License Version 1.0, at the choice of each licensee.

There's no official guide for contributions since nobody reads those anyway. Just be nice to other people and act like a grown-up (i.e., it's OK to make mistakes as long as you strive for improvement and are open to considering respectfully phrased opinions of other people).

License

This work is licensed under the terms of the MIT license, Apache License Version 2.0, or Boost Software License Version 1.0. You can choose between these licenses if you use this work. See the files whose names start with LICENSE in this directory. The compiled Python extension module is linked with a number of third-party libraries. Binary distributions of the constriction Python extension module contain a file LICENSE.html that includes all licenses of all dependencies (the file is also available online).

What's With the Name?

Constriction is a Rust library of compression primitives with bindings for Python. Pythons are a family of nonvenomous snakes that subdue their prey by "compressing" it, a method known as constriction.


constriction's Issues

assertions not supported for parameter cdf in constriction.stream.model.CustomModel

I tried to implement a custom entropy model that is only defined on positive numbers.
To make sure only positive numbers are encoded I used an assert statement to check for non-negativity.
To make my point I use a lognormal distribution in the following snippet.

import constriction
import numpy as np
from scipy import stats

def cdf(x, mu, sigma):
    assert x >= 0
    return stats.lognorm.cdf(x, mu, sigma)

def inverse_cdf(q, mu, sigma):
    return stats.lognorm.ppf(q, mu, sigma)

mus = np.random.randn(100)
sigmas = np.random.randn(100)**2 + 1
dummy_entropy_model = constriction.stream.model.CustomModel(cdf, inverse_cdf, -10, 10)

message = (np.random.randn(100)**2).round().astype(np.int32)

coder = constriction.stream.stack.AnsCoder()
coder.encode_reverse(message, dummy_entropy_model, mus, sigmas)
decoded = coder.decode(dummy_entropy_model, mus, sigmas)

thread '<unnamed>' panicked at 'TODO: PyErr { type: <class 'AssertionError'>, value: AssertionError(), traceback: Some(<traceback object at 0x7f22f0efac80>) }', src/pybindings/stream/model/internals.rs:380:14

PanicException: TODO: PyErr { type: <class 'AssertionError'>, value: AssertionError(), traceback: Some(<traceback object>) }

This problem is of course solved by deleting assertions from cdf.

A more permanent solution might be a change to the documentation or the error message.

Output Dict/Struct of Huffman Symbol Codes

It would be very good for me (neuroinformatics research) if we could output the binary symbol codes from the Huffman tree.

I've been working in the python library but can write rust code. If this feature exists in the rust code I'm happy to write the python binding. If the feature isn't in rust code either, I'd be happy to write it and add the python binding also.

Let me know if this would be good, and maybe give me a little direction about how you'd like it if so!

Encoding int8/uint8 streams of symbols

Hello, thanks for releasing such a complete and well-documented library!

I have a doubt about its application on integer streams. Why does the library force you to cast arrays to numpy.int32? In my use case, in which I have to compress neural network weights, I find a general-purpose compression algorithm such as brotli orders of magnitude more efficient than both the Range and ANS coders, and I believe that representing symbols as 32-bit integers may be part of the issue.

Different behavior on amd64 and arm

Hi, I'm trying to use this package on arm devices, but I found inconsistent behavior on arm and amd64 platforms. What's the reason for this?

This is the test code

import constriction
import numpy as np

message = np.array([10, 10], dtype=np.int32)

entropy_model = constriction.stream.model.QuantizedLaplace(-20, 20 + 1)

encoder = constriction.stream.queue.RangeEncoder()
encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0.]))

compressed = encoder.get_compressed()
print(f"compressed representation: {compressed}")
print(f"(in binary: {[bin(word) for word in compressed]})")

decoder = constriction.stream.queue.RangeDecoder(compressed)
decoded = decoder.decode(entropy_model, np.array([10., 10.]), np.array([10.,  0.])) 
assert np.all(decoded == message) # (verifies correctness)

On amd64

compressed representation: [2042752375]
(in binary: ['0b1111001110000011110110101110111'])

On arm

thread '<unnamed>' panicked at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9:
assertion failed: b > 0.0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "test_script.py", line 13, in <module>
    encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "test_script.py", line 13, in <module>
    encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0

The full traceback is

thread '<unnamed>' panicked at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9:
assertion failed: b > 0.0
stack backtrace:
   0:       0x7f9d20a7ac - std::backtrace_rs::backtrace::libunwind::trace::heaab0e590535aeb3
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:       0x7f9d20a7ac - std::backtrace_rs::backtrace::trace_unsynchronized::h89cc7ae9ebb707d7
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:       0x7f9d20a7ac - std::sys_common::backtrace::_print_fmt::h08c31be18fedf422
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:67:5
   3:       0x7f9d20a7ac - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc38bcf44d9e857e3
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:44:22
   4:       0x7f9d226b04 - core::fmt::rt::Argument::fmt::ha5b752f9cd7ef4a3
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/rt.rs:138:9
   5:       0x7f9d226b04 - core::fmt::write::h9fac187ae7486f3c
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/mod.rs:1094:21
   6:       0x7f9d208358 - std::io::Write::write_fmt::h239e9fb6296b3a7f
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/mod.rs:1714:15
   7:       0x7f9d20a5e0 - std::sys_common::backtrace::_print::h52f67cfa8753b0ab
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:47:5
   8:       0x7f9d20a5e0 - std::sys_common::backtrace::print::hdea7481e2c957a93
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:34:9
   9:       0x7f9d20b958 - std::panicking::default_hook::{{closure}}::h7c36fa733369c49e
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:270:22
  10:       0x7f9d20b680 - std::panicking::default_hook::h303eee75f9a8f6a8
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:290:9
  11:       0x7f9d20bf1c - std::panicking::rust_panic_with_hook::h270c94381ec34744
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:707:13
  12:       0x7f9d20bda4 - std::panicking::begin_panic_handler::{{closure}}::h3653e3502bcc1625
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:597:13
  13:       0x7f9d20ac90 - std::sys_common::backtrace::__rust_end_short_backtrace::h6b8510f2f024eeeb
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:170:18
  14:       0x7f9d20bb34 - rust_begin_unwind
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
  15:       0x7f9d0fe4a4 - core::panicking::panic_fmt::ha96945d7a1b20293
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
  16:       0x7f9d0fe514 - core::panicking::panic::h8f06a2df29fa4962
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:117:5
  17:       0x7f9d11f308 - probability::distribution::laplace::Laplace::new::haefeb509845cc871
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9
  18:       0x7f9d119a44 - constriction::pybindings::stream::model::QuantizedLaplace::new::{{closure}}::h5e7236efc8092f4a
                               at /home/user/repos/constriction/src/pybindings/stream/model.rs:544:44
  19:       0x7f9d161b7c - <constriction::pybindings::stream::model::internals::ParameterizableModel<(P0,P1),M,F> as constriction::pybindings::stream::model::internals::Model>::parameterize::h135fd1ad8427458f
                               at /home/user/repos/constriction/src/pybindings/stream/model/internals.rs:249:35
  20:       0x7f9d17476c - constriction::pybindings::stream::queue::RangeEncoder::encode::h7a9429b10f1006e7
                               at /home/user/repos/constriction/src/pybindings/stream/queue.rs:314:13
  21:       0x7f9d175934 - constriction::pybindings::stream::queue::_::<impl constriction::pybindings::stream::queue::RangeEncoder>::__pymethod_encode__::hb8e05b89a123d73c
                               at /home/user/repos/constriction/src/pybindings/stream/queue.rs:43:1
  22:       0x7f9d0ff204 - pyo3::impl_::trampoline::fastcall_with_keywords::{{closure}}::he0112aa3b1e49083
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:41:29
  23:       0x7f9d0ff10c - pyo3::impl_::trampoline::trampoline::{{closure}}::h9f46c71dbb4a44d4
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181:54
  24:       0x7f9d15b494 - std::panicking::try::do_call::hd5234df960a8f95d
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502:40
  25:       0x7f9d15b720 - __rust_try
  26:       0x7f9d15b268 - std::panicking::try::haeac5d692bf46066
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466:19
  27:       0x7f9d115570 - std::panic::catch_unwind::h2b95655acc141454
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142:14
  28:       0x7f9d0fed68 - pyo3::impl_::trampoline::trampoline::h82d7e8a0b17fbbdf
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181:9
  29:       0x7f9d140a94 - pyo3::impl_::trampoline::fastcall_with_keywords::h45c8d979f3ca1f26
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:41:13
  30:       0x7f9d174a48 - constriction::pybindings::stream::queue::_::<impl pyo3::impl_::pyclass::PyMethods<constriction::pybindings::stream::queue::RangeEncoder> for pyo3::impl_::pyclass::PyClassImplCollector<constriction::pybindings::stream::queue::RangeEncoder>>::py_methods::ITEMS::trampoline::h905700428ec8b158
                               at /home/user/repos/constriction/src/pybindings/stream/queue.rs:43:1
  31:           0x488db8 - _PyMethodDescr_FastCallKeywords
  32:           0x512714 - _PyEval_EvalFrameDefault
  33:           0x50b47c - _PyEval_EvalCodeWithName
  34:           0x50dd70 - PyEval_EvalCode
  35:           0x616f84 - <unknown>
  36:           0x617070 - PyRun_FileExFlags
  37:           0x61844c - PyRun_SimpleFileExFlags
  38:           0x6494c8 - <unknown>
  39:           0x649b00 - _Py_UnixMain
  40:       0x7f9db946e0 - __libc_start_main
                               at /build/glibc-4fr630/glibc-2.27/csu/../csu/libc-start.c:310
  41:           0x5aef1c - <unknown>

Thanks!

Relax minimal numpy version, if possible

I am currently working on maintaining a legacy codebase which for valid reasons¹ requires numpy (>=1.16.0,<1.19.0). I was about to introduce constriction as a means to significantly reduce memory consumption by some base models, only to find out that I cannot install it due to a version conflict.

Is numpy>=1.19.0 a hard requirement?

Footnotes

  1. Old versions of some scientific libraries require old version of numpy. It is impossible to upgrade those libraries because newer versions have introduced incompatible changes to serialized models. We have to support existing models and cannot recreate them.

Vectorize cdf query of custom model

In my use case, I want to compress a large amount of data with a custom entropy model.
Unfortunately, this takes quite some time since the cdf is called for each compressed symbol.
I can't use the scipy model adapter directly since I'm using a mixture distribution, which is not implemented in scipy.

Here's my dummy code:

from scipy import stats
import constriction
import numpy as np

c = 0
def cdf_likelihood_normal(x, mu, sigma):
    global c
    c += 1
    print(c, end="\r")
    return stats.norm.cdf(x, loc=mu, scale=sigma)

def inverse_cdf_likelihood_normal(q, mu, sigma):
    return stats.norm.ppf(q, loc=mu, scale=sigma)

coder = constriction.stream.stack.AnsCoder()
entropy_model = constriction.stream.model.CustomModel(
    cdf_likelihood_normal, inverse_cdf_likelihood_normal, -10, 10)

sigma = np.ones(int(1e4))
mu = np.zeros(int(1e4))
message = np.random.randint(-1, 1, int(1e4), dtype=np.int32)

p = stats.norm.cdf(message, loc=mu, scale=sigma)  # very fast

coder.encode_reverse(message, entropy_model, mu, sigma)  # very slow
print(coder.num_bits())

reconstruction = coder.decode(entropy_model, mu, sigma)

assert (message == reconstruction).all()

Is it possible to take care of vectorizable cdfs in the custom model adapter to allow for a speed up?

Creating a categorical distribution sometimes fails to converge

The following python code

import constriction
import numpy as np
model = constriction.stream.model.Categorical(np.array([0.15, 0.69, 0.15]))

gets stuck in an infinite loop. This is probably caused by a bug in the function optimize_leaky_categorical.

Fixing this might turn out to be a breaking change, so it might require a major version bump.

Additional example:

p = np.array([1.34673042e-04, 6.52306480e-04, 3.14999325e-03, 1.49921896e-02, 6.67127371e-02, 2.26679876e-01, 3.75356406e-01, 2.26679876e-01, 6.67127594e-02, 1.49922138e-02, 3.14990873e-03, 6.52299321e-04, 1.34715927e-04])
constriction.stream.model.Categorical(p/p.sum())

Many thanks to Grégoire Jauvion for reporting this issue by email.

List `LICENSE.html` in `RECORD` file of python wheels

As reported in #40 (comment), the license file should be listed in the RECORD file of python wheels.

Warning: Validation of the RECORD file of constriction-0.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl failed. Please report to the maintainers of that package so they can fix their build process. Details:
In /home/uken/.cache/pypoetry/artifacts/bf/7f/3e/dafe24f40a54f14858cdb770e83c4fc076d2a8e86141bb380a25393718/constriction-0.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, LICENSE.html is not mentioned in RECORD
