bamler-lab / constriction

Entropy coders for research and production in Python and Rust.

Home Page: https://bamler-lab.github.io/constriction/

License: Apache License 2.0

Rust 87.66% Python 11.96% Handlebars 0.38%
compression-methods entropy-coding data-science machine-learning entropy-models compression python-api rust compression-codecs

constriction's Issues

Different behavior on amd64 and arm

Hi, I'm trying to use this package on ARM devices, but I see inconsistent behavior between the arm and amd64 platforms. What's the reason for this?

This is the test code

import constriction
import numpy as np

message = np.array([10, 10], dtype=np.int32)

entropy_model = constriction.stream.model.QuantizedLaplace(-20, 20 + 1)

encoder = constriction.stream.queue.RangeEncoder()
encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0.]))

compressed = encoder.get_compressed()
print(f"compressed representation: {compressed}")
print(f"(in binary: {[bin(word) for word in compressed]})")

decoder = constriction.stream.queue.RangeDecoder(compressed)
decoded = decoder.decode(entropy_model, np.array([10., 10.]), np.array([10.,  0.])) 
assert np.all(decoded == message) # (verifies correctness)

On amd64

compressed representation: [2042752375]
(in binary: ['0b1111001110000011110110101110111'])

On arm

thread '<unnamed>' panicked at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9:
assertion failed: b > 0.0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "test_script.py", line 13, in <module>
    encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
    from apport.fileutils import likely_packaged, get_recent_crashes
  File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
    from apport.report import Report
  File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
    import apport.fileutils
  File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
    from apport.packaging_impl import impl as packaging
  File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
    import apt
  File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
    import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'

Original exception was:
Traceback (most recent call last):
  File "test_script.py", line 13, in <module>
    encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0

The full traceback is

thread '<unnamed>' panicked at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9:
assertion failed: b > 0.0
stack backtrace:
   0:       0x7f9d20a7ac - std::backtrace_rs::backtrace::libunwind::trace::heaab0e590535aeb3
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
   1:       0x7f9d20a7ac - std::backtrace_rs::backtrace::trace_unsynchronized::h89cc7ae9ebb707d7
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:       0x7f9d20a7ac - std::sys_common::backtrace::_print_fmt::h08c31be18fedf422
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:67:5
   3:       0x7f9d20a7ac - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc38bcf44d9e857e3
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:44:22
   4:       0x7f9d226b04 - core::fmt::rt::Argument::fmt::ha5b752f9cd7ef4a3
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/rt.rs:138:9
   5:       0x7f9d226b04 - core::fmt::write::h9fac187ae7486f3c
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/mod.rs:1094:21
   6:       0x7f9d208358 - std::io::Write::write_fmt::h239e9fb6296b3a7f
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/mod.rs:1714:15
   7:       0x7f9d20a5e0 - std::sys_common::backtrace::_print::h52f67cfa8753b0ab
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:47:5
   8:       0x7f9d20a5e0 - std::sys_common::backtrace::print::hdea7481e2c957a93
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:34:9
   9:       0x7f9d20b958 - std::panicking::default_hook::{{closure}}::h7c36fa733369c49e
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:270:22
  10:       0x7f9d20b680 - std::panicking::default_hook::h303eee75f9a8f6a8
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:290:9
  11:       0x7f9d20bf1c - std::panicking::rust_panic_with_hook::h270c94381ec34744
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:707:13
  12:       0x7f9d20bda4 - std::panicking::begin_panic_handler::{{closure}}::h3653e3502bcc1625
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:597:13
  13:       0x7f9d20ac90 - std::sys_common::backtrace::__rust_end_short_backtrace::h6b8510f2f024eeeb
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:170:18
  14:       0x7f9d20bb34 - rust_begin_unwind
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
  15:       0x7f9d0fe4a4 - core::panicking::panic_fmt::ha96945d7a1b20293
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
  16:       0x7f9d0fe514 - core::panicking::panic::h8f06a2df29fa4962
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:117:5
  17:       0x7f9d11f308 - probability::distribution::laplace::Laplace::new::haefeb509845cc871
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9
  18:       0x7f9d119a44 - constriction::pybindings::stream::model::QuantizedLaplace::new::{{closure}}::h5e7236efc8092f4a
                               at /home/user/repos/constriction/src/pybindings/stream/model.rs:544:44
  19:       0x7f9d161b7c - <constriction::pybindings::stream::model::internals::ParameterizableModel<(P0,P1),M,F> as constriction::pybindings::stream::model::internals::Model>::parameterize::h135fd1ad8427458f
                               at /home/user/repos/constriction/src/pybindings/stream/model/internals.rs:249:35
  20:       0x7f9d17476c - constriction::pybindings::stream::queue::RangeEncoder::encode::h7a9429b10f1006e7
                               at /home/user/repos/constriction/src/pybindings/stream/queue.rs:314:13
  21:       0x7f9d175934 - constriction::pybindings::stream::queue::_::<impl constriction::pybindings::stream::queue::RangeEncoder>::__pymethod_encode__::hb8e05b89a123d73c
                               at /home/user/repos/constriction/src/pybindings/stream/queue.rs:43:1
  22:       0x7f9d0ff204 - pyo3::impl_::trampoline::fastcall_with_keywords::{{closure}}::he0112aa3b1e49083
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:41:29
  23:       0x7f9d0ff10c - pyo3::impl_::trampoline::trampoline::{{closure}}::h9f46c71dbb4a44d4
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181:54
  24:       0x7f9d15b494 - std::panicking::try::do_call::hd5234df960a8f95d
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502:40
  25:       0x7f9d15b720 - __rust_try
  26:       0x7f9d15b268 - std::panicking::try::haeac5d692bf46066
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466:19
  27:       0x7f9d115570 - std::panic::catch_unwind::h2b95655acc141454
                               at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142:14
  28:       0x7f9d0fed68 - pyo3::impl_::trampoline::trampoline::h82d7e8a0b17fbbdf
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181:9
  29:       0x7f9d140a94 - pyo3::impl_::trampoline::fastcall_with_keywords::h45c8d979f3ca1f26
                               at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:41:13
  30:       0x7f9d174a48 - constriction::pybindings::stream::queue::_::<impl pyo3::impl_::pyclass::PyMethods<constriction::pybindings::stream::queue::RangeEncoder> for pyo3::impl_::pyclass::PyClassImplCollector<constriction::pybindings::stream::queue::RangeEncoder>>::py_methods::ITEMS::trampoline::h905700428ec8b158
                               at /home/user/repos/constriction/src/pybindings/stream/queue.rs:43:1
  31:           0x488db8 - _PyMethodDescr_FastCallKeywords
  32:           0x512714 - _PyEval_EvalFrameDefault
  33:           0x50b47c - _PyEval_EvalCodeWithName
  34:           0x50dd70 - PyEval_EvalCode
  35:           0x616f84 - <unknown>
  36:           0x617070 - PyRun_FileExFlags
  37:           0x61844c - PyRun_SimpleFileExFlags
  38:           0x6494c8 - <unknown>
  39:           0x649b00 - _Py_UnixMain
  40:       0x7f9db946e0 - __libc_start_main
                               at /build/glibc-4fr630/glibc-2.27/csu/../csu/libc-start.c:310
  41:           0x5aef1c - <unknown>

Thanks!
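The panic message `assertion failed: b > 0.0` is raised in the Laplace constructor, and the test script passes `0.` as the second entry of the last argument to `encode`. Assuming that argument holds the per-symbol scale parameters, a check before encoding would turn the problem into an ordinary Python exception on every platform. The helper `check_positive_scales` below is hypothetical and not part of constriction; it is only a sketch.

```python
import numpy as np


def check_positive_scales(scales):
    """Return `scales` as a float64 array, or raise if any entry is not > 0.

    Hypothetical helper (not part of constriction). NaN entries are rejected
    too, because `nan > 0.0` evaluates to False.
    """
    scales = np.asarray(scales, dtype=np.float64)
    bad = np.flatnonzero(~(scales > 0.0))
    if bad.size > 0:
        raise ValueError(
            f"scale must be strictly positive, got {scales[bad[0]]} at index {bad[0]}"
        )
    return scales


# The scales from the report are rejected before they reach the encoder.
try:
    check_positive_scales([10.0, 0.0])
except ValueError as err:
    print(err)
```

Calling the helper on the scale array right before `encoder.encode(...)` and `decoder.decode(...)` keeps both sides consistent.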

TANS with separate probability

Hello,

In my use case, I have a Vec of String in a structure that I want to compress.
But I need to keep O(1) access to the element in the Vec, so I was thinking about using TANS and storing my probability table on the side:
Before:

struct Index {
  doc: Vec<String>,
}

After:

struct Index {
  doc: Vec<Vec<u8>>,
  // The same probabilities are used to encode/decode every string of the documents
  prob: [u8; 256],
}

Is this library supposed to support this?
From what I’ve seen, it seems like we need to provide the probabilities for the symbol we're currently compressing.
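As an illustration of the "one table on the side" idea, the sketch below builds a single byte-level probability table shared by all documents. It is written in Python with NumPy for brevity and does not use constriction; the function name `shared_byte_probabilities` is hypothetical, and whether the resulting table can be fed to a tANS coder in this library is exactly the open question above.

```python
import numpy as np


def shared_byte_probabilities(docs):
    """Estimate one probability per byte value (0..255) from all documents.

    Hypothetical helper: the same table would be stored once and reused to
    encode and decode every string independently.
    """
    counts = np.zeros(256, dtype=np.int64)
    for doc in docs:
        data = np.frombuffer(doc.encode("utf-8"), dtype=np.uint8)
        counts += np.bincount(data, minlength=256)
    total = counts.sum()
    if total == 0:
        raise ValueError("cannot estimate probabilities from empty input")
    return counts / total


probabilities = shared_byte_probabilities(["ab", "a"])
print(probabilities[ord("a")], probabilities[ord("b")])
```

Byte values that never occur get probability zero here, so a real table would need some smoothing before it could encode unseen bytes.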

Encoding int8/uint8 streams of symbols

Hello, thanks for releasing such a complete and well-documented library!

I have a question about its application to integer streams. Why does the library force you to cast arrays to numpy.int32? In my use case, where I compress neural network weights, I find a general-purpose compression algorithm such as brotli orders of magnitude more efficient than both the Range and ANS coders, and I suspect that representing symbols as 32-bit integers may be part of the issue.
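In entropy coding generally, the compressed size is governed by the probabilities the entropy model assigns to the symbols, not by the width of the integer type that carries them. One way to check whether the model is the bottleneck is to compare the achieved bitrate against the empirical entropy of the symbols. The sketch below uses only NumPy; `empirical_entropy_bits` is a hypothetical helper and the weights are made-up toy values.

```python
import numpy as np


def empirical_entropy_bits(symbols):
    """Empirical entropy in bits per symbol (i.i.d. assumption)."""
    _, counts = np.unique(np.asarray(symbols), return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


weights = np.array([-1, 0, 0, 1], dtype=np.int8)  # toy quantized weights
message = weights.astype(np.int32)                # cast expected by the coder

# The cast changes the container type, not the symbol statistics.
print(empirical_entropy_bits(weights), empirical_entropy_bits(message))
```

If a coder with a well-fitted model lands far above this number, the mismatch is more likely in the model parameters than in the `int32` representation.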

assertions not supported for parameter cdf in constriction.stream.model.CustomModel

I tried to implement a custom entropy model that is only defined on positive numbers.
To make sure only positive numbers are encoded I used an assert statement to check for non-negativity.
To make my point I use a lognormal distribution in the following snippet.

import constriction
import numpy as np
from scipy import stats

def cdf(x, mu, sigma):
    # Raising inside cdf (here via assert) is what triggers the panic below.
    assert x >= 0
    return stats.lognorm.cdf(x, mu, sigma)

def inverse_cdf(q, mu, sigma):
    return stats.lognorm.ppf(q, mu, sigma)

mus = np.random.randn(100)
sigmas = np.random.randn(100)**2 + 1
dummy_entropy_model = constriction.stream.model.CustomModel(cdf, inverse_cdf, -10, 10)

message = (np.random.randn(100)**2).round().astype(np.int32)

coder = constriction.stream.stack.AnsCoder()
coder.encode_reverse(message, dummy_entropy_model, mus, sigmas)
decode = coder.decode(dummy_entropy_model, mus, sigmas)

thread '<unnamed>' panicked at 'TODO: PyErr { type: <class 'AssertionError'>, value: AssertionError(), traceback: Some(<traceback object at 0x7f22f0efac80>) }', src/pybindings/stream/model/internals.rs:380:14

PanicException: TODO: PyErr { type: <class 'AssertionError'>, value: AssertionError(), traceback: Some(<traceback object at 0x7f22f0efac80>) }

This problem is of course solved by deleting assertions from cdf.

A more permanent solution might be a change to the documentation or the error message.
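Until then, one way to avoid the assertion is to make the cdf total, returning 0 for arguments outside the support instead of raising. The sketch below uses only the standard library and assumes `mu` and `sigma` are the mean and standard deviation of `log(x)`, which differs from the argument order of `scipy.stats.lognorm`; the function name is hypothetical.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal distribution, defined for every real x.

    Returns 0.0 for x <= 0 rather than asserting, so the function never
    raises when it is evaluated outside the support.
    """
    if x <= 0.0:
        return 0.0
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


print(lognormal_cdf(-3.0, 0.0, 1.0), lognormal_cdf(1.0, 0.0, 1.0))
```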

Vectorize cdf query of custom model

In my use case, I want to compress a large amount of data with a custom entropy model.
Unfortunately, this takes quite some time because the cdf is called for each compressed symbol.
I can't simply use the scipy model adapter, since I'm using a mixture distribution that is not implemented in scipy.

Here's my dummy code:

from scipy import stats
import constriction
import numpy as np

c = 0
def cdf_likelihood_normal(x, mu, sigma):
    global c
    c += 1
    print(c, end="\r")
    p =  stats.norm.cdf(x, loc=mu, scale=sigma )
    return p

def inverse_cdf_likelihood_normal(q, mu, sigma):
    x = stats.norm.ppf(q, loc=mu, scale = sigma)
    return x

coder = constriction.stream.stack.AnsCoder()
entropy_model = constriction.stream.model.CustomModel(cdf_likelihood_normal, inverse_cdf_likelihood_normal, -10, 10)


sigma =  np.ones(int(1e4))
mu    = np.zeros(int(1e4))
message = np.random.randint(-1,1,int(1e4),dtype=np.int32)

p = stats.norm.cdf(message, loc=mu, scale=sigma) # very fast

coder.encode_reverse(message, entropy_model,  mu, sigma) # very slow
print(coder.num_bits())

reconstruction = coder.decode(entropy_model, mu,sigma)

assert (message == reconstruction).all()

Is it possible to take care of vectorizable cdfs in the custom model adapter to allow for a speed up?
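To illustrate what a vectorized evaluation could look like, the sketch below computes the quantized probabilities of a mixture model for all symbols in a single NumPy call, by evaluating the cdf at the half-integer bin edges. The logistic mixture, the function names, and the parameter shapes are all assumptions chosen for the example. Whether and how such a table can be handed to constriction (for example through a categorical model) should be checked against the library documentation.

```python
import numpy as np


def logistic_mixture_cdf(x, weights, locs, scales):
    """Mixture-of-logistics CDF. x: (E,), parameters: (N, K); returns (N, E)."""
    z = (x[None, :, None] - locs[:, None, :]) / scales[:, None, :]
    return np.sum(weights[:, None, :] / (1.0 + np.exp(-z)), axis=-1)


def quantized_probabilities(weights, locs, scales, min_symbol, max_symbol):
    """Probability of each integer in [min_symbol, max_symbol], per row."""
    # Bin edges sit half-way between neighboring integers.
    edges = np.arange(min_symbol, max_symbol + 2, dtype=np.float64) - 0.5
    cdf = logistic_mixture_cdf(edges, weights, locs, scales)
    # Fold the tail mass outside the range into the two border symbols.
    cdf[:, 0] = 0.0
    cdf[:, -1] = 1.0
    return np.diff(cdf, axis=1)


rng = np.random.default_rng(0)
num_symbols, num_components = 1000, 3
weights = rng.dirichlet(np.ones(num_components), size=num_symbols)
locs = rng.normal(size=(num_symbols, num_components))
scales = rng.uniform(0.5, 2.0, size=(num_symbols, num_components))

probs = quantized_probabilities(weights, locs, scales, -10, 10)
print(probs.shape)
```

The Python-level cost is then a handful of array operations instead of one callback per symbol and bin edge.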

Output Dict/Struct of Huffman Symbol Codes

It would be very good for me (neuroinformatics research) if we could output the binary symbol codes from the Huffman tree.

I've been working with the Python library but can write Rust code. If this feature already exists in the Rust code, I'm happy to write the Python binding. If it doesn't exist in Rust either, I'd be happy to write it and add the Python binding as well.

Let me know if this would be welcome, and if so, maybe give me a little direction on how you'd like it done!
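For concreteness, the requested output could look like the dictionary produced by the sketch below, which maps each symbol to its bit string. This is a standalone textbook Huffman construction using `heapq`, not constriction's implementation; the function name and the choice of bit strings as the output format are hypothetical.

```python
import heapq
import itertools


def huffman_codebook(probabilities):
    """Map each symbol index to its Huffman code word as a string of bits."""
    tie_breaker = itertools.count()  # avoids comparing the symbol lists
    heap = [(p, next(tie_breaker), [s]) for s, p in enumerate(probabilities)]
    heapq.heapify(heap)
    codes = {s: "" for s in range(len(probabilities))}
    if len(heap) == 1:
        codes[0] = "0"  # a single symbol still needs one bit
        return codes
    while len(heap) > 1:
        p0, _, symbols0 = heapq.heappop(heap)
        p1, _, symbols1 = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code word in them.
        for s in symbols0:
            codes[s] = "0" + codes[s]
        for s in symbols1:
            codes[s] = "1" + codes[s]
        heapq.heappush(heap, (p0 + p1, next(tie_breaker), symbols0 + symbols1))
    return codes


print(huffman_codebook([0.5, 0.25, 0.25]))
```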

Creating a categorical distribution sometimes fails to converge

The following python code

import constriction
import numpy as np
model = constriction.stream.model.Categorical(np.array([0.15, 0.69, 0.15]))

gets stuck in an infinite loop. This is probably caused by a bug in the function optimize_leaky_categorical.

Fixing this might turn out to be a breaking change, so it might require a major version bump.

Additional example:

p = np.array([1.34673042e-04, 6.52306480e-04, 3.14999325e-03, 1.49921896e-02, 6.67127371e-02, 2.26679876e-01, 3.75356406e-01, 2.26679876e-01, 6.67127594e-02, 1.49922138e-02, 3.14990873e-03, 6.52299321e-04, 1.34715927e-04])
constriction.stream.model.Categorical(p/p.sum())

Many thanks to Grégoire Jauvion for reporting this issue by email.
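For comparison, the sketch below shows a simple fixed-point quantization that always terminates, because it involves no iterative optimization. It is a swapped-in technique (largest-remainder rounding with a minimum weight of 1 per symbol), not the algorithm used by `optimize_leaky_categorical`, and it makes no attempt to minimize the coding overhead. The function name and the `precision` parameter are hypothetical.

```python
import numpy as np


def quantize_probabilities(probabilities, precision=24):
    """Integer weights that sum to 2**precision, each at least 1."""
    p = np.asarray(probabilities, dtype=np.float64)
    p = p / p.sum()
    total = 1 << precision
    free = total - p.size          # reserve weight 1 for every symbol
    scaled = p * free
    q = np.floor(scaled).astype(np.int64)
    remainder = int(free - q.sum())
    # Hand the leftover units to the symbols with the largest fractions.
    order = np.argsort(scaled - q)[::-1]
    q[order[:remainder]] += 1
    return q + 1


weights = quantize_probabilities([0.15, 0.69, 0.15])
print(weights, weights.sum() == 1 << 24)
```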

List `LICENSE.html` in `RECORD` file of python wheels

As reported in #40 (comment), the license file should be listed in the RECORD file of python wheels.

Warning: Validation of the RECORD file of constriction-0.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl failed. Please report to the maintainers of that package so they can fix their build process. Details:
In /home/uken/.cache/pypoetry/artifacts/bf/7f/3e/dafe24f40a54f14858cdb770e83c4fc076d2a8e86141bb380a25393718/constriction-0.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, LICENSE.html is not mentioned in RECORD
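A check along the following lines could be added to the build process to catch this before a release. A wheel is a zip archive and `RECORD` is a CSV file whose first column is the file path, so the sketch compares the archive contents against that column using only the standard library. The function name is hypothetical, and the check covers only presence, not hashes or sizes.

```python
import csv
import io
import zipfile


def files_missing_from_record(wheel):
    """Return archive members of a wheel that are not listed in its RECORD.

    `wheel` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(wheel) as archive:
        names = [n for n in archive.namelist() if not n.endswith("/")]
        record_name = next(n for n in names if n.endswith(".dist-info/RECORD"))
        with archive.open(record_name) as record_file:
            rows = csv.reader(io.TextIOWrapper(record_file, encoding="utf-8"))
            listed = {row[0] for row in rows if row}
    return sorted(set(names) - listed)


# Usage with a hypothetical path:
# print(files_missing_from_record("dist/constriction-0.3.4-....whl"))
```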

Relax minimal numpy version, if possible

I am currently maintaining a legacy codebase which, for valid reasons [1], requires numpy (>=1.16.0,<1.19.0). I was about to introduce constriction as a means to significantly reduce the memory consumption of some base models, only to find out that I cannot install it due to a version conflict.

Is numpy>=1.19.0 a hard requirement?

Footnotes

  1. Old versions of some scientific libraries require old versions of numpy. It is impossible to upgrade those libraries because newer versions have introduced incompatible changes to serialized models. We have to support existing models and cannot recreate them.
