bamler-lab / constriction
Entropy coders for research and production in Python and Rust.
Home Page: https://bamler-lab.github.io/constriction/
License: Apache License 2.0
Hi, I'm trying to use this package on ARM devices, but I found that it behaves inconsistently on ARM and amd64 platforms. What's the reason for this?
This is the test code
import constriction
import numpy as np
message = np.array([10, 10], dtype=np.int32)
entropy_model = constriction.stream.model.QuantizedLaplace(
-20, 20 + 1
)
encoder = constriction.stream.queue.RangeEncoder()
encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0.]))
compressed = encoder.get_compressed()
print(f"compressed representation: {compressed}")
print(f"(in binary: {[bin(word) for word in compressed]})")
decoder = constriction.stream.queue.RangeDecoder(compressed)
decoded = decoder.decode(entropy_model, np.array([10., 10.]), np.array([10., 0.]))
assert np.all(decoded == message) # (verifies correctness)
On amd64
compressed representation: [2042752375]
(in binary: ['0b1111001110000011110110101110111'])
On arm
thread '<unnamed>' panicked at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9:
assertion failed: b > 0.0
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
File "test_script.py", line 13, in <module>
encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0
Error in sys.excepthook:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/apport_python_hook.py", line 63, in apport_excepthook
from apport.fileutils import likely_packaged, get_recent_crashes
File "/usr/lib/python3/dist-packages/apport/__init__.py", line 5, in <module>
from apport.report import Report
File "/usr/lib/python3/dist-packages/apport/report.py", line 30, in <module>
import apport.fileutils
File "/usr/lib/python3/dist-packages/apport/fileutils.py", line 23, in <module>
from apport.packaging_impl import impl as packaging
File "/usr/lib/python3/dist-packages/apport/packaging_impl.py", line 24, in <module>
import apt
File "/usr/lib/python3/dist-packages/apt/__init__.py", line 23, in <module>
import apt_pkg
ModuleNotFoundError: No module named 'apt_pkg'
Original exception was:
Traceback (most recent call last):
File "test_script.py", line 13, in <module>
encoder.encode(message, entropy_model, np.array([10., 10.]), np.array([10., 0]))
pyo3_runtime.PanicException: assertion failed: b > 0.0
The full traceback is
thread '<unnamed>' panicked at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9:
assertion failed: b > 0.0
stack backtrace:
0: 0x7f9d20a7ac - std::backtrace_rs::backtrace::libunwind::trace::heaab0e590535aeb3
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/libunwind.rs:93:5
1: 0x7f9d20a7ac - std::backtrace_rs::backtrace::trace_unsynchronized::h89cc7ae9ebb707d7
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
2: 0x7f9d20a7ac - std::sys_common::backtrace::_print_fmt::h08c31be18fedf422
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:67:5
3: 0x7f9d20a7ac - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hc38bcf44d9e857e3
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:44:22
4: 0x7f9d226b04 - core::fmt::rt::Argument::fmt::ha5b752f9cd7ef4a3
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/rt.rs:138:9
5: 0x7f9d226b04 - core::fmt::write::h9fac187ae7486f3c
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/fmt/mod.rs:1094:21
6: 0x7f9d208358 - std::io::Write::write_fmt::h239e9fb6296b3a7f
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/io/mod.rs:1714:15
7: 0x7f9d20a5e0 - std::sys_common::backtrace::_print::h52f67cfa8753b0ab
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:47:5
8: 0x7f9d20a5e0 - std::sys_common::backtrace::print::hdea7481e2c957a93
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:34:9
9: 0x7f9d20b958 - std::panicking::default_hook::{{closure}}::h7c36fa733369c49e
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:270:22
10: 0x7f9d20b680 - std::panicking::default_hook::h303eee75f9a8f6a8
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:290:9
11: 0x7f9d20bf1c - std::panicking::rust_panic_with_hook::h270c94381ec34744
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:707:13
12: 0x7f9d20bda4 - std::panicking::begin_panic_handler::{{closure}}::h3653e3502bcc1625
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:597:13
13: 0x7f9d20ac90 - std::sys_common::backtrace::__rust_end_short_backtrace::h6b8510f2f024eeeb
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/sys_common/backtrace.rs:170:18
14: 0x7f9d20bb34 - rust_begin_unwind
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:595:5
15: 0x7f9d0fe4a4 - core::panicking::panic_fmt::ha96945d7a1b20293
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:67:14
16: 0x7f9d0fe514 - core::panicking::panic::h8f06a2df29fa4962
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/core/src/panicking.rs:117:5
17: 0x7f9d11f308 - probability::distribution::laplace::Laplace::new::haefeb509845cc871
at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/probability-0.20.3/src/distribution/laplace.rs:22:9
18: 0x7f9d119a44 - constriction::pybindings::stream::model::QuantizedLaplace::new::{{closure}}::h5e7236efc8092f4a
at /home/user/repos/constriction/src/pybindings/stream/model.rs:544:44
19: 0x7f9d161b7c - <constriction::pybindings::stream::model::internals::ParameterizableModel<(P0,P1),M,F> as constriction::pybindings::stream::model::internals::Model>::parameterize::h135fd1ad8427458f
at /home/user/repos/constriction/src/pybindings/stream/model/internals.rs:249:35
20: 0x7f9d17476c - constriction::pybindings::stream::queue::RangeEncoder::encode::h7a9429b10f1006e7
at /home/user/repos/constriction/src/pybindings/stream/queue.rs:314:13
21: 0x7f9d175934 - constriction::pybindings::stream::queue::_::<impl constriction::pybindings::stream::queue::RangeEncoder>::__pymethod_encode__::hb8e05b89a123d73c
at /home/user/repos/constriction/src/pybindings/stream/queue.rs:43:1
22: 0x7f9d0ff204 - pyo3::impl_::trampoline::fastcall_with_keywords::{{closure}}::he0112aa3b1e49083
at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:41:29
23: 0x7f9d0ff10c - pyo3::impl_::trampoline::trampoline::{{closure}}::h9f46c71dbb4a44d4
at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181:54
24: 0x7f9d15b494 - std::panicking::try::do_call::hd5234df960a8f95d
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502:40
25: 0x7f9d15b720 - __rust_try
26: 0x7f9d15b268 - std::panicking::try::haeac5d692bf46066
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466:19
27: 0x7f9d115570 - std::panic::catch_unwind::h2b95655acc141454
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142:14
28: 0x7f9d0fed68 - pyo3::impl_::trampoline::trampoline::h82d7e8a0b17fbbdf
at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181:9
29: 0x7f9d140a94 - pyo3::impl_::trampoline::fastcall_with_keywords::h45c8d979f3ca1f26
at /home/user/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:41:13
30: 0x7f9d174a48 - constriction::pybindings::stream::queue::_::<impl pyo3::impl_::pyclass::PyMethods<constriction::pybindings::stream::queue::RangeEncoder> for pyo3::impl_::pyclass::PyClassImplCollector<constriction::pybindings::stream::queue::RangeEncoder>>::py_methods::ITEMS::trampoline::h905700428ec8b158
at /home/user/repos/constriction/src/pybindings/stream/queue.rs:43:1
31: 0x488db8 - _PyMethodDescr_FastCallKeywords
32: 0x512714 - _PyEval_EvalFrameDefault
33: 0x50b47c - _PyEval_EvalCodeWithName
34: 0x50dd70 - PyEval_EvalCode
35: 0x616f84 - <unknown>
36: 0x617070 - PyRun_FileExFlags
37: 0x61844c - PyRun_SimpleFileExFlags
38: 0x6494c8 - <unknown>
39: 0x649b00 - _Py_UnixMain
40: 0x7f9db946e0 - __libc_start_main
at /build/glibc-4fr630/glibc-2.27/csu/../csu/libc-start.c:310
41: 0x5aef1c - <unknown>
Thanks!
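The panic message (assertion failed: b > 0.0 in Laplace::new) and the test script both point at the second scale value, which is 0. A minimal sketch of a pre-check that could be run before calling encode; invalid_scale_indices is a hypothetical helper, not part of constriction, and it does not explain why the amd64 build accepts the same input:

```python
import math

def invalid_scale_indices(scales):
    """Return the positions of scale values that are not finite and strictly positive."""
    return [i for i, b in enumerate(scales) if not (math.isfinite(b) and b > 0.0)]

# The scale array from the test script above.
scales = [10.0, 0.0]
bad = invalid_scale_indices(scales)
if bad:
    print(f"scale must be > 0, but entries {bad} are not: {[scales[i] for i in bad]}")
```

Replacing the zero with a small positive scale would avoid the assertion, at the cost of changing the model for that symbol.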
Hello,
In my use case, I have a Vec of String in a structure that I want to compress. But I need to keep O(1) access to the elements of the Vec, so I was thinking about using tANS and storing my probability table on the side:
Before:
struct Index {
    doc: Vec<String>,
}
After:
struct Index {
    doc: Vec<Vec<u8>>,
    // The same probabilities are used to encode/decode every string of the documents
    prob: [u8; 256],
}
Is this library supposed to support this?
From what I’ve seen, it seems like we need to provide the probabilities for the symbol we're currently compressing.
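The layout proposed above can be sketched independently of any particular entropy coder. One byte-frequency table is computed across all documents and stored once, and each string is coded on its own so that every element stays directly addressable. The sketch below only builds the shared table and estimates per-string code lengths from it. build_shared_table and estimated_bits are hypothetical helper names, the add-one smoothing is an assumption, and nothing here is constriction's API:

```python
import math
from collections import Counter

def build_shared_table(docs):
    """Byte probabilities over all documents, add-one smoothed so no byte has probability zero."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.encode("utf-8"))
    total = sum(counts.values()) + 256
    return [(counts.get(b, 0) + 1) / total for b in range(256)]

def estimated_bits(doc, table):
    """Ideal code length of one document under the shared table (sum of -log2 p per byte)."""
    return sum(-math.log2(table[b]) for b in doc.encode("utf-8"))

docs = ["hello world", "hello there", "entropy coding"]
table = build_shared_table(docs)
sizes = [estimated_bits(d, table) for d in docs]
print([round(s, 1) for s in sizes])
```

Because the table is fixed for the whole index, each compressed string can be decoded without touching its neighbours.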
Hello, thanks for releasing such a complete and well-documented library!
I have a question about its application to integer streams. Why does the library force you to cast arrays to numpy.int32? In my use case, where I have to compress neural network weights, I find a general-purpose compression algorithm such as brotli to be orders of magnitude more efficient than both the Range and ANS coders, and I believe that representing symbols as 32-bit integers may be part of the issue.
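One way to separate the two effects mentioned above is to compare against the empirical entropy of the symbols, which depends only on how often each value occurs and not on the integer width used to store it. A minimal sketch, assuming the weights have already been quantized to integers (the sample values are made up for illustration):

```python
import math
from collections import Counter

def empirical_entropy_bits(symbols):
    """Average bits per symbol for an ideal coder that treats the symbols as i.i.d."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

quantized_weights = [0, 0, 1, -1, 0, 2, 0, -1]
bits_per_symbol = empirical_entropy_bits(quantized_weights)
print(bits_per_symbol)  # 1.75
```

If a range or ANS coder lands far above this figure, the entropy model is a more likely cause than the symbol type. A general-purpose compressor can still beat this i.i.d. figure when the data contains repetition that a per-symbol model does not capture.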
I tried to implement a custom entropy model that is only defined on positive numbers.
To make sure only positive numbers are encoded I used an assert statement to check for non-negativity.
To make my point I use a lognormal distribution in the following snippet.
import constriction
import numpy as np
from scipy import stats

def cdf(x, mu, sigma):
    assert x >= 0  # raises when the CDF is evaluated at a negative value
    return stats.lognorm.cdf(x, mu, sigma)

def inverse_cdf(q, mu, sigma):
    return stats.lognorm.ppf(q, mu, sigma)

mus = np.random.randn(100)
sigmas = np.random.randn(100)**2 + 1
dummy_entropy_model = constriction.stream.model.CustomModel(cdf, inverse_cdf, -10, 10)
message = (np.random.randn(100)**2).round().astype(np.int32)
coder = constriction.stream.stack.AnsCoder()
coder.encode_reverse(message, dummy_entropy_model, mus, sigmas)
decode = coder.decode(dummy_entropy_model, mus, sigmas)
thread '<unnamed>' panicked at 'TODO: PyErr { type: <class 'AssertionError'>, value: AssertionError(), traceback: Some(<traceback object at 0x7f22f0efac80>) }', src/pybindings/stream/model/internals.rs:380:14
PanicException: TODO: PyErr { type: <class 'AssertionError'>, value: AssertionError(), traceback: Some(<traceback object at 0x7f22f0efac80>) }
This problem is of course solved by deleting assertions from cdf.
A more permanent solution might be a change to the documentation or the error message.
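As an alternative to deleting the check, the cdf can simply return 0 below the support, which is what a CDF does mathematically for values that cannot occur. A minimal sketch using only the standard library; it assumes the usual lognormal parameterization (mu and sigma of the underlying normal), which differs from the positional arguments of stats.lognorm in the snippet above:

```python
import math

def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution that is well defined for every real x."""
    if x <= 0:
        return 0.0  # no probability mass at or below zero
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

print(lognormal_cdf(-3.0, 0.0, 1.0))  # 0.0
print(lognormal_cdf(1.0, 0.0, 1.0))   # 0.5
```

A function of this shape never raises, so it can be evaluated anywhere in the [-10, 10] range passed to CustomModel.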
There is an error when importing constriction on Ubuntu with Python.
Environment:
Ubuntu 18.04
python 3.7, 3.8, 3.9
In my use case, I want to compress a large amount of data with a custom entropy model.
Unfortunately, this takes quite some time because the cdf is called for each compressed symbol.
I can't straight up use the scipy model adapter since I'm using a mixture distribution which is not implemented in scipy.
Here's my dummy code:
from scipy import stats
import constriction
import numpy as np
c = 0
def cdf_likelihood_normal(x, mu, sigma):
    global c
    c += 1
    print(c, end="\r")
    p = stats.norm.cdf(x, loc=mu, scale=sigma)
    return p

def inverse_cdf_likelihood_normal(q, mu, sigma):
    x = stats.norm.ppf(q, loc=mu, scale=sigma)
    return x
coder = constriction.stream.stack.AnsCoder()
entropy_model = constriction.stream.model.CustomModel(cdf_likelihood_normal, inverse_cdf_likelihood_normal, -10, 10)
sigma = np.ones(int(1e4))
mu = np.zeros(int(1e4))
message = np.random.randint(-1,1,int(1e4),dtype=np.int32)
p = stats.norm.cdf(message, loc=mu, scale=sigma) # very fast
coder.encode_reverse(message, entropy_model, mu, sigma) # very slow
print(coder.num_bits())
reconstruction = coder.decode(entropy_model, mu,sigma)
assert (message == reconstruction).all()
Is it possible to take care of vectorizable cdfs in the custom model adapter to allow for a speed up?
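Independently of whether the adapter can be vectorized, part of the cost may come from calling scipy.stats on scalars, which typically carries noticeable per-call overhead. A minimal sketch of a mixture-of-normals cdf written with math.erf instead; mixture_cdf and its parameters are hypothetical, and any actual speed-up would have to be measured:

```python
import math

def normal_cdf(x, mu, sigma):
    """Scalar normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mixture_cdf(x, weights, mus, sigmas):
    """CDF of a mixture of normals: the weighted sum of the component CDFs."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

value = mixture_cdf(0.0, weights=[0.3, 0.7], mus=[-1.0, 1.0], sigmas=[1.0, 1.0])
print(value)
```

The weights are assumed to be non-negative and to sum to 1, so that the result is a valid CDF.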
It would be very good for me (neuroinformatics research) if we could output the binary symbol codes from the Huffman tree.
I've been working in the python library but can write rust code. If this feature exists in the rust code I'm happy to write the python binding. If the feature isn't in rust code either, I'd be happy to write it and add the python binding also.
Let me know if this would be good, and maybe give me a little direction about how you'd like it done if so!
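To make the request concrete, the kind of output meant here is a mapping from each symbol to its binary code word. The sketch below builds such a code book in plain Python with a heap. It is a generic Huffman construction and not constriction's implementation, so the exact bit patterns (though not the optimal expected code length) may differ from what the library's tree would produce:

```python
import heapq
from itertools import count

def huffman_codebook(probabilities):
    """Map each symbol index to its Huffman code word, given as a string of '0' and '1'."""
    if len(probabilities) == 1:
        return {0: "0"}
    tiebreak = count()  # keeps heap entries comparable when probabilities are equal
    heap = [(p, next(tiebreak), {i: ""}) for i, p in enumerate(probabilities)]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        # Merge the two least probable subtrees, prefixing one bit to every code word.
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

codebook = huffman_codebook([0.5, 0.25, 0.125, 0.125])
for symbol in sorted(codebook):
    print(symbol, codebook[symbol])
```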
The following python code
import constriction
import numpy as np
model = constriction.stream.model.Categorical(np.array([0.15, 0.69, 0.15]))
gets stuck in an infinite loop. This is probably caused by a bug in the function optimize_leaky_categorical.
Fixing this might turn out to be a breaking change, so it might require a major version bump.
Additional example:
p = np.array([1.34673042e-04, 6.52306480e-04, 3.14999325e-03, 1.49921896e-02, 6.67127371e-02, 2.26679876e-01, 3.75356406e-01, 2.26679876e-01, 6.67127594e-02, 1.49922138e-02, 3.14990873e-03, 6.52299321e-04, 1.34715927e-04])
constriction.stream.model.Categorical(p/p.sum())
Many thanks to Grégoire Jauvion for reporting this issue by email.
As reported in #40 (comment), the license file should be listed in the RECORD file of python wheels.
Warning: Validation of the RECORD file of constriction-0.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl failed. Please report to the maintainers of that package so they can fix their build process. Details:
In /home/uken/.cache/pypoetry/artifacts/bf/7f/3e/dafe24f40a54f14858cdb770e83c4fc076d2a8e86141bb380a25393718/constriction-0.3.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl, LICENSE.html is not mentioned in RECORD
I am currently working on maintaining a legacy codebase which, for valid reasons [1], requires numpy (>=1.16.0,<1.19.0). I was about to introduce constriction as a means to significantly reduce the memory consumption of some base models, only to find out that I cannot install it due to a version conflict.
Is numpy>=1.19.0 a hard requirement?
[1] Old versions of some scientific libraries require old versions of numpy. It is impossible to upgrade those libraries because newer versions have introduced incompatible changes to serialized models. We have to support existing models and cannot recreate them.