
py-lz4framed's Introduction

Overview

This is an LZ4-frame compression library for Python v3.2+ (and 2.7+), bound to Yann Collet's LZ4 C implementation.

Installing / packaging

# To get from PyPI
pip3 install py-lz4framed

# To only build extension modules inline (e.g. in repository)
python3 setup.py build_ext -i

# To build & install globally
python3 setup.py install

Notes

Usage

Single-function operation:

import lz4framed

compressed = lz4framed.compress(b'binary data')

uncompressed = lz4framed.decompress(compressed)

To iteratively compress (to a file or e.g. BytesIO instance):

from lz4framed import Compressor, Lz4FramedNoDataError

with open('myFile', 'wb') as f:
    # Context automatically finalises frame on completion, unless an exception occurs
    with Compressor(f) as c:
        try:
            while (...):
                c.update(moreData)
        except Lz4FramedNoDataError:
            pass

To decompress from a file-like object:

from lz4framed import Decompressor, Lz4FramedNoDataError

decoded = []
with open('myFile', 'rb') as f:
    try:
        for chunk in Decompressor(f):
            decoded.append(chunk)
    except Lz4FramedNoDataError:
        # Compressed frame data incomplete - error case
        ...

See also lz4framed/__main__.py for example usage.

Documentation

import lz4framed
print(lz4framed.__version__, lz4framed.LZ4_VERSION, lz4framed.LZ4F_VERSION)
help(lz4framed)

Command-line utility

python3 -mlz4framed
USAGE: lz4framed (compress|decompress) (INFILE|-) [OUTFILE]

(De)compresses an lz4 frame. Input is read from INFILE unless set to '-', in
which case stdin is used. If OUTFILE is not specified, output goes to stdout.

Tests

Static

This library has been checked using flake8 and pylint, using a modified configuration - see pylint.rc and flake8.cfg.

Unit

python3 -m unittest discover -v .

Why?

The only existing lz4-frame interoperable implementation I was aware of at the time of writing (lz4tools) had the following limitations:

  • Incomplete implementation, e.g. reference and memory leaks on failure
  • Lack of unit tests
  • Not thread safe
  • Did not release the GIL during low-level (de)compression operations
  • Did not address the requirements of an external project

py-lz4framed's People

Contributors

honzasp, iotic-labs-markwharton, vtermanis, windreamer


py-lz4framed's Issues

v0.11.0 uses LZ4 v1.8.1.2

Hi guys,
First, thanks for your Python library! It has proved very useful for my research work, which needs to store many gigabytes of data.

Secondly, the current version of lz4 is now v1.8.1.2, with v1.8.1 being deprecated:
https://github.com/lz4/lz4/releases/tag/v1.8.1

The code seems to be exactly the same, but with a packaging mistake fixed. Are you planning to update to it, or will you wait for the next major version?
I think keeping the versions up to date is less confusing :).

Thanks again!

What causes Lz4FramedError (lz4framed.LZ4F_ERROR_GENERIC)

Hi,

What does the error ('ERROR_GENERIC', 1) mean?

I wrote a piece of code that uses a TCP socket.
On one side I compress data and send it to the socket.
On the other side I read it.
The error occurs (on the reader side) after I write too much data to the socket too quickly.
Changing block_size_id from LZ4F_BLOCKSIZE_MAX64KB to LZ4F_BLOCKSIZE_MAX4MB raises the speed at which I can write before the error appears.

Is this the expected error after exceeding the speed limit?
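
One possible explanation, offered as an assumption rather than a confirmed diagnosis: TCP is a byte stream, so a single recv() call may return only part of what the sender wrote, and a truncated frame cannot be decompressed. A minimal sketch of length-prefixed messaging follows. The helper names send_msg, recv_exact and recv_msg and the 4-byte prefix are hypothetical choices for illustration, not part of lz4framed.

```python
import socket
import struct

# 4-byte big-endian length prefix (an illustrative choice, not an lz4framed convention)
_HEADER = struct.Struct('>I')


def send_msg(sock, payload):
    """Send payload preceded by its length so the reader knows where it ends."""
    sock.sendall(_HEADER.pack(len(payload)) + payload)


def recv_exact(sock, count):
    """Read exactly count bytes, looping because recv() may return fewer."""
    parts = []
    remaining = count
    while remaining:
        part = sock.recv(remaining)
        if not part:
            raise ConnectionError('socket closed with %d bytes outstanding' % remaining)
        parts.append(part)
        remaining -= len(part)
    return b''.join(parts)


def recv_msg(sock):
    """Receive one complete length-prefixed payload."""
    (length,) = _HEADER.unpack(recv_exact(sock, _HEADER.size))
    return recv_exact(sock, length)
```

With helpers like these, the sender would pass the result of lz4framed.compress(...) to send_msg, and the reader would hand the result of recv_msg to lz4framed.decompress(...), so that only complete frames are ever decompressed.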

level parameter <3 no effect

I wanted to benchmark different fast compression configurations, but setting the level to {0,1,2} makes no difference in file size.

Duplicated effort

Hi folks at Iotic. I just stumbled across this package. We've been hard at work here building out lz4 frame bindings for Python (including file handlers etc): https://github.com/python-lz4/python-lz4

It's a shame to duplicate effort, hopefully we can collaborate :)

Linewise iteration over compressed file

With gzip.open() the following is possible:

with gzip.open(filename, 'rb') as f:
    for line in f:

How do I achieve this with lz4framed? (And I think this question is general enough that this should be integrated into the library)

There's BytesIO, but it only accepts an initial chunk. The question is essentially: how do I stream the chunks from lz4 into a BytesIO so that I can read line by line?
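
One possible approach, sketched with the standard library only: wrap the iterable of decompressed chunks in a raw stream and let io.BufferedReader do the line splitting. ChunkReader and iter_lines are hypothetical names, not part of lz4framed, and the sketch assumes the chunks are bytes objects.

```python
import io


class ChunkReader(io.RawIOBase):
    """Read-only raw stream over an iterable of bytes chunks (hypothetical helper)."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = iter(chunks)
        self._pending = b''

    def readable(self):
        return True

    def readinto(self, buffer):
        # Fetch the next non-empty chunk once the previous one is used up
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # End of stream
        count = min(len(buffer), len(self._pending))
        buffer[:count] = self._pending[:count]
        self._pending = self._pending[count:]
        return count


def iter_lines(chunks):
    """Yield lines (including the trailing newline) from an iterable of bytes chunks."""
    with io.BufferedReader(ChunkReader(chunks)) as reader:
        for line in reader:
            yield line
```

With this library, the chunks argument would presumably be a Decompressor(f) instance as in the usage section, e.g. for line in iter_lines(Decompressor(f)). Only one chunk is held in memory at a time, in addition to the reader's own buffer.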

multi-threading decompression

Hi!
Thanks for sharing this repo :)

I'm wondering: since this is thread safe, is there an example of using multi-threading to decode a single file faster?
(I assume this would be done by having each worker thread decode a separate part of the file.)
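
A sketch under stated assumptions: nothing above documents a way to split one frame across threads, so this assumes the data was written as several independent frames, each of which can be decompressed on its own. decompress_all is a hypothetical helper, and zlib is used only as a stand-in so that the sketch runs without the extension module.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decompress_all(frames, decompress, max_workers=4):
    """Decompress independent frames in parallel, preserving input order.

    Threads only help if `decompress` releases the GIL while it works.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(decompress, frames))


# zlib is only a stand-in so that the sketch runs without the extension module
parts = [b'a' * 1000, b'b' * 2000, b'c' * 3000]
frames = [zlib.compress(part) for part in parts]
restored = decompress_all(frames, zlib.decompress)
```

Passing lz4framed.decompress as the decompress argument would be the intended use. Whether this is faster in practice depends on frame sizes and on how the file was produced.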

Support passing memoryview objects

Use case

We have a large pointer (memoryview) to a bytes buffer containing compressed data. Specifically we are receiving and sending compressed py-lz4framed data via zeromq zero-copy.

It would be beneficial (and possibly more performant) if we did not have to create a copy of the bytes buffer (via .tobytes()) before we passed the data to py-lz4framed.

Desired behavior

  • decompress accepts memoryview object and does not create a copy of the input data
  • compress accepts memoryview object and does not copy the input data

I apologize that my C skills aren't good enough to submit a pull-request.
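
To illustrate the copy in question using the standard library only (the variable names are hypothetical and no lz4framed call is involved): slicing a memoryview yields another view onto the same memory, whereas .tobytes() allocates a new bytes object.

```python
buffer = bytearray(b'header' + b'x' * 16)

# Slicing a memoryview creates a new view onto the same memory, no copy
view = memoryview(buffer)[6:]

# tobytes() allocates a new bytes object and copies the data into it
copied = view.tobytes()

# A later change to the underlying buffer is visible through the view only
buffer[6] = ord('y')
view_first = view[0]      # reflects the change
copied_first = copied[0]  # still the original byte
```

For large compressed payloads, the duplicate created by .tobytes() is the overhead this request aims to avoid.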

Please never delete releases from PyPI

Our build system broke since we were pinned to py-lz4framed==0.9.5, which has been deleted from pypi.python.org. It is a best practice to never delete released packages, to prevent such issues for people who were already using your library.

Precompiled binaries for easier installation?

Our specific use case is that we are using lz4framed inside Docker containers. The issue is that the base Linux images used for Python in Docker often don't have the needed gcc components. We use the miniconda3 Docker base image, and of the dozens of Python packages we use (many of which require compilation), lz4framed is the only one that needs gcc installed in the container.

We can obviously apt-get install gcc, but having lz4framed packaged up into binaries would be a great benefit to its users like us. I don't know what the compile dependencies are and if they impact wheels for pip, but if so conda can be used to build binaries so this isn't an issue and the package just installs anywhere easily.
