
A parallelized PNG encoder in Rust

Home Page: https://crates.io/crates/mtpng

License: Other

Rust 74.81% Shell 1.21% Batchfile 2.25% Makefile 0.77% C 15.09% Swift 5.87%
png png-encoder rust-library multithreaded

mtpng's Introduction

mtpng

A parallelized PNG encoder in Rust

by Brooke Vibber

Background

Compressing PNG files is a relatively slow operation at large image sizes, and can take from half a second to over a second for 4K resolution and beyond. See my blog post series on the subject for more details.

The biggest CPU costs in traditional libpng seem to be the filtering, which is easy to parallelize, and the deflate compression, which can be parallelized in chunks at a slight loss of compression between block boundaries.
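As a rough illustration of why filtering parallelizes easily, here is a minimal sketch using only the standard library (mtpng itself uses Rayon). It assumes the Sub filter, which only reads bytes within the same row, so a chunk of rows needs no data from its neighbours. The names `filter_sub` and `filter_parallel` and the one-thread-per-chunk policy are illustrative assumptions, not mtpng's implementation.

```rust
use std::thread;

/// Apply the PNG "Sub" filter (type 1) to one row: each byte minus the byte
/// `bpp` positions to its left, wrapping modulo 256.
fn filter_sub(row: &[u8], bpp: usize) -> Vec<u8> {
    let mut out = Vec::with_capacity(row.len() + 1);
    out.push(1); // filter-type byte that precedes every PNG scanline
    for (i, &b) in row.iter().enumerate() {
        let left = if i >= bpp { row[i - bpp] } else { 0 };
        out.push(b.wrapping_sub(left));
    }
    out
}

/// Filter an image in parallel, one scoped thread per chunk of whole rows.
/// `stride` is bytes per row (must be non-zero); `chunk_bytes` is the target
/// chunk size. A real encoder would use a bounded thread pool instead.
fn filter_parallel(data: &[u8], stride: usize, bpp: usize, chunk_bytes: usize) -> Vec<u8> {
    // Whole rows per chunk, at least one.
    let rows_per_chunk = (chunk_bytes / stride).max(1);
    let chunks: Vec<&[u8]> = data.chunks(rows_per_chunk * stride).collect();

    let filtered: Vec<Vec<u8>> = thread::scope(|s| {
        let handles: Vec<_> = chunks
            .iter()
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .chunks(stride)
                        .flat_map(|row| filter_sub(row, bpp))
                        .collect::<Vec<u8>>()
                })
            })
            .collect();
        // Join in submission order so the output keeps the row order.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    });

    filtered.concat()
}
```

Because each chunk is filtered independently, the result is the same whatever chunk size is chosen; only the amount of available parallelism changes. Filters that read the previous row (Up, Average, Paeth) would additionally need the last row of the preceding chunk as input.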

pigz is a well-known C implementation of parallelized deflate/gzip compression, and was a strong inspiration for the chunking scheme used here.

I was also inspired by an experimental C++/OpenMP project called png-parallel by Pascal Beyeler, which didn't implement filtering but confirmed the basic theory.

State

Creates correct files in all color formats (input must be pre-packed). Performs well on large files, but needs work for small files and ancillary chunks. Planning API stability soon, but not yet there -- things will change before 1.0.

Goals

Performance:

  • ☑️ MUST be faster than libpng when multi-threaded
  • ☑️ SHOULD be as fast as or faster than libpng when single-threaded

Functionality:

  • ☑️ MUST support all standard color types and depths
  • ☑️ MUST support all standard filter modes
  • ☑️ MUST compress within a few percent as well as libpng
  • MAY achieve better compression than libpng, but MUST NOT do so at the cost of performance
  • ☑️ SHOULD support streaming output
  • MAY support interlacing

Compatibility:

  • MUST have a good Rust API (in progress)
  • MUST have a good C API (in progress)
  • ☑️ MUST work on Linux x86, x86_64
  • ☑️ MUST work on Linux arm, arm64
  • ☑️ SHOULD work on macOS x86_64
  • ☑️ SHOULD work on iOS arm64
  • ☑️ SHOULD work on Windows x86, x86_64
  • ☑️ SHOULD work on Windows arm64

Compression

Compression ratio is a tiny fraction worse than libpng's on the dual-4K screenshot and the arch photo at the current default 256 KiB chunk size, and gets closer as you increase the chunk size.

Using a smaller chunk size, or enabling streaming mode, will increase the file size slightly more in exchange for greater parallelism (small chunks) and lower latency to bytes hitting the wire (streaming).

In 0.3.5 a correction was made to the filter heuristic algorithm to match libpng in some circumstances where it differs; this should provide very similar results to libpng when used as a drop-in replacement now. Later research may involve changing the heuristic, as it fails to correctly predict good performance of the "none" filter on many screenshot-style true color images.
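For readers unfamiliar with this kind of heuristic, the sketch below shows the "minimum sum of absolute differences" approach suggested in the PNG specification's filter-selection advice: each candidate filter is applied to the row, the filtered bytes are read as signed values, and the filter with the smallest sum of absolute values wins. Only the None, Sub and Up filters are covered, and all function names are hypothetical; this is not the code of mtpng or libpng.

```rust
/// Score a filtered row: sum of its bytes read as signed values,
/// so 255 counts as 1 and 128 counts as 128. Lower is assumed better.
fn abs_sum(filtered: &[u8]) -> u32 {
    filtered
        .iter()
        .map(|&b| (b as i8).unsigned_abs() as u32)
        .sum()
}

/// Sub filter: difference from the byte `bpp` positions to the left.
fn sub(row: &[u8], bpp: usize) -> Vec<u8> {
    row.iter()
        .enumerate()
        .map(|(i, &b)| b.wrapping_sub(if i >= bpp { row[i - bpp] } else { 0 }))
        .collect()
}

/// Up filter: difference from the byte directly above in the previous row.
fn up(row: &[u8], prev: &[u8]) -> Vec<u8> {
    row.iter()
        .zip(prev.iter())
        .map(|(&b, &p)| b.wrapping_sub(p))
        .collect()
}

/// Pick the lowest-scoring candidate among None (0), Sub (1) and Up (2).
/// On a tie the earlier filter in the list is kept.
fn choose_filter(row: &[u8], prev: &[u8], bpp: usize) -> (u8, Vec<u8>) {
    let candidates = vec![
        (0u8, row.to_vec()),
        (1u8, sub(row, bpp)),
        (2u8, up(row, prev)),
    ];
    candidates
        .into_iter()
        .min_by_key(|(_, bytes)| abs_sum(bytes))
        .unwrap()
}
```

The score only estimates how well the row will deflate, which is why it can mispredict: a row that repeats long literal runs may compress very well unfiltered even though its absolute sum is high.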

Performance

Note that unoptimized debug builds are about 50x slower than optimized release builds. Always run with --release!

As of September 26, 2018 with Rust 1.29.0, single-threaded performance is ~30-40% faster than libpng when saving the same dual-4K screenshot sample image on Linux and macOS x86_64. Using multiple threads consistently beats libpng by a wide margin, and scales reasonably well to at least 8 physical cores.

See docs/perf.md for informal benchmarks on various devices.

At the default settings, files whose uncompressed data is less than 128 KiB will not see any multi-threading gains, but may still run faster than libpng due to faster filtering.

Todos

See the projects list on GitHub for active details.

Build instructions

The build uses Cargo. Note that the libz-sys crate is pulled in, which may build the zlib C library from source on platforms that don't ship it as standard, such as Windows.

There are two user-visible feature flags:

  • capi builds and exports the C-compatible API symbols; only needed if you're going to link the resulting library with C/C++ code that calls it
  • cli builds the command-line tool for testing/demo as well as the library

To use mtpng in a pure Rust program, or only in the Rust part of a mixed C-Rust program, it is not required to use either flag.

Usage

Note: the Rust and C APIs are not yet stable, and will change before 1.0.

Rust usage

See the crate API docs for details.

The mtpng CLI tool can be used as an example of writing files.

In short, something like this:

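// Module paths assumed to match those used by the mtpng CLI tool;
// check the crate API docs for your version.
use mtpng::{ColorType, Header};
use mtpng::encoder::{Encoder, Options};

// Output sink; here, an in-memory buffer.
let writer = Vec::<u8>::new();

let mut header = Header::new();
header.set_size(640, 480)?;
header.set_color(ColorType::TruecolorAlpha, 8)?;

let options = Options::new();

let mut encoder = Encoder::new(writer, &options);

// `data` holds the pre-packed pixel rows: 640 * 480 * 4 bytes for 8-bit RGBA.
encoder.write_header(&header)?;
encoder.write_image_rows(&data)?;
encoder.finish()?;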

C usage

See c/mtpng.h for a C header file which connects to unsafe-Rust wrapper functions in the mtpng::capi module.

To build the C sample on Linux or macOS, run make. On Windows, run build-win.bat x64 for an x86-64 native build, or pass x86 or arm64 to build for those platforms.

These will build a sample executable from sample.c as well as a libmtpng.so, libmtpng.dylib, or mtpng.dll for it to link. It produces an output file in out/csample.png.

Data flow

Encoding can be broken into many parallel blocks:

Encoder data flow diagram

Decoding cannot; it must be run as a stream, but can pipeline (not yet implemented):

Decoder data flow diagram

Dependencies

Rayon is used for its ThreadPool implementation. You can create an encoder using either the default Rayon global pool or a custom ThreadPool instance.

crc is used for calculating PNG chunk checksums.

libz-sys is used to wrap libz for the deflate compression. I briefly looked at pure-Rust implementations but couldn't find any supporting raw stream output, dictionary setting, and flushing to byte boundaries without closing the stream.

itertools is used to manage iteration in the filters.

png is used by the CLI tool to load input files to recompress for testing.

clap is used by the CLI tool to handle option parsing and help display.

time is used by the CLI tool to time compression.

License

You may use this software under the following MIT-style license:

Copyright (c) 2018-2024 Brooke Vibber

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

mtpng's People

Contributors

backtozero, bvibber, kneelawk, paulgrandperrin, pwuertz


mtpng's Issues

Add metadata to allow parallel decoding

PNG is an extensible format - you can add custom Ancillary chunks to a file, and decoders will ignore any such chunk that they don't recognize.

If an Ancillary chunk was added, to store the offsets of the zlib sub-streams, then a decoder could take advantage of this information to parallelize decoding.

Optionally, to allow parallel defiltering, the pixel values of the last row in each block could be included in compressed form, although this comes with an obvious file-size penalty. Thinking about this some more, it would make more sense to just never use filtering on the first row of each block.

Thinking even more about this, a sufficiently intelligent client requesting PNGs over the network could use this metadata to request each chunk over a parallel connection (e.g. using HTTP range requests)
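To make the idea concrete, here is a small sketch of the two pieces it relies on. The property bits are taken from the PNG specification: bit 5 of each byte of the chunk type carries a flag, with the first byte marking ancillary chunks and the fourth marking safe-to-copy chunks. The chunk name `mtIX` and the payload layout (a list of big-endian `u32` offsets) are purely hypothetical, made up for this illustration.

```rust
/// True if decoders may skip this chunk when they don't recognize it
/// (first letter of the chunk type is lowercase).
fn is_ancillary(chunk_type: &[u8; 4]) -> bool {
    (chunk_type[0] & 0x20) != 0
}

/// True if editors may copy the chunk unchanged even after modifying
/// image data (fourth letter of the chunk type is lowercase).
fn is_safe_to_copy(chunk_type: &[u8; 4]) -> bool {
    (chunk_type[3] & 0x20) != 0
}

/// Hypothetical chunk type for the sub-stream index: ancillary, private,
/// and NOT safe to copy, since the offsets depend on the image data.
const INDEX_CHUNK: &[u8; 4] = b"mtIX";

/// Hypothetical payload: each sub-stream offset as a big-endian u32.
fn encode_offsets(offsets: &[u32]) -> Vec<u8> {
    offsets.iter().flat_map(|o| o.to_be_bytes().to_vec()).collect()
}

/// Inverse of `encode_offsets`; trailing bytes that do not form a full
/// 4-byte entry are ignored.
fn decode_offsets(payload: &[u8]) -> Vec<u32> {
    payload
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}
```

A decoder that does not know the chunk would skip it because the ancillary bit is set, while an image editor that rewrites the pixel data would be expected to drop it because the safe-to-copy bit is clear.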

Use fuzzer to verify correctness

Fuzzing is a powerful tool for correctness verification. The gist of it is that it generates a lot of semi-random inputs really fast and uses execution path tracing to generate better ones. It lets you discover both memory safety issues (with Address Sanitizer) and correctness issues, e.g. if the encoded image fails to decode, or decodes to something other than the original data.

Introduction to fuzzing in Rust: https://rust-fuzz.github.io/book/introduction.html

Example of a PNG fuzzing harness (decoding only): https://github.com/image-rs/image-png/blob/master/png-afl/src/main.rs

Example of a roundtrip fuzz target: https://github.com/gendx/lzma-rs/blob/master/fuzz/fuzz_targets/roundtrip_lzma.rs

Trophy case of bugs in Rust code found via fuzzing: https://github.com/rust-fuzz/trophy-case

Please document cargo "features"

When running cargo add with this crate, it shows the following features are available but off by default:

  • capi
  • clap
  • cli
  • libc
  • png
  • time

I do not find documentation of these in the README, nor in the docs.rs page.

From a comment in the Cargo.toml combined with implications in the README, I assume that "cli" is for building the command line version, "clap", "png" and "time" are helper features used by the cli and not directly relevant to a user, and I assume "capi" and "libc" are related to the WIP C frontend. However, it would be helpful to have this all be explicitly stated in some piece of documentation, if only so that a user can check and know "no, I don't need any of these" (it is somewhat concerning to install a png encoder library and discover that the "png" feature is disabled :) ). Thanks!

Can't build with libz-sys on Windows ARM64

Currently I can't figure out how to get a Windows ARM64 (aarch64-pc-windows-msvc) build working with the libz-sys dependency. Since there's no system zlib, it pulls the source and tries to build an embedded copy.

But something builds for x86 (the toolchain arch) instead of arm64 (the target arch) and ends up failing to link. (It's cross-compiling on the actual device because there's no native arm64 toolchain, but it runs the x86 toolchain fine in emulation.)

Note that on the miniz-oxide branch, it builds and runs fine without the C dependency, but the encoding is much slower.

Encoder panics if dropped before calling flush()

The Issue

If an Encoder is created and written to, then dropped, there is a likelihood that the encoder will panic when still-running encoder threads attempt to send filtered or deflated results back to the now dropped encoder. This is because the Receiver held by the encoder will be dropped, but separate threads could still be running and attempting to send using the Senders attached to the now dropped receiver. Once these threads have tried to send their results back to the dropped encoder, they unwrap the results on the sender, causing their rayon threads to panic, killing the application.

The code of note is in encoder.rs on lines 734 and 757.
https://github.com/brion/mtpng/blob/7b8bc8939c8dda1c571dd5083113f4a45f074a99/src/encoder.rs#L734
https://github.com/brion/mtpng/blob/7b8bc8939c8dda1c571dd5083113f4a45f074a99/src/encoder.rs#L757

Context

I am using mtpng in a situation where there are many circumstances under which the encoder could be dropped, and finding each of them to make sure flush() is called before the drop happens is difficult.

Pull Request

I am working on a pull request that specifically replaces these unwrap() statements with ok() statements so that the Results of attempting to send the filtered/deflated data are ignored.
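The failure mode and the proposed fix can be reproduced with the standard library alone. The sketch below is not mtpng's code; `worker_send` and `run` are hypothetical names, and `std::sync::mpsc` stands in for whatever channel the encoder uses. It shows that `send` returns an `Err` once the `Receiver` is gone, so a worker that ignores the result keeps running where one that calls `unwrap()` would panic.

```rust
use std::sync::mpsc;
use std::thread;

/// Worker that reports its result but tolerates the receiver being gone.
/// Returns true if the result was delivered.
fn worker_send(tx: mpsc::Sender<Vec<u8>>, result: Vec<u8>) -> bool {
    // `send` fails once the Receiver has been dropped. Calling `.unwrap()`
    // here would panic the worker thread; `.ok()` or `.is_ok()` does not.
    tx.send(result).is_ok()
}

/// Run one worker, optionally dropping the receiver before it sends.
fn run(drop_receiver_first: bool) -> bool {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();
    // Keep the receiver alive until the worker is done, unless asked not to.
    let rx = if drop_receiver_first {
        drop(rx);
        None
    } else {
        Some(rx)
    };
    let delivered = thread::spawn(move || worker_send(tx, vec![1, 2, 3]))
        .join()
        .expect("worker must not panic");
    drop(rx);
    delivered
}
```

Calling `run(true)` returns `false` without panicking, while `run(false)` returns `true`. Discarding the error is reasonable here because a dropped receiver means nobody wants the result any more.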

expose "write_chunk" functionality

The current API does not allow writing arbitrary chunks. This is because the Writer is not exposed on the Encoder struct, and the writer module is not public.
This currently prevents me from using mtpng, as I need to be able to write extra/custom chunks beyond those currently allowed.
Maybe a new write_chunk method could be added to the Encoder that simply forwards to the inner Writer?
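For reference, this is what writing a single chunk involves at the byte level according to the PNG specification: a 4-byte big-endian length, the 4-byte chunk type, the data, and a CRC-32 computed over the type and data. The sketch is standalone and is not mtpng's internal writer; `write_png_chunk` is a hypothetical free function, and the bitwise CRC is a simple stand-in for the `crc` crate that mtpng depends on.

```rust
use std::io::{self, Write};

/// Bitwise CRC-32 (reflected polynomial 0xEDB88320), the variant PNG uses.
fn crc32(bytes: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &b in bytes {
        crc ^= b as u32;
        for _ in 0..8 {
            crc = if (crc & 1) != 0 {
                (crc >> 1) ^ 0xEDB8_8320
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// Serialize one PNG chunk: length, type, data, then CRC over type + data.
fn write_png_chunk<W: Write>(out: &mut W, tag: &[u8; 4], data: &[u8]) -> io::Result<()> {
    out.write_all(&(data.len() as u32).to_be_bytes())?;
    out.write_all(tag)?;
    out.write_all(data)?;

    // The length field is not covered by the checksum.
    let mut crc_input = Vec::with_capacity(4 + data.len());
    crc_input.extend_from_slice(tag);
    crc_input.extend_from_slice(data);
    out.write_all(&crc32(&crc_input).to_be_bytes())
}
```

Writing an empty `IEND` chunk this way yields the familiar 12 bytes that end every PNG file. A method on the encoder would also have to decide where custom chunks may appear relative to the header and image data, which this sketch does not address.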

Setting compression strategy in C API

Hi, I was experimenting with the compression strategy setting of the mtpng cli tool. For a specific type of images I'm seeing a huge benefit in using RLE or Huffman instead of the default setting.

I don't really see a way of setting this option using the C-API though. Unfortunately I don't know anything about Rust, but I'd assume that a new wrapper method similar to capi.rs/mtpng_encoder_options_set_filter is required to expose this setting to C?

Single threaded option - rayon as the default feature

I know the crate is specifically for multi-threaded encoding/decoding.
I have managed to get sub millisecond encoding per image for my use case of encoding hundreds of small png files concurrently, and I would like to use mtpng to have low level control over Indexed pngs with transparency.

However, server side I do not want to use many threads on each request. Throughput of the server is more important, not time per request, so I think using the current thread would be the best way to do this.

I have looked at the code to see how easy it would be to make rayon a default (optional) dependency, so that it could be disabled with default-features=false. However, I don't understand the code well enough to remove the multithreading part in encoder.rs.

Also, I'm not even sure there would be a significant performance gain over

let pool = rayon::ThreadPoolBuilder::new().num_threads(1).build().unwrap();
(except that creating a thread pool per request seems like a bad idea)

I'd like to get feedback on this, also it could be useful for the WASM issue #13

Pass an Iterator as source (for BGR->RGB for example)

Is it possible to add the ability to pass an iterator into the encoder (without having an effect on performance)?
I tried looking through the code but it was a bit too complex for me.

The reason I ask is that if you have BGR data, you currently have to convert it beforehand, which means an extra pass over all of the data just for the conversion. That pass is very fast, but I would imagine it being faster still if the conversion could happen while the data is being encoded.

Here's an example of BGR to RGB:

pub fn bgr_to_rgb(b: &mut [u8]) {
    // swap R<->B to convert BGR->RGB
    b.chunks_exact_mut(3).for_each(|l| l.swap(0, 2));
}

Now if we could pass an iterator the swap could be done on-the-fly which might be more cache friendly as well.
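A lazy version of the conversion might look like the sketch below. It only illustrates the iterator side of the idea; `bgr_to_rgb_iter` is a hypothetical helper, and whether the encoder could consume such an iterator without a performance cost is exactly the open question.

```rust
/// Yield RGB bytes from a BGR buffer without modifying or copying the
/// input up front. Trailing bytes that do not form a full pixel are skipped.
fn bgr_to_rgb_iter<'a>(bgr: &'a [u8]) -> impl Iterator<Item = u8> + 'a {
    bgr.chunks_exact(3).flat_map(|px| [px[2], px[1], px[0]])
}
```

Collecting the iterator one row at a time would still produce a temporary buffer, but that buffer is a single row rather than the whole image, so it is more likely to stay in cache until the filter stage reads it.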

Thoughts on this?

WASM experience?

Does anyone have any experience in compiling this to WASM please?

Windows build is slower than Linux build

Win64 and win32 builds of mtpng are noticeably slower than Linux or macOS builds, more so than when running the Linux build on WSL.

It looks like it may be due to using zlib via the libz-sys crate, which compiles zlib locally on Windows while picking up a highly-optimized system package on Linux. The local compile may not pick up the best optimization options, and definitely doesn't engage the specialized assembly functions that are available.
