fpnge's People

Contributors

animetosho, nigeltao, stefanvk, veluca93


fpnge's Issues

Tweak heuristics for fast predictor selection

Have you tried subtracting 1 iff nbits[0] <= (1/2)?

Even better, I think, could be to double all the bit costs (which by itself changes nothing), but remap a bit cost of 1 into 1 instead of 2 for symbol 0, and perhaps a bit cost of 2 into 3 (or 4), or something like that.

The rationale is that RLE only kicks in for many consecutive 0s, so you'd need the probability of 0 to be quite high.

Originally posted by @veluca93 in #19 (comment)
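
For illustration, here is a minimal sketch of that remapping, assuming a per-symbol bit-cost table like the one the fast heuristic already computes; the function name and table here are hypothetical, not fpnge's actual code:

#include <cstddef>
#include <cstdint>

// Hypothetical cost remap for the fast predictor heuristic: double every bit
// cost (which preserves the relative ordering), but make symbol 0 cheaper than
// plain doubling would, since long runs of 0s are what the RLE path can
// actually exploit.
static inline uint32_t RemappedCost(const uint8_t *nbits, size_t sym) {
  if (sym == 0) {
    if (nbits[0] == 1) return 1;  // 1 -> 1 instead of 2
    if (nbits[0] == 2) return 3;  // 2 -> 3 (or 4) instead of 4
  }
  return 2u * nbits[sym];  // everything else: plain doubling
}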

Filter search short circuiting

In some images (mainly photographic content), considering filters other than Paeth is futile.

I therefore tested a simple heuristic: group scanlines into groups of 8. If the filter search selects Paeth for all of the first three, the remaining five get Paeth applied as well, without considering any other filters.

This reaps most of the benefits of optimal filter selection, at the cost of half of fpnge4's speed advantage over fpnge.

Sorted file sizes for a corpus of both photographic and synthetic images:

(Figure heuristics_2: sorted per-image file sizes.)

While this is just a quick attempt, I think it shows that trying to achieve near optimal filter selection without doing a full filter search is both doable and worthwhile.

The opposite case, getting rid of Paeth calculations in images where Paeth doesn't perform well (flat areas?), may also yield some gains.
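
A rough sketch of how that heuristic could look; FullFilterSearch and ApplyPaeth are hypothetical stand-ins for the per-scanline filter search and an unconditional Paeth filter, not fpnge's actual functions:

#include <algorithm>
#include <cstddef>

enum Filter { kNone, kSub, kUp, kAverage, kPaeth };

Filter FullFilterSearch(size_t row);  // hypothetical: tries all filters, returns the winner
void ApplyPaeth(size_t row);          // hypothetical: applies Paeth without searching

void FilterImage(size_t num_rows) {
  for (size_t group = 0; group < num_rows; group += 8) {
    const size_t end = std::min(group + 8, num_rows);
    size_t paeth_wins = 0;
    size_t row = group;
    // Full filter search on the first three scanlines of each group of 8.
    for (; row < end && row < group + 3; ++row) {
      if (FullFilterSearch(row) == kPaeth) ++paeth_wins;
    }
    // If all three picked Paeth, skip the search for the remaining scanlines.
    for (; row < end; ++row) {
      if (paeth_wins == 3) ApplyPaeth(row);
      else FullFilterSearch(row);
    }
  }
}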

Appears to not handle 16-bit RGB images

The output image is 8-bit for 16-bit image input, according to file and ImageMagick compare.

Steps to reproduce:

git clone <this repo>
./build.sh
# Check inputs
file 8b.png 
# 8b.png: PNG image data, 1920 x 1200, 8-bit/color RGB, non-interlaced
file 12b.png
# 12b.png: PNG image data, 1920 x 1200, 16-bit/color RGB, non-interlaced

# Convert images
./build/fpnge 8b.png 8b-out.png 100
./build/fpnge 12b.png 12b-out.png 100

# Compare:
compare -metric AE 8b.png 8b-out.png compare.png  # gives 0
compare -metric AE 12b.png 12b-out.png compare.png  # Gives a very very large number (2.304e+06)

file 12b-out.png 
# 12b-out.png: PNG image data, 1920 x 1200, 8-bit/color RGB, non-interlaced

Note: 12b-out.png claims to be 8-bit/color. Interestingly, OpenCV still reads it as 16-bit, but with wrong values.

Extra Info:

CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
OS: Debian GNU/Linux 11 (bullseye) (in docker)
Compiler: gcc (Debian 10.2.1-6) 10.2.1 20210110 g++ (Debian 10.2.1-6) 10.2.1 20210110

Extra extra

I stumbled on this via the Python binding here. Below is an example showing the errors:

import pprint

import cv2
import fpnge
import numpy as np

blank_image = np.random.randint(0, high=np.iinfo(np.uint16).max, size=(1, 2, 3), dtype=np.uint16)
image_bytes = fpnge.binding.encode_view(blank_image.data, blank_image.shape[1], blank_image.shape[0], blank_image.shape[2], blank_image.dtype.itemsize * 8)
image_check: cv2.Mat = cv2.imdecode(np.frombuffer(image_bytes, np.uint8), cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)
assert image_check.dtype == blank_image.dtype
assert image_check.shape == blank_image.shape
assert (image_check == blank_image).all()  # fails: the decoded pixel values do not match
# If you print these:
pprint.pprint(image_check)
# array([[[51929, 35260,  2880],
#         [ 6872, 23392, 15276]]], dtype=uint16)
pprint.pprint(blank_image)
# array([[[16395, 48265, 55754],
#         [44091, 24667, 55322]]], dtype=uint16)

Images (attached): 8b, 12b

Assertion failed in HuffmanTable::FillBits with certain images

I've discovered some images that fail the assert in HuffmanTable::FillBits. I've been able to reproduce the issue on GCC and MSVC, with samples loaded via either lodepng or stb_image. The failure happens when trying to encode images in RGB format.

Sample images, including raw pixel data at 3 channel RGB: sample_images.zip

Image 1:

  • width: 256
  • height: 256
  • num_channels: 3
  • bytes_per_channel: 1
  • FPNGE_COMPRESS_LEVEL_BEST
  • FPNGE_CICP_NONE
    (attached: img1)

Image 2:

  • width: 512
  • height: 512
  • num_channels: 3
  • bytes_per_channel: 1
  • FPNGE_COMPRESS_LEVEL_BEST / FPNGE_COMPRESS_LEVEL_DEFAULT / 3 / 2 / 1 (fails at all compression levels; libpng encodes this image without issue)
  • FPNGE_CICP_NONE
    (attached: sample0)

Apple Silicon, macOS AArch64 (ARM)

Greetings.

The code does not compile on AArch64, since the SSE/AVX intrinsics are not available there; the equivalent code would depend on NEON.
Example:

#if defined(__SSE4_2__)
#include <nmmintrin.h>
#elif defined(__aarch64__)
#include <arm_neon.h>
#endif

// z[i] = x[i] + y[i]
void vadd(const int* x, const int* y, int* z, unsigned int count) {
    // process 4 integers (128bits) with simd
    unsigned int i = 0;
    for (; i + 4 <= count; i += 4) {
#if defined(__SSE4_2__)
        const __m128i vx = _mm_lddqu_si128((const __m128i*)(x + i));
        const __m128i vy = _mm_lddqu_si128((const __m128i*)(y + i));
        const __m128i vz = _mm_add_epi32(vx, vy);
        _mm_storeu_si128((__m128i*)(z + i), vz);
#elif defined(__aarch64__)
        const int32x4_t vx = vld1q_s32(x + i);
        const int32x4_t vy = vld1q_s32(y + i);
        const int32x4_t vz = vaddq_s32(vx, vy);
        vst1q_s32(z + i, vz);
#endif
    }

    // tail loop
    for (; i < count; ++i) {
        z[i] = x[i] + y[i];
    }
}

I have set up a working GitHub pipeline for compiling and testing fpnge on AArch64. But I am not a C++ programmer; would you be able and willing to help me when I have questions about the porting?

Java Wrapper

Greetings!

Thank you, and compliments on this great library. I wrapped it into a Java library in pursuit of the fastest PNG encoder. Please see https://github.com/manticore-projects/fpng-java

As it stands, FPNGE is the winner in both size and performance:

Benchmark                                           (imageName)  Mode  Cnt     Score    Error  Units
FPNGEBenchmark.encode                               example.png  avgt    3     5.895 ±  0.369  ms/op
FPNGEBenchmark.encode                   looklet-look-scale6.png  avgt    3   242.326 ±  3.215  ms/op
FPNGEncoderBenchmark.encode                         example.png  avgt    3     7.267 ±  4.725  ms/op
FPNGEncoderBenchmark.encode             looklet-look-scale6.png  avgt    3   353.969 ± 14.833  ms/op
ImageIOEncoderBenchmark.encode                      example.png  avgt    3    54.096 ±  0.742  ms/op
ImageIOEncoderBenchmark.encode          looklet-look-scale6.png  avgt    3  1285.791 ± 14.237  ms/op
ObjectPlanetPNGEncoderBenchmark.encode              example.png  avgt    3    25.882 ±  0.670  ms/op
ObjectPlanetPNGEncoderBenchmark.encode  looklet-look-scale6.png  avgt    3   639.232 ± 23.186  ms/op
PNGEncoderBenchmark.encode                          example.png  avgt    3    29.218 ±  0.965  ms/op
PNGEncoderBenchmark.encode              looklet-look-scale6.png  avgt    3   573.997 ± 16.234  ms/op
PNGEncoderBenchmark.encodeFastest                   example.png  avgt    3    17.628 ±  0.511  ms/op
PNGEncoderBenchmark.encodeFastest       looklet-look-scale6.png  avgt    3   362.208 ±  4.346  ms/op

I still need to improve those benchmarks so that they compare on the basis of similar achieved file sizes, and they also need to cover different JDKs.

Ideas for improving compression ratio?

fpnge is pretty fast, so I was wondering what, in theory, could be done to improve compression slightly without sacrificing too much speed. I was hoping you might have some ideas on the topic (regardless of whether they'd ever be implemented).
Thoughts I had:

RLE

Currently only full vectors of zeroes are candidates for RLE.

  • Detecting sequences starting/ending partway through a vector might help density very slightly, with relatively little performance cost (see the sketch after this list)
  • Detecting sequences completely within a vector may be possible without too much cost, but I suspect it would have little impact on compression, since such sequences are relatively short
  • Detecting sequences of bytes other than 0 - not sure of the effectiveness
    • Perhaps use the distance of one pixel to the left instead of one byte?
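
For the first bullet, one way to find a zero run that only partially covers a 16-byte vector is a byte compare plus movemask; this is only an illustration, not fpnge's RLE code:

#include <cstdint>
#include <emmintrin.h>  // SSE2

// Bit i of the result is set when byte i of v is zero.
static inline uint32_t ZeroByteMask(__m128i v) {
  return (uint32_t)_mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_setzero_si128()));
}

// Length of the zero run at the start of the vector (0..16). Counting from the
// other end of the mask gives the run length at the tail, so a run crossing a
// vector boundary can be stitched together from adjacent vectors.
static inline unsigned LeadingZeroBytes(__m128i v) {
  uint32_t non_zero = ~ZeroByteMask(v) & 0xFFFFu;
  return non_zero ? (unsigned)__builtin_ctz(non_zero) : 16u;  // GCC/Clang builtin
}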

Unfortunately, due to how the Huffman table is currently constructed, the LZ77 symbols always get assigned rather long bit sequences, making RLE unattractive (particularly with 0s) unless the run is rather long. Which leads to the next point.

Huffman encoding

The current implementation doesn't consider symbol counts for many symbols, as the Huffman table has to be specially crafted for the algorithm used. Trying to relax some length restrictions often leads to ComputeCodeLengths becoming rather slow; from what I can tell, it appears to find the optimal tree via a brute-force approach. Would you happen to know of a faster algorithm for this?
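
For reference, the package-merge algorithm computes optimal length-limited codes in roughly O(symbols × max_length), and several deflate encoders instead use a cheaper clamp-and-repair heuristic. The sketch below shows the latter; it is not fpnge's ComputeCodeLengths and it ignores the extra per-symbol constraints fpnge needs:

#include <cstdint>
#include <vector>

// Clamp Huffman code lengths to max_len, then repair the Kraft sum so the
// lengths still describe a valid prefix code. Assumes the number of used
// symbols is at most 2^max_len (always true for deflate alphabets).
void LimitCodeLengths(std::vector<uint32_t> &lengths, uint32_t max_len) {
  const uint64_t budget = 1ull << max_len;  // Kraft sum scaled by 2^max_len
  uint64_t kraft = 0;
  for (uint32_t &l : lengths) {
    if (l == 0) continue;
    if (l > max_len) l = max_len;
    kraft += 1ull << (max_len - l);
  }
  // Over budget: lengthen codes that are still shorter than max_len, starting
  // with the longest ones, as lengthening those costs the fewest expected bits.
  while (kraft > budget) {
    for (uint32_t l = max_len - 1; l >= 1 && kraft > budget; --l) {
      for (uint32_t &len : lengths) {
        if (len == l) {
          ++len;
          kraft -= 1ull << (max_len - l - 1);
          if (kraft <= budget) break;
        }
      }
    }
  }
}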

The other idea is to remove almost all length restrictions, which allows normal optimal Huffman tree algorithms to be used. I can't think of a quick way to allow the middle section (symbols 16-239) to have differing bit lengths, so you'd need to manipulate the symbol counts there a bit to ensure they all end up with the same bit length. Otherwise, the unrestricted Huffman code could be handled with some slowdown, as, in addition to the shuffle-lookup for the low bits, you'd need to do it for the high bits as well.

Another thing to consider might be having multiple deflate blocks and sampling the image at various points, to be able to adapt to changes which occur vertically down the image.

I suspect a more optimal Huffman code might generally give a low single-digit percentage gain most of the time, but at some speed cost.

LZ77

It seems that there's a fair bit of scope here to improve compression through better match finding, but this is also often the slowest part of deflate implementations. I can't think of a way to implement this without dropping to scalar code either.

Furthermore, I'm not sure that knowing the data is an image provides much benefit over generic LZ77 implementations, except that perhaps you could look for matches only on pixel boundaries instead of at every byte.

Any ideas here?
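
On the pixel-boundary idea, here is a very small sketch of what a scalar, pixel-aligned match finder could look like: hash a few bytes at each pixel boundary and remember the last position per bucket. This is entirely hypothetical, not fpnge code, and it ignores the 32 KiB deflate window:

#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical pixel-aligned match finder: hash 4 bytes at each pixel boundary
// and remember the last position that fell into the same bucket.
struct PixelMatcher {
  PixelMatcher() : head(1u << 15, SIZE_MAX) {}

  // Look up a previous pixel-aligned position with the same 4-byte hash,
  // record pos for future lookups, and return the match length (0 if none).
  // The caller must ensure pos + 4 <= size and advance pos in whole pixels.
  size_t FindAndInsert(const uint8_t *data, size_t size, size_t pos) {
    const uint32_t h = Hash(data + pos);
    const size_t prev = head[h];
    head[h] = pos;
    if (prev == SIZE_MAX) return 0;
    size_t len = 0;
    while (pos + len < size && data[prev + len] == data[pos + len]) ++len;
    return len;
  }

  static uint32_t Hash(const uint8_t *p) {
    uint32_t v;
    std::memcpy(&v, p, 4);
    return (v * 2654435761u) >> 17;  // fold 32 bits into a 15-bit bucket index
  }

  std::vector<size_t> head;  // last position seen per hash bucket
};

A caller would step pos by the pixel size (3 or 4 bytes) so both insertions and lookups stay on pixel boundaries; whether the reduced number of hash insertions pays for the missed matches is exactly the open question above.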

FPNGEOutputAllocSize can underestimate the required size

There exist scenarios in which FPNGEOutputAllocSize underestimates the required size. This can lead to heap buffer overflows.

So far I have only run into the problem with 1-pixel-wide grayscale images at compression level 2.

Example image: 1 channel, 1x64000, noise (attached)

With an added -g option for grayscale images:
./fpnge -2 -g ../testdata/narrow_gv.png out.png
Segmentation fault (core dumped)

Allocated size: 129024
Encoded size: 129875
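
Until the bound itself is fixed, a possible caller-side stopgap is to pad whatever FPNGEOutputAllocSize reports before allocating the output buffer; the margin below is arbitrary, not a proven bound:

#include <cstddef>
#include <vector>

// Stopgap only: pad the size reported by FPNGEOutputAllocSize(...) before
// allocating. The observed shortfall above was under 1 KiB, but there is no
// guarantee this margin always covers it.
std::vector<unsigned char> AllocateOutputWithMargin(size_t reported_size) {
  const size_t padded = reported_size + reported_size / 64 + 1024;
  return std::vector<unsigned char>(padded);
}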

[fjxld] group tiles by type, then decode using SIMD

Hi, Luca.

This question is related to jxl.

Previously I noticed that the decoder is 10 times slower than the encoder: libjxl/libjxl#1975 (comment)

I guess it is because of the dependency between pixels during decoding. What about grouping tiles with the same predictor type?

I said "guess" because I'm unable to locate the code for encoding and decoding with -e 1; the structure of libjxl is a bit over-complicated. Can you provide direct links into current master?

Or maybe you want to test this idea and write fjxld?
