LARGE_WINDOW 32kb windows about isa-l HOT 17 CLOSED

intel commented on August 21, 2024
LARGE_WINDOW 32kb windows

from isa-l.

Comments (17)

gbtucker commented on August 21, 2024

Sync_flush does do what you expect and allows matches over block boundaries, as does no_flush in stateful compression. Changing the default to large_window may not have a big impact depending on the data set unless you also change the default hufftables at compile time or use isal_create_hufftables() at runtime.

We are working on a number of improvements in compression that we will push soon that both improve the compression ratio and speed. This includes making 32K window the default.

vlm commented on August 21, 2024

@gbtucker thanks.

Changing the default Huffman tables to the ones tailored to the data did increase compression ratio for 32kB windows from 2.49x to 3.97x.

But then, the same tables on 8kB windows gave 3.66x.

This leads me to believe that larger windows are not being used properly by this code.

gbtucker commented on August 21, 2024

It depends a lot on your data of course. What data set are you using?

vlm commented on August 21, 2024

I created a demo script. It concatenates two random files of size ${BLOCK} several times, then compresses the result with gzip -1 and with ISA-L's igzip_example in 32 K window mode.

[vlm@p1 igzip]$ ./test-compression.sh 8000
igzip_example
Window Size: 32 K
End of igzip_example

igzip_example
Window Size: 32 K
End of igzip_example

-rw-rw-r--. 1 vlm vlm  28936 Oct 26 02:10 outfile-gzip.bin
-rw-rw-r--. 1 vlm vlm  61336 Oct 26 02:10 outfile-isal-NO_FLUSH.bin
-rw-rw-r--. 1 vlm vlm 133756 Oct 26 02:10 outfile-isal-SYNC_FLUSH.bin
[vlm@p1 igzip]$ ./test-compression.sh 16000
igzip_example
Window Size: 32 K
End of igzip_example

igzip_example
Window Size: 32 K
End of igzip_example

-rw-rw-r--. 1 vlm vlm  56637 Oct 26 02:10 outfile-gzip.bin
-rw-rw-r--. 1 vlm vlm 119510 Oct 26 02:10 outfile-isal-NO_FLUSH.bin
-rw-rw-r--. 1 vlm vlm 371824 Oct 26 02:10 outfile-isal-SYNC_FLUSH.bin
[vlm@p1 igzip]$ 

As you see, ISA-L in 32 K window mode performs much worse than gzip -1 in every case, and worse still in SYNC_FLUSH mode.

test-compression.sh file

#!/usr/bin/env bash

set -e

if [ -z "$1" ]; then
    echo "Usage: $0 <random_block_size>"
    exit 1
fi
BLOCK="$1"

# Two random blocks
dd if=/dev/urandom count=1 bs=${BLOCK} of=rnd1.bin 2>/dev/null
dd if=/dev/urandom count=1 bs=${BLOCK} of=rnd2.bin 2>/dev/null

# Create a large file out of repetition of two random blocks
rm -f infile.bin
for n in {1..100}; do
    cat rnd1.bin rnd2.bin >> infile.bin
done

# gzip -1 baseline.
gzip -1 < infile.bin > outfile-gzip.bin

for mode in NO_FLUSH SYNC_FLUSH; do

    sed -i "s/NO_FLUSH/${mode}/" igzip_example.c
    sed -i "s/SYNC_FLUSH/${mode}/" igzip_example.c

    cc -I../include -DLARGE_WINDOW -o igzip_example igzip_example.c  -L ../.libs -lisal

    ./igzip_example infile.bin outfile-isal-${mode}.bin
done

ls -al outfile-*.bin

gbtucker commented on August 21, 2024

I looked into this case and it's not that large_window or sync_flush isn't working. The dataset is highly compressible with lots of large matches; the fact that the 16000 case compresses at all shows that large_window must be finding matches at distances > 8k even with sync_flush on.

After an initial run of literals the stream is mostly matches of the maximum length 258. When sync_flush is on, the input must be drained at the end of each input segment, resulting in a partial match shorter than 258 and a few extra literals before max-length matches resume. Since the compression ratio is so high on this dataset, a few literals make a big difference.

The other contributor is that with sync_flush on, a new header must be added at the start of each input block, which adds over 100 bytes every 8k.
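The flush overhead described above is easy to reproduce with zlib, used here only because it emits the same DEFLATE format and ships with Python; the sizes are illustrative of the effect, not of ISA-L's output:

```python
import zlib

# Highly repetitive input, similar in spirit to the two-random-blocks test.
data = (b"A" * 8000 + b"B" * 8000) * 100

# One-shot compression: matches may span the whole stream.
whole = zlib.compress(data, 1)

# Sync-flushing after every 8 KB chunk: each flush drains pending matches
# (cutting them short) and emits an empty stored block plus a fresh
# block header.
comp = zlib.compressobj(1)
parts = []
for i in range(0, len(data), 8192):
    parts.append(comp.compress(data[i:i + 8192]))
    parts.append(comp.flush(zlib.Z_SYNC_FLUSH))
parts.append(comp.flush(zlib.Z_FINISH))
flushed = b"".join(parts)

assert zlib.decompress(flushed) == data  # still a valid stream
print(len(whole), len(flushed))          # the flushed output is larger
```

The flushed stream stays decodable, but pays for the per-block headers and truncated matches exactly as described.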

vlm commented on August 21, 2024

What I see is that ISA-L does significantly worse (about 2×) on my streaming data examples. I've prepared an apples-to-apples comparison that runs zlib and ISA-L in the same line-by-line mode (each line is about 1000 bytes long) and compared the output:

https://github.com/vlm/isa-l/commit/f77f252d3bb95659fbe7b97e82794e8eb418ba15

The scripts and the test file are in the above patch. The outcome:

./test-isal-vs-zlib.sh test.json

Observe:

-rw-rw-r--. 1 vlm vlm 121379 Oct 27 23:38 outfile-gzip-1.bin
-rw-rw-r--. 1 vlm vlm 278236 Oct 27 23:38 outfile-isal.bin
-rw-rw-r--. 1 vlm vlm 142806 Oct 27 23:38 outfile-zlib-1.bin
-rw-rw-r--. 1 vlm vlm 592008 Oct 27 23:36 test.json

Note that gzip -1 and zlib's deflate in line-by-line mode yield about 4.1–4.8x compression, whereas ISA-L yields about 2.1x under the same conditions. Although ISA-L is very fast (thank you!), 2.1x compression is not very useful, and it points to a problem that will affect all customers trying to use the library in streaming contexts. Default zlib doesn't exhibit this problem.

gbtucker commented on August 21, 2024

It's true I didn't expect a common usage to be sync_flush this often on such small buffers. Is sync_flush really necessary on every line?

Zlib probably emits a type 1 (constant static Huffman) block for each line. Without sync_flush on small buffers we do much better.
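The block type zlib picks can be checked directly, since the BTYPE bits sit at the start of a raw deflate stream. A small sketch (again using Python's zlib as a stand-in; the input string is made up):

```python
import zlib

# Compress one short "line" as raw deflate (wbits=-15: no zlib wrapper),
# so the stream starts directly with the first block header.
comp = zlib.compressobj(1, zlib.DEFLATED, -15)
out = comp.compress(b'{"short":"line"}') + comp.flush(zlib.Z_FINISH)

# Deflate block header, LSB first: bit 0 = BFINAL, bits 1-2 = BTYPE
# (00 = stored, 01 = static Huffman, 10 = dynamic Huffman).
btype = (out[0] >> 1) & 0b11
print(btype)

# For an input this small a dynamic tree header can't pay for itself,
# so zlib picks a stored or static block, never a dynamic one.
assert btype in (0, 1)
```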

version 2.16:
$ ./igzip_file_perf test.json
  file test.json - in_size=592008 out_size=152714 iter=844 ratio_default=25.8% ratio_custom=21.8%
  igzip_file: runtime =     878183 usecs, bandwidth 476 MB in 0.8782 sec = 568.96 MB/s

prelim next release:
$ ./igzip_file_perf test.json
  file test.json - in_size=592008 out_size=140757 iter=844 ratio_default=23.8% ratio_custom=19.1%
  igzip_file: runtime =     614582 usecs, bandwidth 476 MB in 0.6146 sec = 813.00 MB/s

Would this make it worth it to delay the sync_flush in your usage?

vlm commented on August 21, 2024

The use case is streaming compression, such as WebSocket permessage-deflate (RFC 7692). It requires a sync flush and mandates removing the terminating 00 00 ff ff sequence from each message transmitted. My test replicates this behavior with data representative of what we see in the wild (anonymized, but with similar entropy).

So, as much as we'd like to avoid frequent flushing, the flush has semantic significance and can't be avoided for this class of applications.
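The RFC 7692 framing can be sketched in a few lines with zlib (a minimal illustration of the protocol requirement, not of the ISA-L API; the message content is made up):

```python
import zlib

message = b'{"type":"tick","value":42}'

# permessage-deflate uses raw deflate (wbits=-15, no zlib wrapper)
# with a sync flush per message.
comp = zlib.compressobj(1, zlib.DEFLATED, -15)
body = comp.compress(message) + comp.flush(zlib.Z_SYNC_FLUSH)

# RFC 7692 requires stripping the trailing 00 00 ff ff (the empty
# stored block the sync flush emits) before framing the message.
assert body.endswith(b"\x00\x00\xff\xff")
wire = body[:-4]

# The receiver re-appends those four bytes before inflating.
decomp = zlib.decompressobj(-15)
assert decomp.decompress(wire + b"\x00\x00\xff\xff") == message
```

This is why the flush cannot be delayed or batched: every message must be independently decodable at the receiver.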

gbtucker commented on August 21, 2024

OK, perhaps there are some features we can add to help with the compression ratio in this case.

vlm commented on August 21, 2024

Any news on this, or plans to address it somehow?

gbtucker commented on August 21, 2024

Yes, we have pulled in a feature to emit static (btype 01) headers. This is the real issue for compression ratio with very small inputs and sync flush on.

vlm commented on August 21, 2024

Before the latest changes (i.e., copied from #9 (comment)):

./test-isal-vs-zlib.sh test.json

Observe:

-rw-rw-r--. 1 vlm vlm 121379 Oct 27 23:38 outfile-gzip-1.bin
-rw-rw-r--. 1 vlm vlm 278236 Oct 27 23:38 outfile-isal.bin
-rw-rw-r--. 1 vlm vlm 142806 Oct 27 23:38 outfile-zlib-1.bin
-rw-rw-r--. 1 vlm vlm 592008 Oct 27 23:36 test.json

After the latest changes:

./test-isal-vs-zlib.sh test.json

-rw-rw-r--. 1 vlm vlm 121379 Jan 11 06:03 outfile-gzip-1.bin
-rw-rw-r--. 1 vlm vlm 213130 Jan 11 06:03 outfile-isal.bin
-rw-rw-r--. 1 vlm vlm 142806 Jan 11 06:03 outfile-zlib-1.bin

From what I see, no tangible difference has been made on this type of load (278k → 213k for ISA-L vs 142k for zlib). ISA-L still produces about 50% more output than zlib at Z_BEST_SPEED.

All the sources are in https://github.com/vlm/isa-l/commit/12aa33a1f6cce53a0f7e6ac3b5744056a94cb722, just ./configure --disable-shared CFLAGS=-DLARGE_WINDOW.

gbtucker commented on August 21, 2024

When you have such small blocks and are sync-flushing each one, set static hufftables:

isal_deflate_set_hufftables(stream, NULL, IGZIP_HUFFTABLE_STATIC);

vlm commented on August 21, 2024

With isal_deflate_set_hufftables:

New:
-rw-rw-r--. 1 vlm vlm 121379 Jan 12 22:17 outfile-gzip-1.bin
-rw-rw-r--. 1 vlm vlm 148602 Jan 12 22:17 outfile-isal.bin
-rw-rw-r--. 1 vlm vlm 142806 Jan 12 22:17 outfile-zlib-1.bin
-rw-rw-r--. 1 vlm vlm 592008 Jan 12 22:09 test.json

So, it is largely solved! Thank you very much!

vlm commented on August 21, 2024

However, there's another anomaly.

I tried generating custom hufftables from a 456k file. Here are the results with the DEFAULT hufftables, the STATIC hufftables, and then the pair hufftables_default / hufftables_static produced by the generate_custom_hufftables utility.

I would expect the custom hufftables to be markedly more efficient (in either STATIC or DEFAULT mode) than the default ones shipped with igzip. However, they aren't:

Uncompressed  DEFAULT  STATIC  "_default" CUSTOM  "_static" CUSTOM
      456810    81502   81271              67787             81271

As you see, the custom hufftable in DEFAULT mode works best, whereas in STATIC mode the outputs are exactly the same (81271 bytes!) regardless of whether the default (shipped) table or the generated one is used.

With another sample file:

Uncompressed  DEFAULT  STATIC  "_default" CUSTOM  "_static" CUSTOM
      283375   102774   72137              80255             72137

And another:

Uncompressed  DEFAULT  STATIC  "_default" CUSTOM  "_static" CUSTOM
        2315     1582    1360               1333              1360

In the last examples I am not finding any advantage in using the generated custom static tables over the default static ones. Is this expected?

gbtucker commented on August 21, 2024

That's great. The point of isal_deflate_set_hufftables() is to give more control to the programmer, especially when you know something about your data that you would rather the compressor not guess at. Setting IGZIP_HUFFTABLE_STATIC forces the use of the btype 01 hufftables and header as specified by the standard. Changing the default hufftables doesn't affect the format of the standard btype 01 static table, so the identical compression ratio is expected; the benefit is that the 01 header is about 100 bytes shorter.

For your use case I would suggest making the decision based on the size of the input:

if (length < 1024)  /* or some threshold where saving a ~100 B header is worth it */
        isal_deflate_set_hufftables(&stream, NULL, IGZIP_HUFFTABLE_STATIC);
else
        isal_deflate_set_hufftables(&stream, &custom_hufftables, IGZIP_HUFFTABLE_CUSTOM);

Then you can have the best of both worlds.
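The effect of such a size-based switch can be illustrated with zlib, whose Z_FIXED strategy forces static (btype 01) blocks, loosely analogous to IGZIP_HUFFTABLE_STATIC (an analogy only, not the ISA-L API; the messages are made up):

```python
import zlib

def deflate(data, strategy):
    # Raw deflate at level 1 with the given block strategy.
    c = zlib.compressobj(1, zlib.DEFLATED, -15, 8, strategy)
    return c.compress(data) + c.flush(zlib.Z_FINISH)

short = b'{"id":1,"ok":true}'
long_msg = b'{"temperature":21.5,"unit":"C","site":"lab"},' * 2000

# Tiny input: a dynamic tree header can't pay for itself, so forcing
# static blocks never loses.
assert len(deflate(short, zlib.Z_FIXED)) <= len(deflate(short, zlib.Z_DEFAULT_STRATEGY))

# Large input: a tree tailored to the data beats the static one.
assert len(deflate(long_msg, zlib.Z_DEFAULT_STRATEGY)) < len(deflate(long_msg, zlib.Z_FIXED))
```

The crossover point depends on the data, which is why a threshold like the 1024 bytes above has to be tuned per workload.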

vlm commented on August 21, 2024

Thank you!
