emmanuel-marty / lzsa Goto Github PK

View Code? Open in Web Editor NEW

228.0 17.0 30.0 1.05 MB

Byte-aligned, efficient lossless packer that is optimized for fast decompression on 8-bit micros

License: Other

Makefile 0.16% Assembly 36.68% C 58.26% CMake 4.90%

compression 8-bit retrocomputing suffix-array 8088 c z80 6502 zx-spectrum c64 ibm-pc assembler 6809 6309

lzsa's Issues

improvement of literal run length & match length encoding

`low=LITERALS_RUN_LEN_V1 or MATCH_RUN_LEN_V1

if(len >= low)
if((len-=low)<254)
output len
else if((len-=254)<256)
output 254,len
else ouput 255,len&255,len>>8

EOD marker:
ouput 0,255,255,255(because it is greater than BLOCK_SIZE)`

Segmentation fault when compressing since commit on 2021-06-04

I tried updating my lzsa repo from its prior revision of 5cfec00 today. However, the compression crashed with a segmentation fault. A git bisect run ended up pointing to 5e404e9 as the first bad commit. This is on a Debian 11 amd64 server. Observe:

lzsa$ clang --version
Debian clang version 11.0.1-2
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
lzsa$ git checkout 65d6972 && make && lzsa -c -f2 --prefer-ratio -v debug.big /tmp/output.bin
Previous HEAD position was 5e404e9 Small LZSA2 ratio increase for some files
HEAD is now at 65d6972 Add link to streamed LZSA2 depacker for 8088
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/shrink_block_v2.c -o obj/src/shrink_block_v2.o
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/shrink_context.c -o obj/src/shrink_context.o
clang obj/src/lzsa.o obj/src/dictionary.o obj/src/expand_block_v1.o obj/src/expand_block_v2.o obj/src/expand_context.o obj/src/expand_inmem.o obj/src/expand_streaming.o obj/src/frame.o obj/src/matchfinder.o obj/src/shrink_block_v1.o obj/src/shrink_block_v2.o obj/src/shrink_context.o obj/src/shrink_inmem.o obj/src/shrink_streaming.o obj/src/stream.o obj/src/libdivsufsort/lib/divsufsort.o obj/src/libdivsufsort/lib/divsufsort_utils.o obj/src/libdivsufsort/lib/sssort.o obj/src/libdivsufsort/lib/trsort.o  -o lzsa
Compressed 'debug.big' in 0.358186 seconds, 0.26 Mb/s, 12710 tokens (7.66515 bytes/token), 97424 into 62853 bytes ==> 64.5149 %
Compared '/tmp/output.bin' in 0.000311 seconds, 298.748 Mb/s
lzsa$ git checkout 5e404e9 && make && lzsa -c -f2 --prefer-ratio -v debug.big /tmp/output.bin
Previous HEAD position was 65d6972 Add link to streamed LZSA2 depacker for 8088
HEAD is now at 5e404e9 Small LZSA2 ratio increase for some files
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/shrink_block_v2.c -o obj/src/shrink_block_v2.o
clang -O3 -g -fomit-frame-pointer -Isrc/libdivsufsort/include -Isrc -c src/../src/shrink_context.c -o obj/src/shrink_context.o
clang obj/src/lzsa.o obj/src/dictionary.o obj/src/expand_block_v1.o obj/src/expand_block_v2.o obj/src/expand_context.o obj/src/expand_inmem.o obj/src/expand_streaming.o obj/src/frame.o obj/src/matchfinder.o obj/src/shrink_block_v1.o obj/src/shrink_block_v2.o obj/src/shrink_context.o obj/src/shrink_inmem.o obj/src/shrink_streaming.o obj/src/stream.o obj/src/libdivsufsort/lib/divsufsort.o obj/src/libdivsufsort/lib/divsufsort_utils.o obj/src/libdivsufsort/lib/sssort.o obj/src/libdivsufsort/lib/trsort.o  -o lzsa
Segmentation fault

Dropping any of the switches doesn't help:

lzsa$ lzsa -c -f2 --prefer-ratio -v debug.big /tmp/output.bin
Segmentation fault
lzsa$ lzsa -c -f2 --prefer-ratio debug.big /tmp/output.bin
Segmentation fault
lzsa$ lzsa -c -f2 debug.big /tmp/output.bin
Segmentation fault
lzsa$ lzsa -c debug.big /tmp/output.bin
Segmentation fault
lzsa$ lzsa debug.big /tmp/output.bin
Segmentation fault

Test file is available at https://pushbx.org/ecm/test/20220323/debug.big -- however it does not seem like it matters, trying to compress the readme of lzsa itself also fails:

lzsa$ ./lzsa README.md /tmp/output.bin
Segmentation fault

8088 decompressors won't run on 8088

Hi, there's a few instances of this:

shl al,2

...which actually require an 80186 or higher to work. These should be replaced with:

shl al,1
shl al,1

13-bit match offset description error

There's an error in description of 13-bit match in BlockFormat_LZSA2.md
You write:

10Z 13-bit offset: read a nibble for offset bits 9-12 and use the inverted bit Z for bit 8 of the offset, then read a byte for offset bits 0-7. set bits 13-15 of the offset to 1.

But you don't mention subtracting 512 from resulting value.

When creating my own unpacking code, I couldn't understand the reason of error. And found this value of 512 only in your sources.

And one more note about stream format (StreamFormat.md)

You write:

0    1                2
0x7b 0x9e             7 6 5 4 3 2 1
                      V V V Z Z Z Z
<--- signature --->   <- traits ->
Trait bits:

V: 3 bit code that indicates which block data encoding is used. 0 is LZSA1 and 2 is LZSA2.

Z: these bits in the traits are set to 0 for LZSA1 and LZSA2.

Where is zero bit ?
Your packer writes 20h into traits byte for LZSA2 and 0 for LZSA1. So it's must be four V bits (if V = 2 means LZSA2). Or five Z bits (but V = 1 means LZSA2 in that case, not V = 2).

8086 depacker for non-raw LZSA2 file

Hello,

I used your NASM source for the space-efficient 8088 LZSA2 raw decompressor to build my 8086 depacker for the lDOS/lDebug/RxDOS kernel file.

I use -f2 for the lzsa tool to create an LZSA2 compressed stream. My files are usually larger than 64 KiB so using raw blocks was not an option.

I adapted your source so it counts down the remaining lengths of the source and destination. This provides a simple type of error check. I also added checks for the end of the block data similar to your C source. I further made it so a back reference can cross a segment boundary by doing segment arithmetic. I changed the code not to use bp so I can use it as a stack frame base pointer throughout. Finally, I added support for overlapping source and destination buffers which checks the far pointers after every match to insure the source data is not corrupted.

Suggestion: Support for streamed decompression?

Hi there, thanks for this great project.

A while ago, I wrote a custom 8-bit oriented LZ4 compressor/decoder (but nowhere near as optimal as lzsa!), mainly to solve a particular problem of "streamed decompression", where we want to partially decompress data on the fly but without requiring full access to all of the previously decompressed data stream.

This is useful in 8-bit scenarios for example where we might be decompressing video or audio data to be consumed byte-by-byte through a small in-memory buffer, and it is not practical nor desirable to decompress the whole thing in one go due to memory or latency constraints.

In my custom modification to LZ4 I achieved this by just limiting the window size (similar to BLOCK_SIZE in lzsa I suspect) for match offsets, and setting it to some user provided command line value (in my use-case anywhere from 256 bytes to 2048 bytes).

In this way, we know the decoder will never need to persist more than WINDOW_SIZE previously decompressed bytes in memory, so all we need is a WINDOW_SIZE memory buffer on the decoder side, and some fairly trivial helper functions to supply decompressed bytes one at a time from the compressed data stream. (I just implemented a simple state machine in my 6502 decoder to keep a track of ongoing literal and match runs to facilitate fetching of individual bytes)

Naturally, setting a smaller window size for match offsets will degrade compression ratio, but we can happily accept that trade-off in exchange for the streamed decompression capability. I still achieved pretty decent ratios even with a tiny 256 byte window.

In summary, do you think the ability to specify the maximum match offset window size would be a feasible possibility for lzsa to support?
Thanks!

Feature request

Hi there,

i have been using a lz-packer derived from doynax-lz for a while with my 8 bit loader-system bitfire on c64. The pack-ration is about the same as yielded by lzsa. I had a look into the faster depacker and saved a few bytes and found a few spots where to squeeze out a few cycles. Your depacker is really fast. I am thinking of adding this packer to my loader-suite, or might even switch completely to lzsa. So far i lack support for loading-adresses on the packed files, as well as inplace depacking that allows for depacking without overlap. I think that this can also be implemented into lzsa, just as done in my lz. by treating everything that would reach into overlap as literal, and prepending the end-position to the file.

Greetings Toby aka bitbreaker/performers^oxyron

Suggestion: ZX0 and ZX1

The ZX0 project says that ZX7 is superseded by ZX0.
https://github.com/einar-saukas/ZX0
Also, a variant ZX1 exists.
Would it be interesting to add those in the comparison list and the pareto graph?

typo?

shrink_block_v2.c line 1033-1044:
comment is not correct. becouse @param nStartOffset and @param nEndOffset is not exist.

Bug in decompress_small_v1.S for 8088?

The following is compressed using lzsa on Linux : lzsa -f1 -r test.txt test.bin

00000000: 5468 6973 2069 7320 6120 7465 7374 2066  This is a test f
00000010: 6f72 204c 5a53 4131 2064 6563 6f6d 7072  or LZSA1 decompr
00000020: 6573 736f 7220 696e 2038 3038 3820 6173  essor in 8088 as
00000030: 7365 6d62 6c79 2e0a                      sembly..

Which results in the following.


00000000: 5054 6869 7320 fd7f 2961 2074 6573 7420  PThis ..)a test 
00000010: 666f 7220 4c5a 5341 3120 6465 636f 6d70  for LZSA1 decomp
00000020: 7265 7373 6f72 2069 6e20 3830 3838 2061  ressor in 8088 a
00000030: 7373 656d 626c 792e 0a00 ee00 00         ssembly......

However, when trying to decompress this using the decompress_small_v1.S, it doesn't appear to decompress properly.

00000000: 5468 6973 2000 0000 6120 7465 7374 2066  This ...a test f
00000010: 6f72 204c 5a53 4131 2064 6563 6f6d 7072  or LZSA1 decompr
00000020: 6573 736f 7220 696e 2038 3038 3820 6173  essor in 8088 as
00000030: 7365 6d62 6c79 2e0a                      sembly..

Any idea what's wrong and why "is" doesn't decompress? When I decompress with the lzsa app, it works fine.

msufsort

msufsort is usually faster than divsufsort.
https://github.com/michaelmaniscalco/msufsort

Format documentation typo in LZSA2.

Hello, Emmanuel.

Given the simplicity of LZSA2, I am using it as a replacement for the Heatshrink library. The role of the compression library is to assist in OS booting on a 32-bit ARMv7-A processor with megabytes of RAM, running at GHz speeds.

I am using a custom decoder that allows me to feed the decompressor byte-by-byte, just as is the case for Heatshrink. This has allowed me to cut on boot times since I can do something else while I wait for more compressed data to stream into my RAM, even if the decompression is no longer very fast. The bottleneck is the flash memory, which I can access in parallel to the decompression, so this was a worthwhile trade.

I have been relying on the GitHub documentation in creating my decoder. While doing so I have encountered this puzzle:

If the encoded match length is 7 or more, the 'M' bits in the token form the value 7, and an extra nibble is read:
0-14: the value is added to the ***3*** stored in the token, and then the minmatch of 2 is added, to compose the final match length.

Shouldn't this be 7? Or 3 bits?

Thank you for the hard work and for the beautiful file format.

[Question] Cortex M4 assembly decompressor

Salut Emmanuel,

Before I spend time in learning the ARM/Cortex M4 assembly, did you already code an assembly or C-optimized version of the decompressor ?

My hardware targets are STM32F4 and ATSMAD51.

Merci!

Can't decompress raw block backwards on 6502

Likely my own fault but I have;

Compressed data using

        auto packed_size =
            lzsa_compress_inmem(..., LZSA_FLAG_RAW_BACKWARD | LZSA_FLAG_RAW_BLOCK, 0, 1);

Verified the result using the lzma command line tool
Using 6502/decompress_fast_v1.asm for decompression, with -DBACKWARDS_DECOMPRESS=1
Packed data is at $880, unpacker at $400
Passing the address of the last byte (end-1) as source and a high memory location ($c000) as dest

I see the unpacker running, first reading 3 bytes, and then just writing and reading one byte at a time, going further and further down in memory, way past the beginning of the packed data, until it hits the depacker routine (at $400) and crashes.

The first 3 bytes read are the last 3 bytes of the compressed data (0x70 0x2c 0x00, in reverse order) like expected.

Any idea what could be wrong?

MC68000 depacker source

Hi there,

Just letting you know that there is a 68k depack source for both LZSA formats here: https://github.com/tattlemuss/lz4-m68k/blob/master/src/lzsa.s

At the author mentions, it's not super optimised but it's correct, so I figure it's better than starting from scratch :)

lzsa v1.3.6 crashes when built under linux

Hello,

I've update a previous version 1.2.0 of lzsa which compiled and ran fine under linux to 1.3.6.

The new version compiles successfully but fails when executed, reporting

** stack smashing detected ***: terminated

Failure can be invoked using:

lzsa -test

I am not adept at linux or c but am available to assist in testing to help resolve, please let me know what I can do to assist.

Thanks.

Faster 6502 decompressor & suggestions for possible format changes.

Hi, I've been trying out LZSA and I am really impressed with the work that you've done!

I have written a 6502 LZSA2 decompressor that seems to be a bit more efficient than Peter Ferrie's code that you are currently distributing.

I've also identified a couple of potential changes that you might consider making to the LZSA2 format to make it decompress a bit better on the 6502.

Here are some test results (from running the 6502 code on a HuC6280 which has slightly different instruction timings to a standard 6502) ...

lzsa2: Peter Ferrie's 6502 FAST decompression (code 253 bytes long) of gpl-3.0.txt (35147 bytes decompressed)

2723148 cycles : 77 cycles per byte decompressed.

lzsa2: 6502 elmer-standard decompression (code 252 bytes long, 1 bytes shorter) of gpl-3.0.txt (35147 bytes decompressed)

1937239 cycles with LZSA_BEST_SIZE==1 and LZSA_SLOW_NIBL==1 (29% faster than Peter Ferrie's code).

lzse2: 6502 elmer-modified decompression (code 243 bytes long, 10 bytes shorter) of gpl-3.0.txt (35147 bytes decompressed)

1912587 cycles with LZSA_BEST_SIZE==1 and LZSA_SLOW_NIBL==1 (30% faster than Peter Ferrie's code).

24652 cycles saved by changing format.

lzsa2: 6502 elmer-standard decompression (code 267 bytes long, 14 bytes longer) of gpl-3.0.txt (35147 bytes decompressed)

1813175 cycles with LZSA_BEST_SIZE==0 and LZSA_SLOW_NIBL==1 (33% faster than Peter Ferrie's code).

lzse2: 6502 elmer-modified decompression (code 258 bytes long, 5 bytes longer) of gpl-3.0.txt (35147 bytes decompressed)

1788523 cycles with LZSA_BEST_SIZE==0 and LZSA_SLOW_NIBL==1 (35% faster than Peter Ferrie's code).

1788523 cycles : 50 cycles per byte decompressed.

24652 cycles saved by changing format.

For comparison ...

aplib 6502 elmer-standard decompression (code 256 bytes long) of gpl-3.0.txt (35147 bytes decompressed)

2668144 cycles : 75 cycles per byte decompressed.

I'm attaching the decompressor here, and I have checked it into my fork of lzsa, together with the potential changes to the compressor that I wish to bring to your attention.

Note: Of the two potential changes, one seems like it wouldn't cause any slowdown in decompressors on other targets, but the second change probably would.

I can't currently see any simple and elegant way to make your compressor do a platform-specific change to the output format at runtime, but I don't consider that the changes that I propose could justify calling them a separate format.

Anyway, thanks for your consideration!

lzsa2_6502.txt

gcc with C++11 incompatibilities

Marty, many thanks for the LZSA code, I re-used it for my Sav2Cart utility here: https://github.com/nzeemin/ukncbtl-utils/tree/master/Sav2Cartridge

Just FYI, compiling the LZSA code under Ubuntu with gcc, in C++11 mode, I found two small incompatibilities, both are very easy to fix:
1.

lzsa/shrink_block_v2.c:444:28: error: ‘for’ loop initial declarations are only allowed in C99 mode
                            for (int nn = n;
                            ^
lzsa/shrink_block_v2.c:444:28: note: use option -std=c99 or -std=gnu99 to compile your code

lzsa/matchfinder.c:135:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
    for (int r = 1; r < nInWindowSize; r++) {
    ^
lzsa/matchfinder.c:135:4: note: use option -std=c99 or -std=gnu99 to compile your code

fast match finder

Do you know libsais and esa-matchfinder? those are very fast.

emmanuel-marty / lzsa Goto Github PK

lzsa's Issues

Recommend Projects

Recommend Topics

Recommend Org