facebook / zstd
Zstandard - Fast real-time compression algorithm
Home Page: http://www.zstd.net
License: Other
Hi Yann,
The Debian build daemons have found some issues with the test suite on mips, powerpc and s390x systems. If you consider this a bug, the build logs are here. Otherwise, if you consider these architectures unsupported, I can remove them from the list of architectures that zstd will be built on.
Cheers,
Kevin
Hi @Cyan4973
If my understanding is correct, ZSTD_decompressContinue currently expects src to be [compressed block + block header of next block]. This is a problem in scenarios where we want to decompress blocks independently using the framing format, e.g. when we don't have the next block available yet.
Wouldn't it make more sense for ZSTD_decompressContinue to take src as [block header + compressed block]? That way the current block could be decompressed without needing the header of the next block.
Requested by Dimitri
Original discussion : http://fastcompression.blogspot.fr/2015/01/zstd-stronger-compression-algorithm.html?showComment=1424173050454#c7703504284913974280
Hello! Thanks for a great library!
I've recently encountered a problem while decoding data generated by this script:
#!/usr/bin/env python
import sys
n = int(sys.argv[1])
for i in range(0, n):
    sys.stdout.write(chr(i + ord('a')) * (2**i))
When I run:
./generate.py 20 > input
./zstd -f input compressed # Compressed 1048575 bytes into 235 bytes ==> 0.02%
./zstd -f -d compressed decompressed # Segmentation fault
I get segmentation fault.
If I replace 20 with 19 I get:
./generate.py 19 > input
./zstd -f input compressed # Compressed 524287 bytes into 159 bytes ==> 0.03%
./zstd -f -d compressed decompressed # Decoded 262144 bytes
This is weird, because the decompressed size doesn't match the original input size.
I ran into this on OS X 10.10 and on Ubuntu 12.04, with both Clang and GCC 4.9.
This file:
https://crashes.fuzzing-project.org/zstd-oob-heap-ZSTD_copy8
causes an out-of-bounds heap read in zstd. This can be seen with either AddressSanitizer or valgrind.
This was found with american fuzzy lop.
Address Sanitizer output:
==12888==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fc173c8104b at pc 0x0000004e939f bp 0x7ffe115e1a50 sp 0x7ffe115e1a48
READ of size 8 at 0x7fc173c8104b thread T0
#0 0x4e939e in ZSTD_copy8 /f/zst/zstd/programs/../lib/zstd.c:158:56
#1 0x4e939e in ZSTD_wildcopy /f/zst/zstd/programs/../lib/zstd.c:168
#2 0x4e939e in ZSTD_execSequence /f/zst/zstd/programs/../lib/zstd.c:1337
#3 0x4e939e in ZSTD_decompressSequences /f/zst/zstd/programs/../lib/zstd.c:1436
#4 0x4e939e in ZSTD_decompressBlock /f/zst/zstd/programs/../lib/zstd.c:1473
#5 0x4e68b2 in ZSTD_decompressContinue /f/zst/zstd/programs/../lib/zstd.c:1622:21
#6 0x52dcbf in FIO_decompressFrame /f/zst/zstd/programs/fileio.c:396:23
#7 0x52e721 in FIO_decompressFilename /f/zst/zstd/programs/fileio.c:492:21
#8 0x530fe3 in main /f/zst/zstd/programs/zstdcli.c:352:9
#9 0x7fc172bf4f9f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.20-r2/work/glibc-2.20/csu/libc-start.c:289
#10 0x4377c6 in _start (/mnt/ram/zstd/zstd+0x4377c6)
0x7fc173c8104b is located 3 bytes to the right of 141384-byte region [0x7fc173c5e800,0x7fc173c81048)
allocated by thread T0 here:
#0 0x4be792 in __interceptor_malloc (/mnt/ram/zstd/zstd+0x4be792)
#1 0x4e627c in ZSTD_createDCtx /f/zst/zstd/programs/../lib/zstd.c:1560:35
#2 0x530fe3 in main /f/zst/zstd/programs/zstdcli.c:352:9
SUMMARY: AddressSanitizer: heap-buffer-overflow /f/zst/zstd/programs/../lib/zstd.c:158 ZSTD_copy8
Shadow bytes around the buggy address:
0x0ff8ae7881b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff8ae7881c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff8ae7881d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff8ae7881e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff8ae7881f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0ff8ae788200: 00 00 00 00 00 00 00 00 00[fa]fa fa fa fa fa fa
0x0ff8ae788210: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff8ae788220: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff8ae788230: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff8ae788240: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff8ae788250: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==12888==ABORTING
I'm currently updating my wrapper for the new API introduced in 0.4.x, and I'm merging all the methods with and without compression levels into a single set of methods.
Asking the caller to pass an untyped integer value for the compression level seems a bit dangerous: most people will probably remember zlib, and expect a 1 - 9 scale with default at 5. That's why I would like to have an enum with well known names, and a clear default "if you don't know better use that one" level.
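To make the idea concrete, here is a minimal sketch of what such a wrapper-side enum could look like in Python; the names and the mapping onto zstd's integer levels are illustrative assumptions, not part of any official binding.

```python
from enum import IntEnum

# Hypothetical level names mapped onto zstd's integer scale.
# The specific names and values (including treating 3 as the default)
# are assumptions for illustration, not the official API.
class CompressionLevel(IntEnum):
    FAST = 1       # lowest ratio, highest speed
    DEFAULT = 3    # the "if you don't know better, use this" level
    HIGH = 9
    MAX = 22       # upper bound of the regular level range
```

Since IntEnum members compare equal to plain integers, existing callers that pass raw ints keep working, while new code gets readable, well-known names.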
Quick questions:
The new HTTP/2 protocol compresses headers with HPACK, using a static table plus a dynamic, expanding table:
talk: https://youtu.be/r5oT_2ndjms?list=PLNYkxOF6rcICcHeQY02XLvoGL34rZFWZn&t=820
spec: https://httpwg.github.io/specs/rfc7541.html
implementation: https://github.com/twitter/hpack/tree/master/hpack/src/main/java/com/twitter/hpack
Might make cookies make a comeback.
VS2013 and Xcode 7.1.1/ASan detect a stack buffer overflow in the released v0.4.3 with specific input data.
Use the following PVRTC4-compressed sample image with the fullbench app, called without additional parameters.
https://www.dropbox.com/s/tlgr7lxpmtiq4yw/sample.pvr?dl=0
*** Zstandard speed analyzer 32-bits, by Yann Collet (Dec 10 2015) ***
D:\work\zstd\release-v043\visual\2012\sample.pvr :
1- ZSTD_compress : 3.4 MB/s ( 2120226)
11- ZSTD_decompress : 7.4 MB/s ( 2796340)
31- ZSTD_decodeLiteralsBlock : 11.8 MB/s ( 74273)
1- ZSTD_decodeSeqHeaders :
Run-Time Check Failure #2 - Stack around the variable 'DTableOffb' was corrupted.
> fullbench.exe!local_ZSTD_decodeSeqHeaders(void * dst, unsigned int dstSize, void * buff2, const void * src, unsigned int srcSize) Line 244 C
fullbench.exe!benchMem(void * src, unsigned int srcSize, unsigned int benchNb) Line 358 C
fullbench.exe!benchFiles(char * * fileNamesTable, int nbFiles, unsigned int benchNb) Line 468 C
fullbench.exe!main(int argc, char * * argv) Line 584 C
[External Code]
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
AddressSanitizer debugger support is active. Memory error breakpoint has been installed and you can now use the 'memory history' command.
2015-12-10 12:48:24.980 TestLZ4[2018:627417] Started
*** Zstandard speed analyzer 64-bits, by Yann Collet (Dec 10 2015) ***
Loading /var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/sample.pvr...
/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/sample.pvr :
1- ZSTD_compress :
1- ZSTD_compress : 4.1 MB/s ( 2120226)
2- ZSTD_compress :
2- ZSTD_compress : 4.1 MB/s ( 2120226)
3- ZSTD_compress :
3- ZSTD_compress : 4.1 MB/s ( 2120226)
4- ZSTD_compress :
4- ZSTD_compress : 4.1 MB/s ( 2120226)
5- ZSTD_compress :
5- ZSTD_compress : 4.1 MB/s ( 2120226)
6- ZSTD_compress :
6- ZSTD_compress : 4.1 MB/s ( 2120226)
1- ZSTD_compress : 4.1 MB/s ( 2120226)
1- ZSTD_decompress :
1- ZSTD_decompress : 15.0 MB/s ( 2796340)
2- ZSTD_decompress :
2- ZSTD_decompress : 15.0 MB/s ( 2796340)
3- ZSTD_decompress :
3- ZSTD_decompress : 15.0 MB/s ( 2796340)
4- ZSTD_decompress :
4- ZSTD_decompress : 15.0 MB/s ( 2796340)
5- ZSTD_decompress :
5- ZSTD_decompress : 15.0 MB/s ( 2796340)
6- ZSTD_decompress :
6- ZSTD_decompress : 15.0 MB/s ( 2796340)
11- ZSTD_decompress : 15.0 MB/s ( 2796340)
1- ZSTD_decodeLiteralsBlock :
1- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
2- ZSTD_decodeLiteralsBlock :
2- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
3- ZSTD_decodeLiteralsBlock :
3- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
4- ZSTD_decodeLiteralsBlock :
4- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
5- ZSTD_decodeLiteralsBlock :
5- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
6- ZSTD_decodeLiteralsBlock :
6- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
31- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
1- ZSTD_decodeSeqHeaders :
=================================================================
==2018==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00016fd84ee2 at pc 0x0001000c29ec bp 0x00016fd81430 sp 0x00016fd81428
WRITE of size 1 at 0x00016fd84ee2 thread T0
#0 0x1000c29eb in FSE_buildDTable (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x10004a9eb)
#1 0x10009ff7f in ZSTD_decodeSeqHeaders (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100027f7f)
#2 0x1000bcdaf in local_ZSTD_decodeSeqHeaders (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100044daf)
#3 0x1000bd683 in benchMem (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100045683)
#4 0x1000be247 in benchFiles (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100046247)
#5 0x1000bee27 in zstd_start_benchmark (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100046e27)
#6 0x10009d15b in -[ViewController doAsyncTestButton:] (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x10002515b)
#7 0x189ca7cfb in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x4fcfb)
#8 0x189ca7c77 in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x4fc77)
#9 0x189c8f92f in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x3792f)
#10 0x189cb03cb in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x583cb)
#11 0x189ca7013 in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x4f013)
#12 0x189c9fcdb in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x47cdb)
#13 0x189c704a3 in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x184a3)
#14 0x189c6e76b in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x1676b)
#15 0x184694543 in <redacted> (/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation+0xdc543)
#16 0x184693fd7 in <redacted> (/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation+0xdbfd7)
#17 0x184691cd7 in <redacted> (/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation+0xd9cd7)
#18 0x1845c0c9f in CFRunLoopRunSpecific (/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation+0x8c9f)
#19 0x18f7fc087 in GSEventRunModal (/System/Library/PrivateFrameworks/GraphicsServices.framework/GraphicsServices+0xc087)
#20 0x189cd8ffb in UIApplicationMain (/System/Library/Frameworks/UIKit.framework/UIKit+0x80ffb)
#21 0x1000f724f in main (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x10007f24f)
#22 0x199ade8b7 in <redacted> (/usr/lib/system/libdyld.dylib+0x28b7)
Address 0x00016fd84ee2 is located in stack of thread T0 at offset 12578 in frame
#0 0x1000bcb27 in local_ZSTD_decodeSeqHeaders (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100044b27)
This frame has 6 object(s):
[32, 8224) 'DTableML'
[8480, 12576) 'DTableLL' <== Memory access at offset 12578 overflows this variable
[12704, 14752) 'DTableOffb'
[14880, 14888) 'dumps'
[14912, 14920) 'length'
[14944, 14948) 'nbSeq'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ??:0 FSE_buildDTable
Shadow bytes around the buggy address:
0x00014e1b0980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b0990: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b09a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b09b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b09c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x00014e1b09d0: 00 00 00 00 00 00 00 00 00 00 00 00[f2]f2 f2 f2
0x00014e1b09e0: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
0x00014e1b09f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b0a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b0a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b0a20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==2018==ABORTING
AddressSanitizer report breakpoint hit. Use 'thread info -s' to get extended information about the report.
(lldb) bt
* thread #1: tid = 0x992d9, 0x00000001001650c4 libclang_rt.asan_ios_dynamic.dylib`__asan::AsanDie(), queue = 'com.apple.main-thread', stop reason = Stack buffer overflow detected
frame #0: 0x00000001001650c4 libclang_rt.asan_ios_dynamic.dylib`__asan::AsanDie()
frame #1: 0x0000000100168b80 libclang_rt.asan_ios_dynamic.dylib`__sanitizer::Die() + 44
frame #2: 0x0000000100163ed4 libclang_rt.asan_ios_dynamic.dylib`__asan::ScopedInErrorReport::~ScopedInErrorReport() + 336
frame #3: 0x0000000100163c6c libclang_rt.asan_ios_dynamic.dylib`__asan::ScopedInErrorReport::~ScopedInErrorReport() + 12
frame #4: 0x00000001001637e8 libclang_rt.asan_ios_dynamic.dylib`__asan_report_error + 3216
frame #5: 0x00000001001641f8 libclang_rt.asan_ios_dynamic.dylib`__asan_report_store1 + 44
* frame #6: 0x00000001000c29ec TestLZ4`FSE_buildDTable(dt=0x000000016fd83ee0, normalizedCounter=0x000000016fd818f0, maxSymbolValue=63, tableLog=10) + 856 at fse.c:373
frame #7: 0x000000010009ff80 TestLZ4`ZSTD_decodeSeqHeaders(nbSeq=0x000000016fd85820, dumpsPtr=0x000000016fd857e0, dumpsLengthPtr=0x000000016fd85800, DTableLL=0x000000016fd83ee0, DTableML=0x000000016fd81de0, DTableOffb=0x000000016fd84f60, src=0x000000010a574800, srcSize=11133) + 3100 at zstd_decompress.c:377
frame #8: 0x00000001000bcdb0 TestLZ4`local_ZSTD_decodeSeqHeaders(dst=0x000000010a2bc800, dstSize=2818710, buff2=0x000000010a574800, src=0x0000000108404800, srcSize=131072) + 664 at zstd_fullbench.c:243
frame #9: 0x00000001000bd684 TestLZ4`benchMem(src=0x0000000108404800, srcSize=131072, benchNb=32) + 2116 at zstd_fullbench.c:358
frame #10: 0x00000001000be248 TestLZ4`benchFiles(fileNamesTable=0x000000016fd85e98, nbFiles=1, benchNb=32) + 1100 at zstd_fullbench.c:468
frame #11: 0x00000001000bee28 TestLZ4`zstd_start_benchmark(argc=2, argv=0x000000016fd85e90) + 2188 at zstd_fullbench.c:584
frame #12: 0x000000010009d15c TestLZ4`-[ViewController doAsyncTestButton:](self=0x0000000107507880, _cmd="doAsyncTestButton:", sender=<unavailable>) + 792 at ViewController.mm:55
frame #13: 0x0000000189ca7cfc UIKit`-[UIApplication sendAction:to:from:forEvent:] + 100
frame #14: 0x0000000189ca7c78 UIKit`-[UIControl sendAction:to:forEvent:] + 80
frame #15: 0x0000000189c8f930 UIKit`-[UIControl _sendActionsForEvents:withEvent:] + 416
frame #16: 0x0000000189cb03cc UIKit`-[UIControl touchesBegan:withEvent:] + 268
frame #17: 0x0000000189ca7014 UIKit`-[UIWindow _sendTouchesForEvent:] + 376
frame #18: 0x0000000189c9fcdc UIKit`-[UIWindow sendEvent:] + 784
frame #19: 0x0000000189c704a4 UIKit`-[UIApplication sendEvent:] + 248
frame #20: 0x0000000189c6e76c UIKit`_UIApplicationHandleEventQueue + 5528
frame #21: 0x0000000184694544 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
frame #22: 0x0000000184693fd8 CoreFoundation`__CFRunLoopDoSources0 + 540
frame #23: 0x0000000184691cd8 CoreFoundation`__CFRunLoopRun + 724
frame #24: 0x00000001845c0ca0 CoreFoundation`CFRunLoopRunSpecific + 384
frame #25: 0x000000018f7fc088 GraphicsServices`GSEventRunModal + 180
frame #26: 0x0000000189cd8ffc UIKit`UIApplicationMain + 204
frame #27: 0x00000001000f7250 TestLZ4`main(argc=1, argv=0x000000016fd87a90) + 124 at main.m:16
frame #28: 0x0000000199ade8b8 libdyld.dylib`start + 4
(lldb)
In an existing, fairly large iOS app, the invocation of ZSTD_decompress (this exact function, not its internals) may crash after several dozen successful calls; it looks like a corrupted-stack problem. I haven't managed to isolate it in a separate sample; I only found the fullbench issue above. I'm still not totally sure whether both issues share the same root cause, but they may.
So it seems that zstd can now help with serving small amounts of JSON (say, from a NoSQL DB), assuming the data is similar (e.g. common object names)?
I'm very excited to see work being done on dictionary support in the API, because this is something that could greatly help me solve a pressing problem.
Consider a Document Store, where we are storing a set of JSON-like documents that share the same schema. Each document can be created, read or updated individually, in a random fashion. We would like to compress the documents on disk, but there is very little redundancy within each document, which yields a very poor compression ratio (maybe 10-20%).
When compressing batches of tens or hundreds of documents, the compression ratio gets really good (10x, 50x or sometimes even more), because there is a lot of redundancy between documents: structural symbols like ": ", ": true, or {, [, ], }; field names like "Id", "Name", "Label", "SomeVeryLongFieldNameThatIsPresentOnlyOncePerDocument", and so on; common values like true, "Red", "Administrator", ...; keywords; dates that all start with 2015-12-14T... for the next 24h; and even well-known or frequently used GUIDs that are shared by documents (Product Category, Tag Id, hugely popular nodes in graph databases, ...).
In the past, I used femtozip (https://github.com/gtoubassi/femtozip), which is intended precisely for this use case. It includes a dictionary training step (built from a sample batch of documents), whose output is then used to compress and decompress single documents, with the same compression ratio as if they were a batch. Using real-life data, compressing 1000 documents individually would give the same compression ratio as compressing all 1000 documents in a batch with gzip -5.
The dictionary training part of femtozip can take very long: the more samples, the better the compression ratio will be in the end, but you need tons of RAM to train it.
Also, I realized that femtozip would sometimes offset the differences in size between different formats like JSON/BSON/JSONB/ProtoBuf and other binary formats, because it would pick up the "grammar" of the format (text or binary) in the dictionary, and only deal with the "meat" of the documents (guids, integers, doubles, natural text) when compressing. This means I can use a format like JSONB (used by Postgres) which is less compact, but is faster to decode at runtime than JSON text.
I would like to be able to do something similar with Zstandard. I don't really care about building the most efficient dictionary (though it would be nice), but I'd like at least to exploit the fact that FSE builds a list of tokens sorted by frequency. Extracting this list of tokens may help in building a dictionary that contains the most common tokens of the training batch.
The goal would be:
To compress a document D: compress SAMPLES[cur_gen] + D.json, and only store the bits produced by the D.json part.
To decompress: decompress SAMPLES[D.gen] + D.compressed, and only keep the last decoded bits that make up D.
Since it would be impractical to change the compression code so it knows which compressed bits come from D and which from the batch, we could approximate this by computing a DICTIONARY[gen] that would be used to initialize the compressor and decompressor.
The Document Store would durably store each generations of dictionaries, and use them to decompress older entries. Periodically, it could recycle the entire store by recompressing everything with the most recent dictionary.
Concrete example:
Training set:
{ "id": 123, "label": "Hello", "enabled": true, "uuid": "9ad51b87-d627-4e04-85c2-d6cb77415981" }
{ "id": 126, "label": "Hell", "enabled": false, "uuid": "0c8e13a5-cdc8-4e1f-8e80-4fee025ee59c" }
{ "id": 129, "label": "Help", "enabled": true, "uuid": "fe6db321-cddd-4e7f-b3d6-6b38365b3e2a" }
Looking at it, we can extract the following repeating segments: { "id": 12, then , "label": "Hel, then ", "enabled": , then e, "uuid": ", and finally " }. These could be condensed into:
{ "id": 12, "label": "Hel", "enabled": e, "uuid": "" }
(53 bytes shared by all docs). The unique part of each document would then be:
3...lo...tru...9ad51b87-d627-4e04-85c2-d6cb77415981 (42 bytes)
6...l...fals...0c8e13a5-cdc8-4e1f-8e80-4fee025ee59c (42 bytes)
9......tru...fe6db321-cddd-4e7f-b3d6-6b38365b3e2a (40 bytes)
Zstd would only have to work on about 42 bytes per doc, instead of 85 bytes. More realistic documents will have a lot more in common than this example.
As a quick experiment, I built a "gen 0" dictionary that looks like { "id": , "foo": "", "bar": "", ....}, produced by removing all values from the JSON document. Using it with ZSTD_compress_insertDictionary, compressing { "id": 123, "foo": "Hello", "bar": "World", ...} is indeed smaller than without a dictionary; the residual is essentially 123HelloWorld, which is exactly the content specific to the document itself that was removed when producing the gen-0 dict. Maybe one way to construct a better dictionary would be to look at how ZSTD_decompress_insertDictionary branches off into different implementations for lazy, greedy and so on; I'm not sure whether every compression strategy can be used to produce such a dictionary. Again, I don't care about producing the ideal dictionary that yields the smallest result possible, only something that gives a noticeably better compression ratio while still being able to handle documents in isolation.
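The per-document dictionary idea can be sketched with the Python standard library, using zlib's preset-dictionary support as a stand-in for ZSTD_compress_insertDictionary; the shared string and document are taken from the example above, and the helper names are made up for illustration.

```python
import zlib

# Shared "grammar" extracted from the training set (stand-in dictionary).
shared = b'{ "id": 12, "label": "Hel", "enabled": e, "uuid": "" }'

doc = (b'{ "id": 123, "label": "Hello", "enabled": true, '
       b'"uuid": "9ad51b87-d627-4e04-85c2-d6cb77415981" }')

def compress_with_dict(data, zdict):
    # zdict primes the compressor, so matches can reference the dictionary
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def decompress_with_dict(blob, zdict):
    # the decompressor must be primed with the same dictionary
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

plain = zlib.compress(doc, 9)              # no dictionary
primed = compress_with_dict(doc, shared)   # primed with the shared grammar
assert decompress_with_dict(primed, shared) == doc
```

With the dictionary, the compressor only has to encode the document-specific "meat" (the id digits, the label suffix, the GUID), so the primed output is smaller than the document itself even at this tiny size.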
ZSTD_decompressContinue() requires the previously decompressed buffer to still be available, which is inconsistent with the zstd_static.h documentation.
PS. fileio.c is GPL again... Do you plan to reconsider? Also, FIO_decompressFilename()'s wNbBlocks is a kind of magic number (because it is undocumented).
Note: You can use http://pastebin.com/XTTUSyiA as example code.
The enum in error_public.h is anonymous. It would be nice if you included a tag or a typedef, if only to help make -Wswitch / -Wswitch-enum usable. AFAIK there is currently no way to get the compiler to warn if a switch does not include a case for every enum value. I would like to be able to do something like
size_t res = …;
if (ZSTD_isError(res)) {
    switch ((ZSTD_ErrorCode) -res) {
    case ZSTD_error_No_Error:
        /* … */
        break;
    }
}
Note the cast in the controlling expression; without it the type would just be size_t, and the compiler wouldn't realize there is any association with the enum, so it would not warn when a case for a particular enum value is missing.
FWIW, I still think it would be easier to do something like
typedef enum {
    ZSTD_error_No_Error = 0,
    ZSTD_error_GENERIC,
    ZSTD_error_prefix_uknown,
    …
} ZSTD_ErrorCode;

ZSTD_ErrorCode ZSTD_getError(size_t code);
ZSTD_getError would work just like ZSTD_isError does now; you could still do if (ZSTD_getError(code)) { … }, since ZSTD_error_No_Error would be 0 and errors would be positive integers, but it would make the API a bit easier to understand, since it hides the rather odd detail of negating an unsigned type.
Another thing that would be nice is a consistency check. For example, note that each of the values I wrote in the enum above (chosen because they're the first three entries in the enum in zstd, not to make this point) have different capitalization conventions.
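To illustrate the switch-coverage point: with a tagged enum, -Wswitch (or -Wswitch-enum) can flag a switch that omits an enum value. A self-contained sketch with a toy enum standing in for ZSTD_ErrorCode (the names below are illustrative, not zstd's):

```c
#include <stddef.h>

/* Toy enum standing in for ZSTD_ErrorCode. */
typedef enum {
    TOY_error_No_Error = 0,
    TOY_error_GENERIC,
    TOY_error_prefix_unknown
} TOY_ErrorCode;

const char* toy_error_name(TOY_ErrorCode code) {
    switch (code) {   /* typed controlling expression => -Wswitch can check it */
    case TOY_error_No_Error:       return "no error";
    case TOY_error_GENERIC:        return "generic error";
    case TOY_error_prefix_unknown: return "prefix unknown";
    }
    return "unknown code";   /* unreachable for valid enum values */
}
```

Deleting any one of the case labels above makes the compiler emit a missing-case warning, which is exactly what an anonymous enum cannot provide.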
I've run into a bit of data which fails to compress. The failure case is a binary file in the following gist:
https://gist.github.com/mwiebe/c54c790288b8e16a7970
C:\Dev>dir bad.bin
Volume in drive C is Windows
Volume Serial Number is DA14-224A
Directory of C:\Dev
2015-03-23 02:17 PM 2,469 bad.bin
1 File(s) 2,469 bytes
0 Dir(s) 273,715,097,600 bytes free
C:\Dev>"C:\Dev\zstd\visual\2012\x64\Release\zstd.exe" bad.bin out.bin
Error 24 : Compression error : ZSTD_ERROR_GENERIC
Hi,
I have tried to run some tests in parallel and noticed them failing non-deterministically.
When I say "fails", it is with a "corruption_detected" error. When I say "non-deterministically", it means that a consecutive run with the exact same inputs succeeds. I suspected my code was not thread-safe, so I ran, in parallel, different classes that don't share any code except the libzstd binary (like the cases above) to rule out my own faults. One additional observation: decompressContinue fails only if the original size exceeds some threshold, e.g. 1M for levels 1, 3 and 6; 2M for level 9. The parallel ZSTD_compress is running with random small buffers (0-32k) when decompressContinue fails.
So some questions:
Regards,
luben
When compiling with -fsanitize=undefined:
/buffer/zstd/zstd: /home/nemequ/local/src/squash/plugins/zstd/zstd/lib/zstd.c:918:54: runtime error: load of misaligned address 0x000000401b49 for type 'const void', which requires 8 byte alignment
0x000000401b49: note: pointer points here
00 00 00 4c 6f 72 65 6d 20 69 70 73 75 6d 20 64 6f 6c 6f 72 20 73 69 74 20 61 6d 65 74 2c 20 63
^
/home/nemequ/local/src/squash/plugins/zstd/zstd/lib/zstd.c:183:44: runtime error: load of misaligned address 0x000000401b49 for type 'const void', which requires 4 byte alignment
0x000000401b49: note: pointer points here
00 00 00 4c 6f 72 65 6d 20 69 70 73 75 6d 20 64 6f 6c 6f 72 20 73 69 74 20 61 6d 65 74 2c 20 63
^
/home/nemequ/local/src/squash/plugins/zstd/zstd/lib/zstd.c:185:47: runtime error: load of misaligned address 0x000000401b51 for type 'const void', which requires 8 byte alignment
0x000000401b51: note: pointer points here
20 69 70 73 75 6d 20 64 6f 6c 6f 72 20 73 69 74 20 61 6d 65 74 2c 20 63 6f 6e 73 65 63 74 65 74
^
OK
https://github.com/google/brotli
Would be nice to see on your performance test chart. Perhaps you guys can learn from each other?
I'm curious: is this compression web-compatible?
Hi Yann,
I get a segmentation fault during compression of a specific data buffer, inside FSE_normalizeCount. The specific line is the call to FSE_adjustNormSlow at https://github.com/Cyan4973/zstd/blob/dev/lib/fse.c#L696
The stack frames are all screwed up when examining the core file, but I managed to get the following output when running through valgrind:
==32677== Invalid write of size 8
==32677== at 0x64DD4E7: FSE_normalizeCount (fse.c:696)
==32677== by 0x157: ???
==32677== by 0xFFEFF33AF: ???
==32677== by 0x9000000FE: ???
==32677== by 0xFFEFF31AF: ???
==32677== by 0x40012FFFFFFFF: ???
==32677== by 0x100FFFFFF9C: ???
==32677== Address 0xd5 is not stack'd, malloc'd or (recently) free'd
I can reliably reproduce this fault, so I might be able to provide a test case for you, given some time.
Hi,
I am running into some issues with decompressing some of the results of ZSTD_HC_compress, using the very simple test case pasted below.
Trying to decompress the result of compressing a 0-byte buffer with level > 1, I get "ZSTD_error_corruption_detected"; with larger buffer sizes I get "ZSTD_error_srcSize_wrong". Once I pass a certain threshold (15 in this case, but it depends on the payload), everything starts to work correctly.
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>
#include <zstdhc.h>

int main(int argc, char **argv) {
    char raw[20] = {1,1,1,1,1, 1,1,1,1,1, 1,1,1,1,1, 1,1,1,1,1};
    char compressed[50];
    char decompressed[50];
    size_t ccode, dcode;
    size_t size = 2;
    int level = 2;

    ccode = ZSTD_HC_compress(compressed, 50, raw, size, level);
    printf("Compression code %zu\n", ccode);   /* %zu for size_t */
    if (ZSTD_isError(ccode)) {
        printf("Compression error %s\n", ZSTD_getErrorName(ccode));
    }
    dcode = ZSTD_decompress(decompressed, 50, compressed, ccode);
    printf("Decompression code %zu\n", dcode);
    if (ZSTD_isError(dcode)) {
        printf("Decompression error %s\n", ZSTD_getErrorName(dcode));
    }
    return 0;
}
https://github.com/Cyan4973/zstd/archive/zstd-0.4.0.zip
Changed only:
GCC returns tens of errors (mainly undeclared functions):
gcc.exe -Wno-unknown-pragmas -Wno-sign-compare -Wno-conversion -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math -O3 -DNDEBUG -DFREEARC_WIN -D__x86_64__ -D__SSE2__ -I. -DFREEARC_INTEL_BYTE_ORDER -D_UNICODE -DUNICODE -HAVE_CONFIG_H zstd/zstd.c -std=c99 -c -o zstd/zstd.o
zstd/zstd.c:608:8: error: conflicting types for 'ZSTD_compressBegin'
size_t ZSTD_compressBegin(ZSTD_CCtx* ctx, void* dst, size_t maxDstSize)
^
In file included from zstd/zstd.c:70:0:
zstd/zstd_static.h:124:8: note: previous declaration of 'ZSTD_compressBegin' was here
size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, void* dst, size_t maxDstSize, int compressionLevel);
^
zstd/zstd.c: In function 'ZSTD_compressBegin':
zstd/zstd.c:617:24: error: 'ZSTD_magicNumber' undeclared (first use in this function)
MEM_writeLE32(dst, ZSTD_magicNumber);
^
zstd/zstd.c:617:24: note: each undeclared identifier is reported only once for each function it appears in
zstd/zstd.c: At top level:
zstd/zstd.c:774:8: error: conflicting types for 'ZSTD_compressCCtx'
size_t ZSTD_compressCCtx(ZSTD_CCtx* ctx, void* dst, size_t maxDstSize, const void* src, size_t srcSize)
Hi. It seems that zstd will read illegal pointers and crash when presented with mangled archives. Here's one such example file (GitHub doesn't allow binary attachments, so I'm providing a hex dump):
0000000 fd 2f b5 1c 00 00 1c 40 00 12 31 32 31 31 31 31
0000020 31 31 31 31 32 32 32 32 32 32 32 0a 10 98 00 ff
0000040 7f 00 84 c0 00 00
Here's what gdb has to say about this problem:
(gdb) run -d <example.zst >example
Starting program: zstd -d <example.zst >example
Program received signal SIGSEGV, Segmentation fault.
0x0000000000410965 in ZSTD_decompressBlock (srcSize=28, src=0x801011000, maxDstSize=524288, dst=0x801032000, ctx=0x801006000) at lib/zstd.c:1533
(gdb) bt
#0 0x0000000000410965 in ZSTD_decompressBlock (srcSize=28, src=0x801011000, maxDstSize=524288, dst=0x801032000, ctx=0x801006000) at lib/zstd.c:1533
#1 ZSTD_decompressContinue (dctx=0x801006000, dst=0x801032000, maxDstSize=524288, src=0x801011000, srcSize=31) at lib/zstd.c:1680
#2 0x0000000000408681 in FIO_decompressFilename (output_filename=0x410f65 "-",input_filename=0x410f65 "-") at programs/fileio.c:363
#3 0x0000000000401a4d in main (argc=2, argv=0x7fffffffd9d0) at programs/zstdcli.c:314
This is with zstd as of commit 00f9507; the crash is located over here. The problem is that ZSTD_decompressBlock does not validate how big matchLength can get; in this case it is equal to 8650883, while maxDstSize is only 524288 bytes, which results in an attempt to copy past the end of the output buffer.
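For illustration, the kind of guard the decoder needs looks roughly like this; the function and variable names are made up to mirror the report, not the actual zstd internals.

```c
#include <stddef.h>

#define SEQ_ERROR ((size_t)-1)

/* Hypothetical guard: reject a sequence whose match would run past the
 * end of the output buffer. Names mirror the report, not zstd's code. */
size_t check_match_length(size_t matchLength,
                          size_t alreadyWritten,
                          size_t maxDstSize)
{
    if (alreadyWritten > maxDstSize ||
        matchLength > maxDstSize - alreadyWritten)
        return SEQ_ERROR;   /* corruption: would copy past end of dst */
    return matchLength;
}
```

With the values from the report (matchLength 8650883, maxDstSize 524288), such a check rejects the sequence instead of copying out of bounds.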
Could you add a #define to optionally disable compilation of the decompression code? Thanks.
searchLength=7 in ZSTD_parameters is permitted, but it seems to give exactly the same compression and speed results as searchLength=4. Possibly this is because the switches on matchLengthSearch in the code have cases for 4-6 but not 7.
I was looking at making simple Python wrappers for the zstd library, and I ran into the following issue when compiling on Windows using VS2005:
lib\zstd.c(69) :fatal error C1083: Cannot open include file: 'immintrin.h': No such file or directory
It appears to come from this line.
zstd_static.h:74:38: warning: comma at end of enumerator list [-Wpedantic]
When compressing a large 6GB binary file, a compression error occurs. After some investigation, it appears to happen when there are more symbols than the max symbol limit. The offending line is here:
https://github.com/Cyan4973/zstd/blob/master/lib/fse.c#L1458
A simple temporary fix is just deleting this line, but my guess is that this isn't a good solution. Increasing the max symbol limit didn't seem to work, but I'm not that familiar with the code base so I'm sure I missed something.
I need to send large core dumps from embedded device to server, and do it in minimal time. I've found zstd to be optimal solution because it is very fast and has good compression ratio. I'm using it like this:
zstd -c | curl -T - $url
where kernel fills stdin of zstd
However, if the user has narrow upload bandwidth, it could be beneficial to switch from fast compression to high compression, which in my case is 30% more effective. For example, if in the first 10 seconds zstd detects that compression throughput is N times larger than the write speed (in this case, to stdout), it automatically switches to high compression and uses it up to the end of the input stream.
Would such feature make sense?
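For what it's worth, the core of such a heuristic is small. A minimal sketch (hypothetical function, not a zstd API), assuming the tool periodically samples its own compression and output-write throughput:

```c
/* Hypothetical adaptive-level heuristic: when compression runs much
 * faster than the output can be written, the spare CPU time can be
 * spent on a higher (slower, stronger) compression level. */
static int pick_level(double compressMBps, double writeMBps,
                      int curLevel, int maxLevel)
{
    /* output is the bottleneck by 4x or more: compress harder */
    if (writeMBps > 0 && compressMBps >= 4.0 * writeMBps && curLevel < maxLevel)
        return curLevel + 1;
    return curLevel;
}
```

The 4x threshold and the sampling window are arbitrary choices here; a real implementation would also want to lower the level again if the available bandwidth recovers.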
When I try to decompress a freshly compressed file, it is correctly decompressed but at the end I get:
Error 35 : Read error
I've got an unexpected error for huge (>4GiB ?) data generated by xorshift. Here is a test case: https://gist.github.com/t-mat/a7e93d4767b991e191ea
It generates the same error both on Ubuntu 14.04 (x64) / gcc 4.8.2 and Windows 7 SP1 (x64) / MSVC++2013.
source data : 4295000064 bytes
zstd compressed : 1350556646 bytes
zstd decompressed : 4295000064 bytes
Data error : offset @0x100003c41
I can prepare and push a CMakeLists.txt file for generating solution or make files on different platforms. Would this feature be useful?
This is with zstd as of commit 765207c
In our case (TokuDB), ZSTD_isError sometimes returns true, because:
#1 0x0000000000ca8836 in ZSTD_compressSequences (dst=0x2aaac4c0623a "\034", maxDstSize=11084, seqStorePtr=Unhandled dwarf expression opcode 0xf3
)
...
Breakpoint 8, ZSTD_isError (code=18446744073709551615)
My inputs: srcLen=10486, dstLen=11092.
I guess it's a bug in zstd that causes cSize to be 2^64 - 1.
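The value 2^64 - 1 (i.e. (size_t)-1) seen in the breakpoint is consistent with how zstd reports failures: error codes are returned in the top range of size_t, which is why every return value must be filtered through ZSTD_isError() before being used as a size. A self-contained illustration of the encoding idea (hypothetical names, not zstd's actual internals):

```c
#include <stddef.h>

/* Hypothetical illustration: error code E is returned as (size_t)-E,
 * so one size_t return value carries either a valid size or an error. */
#define MY_MAX_ERROR_CODE 120

static size_t make_error(int e)  { return (size_t)-e; }
static int    is_error(size_t v) { return v > (size_t)-MY_MAX_ERROR_CODE; }
```

Under this scheme, the 18446744073709551615 in the gdb output is simply (size_t)-1, i.e. an error indicator, not a genuine compressed size.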
I'm currently doing some tests and benchmarks on the best way to compress vectors of numerical data using zstd (and various filters to help compression, such as delta or shuffle).
I found a few oddities while trying various sample sets, and I'm not sure if this is expected or not.
note: All the tests below are done with the 0.5.1 release.
The most obvious anomaly I found was when compressing vectors of 64-bit floats produced by starting from 100.0 and randomly removing a tiny amount at each step (a few tens of %) while keeping the full precision (i.e. 17 decimals); the sample set looks like this (with each value encoded as an IEEE 64-bit double):
XS: [
99.9996492024556, 99.9996492024556, 99.9996492024556, 99.9996492024556, 99.9996492024556,
99.9959685493382, 99.9934385250828, 99.9915028228664, 99.9913987684419, 99.9876946097741,
99.9832594119447, 99.9832594119447, 99.9827409006435, 99.9827409006435, 99.9792033732376,
99.9792033732376, 99.9792033732376, 99.9792033732376, 99.9792033732376, 99.9779770870381,
...,
63.9185913211865, 63.9185913211865, 63.9185913211865, 63.9183349283913, 63.9173471531838,
63.9129878772782, 63.9129878772782, 63.9129878772782, 63.9129878772782, 63.910876784327,
63.910876784327, 63.9107775813923, 63.9107775813923, 63.9107775813923, 63.9107775813923,
63.9107775813923, 63.9077360819731, 63.9077360819731, 63.9059456902976, 63.9059456902976
]
I'm also compressing the delta-encoded vector (0 = no change):
DELTA(XS): [
99.9996492024556, 0, 0, 0, 0,
-0.00368065311744203, -0.00253002425540672, -0.00193570221631489, -0.000104054424497235, -0.00370415866780149,
-0.00443519782946566, 0, -0.000518511301152103, 0, -0.00353752740591062,
0, 0, 0, 0, -0.00122628619951115,
...,
-0.000363898205272051, 0, 0, -0.000256392795243698, -0.000987775207491381,
-0.00435927590558549, 0, 0, 0, -0.00211109295120337,
0, -9.92029346988943E-05, 0, 0, 0,
0, -0.00304149941915455, 0, -0.00179039167556283, 0
]
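For reference, the delta filter used above can be sketched as a generic in-place transform (an illustration only, not the benchmark's actual code); runs of repeated values become runs of 0.0, which compress far better:

```c
#include <stddef.h>

/* Delta-encode in place: keep v[0], then store differences.
 * Iterates backwards so each difference uses the original neighbour. */
static void delta_encode(double *v, size_t n)
{
    for (size_t i = n; i-- > 1; )
        v[i] -= v[i - 1];
}

/* Inverse transform: a running sum restores the original values. */
static void delta_decode(double *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```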
Now my benchmark compresses both vectors (N=28,800) encoded as 4-byte or 8-byte elements (115KB / 230KB), using multiple codecs (lz4, zstd, zlib) and also filtering (none, blosc-like shuffle), and measures both the ratio and the time to encode.
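The blosc-like shuffle filter mentioned above can be sketched like this (hypothetical helper): it transposes the array by byte position, so the slowly-varying sign/exponent bytes of neighbouring doubles end up adjacent and compress better:

```c
#include <stddef.h>

/* Byte shuffle for n elements of elemSize bytes each:
 * out[b*n + i] = in[i*elemSize + b], i.e. all byte-0s first,
 * then all byte-1s, and so on (a bytewise transpose). */
static void shuffle(unsigned char *out, const unsigned char *in,
                    size_t n, size_t elemSize)
{
    for (size_t i = 0; i < n; i++)
        for (size_t b = 0; b < elemSize; b++)
            out[b * n + i] = in[i * elemSize + b];
}
```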
The full benchmark results can be found here: https://gist.github.com/KrzysFR/0f6835c7a8d0f19dbdc3 (warning: lots of data and ASCII charts!)
The combination of delta-encoding + shuffling on 64-bit floats with full precision induces a very visible slowdown for levels 14 up to 21, going from 15 ms at level 13 to 280 ms at level 14 (and up to 21). The bump is still there, but less visible, when rounding all the numbers to keep only 3 digits.
Below are some charts that show the results.
Comparison of compression time
top: original data set with full precision, bottom: rounded to 3 decimals
The yellow line is clearly causing trouble for levels 14 and up. We can also see that zlib has some issues with it from level 7 and up.
Here it is again, but using a log scale for the time:
Comparison of ratios
top: original data set with full precision, bottom: rounded to 3 decimals
Visual Studio 2012, Release mode: decompressing data (which was generated in HC mode) is broken again... Debug is OK :)
Seems like a bug similar to the one in version 0.2...
This input file
https://crashes.fuzzing-project.org/zstd-oob-stack-HUF_readStats
causes an out-of-bounds stack read access in zstd. To see it, one needs to compile zstd with AddressSanitizer (-fsanitize=address in CFLAGS).
The issue was found with the help of american fuzzy lop.
This is the output from address sanitizer:
==19506==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffef5269784 at pc 0x0000004faff7 bp 0x7ffef5269520 sp 0x7ffef5269518
READ of size 4 at 0x7ffef5269784 thread T0
#0 0x4faff6 in HUF_readStats /f/zst/zstd/programs/../lib/huff0.c:612:9
#1 0x4fa03d in HUF_readDTableX2 /f/zst/zstd/programs/../lib/huff0.c:644:13
#2 0x501080 in HUF_decompress4X2 /f/zst/zstd/programs/../lib/huff0.c:859:17
#3 0x5138d2 in HUF_decompress /f/zst/zstd/programs/../lib/huff0.c:1701:23
#4 0x4e4b39 in ZSTD_decompressLiterals /f/zst/zstd/programs/../lib/zstd.c:1078:21
#5 0x4e4b39 in ZSTD_decodeLiteralsBlock /f/zst/zstd/programs/../lib/zstd.c:1102
#6 0x4e6bb5 in ZSTD_decompressBlock /f/zst/zstd/programs/../lib/zstd.c:1468:23
#7 0x4e68b2 in ZSTD_decompressContinue /f/zst/zstd/programs/../lib/zstd.c:1622:21
#8 0x52dcbf in FIO_decompressFrame /f/zst/zstd/programs/fileio.c:396:23
#9 0x52e721 in FIO_decompressFilename /f/zst/zstd/programs/fileio.c:492:21
#10 0x530fe3 in main /f/zst/zstd/programs/zstdcli.c:352:9
#11 0x7f76943b6f9f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.20-r2/work/glibc-2.20/csu/libc-start.c:289
#12 0x4377c6 in _start (/mnt/ram/zstd/zstd+0x4377c6)
Address 0x7ffef5269784 is located in stack of thread T0 at offset 420 in frame
#0 0x4f9edf in HUF_readDTableX2 /f/zst/zstd/programs/../lib/huff0.c:630
This frame has 4 object(s):
[32, 288) 'huffWeight'
[352, 420) 'rankVal' <== Memory access at offset 420 overflows this variable
[464, 468) 'tableLog'
[480, 484) 'nbSymbols'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /f/zst/zstd/programs/../lib/huff0.c:612 HUF_readStats
Shadow bytes around the buggy address:
0x10005ea452a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea452b0: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
0x10005ea452c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea452d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea452e0: f2 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00
=>0x10005ea452f0:[04]f2 f2 f2 f2 f2 04 f2 04 f3 f3 f3 00 00 00 00
0x10005ea45300: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
0x10005ea45310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea45320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea45330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea45340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==19506==ABORTING
It seems that these files in all branches have CRLF line endings instead of LF.
It seems that you have lost 20% of decompression speed in v0.4:
Compressor name | Compression | Decompress. | Compr. size | Ratio |
---|---|---|---|---|
zstd_HC v0.3.6 level 1 | 250 MB/s | 529 MB/s | 51230550 | 48.86 |
zstd_HC v0.3.6 level 2 | 186 MB/s | 498 MB/s | 49678572 | 47.38 |
zstd_HC v0.3.6 level 3 | 90 MB/s | 484 MB/s | 48838293 | 46.58 |
zstd_HC v0.3.6 level 4 | 75 MB/s | 474 MB/s | 48423913 | 46.18 |
zstd_HC v0.3.6 level 5 | 61 MB/s | 467 MB/s | 46480999 | 44.33 |
zstd_HC v0.3.6 level 6 | 40 MB/s | 477 MB/s | 45723093 | 43.60 |
zstd_HC v0.3.6 level 7 | 28 MB/s | 480 MB/s | 44803941 | 42.73 |
zstd_HC v0.3.6 level 8 | 21 MB/s | 475 MB/s | 44511976 | 42.45 |
zstd_HC v0.3.6 level 9 | 15 MB/s | 497 MB/s | 43899996 | 41.87 |
zstd_HC v0.3.6 level 10 | 16 MB/s | 493 MB/s | 43845344 | 41.81 |
zstd_HC v0.3.6 level 11 | 15 MB/s | 491 MB/s | 42506862 | 40.54 |
zstd_HC v0.3.6 level 12 | 11 MB/s | 493 MB/s | 42402232 | 40.44 |
zstd v0.4 level 1 | 244 MB/s | 492 MB/s | 51160301 | 48.79 |
zstd v0.4 level 2 | 176 MB/s | 443 MB/s | 49719335 | 47.42 |
zstd v0.4 level 3 | 88 MB/s | 422 MB/s | 48749022 | 46.49 |
zstd v0.4 level 4 | 74 MB/s | 402 MB/s | 48352259 | 46.11 |
zstd v0.4 level 5 | 69 MB/s | 387 MB/s | 46389082 | 44.24 |
zstd v0.4 level 6 | 36 MB/s | 387 MB/s | 45525313 | 43.42 |
zstd v0.4 level 7 | 29 MB/s | 390 MB/s | 44805120 | 42.73 |
zstd v0.4 level 8 | 23 MB/s | 389 MB/s | 44509894 | 42.45 |
zstd v0.4 level 9 | 16 MB/s | 402 MB/s | 43892280 | 41.86 |
zstd v0.4 level 10 | 18 MB/s | 407 MB/s | 43807530 | 41.78 |
zstd v0.4 level 11 | 15 MB/s | 417 MB/s | 42498160 | 40.53 |
zstd v0.4 level 12 | 11 MB/s | 406 MB/s | 42394424 | 40.43 |
Report by Jim Meyering:
Please make it diagnose and exit nonzero upon write failure. A good way to demonstrate the problem is to use linux's /dev/full device:
$ echo foo | programs/zstd > /dev/full; echo $?
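The fix amounts to checking every write, plus the final flush/close, since stdio can buffer a "successful" fwrite and only hit ENOSPC later (which is exactly what /dev/full exposes); a self-contained sketch (hypothetical helper, not zstd's actual fileio.c):

```c
#include <stdio.h>

/* Write a buffer and report failure. fwrite can succeed into stdio's
 * internal buffer while the underlying write fails, so fflush must be
 * checked too; the caller should likewise check fclose. */
static int write_all(FILE *f, const void *buf, size_t len)
{
    if (fwrite(buf, 1, len, f) != len) return -1;
    if (fflush(f) != 0) return -1;   /* ENOSPC surfaces here on /dev/full */
    return 0;
}
```

On failure, the CLI should print a diagnostic and exit with a nonzero status, so `echo $?` after the /dev/full test reports an error.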
Original discussion : https://groups.google.com/forum/#!topic/lz4c/EzasmWCYCCM
This will require an update of the frame format
I've got another failure case. This time, the data crashes during decompression. I've uploaded a gzipped version of the bad2.bin at the gist https://gist.github.com/mwiebe/c54c790288b8e16a7970.
C:\Dev>dir bad2.bin
Volume in drive C is Windows
Volume Serial Number is DA14-224A
Directory of C:\Dev
2015-03-25 04:51 PM 6,434,440 bad2.bin
1 File(s) 6,434,440 bytes
0 Dir(s) 266,752,405,504 bytes free
C:\Dev>"C:\Dev\zstd\visual\2012\x64\Release\zstd.exe" bad2.bin out2.bin
Compressed 6434440 bytes into 3003788 bytes ==> 46.68%
C:\Dev>"C:\Dev\zstd\visual\2012\x64\Release\zstd.exe" -d out2.bin bad2.roundtrip.bin
**CRASH**
I understand that the project is in an experimental stage and I am not expecting it to be bug-free. So here is one bug.
Sometimes the destination buffer bounds are not checked properly (when compressing/decompressing) and overflows happen that could lead to a lot of nasty things. Here is some code that demonstrates it:
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>
int main(void) {
char *raw = (char *)malloc(1000);
char compressed[20];  /* deliberately too small for the compressed output */
char decompressed[900];
size_t i, ccode, dcode;
// fill it with ones
for (i=0; i<1000; i++) raw[i] = 1;
ccode = ZSTD_compress(compressed, 20, raw, 1000);
printf("Compression code %zu\n", ccode);  /* %zu: ccode is a size_t */
dcode = ZSTD_decompress(decompressed, 900, compressed, ccode);
printf("Decompression code %zu\n", dcode);
free(raw);
return 0;
}
Regards
Error due to type redefinition.
Original discussion : #24
Note : it's not an issue for later versions of GCC and clang, because the type redefinition defines exactly the same type. But earlier GCC versions nonetheless consider it an error.
When provided with a short input (in this case, a single byte), zstd will read outside of the input buffer. Here is a log from the single-byte test run under AddressSanitizer:
It would be great if zstd could use at least 2 threads
Hi,
is this an alternative to Gzip?
Also, what kind of files/objects can it compress - just base HTML or also JS, CSS, Images etc?
Lastly, can it be deployed to any OS - Apache/IIS?
Also, if I need to 'activate' this on an existing website, what steps do I need to follow?
[Cloudflare is seeking better compression](https://blog.cloudflare.com/results-experimenting-brotli/), so I was wondering how this would compare to another actively developed compressor, Brotli?
For example, on a file containing 10,000 repetitions of "All work and no play makes Jack a dull boy.\n" (440,000 bytes total), zstd -b15 gives about 23 MB/s on my laptop while zstd -b16 and higher give about 0.02 MB/s. I had to add another digit to the speed output to see anything but 0.0. I assume the switch to the btlazy2 strategy is what makes the difference.
Hi Yann,
I'm just posting this as a courtesy message to say that I intend to package the Zstd library for Debian. A quick question for you, is the preferred name "zstd", or "zstandard"?
I'll post back with progress as it occurs.
Cheers,
Kevin
If ZSTD_LEGACY_SUPPORT is not defined, zstd_decompress.c defines it to 1.
Then later in the file it references zstd_legacy.h.
However, zstd_legacy.h isn't present on GitHub.
tl;dr: this is a codegen issue with VS2013 in Release x64, see discussion below.
I have a test that attempts to compress and decompress vectors of raw data, and some of them (1 in 100+?) consistently crash during decompression. The crash is reproducible and deterministic, always on the same vectors.
The test program is written in .NET 4.6 and is using PInvoke to call into a version of zstd built as an x64 DLL, compiled from the 0.2.1 release (9e61835).
The crash message is "Stack cookie instrumentation code detected a stack-based buffer overrun", and it looks like the stack was overwritten by garbage during decoding.
When I try to decompress the data using zstd.exe from the command line, it works fine. But whenever I try to decompress it from my code, it crashes. I tried both calling ZSTD_decompress(...) directly, and reimplementing the same logic as fileio.c (using a DCtx and repeatedly calling ZSTD_nextSrcSizeToDecompress and ZSTD_decompressContinue), and both fail exactly the same way (and also both work perfectly fine with the other 99% of vectors).
The crash occurs during the first call to ZSTD_decompressContinue() that has actual compressed data (i.e. the first call with the frame header returns 0, then the next call with the first chunk of compressed data crashes).
I was able to create a pair of files, one that decompresses fine and another that systematically crashes, and can reproduce the issue with the test (.NET code). The same code works perfectly with the previous 0.1.x branch.
The original files (both are a highly compressed vector of integer values) can be found here: https://github.com/KrzysFR/frqsspslt/blob/76e8e799c936096819bb7b97bfb13d764949d115/attachments/zstd/sample_data.zip?raw=true
Test program:
var files = new[] { "original_pass.bin", "original_fail.bin" };
foreach (var file in files)
{
Trace.WriteLine("## " + file);
var original = new ArraySegment<byte>(File.ReadAllBytes(Path.Combine(@"..\..", file)));
ulong h1 = XxHash64.FromBytes(original);
Trace.WriteLine($"> Original : {original.Count,10:N0} bytes (hash: 0x{h1:x16})");
var compressed = ZStd.CompressBuffer(original);
Trace.WriteLine($"> Compressed : {compressed.Count,10:N0} bytes (hash: 0x{XxHash64.FromBytes(compressed):x16})");
using (var fs = File.Create(Path.Combine(@"..\..", file + ".zst")))
{ // save to disk (for reference)
fs.Write(compressed.Array, compressed.Offset, compressed.Count);
}
var decompressed = ZStd.DecompressBuffer(compressed, originalSize: original.Count);
var h2 = XxHash64.FromBytes(decompressed);
Trace.WriteLine($"> Decompressed: {decompressed.Count,10:N0} bytes (hash: 0x{h2:x16})");
if (h1 != h2)
{
Trace.WriteLine("> FAILED! hashes do not match!");
Trace.WriteLine(HexaDump.Versus(original, compressed));
}
else
{
Trace.WriteLine("> PASS");
}
}
Outputs:
## original_pass.bin
> Original : 1,469,465 bytes (hash: 0x514cf5e26f0c9054)
> Compressed : 2,855 bytes (hash: 0xa3a91b9ecdfaed20)
> Decompressed: 1,469,465 bytes (hash: 0x514cf5e26f0c9054)
> PASS
## original_fail.bin
> Original : 1,958,527 bytes (hash: 0xd6172d7d482e5460)
> Compressed : 177,267 bytes (hash: 0x452a6dce4b2dfb04)
> Decompressed: >CRASH< (StackOverflow?)
Unhandled exception at 0x00007FF964CDC798 (zstd_x64.dll) in Test.exe: Stack cookie instrumentation code detected a stack-based buffer overrun.
Attaching a debugger, I get the following stacktrace:
CallStack:
zstd_x64.dll!__report_gsfailure(unsigned __int64 StackCookie) Line 151 C
zstd_x64.dll!__GSHandlerCheck(_EXCEPTION_RECORD * ExceptionRecord, void * EstablisherFrame, _CONTEXT * ContextRecord, _DISPATCHER_CONTEXT * DispatcherContext) Line 91 C
ntdll.dll!RtlpExecuteHandlerForException() Unknown
ntdll.dll!RtlDispatchException() Unknown
ntdll.dll!KiUserExceptionDispatch() Unknown
> zstd_x64.dll!HUF_fillDTableX4Level2(HUF_DEltX4 * DTable, unsigned int sizeLog, const unsigned int consumed, const unsigned int * rankValOrigin, const int minWeight, const sortedSymbol_t * sortedSymbols, const unsigned int sortedListSize, unsigned int nbBitsBaseline, unsigned short baseSeq) Line 893 C
zstd_x64.dll!HUF_fillDTableX4(HUF_DEltX4 * DTable, const unsigned int targetLog, const sortedSymbol_t * sortedList, const unsigned int sortedListSize, const unsigned int * rankStart, unsigned int[17] * rankValOrigin, const unsigned int maxWeight, const unsigned int nbBitsBaseline) Line 951 C
zstd_x64.dll!HUF_readDTableX4(unsigned int * DTable, const void * src, unsigned __int64 srcSize) Line 1041 C
zstd_x64.dll!HUF_decompress4X4(void * dst, unsigned __int64 dstSize, const void * cSrc, unsigned __int64 cSrcSize) Line 1255 C
// everything before that in the callstack looks like garbage (more like data, and not actual method pointers)
0109000701090007() Unknown
0109000701090007() Unknown
0109000701090007() Unknown
//... this garbage address is repeated about a thousand times, and is the same value in cSrcSize/dstSize below, which confirms that the stack has been overwritten
0109000701090007() Unknown
0109000701090007() Unknown
0000002078746341() Unknown
000030b400000001() Unknown
00000000000000dc() Unknown
0000000000000020() Unknown
0000000100000014() Unknown
0000003400000007() Unknown
000000010000017c() Unknown
HUF_fillDTableX4Level2()
locals:
baseSeq 0x0007 unsigned short
consumed 0x7ef2b8e0 const unsigned int
+ DElt {sequence=0x0007 nbBits=0x09 '\t' length=0x01 '\x1' } HUF_DEltX4
+ DTable 0x0000006f7edd9fb0 {sequence=0x0405 nbBits=0x06 '\x6' length=0x04 '\x4' } HUF_DEltX4 *
minWeight 0x00000007 const int
nbBitsBaseline 0x0000000a unsigned int
+ rankVal 0x0000006f7edd95c0 {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x5ea64b14, 0x00007ff9, ...} unsigned int[0x00000011]
+ rankValOrigin 0x00007ff95ea037fd {Inside clr.dll!EEHeapFreeInProcessHeap(void)} {0x48c0b60f} const unsigned int *
sizeLog 0x00000000 unsigned int
sortedListSize 0x00000003 const unsigned int
+ sortedSymbols 0x0000006f7edd9fba {symbol=0x00 '\0' weight=0x07 '\a' } const sortedSymbol_t *
HUF_fillDTableX4()
locals:
+ DElt {sequence=0xb8e0 nbBits=0xf2 'ò' length=0x7e '~' } HUF_DEltX4
+ DTable 0x0000000000008918 {sequence=??? nbBits=??? length=??? } HUF_DEltX4 *
maxWeight 0x00000007 const unsigned int
nbBitsBaseline 0x0000000a const unsigned int
+ rankStart 0x0000006f7edd97e0 {0x00000000} const unsigned int *
+ rankVal 0x0000006f7edd96f0 {0x00000000, 0x00000000, 0x000007b0, 0x000007c0, 0x000007c0, 0x00000880, 0x00000a00, ...} unsigned int[0x00000011]
+ rankValOrigin 0x0000006f7edd9880 {0x00000000, 0x00000000, 0x000007b0, 0x000007c0, 0x000007c0, 0x00000880, 0x00000a00, ...} unsigned int[0x00000011] *
+ sortedList 0x0000006f19f628c8 {symbol=0x00 '\0' weight=0x00 '\0' } const sortedSymbol_t *
sortedListSize 0x00000008 const unsigned int
targetLog 0x7edd9890 const unsigned int
HUF_readDTableX4()
locals:
+ DTable 0x0000006f7eeeddb0 {0x5f0d2408} unsigned int *
nbSymbols 0x00000100 unsigned int
+ rankStart0 0x0000006f7edd97e0 {0x00000000, 0x00000000, 0x000000f6, 0x000000f7, 0x000000f7, 0x000000fa, 0x000000fd, ...} unsigned int[0x00000012]
+ rankStats 0x0000006f7edd9830 {0x00000000, 0x000000f6, 0x00000001, 0x00000000, 0x00000003, 0x00000003, 0x00000000, ...} unsigned int[0x00000011]
+ rankVal 0x0000006f7edd9880 {0x0000006f7edd9880 {0x00000000, 0x00000000, 0x000007b0, 0x000007c0, 0x000007c0, ...}, ...} unsigned int[0x00000010][0x00000011]
+ sortedSymbol 0x0000006f7edd9dc0 {{symbol=0x07 '\a' weight=0x01 '\x1' }, {symbol=0x08 '\b' weight=0x01 '\x1' }, {symbol=...}, ...} sortedSymbol_t[0x00000100]
src mscorlib.ni.dll!0x00007ff95d0c0220 (load symbols for additional information) const void *
srcSize 0x0000000000008918 unsigned __int64
tableLog 0x00000009 unsigned int
+ weightList 0x0000006f7edd9cc0 "\a\x5\x5\x5\x4\x4\x4\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x2\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\a\a\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1... unsigned char[0x00000100]
HUF_decompress4X4()
locals:
cSrc 0x0109000701090007 const void *
cSrcSize 0x0109000701090007 unsigned __int64
dst 0x0109000701090007 void *
dstSize 0x0109000701090007 unsigned __int64
+ DTable 0x0000006f7edda040 {0x0000000c, 0x01090007, 0x01090007, 0x01090007, 0x01090007, 0x01090007, 0x01090007, ...} unsigned int[0x00001001]
My guess is that DTable, which is allocated on the stack, was overwritten somewhere, which makes it impossible for the debugger to unwind the stack properly.
Is it possible to "split" the algorithm into many parallel compressing/decompressing blocks? It would be a great way to speed up the library, up to 10x.
CUDA is my hobby and I want to help improve zstd in the future :)