facebook / zstd
Zstandard - Fast real-time compression algorithm
Home Page: http://www.zstd.net
License: Other
Hi Yann,
The Debian build daemons have found some issues with the test suite on mips, powerpc and s390x systems. If you consider this a bug, the build logs are here. Otherwise, if you consider these architectures unsupported, I can remove them from the list of architectures that zstd will be built on.
Cheers,
Kevin
Hi @Cyan4973
If my understanding is correct, ZSTD_decompressContinue currently expects src to be [compressed block + block header of next block]. This is a problem in scenarios where we want to decompress blocks independently using the framing format, e.g. when we don't have the next block available yet.
Wouldn't it make more sense for ZSTD_decompressContinue to take src as [block header + compressed block]? That way the current block could be decompressed without needing the header of the next block.
Requested by Dimitri
Original discussion : http://fastcompression.blogspot.fr/2015/01/zstd-stronger-compression-algorithm.html?showComment=1424173050454#c7703504284913974280
Hello! Thanks for a great library!
I've recently encountered a problem while decoding data generated by this script:
#!/usr/bin/env python
import sys
n = int(sys.argv[1])
for i in range(0, n):
    sys.stdout.write(chr(i + ord('a')) * (2**i))
When I run:
./generate.py 20 > input
./zstd -f input compressed # Compressed 1048575 bytes into 235 bytes ==> 0.02%
./zstd -f -d compressed decompressed # Segmentation fault
I get segmentation fault.
If I replace 20 with 19 I get:
./generate.py 19 > input
./zstd -f input compressed # Compressed 524287 bytes into 159 bytes ==> 0.03%
./zstd -f -d compressed decompressed # Decoded 262144 bytes
This is weird, because the decompressed size doesn't match the original input size.
I ran into this on OS X 10.10 and on Ubuntu 12.04, with both Clang and GCC 4.9.
This file:
https://crashes.fuzzing-project.org/zstd-oob-heap-ZSTD_copy8
causes an out-of-bounds heap read in zstd. This can be seen with either AddressSanitizer or valgrind.
This was found with american fuzzy lop.
Address Sanitizer output:
==12888==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7fc173c8104b at pc 0x0000004e939f bp 0x7ffe115e1a50 sp 0x7ffe115e1a48
READ of size 8 at 0x7fc173c8104b thread T0
#0 0x4e939e in ZSTD_copy8 /f/zst/zstd/programs/../lib/zstd.c:158:56
#1 0x4e939e in ZSTD_wildcopy /f/zst/zstd/programs/../lib/zstd.c:168
#2 0x4e939e in ZSTD_execSequence /f/zst/zstd/programs/../lib/zstd.c:1337
#3 0x4e939e in ZSTD_decompressSequences /f/zst/zstd/programs/../lib/zstd.c:1436
#4 0x4e939e in ZSTD_decompressBlock /f/zst/zstd/programs/../lib/zstd.c:1473
#5 0x4e68b2 in ZSTD_decompressContinue /f/zst/zstd/programs/../lib/zstd.c:1622:21
#6 0x52dcbf in FIO_decompressFrame /f/zst/zstd/programs/fileio.c:396:23
#7 0x52e721 in FIO_decompressFilename /f/zst/zstd/programs/fileio.c:492:21
#8 0x530fe3 in main /f/zst/zstd/programs/zstdcli.c:352:9
#9 0x7fc172bf4f9f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.20-r2/work/glibc-2.20/csu/libc-start.c:289
#10 0x4377c6 in _start (/mnt/ram/zstd/zstd+0x4377c6)
0x7fc173c8104b is located 3 bytes to the right of 141384-byte region [0x7fc173c5e800,0x7fc173c81048)
allocated by thread T0 here:
#0 0x4be792 in __interceptor_malloc (/mnt/ram/zstd/zstd+0x4be792)
#1 0x4e627c in ZSTD_createDCtx /f/zst/zstd/programs/../lib/zstd.c:1560:35
#2 0x530fe3 in main /f/zst/zstd/programs/zstdcli.c:352:9
SUMMARY: AddressSanitizer: heap-buffer-overflow /f/zst/zstd/programs/../lib/zstd.c:158 ZSTD_copy8
Shadow bytes around the buggy address:
0x0ff8ae7881b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff8ae7881c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff8ae7881d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff8ae7881e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0ff8ae7881f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0ff8ae788200: 00 00 00 00 00 00 00 00 00[fa]fa fa fa fa fa fa
0x0ff8ae788210: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff8ae788220: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff8ae788230: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff8ae788240: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0ff8ae788250: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==12888==ABORTING
I'm currently updating my wrapper for the new API introduced in 0.4.x, and I'm merging all the methods with and without compression levels into a single set of methods.
Asking the caller to pass an untyped integer value for the compression level seems a bit dangerous: most people will probably remember zlib, and expect a 1 - 9 scale with default at 5. That's why I would like to have an enum with well known names, and a clear default "if you don't know better use that one" level.
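To make the idea concrete, here is a minimal sketch of what such a wrapper-side enum could look like in Python; the names and the mapping onto zstd's integer levels are illustrative assumptions, not part of any official binding.

```python
from enum import IntEnum

# Hypothetical level names mapped onto zstd's integer scale.
# The specific names and values (including treating 3 as the default)
# are assumptions for illustration, not the official API.
class CompressionLevel(IntEnum):
    FAST = 1       # lowest ratio, highest speed
    DEFAULT = 3    # the "if you don't know better, use this" level
    HIGH = 9
    MAX = 22       # upper bound of the regular level range
```

Since IntEnum members compare equal to plain integers, existing callers that pass raw ints keep working, while new code gets readable, well-known names.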
Quick questions:
The new HTTP/2 protocol compresses headers with HPACK, using a static table plus a dynamic, expanding table:
talk: https://youtu.be/r5oT_2ndjms?list=PLNYkxOF6rcICcHeQY02XLvoGL34rZFWZn&t=820
spec: https://httpwg.github.io/specs/rfc7541.html
implementation: https://github.com/twitter/hpack/tree/master/hpack/src/main/java/com/twitter/hpack
Might make cookies make a comeback.
VS2013 and Xcode 7.1.1/ASan detect a stack buffer overflow in the released v0.4.3 with specific input data.
Use the following PVRTC4-compressed sample image with the fullbench app, called without additional parameters.
https://www.dropbox.com/s/tlgr7lxpmtiq4yw/sample.pvr?dl=0
*** Zstandard speed analyzer 32-bits, by Yann Collet (Dec 10 2015) ***
D:\work\zstd\release-v043\visual\2012\sample.pvr :
1- ZSTD_compress : 3.4 MB/s ( 2120226)
11- ZSTD_decompress : 7.4 MB/s ( 2796340)
31- ZSTD_decodeLiteralsBlock : 11.8 MB/s ( 74273)
1- ZSTD_decodeSeqHeaders :
Run-Time Check Failure #2 - Stack around the variable 'DTableOffb' was corrupted.
> fullbench.exe!local_ZSTD_decodeSeqHeaders(void * dst, unsigned int dstSize, void * buff2, const void * src, unsigned int srcSize) Line 244 C
fullbench.exe!benchMem(void * src, unsigned int srcSize, unsigned int benchNb) Line 358 C
fullbench.exe!benchFiles(char * * fileNamesTable, int nbFiles, unsigned int benchNb) Line 468 C
fullbench.exe!main(int argc, char * * argv) Line 584 C
[External Code]
[Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll]
AddressSanitizer debugger support is active. Memory error breakpoint has been installed and you can now use the 'memory history' command.
2015-12-10 12:48:24.980 TestLZ4[2018:627417] Started
*** Zstandard speed analyzer 64-bits, by Yann Collet (Dec 10 2015) ***
Loading /var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/sample.pvr...
/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/sample.pvr :
1- ZSTD_compress :
1- ZSTD_compress : 4.1 MB/s ( 2120226)
2- ZSTD_compress :
2- ZSTD_compress : 4.1 MB/s ( 2120226)
3- ZSTD_compress :
3- ZSTD_compress : 4.1 MB/s ( 2120226)
4- ZSTD_compress :
4- ZSTD_compress : 4.1 MB/s ( 2120226)
5- ZSTD_compress :
5- ZSTD_compress : 4.1 MB/s ( 2120226)
6- ZSTD_compress :
6- ZSTD_compress : 4.1 MB/s ( 2120226)
1- ZSTD_compress : 4.1 MB/s ( 2120226)
1- ZSTD_decompress :
1- ZSTD_decompress : 15.0 MB/s ( 2796340)
2- ZSTD_decompress :
2- ZSTD_decompress : 15.0 MB/s ( 2796340)
3- ZSTD_decompress :
3- ZSTD_decompress : 15.0 MB/s ( 2796340)
4- ZSTD_decompress :
4- ZSTD_decompress : 15.0 MB/s ( 2796340)
5- ZSTD_decompress :
5- ZSTD_decompress : 15.0 MB/s ( 2796340)
6- ZSTD_decompress :
6- ZSTD_decompress : 15.0 MB/s ( 2796340)
11- ZSTD_decompress : 15.0 MB/s ( 2796340)
1- ZSTD_decodeLiteralsBlock :
1- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
2- ZSTD_decodeLiteralsBlock :
2- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
3- ZSTD_decodeLiteralsBlock :
3- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
4- ZSTD_decodeLiteralsBlock :
4- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
5- ZSTD_decodeLiteralsBlock :
5- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
6- ZSTD_decodeLiteralsBlock :
6- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
31- ZSTD_decodeLiteralsBlock : 26.1 MB/s ( 74273)
1- ZSTD_decodeSeqHeaders :
=================================================================
==2018==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x00016fd84ee2 at pc 0x0001000c29ec bp 0x00016fd81430 sp 0x00016fd81428
WRITE of size 1 at 0x00016fd84ee2 thread T0
#0 0x1000c29eb in FSE_buildDTable (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x10004a9eb)
#1 0x10009ff7f in ZSTD_decodeSeqHeaders (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100027f7f)
#2 0x1000bcdaf in local_ZSTD_decodeSeqHeaders (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100044daf)
#3 0x1000bd683 in benchMem (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100045683)
#4 0x1000be247 in benchFiles (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100046247)
#5 0x1000bee27 in zstd_start_benchmark (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100046e27)
#6 0x10009d15b in -[ViewController doAsyncTestButton:] (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x10002515b)
#7 0x189ca7cfb in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x4fcfb)
#8 0x189ca7c77 in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x4fc77)
#9 0x189c8f92f in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x3792f)
#10 0x189cb03cb in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x583cb)
#11 0x189ca7013 in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x4f013)
#12 0x189c9fcdb in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x47cdb)
#13 0x189c704a3 in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x184a3)
#14 0x189c6e76b in <redacted> (/System/Library/Frameworks/UIKit.framework/UIKit+0x1676b)
#15 0x184694543 in <redacted> (/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation+0xdc543)
#16 0x184693fd7 in <redacted> (/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation+0xdbfd7)
#17 0x184691cd7 in <redacted> (/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation+0xd9cd7)
#18 0x1845c0c9f in CFRunLoopRunSpecific (/System/Library/Frameworks/CoreFoundation.framework/CoreFoundation+0x8c9f)
#19 0x18f7fc087 in GSEventRunModal (/System/Library/PrivateFrameworks/GraphicsServices.framework/GraphicsServices+0xc087)
#20 0x189cd8ffb in UIApplicationMain (/System/Library/Frameworks/UIKit.framework/UIKit+0x80ffb)
#21 0x1000f724f in main (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x10007f24f)
#22 0x199ade8b7 in <redacted> (/usr/lib/system/libdyld.dylib+0x28b7)
Address 0x00016fd84ee2 is located in stack of thread T0 at offset 12578 in frame
#0 0x1000bcb27 in local_ZSTD_decodeSeqHeaders (/var/mobile/Containers/Bundle/Application/1FA3B3BC-B5AD-486D-BAD2-144B631187D1/TestLZ4.app/TestLZ4+0x100044b27)
This frame has 6 object(s):
[32, 8224) 'DTableML'
[8480, 12576) 'DTableLL' <== Memory access at offset 12578 overflows this variable
[12704, 14752) 'DTableOffb'
[14880, 14888) 'dumps'
[14912, 14920) 'length'
[14944, 14948) 'nbSeq'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow ??:0 FSE_buildDTable
Shadow bytes around the buggy address:
0x00014e1b0980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b0990: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b09a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b09b0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b09c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x00014e1b09d0: 00 00 00 00 00 00 00 00 00 00 00 00[f2]f2 f2 f2
0x00014e1b09e0: f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 f2 00 00 00 00
0x00014e1b09f0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b0a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b0a10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x00014e1b0a20: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==2018==ABORTING
AddressSanitizer report breakpoint hit. Use 'thread info -s' to get extended information about the report.
(lldb) bt
* thread #1: tid = 0x992d9, 0x00000001001650c4 libclang_rt.asan_ios_dynamic.dylib`__asan::AsanDie(), queue = 'com.apple.main-thread', stop reason = Stack buffer overflow detected
frame #0: 0x00000001001650c4 libclang_rt.asan_ios_dynamic.dylib`__asan::AsanDie()
frame #1: 0x0000000100168b80 libclang_rt.asan_ios_dynamic.dylib`__sanitizer::Die() + 44
frame #2: 0x0000000100163ed4 libclang_rt.asan_ios_dynamic.dylib`__asan::ScopedInErrorReport::~ScopedInErrorReport() + 336
frame #3: 0x0000000100163c6c libclang_rt.asan_ios_dynamic.dylib`__asan::ScopedInErrorReport::~ScopedInErrorReport() + 12
frame #4: 0x00000001001637e8 libclang_rt.asan_ios_dynamic.dylib`__asan_report_error + 3216
frame #5: 0x00000001001641f8 libclang_rt.asan_ios_dynamic.dylib`__asan_report_store1 + 44
* frame #6: 0x00000001000c29ec TestLZ4`FSE_buildDTable(dt=0x000000016fd83ee0, normalizedCounter=0x000000016fd818f0, maxSymbolValue=63, tableLog=10) + 856 at fse.c:373
frame #7: 0x000000010009ff80 TestLZ4`ZSTD_decodeSeqHeaders(nbSeq=0x000000016fd85820, dumpsPtr=0x000000016fd857e0, dumpsLengthPtr=0x000000016fd85800, DTableLL=0x000000016fd83ee0, DTableML=0x000000016fd81de0, DTableOffb=0x000000016fd84f60, src=0x000000010a574800, srcSize=11133) + 3100 at zstd_decompress.c:377
frame #8: 0x00000001000bcdb0 TestLZ4`local_ZSTD_decodeSeqHeaders(dst=0x000000010a2bc800, dstSize=2818710, buff2=0x000000010a574800, src=0x0000000108404800, srcSize=131072) + 664 at zstd_fullbench.c:243
frame #9: 0x00000001000bd684 TestLZ4`benchMem(src=0x0000000108404800, srcSize=131072, benchNb=32) + 2116 at zstd_fullbench.c:358
frame #10: 0x00000001000be248 TestLZ4`benchFiles(fileNamesTable=0x000000016fd85e98, nbFiles=1, benchNb=32) + 1100 at zstd_fullbench.c:468
frame #11: 0x00000001000bee28 TestLZ4`zstd_start_benchmark(argc=2, argv=0x000000016fd85e90) + 2188 at zstd_fullbench.c:584
frame #12: 0x000000010009d15c TestLZ4`-[ViewController doAsyncTestButton:](self=0x0000000107507880, _cmd="doAsyncTestButton:", sender=<unavailable>) + 792 at ViewController.mm:55
frame #13: 0x0000000189ca7cfc UIKit`-[UIApplication sendAction:to:from:forEvent:] + 100
frame #14: 0x0000000189ca7c78 UIKit`-[UIControl sendAction:to:forEvent:] + 80
frame #15: 0x0000000189c8f930 UIKit`-[UIControl _sendActionsForEvents:withEvent:] + 416
frame #16: 0x0000000189cb03cc UIKit`-[UIControl touchesBegan:withEvent:] + 268
frame #17: 0x0000000189ca7014 UIKit`-[UIWindow _sendTouchesForEvent:] + 376
frame #18: 0x0000000189c9fcdc UIKit`-[UIWindow sendEvent:] + 784
frame #19: 0x0000000189c704a4 UIKit`-[UIApplication sendEvent:] + 248
frame #20: 0x0000000189c6e76c UIKit`_UIApplicationHandleEventQueue + 5528
frame #21: 0x0000000184694544 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
frame #22: 0x0000000184693fd8 CoreFoundation`__CFRunLoopDoSources0 + 540
frame #23: 0x0000000184691cd8 CoreFoundation`__CFRunLoopRun + 724
frame #24: 0x00000001845c0ca0 CoreFoundation`CFRunLoopRunSpecific + 384
frame #25: 0x000000018f7fc088 GraphicsServices`GSEventRunModal + 180
frame #26: 0x0000000189cd8ffc UIKit`UIApplicationMain + 204
frame #27: 0x00000001000f7250 TestLZ4`main(argc=1, argv=0x000000016fd87a90) + 124 at main.m:16
frame #28: 0x0000000199ade8b8 libdyld.dylib`start + 4
(lldb)
In an existing, fairly large iOS app, the invocation of ZSTD_decompress (this exact function, not its internals) may crash after several dozen successful calls; it looks like a corrupted-stack problem. I haven't managed to isolate it in a separate sample; I only found the fullbench issue above. I'm still not totally sure whether both issues share the same root cause, but they may.
So it seems that zstd can now help with serving small amounts of JSON (say, from a NoSQL DB), assuming the data is similar (e.g. common object names)?
I'm very excited to see work being done on dictionary support in the API, because this is something that could greatly help me solve a pressing problem.
Consider a Document Store, where we are storing a set of JSON-like documents that share the same schema. Each document can be created, read or updated individually, in a random fashion. We would like to compress the documents on disk, but there is very little redundancy within each document, which yields a very poor compression ratio (maybe 10-20%).
When compressing batches of tens or hundreds of documents, the compression ratio gets really good (10x, 50x or sometimes even more), because there is a lot of redundancy between documents: structural symbols like ": ", ": true, or {, [, ], }; field names like "Id", "Name", "Label", "SomeVeryLongFieldNameThatIsPresentOnlyOncePerDocument", and so on; common values like true, "Red", "Administrator", ...; keywords; dates that all start with 2015-12-14T... for the next 24h; and even well-known or frequently used GUIDs that are shared by documents (Product Category, Tag Id, hugely popular nodes in graph databases, ...).
In the past, I used femtozip (https://github.com/gtoubassi/femtozip), which is intended precisely for this use case. It includes a dictionary training step (built from a sample batch of documents), whose output is then used to compress and decompress single documents, with the same compression ratio as if they were a batch. Using real-life data, compressing 1000 documents individually would give the same compression ratio as compressing all 1000 documents in a batch with gzip -5.
The dictionary training part of femtozip can take very long: the more samples, the better the compression ratio will be in the end, but you need tons of RAM to train it.
Also, I realized that femtozip would sometimes offset the differences in size between different formats like JSON/BSON/JSONB/ProtoBuf and other binary formats, because it would pick up the "grammar" of the format (text or binary) in the dictionary, and only deal with the "meat" of the documents (guids, integers, doubles, natural text) when compressing. This means I can use a format like JSONB (used by Postgres) which is less compact, but is faster to decode at runtime than JSON text.
I would like to be able to do something similar with Zstandard. I don't really care about building the most efficient dictionary (though it would be nice), but I'd like at least to exploit the fact that FSE builds a list of tokens sorted by frequency. Extracting this list of tokens may help in building a dictionary that contains the most common tokens of the training batch.
The goal would be:
To compress a document D: compress SAMPLES[cur_gen] + D.json, and only store the bits produced by the D.json part.
To decompress: decompress SAMPLES[D.gen] + D.compressed, and only keep the last decoded bits that make up D.
Since it would be impractical to change the compression code so it knows which compressed bits come from D and which from the batch, we could approximate this by computing a DICTIONARY[gen] that would be used to initialize the compressor and decompressor.
The Document Store would durably store each generations of dictionaries, and use them to decompress older entries. Periodically, it could recycle the entire store by recompressing everything with the most recent dictionary.
Concrete example:
Training set:
{ "id": 123, "label": "Hello", "enabled": true, "uuid": "9ad51b87-d627-4e04-85c2-d6cb77415981" }
{ "id": 126, "label": "Hell", "enabled": false, "uuid": "0c8e13a5-cdc8-4e1f-8e80-4fee025ee59c" }
{ "id": 129, "label": "Help", "enabled": true, "uuid": "fe6db321-cddd-4e7f-b3d6-6b38365b3e2a" }
Looking at it, we can extract the following repeating segments: { "id": 12, then , "label": "Hel, then ", "enabled": , then e, "uuid": ", and finally " }. These could be condensed into:
{ "id": 12, "label": "Hel", "enabled": e, "uuid": "" }
(53 bytes shared by all docs). The unique part of each document would then be:
3...lo...tru...9ad51b87-d627-4e04-85c2-d6cb77415981 (42 bytes)
6...l...fals...0c8e13a5-cdc8-4e1f-8e80-4fee025ee59c (42 bytes)
9......tru...fe6db321-cddd-4e7f-b3d6-6b38365b3e2a (40 bytes)
Zstd would only have to work on about 42 bytes per doc, instead of 85 bytes. More realistic documents will have a lot more in common than this example.
As a quick experiment, I built a "gen 0" dictionary that looks like { "id": , "foo": "", "bar": "", ....}, produced by removing all values from the JSON document. Using it with ZSTD_compress_insertDictionary, compressing { "id": 123, "foo": "Hello", "bar": "World", ...} is indeed smaller than without a dictionary; the residual is essentially 123HelloWorld, which is exactly the content specific to the document itself that was removed when producing the gen-0 dict. Maybe one way to construct a better dictionary would be to look at how ZSTD_decompress_insertDictionary branches off into different implementations for lazy, greedy and so on; I'm not sure whether every compression strategy can be used to produce such a dictionary. Again, I don't care about producing the ideal dictionary that yields the smallest result possible, only something that gives a noticeably better compression ratio while still being able to handle documents in isolation.
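The per-document dictionary idea can be sketched with the Python standard library, using zlib's preset-dictionary support as a stand-in for ZSTD_compress_insertDictionary; the shared string and document are taken from the example above, and the helper names are made up for illustration.

```python
import zlib

# Shared "grammar" extracted from the training set (stand-in dictionary).
shared = b'{ "id": 12, "label": "Hel", "enabled": e, "uuid": "" }'

doc = (b'{ "id": 123, "label": "Hello", "enabled": true, '
       b'"uuid": "9ad51b87-d627-4e04-85c2-d6cb77415981" }')

def compress_with_dict(data, zdict):
    # zdict primes the compressor, so matches can reference the dictionary
    c = zlib.compressobj(level=9, zdict=zdict)
    return c.compress(data) + c.flush()

def decompress_with_dict(blob, zdict):
    # the decompressor must be primed with the same dictionary
    d = zlib.decompressobj(zdict=zdict)
    return d.decompress(blob) + d.flush()

plain = zlib.compress(doc, 9)              # no dictionary
primed = compress_with_dict(doc, shared)   # primed with the shared grammar
assert decompress_with_dict(primed, shared) == doc
```

With the dictionary, the compressor only has to encode the document-specific "meat" (the id digits, the label suffix, the GUID), so the primed output is smaller than the document itself even at this tiny size.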
ZSTD_decompressContinue() requires the previously decompressed buffer to still be available, which is inconsistent with the zstd_static.h documentation.
PS. fileio.c is GPL again... Do you plan to reconsider? Also, FIO_decompressFilename()'s wNbBlocks is a kind of magic number (because it is undocumented).
Note: You can use http://pastebin.com/XTTUSyiA as example code.
The enum in error_public.h is anonymous. It would be nice if you included a tag or a typedef, if only to help make -Wswitch / -Wswitch-enum usable. AFAIK there is currently no way to get the compiler to warn if a switch does not include a case for every enum value. I would like to be able to do something like
size_t res = …;
if (ZSTD_isError(res)) {
    switch ((ZSTD_ErrorCode) -res) {
    case ZSTD_error_No_Error:
        /* … */
        break;
    }
}
Note the cast in the controlling expression; without it the type would just be size_t, and the compiler wouldn't realize there is any association with the enum, so it would not warn when a case for a particular enum value is missing.
FWIW, I still think it would be easier to do something like
typedef enum {
    ZSTD_error_No_Error = 0,
    ZSTD_error_GENERIC,
    ZSTD_error_prefix_uknown,
    …
} ZSTD_ErrorCode;

ZSTD_ErrorCode ZSTD_getError(size_t code);
ZSTD_getError would work just like ZSTD_isError does now; you could still do if (ZSTD_getError(code)) { … }, since ZSTD_error_No_Error would be 0 and errors would be positive integers, but it would make the API a bit easier to understand, since it hides the rather odd detail of negating an unsigned type.
Another thing that would be nice is a consistency check. For example, note that each of the values I wrote in the enum above (chosen because they're the first three entries in the enum in zstd, not to make this point) have different capitalization conventions.
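To illustrate the switch-coverage point: with a tagged enum, -Wswitch (or -Wswitch-enum) can flag a switch that omits an enum value. A self-contained sketch with a toy enum standing in for ZSTD_ErrorCode (the names below are illustrative, not zstd's):

```c
#include <stddef.h>

/* Toy enum standing in for ZSTD_ErrorCode. */
typedef enum {
    TOY_error_No_Error = 0,
    TOY_error_GENERIC,
    TOY_error_prefix_unknown
} TOY_ErrorCode;

const char* toy_error_name(TOY_ErrorCode code) {
    switch (code) {   /* typed controlling expression => -Wswitch can check it */
    case TOY_error_No_Error:       return "no error";
    case TOY_error_GENERIC:        return "generic error";
    case TOY_error_prefix_unknown: return "prefix unknown";
    }
    return "unknown code";   /* unreachable for valid enum values */
}
```

Deleting any one of the case labels above makes the compiler emit a missing-case warning, which is exactly what an anonymous enum cannot provide.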
I've run into a bit of data which fails to compress. The failure case is a binary file in the following gist:
https://gist.github.com/mwiebe/c54c790288b8e16a7970
C:\Dev>dir bad.bin
Volume in drive C is Windows
Volume Serial Number is DA14-224A
Directory of C:\Dev
2015-03-23 02:17 PM 2,469 bad.bin
1 File(s) 2,469 bytes
0 Dir(s) 273,715,097,600 bytes free
C:\Dev>"C:\Dev\zstd\visual\2012\x64\Release\zstd.exe" bad.bin out.bin
Error 24 : Compression error : ZSTD_ERROR_GENERIC
Hi,
I have tried to run some tests in parallel and noticed them failing non-deterministically.
When I say "fails", it is with a "corruption_detected" error. When I say "non-deterministically", it means that a consecutive run with the exact same inputs succeeds. I suspected my code was not thread-safe, so I ran, in parallel, different classes that don't share any code except the libzstd binary (like the cases above) to rule out my own faults. One additional observation: decompressContinue fails only if the original size exceeds some threshold, e.g. 1M for levels 1, 3 and 6; 2M for level 9. The parallel ZSTD_compress is running with random small buffers (0-32k) when decompressContinue fails.
So some questions:
Regards,
luben
When compiling with -fsanitize=undefined:
/buffer/zstd/zstd: /home/nemequ/local/src/squash/plugins/zstd/zstd/lib/zstd.c:918:54: runtime error: load of misaligned address 0x000000401b49 for type 'const void', which requires 8 byte alignment
0x000000401b49: note: pointer points here
00 00 00 4c 6f 72 65 6d 20 69 70 73 75 6d 20 64 6f 6c 6f 72 20 73 69 74 20 61 6d 65 74 2c 20 63
^
/home/nemequ/local/src/squash/plugins/zstd/zstd/lib/zstd.c:183:44: runtime error: load of misaligned address 0x000000401b49 for type 'const void', which requires 4 byte alignment
0x000000401b49: note: pointer points here
00 00 00 4c 6f 72 65 6d 20 69 70 73 75 6d 20 64 6f 6c 6f 72 20 73 69 74 20 61 6d 65 74 2c 20 63
^
/home/nemequ/local/src/squash/plugins/zstd/zstd/lib/zstd.c:185:47: runtime error: load of misaligned address 0x000000401b51 for type 'const void', which requires 8 byte alignment
0x000000401b51: note: pointer points here
20 69 70 73 75 6d 20 64 6f 6c 6f 72 20 73 69 74 20 61 6d 65 74 2c 20 63 6f 6e 73 65 63 74 65 74
^
OK
https://github.com/google/brotli
Would be nice to see on your performance test chart. Perhaps you guys can learn from each other?
I'm curious: is this compression web-compatible?
Hi Yann,
I get a segmentation fault during compression of a specific data buffer, inside FSE_normalizeCount. The specific line is the call to FSE_adjustNormSlow at https://github.com/Cyan4973/zstd/blob/dev/lib/fse.c#L696
The stack frames are all screwed up when examining the core file, but I managed to get the following output when running through valgrind:
==32677== Invalid write of size 8
==32677== at 0x64DD4E7: FSE_normalizeCount (fse.c:696)
==32677== by 0x157: ???
==32677== by 0xFFEFF33AF: ???
==32677== by 0x9000000FE: ???
==32677== by 0xFFEFF31AF: ???
==32677== by 0x40012FFFFFFFF: ???
==32677== by 0x100FFFFFF9C: ???
==32677== Address 0xd5 is not stack'd, malloc'd or (recently) free'd
I can reliably reproduce this fault, so I might be able to provide a test case for you, given some time.
Hi,
I am running into some issues with decompressing some of the results of ZSTD_HC_compress, using the very simple test case pasted below.
Trying to decompress the result of compressing a 0-byte buffer with level > 1, I get "ZSTD_error_corruption_detected"; with larger buffer sizes I get "ZSTD_error_srcSize_wrong". Once I pass a certain threshold (15 in this case, but it depends on the payload), everything starts to work correctly.
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>
#include <zstdhc.h>

int main(int argc, char **argv) {
    char raw[20] = {1,1,1,1,1, 1,1,1,1,1, 1,1,1,1,1, 1,1,1,1,1};
    char compressed[50];
    char decompressed[50];
    size_t ccode, dcode;
    size_t size = 2;
    int level = 2;

    ccode = ZSTD_HC_compress(compressed, 50, raw, size, level);
    printf("Compression code %zu\n", ccode);   /* %zu for size_t */
    if (ZSTD_isError(ccode)) {
        printf("Compression error %s\n", ZSTD_getErrorName(ccode));
    }
    dcode = ZSTD_decompress(decompressed, 50, compressed, ccode);
    printf("Decompression code %zu\n", dcode);
    if (ZSTD_isError(dcode)) {
        printf("Decompression error %s\n", ZSTD_getErrorName(dcode));
    }
    return 0;
}
https://github.com/Cyan4973/zstd/archive/zstd-0.4.0.zip
Changed only:
GCC returns tens of errors (mainly undeclared functions):
gcc.exe -Wno-unknown-pragmas -Wno-sign-compare -Wno-conversion -fomit-frame-pointer -fstrict-aliasing -fforce-addr -ffast-math -O3 -DNDEBUG -DFREEARC_WIN -D__x86_64__ -D__SSE2__ -I. -DFREEARC_INTEL_BYTE_ORDER -D_UNICODE -DUNICODE -HAVE_CONFIG_H zstd/zstd.c -std=c99 -c -o zstd/zstd.o
zstd/zstd.c:608:8: error: conflicting types for 'ZSTD_compressBegin'
size_t ZSTD_compressBegin(ZSTD_CCtx* ctx, void* dst, size_t maxDstSize)
^
In file included from zstd/zstd.c:70:0:
zstd/zstd_static.h:124:8: note: previous declaration of 'ZSTD_compressBegin' was here
size_t ZSTD_compressBegin(ZSTD_CCtx* cctx, void* dst, size_t maxDstSize, int compressionLevel);
^
zstd/zstd.c: In function 'ZSTD_compressBegin':
zstd/zstd.c:617:24: error: 'ZSTD_magicNumber' undeclared (first use in this function)
MEM_writeLE32(dst, ZSTD_magicNumber);
^
zstd/zstd.c:617:24: note: each undeclared identifier is reported only once for each function it appears in
zstd/zstd.c: At top level:
zstd/zstd.c:774:8: error: conflicting types for 'ZSTD_compressCCtx'
size_t ZSTD_compressCCtx(ZSTD_CCtx* ctx, void* dst, size_t maxDstSize, const void* src, size_t srcSize)
Hi. It seems that zstd will read illegal pointers and crash when presented with mangled archives. Here's one such example file (GitHub doesn't allow binary attachments, so I'm providing a hex dump):
0000000 fd 2f b5 1c 00 00 1c 40 00 12 31 32 31 31 31 31
0000020 31 31 31 31 32 32 32 32 32 32 32 0a 10 98 00 ff
0000040 7f 00 84 c0 00 00
Here's what gdb has to say about this problem:
(gdb) run -d <example.zst >example
Starting program: zstd -d <example.zst >example
Program received signal SIGSEGV, Segmentation fault.
0x0000000000410965 in ZSTD_decompressBlock (srcSize=28, src=0x801011000, maxDstSize=524288, dst=0x801032000, ctx=0x801006000) at lib/zstd.c:1533
(gdb) bt
#0 0x0000000000410965 in ZSTD_decompressBlock (srcSize=28, src=0x801011000, maxDstSize=524288, dst=0x801032000, ctx=0x801006000) at lib/zstd.c:1533
#1 ZSTD_decompressContinue (dctx=0x801006000, dst=0x801032000, maxDstSize=524288, src=0x801011000, srcSize=31) at lib/zstd.c:1680
#2 0x0000000000408681 in FIO_decompressFilename (output_filename=0x410f65 "-",input_filename=0x410f65 "-") at programs/fileio.c:363
#3 0x0000000000401a4d in main (argc=2, argv=0x7fffffffd9d0) at programs/zstdcli.c:314
This is with zstd as of commit 00f9507; the crash is located over here. The problem is that ZSTD_decompressBlock does not validate how big matchLength can get; in this case it is equal to 8650883, while maxDstSize is only 524288 bytes, which results in an attempt to copy past the end of the output buffer.
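For illustration, the kind of guard the decoder needs looks roughly like this; the function and variable names are made up to mirror the report, not the actual zstd internals.

```c
#include <stddef.h>

#define SEQ_ERROR ((size_t)-1)

/* Hypothetical guard: reject a sequence whose match would run past the
 * end of the output buffer. Names mirror the report, not zstd's code. */
size_t check_match_length(size_t matchLength,
                          size_t alreadyWritten,
                          size_t maxDstSize)
{
    if (alreadyWritten > maxDstSize ||
        matchLength > maxDstSize - alreadyWritten)
        return SEQ_ERROR;   /* corruption: would copy past end of dst */
    return matchLength;
}
```

With the values from the report (matchLength 8650883, maxDstSize 524288), such a check rejects the sequence instead of copying out of bounds.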
Could you add a #define to optionally disable compilation of the decompression code? Thanks.
searchLength=7 in ZSTD_parameters is permitted, but it seems to give exactly the same compression and speed results as searchLength=4. Possibly this is because the switches on matchLengthSearch in the code have cases for 4-6 but not 7.
I was looking at making simple Python wrappers for the zstd library, and I ran into the following issue when compiling on Windows using VS2005:
lib\zstd.c(69) :fatal error C1083: Cannot open include file: 'immintrin.h': No such file or directory
It appears to come from this line.
zstd_static.h:74:38: warning: comma at end of enumerator list [-Wpedantic]
When compressing a large 6GB binary file, a compression error occurs. After some investigation, it appears to happen when there are more symbols than the max symbol limit. The offending line is here:
https://github.com/Cyan4973/zstd/blob/master/lib/fse.c#L1458
A simple temporary fix is just deleting this line, but my guess is that this isn't a good solution. Increasing the max symbol limit didn't seem to work, but I'm not that familiar with the code base so I'm sure I missed something.
I need to send large core dumps from embedded device to server, and do it in minimal time. I've found zstd to be optimal solution because it is very fast and has good compression ratio. I'm using it like this:
zstd -c | curl -T - $url
where kernel fills stdin of zstd
However, if the user has narrow upload bandwidth, it could be beneficial to switch from fast compression to high compression, which in my case is 30% more effective. For example, if in the first 10 seconds zstd detects that compression throughput is N times larger than the write speed (in this case, to stdout), it automatically switches to high compression and uses it up to the end of the input stream.
Would such feature make sense?
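For what it's worth, the core of such a heuristic is small. A minimal sketch (hypothetical function, not a zstd API), assuming the tool periodically samples its own compression and output-write throughput:

```c
/* Hypothetical adaptive-level heuristic: when compression runs much
 * faster than the output can be written, the spare CPU time can be
 * spent on a higher (slower, stronger) compression level. */
static int pick_level(double compressMBps, double writeMBps,
                      int curLevel, int maxLevel)
{
    /* output is the bottleneck by 4x or more: compress harder */
    if (writeMBps > 0 && compressMBps >= 4.0 * writeMBps && curLevel < maxLevel)
        return curLevel + 1;
    return curLevel;
}
```

The 4x threshold and the sampling window are arbitrary choices here; a real implementation would also want to lower the level again if the available bandwidth recovers.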
When I try to decompress a freshly compressed file, it is correctly decompressed but at the end I get:
Error 35 : Read error
I've got an unexpected error for huge (>4GiB ?) data generated by xorshift. Here is a test case: https://gist.github.com/t-mat/a7e93d4767b991e191ea
It generates the same error both on Ubuntu 14.04 (x64) / gcc 4.8.2 and Windows 7 SP1 (x64) / MSVC++2013.
source data : 4295000064 bytes
zstd compressed : 1350556646 bytes
zstd decompressed : 4295000064 bytes
Data error : offset @0x100003c41
I can prepare and push a CMakeLists.txt file for generating solution or make files on different platforms. Would this feature be useful?
This is with zstd as of commit 765207c
In our case (TokuDB), ZSTD_isError sometimes returns true, because:
#1 0x0000000000ca8836 in ZSTD_compressSequences (dst=0x2aaac4c0623a "\034", maxDstSize=11084, seqStorePtr=Unhandled dwarf expression opcode 0xf3
)
...
Breakpoint 8, ZSTD_isError (code=18446744073709551615)
My inputs: srcLen=10486, dstLen=11092.
I guess it's a bug in zstd that causes cSize to be 2^64 - 1.
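The value 2^64 - 1 (i.e. (size_t)-1) seen in the breakpoint is consistent with how zstd reports failures: error codes are returned in the top range of size_t, which is why every return value must be filtered through ZSTD_isError() before being used as a size. A self-contained illustration of the encoding idea (hypothetical names, not zstd's actual internals):

```c
#include <stddef.h>

/* Hypothetical illustration: error code E is returned as (size_t)-E,
 * so one size_t return value carries either a valid size or an error. */
#define MY_MAX_ERROR_CODE 120

static size_t make_error(int e)  { return (size_t)-e; }
static int    is_error(size_t v) { return v > (size_t)-MY_MAX_ERROR_CODE; }
```

Under this scheme, the 18446744073709551615 in the gdb output is simply (size_t)-1, i.e. an error indicator, not a genuine compressed size.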
I'm currently doing some tests and benchmarks on the best way to compress vectors of numerical data using zstd (and various filters to help compression, such as delta or shuffle).
I found a few oddities while trying various sample sets, and I'm not sure if this is expected or not.
note: All the tests below are done with the 0.5.1 release.
The most obvious anomaly I found was when compressing vectors of 64-bit floats produced by starting from 100.0 and randomly removing a tiny amount at each step (a few tens of %) while keeping the full precision (i.e. 17 decimals); the sample set looks like this (with each value encoded as an IEEE 64-bit double):
XS: [
99.9996492024556, 99.9996492024556, 99.9996492024556, 99.9996492024556, 99.9996492024556,
99.9959685493382, 99.9934385250828, 99.9915028228664, 99.9913987684419, 99.9876946097741,
99.9832594119447, 99.9832594119447, 99.9827409006435, 99.9827409006435, 99.9792033732376,
99.9792033732376, 99.9792033732376, 99.9792033732376, 99.9792033732376, 99.9779770870381,
...,
63.9185913211865, 63.9185913211865, 63.9185913211865, 63.9183349283913, 63.9173471531838,
63.9129878772782, 63.9129878772782, 63.9129878772782, 63.9129878772782, 63.910876784327,
63.910876784327, 63.9107775813923, 63.9107775813923, 63.9107775813923, 63.9107775813923,
63.9107775813923, 63.9077360819731, 63.9077360819731, 63.9059456902976, 63.9059456902976
]
I'm also compressing the delta-encoded vector (0 = no change):
DELTA(XS): [
99.9996492024556, 0, 0, 0, 0,
-0.00368065311744203, -0.00253002425540672, -0.00193570221631489, -0.000104054424497235, -0.00370415866780149,
-0.00443519782946566, 0, -0.000518511301152103, 0, -0.00353752740591062,
0, 0, 0, 0, -0.00122628619951115,
...,
-0.000363898205272051, 0, 0, -0.000256392795243698, -0.000987775207491381,
-0.00435927590558549, 0, 0, 0, -0.00211109295120337,
0, -9.92029346988943E-05, 0, 0, 0,
0, -0.00304149941915455, 0, -0.00179039167556283, 0
]
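For reference, the delta filter used above can be sketched as a generic in-place transform (an illustration only, not the benchmark's actual code); runs of repeated values become runs of 0.0, which compress far better:

```c
#include <stddef.h>

/* Delta-encode in place: keep v[0], then store differences.
 * Iterates backwards so each difference uses the original neighbour. */
static void delta_encode(double *v, size_t n)
{
    for (size_t i = n; i-- > 1; )
        v[i] -= v[i - 1];
}

/* Inverse transform: a running sum restores the original values. */
static void delta_decode(double *v, size_t n)
{
    for (size_t i = 1; i < n; i++)
        v[i] += v[i - 1];
}
```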
Now my benchmark compresses both vectors (N=28,800) encoded as 4-byte or 8-byte elements (115KB / 230KB), using multiple codecs (lz4, zstd, zlib) and also filtering (none, blosc-like shuffle), and measures both the ratio and the time to encode.
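The blosc-like shuffle filter mentioned above can be sketched like this (hypothetical helper): it transposes the array by byte position, so the slowly-varying sign/exponent bytes of neighbouring doubles end up adjacent and compress better:

```c
#include <stddef.h>

/* Byte shuffle for n elements of elemSize bytes each:
 * out[b*n + i] = in[i*elemSize + b], i.e. all byte-0s first,
 * then all byte-1s, and so on (a bytewise transpose). */
static void shuffle(unsigned char *out, const unsigned char *in,
                    size_t n, size_t elemSize)
{
    for (size_t i = 0; i < n; i++)
        for (size_t b = 0; b < elemSize; b++)
            out[b * n + i] = in[i * elemSize + b];
}
```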
The full benchmark results can be found here: https://gist.github.com/KrzysFR/0f6835c7a8d0f19dbdc3 (warning: lots of data and ASCII charts!)
The combination of delta-encoding + shuffling on 64-bit floats with full precision induces a very visible slowdown for levels 14 up to 21, going from 15 ms at level 13 to 280 ms at level 14 (and up to 21). The bump is still there, but less visible, when rounding all the numbers to keep only 3 digits.
Below are some charts that show the results.
Comparison of compression time
top: original data set with full precision, bottom: rounded to 3 decimals
The yellow line is clearly causing trouble for levels 14 and up. We can also see that zlib has some issues with it from level 7 and up.
Here it is again, but using a log scale for the time:
Comparison of ratios
top: original data set with full precision, bottom: rounded to 3 decimals
Visual Studio 2012, Release mode: decompressing data (which was generated in HC mode) is broken again... Debug is OK :)
Seems like a bug similar to the one in version 0.2...
This input file
https://crashes.fuzzing-project.org/zstd-oob-stack-HUF_readStats
causes an out-of-bounds stack read access in zstd. To see it, one needs to compile zstd with AddressSanitizer (-fsanitize=address in CFLAGS).
The issue was found with the help of american fuzzy lop.
This is the output from address sanitizer:
==19506==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffef5269784 at pc 0x0000004faff7 bp 0x7ffef5269520 sp 0x7ffef5269518
READ of size 4 at 0x7ffef5269784 thread T0
#0 0x4faff6 in HUF_readStats /f/zst/zstd/programs/../lib/huff0.c:612:9
#1 0x4fa03d in HUF_readDTableX2 /f/zst/zstd/programs/../lib/huff0.c:644:13
#2 0x501080 in HUF_decompress4X2 /f/zst/zstd/programs/../lib/huff0.c:859:17
#3 0x5138d2 in HUF_decompress /f/zst/zstd/programs/../lib/huff0.c:1701:23
#4 0x4e4b39 in ZSTD_decompressLiterals /f/zst/zstd/programs/../lib/zstd.c:1078:21
#5 0x4e4b39 in ZSTD_decodeLiteralsBlock /f/zst/zstd/programs/../lib/zstd.c:1102
#6 0x4e6bb5 in ZSTD_decompressBlock /f/zst/zstd/programs/../lib/zstd.c:1468:23
#7 0x4e68b2 in ZSTD_decompressContinue /f/zst/zstd/programs/../lib/zstd.c:1622:21
#8 0x52dcbf in FIO_decompressFrame /f/zst/zstd/programs/fileio.c:396:23
#9 0x52e721 in FIO_decompressFilename /f/zst/zstd/programs/fileio.c:492:21
#10 0x530fe3 in main /f/zst/zstd/programs/zstdcli.c:352:9
#11 0x7f76943b6f9f in __libc_start_main /var/tmp/portage/sys-libs/glibc-2.20-r2/work/glibc-2.20/csu/libc-start.c:289
#12 0x4377c6 in _start (/mnt/ram/zstd/zstd+0x4377c6)
Address 0x7ffef5269784 is located in stack of thread T0 at offset 420 in frame
#0 0x4f9edf in HUF_readDTableX2 /f/zst/zstd/programs/../lib/huff0.c:630
This frame has 4 object(s):
[32, 288) 'huffWeight'
[352, 420) 'rankVal' <== Memory access at offset 420 overflows this variable
[464, 468) 'tableLog'
[480, 484) 'nbSymbols'
HINT: this may be a false positive if your program uses some custom stack unwind mechanism or swapcontext
(longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow /f/zst/zstd/programs/../lib/huff0.c:612 HUF_readStats
Shadow bytes around the buggy address:
0x10005ea452a0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea452b0: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
0x10005ea452c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea452d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea452e0: f2 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00
=>0x10005ea452f0:[04]f2 f2 f2 f2 f2 04 f2 04 f3 f3 f3 00 00 00 00
0x10005ea45300: 00 00 00 00 00 00 00 00 00 00 00 00 f1 f1 f1 f1
0x10005ea45310: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea45320: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea45330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x10005ea45340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Heap right redzone: fb
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack partial redzone: f4
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==19506==ABORTING
It seems that these files in all branches have CRLF line endings instead of LF.
It seems that you have lost 20% of decompression speed in v0.4:
Compressor name | Compression | Decompress. | Compr. size | Ratio |
---|---|---|---|---|
zstd_HC v0.3.6 level 1 | 250 MB/s | 529 MB/s | 51230550 | 48.86 |
zstd_HC v0.3.6 level 2 | 186 MB/s | 498 MB/s | 49678572 | 47.38 |
zstd_HC v0.3.6 level 3 | 90 MB/s | 484 MB/s | 48838293 | 46.58 |
zstd_HC v0.3.6 level 4 | 75 MB/s | 474 MB/s | 48423913 | 46.18 |
zstd_HC v0.3.6 level 5 | 61 MB/s | 467 MB/s | 46480999 | 44.33 |
zstd_HC v0.3.6 level 6 | 40 MB/s | 477 MB/s | 45723093 | 43.60 |
zstd_HC v0.3.6 level 7 | 28 MB/s | 480 MB/s | 44803941 | 42.73 |
zstd_HC v0.3.6 level 8 | 21 MB/s | 475 MB/s | 44511976 | 42.45 |
zstd_HC v0.3.6 level 9 | 15 MB/s | 497 MB/s | 43899996 | 41.87 |
zstd_HC v0.3.6 level 10 | 16 MB/s | 493 MB/s | 43845344 | 41.81 |
zstd_HC v0.3.6 level 11 | 15 MB/s | 491 MB/s | 42506862 | 40.54 |
zstd_HC v0.3.6 level 12 | 11 MB/s | 493 MB/s | 42402232 | 40.44 |
zstd v0.4 level 1 | 244 MB/s | 492 MB/s | 51160301 | 48.79 |
zstd v0.4 level 2 | 176 MB/s | 443 MB/s | 49719335 | 47.42 |
zstd v0.4 level 3 | 88 MB/s | 422 MB/s | 48749022 | 46.49 |
zstd v0.4 level 4 | 74 MB/s | 402 MB/s | 48352259 | 46.11 |
zstd v0.4 level 5 | 69 MB/s | 387 MB/s | 46389082 | 44.24 |
zstd v0.4 level 6 | 36 MB/s | 387 MB/s | 45525313 | 43.42 |
zstd v0.4 level 7 | 29 MB/s | 390 MB/s | 44805120 | 42.73 |
zstd v0.4 level 8 | 23 MB/s | 389 MB/s | 44509894 | 42.45 |
zstd v0.4 level 9 | 16 MB/s | 402 MB/s | 43892280 | 41.86 |
zstd v0.4 level 10 | 18 MB/s | 407 MB/s | 43807530 | 41.78 |
zstd v0.4 level 11 | 15 MB/s | 417 MB/s | 42498160 | 40.53 |
zstd v0.4 level 12 | 11 MB/s | 406 MB/s | 42394424 | 40.43 |
Report by Jim Meyering:
Please make it diagnose and exit nonzero upon write failure. A good way to demonstrate the problem is to use linux's /dev/full device:
$ echo foo | programs/zstd > /dev/full; echo $?
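The fix amounts to checking every write, plus the final flush/close, since stdio can buffer a "successful" fwrite and only hit ENOSPC later (which is exactly what /dev/full exposes); a self-contained sketch (hypothetical helper, not zstd's actual fileio.c):

```c
#include <stdio.h>

/* Write a buffer and report failure. fwrite can succeed into stdio's
 * internal buffer while the underlying write fails, so fflush must be
 * checked too; the caller should likewise check fclose. */
static int write_all(FILE *f, const void *buf, size_t len)
{
    if (fwrite(buf, 1, len, f) != len) return -1;
    if (fflush(f) != 0) return -1;   /* ENOSPC surfaces here on /dev/full */
    return 0;
}
```

On failure, the CLI should print a diagnostic and exit with a nonzero status, so `echo $?` after the /dev/full test reports an error.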
Original discussion : https://groups.google.com/forum/#!topic/lz4c/EzasmWCYCCM
This will require an update of the frame format
I've got another failure case. This time, the data crashes during decompression. I've uploaded a gzipped version of the bad2.bin at the gist https://gist.github.com/mwiebe/c54c790288b8e16a7970.
C:\Dev>dir bad2.bin
Volume in drive C is Windows
Volume Serial Number is DA14-224A
Directory of C:\Dev
2015-03-25 04:51 PM 6,434,440 bad2.bin
1 File(s) 6,434,440 bytes
0 Dir(s) 266,752,405,504 bytes free
C:\Dev>"C:\Dev\zstd\visual\2012\x64\Release\zstd.exe" bad2.bin out2.bin
Compressed 6434440 bytes into 3003788 bytes ==> 46.68%
C:\Dev>"C:\Dev\zstd\visual\2012\x64\Release\zstd.exe" -d out2.bin bad2.roundtrip.bin
**CRASH**
I understand that the project is in an experimental stage and I am not expecting it to be bug-free. So here is one bug.
Sometimes the destination buffer bounds are not checked properly (when compressing/decompressing) and overflows happen that could lead to a lot of nasty things. Here is some code that demonstrates it:
#include <stdio.h>
#include <stdlib.h>
#include <zstd.h>
int main(void) {
char *raw = (char *)malloc(1000);
char compressed[20];  /* deliberately too small for the compressed output */
char decompressed[900];
size_t i, ccode, dcode;
// fill it with ones
for (i=0; i<1000; i++) raw[i] = 1;
ccode = ZSTD_compress(compressed, 20, raw, 1000);
printf("Compression code %zu\n", ccode);  /* %zu: ccode is a size_t */
dcode = ZSTD_decompress(decompressed, 900, compressed, ccode);
printf("Decompression code %zu\n", dcode);
free(raw);
return 0;
}
Regards
Error due to type redefinition.
Original discussion : #24
Note : it's not an issue for later versions of GCC and clang, because the type redefinition defines exactly the same type. But earlier GCC versions nonetheless consider it an error.
When provided with a short input (in this case, a single byte), zstd will read outside of the input buffer. Here is a log from the single-byte test run under AddressSanitizer:
It would be great if zstd could use at least 2 threads
Hi,
is this an alternative to Gzip?
Also, what kind of files/objects can it compress - just base HTML or also JS, CSS, Images etc?
Lastly, can it be deployed to any OS - Apache/IIS?
Also, if I need to 'activate' this on an existing website, what steps do I need to follow?
[Cloudflare is seeking better compression](https://blog.cloudflare.com/results-experimenting-brotli/), so I was wondering how this would compare to another actively developed compressor, Brotli?
For example, on a file containing 10,000 repetitions of "All work and no play makes Jack a dull boy.\n" (440,000 bytes total), zstd -b15 gives about 23 MB/s on my laptop while zstd -b16 and higher give about 0.02 MB/s. I had to add another digit to the speed output to see anything but 0.0. I assume the switch to the btlazy2 strategy is what makes the difference.
Hi Yann,
I'm just posting this as a courtesy message to say that I intend to package the Zstd library for Debian. A quick question for you, is the preferred name "zstd", or "zstandard"?
I'll post back with progress as it occurs.
Cheers,
Kevin
If ZSTD_LEGACY_SUPPORT is not defined, zstd_decompress.c defines it to 1.
Then later in the file it references zstd_legacy.h.
However, zstd_legacy.h isn't present on GitHub.
tl;dr: this is a codegen issue with VS2013 in Release x64, see discussion below.
I have a test that attempts to compress and decompress vectors of raw data, and some of them (1 in 100+?) consistently crash during decompression. The crash is reproducible and deterministic, always on the same vectors.
The test program is written in .NET 4.6 and is using PInvoke to call into a version of zstd built as an x64 DLL, compiled from the 0.2.1 release (9e61835).
The crash message is "Stack cookie instrumentation code detected a stack-based buffer overrun", and it looks like the stack was overwritten by garbage during decoding.
When I try to decompress the data using zstd.exe from the command line, it works fine. But whenever I try to decompress it from my code, it crashes. I tried both calling ZSTD_decompress(...) directly, and reimplementing the same logic as fileio.c (using a DCtx and repeatedly calling ZSTD_nextSrcSizeToDecompress and ZSTD_decompressContinue), and both fail exactly the same way (and also both work perfectly fine with the other 99% of vectors).
The crash occurs during the first call to ZSTD_decompressContinue() that has actual compressed data (i.e. the first call with the frame header returns 0, then the next call with the first chunk of compressed data crashes).
I was able to create a pair of files, one that decompresses fine and another that systematically crashes, and can reproduce the issue with the test (.NET code). The same code works perfectly with the previous 0.1.x branch.
The original files (both are a highly compressed vector of integer values) can be found here: https://github.com/KrzysFR/frqsspslt/blob/76e8e799c936096819bb7b97bfb13d764949d115/attachments/zstd/sample_data.zip?raw=true
Test program:
var files = new[] { "original_pass.bin", "original_fail.bin" };
foreach (var file in files)
{
Trace.WriteLine("## " + file);
var original = new ArraySegment<byte>(File.ReadAllBytes(Path.Combine(@"..\..", file)));
ulong h1 = XxHash64.FromBytes(original);
Trace.WriteLine($"> Original : {original.Count,10:N0} bytes (hash: 0x{h1:x16})");
var compressed = ZStd.CompressBuffer(original);
Trace.WriteLine($"> Compressed : {compressed.Count,10:N0} bytes (hash: 0x{XxHash64.FromBytes(compressed):x16})");
using (var fs = File.Create(Path.Combine(@"..\..", file + ".zst")))
{ // save to disk (for reference)
fs.Write(compressed.Array, compressed.Offset, compressed.Count);
}
var decompressed = ZStd.DecompressBuffer(compressed, originalSize: original.Count);
var h2 = XxHash64.FromBytes(decompressed);
Trace.WriteLine($"> Decompressed: {decompressed.Count,10:N0} bytes (hash: 0x{h2:x16})");
if (h1 != h2)
{
Trace.WriteLine("> FAILED! hashes do not match!");
Trace.WriteLine(HexaDump.Versus(original, compressed));
}
else
{
Trace.WriteLine("> PASS");
}
}
Outputs:
## original_pass.bin
> Original : 1,469,465 bytes (hash: 0x514cf5e26f0c9054)
> Compressed : 2,855 bytes (hash: 0xa3a91b9ecdfaed20)
> Decompressed: 1,469,465 bytes (hash: 0x514cf5e26f0c9054)
> PASS
## original_fail.bin
> Original : 1,958,527 bytes (hash: 0xd6172d7d482e5460)
> Compressed : 177,267 bytes (hash: 0x452a6dce4b2dfb04)
> Decompressed: >CRASH< (StackOverflow?)
Unhandled exception at 0x00007FF964CDC798 (zstd_x64.dll) in Test.exe: Stack cookie instrumentation code detected a stack-based buffer overrun.
Attaching a debugger, I get the following stacktrace:
CallStack:
zstd_x64.dll!__report_gsfailure(unsigned __int64 StackCookie) Line 151 C
zstd_x64.dll!__GSHandlerCheck(_EXCEPTION_RECORD * ExceptionRecord, void * EstablisherFrame, _CONTEXT * ContextRecord, _DISPATCHER_CONTEXT * DispatcherContext) Line 91 C
ntdll.dll!RtlpExecuteHandlerForException() Unknown
ntdll.dll!RtlDispatchException() Unknown
ntdll.dll!KiUserExceptionDispatch() Unknown
> zstd_x64.dll!HUF_fillDTableX4Level2(HUF_DEltX4 * DTable, unsigned int sizeLog, const unsigned int consumed, const unsigned int * rankValOrigin, const int minWeight, const sortedSymbol_t * sortedSymbols, const unsigned int sortedListSize, unsigned int nbBitsBaseline, unsigned short baseSeq) Line 893 C
zstd_x64.dll!HUF_fillDTableX4(HUF_DEltX4 * DTable, const unsigned int targetLog, const sortedSymbol_t * sortedList, const unsigned int sortedListSize, const unsigned int * rankStart, unsigned int[17] * rankValOrigin, const unsigned int maxWeight, const unsigned int nbBitsBaseline) Line 951 C
zstd_x64.dll!HUF_readDTableX4(unsigned int * DTable, const void * src, unsigned __int64 srcSize) Line 1041 C
zstd_x64.dll!HUF_decompress4X4(void * dst, unsigned __int64 dstSize, const void * cSrc, unsigned __int64 cSrcSize) Line 1255 C
// everything before that in the callstack looks like garbage (more like data, and not actual method pointers)
0109000701090007() Unknown
0109000701090007() Unknown
0109000701090007() Unknown
//... this garbage address is repeated about a thousand times, and is the same value in cSrcSize/dstSize below, which confirms that the stack has been overwritten
0109000701090007() Unknown
0109000701090007() Unknown
0000002078746341() Unknown
000030b400000001() Unknown
00000000000000dc() Unknown
0000000000000020() Unknown
0000000100000014() Unknown
0000003400000007() Unknown
000000010000017c() Unknown
HUF_fillDTableX4Level2()
locals:
baseSeq 0x0007 unsigned short
consumed 0x7ef2b8e0 const unsigned int
+ DElt {sequence=0x0007 nbBits=0x09 '\t' length=0x01 '\x1' } HUF_DEltX4
+ DTable 0x0000006f7edd9fb0 {sequence=0x0405 nbBits=0x06 '\x6' length=0x04 '\x4' } HUF_DEltX4 *
minWeight 0x00000007 const int
nbBitsBaseline 0x0000000a unsigned int
+ rankVal 0x0000006f7edd95c0 {0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x00000000, 0x5ea64b14, 0x00007ff9, ...} unsigned int[0x00000011]
+ rankValOrigin 0x00007ff95ea037fd {Inside clr.dll!EEHeapFreeInProcessHeap(void)} {0x48c0b60f} const unsigned int *
sizeLog 0x00000000 unsigned int
sortedListSize 0x00000003 const unsigned int
+ sortedSymbols 0x0000006f7edd9fba {symbol=0x00 '\0' weight=0x07 '\a' } const sortedSymbol_t *
HUF_fillDTableX4()
locals:
+ DElt {sequence=0xb8e0 nbBits=0xf2 'ò' length=0x7e '~' } HUF_DEltX4
+ DTable 0x0000000000008918 {sequence=??? nbBits=??? length=??? } HUF_DEltX4 *
maxWeight 0x00000007 const unsigned int
nbBitsBaseline 0x0000000a const unsigned int
+ rankStart 0x0000006f7edd97e0 {0x00000000} const unsigned int *
+ rankVal 0x0000006f7edd96f0 {0x00000000, 0x00000000, 0x000007b0, 0x000007c0, 0x000007c0, 0x00000880, 0x00000a00, ...} unsigned int[0x00000011]
+ rankValOrigin 0x0000006f7edd9880 {0x00000000, 0x00000000, 0x000007b0, 0x000007c0, 0x000007c0, 0x00000880, 0x00000a00, ...} unsigned int[0x00000011] *
+ sortedList 0x0000006f19f628c8 {symbol=0x00 '\0' weight=0x00 '\0' } const sortedSymbol_t *
sortedListSize 0x00000008 const unsigned int
targetLog 0x7edd9890 const unsigned int
HUF_readDTableX4()
locals:
+ DTable 0x0000006f7eeeddb0 {0x5f0d2408} unsigned int *
nbSymbols 0x00000100 unsigned int
+ rankStart0 0x0000006f7edd97e0 {0x00000000, 0x00000000, 0x000000f6, 0x000000f7, 0x000000f7, 0x000000fa, 0x000000fd, ...} unsigned int[0x00000012]
+ rankStats 0x0000006f7edd9830 {0x00000000, 0x000000f6, 0x00000001, 0x00000000, 0x00000003, 0x00000003, 0x00000000, ...} unsigned int[0x00000011]
+ rankVal 0x0000006f7edd9880 {0x0000006f7edd9880 {0x00000000, 0x00000000, 0x000007b0, 0x000007c0, 0x000007c0, ...}, ...} unsigned int[0x00000010][0x00000011]
+ sortedSymbol 0x0000006f7edd9dc0 {{symbol=0x07 '\a' weight=0x01 '\x1' }, {symbol=0x08 '\b' weight=0x01 '\x1' }, {symbol=...}, ...} sortedSymbol_t[0x00000100]
src mscorlib.ni.dll!0x00007ff95d0c0220 (load symbols for additional information) const void *
srcSize 0x0000000000008918 unsigned __int64
tableLog 0x00000009 unsigned int
+ weightList 0x0000006f7edd9cc0 "\a\x5\x5\x5\x4\x4\x4\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x2\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\a\a\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1\x1... unsigned char[0x00000100]
HUF_decompress4X4()
locals:
cSrc 0x0109000701090007 const void *
cSrcSize 0x0109000701090007 unsigned __int64
dst 0x0109000701090007 void *
dstSize 0x0109000701090007 unsigned __int64
+ DTable 0x0000006f7edda040 {0x0000000c, 0x01090007, 0x01090007, 0x01090007, 0x01090007, 0x01090007, 0x01090007, ...} unsigned int[0x00001001]
My guess is that DTable, which is allocated on the stack, was overwritten somewhere, which makes it impossible for the debugger to unwind the stack properly.
Is it possible to "split" the algorithm into many parallel compressing/decompressing blocks? It would be a great way to speed up the library, up to 10x.
CUDA is my hobby and I want to help improve zstd in the future :)