Comments (9)
When this line is reached and the test triggers the error code,
this is pretty bad: it means the normalized distribution is not correctly normalized.
In such a case, the algorithm is right to stop processing there.
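To make the condition concrete, here is a minimal sketch of what "correctly normalized" means: the normalized counts must claim exactly as many cells as the table holds, i.e. 1 << tableLog. The helper names are hypothetical and not part of the FSE API, and the sketch assumes the convention that a count of -1 marks a low-probability symbol that still occupies one cell.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the FSE API.
 * Returns how many table cells the normalized counts claim.
 * Assumption: a count of -1 marks a low-probability symbol that
 * still occupies exactly one cell. */
static long cells_claimed(const short *normalizedCounter, unsigned maxSymbolValue)
{
    long total = 0;
    unsigned s;
    assert(normalizedCounter != NULL);
    for (s = 0; s <= maxSymbolValue; s++) {
        short c = normalizedCounter[s];
        total += (c == -1) ? 1 : c;
    }
    return total;
}

/* Returns 1 when the distribution exactly fills a table of
 * (1 << tableLog) cells, 0 otherwise. */
static int is_correctly_normalized(const short *normalizedCounter,
                                   unsigned maxSymbolValue,
                                   unsigned tableLog)
{
    return cells_claimed(normalizedCounter, maxSymbolValue) == (1L << tableLog);
}
```

Running such a check on the normalized counts just before the table is built would show whether the distribution is short or over by some number of cells, without needing the original file.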
Now, if that is the actual scenario, the next step is to understand why the normalization would fail. It could be a very specific corner case that the fuzzer is unable to produce.
It may sound like a long shot, but would it be possible to access the faulty file for debugging?
If the problem is the one described above, it means it's not related to the size of the file.
It might be possible to capture just the place where the problem occurs.
from zstd.
Sorry, I can't give out the faulty file since it contains some sensitive information. One thing to note is that after removing the line and compressing, then decompressing the file, the file is a perfect match for the original.
from zstd.
OK.
It's a pity not to be able to investigate the problem, but I'm glad it works correctly for you.
To be fair, I'm very surprised: this test was supposed to be an important sanitizer check, and I took for granted that compression would necessarily behave badly if it failed.
Apparently that's not always the case.
Also, as a secondary question:
Do you have any idea why "there are more symbols than the max symbol limit"?
This situation is not supposed to happen, so it would be interesting to understand why it does.
from zstd.
The comment "there are more symbols than the max symbol limit" was purely a guess from what I saw of the code. It's very likely I misread the code and the real issue is something else entirely.
If it helps, I can give you some gdb output. For example, here are some variables from gdb when it stops at that line:
Breakpoint 1, FSE_buildCTable (CTable=0x7fffffff1d10, normalizedCounter=0x7fffffff3920, maxSymbolValue=252, tableLog=8) at ../lib/fse.c:1460
1460 return (size_t)-FSE_ERROR_GENERIC; /* Must have gone through all positions */
(gdb) p position
$1 = 49
(gdb) p maxSymbolValue
$2 = 252
(gdb) p symbol
$3 = 253
from zstd.
OK.
I see that compression is quite affected, since it tries to fit up to 253 symbols into a table of 256 elements. It can work, but the compression ratio will suffer considerably.
"symbol" necessarily exits the loop at maxSymbolValue+1, so this part is correct.
What is not correct is "position", which is supposed to end at "0".
Here it ends at "49".
It should be possible to know how many symbols are missing from this value. Not sure if it is very useful though.
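As a rough illustration of why "position" should end at 0, and how its final value hints at the number of cells filled, here is a simplified sketch of the spread loop. It assumes the step formula (tableSize>>1) + (tableSize>>3) + 3 found in the public FSE source, counts -1 as one cell, and ignores any cells reserved for low-probability symbols, so it is a model of the mechanism rather than the exact code at fse.c:1460. All function names are hypothetical.

```c
#include <assert.h>

/* Assumed step formula (FSE_TABLESTEP in the public FSE source). */
static unsigned table_step(unsigned tableSize)
{
    return (tableSize >> 1) + (tableSize >> 3) + 3;
}

/* Simplified spread: symbol s takes normalizedCounter[s] cells
 * (-1 counted as 1). Assumption: no reserved low-probability area.
 * Returns the position reached after all symbols are placed. */
static unsigned final_position(const short *normalizedCounter,
                               unsigned maxSymbolValue, unsigned tableLog)
{
    unsigned tableSize, tableMask, step, position = 0, s;
    int i;
    assert(tableLog >= 4 && tableLog <= 15);  /* step is odd in this range */
    tableSize = 1u << tableLog;
    tableMask = tableSize - 1;
    step = table_step(tableSize);
    for (s = 0; s <= maxSymbolValue; s++) {
        int n = (normalizedCounter[s] == -1) ? 1 : normalizedCounter[s];
        for (i = 0; i < n; i++)
            position = (position + step) & tableMask;
    }
    return position;
}

/* Inverts position = (cells * step) mod tableSize by brute force.
 * Returns the number of cells filled, modulo tableSize. */
static unsigned cells_filled_mod_table(unsigned position, unsigned tableLog)
{
    unsigned tableSize, tableMask, step, p = 0, n;
    assert(tableLog >= 4 && tableLog <= 15);
    tableSize = 1u << tableLog;
    assert(position < tableSize);
    tableMask = tableSize - 1;
    step = table_step(tableSize);
    for (n = 0; n < tableSize; n++) {
        if (p == position) return n;
        p = (p + step) & tableMask;
    }
    return tableSize;  /* not reached: an odd step visits every cell */
}
```

Because the step is odd, it visits every cell exactly once per tableSize steps, so position returns to 0 only when the number of cells placed is a multiple of the table size. Under this simplified model, cells_filled_mod_table(49, 8) returns 27 for the gdb values above; whether that reflects a shortfall or an excess, and whether the real code path matches this loop, would still need to be checked against the actual source.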
I've made a small update to FSE in its "dev" branch.
https://github.com/Cyan4973/FiniteStateEntropy/tree/dev
Maybe it can help to solve this situation.
from zstd.
When using the dev branch of FSE, compression works.
from zstd.
Thanks for the feedback.
from zstd.
Fix integrated into zstd "dev" branch
from zstd.
merged into master
from zstd.