Comments (13)
Wow. This is really bad.
I was in the process of releasing 3.7.5, but I might have to delay this until I could take some time to study your problem more closely. Thanks a lot for the careful analysis!
I would very much like to have the grammar that triggered the problem, please.
Cheers!
from bison.
Here's the grammar: https://gist.github.com/bazsi/842eaa86760da2c13d65b5d0b6b1dcfa
from bison.
btw: I went back to an earlier version of syslog-ng (3.29) that doesn't yet have the 3.4.x+ updates, so this will spew a lot of warnings with the latest master. I did this to be able to bisect from 3.0.5 (which worked) up to 3.7.4.
So don't be alarmed by the amount of warnings this will generate, we are using these options:
-Wno-yacc -Wno-other -Werror=conflicts-sr -Werror=conflicts-rr
from bison.
I was looking at the patch, and although I don't really understand the internals of bison, but one thing is a bit strange. I am using hunks of the patch revert, so "-" is in fact the wrong version and "+" is the correct (but slow) one.
This is the change that converted the use of an array to the use of a bitset. The single location that flips bits in our bitset:
@@ -750,12 +746,7 @@ pack_table (void)
/* Action of I were already coded for S. */
place = base[s];
- /* Store PLACE into POS_SET. PLACE might not belong to the set
- of possible values for instance with useless tokens. It
- would be more satisfying to eliminate the need for this
- 'if'. */
- if (0 <= nstates + place)
- bitset_set (pos_set, nstates + place);
+ pos[i] = place;
base[order[i]] = place;
}
pos[i] gets assigned "place", which is an index of some kind (e.g not just 0 or 1). Then the read side of the bitset:
@@ -665,8 +659,10 @@ pack_vector (vector_number vector)
ok = false;
}
- if (ok && bitset_test (pos_set, nstates + res))
- ok = false;
+ if (ok)
+ for (int k = 0; k < vector; k++)
+ if (pos[k] == res)
+ ok = false;
}
if (ok)
If you look at the conditional that turns ok to false, we check if pos[k] is EQUAL to "res", not just non-zero, but is equal to the loop variable in pack_vector(). This means that whereas earlier we only turned ok to false when pos[k] was equal to the current loop variable, now we do that every time even if that was set in an earlier iteration.
I might be completely at loss here, but the conversion does not seem to be equivalent.
from bison.
scrap my last comment. sorry for the noise.
from bison.
I don't really understand the internals of bison
The problem is... I don't know that part very well either....
If you look at the conditional that turns ok to false, we check if pos[k] is EQUAL to "res", not just non-zero, but is equal to the loop variable in pack_vector().
I don't see that. In the +
version above, we iterate over the whole pos
(an array of ints) to see if res
is in it. Its position in pos
is irrelevant.
from bison.
I think that the part:
/* Store PLACE into POS_SET. PLACE might not belong to the set
of possible values for instance with useless tokens. It
would be more satisfying to eliminate the need for this
'if'. */
if (0 <= nstates + place)
bitset_set (pos_set, nstates + place);
is worth worrying (I mean, the if
: it should be unconditional). And it would also explain why this went unnoticed for so long but hit you: it would be related to useless tokens, that most people don't have.
from bison.
I think I have the problem, I just can't come up with a good solution yet.
I have added some logging into these places, and I've found differences between the "good" and the "bad" versions and it indeed relates to the condition before bitset_set().
It seems that a bitset_set() is not invoked on a "place" value of -147, because "nstates + place" is indeed less than zero (nstates being 70) When executing bison in the "good" version, -147 is matched in the original "pos" array and thus skipped in pack_vector().
So the problem seems to be that we indeed need to eliminate that conditional by shifting the bitset even more. So the base "nstates" is not the right base that the code uses right now. We need to find the one that shifts "place" to be a positive value and thus eliminate the conditional.
I am not sure what "place" is, for me the most negative value is -147, which does not relate to the number of states (70), the number of vectors (86), the number of entries (50).
This is the access log of the "pos_set" bitset.
- Whenever you see "res XXX is ok", that means that we haven't found a res value.
- When you see "not storing: place=XXXX" that means that we don't store a value in the bitset because of the conditional.
The problem starts at the 9th res XXX is ok, where pack_vector() considers -147 and accepts it, and if you see earlier in the log, -147 should already have been stored in pos_set, but it wasn't, since -147 + nstates is still less than zero.
So my conclusion is that we indeed need to drop the conditional and do that by shifting the bitset even more to positive values so that it can encapsulate all potential bits.
res -87 is ok
not storing: place=-87
not storing: place=-87
res -20 is ok
res -125 is ok
not storing: place=-125
res -124 is ok
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
res -147 is ok
not storing: place=-147 <---- THIS IS WHERE WE DON'T STORE -147
res -4 is ok
res -142 is ok
not storing: place=-142
res 16 is ok
res -147 is ok <----- THIS IS WHERE THE BUG IS TRIGGERED -147 is already encountered at this point, but wasn't stored in pos_set.
not storing: place=-147
res -146 is ok
not storing: place=-146
res -143 is ok
not storing: place=-143
res -142 is ok
not storing: place=-142
res -141 is ok
not storing: place=-141
res -140 is ok
not storing: place=-140
res -139 is ok
not storing: place=-139
res -138 is ok
not storing: place=-138
res -137 is ok
not storing: place=-137
res -136 is ok
not storing: place=-136
res -136 is ok
not storing: place=-136
res -135 is ok
not storing: place=-135
res -134 is ok
not storing: place=-134
res -133 is ok
not storing: place=-133
res -132 is ok
not storing: place=-132
res -131 is ok
not storing: place=-131
res -129 is ok
not storing: place=-129
res -126 is ok
not storing: place=-126
res -124 is ok
not storing: place=-124
res -123 is ok
not storing: place=-123
res -122 is ok
not storing: place=-122
res -121 is ok
not storing: place=-121
res -120 is ok
not storing: place=-120
res -119 is ok
not storing: place=-119
res 29 is ok
res 1 is ok
res 14 is ok
res 28 is ok
res -9 is ok
res 12 is ok
res -87 is ok
not storing: place=-87
not storing: place=-87
res -20 is ok
res -125 is ok
not storing: place=-125
res -124 is ok
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
res -147 is ok
not storing: place=-147
res -4 is ok
res -142 is ok
not storing: place=-142
res 16 is ok
res -147 is ok
not storing: place=-147
res -146 is ok
not storing: place=-146
res -143 is ok
not storing: place=-143
res -142 is ok
not storing: place=-142
res -141 is ok
not storing: place=-141
res -140 is ok
not storing: place=-140
res -139 is ok
not storing: place=-139
res -138 is ok
not storing: place=-138
res -137 is ok
not storing: place=-137
res -136 is ok
not storing: place=-136
res -136 is ok
not storing: place=-136
res -135 is ok
not storing: place=-135
res -134 is ok
not storing: place=-134
res -133 is ok
not storing: place=-133
res -132 is ok
not storing: place=-132
res -131 is ok
not storing: place=-131
res -129 is ok
not storing: place=-129
res -126 is ok
not storing: place=-126
res -124 is ok
not storing: place=-124
res -123 is ok
not storing: place=-123
res -122 is ok
not storing: place=-122
res -121 is ok
not storing: place=-121
res -120 is ok
not storing: place=-120
res -119 is ok
not storing: place=-119
res 29 is ok
res 1 is ok
res 14 is ok
res 28 is ok
res -9 is ok
res 12 is ok
from bison.
I'm at the same place, and I have a working solution: really implementing a set of integers on top of bitsets. The problem is that the "base" (the smallest int to represent) may change, and that's where the current implementation is wrong.
I've also discovered that we are using way too much bits. On your grammar I have:
pos_set (32915, -147) = -147 -146 -143 -142 -141 -140 -139 -138 -137 -136 -135 -134 -133 -132 -131 -129 -126 -124 -123 -122 -121 -120 -119 -58 -20 -9 -4 1 12 14 16 28 29
I.e., we have 32915 bits of range, although we only need to go from -147 to 29. I'm looking a being less eager right now.
from bison.
Balazs, I will not have enough time to finish this today. I shall address this tomorrow in the morning. Thanks a lot for having found the bad commit, this was an immense help!
What I really wish I had right now is a small grammar I could add to the test suite to check that in the future. If I can't design one, would you agree to sign disclaimers to the FSF so that I could add your grammar to the test suite?
Cheers!
from bison.
from bison.
Hi Balazs
Thanks a lot for your efforts, really appreciate them.
Well, I'm sorry in the first place for having let the bug in...
I probably have my signature on file with the FSF,
I need something specific to Bison.
but I am willing to do any paperwork necessary. I problem with submitting the code directly is that it's not just my copyright in there. I sold the company that originally wrote syslog-ng, thus some of the copyright is already at its new owner. And I am not in any decisive capacity. Maybe if I would remove the code sections, comments and change naming and only retain the structure of the grammar that would suffice to reproduce the problem and would get rid off any copyrightable material. What do you think?
I can't tell. And actually, the FSF is not really concerned about that: that's really the problem of the current owner. Of course, if I were them, I would definitely understand that:
- the code is GPL anyway, so it does not make any difference for them
- none of the genuine content of the code will be used, we just need the parser (not the actions)
- this is to ensure that a dependency of their tool (Bison) does not wreck the tool
But then again, I can't tell.
However I confess that the most precious test case would be some parsing that does fail without the fix. I dislike tests that just look at the generated tables: some day we might want to change the implementation of these tables. They are irrelevant per-se, what matters is that the parser behaves correctly. Unfortunately reproducing your problem is already difficult when comparing the tables (none of the test case in Bison's test suite actually change with the fix!), it seems even more delicate to have a running example.
So I would really like to have not just your grammar go into the test suite, but also the input that triggers the bad behavior. We still don't care about the genuine actions, they are still irrelevant. But I need something to exercise the behavior rather than the details of the implementation.
from bison.
Released in 3.7.5.
from bison.
Related Issues (20)
- Escaping dollar sign in a rule action
- Associating %prec with a rule without any occurrences of tokens
- [TC] Problem: Bad Variant Access errors are not explicit
- Request for latest release HOT 7
- Hello, if I want to print the full string content of the current child node, what do I need to do
- --file-prefix in the docs HOT 1
- FIXME - java HOT 2
- Broken for D parsers: %code lexer { HOT 1
- Compilation error: No rule to make target 'textstyle.h' HOT 2
- 3.8.2 testsuite segfault on armv6 HOT 3
- D skeleton file breaks recent D compilers with example code from manual HOT 6
- yynerrs unused-but-set-variable warning with Clang 15 HOT 1
- Can't build with conan, using clang compiler, having dependencies as shared libs HOT 4
- [BUG] reachable assertation in string_decode, bison HOT 1
- [BUG] abitset_set is reachable by crafted input, which cause the program abort HOT 1
- How to remove entries from the token list at runtime / from being passed to yysyntax_error? HOT 1
- [NonFetal Error]: use-of-uninitialized-value in bison(version 3.8.2.45, commit 25b3d0e1)
- No example code in the C++ examples HOT 2
- make fails HOT 9
- git shallow clone - error: Server does not allow request for unadvertised object - git.sv.gnu.org - git.savannah.gnu.org
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from bison.