Coder Social home page Coder Social logo

Comments (13)

akimd avatar akimd commented on May 21, 2024

Wow. This is really bad.

I was in the process of releasing 3.7.5, but I might have to delay this until I could take some time to study your problem more closely. Thanks a lot for the careful analysis!

I would very much like to have the grammar that triggered the problem, please.

Cheers!

from bison.

bazsi avatar bazsi commented on May 21, 2024

Here's the grammar: https://gist.github.com/bazsi/842eaa86760da2c13d65b5d0b6b1dcfa

from bison.

bazsi avatar bazsi commented on May 21, 2024

btw: I went back to an earlier version of syslog-ng (3.29) that doesn't yet have the 3.4.x+ updates, so this will spew a lot of warnings with the latest master. I did this to be able to bisect from 3.0.5 (which worked) up to 3.7.4.

So don't be alarmed by the amount of warnings this will generate, we are using these options:
-Wno-yacc -Wno-other -Werror=conflicts-sr -Werror=conflicts-rr

from bison.

bazsi avatar bazsi commented on May 21, 2024

I was looking at the patch, and although I don't really understand the internals of bison, but one thing is a bit strange. I am using hunks of the patch revert, so "-" is in fact the wrong version and "+" is the correct (but slow) one.

This is the change that converted the use of an array to the use of a bitset. The single location that flips bits in our bitset:

@@ -750,12 +746,7 @@ pack_table (void)
         /* Action of I were already coded for S.  */
         place = base[s];
 
-      /* Store PLACE into POS_SET.  PLACE might not belong to the set
-         of possible values for instance with useless tokens.  It
-         would be more satisfying to eliminate the need for this
-         'if'.  */
-      if (0 <= nstates + place)
-        bitset_set (pos_set, nstates + place);
+      pos[i] = place;
       base[order[i]] = place;
     }
 

pos[i] gets assigned "place", which is an index of some kind (e.g not just 0 or 1). Then the read side of the bitset:

@@ -665,8 +659,10 @@ pack_vector (vector_number vector)
               ok = false;
           }
 
-        if (ok && bitset_test (pos_set, nstates + res))
-          ok = false;
+        if (ok)
+          for (int k = 0; k < vector; k++)
+            if (pos[k] == res)
+              ok = false;
       }
 
       if (ok)

If you look at the conditional that turns ok to false, we check if pos[k] is EQUAL to "res", not just non-zero, but is equal to the loop variable in pack_vector(). This means that whereas earlier we only turned ok to false when pos[k] was equal to the current loop variable, now we do that every time even if that was set in an earlier iteration.

I might be completely at loss here, but the conversion does not seem to be equivalent.

from bison.

bazsi avatar bazsi commented on May 21, 2024

scrap my last comment. sorry for the noise.

from bison.

akimd avatar akimd commented on May 21, 2024

I don't really understand the internals of bison

The problem is... I don't know that part very well either....

If you look at the conditional that turns ok to false, we check if pos[k] is EQUAL to "res", not just non-zero, but is equal to the loop variable in pack_vector().

I don't see that. In the + version above, we iterate over the whole pos (an array of ints) to see if res is in it. Its position in pos is irrelevant.

from bison.

akimd avatar akimd commented on May 21, 2024

I think that the part:

      /* Store PLACE into POS_SET.  PLACE might not belong to the set
         of possible values for instance with useless tokens.  It
         would be more satisfying to eliminate the need for this
         'if'.  */
      if (0 <= nstates + place)
        bitset_set (pos_set, nstates + place);

is worth worrying (I mean, the if: it should be unconditional). And it would also explain why this went unnoticed for so long but hit you: it would be related to useless tokens, that most people don't have.

from bison.

bazsi avatar bazsi commented on May 21, 2024

I think I have the problem, I just can't come up with a good solution yet.

I have added some logging into these places, and I've found differences between the "good" and the "bad" versions and it indeed relates to the condition before bitset_set().

It seems that a bitset_set() is not invoked on a "place" value of -147, because "nstates + place" is indeed less than zero (nstates being 70) When executing bison in the "good" version, -147 is matched in the original "pos" array and thus skipped in pack_vector().

So the problem seems to be that we indeed need to eliminate that conditional by shifting the bitset even more. So the base "nstates" is not the right base that the code uses right now. We need to find the one that shifts "place" to be a positive value and thus eliminate the conditional.

I am not sure what "place" is, for me the most negative value is -147, which does not relate to the number of states (70), the number of vectors (86), the number of entries (50).

This is the access log of the "pos_set" bitset.

  • Whenever you see "res XXX is ok", that means that we haven't found a res value.
  • When you see "not storing: place=XXXX" that means that we don't store a value in the bitset because of the conditional.

The problem starts at the 9th res XXX is ok, where pack_vector() considers -147 and accepts it, and if you see earlier in the log, -147 should already have been stored in pos_set, but it wasn't, since -147 + nstates is still less than zero.

So my conclusion is that we indeed need to drop the conditional and do that by shifting the bitset even more to positive values so that it can encapsulate all potential bits.

res -87 is ok
not storing: place=-87
not storing: place=-87
res -20 is ok
res -125 is ok
not storing: place=-125
res -124 is ok
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
res -147 is ok
not storing: place=-147   <---- THIS IS WHERE WE DON'T STORE -147
res -4 is ok
res -142 is ok
not storing: place=-142
res 16 is ok
res -147 is ok                  <----- THIS IS WHERE THE BUG IS TRIGGERED -147 is already encountered at this point, but wasn't stored in pos_set.
not storing: place=-147
res -146 is ok
not storing: place=-146
res -143 is ok
not storing: place=-143
res -142 is ok
not storing: place=-142
res -141 is ok
not storing: place=-141
res -140 is ok
not storing: place=-140
res -139 is ok
not storing: place=-139
res -138 is ok
not storing: place=-138
res -137 is ok
not storing: place=-137
res -136 is ok
not storing: place=-136
res -136 is ok
not storing: place=-136
res -135 is ok
not storing: place=-135
res -134 is ok
not storing: place=-134
res -133 is ok
not storing: place=-133
res -132 is ok
not storing: place=-132
res -131 is ok
not storing: place=-131
res -129 is ok
not storing: place=-129
res -126 is ok
not storing: place=-126
res -124 is ok
not storing: place=-124
res -123 is ok
not storing: place=-123
res -122 is ok
not storing: place=-122
res -121 is ok
not storing: place=-121
res -120 is ok
not storing: place=-120
res -119 is ok
not storing: place=-119
res 29 is ok
res 1 is ok
res 14 is ok
res 28 is ok
res -9 is ok
res 12 is ok
res -87 is ok
not storing: place=-87
not storing: place=-87
res -20 is ok
res -125 is ok
not storing: place=-125
res -124 is ok
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
not storing: place=-124
res -147 is ok
not storing: place=-147
res -4 is ok
res -142 is ok
not storing: place=-142
res 16 is ok
res -147 is ok
not storing: place=-147
res -146 is ok
not storing: place=-146
res -143 is ok
not storing: place=-143
res -142 is ok
not storing: place=-142
res -141 is ok
not storing: place=-141
res -140 is ok
not storing: place=-140
res -139 is ok
not storing: place=-139
res -138 is ok
not storing: place=-138
res -137 is ok
not storing: place=-137
res -136 is ok
not storing: place=-136
res -136 is ok
not storing: place=-136
res -135 is ok
not storing: place=-135
res -134 is ok
not storing: place=-134
res -133 is ok
not storing: place=-133
res -132 is ok
not storing: place=-132
res -131 is ok
not storing: place=-131
res -129 is ok
not storing: place=-129
res -126 is ok
not storing: place=-126
res -124 is ok
not storing: place=-124
res -123 is ok
not storing: place=-123
res -122 is ok
not storing: place=-122
res -121 is ok
not storing: place=-121
res -120 is ok
not storing: place=-120
res -119 is ok
not storing: place=-119
res 29 is ok
res 1 is ok
res 14 is ok
res 28 is ok
res -9 is ok
res 12 is ok

from bison.

akimd avatar akimd commented on May 21, 2024

I'm at the same place, and I have a working solution: really implementing a set of integers on top of bitsets. The problem is that the "base" (the smallest int to represent) may change, and that's where the current implementation is wrong.
I've also discovered that we are using way too much bits. On your grammar I have:

pos_set (32915, -147) = -147 -146 -143 -142 -141 -140 -139 -138 -137 -136 -135 -134 -133 -132 -131 -129 -126 -124 -123 -122 -121 -120 -119 -58 -20 -9 -4 1 12 14 16 28 29

I.e., we have 32915 bits of range, although we only need to go from -147 to 29. I'm looking a being less eager right now.

from bison.

akimd avatar akimd commented on May 21, 2024

Balazs, I will not have enough time to finish this today. I shall address this tomorrow in the morning. Thanks a lot for having found the bad commit, this was an immense help!
What I really wish I had right now is a small grammar I could add to the test suite to check that in the future. If I can't design one, would you agree to sign disclaimers to the FSF so that I could add your grammar to the test suite?

Cheers!

from bison.

bazsi avatar bazsi commented on May 21, 2024

from bison.

akimd avatar akimd commented on May 21, 2024

Hi Balazs

Thanks a lot for your efforts, really appreciate them.

Well, I'm sorry in the first place for having let the bug in...

I probably have my signature on file with the FSF,

I need something specific to Bison.

but I am willing to do any paperwork necessary. I problem with submitting the code directly is that it's not just my copyright in there. I sold the company that originally wrote syslog-ng, thus some of the copyright is already at its new owner. And I am not in any decisive capacity. Maybe if I would remove the code sections, comments and change naming and only retain the structure of the grammar that would suffice to reproduce the problem and would get rid off any copyrightable material. What do you think?

I can't tell. And actually, the FSF is not really concerned about that: that's really the problem of the current owner. Of course, if I were them, I would definitely understand that:

  • the code is GPL anyway, so it does not make any difference for them
  • none of the genuine content of the code will be used, we just need the parser (not the actions)
  • this is to ensure that a dependency of their tool (Bison) does not wreck the tool

But then again, I can't tell.

However I confess that the most precious test case would be some parsing that does fail without the fix. I dislike tests that just look at the generated tables: some day we might want to change the implementation of these tables. They are irrelevant per-se, what matters is that the parser behaves correctly. Unfortunately reproducing your problem is already difficult when comparing the tables (none of the test case in Bison's test suite actually change with the fix!), it seems even more delicate to have a running example.

So I would really like to have not just your grammar go into the test suite, but also the input that triggers the bad behavior. We still don't care about the genuine actions, they are still irrelevant. But I need something to exercise the behavior rather than the details of the implementation.

from bison.

akimd avatar akimd commented on May 21, 2024

Released in 3.7.5.

from bison.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.