This is an attempt to make a cross-parser grammar specification for parsers written in Ruby. The result is a very opiniated gem that is cranky about the sad state of grammar languages. Grammar doesn't have a grammar file format, or even care what grammar file you use, so long as it can be translated into Grammar objects. The Grammar object hierarchy attempts to support all of the features supported by all of the major parsers. If you create a grammar using features that aren't supported by your favorite parser, then it's up to that parser to complain about the grammar you tried to feed it.
Grammar doesn't attempt to mitigate left-recursive alternations; a parser is expected to either handle the grammar properly, or to report an appropriate error for any grammar element that it can't handle.
Grammars are Ruby Modules, and the grammar rules are constants defined in the grammar module.
grammar :MyGrammar do
MyRule = ...
end
or, equivalently...
module MyGrammar
using Grammar::DSL
MyRule = ...
end
In the land of Grammar, grammar elements (sometimes called 'parslets') are subclasses of the primary Grammar classes (ie. Alternation, Concatenation, etc). Matches resulting from parsing input that matches the grammar are instances of the subclasses. For example:
foo_bar = Grammar::Alternation.with('foo', 'bar') # => Class:0xXXXXX
...
parser.parse('foo') # => #<Grammar::Alternation @match='foo'>
A grammar is a tree of grammar elements that closely mirror the typical BNF, PEG, etc grammars.
A Reference element can be used to refer to a grammar element that will be defined later in the module, but needs to be used before it can be defined. Typically references are used in recursive grammars, and are therefore very similar to the Recursion element. However, not all elements in a recursive chain are actually recursive, and sometimes it's simply more convenient to forward-declare an element.
A Latch is a type of Reference that closely resembles the "back references" of regular expressions. When a Latch first matches its pattern, it saves (i.e. latches) the matching input. Subsequent uses of the Latch must then match the latched input.
Note that Latches are treated as a sort of context variable in that they are attached to a particular parsing context. When a parser enters a context it resets all of the Latches in that context.
QuotedString = concatenation do
latch(:quote) { /['"]/ } # Create a Latch and attach it to the QuotedString context
element quote # First use. This will match a single- or double-quote
element /\w*/ # Match the string
element quote # Second use. Matches whatever the first use matched.
end
Typically, grammar languages implicitly ignore whitespace. In those cases, whitespace is defined according to some internal pattern.
In Grammar, whitespace isn't automatically ignored. However, you can explicitly specify a pattern to be ignored.
module MyGRammar
using Grammar::DSL
ignore /\s*/ # Ignore whitespace
...
end
Most uses of recursion in typical parser grammars are actually a hack for implementing repetition. In Grammar, repetition is represented explicitly using the Repetition class, which eliminates most of the common use cases. However, that still leaves a few cases to be handled. The Parenthesis Language being a notable example.
When using the DSL to specify a grammar module, the relevant methods will attempt to detect common uses of recursion and handle them appropriately.
Grammar provides two syntaxes for creating a recursive element: named and anonymous. Named recursion creates a named constant in the enclosing lexical scope and assigns the resulting recursion proxy object to it. Anonymous recursion passes the proxy object to the block as the first block parameter.
# Named recursion
alternation :Rule0 do
# Reference the recursion using the Rule0 constant
...
end
# Anonymous recursion
alternation do |rule0|
# Reference the recursion using the rule0 block parameter
...
end
All recursive alternations are effecively left-recursive from the perspective of a top-down parser, and may be considered left-recursive by bottom-up parsers (depending on the specific implementation). And we all know how that turns out. Fortunately, Grammar knows that you really wanted a repetition with at least one repeat, and will give you what you wanted instead of what you asked for.
Rule0 = Grammar::Alternation.with('abc', 'def', Rule0) # => NameError('You are a bad person')
If you use the DSL form, you'll get something more reasonable (ie. a repeating alternation).
grammar :MyGrammar do
alternation :Rule0 { elements 'abc', 'def', Rule0} #=> Grammar::Repetition.at_least(1, Grammar::Alternation.with('abc', 'def'))
end
But, you could go easy on yourself and just do it the right way...
Rule0 = Grammar::Alternation.with('abc', 'def').at_least(1)
See how easy that is?
If your grammar does this, you either meant to use repetition, or you should be ashamed of yourself.
Rule0 = Grammar::Concatenation.with(Rule0, ')') # => NameError('Again?')
If you use the DSL it will try to save you from yourself:
grammar :MyGrammar do
concatenation :Rule0 { elements Rule0, ')' } # => Rule0 = Grammar::Repetition.any(')')
end
But you should really just use repetition the way it was meant to be used instead of trying to fake it with recursion.
Rule0 = Grammar::Repetition.any(')')
Again, if you're doing this you should be using repetition.
Rule0 = Grammar::Concatenation.with('(', Rule0) # => NameError('Seriously?')
And once again, the Grammar DSL will clean up your mess:
grammar :MyGrammar do
concatenation :Rule0 { elements '(', Rule0 } # => Rule0 = Grammar::Repetition.one_or_more('(')
end
But seriously. Just stop.
The only element in Grammar that can support center recursion is Concatenation, and the only way to do it is with the DSL (but you already knew that).
grammar :MyGrammar do
concatenation :Rule0 do
elements '(', Rule0, ')'
end
end
The Grammar DSL will handle this by creating a Recursion proxy object instead of the Concatenation that you thought you wanted. It's then up to any parser employing the grammar to map the Recursion proxy to whatever mechanism said parser uses for handling center-recursion. If the parser can't handle center-recursion, then it's expected to report an appropriate error.
There's also the case of the "odd number of x's"
Rule0 = Grammar::Alternation.with(Grammar::Concatenation.with('x', Rule0, 'x'), 'x')
which is normally meant to repesent a repetition that's constrained to an odd number of repeats greater than 2. This can be represented in Grammar with
Rule0 = Repetition.with('x', minimum:3, &:odd?)
Many grammar languages use some very odd tricks for doing various things, and this is one that tends to be used to represent an odd number of some item.
grammar :MyGrammar do
concatenation :Rule0 do
elements Rule0, ',', Rule0
end
end
Someday, Grammar will rewrite this into something a bit less insane, but for now it just gives you an outer-recursive concatenation, as requested.
This particular use of recursion only happens when a recursive Concatenation is nested inside of an Alternation that contains elements other than the recursion-causing Concatenation. Typically, grammar writers use this trick to represent a separated list of items.
Grammar will try to help you out by replacing the outer-recursion with a repeating right-recursion, thereby eliminating the left-recursion.
# This is bad
alternation :Rule0 do
element 'abc'
element 'def'
element concatenation { elements Rule0, ',', Rule0 }
end
# This is what it becomes
Rule0 = Grammar::Concatenation.with(
Grammar::Alternation.with('abc', 'def'),
Grammar::Concatenation.with(',', Rule0).any
)
Add this line to your application's Gemfile:
gem 'grammar'
And then execute:
$ bundle
Or install it yourself as:
$ gem install grammar
After checking out the repo, run bin/setup
to install dependencies. Then, run rake spec
to run the tests. You can also run bin/console
for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install
. To release a new version, update the version number in version.rb
, and then run bundle exec rake release
, which will create a git tag for the version, push git commits and tags, and push the .gem
file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/bfoz/grammar.