Support keyword synonyms

The examples keyword can be Examples or Scenarios, for example. This should be an erb-only fix, I think. I think I saw something on Greg's fork about this already.

errors caused by comments

I think there are some places where comments should be allowed, but actually are not supported by the parser:


When I have a table like this:

| header 1 | header 2 | # here is a comment after a table
| cell 1-1 | 'cell 1-2' |
| cell 2-1 | "cell 2-2"|

I get the following error:

java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
        at java.util.ArrayList.RangeCheck(
        at java.util.ArrayList.get(
        at gherkin.formatter.PrettyFormatter.flushTable(
        at gherkin.formatter.PrettyFormatter.step(
        at gherkin.formatter.PrettyFormatter.step(
        at gherkin.parser.Parser.step(
        at gherkin.lexer.EN.scan(
        at gherkin.I18nLexer.scan(


gherkin.LexingError: Lexing error on line 2: '@foo @bar # some comment, maybe a tag.

Big C/Java binary gems

The C binary (feature.bundle) right now is not that big (about 45K), but if we're going to generate one for each of the ~40 languages we'll have a pretty big gem (1.5 MB, although compression while packaging the gem might help a little). The same size issues would apply to the JRuby gems.

Is this kind of size ok? Is there anything we can do to shrink the size? Do we care?

Tag parsing with treetop

Treetop appears to parse tags in the following format:

@hello @world Scenario: This is a Scenario

Note that there is no newline between the tags and the Scenario start.

Is this a feature? I haven't seen the capability to use inline tags in the documentation and I'm not sure if it should be implemented with Gherkin.


Is there any reason, besides not being used in the specs, that #reset! was removed? It makes testing parser behavior in irb a lot easier. It's much simpler to call #reset! on the listener than to make new instances of the parser and SexpRecorder.
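
To make the convenience concrete, here is a minimal sketch of a recording listener with #reset!. Only the names SexpRecorder and #reset! come from the discussion above; the class body and the message signatures are assumptions, and the real SexpRecorder may look different.

```ruby
# Hypothetical sketch of a recording listener; the real SexpRecorder may differ.
class SexpRecorder
  attr_reader :sexps

  def initialize
    @sexps = []
  end

  # Record every listener message (e.g. :tag, :comment) as an s-expression.
  def method_missing(event, *args)
    @sexps << [event, *args]
  end

  def respond_to_missing?(_name, _include_private = false)
    true
  end

  # Forget everything recorded so far so the same instance can be reused.
  def reset!
    @sexps.clear
  end
end

recorder = SexpRecorder.new
recorder.tag("foo", 1)
recorder.reset!
recorder.comment("# hi", 2)
p recorder.sexps  # => [[:comment, "# hi", 2]]
```

In irb this means one parser and one recorder can be kept around and reused between experiments.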

Syntax error parsing pystrings

Gherkin throws a syntax error when parsing the following feature from Cucumber. As far as I can tell, with my most recent gherkin and my last two commits on Cucumber to fix typos in a few feature files, it's the only feature within cucumber that is still causing errors.


[:py_string, 6, "        Must buy some <fruits>", 71]
[:py_string, 6, "        Must buy some cucumbers", 75]

(Edit by Aslak - HTML escaped the angle brackets - Github hides them)

Improve parser testing support

The current parser testing is a hack pretty much all around. What we have for table parsing works well enough, but I'm not sure it's going to cut it for full-blown feature file parsing.

Segfault on Ruby 1.8.6 and 1.8.7 on Leopard

To reproduce:

rake compile bench:c_gherkin

Commenting out the following line in parser.c.rl.erb fixes it:
strcat(p, "\n%FEATURE_END%");

(I don't know enough Ruby-C to fix it yet).

Javascript impl

Not high priority right now, but might be useful in the future for a browser-based editor, with syntax checking and maybe some simple refactoring tools.


Provide gherkin parser tester?

Mike and I were talking about the best way to start testing the parsing against what Cucumber and treetop currently provide in The Real World. We've got the cucumber parsing specs covered, but I'm sure there are edge cases out there that we don't know about yet, and possibly ones that a lot of people depend on (the laxness of parsing the Feature heading with treetop immediately comes to mind).

What would you think about providing a way (pre-cucumber-integration) for people to install the gem and run the parser against their current feature suite to look for parsing errors? If they get an error, they could submit a ticket with the feature that's failing to parse and we could attack it (or decide that the Gherkin syntax needs to change or become more restrictive for future releases).

Is that more trouble than it's worth? Seems like it would be a good PR move to try to weed out possible exceptions early rather than waiting to ask everyone to install and test a release candidate.
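
A rough sketch of what such a tester could do: run a parse step over every feature source and collect the failures instead of stopping at the first one. The helper name check_features is hypothetical, and the block stands in for whatever call ends up invoking the gherkin parser; here it is faked with a simple keyword check.

```ruby
# Hypothetical helper: run a parse step over many feature sources and
# collect failures instead of stopping at the first one.
def check_features(sources)
  failures = {}
  sources.each do |path, source|
    begin
      yield source
    rescue StandardError => e
      failures[path] = e.message
    end
  end
  failures
end

# Stand-in for the real parser call (an assumption, not the gherkin API).
fake_parse = lambda do |source|
  raise "no Feature keyword" unless source =~ /\AFeature:/
end

sources = {
  "ok.feature"       => "Feature: Fine\n",
  "popsicle.feature" => "Popsicle: This is really a feature\n"
}
failures = check_features(sources) { |source| fake_parse.call(source) }
failures.each { |path, message| puts "#{path}: #{message}" }
```

In real use the sources hash would be built by reading the files matched by something like Dir["features/**/*.feature"], and the printed failures are what people would paste into a ticket.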

UTF-8 multibyte character parsing bug

Multibyte characters longer than 2 bytes seem to mess with MRI 1.8.6 and possibly 1.8.7. I haven't tried 1.9.1 yet. I'm unsure whether this can be fixed by setting KCODE to UTF-8, or what. There's a pending table parsing spec ("should allow utf-8") that demonstrates the problem. This:
@listener.should_receive(:table).with([%w{ůﻚ 2}])
@table.scan(" | ůﻚ | 2 | \n")

gives this:
#<Gherkin::SexpRecorder:0x119d984> expected :table with ([["ůﻚ", "2"]]) but received it with ([["ůﻚ", "2"]], 1)

And this:
@listener.should_receive(:table).with([%w{ 繁體中文 而且|並且} %w{ 繁體中文 而且|並且}])
@table.scan("| 繁體中文 而且|並且| 繁體中文 而且|並且|\n")

Gives this:
undefined method `w' for #<Spec::Example::ExampleGroup::Subclass_7:0x119dd6c>


strictness of language for Feature start

In looking for examples to test against and increase the completeness of the Ragel parser, I stumbled upon a few features in cucumber/examples/ which exposed something a little unexpected (at least for me).

Treetop currently allows a Feature file to start with just about anything. The Feature (or i18n equivalent) appears to be unnecessary.

Popsicle:  This is really a feature
Scenario: A scenario following a popsicle
   Given a step in this case
   When it is parsed by treetop
   Then it works as if it were preceded by Feature:

works just fine.

In the example features in tickets/features/177/ (1.feature, 2.feature), there is introductory text that is all glommed up with the feature name, and the files parse normally once the parser hits 'Scenario:'. I'm not sure if that's supposed to be illustrative of how cucumber should work, or if it's an artifact of the gist in the original ticket.

The ragel parser currently needs a feature to begin with optional comments, optional tags, and a required 'Feature:' keyword.

Do we want to be less strict with the Feature heading text like treetop currently is? The wiki instructions for Gherkin seem to indicate that starting with 'Feature:' is required.

Multiline comments

Treetop currently sends multi-line comments as a single message. The ragel parser sends one comment message per comment line. Do you think it's important that comments are glommed up into a single message when they're consecutive?
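
If a single message turns out to matter, it does not have to be done in the Ragel machine. Below is a sketch of a wrapper listener that buffers consecutive comment lines and forwards them as one message. The class names and the comment(text, line) and feature(keyword, name, line) signatures are assumptions for illustration.

```ruby
# Hypothetical wrapper; the comment(text, line) signature is an assumption.
class CommentMerger
  def initialize(listener)
    @listener = listener
    @pending = []      # consecutive comment lines not yet forwarded
    @first_line = nil  # line number of the first buffered comment
  end

  def comment(text, line)
    @first_line ||= line
    @pending << text
  end

  # Any other message flushes buffered comments first, then passes through.
  def method_missing(event, *args)
    flush
    @listener.send(event, *args)
  end

  def respond_to_missing?(name, include_private = false)
    @listener.respond_to?(name, include_private) || super
  end

  # Forward the buffered comment lines as a single comment message.
  def flush
    return if @pending.empty?
    @listener.comment(@pending.join("\n"), @first_line)
    @pending = []
    @first_line = nil
  end
end

# Tiny recording listener used only for the demonstration.
class Recorder
  attr_reader :events

  def initialize
    @events = []
  end

  def comment(text, line)
    @events << [:comment, text, line]
  end

  def feature(keyword, name, line)
    @events << [:feature, keyword, name, line]
  end
end

recorder = Recorder.new
merger = CommentMerger.new(recorder)
merger.comment("# one", 1)
merger.comment("# two", 2)
merger.feature("Feature", "Hello", 3)
p recorder.events
```

Comments at the very end of a file would need an explicit flush call, since no later message arrives to trigger it.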

I18n for C

Several (generated) extconf.rb files. Name binaries and so on. Add them all to the gemspec in Rakefile using Dir[].

Explicit end_feature message for listener?

I was playing around with adding before and after messages in the parser (throwing the after messages on a stack and popping them off at appropriate times) and realized that it's difficult to do so if the listener doesn't know when the feature has finished parsing.

If we're going to move responsibility of handling before/after into the formatters themselves, it may be helpful for them to know the parsing is complete (when there's not an error).
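
A small sketch of the stack idea, to show where an explicit end message is needed. The listener below is hypothetical: the message names (feature, scenario, end_feature) and their one-argument signatures are assumptions, not the current listener API.

```ruby
# Hypothetical listener; message names and signatures are assumptions.
class BeforeAfterListener
  attr_reader :log

  def initialize
    @log = []
    @after_stack = []  # pending "after" messages, innermost last
  end

  def feature(name)
    open_scope(:feature, name, 0)
  end

  def scenario(name)
    open_scope(:scenario, name, 1)
  end

  # Without an explicit end message the last scopes would never be closed.
  def end_feature
    close_scopes(0)
  end

  private

  def open_scope(kind, name, depth)
    close_scopes(depth)
    @log << [:"before_#{kind}", name]
    @after_stack << [:"after_#{kind}", name]
  end

  # Pop "after" messages until only `depth` scopes remain open.
  def close_scopes(depth)
    @log << @after_stack.pop while @after_stack.size > depth
  end
end

listener = BeforeAfterListener.new
listener.feature("F")
listener.scenario("A")
listener.scenario("B")
listener.end_feature
listener.log.each { |entry| p entry }
```

The after message for scenario A can be emitted when scenario B starts, but the after messages for scenario B and the feature itself are only emitted because end_feature is called.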


Combine ragel table and feature parser into single machine

The table parser in ragel is fairly compact (a few lines of ragel and a few actions) compared to the Treetop parser version. I think it would make sense to combine these two parsers, which will simplify a lot of the message passing and the C implementation. Any opposition to this?

Java parser

There has to be a Java parser. I'll write it.

Complete test feature for syntax policy

The Feature policy handles most of the syntax of Cucumber features, but there are definitely edge cases it won't. We need to flush those out. features/policy_feature.feature makes that pretty easy.

When that is done the feature policy needs to be refactored. It's currently an ugly bunch of booleans and if statements.

Break out common Ragel rules into gherkin_common.rl

Gherkin needs to support multiple parser backends (Ruby, C, Java). To make this easy we need to break out the Ragel rules (basic machines, state charts, scanners, etc.) into a common file, and then write language-specific .rl files which implement the actions used in the common file. In other words, gherkin_common.rl will contain the interface, and gherkin_ruby.rl, for example, will contain the implementation.

See the ext directories in hpricot and mongrel for examples in C.

'Examples' doesn't parse, but 'Examples:' does

Just tried gherkin on songkick's features and this was the first thing I hit (at the top of a scenario outline's examples table).

I'm ambivalent about whether we should make the colon mandatory or not - it will make the upgrade path a little more awkward for some people, but I guess it also makes the language neater if everyone is forced to do the same thing.

i18n support

i18n cuts across the whole Gherkin syntax: Gherkin must recognize keywords written in many different languages. We need to recognize keywords written in all the languages Cucumber supports. In addition to this, Cucumber can:

  1. specify the language on the command line
  2. load keywords based on a comment language hint a la encoding comments (see cuke/features/language_from_header.feature)
  3. mix and match languages (possibly within a single feature file, I'm not sure)

This is pretty straightforward with Treetop because it is pure Ruby, but this is not so straightforward with Ragel, because it essentially operates as a pre-processor, generating the state machine in a single pass, at which point the generated code is effectively closed to modification. This means that loading the keywords must happen before the state machine is built, but given number 2 above, the content of the feature files themselves can change what the parser must recognize as a keyword. Hmm... difficulties, difficulties.
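
One possible way around this, sketched below, is to generate one state machine per language ahead of time and let a thin Ruby layer peek at the header comment to decide which one to run. The sketch only covers the detection step. It assumes the hint looks like "# language: xx" on the first line; detect_language, LANGUAGE_HEADER and DEFAULT_LANGUAGE are hypothetical names.

```ruby
# Sketch: pick the lexer language from a "# language: xx" header comment.
# DEFAULT_LANGUAGE, LANGUAGE_HEADER and detect_language are hypothetical names.
DEFAULT_LANGUAGE = "en"
LANGUAGE_HEADER = /\A\s*#\s*language\s*:\s*([a-zA-Z\-]+)/

def detect_language(source)
  first_line = source.lines.first.to_s
  match = LANGUAGE_HEADER.match(first_line)
  match ? match[1] : DEFAULT_LANGUAGE
end

puts detect_language("# language: no\nEgenskap: Hei\n")  # prints "no"
puts detect_language("Feature: Hello\n")                  # prints "en"
```

The detected code would then be used to look up a pre-generated lexer. This handles the command line and header cases, but not mixing languages within a single file.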

ext directory structure

It currently has 2 subdirectories: gherkin and feature. Do we need both? Also, with future java support coming up, any suggestions about how to organise this?

Error handling by parser

In which cases should the parser throw errors? Currently, it pretty much either finds things or doesn't (or grabs too much if you make a typo spelling Scenario, for example).

Identifying and matching against all the ways someone could mess up a feature file will probably be impossible, but is there a minimum set of gotchas or mistakes the parser should look for and raise on?

Compilation fails on 1.8.7-head (rvm) - warnings treated as errors

Sample output:

$ rvm 1.8.7-head
$ rake clean compile


gcc -I. -I/Users/aslakhellesoy/.rvm/ruby-1.8.7-head/include/ruby-1.9.1/i386-darwin9.8.0 -I/Users/aslakhellesoy/.rvm/ruby-1.8.7-head/include/ruby-1.9.1/ruby/backward -I/Users/aslakhellesoy/.rvm/ruby-1.8.7-head/include/ruby-1.9.1 -I../../../../ext/gherkin_lexer_ar -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE   -fno-common -O3 -ggdb -Wextra -Wno-unused-parameter -Wno-parentheses -Wpointer-arith -Wwrite-strings -Wno-missing-field-initializers -Wshorten-64-to-32 -Wno-long-long  -pipe -O0 -Wall -Werror  -o gherkin_lexer_ar.o -c ../../../../ext/gherkin_lexer_ar/gherkin_lexer_ar.c
cc1: warnings being treated as errors
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl: In function ‘CLexer_scan’:
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:215: warning: comparison between signed and unsigned
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:215: warning: comparison between signed and unsigned
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:376: warning: comparison between signed and unsigned
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:377: warning: comparison between signed and unsigned
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:378: warning: comparison between signed and unsigned
{standard input}:5568:non-relocatable subtraction expression, "_rb_eGherkinLexerError" minus "L00000000005$pb"
{standard input}:5568:symbol: "_rb_eGherkinLexerError" can't be undefined in a subtraction expression
gmake: *** [gherkin_lexer_ar.o] Error 1
rake aborted!
Command failed with status (2): [gmake...]

Instead of allowing warnings I think it's safest to fix this. Not sure why the other rubies don't error out.

skip parser option?

Since the parser layer of gherkin exists solely to determine if the order of events is valid and provide useful messages when it's not, what about an option (Mike suggested --unpickled) that skips the parser and sends the lexer events directly to cucumber?

This could provide a speed benefit for running large suites of features that are relatively stable and known to have proper syntax, with the caveat that parsing/lexing error messages may not be very useful.

I think it would make sense when working on writing a new feature to have it pass through the parser to ensure validity, but when running rake cucumber to skip the parsing step (or at least have the option to). It's superfluous and adds overhead for a well-written feature.

WDYT? One more item to consider for performance enhancement, I suppose.

SyntaxErrors need context

Currently SyntaxErrors provide no context on what is expected vs received from the parser. Implementing this shouldn't be much harder than defining an expected property on each policy state containing hints on what is expected at that moment.
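
As a sketch of the idea, each policy state could carry the list of events it accepts and build a hint from it. PolicyState, its accepts? and hint methods, and the event names are all hypothetical; the current policy is organized differently.

```ruby
# Hypothetical policy state; names and transitions are illustrative only.
class PolicyState
  attr_reader :name, :expected

  def initialize(name, expected)
    @name = name
    @expected = expected  # events that are valid in this state
  end

  def accepts?(event)
    @expected.include?(event)
  end

  # Human-readable hint to embed in a syntax error message.
  def hint(received)
    "Expected one of #{@expected.map(&:inspect).join(', ')}, " \
      "but received #{received.inspect}"
  end
end

feature_state = PolicyState.new(:feature, [:scenario, :scenario_outline])
puts feature_state.hint(:step) unless feature_state.accepts?(:step)
```

A table of such states would also be a step away from the current booleans and if statements, since the allowed transitions become data.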

New Release Requirements

I think we're pretty close to a 0.0.1 release (codename: Feature Envy). In convo with Greg we listed

  • reference listener implementation
  • cleanup of the Ruby parser actions (possibly a module to mixin to each parser class)

as requirements before releasing something. What else?

Segmentation fault on 1.9

rake clean compile
cucumber features/pretty_printer.feature

Feature: Pretty printer
  In order to have pretty gherkin
  I want to verify that all prettified cucumber features parse OK

  Scenario: Parse all the features in Cucumber                # features/pretty_printer.feature:5
    Given I have Cucumber's home dir defined in CUCUMBER_HOME # features/step_definitions/pretty_printer_steps.rb:19
    When I find all of the .feature files                     # features/step_definitions/pretty_printer_steps.rb:24
/Users/aslakhellesoy/scm/gherkin/lib/gherkin/i18n_lexer.rb:15: [BUG] Segmentation fault
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin9.8.0]

-- control frame ----------
c:0057 p:---- s:0219 b:0219 l:000218 d:000218 CFUNC  :scan
c:0056 p:0055 s:0215 b:0215 l:000214 d:000214 METHOD /Users/aslakhellesoy/scm/gherkin/lib/gherkin/i18n_lexer.rb:15
c:0055 p:0098 s:0209 b:0209 l:000208 d:000208 METHOD /Users/aslakhellesoy/scm/gherkin/features/step_definitions/pretty_printer_steps.rb:11
c:0054 p:0039 s:0201 b:0201 l:001e84 d:000200 BLOCK  /Users/aslakhellesoy/scm/gherkin/features/step_definitions/pretty_printer_steps.rb:35
c:0053 p:---- s:0195 b:0195 l:000194 d:000194 FINISH
c:0052 p:---- s:0193 b:0193 l:000192 d:000192 CFUNC  :each
c:0051 p:0022 s:0190 b:0190 l:001e84 d:000189 BLOCK  /Users/aslakhellesoy/scm/gherkin/features/step_definitions/pretty_printer_steps.rb:30
c:0050 p:---- s:0188 b:0188 l:000187 d:000187 FINISH
c:0049 p:---- s:0186 b:0186 l:000185 d:000185 CFUNC  :instance_exec

Unindent pystrings

The listeners currently get the multiline strings as-is, without leading spaces stripped away. (I discovered this when I wrote gherkin/tools/pretty_printer.rb).

Leading spaces should be stripped away before the string is passed to the listener, because consumers want to treat the strings as if they were unindented.

The start_col argument passed to the listener is unnecessary - it should be removed.
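
A minimal sketch of the stripping, assuming the margin to remove is the indentation shared by all non-blank lines (the real rule might instead use the column of the opening delimiter). unindent is a hypothetical helper name.

```ruby
# Sketch: strip the indentation shared by all non-blank lines.
# unindent is a hypothetical helper name.
def unindent(text)
  lines = text.split("\n", -1)
  indents = lines.reject { |l| l.strip.empty? }.map { |l| l[/\A */].size }
  margin = indents.min || 0
  # Remove at most `margin` leading spaces so relative indentation survives.
  lines.map { |l| l.sub(/\A {0,#{margin}}/, "") }.join("\n")
end

raw = "      Must buy some\n        cucumbers\n"
puts unindent(raw)
```

Lines indented deeper than the margin keep their extra spaces, so nested structure inside the string is preserved.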

Build gem for Windows

It should be possible to build the gem for Windows prior to packaging and releasing gems. It should also be possible to do this on a non-Windows OS. This can be achieved with MinGW and MSYS.

Add context to parsing and syntax errors

Currently the parsing and syntax errors only include the line number in the error message. Some context would be very helpful. At the least each should say something like:

"Error on line 2: 'Aand there is a foo'"

For the SyntaxErrors, it would be very nice if they could also include some information about the expected message, e.g.

"FeatureSyntaxError on line 23: 'Given a thingy'. Expected one of 'Scenario', 'Scenario Outline', but received 'Step'"
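
An error class that produces a message in this shape could look roughly like the sketch below. The constructor arguments and readers are assumptions; only the class name and the message format are taken from the example above.

```ruby
# Hypothetical error class; the constructor arguments are assumptions.
class FeatureSyntaxError < StandardError
  attr_reader :line, :content, :expected, :received

  def initialize(line, content, expected, received)
    @line = line
    @content = content
    @expected = expected
    @received = received
    quoted = expected.map { |e| "'#{e}'" }.join(", ")
    super("FeatureSyntaxError on line #{line}: '#{content}'. " \
          "Expected one of #{quoted}, but received '#{received}'")
  end
end

begin
  raise FeatureSyntaxError.new(23, "Given a thingy",
                               ["Scenario", "Scenario Outline"], "Step")
rescue FeatureSyntaxError => e
  puts e.message
end
```

Keeping the line, content and expected values as readers lets a formatter or editor present the error its own way instead of parsing the message string.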

Build gem for JRuby

The Java bindings (when they exist) should be prebuilt and packaged with the gem targeted for JRuby. Ideally this should use JDK5 and not JDK6, since lots of people are still on JDK5.

If Rubygems has support for building native Java extensions at install time (as with C), we should consider that option.

Listener stacking / chaining API

Just a (possibly crazy) thought: creating an API not unlike Rack's to easily stack or chain Gherkin listeners together. We currently have listeners in various states of completeness for parsing, pretty printing, filtering and stats gathering. Making it easy to manage them and employ them selectively would be quite useful.
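
To give the thought some shape, here is a rough sketch of a Rack-style builder in which each listener wraps the next one. ListenerChain, StepCounter and Printer are hypothetical names, and the step(keyword, name, line) signature is an assumption.

```ruby
# Hypothetical Rack-style chain: each listener wraps the next one.
class ListenerChain
  def initialize
    @factories = []
  end

  # Register a listener class; it is built as klass.new(next_listener, *args).
  def use(klass, *args)
    @factories << [klass, args]
    self
  end

  # Build the chain inside-out so the first `use` ends up outermost.
  def build(final_listener)
    @factories.reverse.inject(final_listener) do |inner, (klass, args)|
      klass.new(inner, *args)
    end
  end
end

# Example middleware: counts steps, then passes them along.
class StepCounter
  attr_reader :count

  def initialize(inner)
    @inner = inner
    @count = 0
  end

  def step(keyword, name, line)
    @count += 1
    @inner.step(keyword, name, line)
  end
end

# Example end of the chain: collects printable step text.
class Printer
  attr_reader :out

  def initialize
    @out = []
  end

  def step(keyword, name, line)
    @out << "#{keyword} #{name}"
  end
end

printer = Printer.new
chain = ListenerChain.new.use(StepCounter).build(printer)
chain.step("Given", "a thingy", 3)
puts chain.count
puts printer.out.inspect
```

Filtering, stats gathering and pretty printing would each be one such wrapper, and a run would pick only the ones it needs.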

Build gem with .gitignore'd files

We generate a lot of files, and they are .gitignore'd. Jeweler tries to be nice, and excludes .gitignore'd files from the gemspec (rake gemspec). Need to figure out how to work around this. Probably patch Jeweler somehow.

Finish PyString parsing

All the pystring specs need to pass. One is currently pending. After that, the implementation could probably be simplified, though I'm not so sure about that one--I have a hard time keeping the requirements for PyString parsing in my head all at once.

