cucumber-attic / gherkin2
A fast Gherkin parser in Ragel (The parser behind Cucumber)
License: MIT License
At least on OSX/Mono: http://gist.github.com/294641
It would be helpful to run with Debug enabled so we can see line numbers. Not sure how to change that.
The examples keyword can be either Examples or Scenarios. This should be an erb-only fix, I think. I think I saw something on Greg's fork about this already.
I think there are some places where comments should be allowed, but actually are not supported by the parser:
*1
When I have a table like this:
| header 1 | header 2 | # here is a comment after a table
| cell 1-1 | 'cell 1-2' |
| cell 2-1 | "cell 2-2"|
I get the following error:
java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
at java.util.ArrayList.RangeCheck(ArrayList.java:547)
at java.util.ArrayList.get(ArrayList.java:322)
at gherkin.formatter.PrettyFormatter.flushTable(PrettyFormatter.java:92)
at gherkin.formatter.PrettyFormatter.step(PrettyFormatter.java:80)
at gherkin.formatter.PrettyFormatter.step(PrettyFormatter.java:116)
at gherkin.parser.Parser.step(Parser.java:83)
at gherkin.lexer.EN.scan(EN.java:597)
at gherkin.I18nLexer.scan(I18nLexer.java:27)
*2
gherkin.LexingError: Lexing error on line 2: '@foo @bar # some comment, maybe a tag.
The C binary (feature.bundle) right now is not that big (about 45K), but if we're going to generate one for each of the ~40 languages we'll have a pretty big gem (1.5 MB, although compression while packaging the gem might help a little). The same size issues would apply to the JRuby gems.
Is this kind of size ok? Is there anything we can do to shrink the size? Do we care?
Treetop appears to parse tags in the following format:
@hello @world Scenario: This is a Scenario
Note that there is no newline between the tags and the Scenario start.
Is this a feature? I haven't seen the capability to use inline tags in the documentation and I'm not sure if it should be implemented with Gherkin.
Is there any reason, besides not being used in the specs, that it was removed? It makes testing parser behavior in irb a lot easier. It's much simpler to call #reset! on the listener than to make new instances of the parser and SexpRecorder.
Gherkin throws a syntax error when parsing the following feature from Cucumber. As far as I can tell, with my most recent gherkin and my last two commits on Cucumber to fix typos in a few feature files, it's the only feature within Cucumber that is still causing errors.
examples/tickets/features/scenario_outline.feature
SYNTAX ERROR [:py_string, 6, " Must buy some <fruits>", 71]
SYNTAX ERROR [:py_string, 6, " Must buy some cucumbers", 75]
(Edit by Aslak - HTML escaped the angle brackets - Github hides them)
The current parser testing is a hack pretty much all around. What we have for table parsing works well enough, but I'm not sure it's going to cut it for full-blown feature file parsing.
To reproduce:
rake compile bench:c_gherkin
Commenting out the following line in parser.c.rl.erb fixes it:
strcat(p, "\n%FEATURE_END%");
(I don't know enough Ruby-C to fix it yet).
Not high priority right now, but might be useful in the future for a browser based editor. Syntax checking and maybe some simple refactoring tools.
Options:
Mike and I were talking about the best way to start testing the parsing against what Cucumber and treetop currently provide in The Real World. We've got the cucumber parsing specs covered, but I'm sure there are edge cases out there that we don't know about yet, and possibly ones that a lot of people depend on (the laxness of parsing the Feature heading with treetop immediately comes to mind).
What would you think about providing a way (pre-cucumber-integration) for people to install the gem and run the parser against their current feature suite to look for parsing errors? If they get an error, they could send/submit a ticket w/ the feature that's failing to parse and we could attack it (or decide that the Gherkin syntax needs to change or become more restrictive for future releases)
Is that more trouble than it's worth? Seems like it would be a good PR move to try to weed out possible exceptions early rather than waiting to ask everyone to install and test a release candidate.
Multibyte characters longer than 2 bytes seem to mess with MRI 1.8.6 and possibly 1.8.7. I haven't tried 1.9.1 yet. I'm unsure whether this can be fixed by setting KCODE to UTF-8, or what. There's a pending table parsing spec ("should allow utf-8") that demonstrates the problem. This:
@listener.should_receive(:table).with([%w{ůﻚ 2}])
@table.scan(" | ůﻚ | 2 | \n")
gives this:
#<Gherkin::SexpRecorder:0x119d984> expected :table with ([["ůﻚ", "2"]]) but received it with ([["ůﻚ", "2"]], 1)
And this:
@listener.should_receive(:table).with([%w{ 繁體中文 而且|並且} %w{ 繁體中文 而且|並且}])
@table.scan("| 繁體中文 而且|並且| 繁體中文 而且|並且|\n")
Gives this:
undefined method `w' for #<Spec::Example::ExampleGroup::Subclass_7:0x119dd6c>
Halp?
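The second failure may not be a multibyte problem at all. The two `%w{...}` literals in that expectation have no comma between them, so Ruby most likely reads the second `%` as the modulo operator followed by a call to a method named `w` with a block, which would explain the `undefined method 'w'` message. A minimal sketch with ASCII placeholders (the constant name `EXPECTED_ROWS` is made up for illustration):

```ruby
# Two %w literals inside an array literal need a comma between them.
# Without it, `%w{a b} %w{c d}` is read roughly as `%w{a b} % w { c d }`,
# i.e. a call to an undefined method `w`.
EXPECTED_ROWS = [%w{a b}, %w{c d}]

p EXPECTED_ROWS
```

With the comma in place, the spec would at least get as far as comparing the table rows.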
In looking for examples to test against and increase the completeness of the Ragel parser, I stumbled upon a few features in cucumber/examples/ which exposed something a little unexpected (at least for me).
Treetop currently allows a Feature file to start with just about anything. The Feature (or i18n equivalent) appears to be unnecessary.
Popsicle: This is really a feature
Scenario: A scenario following a popsicle
Given a step in this case
When it is parsed by treetop
Then it works as if it were preceded by Feature:
works just fine.
In the example features in tickets/features/177/ (1.feature, 2.feature), there is introductory text that is all glommed up with the feature name, and the files parse normally once the parser hits 'Scenario:'. I'm not sure if that's supposed to be illustrative of how cucumber should work, or if it's an artifact of the gist in the original ticket.
The ragel parser currently needs a feature to begin with optional comments, optional tags, and a required 'Feature:' keyword.
Do we want to be less strict with the Feature heading text like treetop currently is? The wiki instructions for Gherkin seem to indicate that starting with 'Feature:' is required.
A BNF defining Gherkin is desirable for a number of reasons. See the links below for more.
See this lighthouse ticket
Gherkin parser may already do this - I just thought we should make sure.
I want to put a string containing the '|' character, e.g. "abc|def", in a table cell. This issue is similar to https://rspec.lighthouseapp.com/projects/16211/tickets/476-regarding-special-characters-like-in-table, but I couldn't find whether it was registered here.
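One way this could work is a backslash escape, so that `\|` inside a cell means a literal pipe. The sketch below is only an illustration of that idea in plain Ruby; the escape convention and the `split_row` helper are assumptions, not something the Ragel lexer is known to do at this point.

```ruby
# Split one table row into cells, treating "\|" as a literal pipe.
# Assumption: a backslash makes the next character literal.
def split_row(line)
  cells = []
  current = ""
  escaped = false
  line.strip.each_char do |ch|
    if escaped
      current += ch          # take the escaped character as-is
      escaped = false
    elsif ch == "\\"
      escaped = true         # remember the escape, drop the backslash
    elsif ch == "|"
      cells << current.strip # an unescaped pipe closes the current cell
      current = ""
    else
      current += ch
    end
  end
  cells.drop(1)              # the text before the first pipe is empty
end

p split_row('| abc\|def | 2 |')
```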
Treetop currently sends multi-line comments as a single message. The ragel parser sends one comment message per comment line. Do you think it's important that comments are glommed up into a single message when they're consecutive?
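If grouping turns out to matter, it could also be done outside the lexer. A rough sketch, assuming comment events arrive as `[text, line]` pairs (that shape and the `group_comments` name are hypothetical):

```ruby
# Merge comment events on consecutive lines into single messages.
# Each event is a [text, line] pair; the result keeps the first line number.
def group_comments(events)
  groups = []
  events.each do |text, line|
    last = groups.last
    if last && last[:last_line] + 1 == line
      # Directly follows the previous comment line: extend that group.
      last[:text] = last[:text] + "\n" + text
      last[:last_line] = line
    else
      groups << { :text => text, :first_line => line, :last_line => line }
    end
  end
  groups.map { |g| [g[:text], g[:first_line]] }
end

p group_comments([["# a", 1], ["# b", 2], ["# c", 5]])
```

That would keep the lexer at one message per line and leave the glomming to whichever listener wants it.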
Several (generated) extconf.rb files. Name binaries gherkin_lexer_en.so and so on. Add them all to the gemspec in Rakefile using Dir[].
I was playing around with adding before and after messages in the parser (throwing the after messages on a stack and popping them off at appropriate times) and realized that it's difficult to do so if the listener doesn't know when the feature has finished parsing.
If we're going to move responsibility of handling before/after into the formatters themselves, it may be helpful for them to know the parsing is complete (when there's not an error).
WDYT?
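To make the problem concrete, here is a small sketch of a listener that keeps its "after" messages on a stack. The `eof` message and the class and method names are hypothetical; the point is only that the last scenario and the feature can never be closed without some end-of-input signal.

```ruby
# Listener that logs before/after messages, using a stack for the afters.
class BeforeAfterListener
  attr_reader :log

  def initialize
    @log = []
    @pending = [] # stack of [depth, after_message]
  end

  def feature(name)
    enter(0, "feature #{name}")
  end

  def scenario(name)
    enter(1, "scenario #{name}")
  end

  # Hypothetical end-of-input message: closes everything still open.
  def eof
    close_down_to(0)
  end

  private

  def enter(depth, label)
    close_down_to(depth)
    @log << "before #{label}"
    @pending.push([depth, "after #{label}"])
  end

  # Pop after-messages for every open element at this depth or deeper.
  def close_down_to(depth)
    while @pending.any? && @pending.last[0] >= depth
      @log << @pending.pop[1]
    end
  end
end

listener = BeforeAfterListener.new
listener.feature("F")
listener.scenario("A")
listener.scenario("B")
listener.eof
puts listener.log
```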
The table parser in ragel is fairly compact (a few lines of ragel and a few actions) compared to the Treetop parser version. I think it would make sense to combine these two parsers, which will simplify a lot of the message passing and the C implementation. Any opposition to this?
Although there are NUnit tests it's hard to keep up and replicate the RSpec and Cucumber tests. We should therefore try to run them on IronRuby to get the same coverage.
There has to be a parser.java.rl.erb. I'll write it.
I had to copy the *.txt files over manually. That should be done by the Rake build script.
(Alternatively, refer to the files in their original location like the build.xml file does)
The Feature policy handles most of the syntax of Cucumber features, but there are definitely edge cases it won't. We need to flush those out. features/policy_feature.feature makes that pretty easy.
When that is done the feature policy needs to be refactored. It's currently an ugly bunch of booleans and if statements.
Just need to implement something similar to the Ruby I18nLexer (for convenience) and push it up to the http://cukes.info/maven repo. Have to write some javadocs too. Then lobby it into JBehave.
Gherkin needs to support multiple parser backends (Ruby, C, Java). To make this easy we need to break out the Ragel rules (basic machines, state charts, scanners, etc.) into a common file, and then write language-specific .rl files which implement the actions used in the common file. In other words, gherkin_common.rl will contain the interface, and gherkin_ruby.rl, for example, will contain the implementation.
See the ext directories in hpricot and mongrel for examples in C.
Just tried gherkin on songkick's features and this was the first thing I hit (at the top of a scenario outline's examples table).
I'm ambivalent about whether we should make the colon mandatory or not - it will make the upgrade path a little more awkward for some people, but I guess it also makes the language neater if everyone is forced to do the same thing.
i18n cuts across the Gherkin syntax, because Gherkin recognizes keywords written in many different languages. We need to recognize keywords written in all the languages Cucumber supports. In addition to this, Cucumber can:
This is pretty straightforward with Treetop because it is pure Ruby, but this is not so straightforward with Ragel, because it essentially operates as a pre-processor, generating the state machine in a single pass, at which point the generated code is effectively closed to modification. This means that loading the keywords must happen before the state machine is built, but given number 2 above, the content of the feature files themselves can change what the parser must recognize as a keyword. Hmm... difficulties, difficulties.
It currently has 2 subdirectories: gherkin and feature. Do we need both? Also, with future java support coming up, any suggestions about how to organise this?
In which cases should the parser throw errors? Currently, it pretty much either finds things or doesn't (or grabs too much if you make a typo spelling Scenario, for example).
Identifying and matching against all the ways someone could mess up a feature file will probably be impossible, but is there a minimum set of gotchas or mistakes the parser should look for and raise on?
Sample output:
$ rvm 1.8.7-head
$ rake clean compile
(snip)
gcc -I. -I/Users/aslakhellesoy/.rvm/ruby-1.8.7-head/include/ruby-1.9.1/i386-darwin9.8.0 -I/Users/aslakhellesoy/.rvm/ruby-1.8.7-head/include/ruby-1.9.1/ruby/backward -I/Users/aslakhellesoy/.rvm/ruby-1.8.7-head/include/ruby-1.9.1 -I../../../../ext/gherkin_lexer_ar -D_XOPEN_SOURCE -D_DARWIN_C_SOURCE -fno-common -O3 -ggdb -Wextra -Wno-unused-parameter -Wno-parentheses -Wpointer-arith -Wwrite-strings -Wno-missing-field-initializers -Wshorten-64-to-32 -Wno-long-long -pipe -O0 -Wall -Werror -o gherkin_lexer_ar.o -c ../../../../ext/gherkin_lexer_ar/gherkin_lexer_ar.c
cc1: warnings being treated as errors
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl: In function ‘CLexer_scan’:
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:215: warning: comparison between signed and unsigned
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:215: warning: comparison between signed and unsigned
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:376: warning: comparison between signed and unsigned
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:377: warning: comparison between signed and unsigned
/Users/aslakhellesoy/scm/gherkin/tasks/../ragel/i18n/ar.c.rl:378: warning: comparison between signed and unsigned
{standard input}:5568:non-relocatable subtraction expression, "_rb_eGherkinLexerError" minus "L00000000005$pb"
{standard input}:5568:symbol: "_rb_eGherkinLexerError" can't be undefined in a subtraction expression
gmake: *** [gherkin_lexer_ar.o] Error 1
rake aborted!
Command failed with status (2): [gmake...]
Instead of allowing warnings I think it's safest to fix this. Not sure why the other rubies don't error out.
Since the parser layer of gherkin exists solely to determine if the order of events is valid and provide useful messages when it's not, what about an option (Mike suggested --unpickled) that skips the parser and sends the lexer events directly to cucumber?
This could provide a speed benefit for running large suites of features that are relatively stable and known to have proper syntax, with the caveat that parsing/lexing error messages may not be very useful.
I think it would make sense when working on writing a new feature to have it pass through the parser to ensure validity, but when running rake cucumber to skip the parsing step (or at least have the option to). It's superfluous and adds overhead for a well-written feature.
WDYT? One more item to consider for performance enhancement, I suppose.
I think we can do with Scenario only, and use the mere presence of Examples underneath to decide on the semantics. See also: https://rspec.lighthouseapp.com/projects/16211/tickets/432-deprecate-scenario-outline-keyword-scenario-enough#ticket-432-1
Currently SyntaxErrors provide no context on what is expected vs received from the parser. Implementing this shouldn't be much harder than defining an expected property on each policy state containing hints on what is expected at that moment.
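A rough sketch of the idea, assuming the policy can be written as a table from state to allowed events. The table contents, `TRANSITIONS`, and `next_state` are all made up for illustration and much smaller than the real policy:

```ruby
# Hypothetical policy table: for each state, the events allowed next
# and the state each one leads to.
TRANSITIONS = {
  :start    => { :feature => :feature },
  :feature  => { :scenario => :scenario },
  :scenario => { :step => :scenario, :scenario => :scenario }
}

# Returns the next state, or raises with a hint listing what was expected.
def next_state(state, event)
  allowed = TRANSITIONS.fetch(state)
  allowed.fetch(event) do
    raise ArgumentError,
          "in state #{state}: expected one of #{allowed.keys.inspect}, got #{event.inspect}"
  end
end

p next_state(:start, :feature)
```

Because the allowed events are data rather than if statements, the "expected" hint comes for free from the keys of the current state's entry.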
I think we're pretty close to a 0.0.1 release (codename: Feature Envy). In convo with Greg we listed
as requirements before releasing something. What else?
rake clean compile cucumber features/pretty_printer.feature
Feature: Pretty printer
In order to have pretty gherkin I want to verify that all prettified cucumber features parse OK
Scenario: Parse all the features in Cucumber # features/pretty_printer.feature:5
Given I have Cucumber's home dir defined in CUCUMBER_HOME # features/step_definitions/pretty_printer_steps.rb:19
When I find all of the .feature files # features/step_definitions/pretty_printer_steps.rb:24
/Users/aslakhellesoy/scm/gherkin/lib/gherkin/i18n_lexer.rb:15: [BUG] Segmentation fault
ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin9.8.0]
-- control frame ----------
c:0057 p:---- s:0219 b:0219 l:000218 d:000218 CFUNC :scan
c:0056 p:0055 s:0215 b:0215 l:000214 d:000214 METHOD /Users/aslakhellesoy/scm/gherkin/lib/gherkin/i18n_lexer.rb:15
c:0055 p:0098 s:0209 b:0209 l:000208 d:000208 METHOD /Users/aslakhellesoy/scm/gherkin/features/step_definitions/pretty_printer_steps.rb:11
c:0054 p:0039 s:0201 b:0201 l:001e84 d:000200 BLOCK /Users/aslakhellesoy/scm/gherkin/features/step_definitions/pretty_printer_steps.rb:35
c:0053 p:---- s:0195 b:0195 l:000194 d:000194 FINISH
c:0052 p:---- s:0193 b:0193 l:000192 d:000192 CFUNC :each
c:0051 p:0022 s:0190 b:0190 l:001e84 d:000189 BLOCK /Users/aslakhellesoy/scm/gherkin/features/step_definitions/pretty_printer_steps.rb:30
c:0050 p:---- s:0188 b:0188 l:000187 d:000187 FINISH
c:0049 p:---- s:0186 b:0186 l:000185 d:000185 CFUNC :instance_exec
The listeners currently get the multiline strings as-is, without leading spaces stripped away. (I discovered this when I did gherkin/tools/pretty_printer.rb.)
Leading spaces should be stripped away before the string is passed to the listener, because consumers want to treat the strings as if they were unindented.
The start_col argument passed to the listener is unnecessary - it should be removed.
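As a sketch of what the stripping could look like, assuming the lexer knows how far the opening delimiter was indented (the `unindent` helper and its `indent` parameter are hypothetical names):

```ruby
# Remove up to `indent` leading spaces from every line of a multiline string.
# Lines indented less than `indent` simply lose whatever spaces they have.
def unindent(text, indent)
  text.lines.map { |line| line.sub(/\A {0,#{indent}}/, "") }.join
end

puts unindent("    a\n      b\n  c\n", 4)
```

Indentation beyond the delimiter column is kept, so relative indentation inside the string survives and the listener no longer needs start_col.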
It should be possible to build the gem for Windows prior to packaging and releasing gems. It should be possible to do this on a non-Windows OS. This can be achieved with MinGW and MSYS.
Currently the parsing and syntax errors only include the line number in the error message. Some context would be very helpful. At the least each should say something like:
"Error on line 2: 'Aand there is a foo'"
For the SyntaxErrors, it would be very nice if they could also include some information about the expected message, e.g.
"FeatureSyntaxError on line 23: 'Given a thingy'. Expected one of 'Scenario', 'Scenario Outline', but received 'Step'"
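A minimal sketch of an error class that produces the message above. The constructor arguments are an assumption about what the parser could hand over; only the message format comes from this ticket.

```ruby
# Error that carries the offending line plus what was expected and received.
class FeatureSyntaxError < StandardError
  def initialize(line, text, expected, received)
    list = expected.map { |e| "'#{e}'" }.join(", ")
    super("FeatureSyntaxError on line #{line}: '#{text}'. " \
          "Expected one of #{list}, but received '#{received}'")
  end
end

error = FeatureSyntaxError.new(23, "Given a thingy", ["Scenario", "Scenario Outline"], "Step")
puts error.message
```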
Already started.
The Java bindings (when they exist) should be prebuilt and packaged with the gem targeted for JRuby. Ideally this should use JDK5 and not JDK6, since lots of people are still on JDK5.
If Rubygems has support for building native Java extensions at install time (as with C), we should consider that option.
Just a (possibly crazy) thought: creating an API not unlike Rack's to easily stack or chain Gherkin listeners together. We currently have listeners in various states of completeness for parsing, pretty printing, filtering and stats gathering. Making it easy to manage them and employ them selectively would be quite useful.
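Something along these lines, perhaps. This is only a sketch of the stacking idea: each listener wraps the next one and forwards whatever it does not handle itself. All class names and the `step`/`scenario` signatures here are hypothetical.

```ruby
# Base class: forwards every message it does not define to the wrapped listener.
class ForwardingListener
  def initialize(listener)
    @listener = listener
  end

  def method_missing(name, *args, &block)
    if @listener.respond_to?(name)
      @listener.public_send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @listener.respond_to?(name, include_private) || super
  end
end

# Stats-gathering layer: counts steps, then passes them on unchanged.
class StepCounter < ForwardingListener
  attr_reader :steps

  def initialize(listener)
    super
    @steps = 0
  end

  def step(keyword, name, line)
    @steps += 1
    @listener.step(keyword, name, line)
  end
end

# Innermost listener: records everything it receives.
class Recorder
  attr_reader :events

  def initialize
    @events = []
  end

  def scenario(keyword, name, line)
    @events << [:scenario, keyword, name, line]
  end

  def step(keyword, name, line)
    @events << [:step, keyword, name, line]
  end
end

recorder = Recorder.new
chain = StepCounter.new(recorder)
chain.scenario("Scenario", "stacking", 1)
chain.step("Given", "a listener chain", 2)
p chain.steps
```

A filter or pretty printer would be another `ForwardingListener` subclass wrapped around the same inner listener, so they can be combined selectively.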
We generate a lot of files, and they are .gitignore'd. Jeweler tries to be nice, and excludes .gitignore'd files from the gemspec (rake gemspec). Need to figure out how to work around this. Probably patch Jeweler somehow.
"\r\n" causes improper messages to be sent to the listener (e.g. send the step twice, include the \r in some cases).
Mike and I are looking into this.
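One possible workaround while the lexer itself is being fixed is to normalise line endings before scanning. This is a sketch of that workaround only, not of the eventual fix; `normalize_newlines` is a made-up helper name.

```ruby
# Convert Windows (\r\n) and bare \r line endings to \n before lexing.
def normalize_newlines(source)
  source.gsub(/\r\n?/, "\n")
end

p normalize_newlines("Given a\r\nWhen b\r\n")
```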
All the pystring specs need to pass. One is currently pending. After that, the implementation could probably be simplified, though I'm not so sure about that one--I have a hard time keeping the requirements for PyString parsing in my head all at once.