pointlander / peg

Peg, Parsing Expression Grammar, is an implementation of a Packrat parser generator.

License: BSD 3-Clause "New" or "Revised" License

Go 100.00%

peg's Issues

Bug with negative lookahead and -switch

The following grammar produces invalid Go when using the -switch option:

package main
type test Peg {}

Begin <- !('A' / 'B' / !.) !.

test.peg.go: test.peg.go:730:9: illegal rune literal

Line 730:

case '<nil>':
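
For reference, '<nil>' is exactly what Go's fmt package prints for a nil value under the %v verb. One plausible cause (an assumption, not checked against peg's source) is that the generator formats a missing character with %v. A minimal sketch; caseLabel is a hypothetical helper and not part of peg:

```go
package main

import "fmt"

// caseLabel is a hypothetical stand-in for code that emits a switch case.
// It formats whatever value it is given with %v.
func caseLabel(v interface{}) string {
	return fmt.Sprintf("case '%v':", v)
}

func main() {
	fmt.Println(caseLabel("A")) // a normal character
	fmt.Println(caseLabel(nil)) // a missing character renders as '<nil>'
}
```

If that is what happens, a nil check before emitting the case would turn the invalid output into a generator-side error.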

Cannot parse a{b,c}d

I can't figure out how to parse a{b,c}d into [TextNode, BraceNode [ Union[TextNode, TextNode] ], TextNode]. (If I could get the parser to parse, I know how to build up the nodes.)

I have tried variations of the following, but cannot figure it out. I would expect it to match the braces rule first. I'm not sure how to decipher the error output, in case that's telling me something useful.

package main

type Query Peg {
}

expression <- expr combinators? !.
expr <- space (value?)

combinators <- space (braces / union)

union     <- ',' expr
braces    <- '{' expr '}' expr

value <- [[a-z]]+
space <- ' '*

package main

import (
  "log"
)

func parse(query string) {
  log.Printf("%s\n", query)
  r := &Query{Buffer: query}
  r.Init()
  if err := r.Parse(); err != nil {
    log.Fatal(err)
  }
}

func main() {
  parse("")
  parse("a")
  parse("a,b")
  parse("a{b,c}d")
}

> ~/gopath/bin/peg test.peg &&  go run test.peg.go test.go | pbcopy
2014/04/16 15:26:06 
2014/04/16 15:26:06 a
2014/04/16 15:26:06 a,b
2014/04/16 15:26:06 a{b,c}d
2014/04/16 15:26:06 
parse error near Unknown (line 1 symbol 1 - line 1 symbol 1):

parse error near value (line 1 symbol 3 - line 1 symbol 4):
b
parse error near expr (line 1 symbol 3 - line 1 symbol 4):
b
parse error near space (line 1 symbol 2 - line 1 symbol 2):

parse error near expr (line 1 symbol 1 - line 1 symbol 2):
a
parse error near Unknown (line 1 symbol 1 - line 1 symbol 1):

exit status 1

Any ideas?

Thanks,
Xavier

peg miscounts UTF characters

peg fails to adjust the buffer pointer when multi-byte UTF-8 characters are encountered in the source being parsed, resulting in text captures that do not align with the parsed tokens.

For example:

/* ’ */int foobear;

When parsed by the C grammar, the ExternalDeclaration should have the text "int foobear" but instead yields "*/int foobe". The characters are shifted because the right single quote is counted as one character in the buffer instead of the three bytes it occupies when encoded as UTF-8.
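
The shift can be reproduced with plain Go strings, independent of peg. In this sketch, byteOffset is a hypothetical helper and the positions 7 and 18 are the character positions of "int foobear" in the example line:

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// byteOffset converts a rune (character) index into a byte index in s.
// If runeIdx is past the end, it returns len(s).
func byteOffset(s string, runeIdx int) int {
	n := 0
	for i := range s { // range yields the byte index of each rune
		if n == runeIdx {
			return i
		}
		n++
	}
	return len(s)
}

func main() {
	src := "/* \u2019 */int foobear;" // \u2019 is the right single quote
	begin, end := 7, 18               // character positions of "int foobear"

	// Byte length and character count differ because of the 3-byte quote.
	fmt.Println(len(src), utf8.RuneCountInString(src))
	// Using character positions as byte indices reproduces the bad capture.
	fmt.Printf("%q\n", src[begin:end])
	// Converting to byte offsets first gives the expected text.
	fmt.Printf("%q\n", src[byteOffset(src, begin):byteOffset(src, end)])
}
```

Whichever unit the parser counts in, the capture indices and the buffer slicing have to agree on it.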

Is it possible to parse multiple peg files for use in the same package?

This works beautifully when I specify a single peg file like so:
./peg -inline -switch peg.peg

When I try to include multiple peg.go files in a package, however, I end up with "previous declaration" errors for constants, rules, and methods.
Do you have a workaround for this?

Execute user code in Parse() rather than Execute()

I was hoping to use this parser generator to parse YAML, but to do this, I need to depend on indenting while matching rules. Executing user code in Execute() thus is too late for me.

Is it possible to execute user code early, while rules are being matched? If not, how difficult would it be to add that functionality?

Parse Tree

Hi
We are trying to write a parser for a grammar that needs more lookahead than what LR(1) in http://code.google.com/p/gocc/ provides. I am pretty sure we will be able to Pegify the grammar, but I was wondering if we are able to access the parse tree in some way or is it easy to build an AST in a bottom up way in this PEG implementation?

We have really easy to use SDT rules in gocc to build up an AST in a bottom up way.

I have not used PEG before, but I have read an article and I am really amped :)

Please help, Thank you
Walter Schulze

Which piumarta's peg version is this implementation based on?

I'd like to have the 0.1.9 feature of keeping global state in a structure, and I'm thinking of implementing it if necessary, but I'd rather know beforehand which version of piumarta's code this implementation corresponds to.

C grammar should capture acceptable but non-standard C code

Production C code may be extended with non-standard grammars. For example:

"__attribute__((packed))" and "__attribute((suppress))" may be after parameters, type qualifiers, function declarations, and even anonymous unions. It's tempting to consider it a sub to Spacing given how widely declared it can be.
An empty file with nothing but comments is considered valid by at least clang and gcc. The grammar rejects it for requiring at least one ExternalDeclaration.
gcc (did not test clang) will accept a function declaration terminated with a SEMI:

int func() { /* code in here. Or not. */ }; // <-- Note the SEMI.

Also, a struct with an empty declarator is considered valid:

struct a { int b;  ;  };

"__inline" appears to be an acceptable alternative to "inline" for clang. Didn't test gcc on that one.
An "asm" keyword and a non-trivial grammar is allowed by gcc and clang (and they might not be quite the same grammar, not 100% sure yet.)
"\%" is parsed by gcc and clang as a valid escape (and is reduced to just %) but the grammar rejects this.
__attribute__((format(printf, blah, etc))) is also seen, but disallowed by the current grammar.

Once I figure out pull requests I can provide a diff with all of these.

case insensitive grammars

Hi,

I'm quite interested in using this, but I've found a fairly major sticking point. There doesn't appear to be any easy way to parse case insensitive grammars, at least that I can see. Given how prevalent case insensitive language grammars are, it'd be nice if peg supported an easier way to parse them.

I've done some searching, and it appears that this is a common problem with things based on PEG. I see some discussion on the pegjs project about using a "characters"i syntax to denote case-insensitive character chunks:

pegjs/pegjs#34

I'm not sure if you like that syntax or not, but something similar to ease case insensitive grammars would be super useful.
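
Until something like that exists, one workaround that needs only ordinary character classes is to expand each keyword into a sequence such as [sS] [eE] ... before writing it into the grammar. A minimal sketch of a helper that does the expansion; caseInsensitive is a hypothetical name, and the input is assumed to contain no quotes or backslashes:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// caseInsensitive expands a keyword into a sequence of two-letter character
// classes, e.g. "if" becomes "[iI] [fF]". Characters without distinct upper
// and lower case forms are emitted as single-quoted literals.
func caseInsensitive(word string) string {
	var parts []string
	for _, r := range word {
		lo, up := unicode.ToLower(r), unicode.ToUpper(r)
		if lo != up {
			parts = append(parts, fmt.Sprintf("[%c%c]", lo, up))
		} else {
			parts = append(parts, fmt.Sprintf("'%c'", r))
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	fmt.Println("select <- " + caseInsensitive("select"))
	fmt.Println("group_by <- " + caseInsensitive("group_by"))
}
```

The output can be pasted into a .peg file; it is verbose, which is the point of the feature request.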

Control package name generation

~~It would be nice to be able to control the name given to the package of the parser.peg.go instead of modifying it after the code was generated.~~

Sorry, nevermind. Figured it depends on the name given in peg file.

pegRule overflows

I'm trying to create a parser from the PEG grammar for the lojban language. The grammar and the outputted parser are here

pegRule on line 8 is defined as uint8, but there are 886 enums and this overflows pegRule. It should probably be uint or something, instead.
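
To illustrate the arithmetic: a uint8 holds 256 distinct values, so index 885 (the last of 886 rules, counting from 0) cannot be represented. The sketch below uses hypothetical types rule8 and rule16 and converts from a variable, which wraps silently; a typed constant that overflows is rejected by the Go compiler instead.

```go
package main

import "fmt"

// rule8 mimics an enum type backed by uint8 (at most 256 distinct values).
type rule8 uint8

// rule16 is the same enum backed by a wider type.
type rule16 uint16

// toRule8 narrows an enum index to uint8; values above 255 wrap around.
func toRule8(index int) rule8 { return rule8(index) }

// toRule16 narrows to uint16, which has room for 65536 values.
func toRule16(index int) rule16 { return rule16(index) }

func main() {
	last := 885 // index of the last of 886 rules, counting from 0
	fmt.Println(toRule8(last))  // wrapped value
	fmt.Println(toRule16(last)) // preserved value
}
```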

Bug with a few captures

Hi,
parser.peg:

package main

type Parser Peg {
    req string
    opt string
}

rule <- <req> opt? {
    p.req = buffer[begin:end]
}
req <- 'req'
opt <- <'opt'> {
    p.opt = buffer[begin:end]
}

main.go:

package main

import "log"
import "fmt"

func main() {
    p := &Parser{Buffer: "reqopt"}
    p.Init()
    if err := p.Parse(); err != nil {
        log.Fatal(err)
    }
    p.Execute()
    fmt.Printf("%v %v\n", p.req, p.opt)
}

This example prints: opt opt but it should print req opt.
It seems that captures have a bug.

Improve handler integration

So I've used PegJS a bunch and I really miss a few of its features. (http://pegjs.majda.cz/online)

One of the most awesome features is how you can name expressions and use it in the associated javascript code for a rule. Like so:

additive
  = left:multiplicative "+" right:additive { return left + right; }
  / multiplicative

So essentially you can name parts of the expression. I believe this is possible in your Go implementation, but it requires more rejiggering than I was up for when I tried to figure out what it would take.

I believe it would work if you simply made the associated Go handler for each rule return 'interface{}' and returned the underlying result from the associated rule.

I think this would make your peg parser pretty awesome and much more succinct to use.

PEG and GopherJS

I compiled a grammar using PEG which I then translated to javascript using gopherjs.
I get the following error while parsing an input string. However, I only get this error while using the JS version. Does PEG provide a debug flag that would help me understand this issue?

parse error near Unknown (line 1 symbol 1 - line 1 symbol 1):

parse error near prolog (line 1 symbol 1 - line 1 symbol 1):

parse error near Unknown (line 1 symbol 1 - line 1 symbol 1):

Memoization

It doesn't seem like pointlander/peg currently handles memoization. Is that right? What's required to make it happen?
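
For context, packrat memoization means caching the result of applying a rule at an input position, so that backtracking never re-evaluates the same (rule, position) pair. The sketch below is a hand-written toy, not peg's generated code; parser, memoKey, and the two rules are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// memoKey identifies one attempt: a rule applied at an input offset.
type memoKey struct {
	rule string
	pos  int
}

// memoEntry records the outcome of that attempt.
type memoEntry struct {
	end int
	ok  bool
}

type parser struct {
	input string
	memo  map[memoKey]memoEntry
	calls int // counts real (non-cached) rule evaluations
}

func newParser(input string) *parser {
	return &parser{input: input, memo: map[memoKey]memoEntry{}}
}

// apply runs fn for rule at pos unless a cached result exists.
func (p *parser) apply(rule string, pos int, fn func(int) (int, bool)) (int, bool) {
	key := memoKey{rule, pos}
	if e, hit := p.memo[key]; hit {
		return e.end, e.ok
	}
	p.calls++
	end, ok := fn(pos)
	p.memo[key] = memoEntry{end, ok}
	return end, ok
}

// digits matches [0-9]+
func (p *parser) digits(pos int) (int, bool) {
	return p.apply("digits", pos, func(i int) (int, bool) {
		start := i
		for i < len(p.input) && p.input[i] >= '0' && p.input[i] <= '9' {
			i++
		}
		return i, i > start
	})
}

// sum matches: digits '+' digits / digits
func (p *parser) sum(pos int) (int, bool) {
	// First alternative: digits '+' digits
	if end, ok := p.digits(pos); ok && end < len(p.input) && p.input[end] == '+' {
		if end2, ok2 := p.digits(end + 1); ok2 {
			return end2, true
		}
	}
	// Second alternative re-applies digits at pos; the memo answers it.
	return p.digits(pos)
}

func main() {
	p := newParser("12")
	end, ok := p.sum(0)
	fmt.Println(end, ok, p.calls)
}
```

The cost is memory proportional to rules times input length, which is the usual trade-off to weigh before enabling it everywhere.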

Implement special form of & predicate

&{ expression }

          In  this  predicate  the  simple C expression (not statement) is
          evaluated immediately when the parser reaches the predicate.  If
          the  expression  yields non-zero (true) the 'match' succeeds and
          the parser continues with the next element in the  pattern.   If
          the  expression  yields  zero  (false) the 'match' fails and the
          parser backs up to look for an alternative parse of the input.

Via http://piumarta.com/software/peg/peg.1.html

avoiding calls to panic in cases of `rule used but not defined'

Hi! In the rare cases when a grammar is using a name of a rule in an expression but does not define this rule -- as during the development of a grammar file --, peg crashes rather than printing out a warning.

If peg encounters an identifier within an expression, it is added via AddName to the tree. If there is, as said above, no rule defined under that name, there will be no AddRule called, i.e. no item of type rule added to the tree. Later, within countRules and checkRecursion, when looking at items of TypeName, a rule of the same name will be looked for in the t.rules map. Since no rule of that name had been defined, a nil value will be returned, which is forwarded to a recursive call of functions countRules vs. checkRecursion, followed by the panic(s). A test at the end of peg.go, which normally would address this issue, cannot be reached in these cases.

I have written a patch for peg.go which tries to avoid the panics: gist 723666

It contains the following changes:

In method AddName, an empty rule is registered in map t.rules, so that later, near the top of method Compile, a dummy rule can be added to the tree with PushBack for each name that has no corresponding rule.
A new type TypeNil has been added, and a nilNode is defined as a pointer to a token of type TypeNil.

Method Rule.GetExpression() is adjusted to return nilNode in case rule.expression == nil. This makes it easy to avoid changes at places where GetExpression() is called, as otherwise a check against nil would have to be done.

With these changes, it seems the panics can be avoided, and the warning mentioned above actually gets printed.
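
The underlying pattern, shown in isolation: a Go map lookup on a missing key returns the zero value (nil for a pointer), so the result has to be checked before it is dereferenced. This is a simplified sketch with hypothetical types, not the patch from the gist:

```go
package main

import "fmt"

// rule is a hypothetical stand-in for a grammar rule node.
type rule struct {
	name string
	expr string
}

// lookup returns an error instead of a nil *rule when name is undefined,
// so callers cannot dereference a missing entry by accident.
func lookup(rules map[string]*rule, name string) (*rule, error) {
	r, ok := rules[name]
	if !ok || r == nil {
		return nil, fmt.Errorf("rule '%s' used but not defined", name)
	}
	return r, nil
}

func main() {
	rules := map[string]*rule{"Begin": {name: "Begin", expr: "Missing !."}}
	for _, name := range []string{"Begin", "Missing"} {
		if r, err := lookup(rules, name); err != nil {
			fmt.Println("warning:", err)
		} else {
			fmt.Println("ok:", r.name)
		}
	}
}
```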

Public parse Error type

It'd be nice to have a public variant of parseError{}, in my specific case I'd like to have more flexibility with the error message, so having access to some positional information would be great. Happy to send a PR.
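
A rough sketch of what such a type could look like. ParseError, its fields, and the parse stub are all hypothetical here; none of them is the current generated API.

```go
package main

import (
	"errors"
	"fmt"
)

// ParseError is a hypothetical exported error type carrying position data.
type ParseError struct {
	Rule   string // name of the rule near the failure
	Line   int    // 1-based line number
	Symbol int    // 1-based position within the line
}

func (e *ParseError) Error() string {
	return fmt.Sprintf("parse error near %s (line %d symbol %d)", e.Rule, e.Line, e.Symbol)
}

// parse is a stub standing in for a generated Parse method.
func parse(input string) error {
	if input == "" {
		return &ParseError{Rule: "Unknown", Line: 1, Symbol: 1}
	}
	return nil
}

// describe builds a custom message from the positional fields.
func describe(err error) string {
	var pe *ParseError
	if errors.As(err, &pe) {
		return fmt.Sprintf("%d:%d: unexpected input in %s", pe.Line, pe.Symbol, pe.Rule)
	}
	if err != nil {
		return err.Error()
	}
	return "ok"
}

func main() {
	fmt.Println(describe(parse("")))
	fmt.Println(describe(parse("a")))
}
```

With an exported type, callers can use errors.As and format the position however they like, while the default Error() string stays unchanged.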

Parser doesn't match the longest alternative

Hi, it seems the parser doesn't match the longest alternative, but the first one that matches.
Peg:

package main

type Test Peg {
}

start <- (keyword / string) ending

ending <- ";"

keyword <- <"KEYWORD"> { fmt.Println("keyword:", buffer[begin:end]) }

string <- <char+> { fmt.Println("string:", buffer[begin:end]) }

char <- .

Go:

package main

import (
    "log"
)

func main() {
    test := &Test{Buffer: "keyword2;"}
    test.Init()
    if err := test.Parse(); err != nil {
        log.Fatal(err)
    }
    test.Execute()
}

It prints an error after the keyword rule matches and the parser then sees the unhandled 2, but the right result would be the string rule.
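
For background, the '/' operator in a PEG is an ordered choice: the first alternative that succeeds is taken, and the parser does not go back to try a later, longer alternative if what follows the choice fails. The difference can be sketched with plain string prefixes; both functions below are illustrative helpers, not part of peg:

```go
package main

import (
	"fmt"
	"strings"
)

// orderedChoice returns the first alternative that is a prefix of input,
// which is how a PEG choice picks a branch.
func orderedChoice(input string, alts ...string) (string, bool) {
	for _, a := range alts {
		if strings.HasPrefix(input, a) {
			return a, true
		}
	}
	return "", false
}

// longestMatch returns the longest matching alternative, the behaviour
// expected in the report above.
func longestMatch(input string, alts ...string) (string, bool) {
	best, found := "", false
	for _, a := range alts {
		if strings.HasPrefix(input, a) && (!found || len(a) > len(best)) {
			best, found = a, true
		}
	}
	return best, found
}

func main() {
	fmt.Println(orderedChoice("keyword2;", "keyword", "keyword2"))
	fmt.Println(longestMatch("keyword2;", "keyword", "keyword2"))
}
```

In grammar terms, the usual remedies are to list the more specific alternative first or to guard the shorter one with a lookahead.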

Bug with positive lookahead operator

In the grammar below:

package main

type Dot Peg {
}

init <- ( '?' / '$' ) &alpha !.
alpha <- [a-z]+

the '&' operator causes the generated parser to be malformatted. It throws the following error:

dot.peg.go: dot.peg.go:768:8: expected ';', found 'IDENT' position

please version this repository

Hello,

Could you please tag this repository?

I am the Debian Maintainer for this project and tags would help Debian keep up with new releases/bugfixes.

See:

nil is an invalid rune

==Input ==

package main
type GccNode Peg {}
OneAttr <- (StringAttr/AddrAttr/SpecValue/NodeAttr/SourceAttr/IntAttr/SignAttr/IntAttr3/
        TagAttr/RandomSpec/
        BodyAttr/AccsAttr/
        NoteAttr/
        LinkAttr/
        QualAttr/IntAttr2/SignedIntAttr/LngtAttr
        )

==output ==
/home/mdupont/gocode/bin/peg -inline -switch test_32.peg
rule 'StringAttr' used but not defined
rule 'AddrAttr' used but not defined
rule 'SpecValue' used but not defined
rule 'NodeAttr' used but not defined
rule 'SourceAttr' used but not defined
rule 'IntAttr' used but not defined
rule 'SignAttr' used but not defined
rule 'IntAttr3' used but not defined
rule 'TagAttr' used but not defined
rule 'RandomSpec' used but not defined
rule 'BodyAttr' used but not defined
rule 'AccsAttr' used but not defined
rule 'NoteAttr' used but not defined
rule 'LinkAttr' used but not defined
rule 'QualAttr' used but not defined
rule 'IntAttr2' used but not defined
rule 'SignedIntAttr' used but not defined
rule 'LngtAttr' used but not defined
test_32.peg.go: test_32.peg.go:315:9: illegal rune literal (and 16 more errors)

==generated==

func() bool {
   {
position1 := position
   {
   switch buffer[position] {
   case '<nil>':
   {

Option to omit Pretty() function

Hello, I noticed you can't remove the Pretty() func without -noast. If you're open to a flag to remove it, I'll send a PR!

I have a top-level grammar in https://github.com/tj/go-naturaldate so it ends up being part of the public API.

Cleanup generated code for golint?

Thanks for writing peg! I noticed that the generated code produces some warnings for golint:

$ ./golint ./...
configfile.peg.go:10:7: don't use underscores in Go names; const end_symbol should be endSymbol
configfile.peg.go:29:2: don't use underscores in Go names; const rule_ should be rule
configfile.peg.go:38:2: don't use underscores in Go names; const rulePre_ should be rulePre
configfile.peg.go:39:2: don't use underscores in Go names; const rule_In_ should be ruleIn
configfile.peg.go:40:2: don't use underscores in Go names; const rule_Suf should be ruleSuf
configfile.peg.go:101:1: receiver name ast should be consistent with previous receiver name node for node32
configfile.peg.go:210:10: should omit 2nd value from range; this loop is equivalent to `for i := range ...`
configfile.peg.go:349:9: should omit 2nd value from range; this loop is equivalent to `for i := range ...`

Are you interested in a PR which fixes these (stylistic) issues?

panic: runtime error: index out of range

Error

panic: runtime error: index out of range

Peg file

package main
type Parser Peg {
}
S <- . !.

Program

package main

import (
    "bytes"
    "log"
)

func main() {
    buffer := bytes.NewBufferString("a")
    parser := &Parser{Buffer: buffer.String()}
    parser.Init()

    if err := parser.Parse(); err != nil {
        log.Fatal(err)
    }
    parser.Highlighter()
}

Parser completely breaks without warning if you have more than 65536 tokens

I am parsing a medium sized file (60 kb) and the parser breaks if you have more than 16 bits worth of tokens parsed. The AST tree will be completely wrong and it will be missing tokens because it can't have more than 16 bits worth.

I made it work correctly by manually editing the generated file and changing int16 to int32. However, it also looks like you preallocate slices like this: make([]int32, 1, math.MaxInt16). That capacity cannot simply be changed to 0, and it cannot be changed to MaxInt32 because no one has that much memory. So I changed it to 18 bits, but this obviously will not work for files with more tokens.

I don't feel comfortable submitting a patch for this myself because it looks like this is used in quite a few places and will probably require a significant change to remove these static limits.
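
As for the "without warning" part, a checked conversion would at least turn silent wrap-around into an error. A minimal sketch, assuming indices are narrowed to int16 somewhere in the generated code; narrow is a hypothetical helper:

```go
package main

import (
	"fmt"
	"math"
)

// narrow converts a token index to int16 and reports an error instead of
// silently wrapping when the value does not fit.
func narrow(index int) (int16, error) {
	if index < math.MinInt16 || index > math.MaxInt16 {
		return 0, fmt.Errorf("token index %d does not fit in int16", index)
	}
	return int16(index), nil
}

func main() {
	for _, n := range []int{100, 32767, 40000} {
		v, err := narrow(n)
		// int16(n) shows what an unchecked conversion would store.
		fmt.Println(n, int16(n), v, err)
	}
}
```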

implemented Gherkin parser using peg

Go gherkin package: https://github.com/muhqu/go-gherkin

I'd like to thank you for peg!

Return several buffers

I played a little with peg, and what was a showstopper for me is that a rule like
value <- < [0-9]+ > s { p.AddValue(buffer[begin:end]) }
captures only a single <> bracket pair.

Is it possible to return several?

I.e. like in Peg.js
http://pegjs.majda.cz/documentation
multiplicative
= left:primary "*" right:multiplicative { return left * right; }
/ primary

Omit unnecessary break at the end of case clause

We use the revive linter in our project and it complains about unnecessary break statements in case clauses. The generated code looks like this: https://github.com/pointlander/peg/blob/2356a7b0ab08d0dcd5e305164ebbf4ee043ef20e/peg.peg.go#L699:L736

Linter error message:

✘  https://revive.run/r#unnecessary-stmt  omit unnecessary break at the end of case clause  
   ....peg.go:123:7

Maybe the generated code could be improved by not outputting those breaks.
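
For reference, Go switch cases never fall through unless the fallthrough keyword is used, so a trailing break changes nothing. The two hand-written functions below (not generated output) behave identically:

```go
package main

import "fmt"

// withBreak mirrors the generated style: every case ends in an explicit break.
func withBreak(c byte) string {
	kind := "other"
	switch c {
	case 'a':
		kind = "letter"
		break
	case '1':
		kind = "digit"
		break
	}
	return kind
}

// withoutBreak relies on Go's rule that cases do not fall through implicitly.
func withoutBreak(c byte) string {
	kind := "other"
	switch c {
	case 'a':
		kind = "letter"
	case '1':
		kind = "digit"
	}
	return kind
}

func main() {
	for _, c := range []byte{'a', '1', 'x'} {
		fmt.Println(withBreak(c), withoutBreak(c))
	}
}
```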

Cannot parse \x00

When parsing [\x00-\x20], the generator puts a NUL character in the generated Go file. The Go compiler then complains about this NUL character when compiling the file:

nquads.peg.go: nquads.peg.go:1407:449: illegal character NUL

The generated code looks like:

  /* 9 IRIREF <- <('<' <((!((&('\\') '\\') | (&('`') '`') | (&('^') '^') | (&('|') '|') | (&('}') '}') | (&('{') '{') | (&('"') '"') | (&('>') '>') | (&('<') '<') | (&('\x00' | '\x01' | '\x02' | '\x03' | '\x04' | '\x05' | '\x06' | '\a' | '\b' | '\t' | '\n' | '\v' | '\f' | '\r' | '\x0e' | '\x0f' | '\x10' | '\x11' | '\x12' | '\x13' | '\x14' | '\x15' | '\x16' | '\x17' | '\x18' | '\x19' | '\x1a' | '\x1b' | '\x1c' | '\x1d' | '\x1e' | '\x1f' | ' ') [**<NUL>**- ])) .) / UCHAR)*> Action6 '>')> */

The grammar rule causing this is:

IRIREF          <-  '<' <([^\0x01-\0x20<>"{}|^`\\] / UCHAR)*>{ p.setIri(buffer[begin:end]) } '>'
UCHAR           <-  '\\u' HEX HEX HEX HEX / '\\U' HEX HEX HEX HEX HEX HEX HEX HEX
HEX             <-  [0-9A-Fa-f]
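
One way a generator can avoid writing raw control bytes into its output is to render every character through strconv.QuoteRune, which escapes non-printable runes. This is a sketch of the idea only; where exactly peg emits the raw byte is not established here, and runeLiteral is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// runeLiteral renders r as Go source text, escaping control characters.
func runeLiteral(r rune) string {
	return strconv.QuoteRune(r)
}

func main() {
	// NUL, BEL, space, and '<' as they would appear in generated source.
	for _, r := range []rune{0x00, 0x07, 0x20, '<'} {
		fmt.Println(runeLiteral(r))
	}
}
```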

Nil dereference with negative lookahead and -switch

The following grammar causes a nil pointer dereference when using the -switch option:

package main
type test Peg {}

Begin <- !('A' / 'B' / 'BC' / !.) [D]* !.

peg -switch test.peg
panic: runtime error: invalid memory address or nil pointer dereference
[signal 0xb code=0x1 addr=0x0 pc=0x40cf40]

goroutine 16 [running]:
runtime.panic(0x5aae80, 0x6db453)
        /usr/local/go/src/pkg/runtime/panic.c:279 +0xf5
main.(*set).intersects(0xc208005020, 0x0, 0x0)
        /home/cbandy/go/src/github.com/pointlander/peg/set.go:43 +0xa0
main.func·039(0x7f16d68ed610, 0xc208044700, 0x445f00, 0xc208004fc0)
        /home/cbandy/go/src/github.com/pointlander/peg/peg.go:1118 +0xa25
main.func·039(0x7f16d68ed610, 0xc208044880, 0x0, 0x0)
        /home/cbandy/go/src/github.com/pointlander/peg/peg.go:1199 +0x1cc7
main.func·039(0x7f16d68ed610, 0xc208044b40, 0x0, 0xc208004f60)
        /home/cbandy/go/src/github.com/pointlander/peg/peg.go:1196 +0x1bb5
main.func·039(0x7f16d68ed610, 0xc208044940, 0x7f16d68e2100, 0x0)
        /home/cbandy/go/src/github.com/pointlander/peg/peg.go:1204 +0x1e38
main.func·039(0x7f16d68ed610, 0xc208044640, 0x0, 0x0)
        /home/cbandy/go/src/github.com/pointlander/peg/peg.go:1071 +0x1b3
main.(*Tree).Compile(0xc208002c60, 0xc208036230, 0xb)
        /home/cbandy/go/src/github.com/pointlander/peg/peg.go:1223 +0xb96
main.main()
        /home/cbandy/go/src/github.com/pointlander/peg/main.go:75 +0x84d

http://pointlander.info/projects/peg/ appears to be down

Fairly self-explanatory.

I get ERR_CONNECTION_REFUSED on Chrome, Mac.

Document `Init()`, `Parse()`, etc

When a parser, p, for some grammar is created, it has a number of public methods such as p.Parse(), p.Init() and others. I could not find these enumerated anywhere (other than grepping the generated .peg.go file).

I could not find documentation for those methods either. Perhaps I overlooked something, but currently I am just trying to guess at their expected behavior by studying the _test.go files.

No Long Matches Possible

Hello,

I ran into a problem parsing very long strings in PHP code with your library. The rule to parse strings is the following (simplified):

String <- '\"' (!'\"' .)* '\"'

When i try to match the following string, the String rule does not match:

"XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"

When I remove just one X on my machine, the String rule matches.

PrintSyntaxTree functions indentation breaks when used with io.Writer that is not os.Stdout

When I use bytes.Buffer as the io.Writer, the output does not have indentation because of a typo in

func (node *node32) print(w io.Writer, pretty bool, buffer string)

I pointed out the typo in #79

Bootstrap: Rule defined but not used

Hi pointlander. I'm trying to make the peg package for go, but I get this message:

make -C bootstrap/ bootstrap
make[1]: Entering directory `$HOME/peg/bootstrap'
8g -I ./ ../peg.go
8g -I ./ main.go
8l -L ./ -o bootstrap main.8
make[1]: Leaving directory `$HOME/peg/bootstrap'
./bootstrap/bootstrap
8g -I ./ peg.go bootstrap.go
bootstrap.go:314: label l13 defined and not used
make: *** [peg.8] Error 1

If you could help me, that would very much be appreciated. Thanks!

c grammar failed with multiline macro define

#define sum (a,b,c ) \
a +b + c

(Slight) documentation request

Would you consider adding some comments to your calculator example, especially in calculator.peg? While I am familiar with PEG grammars, I find that your example does not make the use of your library any clearer. While I might be at fault, I do not believe that a bit of commenting would go amiss.

"TypeStateChange" missing from TypeMap in peg/tree/peg.go

It looks like the Type constants and the TypeMap array in peg/tree/peg.go are not in sync.

Rules defined in the grammar can conflict with identifiers within the parser itself

I'm trying to create a parser from the PEG grammar for the lojban language. The grammar and the outputted parser are here

After tweaking the outputted parser to correct the problem described in this issue, I encounter several errors like this on multiple lines:

./lojban.peg.go:22924: non-integer array index rules

...as well as this:

./lojban.peg.go:24115: cannot use rules (type [882]func() bool) as type pegRule in function argument

If I understand what's going on correctly, the parser generator prefixes rule to the front of the name of every rule defined in the grammar to create its list of enums. lojban.peg creates a rule for every letter in the lojban alphabet, which includes the letter 's', so the rule in the parser is named rules. This conflicts with a variable in the parser by the same name which is defined as an array of func() bool.
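
The collision described above is easy to reproduce in isolation. In this sketch, identFor and collides are hypothetical helpers, and the reserved set contains only the one identifier named in this report:

```go
package main

import "fmt"

// reserved lists identifiers assumed to be used by the generated parser
// itself. Only "rules" comes from the report; a real list would be longer.
var reserved = map[string]bool{"rules": true}

// identFor builds the enum identifier by prefixing "rule" to the rule name.
func identFor(ruleName string) string { return "rule" + ruleName }

// collides reports whether the identifier for ruleName clashes with a
// reserved identifier.
func collides(ruleName string) bool { return reserved[identFor(ruleName)] }

func main() {
	for _, name := range []string{"a", "s"} {
		fmt.Println(name, identFor(name), collides(name))
	}
}
```

A check like this at generation time, or a prefix that cannot form an existing identifier, would avoid the clash.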

Parser very slow

The parser allocates an enormous amount of memory

var tree tokenTree = &tokens32{tree: make([]token32, math.MaxInt16)}

And then it uses its own vector-doubling scheme

func (t *tokens32) Expand(index int) tokenTree {
        tree := t.tree
        if index >= len(tree) {
                expanded := make([]token32, 2*len(tree))
                copy(expanded, tree) 
                t.tree = expanded
        }
        return nil
}

Both of these cause the parser to be very slow because they generate a large amount of garbage. This should probably be optimized.
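
One possible direction, sketched with simplified stand-in types rather than the generated ones: start with a small capacity and let the built-in append grow the slice, so short inputs never pay for the large initial allocation.

```go
package main

import "fmt"

// token32 is a simplified stand-in for the parser's token record.
type token32 struct {
	rule       uint32
	begin, end uint32
}

// tokens grows on demand instead of preallocating math.MaxInt16 entries.
type tokens struct {
	tree []token32
}

// add appends one token; append reallocates only when capacity runs out.
func (t *tokens) add(rule, begin, end uint32) {
	t.tree = append(t.tree, token32{rule, begin, end})
}

func main() {
	toks := &tokens{tree: make([]token32, 0, 64)} // small initial capacity
	for i := uint32(0); i < 100000; i++ {
		toks.add(1, i, i+1)
	}
	fmt.Println(len(toks.tree))
}
```

Whether this is faster in practice would need a benchmark against real grammars; the sketch only shows the allocation pattern.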

RFE: Support Unicode character classes

Request For Enhancement: Support Unicode character classes.

Currently, this implementation supports literal character classes such as [a-z], but that is impossibly clumsy for many things, such as the classes exposed by the standard library's unicode package.

I have a use for Unicode classes: I'm working on a template language that borrows some syntax from Go. Identifiers in Go aren't [a-zA-Z0-9_], they're Lu, Ll, Lt, Lm, Lo, Nd, or '_' as seen in the spec, and I'd like to be able to allow the full range.
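
To make the request concrete, this is the check a grammar would need to express, written directly against the standard unicode package. The Go specification defines identifier letters as the categories Lu, Ll, Lt, Lm, and Lo plus '_', and digits as category Nd; unicode.IsLetter and unicode.IsDigit cover exactly those. The function names are illustrative.

```go
package main

import (
	"fmt"
	"unicode"
)

// isIdentStart reports whether r may begin a Go identifier.
func isIdentStart(r rune) bool { return r == '_' || unicode.IsLetter(r) }

// isIdentPart reports whether r may appear after the first character.
func isIdentPart(r rune) bool { return isIdentStart(r) || unicode.IsDigit(r) }

// isIdentifier reports whether s is shaped like a Go identifier
// (keywords are not excluded).
func isIdentifier(s string) bool {
	if s == "" {
		return false
	}
	for i, r := range s {
		if i == 0 {
			if !isIdentStart(r) {
				return false
			}
			continue
		}
		if !isIdentPart(r) {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"größe", "_x1", "2x", "a-b"} {
		fmt.Println(s, isIdentifier(s))
	}
}
```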

Obtain better understandable parsing error messages

I was wondering if there is any chance to get parser error messages that are easier to understand, like "got X, was expecting Y or Z"; currently it is quite tough to see what is going wrong.
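
A common technique for this in PEG parsers is to remember the farthest input position at which any match failed, together with what was expected there. The sketch below shows only the bookkeeping; failure, note, and message are hypothetical names, and wiring this into the generated parser is not shown. It reports a single byte, so multi-byte characters would need extra care.

```go
package main

import (
	"fmt"
	"strings"
)

// failure tracks the farthest input position at which a match failed
// and what the parser was expecting there.
type failure struct {
	pos      int
	expected []string
}

// note records a failed expectation. Only failures at the farthest
// position seen so far are kept.
func (f *failure) note(pos int, what string) {
	switch {
	case pos > f.pos:
		f.pos, f.expected = pos, []string{what}
	case pos == f.pos:
		for _, e := range f.expected {
			if e == what {
				return
			}
		}
		f.expected = append(f.expected, what)
	}
}

// message renders the failure as "got X, was expecting Y or Z".
func (f *failure) message(input string) string {
	got := "end of input"
	if f.pos >= 0 && f.pos < len(input) {
		got = fmt.Sprintf("%q", input[f.pos:f.pos+1])
	}
	return fmt.Sprintf("got %s, was expecting %s", got, strings.Join(f.expected, " or "))
}

func main() {
	input := "1+*"
	f := &failure{pos: -1}
	f.note(0, "'('")   // an alternative that failed early
	f.note(2, "digit") // the farthest failures
	f.note(2, "'('")
	fmt.Println(f.message(input))
}
```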

[bug] undefined: RulePegText

./m2.peg.go:630: undefined: RulePegText
What is RulePegText? How do I define it?

./m2.peg

package main

type JsonParser Peg{
  Json
}
json <- may_space (json_object / json_array / json_string / json_number / json_true / json_false / json_null) may_space
json_object <- '{' may_space '}' / '{' (json_object_pair ',')* json_object_pair  '}'
json_object_pair <- may_space json_string may_space ':' json
json_array <- '[' may_space ']' / '[' (json ',')* json ']'
json_true <- 'true' { p.addJson(buffer[begin:end]) }
json_false <- 'false'
json_null <- 'null'
json_string <- '"' json_double_char* '"'
json_double_char <- [^"\\] / '\\' ["\\/bfnrt] / '\\u' json_hex_char json_hex_char json_hex_char json_hex_char
json_hex_char <- [0-9a-fA-F]
json_number <- '-'? ('0' / [1-9][0-9]*) ('.' [0-9]+)? ([eE][+-]?[0-9]+)? may_space

space_char <- [ \n\r\t]
#space <- space_char+
may_space <- space_char*

./main.go

package main

import (
  "fmt"
  "io/ioutil"
  "launchpad.net/goyaml"
)
type Json struct{
}
func (j *Json) addJson(json string){
  fmt.Println(json)
}
type JsonTest map[string]string
func main(){
  test_yaml_string,err:= ioutil.ReadFile("json_test.yml")
  if err!=nil{
    fmt.Println(err)
    return
  }
  json_test_data:=make(JsonTest)
  err = goyaml.Unmarshal(test_yaml_string,json_test_data)
  if err!=nil{
    fmt.Println(err)
    return
  }
  for test_name,test_data:=range json_test_data{
    parser:= &JsonParser{Buffer:test_data}
    parser.Init()
    err := parser.Parse()
    if err!=nil{
      fmt.Println("FAIL "+test_name+" ",err)
    }else{
      fmt.Println("PASS "+test_name)
    }
  }
  fmt.Println("success")
}

Any char matcher not working

None of the following regex* rules match the given input. I would expect at least one of them to work?

package main

type Query Peg {
}

expression <- regex3 !.

regex <- '/' [~/]* '/'
regex2 <- '/' !'/'* '/'
regex3 <- '/' .* '/'

package main

import (
  "log"
)

func parse(query string) {
  log.Printf("%s\n", query)
  r := &Query{Buffer: query}
  r.Init()
  if err := r.Parse(); err != nil {
    log.Fatal(err)
  }
}

func main() {
  parse("/a/")
}

Don't add "//go:generate" to generated parser

I recently updated peg and ran into an issue with the following commit: 190353a

May I ask why this was added? This adds the //go:generate peg -switch -inline abc/xyz.peg to the generated abc/xyz.peg.go file. Subsequent calls to go generate ./... will fail with open abc/xyz.peg: no such file or directory as the path is not correct in the output file.

I think this directive should not be in the output file. Maybe it could emit this instead: ^// Code generated .* DO NOT EDIT.$ (golang/go#13560)

Opened a PR: #84
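
As a quick illustration of the header convention mentioned above, tools can detect generated files by matching that pattern against each line. The sketch escapes the final dot and adds (?m) so that ^ and $ match at line boundaries; the "by peg" header text is a made-up example, not what peg emits today.

```go
package main

import (
	"fmt"
	"regexp"
)

// generatedRE is the header pattern, with (?m) so ^ and $ match per line.
var generatedRE = regexp.MustCompile(`(?m)^// Code generated .* DO NOT EDIT\.$`)

// isGenerated reports whether src contains a generated-code header line.
func isGenerated(src string) bool { return generatedRE.MatchString(src) }

func main() {
	withHeader := "// Code generated by peg. DO NOT EDIT.\n\npackage main\n"
	withDirective := "//go:generate peg -switch -inline abc/xyz.peg\n\npackage main\n"
	fmt.Println(isGenerated(withHeader), isGenerated(withDirective))
}
```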

Feasible to port grammar file from peg.js? Any caveats?

I'm wondering how feasible it would be to port a peg.js grammar file to this (peg)? Example:

https://raw.githubusercontent.com/mongodb-js/log/88daf5f4259652f95f2d4a7a01bd3f14751e4812/mongodb-log.pegjs

I'm new to PEGs, but just skimming through the docs, it seems like it should be easy just to translate each line 1:1 - is there anything we need to be wary of?

Curious if it would ever be possible to have a feature to automatically import a grammar file from peg.js? =) (i.e. if this is something that can easily be automated).

Parse and Reset as struct methods instead of struct fields

Parse and Reset are now defined as fields in the generated parser struct, not methods on the struct. One consequence of this choice is that we cannot define an interface on the generated struct, because an interface cannot contain fields. This comes up when one wants to abstract over multiple parsers.

My current workaround is to define wrappers for each generated struct, say:

func (p *Listing) ParseIntf() error {
	return p.Parse()
}

and define the interface as (simplified example):

type Listing interface {
	Init()
	ParseIntf() error
	Execute()
}

I wish you would consider lifting Parse and Reset to methods. Thanks!
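
Here is a self-contained version of the workaround, for anyone who wants to try it without a generated parser. listingParser is a hypothetical stand-in whose Parse is a function-valued field, and the Parse signature is simplified for the example:

```go
package main

import (
	"errors"
	"fmt"
)

// listingParser is a hypothetical stand-in for a generated parser whose
// Parse is a struct field holding a function, not a method.
type listingParser struct {
	Buffer string
	Parse  func() error
}

// Init wires up the Parse field, loosely imitating a generated Init.
func (p *listingParser) Init() {
	p.Parse = func() error {
		if p.Buffer == "" {
			return errors.New("empty input")
		}
		return nil
	}
}

// ParseIntf is the wrapper method that lets the type satisfy an interface.
func (p *listingParser) ParseIntf() error { return p.Parse() }

// Parser abstracts over any number of wrapped parsers.
type Parser interface {
	Init()
	ParseIntf() error
}

// run works with any Parser without knowing the concrete type.
func run(p Parser) error {
	p.Init()
	return p.ParseIntf()
}

func main() {
	fmt.Println(run(&listingParser{Buffer: "a"}))
	fmt.Println(run(&listingParser{}))
}
```

If Parse were a method, the wrapper would be unnecessary and the interface could name Parse directly.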

bufio.Reader

Are there any plans to allow the use of a bufio.Reader instead of a string as input?

We would like to hide segmented input behind an io.Reader interface, which can easily be converted to a more parser-friendly bufio.Reader.
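
In the meantime, a stopgap is to drain the reader into a string and hand that to the parser, at the cost of holding the whole input in memory. A minimal sketch; segments and bufferFrom are hypothetical helpers, and io.ReadAll needs Go 1.16 or later:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// segments joins several chunks behind a single io.Reader, imitating
// segmented input.
func segments(parts ...string) io.Reader {
	readers := make([]io.Reader, len(parts))
	for i, p := range parts {
		readers[i] = strings.NewReader(p)
	}
	return io.MultiReader(readers...)
}

// bufferFrom drains r into a string suitable for a string-based parser.
func bufferFrom(r io.Reader) (string, error) {
	data, err := io.ReadAll(bufio.NewReader(r))
	if err != nil {
		return "", err
	}
	return string(data), nil
}

func main() {
	buf, err := bufferFrom(segments("a{b", ",c}d"))
	fmt.Printf("%q %v\n", buf, err)
}
```

The resulting string can then be assigned to the parser's Buffer field in the usual way.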

C grammar fails with trivial but legal C snippet

int main() {
(a)||1;
}

In general, the two-place operator symbols fail to parse in this configuration (||, &&, ->, >>, <<) as well as the two-place postfix operators ("(a)--", "(a)++"). However I did notice that ">" and "<" also fail.

The key here is the LPAR and RPAR wrapping the expression. This seems to trigger it.

I'm rather losing my mind over the bug. Any help very much appreciated!