strblr / pegase

An inline, fast, powerful and lightweight PEG parser generator for JavaScript and TypeScript, with semantic actions, parametrized rules, support for native regexps, error recovery, warnings, integrated AST generation and visitors, cut operator, back references, grammar merging, and a lot more.

License: MIT License

TypeScript 100.00%

parser parsing peg grammar syntax-analysis javascript lexer typescript compiler

pegase's Issues

Back-references and AST Nodes

If I am understanding back-references correctly, the general idea is that with a parser like the following,

const parser = peg`
add: <a>$addend '+' >a< ${({a}) => 2 * Number(a)}

$addend @raw: [0-9]+
`

The value of an input run through this parser will either be twice whatever $addend resolves to, or a failure if the back-reference cannot match the text captured by the first $addend.

Imagine a scenario where an AST is being constructed, where given non-terminals in the grammar generate nodes corresponding to matched text. Is it possible, with back-references, to exactly match the same AST found earlier in a production rule?

For example:

const parser2 = peg`
add: <a>multiply '+' >a< => 'ADD'

multiply: <a>$factor '*' <b>$factor => 'MULTIPLY'

$factor @raw: [0-9]+
`

The idea being that multiplications could have any arbitrary numbers multiplied together, forming a Node, which would then be exactly matched twice by the addition. So,

parser2.test('2*3 + 2*3') // pass
parser2.test('2*3 + 3*4') // fail

As far as I can tell (as of 0.5.5), the back-reference in parser2 will fail, asserting that it is expecting an object. Is there enough context present at the point where a back-reference might be used to attempt to reparse an exact match? Or is this impossible to represent in PEGs, or in grammars more broadly?

Thanks for your consideration.
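
For comparison, the text-level version of this idea can be sketched with a native regular-expression back-reference. This is a minimal illustration, not pegase API: the helper name isSameProductTwice is hypothetical, and it compares the raw matched text of the first product, not an AST node, so it says nothing about whether pegase back-references can be made to work on node captures.

```typescript
// Hypothetical helper, not part of pegase.
// Group 1 captures the raw text of the first product; \1 requires the
// exact same text to appear again after the '+'.
const SAME_PRODUCT_TWICE = /^\s*(\d+\*\d+)\s*\+\s*\1\s*$/;

function isSameProductTwice(input: string): boolean {
  return SAME_PRODUCT_TWICE.test(input);
}

console.log(isSameProductTwice("2*3 + 2*3")); // true
console.log(isSameProductTwice("2*3 + 3*4")); // false
```

Because the comparison is on raw text, an input such as '2*3 + 2 * 3' would be rejected even though it describes the same product; matching on structure would need a comparison of the captured nodes instead.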

Any plans to have an online playground?

Hello!
Any plans to have an online playground?

Something like:

Cheers!

Creating an AlternativeParser from string injected into template literal

There seems to be slightly unexpected behavior in how peg interprets a string passed into a template literal when that string already has alternation formatting applied to it:

const alternatives = ['a', 'b', 'c'];
const asAlternative = alternatives.map(a => `"${a}"`).join(' | ')
const normal = peg`"a" | "b" | "c"`
const injection = peg`${asAlternative}`

In the above snippet, normal should generate an AlternativeParser, but injection instead generates a LiteralParser:

console.log(normal) =>

AlternativeParser {
  defaultOptions: {},
  parsers: [
    LiteralParser {
      defaultOptions: {},
      literal: 'a',
      emit: true,
      expected: [Object]
    },
    LiteralParser {
      defaultOptions: {},
      literal: 'b',
      emit: true,
      expected: [Object]
    },
    LiteralParser {
      defaultOptions: {},
      literal: 'c',
      emit: true,
      expected: [Object]
    }
  ]
}

console.log(injection) => 

LiteralParser {
  defaultOptions: {},
  literal: '"a" | "b" | "c"',
  emit: false,
  expected: { type: 'LITERAL', literal: '"a" | "b" | "c"' }
}

As can be seen from the console logging on injection, the string value passed into the template literal appears to be a valid alternative expression for a PEG. Is there a mechanism, or a potential roadmap item, to support a use case along these lines?

(Obvious caveat: there could be some oddity in how template literals work, of which I am ignorant, regarding embedding string fragments in the fashion suggested above. I cannot rule out user error, so preemptive apologies should that be the case.)
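
On the caveat above: a tag function in JavaScript receives the literal chunks of the template and the interpolated values as separate arguments, so an injected string never becomes part of the template's source text. The sketch below shows this with a hypothetical tag named inspect (it is not pegase's peg tag). That peg then treats a string value as a literal to match is an inference from the logged LiteralParser, not something confirmed here.

```typescript
// Hypothetical tag for illustration only; it is not pegase's `peg` tag.
// A tag function receives the literal chunks and the interpolated values separately.
function inspect(chunks: TemplateStringsArray, ...values: unknown[]) {
  return { chunks: [...chunks], values };
}

const alternatives = ["a", "b", "c"];
const asAlternative = alternatives.map((a) => `"${a}"`).join(" | ");

const written = inspect`"a" | "b" | "c"`;
const injected = inspect`${asAlternative}`;

console.log(written.chunks);  // [ '"a" | "b" | "c"' ]
console.log(written.values);  // []
console.log(injected.chunks); // [ '', '' ]
console.log(injected.values); // [ '"a" | "b" | "c"' ]
```

In the first call the alternation is part of the template source that the tag can parse as grammar; in the second call the tag only sees two empty chunks and one opaque string value.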

Is the $token shortcut syntax still supported (as of 0.5.5)?

Given the following grammar:

const tokenTest = peg`
expression: tokenTerminal

$tokenTerminal: [a-z]+
`

A parsing of any input throws an error:

tokenTest.parse('abc') //=> ReferenceError: r_tokenTerminal is not defined

Given that the above syntax is just sugar around @token directives, this isn't terribly important. However, I am wondering whether the syntax is still meant to be supported, and whether the above error is unintended behavior.

Expectation failures can repeat expected (non-) terminals

Given the following recursive parser:

const parser = peg`s: a b c
a: "a"*
b: "b"* a
c: "c"* b`

When parsing an input string which fails to match the tokens the parser is expecting, the generated error message can have (seemingly redundant) expectations:

parser.children('a d a')

/* =>
(1:3) Failure: Expected "a", "b", "a", "c", "b", "a" or end of input

> 1 | a d a
    |   ^
*/

Presumably, this is due to the multiple different recursive pathways the parser could expand the non-terminals into. However, for error reporting to the user, these repeated expectations don't provide much insight: what, fundamentally, is different between the first expected "a", the second, and the third?

Would it be possible to filter the expectations when generating the error message so that only the distinct tokens are kept? I see from stringifyEntry that you are map/reducing the expected array. Perhaps something similar to this Stack Overflow answer to a similar problem would be applicable, as a step performed before the map?

This is very low priority, as in I don't need this functionality in the immediate future.
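
A minimal sketch of such a deduplication step follows. The Expectation type is an assumption modeled on the { type: 'LITERAL', literal: ... } object logged in an earlier issue; the END_OF_INPUT variant and the names expectationKey and dedupeExpectations are hypothetical, and the real pegase types may differ.

```typescript
// Hypothetical shape: LITERAL mirrors the logged `expected` object,
// END_OF_INPUT is an assumed stand-in for "end of input".
type Expectation =
  | { type: "LITERAL"; literal: string }
  | { type: "END_OF_INPUT" };

// Builds a string key so structurally equal expectations compare as equal.
function expectationKey(e: Expectation): string {
  return e.type === "LITERAL" ? `LITERAL:${e.literal}` : e.type;
}

// Keeps the first occurrence of each distinct expectation, preserving order.
function dedupeExpectations(expected: Expectation[]): Expectation[] {
  const seen = new Set<string>();
  return expected.filter((e) => {
    const key = expectationKey(e);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const expected: Expectation[] = [
  { type: "LITERAL", literal: "a" },
  { type: "LITERAL", literal: "b" },
  { type: "LITERAL", literal: "a" },
  { type: "LITERAL", literal: "c" },
  { type: "LITERAL", literal: "b" },
  { type: "LITERAL", literal: "a" },
  { type: "END_OF_INPUT" },
];

const unique = dedupeExpectations(expected);
console.log(unique.map(expectationKey));
// [ 'LITERAL:a', 'LITERAL:b', 'LITERAL:c', 'END_OF_INPUT' ]
```

Applied to the failure above, a step like this would reduce the message to one mention each of "a", "b", "c" and end of input.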

v1 milestone

Pegase v1 is underway. There will be some major changes compared to the current pre-release API.

The underlying parsing strategy will change: instead of parsing directly in class methods on Parser derivatives, a Function instance will be generated for the root Parser. This Function is a compiled and optimized version of the parsing process. This allows for some interesting tricks, and is also roughly two times faster. (Done)
The Parser classes will be entirely rewritten (obviously). Instead of an exec member, they will have a generate member to generate source code for the compiled Function. The source code will be recursively generated via parser.compile(). This method will not have to be called explicitly; it'll be done in the peg tag. (Done)
Parametrized rules will be a thing. (Done)
Warnings and failures might be collected directly on the options object rather than on logger, which might be removed in favor of a smaller indexer or an at method. (Done)
Captures will be handled differently. Right now, there are some scope issues. (Done)
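
The "generate source code, then compile it into a Function" strategy described in the first two items can be illustrated in miniature. The sketch below is not pegase's implementation: MiniParser, literal, sequence and compile are hypothetical names, and the generated code only reports whether the whole input matched.

```typescript
// Hypothetical miniature of the "generate source, then compile" idea.
// None of these names come from pegase; they only illustrate the approach.
type Matcher = (input: string) => boolean;

interface MiniParser {
  // Emits statements that either advance `pos` or set `ok = false`.
  generate(): string;
}

// Matches a fixed string at the current position.
const literal = (text: string): MiniParser => ({
  generate: () =>
    `if (ok && input.startsWith(${JSON.stringify(text)}, pos)) pos += ${text.length}; else ok = false;`,
});

// Matches each part in order by concatenating their generated code.
const sequence = (...parts: MiniParser[]): MiniParser => ({
  generate: () => parts.map((p) => p.generate()).join("\n"),
});

// Recursively generates the source once, then compiles it into one function.
function compile(root: MiniParser): Matcher {
  const body = `let pos = 0; let ok = true;\n${root.generate()}\nreturn ok && pos === input.length;`;
  return new Function("input", body) as Matcher;
}

const matchAB = compile(sequence(literal("a"), literal("b")));
console.log(matchAB("ab")); // true
console.log(matchAB("ac")); // false
```

The parser objects are only walked once, at compile time; every later call runs the flat generated function, which is the kind of arrangement the milestone credits for the speedup.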
