strblr / pegase

An inline, fast, powerful and lightweight PEG parser generator for JavaScript and TypeScript, with semantic actions, parametrized rules, support for native regexps, error recovery, warnings, integrated AST generation and visitors, cut operator, back references, grammar merging, and a lot more.

License: MIT License

TypeScript 100.00%
parser parsing peg grammar syntax-analysis javascript lexer typescript compiler

pegase's People

Contributors

dependabot[bot], strblr

Forkers

abnerlee

pegase's Issues

Back-references and AST Nodes

If I am understanding back-references correctly, the general idea is that with a parser like the following,

const parser = peg`
add: <a>$addend '+' >a< ${({a}) => 2 * Number(a)}

$addend @raw: [0-9]+
`

The value of an input run through this parser will be either twice whatever $addend resolves to, or a failure if the back-reference does not match the text captured by the first $addend.

Imagine a scenario where an AST is being constructed, where given non-terminals in the grammar generate nodes corresponding to matched text. Is it possible, with back-references, to exactly match the same AST found earlier in a production rule?

For example:

const parser2 = peg`
add: <a>multiply '+' >a< => 'ADD'

multiply: <a>$factor '*' <b>$factor => 'MULTIPLY'

$factor @raw: [0-9]+
`

The idea being that multiplications could have any arbitrary numbers multiplied together, forming a Node, which would then be exactly matched twice by the addition. So,

parser2.test('2*3 + 2*3') // pass
parser2.test('2*3 + 3*4') // fail

As far as I can tell, (as of 0.5.5) the back-reference in parser2 will fail, asserting that it is expecting an object. Is there enough context present at the point a back-reference might be used to attempt to reparse an exact match? Is this impossible to represent in PEGs, or grammars more broadly?
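
One way to picture "reparsing an exact match" is the hand-written sketch below. It is not pegase code and says nothing about how pegase implements back-references; the helper names (skip, factor, multiply, test) are made up for illustration. It assumes the back-reference compares the raw text consumed by the first multiply rather than the node it produced:

```javascript
// Skip spaces starting at pos and return the new position.
const skip = (input, pos) => {
  while (input[pos] === ' ') pos++;
  return pos;
};

// factor: [0-9]+
function factor(input, pos) {
  const m = /^[0-9]+/.exec(input.slice(pos));
  return m ? { text: m[0], end: pos + m[0].length } : null;
}

// multiply: factor '*' factor. Returns the node plus the raw text consumed.
function multiply(input, pos) {
  const start = skip(input, pos);
  const a = factor(input, start);
  if (!a) return null;
  const star = skip(input, a.end);
  if (input[star] !== '*') return null;
  const b = factor(input, skip(input, star + 1));
  if (!b) return null;
  return {
    node: { type: 'MULTIPLY', a: a.text, b: b.text },
    raw: input.slice(start, b.end),
    end: b.end
  };
}

// add: <a>multiply '+' >a<, with the back-reference matching raw text.
function test(input) {
  const first = multiply(input, 0);
  if (!first) return false;
  let p = skip(input, first.end);
  if (input[p] !== '+') return false;
  p = skip(input, p + 1);
  if (!input.startsWith(first.raw, p)) return false;
  return skip(input, p + first.raw.length) === input.length;
}

test('2*3 + 2*3'); // true
test('2*3 + 3*4'); // false
```

Because the comparison is textual, '2 * 3 + 2*3' would fail in this sketch even though both sides describe the same node. Matching on structure instead would mean parsing the second operand with multiply again and comparing the two nodes field by field.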

Thanks for your consideration.

Creating an AlternativeParser from string injected into template literal

There seems to be slightly unexpected behavior in how peg interprets a string passed into a template literal when that string already has alternation formatting applied to it:

const alternatives = ['a', 'b', 'c'];
const asAlternative = alternatives.map(a => `"${a}"`).join(' | ')
const normal = peg`"a" | "b" | "c"`
const injection = peg`${asAlternative}`

In the above snippet, normal should generate an AlternativeParser, but injection instead generates a LiteralParser:

console.log(normal) =>

AlternativeParser {
  defaultOptions: {},
  parsers: [
    LiteralParser {
      defaultOptions: {},
      literal: 'a',
      emit: true,
      expected: [Object]
    },
    LiteralParser {
      defaultOptions: {},
      literal: 'b',
      emit: true,
      expected: [Object]
    },
    LiteralParser {
      defaultOptions: {},
      literal: 'c',
      emit: true,
      expected: [Object]
    }
  ]
}

console.log(injection) => 

LiteralParser {
  defaultOptions: {},
  literal: '"a" | "b" | "c"',
  emit: false,
  expected: { type: 'LITERAL', literal: '"a" | "b" | "c"' }
}

As can be seen from the console logging on injection, the string value passed into the template literal appears to be a valid alternative expression for a PEG. Is there a mechanism---or potential road map---to support a use case along these lines?

(Obvious caveat: there could be some oddity in how template literals work, of which I am ignorant, regarding embedding string fragments in the fashion suggested above. I cannot rule out user error, so preemptive apologies should that be the case.)
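
For context on the template literal side of this: a tag function receives the literal text chunks and the interpolated values as separate arguments, so an interpolated string arrives as a value rather than as part of the source text. The sketch below uses a made-up tag named inspect (not part of pegase) to show what the tag sees in each case. The fromSource helper is also hypothetical and only works for tags that tolerate being called as plain functions; whether peg does is an assumption that would need checking.

```javascript
// Hypothetical tag that only reports what it receives; not part of pegase.
function inspect(strings, ...values) {
  return { strings: [...strings], values };
}

const joined = ['a', 'b', 'c'].map(a => `"${a}"`).join(' | ');

// The whole grammar is literal template text: one chunk, no values.
const normal = inspect`"a" | "b" | "c"`;
// normal.strings -> ['"a" | "b" | "c"'], normal.values -> []

// The grammar arrives as an interpolated value between two empty chunks.
const injection = inspect`${joined}`;
// injection.strings -> ['', ''], injection.values -> ['"a" | "b" | "c"']

// Hypothetical helper: call a tag as a plain function with the source as its
// only chunk, mimicking what the engine passes for an uninterpolated template.
function fromSource(tag, source) {
  return tag(Object.assign([source], { raw: [source] }));
}

const viaHelper = fromSource(inspect, joined);
// viaHelper.strings -> ['"a" | "b" | "c"'], viaHelper.values -> []
```

This is consistent with the LiteralParser output above: the tag has no way to tell that the interpolated string was meant as grammar source, so treating it as a literal to match is a reasonable default.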

Is the $token shortcut syntax still supported (c. 0.5.5)?

Given the following grammar:

const tokenTest = peg`
expression: tokenTerminal

$tokenTerminal: [a-z]+
`

A parsing of any input throws an error:

tokenTest.parse('abc') //=> ReferenceError: r_tokenTerminal is not defined

Given that the above syntax is just sugar around @token directives, this isn't terribly important. However, just wondering if the syntax is still meant to be supported, and if the above error is unintended behavior.

Expectation failures can repeat expected (non-) terminals

Given the following recursive parser:

const parser = peg`s: a b c
a: "a"*
b: "b"* a
c: "c"* b`

When parsing an input string which fails to match the tokens the parser is expecting, the generated error message can have (seemingly redundant) expectations:

parser.children('a d a')

/* =>
(1:3) Failure: Expected "a", "b", "a", "c", "b", "a" or end of input

> 1 | a d a
    |   ^
*/

Presumably, this is due to the multiple different recursive pathways the parser could expand the non-terminals into. However, for error reporting to the user, these repeated expectations don't provide much insight: what, fundamentally, is different between the first expected "a", the second, and the third?

Would it be possible to filter the expectations when generating the error message so that only the distinct tokens are kept? I see from stringifyEntry that you are map/reducing the expected array. Perhaps something similar to this Stack Overflow answer to a similar problem would be applicable, as a step performed before the map?
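
A minimal sketch of such a filtering step, assuming expectation entries are plain objects shaped like the { type: 'LITERAL', literal: ... } value seen in the LiteralParser output elsewhere on this page. The dedupeExpectations name and the 'END' type are made up for illustration and are not taken from pegase:

```javascript
// Keep the first occurrence of each distinct expectation, preserving order.
// Entries are keyed by their JSON form, which assumes a stable key order.
function dedupeExpectations(expected) {
  const seen = new Set();
  return expected.filter(entry => {
    const key = JSON.stringify(entry);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Hypothetical entries mirroring: "a", "b", "a", "c", "b", "a" or end of input
const lit = literal => ({ type: 'LITERAL', literal });
const expected = [
  lit('a'), lit('b'), lit('a'), lit('c'), lit('b'), lit('a'),
  { type: 'END' } // assumed representation of "end of input"
];

const unique = dedupeExpectations(expected);
// unique holds four entries: "a", "b", "c", and end of input
```

Run before the map in the message formatter, a step like this would turn the message above into: Expected "a", "b", "c" or end of input.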

This is very low priority---as in I don't need this functionality in the immediate future.

v1 milestone

Pegase v1 is underway. There will be some major changes compared to the current pre-release API.

  • The underlying parsing strategy will change: instead of parsing directly in class methods on Parser derivatives, a Function instance will be generated for the root Parser. This Function is a compiled and optimized version of the parsing process. This allows for some interesting tricks, and is also roughly two times faster. (Done)
  • The Parser classes will be entirely rewritten (obviously). Instead of an exec member, they will have a generate member to generate source code for the compiled Function. The source code will be recursively generated via parser.compile(). This method will not have to be called explicitly, it'll be done in the peg tag. (Done)
  • Parametrized rules will be a thing. (Done)
  • Warnings and failures might be collected directly on the options object rather than on logger, which might be removed in favor of a smaller indexer or at method. (Done)
  • Captures will be handled differently. Right now, there are some scope issues. (Done)
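
To illustrate the first two points, here is a toy sketch of the "generate source, then compile to a Function" idea. It is not pegase's implementation: the Literal and Sequence classes and the shape of the generated code are assumptions for illustration, and only the generate and compile names echo the description above.

```javascript
// Each hypothetical parser emits JavaScript statements that advance `pos`
// through `input`, or set `pos` to -1 on failure.
class Literal {
  constructor(text) {
    this.text = text;
  }
  generate() {
    const t = JSON.stringify(this.text);
    return `if (pos >= 0) pos = input.startsWith(${t}, pos) ? pos + ${this.text.length} : -1;`;
  }
}

class Sequence {
  constructor(...parsers) {
    this.parsers = parsers;
  }
  // Source is generated recursively from the child parsers.
  generate() {
    return this.parsers.map(p => p.generate()).join('\n');
  }
}

// Build the Function once; later calls run the generated code directly
// instead of walking the parser objects.
function compile(parser) {
  const body = `let pos = 0;\n${parser.generate()}\nreturn pos === input.length;`;
  return new Function('input', body);
}

const matches = compile(
  new Sequence(new Literal('a'), new Literal('+'), new Literal('b'))
);

matches('a+b'); // true
matches('a-b'); // false
```

The parser tree is traversed once at compile time, so each subsequent parse avoids the per-node method dispatch of an exec-style interpreter.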
