Coder Social home page Coder Social logo

palle-k / covfefe Goto Github PK

View Code? Open in Web Editor NEW
61.0 5.0 8.0 4.69 MB

A parser for nondeterministic context free languages

Home Page: https://palle-k.github.io/Covfefe/

License: MIT License

Swift 99.74% Shell 0.07% Ruby 0.19%
mathematical-expressions earley-parser cyk-parser grammar syntax-tree context-free-grammar context-free-language cyk earley earley-algorithm

covfefe's People

Contributors

mlajtos avatar palle-k avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

covfefe's Issues

Ambiguous Grammar

Hello,

I am using EarleyParser with following EBNF grammar:

    start           = [expression];
    expression      = atom | operation;
    operation       = expression, atom, atom;
    atom            = brackets | number | symbol;
    brackets        = '(', expression, ')';
    number          = {digit};
    symbol          = letter | "+" | "-";
    letter          = 'a' ... 'z';
    digit           = '0' ... '9';

(Yes, no operator precedence – '1+2-3+4' should be parsed as ((1+2)-3)+4.)

String '121' is ambiguous – sometimes parsed as number '121' and sometimes as infix operation '1 2 1' where '2' is operator and ones are operands. If I use method allSyntaxTrees(for:) I get both, which is okay, but method syntaxTree(for:) returns non-deterministicaly both trees.

Is there any way to:

  1. always return the same tree for ambiguous grammar, or
  2. make the grammar unambiguous, so '121' is parsed as number, or
  3. do something else which I don't know?

I have an Ohm/JS (Packrat parser) background and there isn't such problem, so I am a bit confused right now.

dot output label is not human friendly

I see that you are moving forward with swift 4.1 but is there any chance you could point me in the right direction to clean up the dot output as illustrated with your parser.syntaxTree(for: "(a + b)*(-c)") example?

	node29 [label="Index(_compoundOffset: 40, _cache: Swift.String.Index._Cache.utf16)..<Index(_compoundOffset: 44, _cache: Swift.String.Index._Cache.utf16)" shape=box]

Thanks.

Example usage

If I have a simple arithmetic grammar, how should I go on and traverse the syntax tree and implement the eval?

import Covfefe

let grammarString = """
expression       = binary-operation | brackets | unary-operation | number;
brackets         = '(', expression, ')';
binary-operation = expression, binary-operator, expression;
binary-operator  = '+' | '-' | '*' | '/';
unary-operation  = unary-operator, expression;
unary-operator   = '+' | '-';
number           = {digit};
digit            = '0' ... '9';
"""

let grammar = try Grammar(ebnf: grammarString, start: "expression")
let parser = EarleyParser(grammar: grammar)

let syntaxTree = try parser.syntaxTree(for: "(23+47)*(-3)")

// syntaxTree.map(transform: { node in ???})

I am coming from Ohm/JS world and I feel lost on how to proceed from here. From the source I gathered that I have to guard for terminal vs. non-terminal and desctructure accordingly, but I can't really put that into Swift code. Can you please provide some snippet to get me going? I think README would benefit from it too.

BNF.md says unicode scalars are supported, but they aren't

I understand if BNF.md is supposed to be a general reference, rather than a reference of the syntax of this specific parser, but then it shouldn't be implied that it's this parser's grammar in the readme:

Grammars can be specified in a superset of EBNF or a superset of BNF, which adopts some features of EBNF (documented [here](/BNF.md)).

so I assumed that this documents the version of (E)BNF this parser uses.

I see in BNF.md that Unicode scalars are supported:

Covfefe/BNF.md

Lines 182 to 184 in 8c69c03

unicode-scalar = "\\", "u", "{", unicode-scalar-digits, "}";
unicode-scalar-digits = [digit], [digit], [digit], [digit], [digit], [digit], [digit], digit;
digit = '0' ... '9' | 'A' ... 'F' | 'a' ... 'f';

so I tried to implement an "any non-control character goes" rule like this:

comment_char = \u{000020} ... \u{1FFFFF} ;

but I get this error message:

Error: Unmatched pattern at L1:16: '\', expected: digit | delimiting-rule-name-char | whitespace | literal | expression-optional | comment | expression-repetition | single-char-literal | delimiting-rule-name-char | expression-group

Swift 5.5 and Linux Unit tests

Recently, it has come to my attention, that swift test no longer needs, nor allows the LinuxMain.swift file in the Tests/ directory.

futuredapp/FTAPIKit@3feab35

Using #if swift(>=5.5) && !os(Linux) statement in the LinuxMain.swift won't work, the compilation (or linking) will fail as long as the LinuxMain.swift file is present.

I suppose this is a bug in the SPM itself, since the SPM should ignore all files that are outside of the target structure.

This bug is relevant on Swift 5.5. Since the minimal compiler version of this package is Swift 5.4, I suggest taking no action at this time, since deleting the LinuxMain.swift file will break the tests on Swift 5.4.

Re-creating the image in the readme

Hello,

I have a problem with grammar I'm working on (stack overflows and other crashes).

The graph in the README looks like an output of graphviz. Do you have a program that translates the syntax tree into the DOT language?

If so, would you share it with me?

Otherwise, I would attempt to implement such a program.

-Mikolas

Performance issues related to memory

Hello,

I have been playing with the package recently and I have experienced some issues.

The program seems to struggle with larger texts. At first I suspected, there might be an unreasonable amount of copying involved. After some debugging, I've came to conclusion, that this is in fact a problem with stack related to usage of recursion.

Reproduction example:

        let grammar = Grammar(start: "initial") {
            "initial"       --> n("initial") <+> t("a")
                            <|> t()
        }

        let parser = EarleyParser(grammar: grammar)
        let text = String.init(repeating: "a", count: 2500)
        let syntaxTree = try parser.syntaxTree(for: text )

I'm trying to work with files aroung 10K characters. At this time, I am able to work with 2000 characters long texts using this simple grammar.

For the time being, I will try to find a workaroud. I might have time to fix the issue in the future.

Result Builders for Grammar

Hello,

I am going to use your library for rather large grammar of Relax NG Compact (https://www.oasis-open.org/committees/relax-ng/compact-20020607.html).

After some attempts to rewrite the original notation to the Covfefe notation as defined in https://github.com/palle-k/Covfefe/blob/master/BNF.md, I've came to the conclusion, it would be much better for me, to write a DSL.

I've completed a simple proof of concept implementation which supports all expressions with exception for ranges. The DSL I've designed looks like this:

    Grammar.build {
        %"person"    %"name" .. %"space" .. (%"name" .. %"space")-? .. %"surname"
        %"name"      ^"Mikolas"
                    | ^"Jan"
                    | ^"Lukas"
        %"surname"   ^"Stuchlik"
                    | ^"Dvorak"
        %"space"     ^" "
    }

translating to

name = ('Mikolas' | 'Jan' | 'Lukas');
space = ' ';
surname = ('Stuchlik' | 'Dvorak');
person = name, space, [name, space], surname;

I've had to deal with Swift's limitations on which characters are allowed as an operator.

I'm open to suggestions.

I see two main advantages in using Function Builders

  1. some degree of static analysis is provided by the Swift compiler
  2. the user of the library is able to write some additional checks, for example, whether there are any unused non-terminals or undefined non-terminals, warnings when a non-terminal has the same name as a terminal string etc.

Would you be interested in having a Builder as a part of the main library?

Have a nice day, Mikoláš

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.