Coder Social home page Coder Social logo

chartparser's Introduction

Chart parser

This parser will soon replace the custom one in lever/, until then it has not been throroughly tested. Early users beware for bugs.

Imagine you'd have to make sense of some input, commonly an user interaction. You get a sequence of symbols and have to recognize some structure for it. This is when you need a parsing library. Chart parsers are straightforward but potentially resource consumptive tool for this. For small inputs or simple languages, they're often sufficient.

Chart parser takes in a Context-free grammar and the input sequence.

Here's how to input a context-free grammar that the chartparser can read:

from chartparser import NonTerminal, Terminal, Rule

s, a = Nonterminal('s'), Nonterminal('a')
x = Terminal('x')
grammar = [
    Rule(s, [s, a]),
    Rule(s, []),
    Rule(a, [x]),
]

Note that you can pass a third parameter to Rule, that will be bound to annotation, how you use this field is entirely up to you.

Chart parsers are not usually preprocessing-free, but the preprocessing tends to happen in such short time that nobody worries about it. Here's how to form a parser:

from chartparser preprocess

parser = preprocess(grammar, accept=s)()

Note that preprocess gives a function that you can call to instantiate a parser. This is done so you can parse many inputs with single preprocessing step.

Attention: Many chart parsers can allow you to select the symbol when you start parsing, rather than when you preprocess the grammar. You are required to supply the default symbol in preprocess. But this is also acceptable:

new_parser = preprocess(grammar, accept=s)
parser = new_parser(accept=a)

If this is not possible on some other variation of this library, the new_parser() takes a liberty to crash on argument error.

The parser contains several interfaces that you probably need for your usecase. The first one is a .step, and you use it to give the input into the parser. The first argument is the symbol this item represents. It can be both Terminal and Nonterminal. The second is the token put into this position. Here's an example that uses makeshift input:

terminals = {"x": x}
input_string = "xxxxxx"
for token in input_string:
    parser.step(terminals[token], token)

Additionally you have following interfaces that are available during parsing:

  • parser.accepted tells whether the input so far can be accepted and traversed.
  • parser.expect contains terminals and nonterminals that when .step next will result in a non-empty parsing state.
  • parser.expecting(x) can be used to query whether parsing can .step with the given symbol.

Finally when you've finished your input, you want to make sense of it by .traverse. Here's how to do it:

def postorder_call(rule, rhs):
    return '(' + ' '.join(rhs) + ')'
def blank(symbol):
    return ''
def ambiguous(sppf):
    for p in sppf:
        print p
    raise Exception("ambiguous parse")
    # This may also choose one interpretation among ambiguous results.
    # in that case return one of 'p' from this function.
result = parser.traverse(postorder_call, blank, ambiguous)

Note that blank rules will never reach postorder_call. Instead the blank is called with the appropriate symbol. This is done so the chart parser doesn't need to produce permutations of the empty rules, if they appear. If you rather want to derive empty objects from the blank rules, you can obtain .nullable and .blankset during preprocessing:

new_parser = preprocess(grammar, accept=s)
print new_parser.blankset
print new_parser.nullable

Supported

I make a promise to respond on support issues as long as I keep writing python code. I also try to keep the interface described in this readme mostly unchanged from what it is now. I did lot of care to get it this clean.

The parsing library exposes lot of other details you may need necessary to use. For example, my single traversing approach might not satisfy every urge. Those details may change in future.

Origins

This module is using the work of Jeffrey Kegler. He introduced these concepts to me, and the papers about Marpa helped me to refine an adaptation of the parser on python.

Before I got to read how to write a practical parser of this kind, I found it worthwhile to read loup's explanation on earley parsing, that is worthwhile to mention.

chartparser's People

Contributors

cheery avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

chartparser's Issues

Unable to parse with a given grammar

I am trying to figure out how Earley parser, esp Leo Joop's optimization works, and found your implementation. While looking at it, I think I found a bug.

Here is my grammar

    d0 = Terminal('0')
    d1 = Terminal('1')
    d2 = Terminal('2')
    d3 = Terminal('3')
    d4 = Terminal('4')
    d5 = Terminal('5')
    d6 = Terminal('6')
    d7 = Terminal('7')
    d8 = Terminal('8')
    d9 = Terminal('9')
    plus = Terminal('+')
    minus = Terminal('-')
    div = Terminal('/')
    mul = Terminal('*')
    open_ = Terminal('(')
    close_ = Terminal(')')
    dot = Terminal('.')
    start = Nonterminal('start')
    expr = Nonterminal('expr')
    term = Nonterminal('term')
    factor = Nonterminal('factor')
    integer = Nonterminal('integer')
    digit = Nonterminal('digit')

    grammar = [
        Rule(start, [expr]),
        Rule(expr, [term, plus,expr]),
        Rule(expr, [term, minus,expr]),
        Rule(expr, [term]),
        Rule(factor, [plus, factor]),
        Rule(factor, [minus, factor]),
        Rule(factor, [open_, expr, close_]),
        Rule(factor, [integer]),
        Rule(factor, [integer, dot, integer]),
        Rule(integer, [digit, integer]),
        Rule(integer, [digit]),
        Rule(digit, [d0]),
        Rule(digit, [d1]),
        Rule(digit, [d2]),
        Rule(digit, [d3]),
        Rule(digit, [d4]),
        Rule(digit, [d5]),
        Rule(digit, [d6]),
        Rule(digit, [d7]),
        Rule(digit, [d8]),
        Rule(digit, [d9]),
    ]

    accept = start
    terminals = {
            "0": d0,
            "1": d1,
            "2": d2,
            "3": d3,
            "4": d4,
            "5": d5,
            "6": d6,
            "7": d7,
            "8": d8,
            "9": d9,
            "+": plus,
            "-": minus,
            "/": div,
            "*": mul,
            "(": open_,
            ")": close_,
            ".": dot,
            }

    user_grammar = grammar

And here is the output

python chartparser.py 
Traceback (most recent call last):
  File "cp.py", line 464, in <module>
    main()
  File "cp.py", line 82, in main
    parser.step(terminals[token], token)
  File "cp.py", line 142, in step
    for eim, bb in self.chart[location-1][term]:
KeyError: T'3'

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.