Coder Social home page Coder Social logo

rustomata's Introduction

rustomata

Framework for (weighted) automata with storage

main functionalities:

  • parsing of LCFRSs using their Chomsky-Schützenberger characterisation
  • parsing of PMCFGs/MCFGs/LCFRSs and CFGs using their automata characterisation
  • approximation of MCFGs/LCFRSs and CFGs using several approximation strategies
  • coarse-to-fine parsing of MCFGs/LCFRSs and CFGs using their automata characterisation (under construction)

quick start

  • create a grammar file
    cat <<EOF > example.mcfg
    initial: [S]
    S → [[Var 0 0, Var 1 0, Var 0 1, Var 1 1]] (A, B)
    A → [[T a, Var 0 0],  [T c, Var 0 1]     ] (A   )   # 0.5
    A → [[],  []                             ] (    )   # 0.5
    B → [[T b, Var 0 0],  [T d, Var 0 1]     ] (B   )   # 0.5
    B → [[],  []                             ] (    )   # 0.5
    EOF
  • construct a tree-stack automaton from the given grammar, print out that automaton
    cargo run mcfg automaton grammar.gr
  • construct a tree-stack automaton from the given grammar and recognise the word "a a b c c d", print out a best (maximum weight) accepting configuration (if there is one)
    echo "a a b c c d" | cargo run mcfg parse grammar.gr

the grammar formats

Rustomata can deal with multiple context-free grammars (short: MCFGs) and context-free grammars (short: CFGs). MCFGs are expressively equivalent to linear context-free rewriting systems (LCFRSs) and simple range concatenation grammars (sRCG). The algorithms of rustomata use automata (tree-stack automata, short: TSA, for PMCFGs; and pushdown automata, short: PDA, for CFGs) or the Chomsky-Schützenberger representation to deal with grammars. The weights are (for now) assumed to be from the algebra (ℝ₊, ⋅, 1) of non-negative reals with multiplication; any parser considers greater values to be better.

an example of an MCFG (in the notation of an sRCG):

initial non-terminals: only S

productions:
S(x₁y₁x₂y₂) ← A(x₁, x₂) B(y₁, y₂)  with weight 1
A(ax₁, cx₂) ← A(x₁, x₂)            with weight 0.5
A(ε, ε)     ← ε                    with weight 0.5
B(by₁, dy₂) ← B(y₁, y₂)            with weight 0.5
B(ε, ε)     ← ε                    with weight 0.5
  • the same grammar in rustomata's notation:
    initial: [S]
    
    S → [[Var 0 0, Var 1 0, Var 0 1, Var 1 1]] (A, B)
    A → [[T a, Var 0 0],  [T c, Var 0 1]     ] (A   )   # 0.5
    A → [[],  []                             ] (    )   # 0.5
    B → [[T b, Var 0 0],  [T d, Var 0 1]     ] (B   )   # 0.5
    B → [[],  []                             ] (    )   # 0.5
    
  • Lines that contain no non-whitespace characters are ignored by the parser.
  • Lines whose first non-whitespace character is a % are also ignored (comments).
  • A single production may not contain newline characters.
  • A production may be followed by a comment (starting with %).
  • The weight definition (e.g. # 1.0) may be omitted. Rustomata then assumes a weight of 1.0.

an example of a CFG:

initial non-terminals: S

S → a S b    with weight 0.4
S → ε        with weight 0.6
  • the same grammar in rustomata's notation:
    initial: [S]
    
    S → [T a, Nt A, T b]  # 0.4
    S → []                # 0.6
    
  • The parser specifics of MCFGs also apply for CFGs.

constructing automata

  • create a tree-stack automaton that is equivalent to the given MCFG:
    cargo run mcfg automaton example.mcfg
  • create a push-down automaton that is equivalent to the given CFG:
    cargo run cfg automaton example.cfg

recognition functionality

  • parse an MCFG (internally constructing an tree-stack automaton):

    cargo run mcfg parse grammar.gr
  • parse a CFG (internally constructing a pushdown automaton):

    cargo run cfg parse grammar.gr

Chomsky-Schützenberger parsing for LCFRS

construct a Chomsky-Schützenberger representation (binary file) of a given LCFRS

  • for LCFRS files in rustomata's format

    cargo run -- csparsing extract examples/example.pmcfg > example.cs
  • for LCFRS files in discodop's format

    cargo run -- csparsing extract --disco \
                                   --lexer ⟨some_lexer_file_from_discodop⟩ \
                                   ⟨some_gzipped_grammar_from_discodop⟩ \
                                   > example.cs
    • …or without lexer

      cargo run -- csparsing extract --disco \\
                                     ⟨some_gzipped_grammar_from_discodop⟩ \
                                     > example.cs
    • …or with unzipped grammar

      cargo run -- csparsing extract --disco \
                                     --unzipped \
                                     --lexer ⟨some_lexer_file_from_discodop⟩ \
                                     ⟨some_unzipped_grammar_from_discodop⟩ \
                                     > example.cs
  • parse a space separated word using a Chomsky-Schützenberger representation

    echo "a a b c c d" | cargo run -- csparsing parse example.cs

approximation

Rustomata contains several approximation strategies, allowing the transformation of automata with storage into other automata with storage. Available are

multiple context-free → context-free

  • approximation of an MCFG (via a tree-stack automaton) by a pushdown automaton:
    cargo run approximation tts automaton example.mcfg
  • parse a word with the approximation automaton for the given MCFG:
    cargo run approximation tts parse example.mcfg

context-free → context-free

  • approximate a CFG (via a pushdown automaton) by a pushdown automaton using an equivalence relation on the non-terminal symbols. An equivalence relation is defined by specifying equivalence classes:
    S [S]
    N [A,B]
    R *
    
    with * matching the remaining non-terminals. To get the approximation automaton:
    cargo run approximation relabel automaton example.cfg example.classes
  • to parse with the approximation pushdown automaton:
    cargo run approximation relabel parse example.cfg example.classes

context-free → recognisable

  • approximation of a CFG (via a pushdown automaton) by a finite state automaton using a restriction of the underlying pushdown to height k:
    cargo run approximation ptk automaton example.cfg k
  • parse with the approximation finite state automaton:
    cargo run approximation ptk parse grammar.gr k

coarse-to-fine parsing

currently being refactored

rustomata's People

Contributors

denki avatar feliix42 avatar kornmaxhohnstein avatar lodifice avatar truprecht avatar uncomputable avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.