penman's Introduction

This module models graphs encoded in the PENMAN notation (e.g., AMR). It may be used as a Python library or as a script. It does not include any of the concept inventory or text-generation capabilities of the PENMAN project.

Features

Serialization between graphs and either PENMAN notation or triple conjunctions is provided by the PENMANCodec class's encode(), decode(), and iterdecode() methods. Module-level functions provide a convenient interface to this class:

  • encode(g) - serialize graph g and return the string
  • decode(s) - deserialize s and return the graph
  • load(f) - return all graphs in file f
  • loads(s) - return all graphs in string s
  • dump(gs, f) - serialize all graphs in gs and write to file f
  • dumps(gs) - serialize all graphs in gs and return the string

Passing triples=True to the above functions does (de)serialization to/from conjunctions of triples. The indent parameter of encode(), dump(), and dumps() changes how PENMAN-serialized graphs are indented (by default, they are adaptively indented to line up with their containing node). Deserialized Graph objects may be inspected and queried for their variables (nonterminal node identifiers), triples, etc. For more information, please consult the documentation, and see the example below.
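As a rough illustration of what a conjunction of triples looks like, the following sketch formats (source, relation, target) tuples by hand. It does not use this module: format_triples is a hypothetical helper, and the relation(source, target) layout joined by ^ is an assumption about the general shape of the output, not a guarantee of the exact string the module produces.

```python
def format_triples(triples, newlines=True):
    """Render (source, relation, target) tuples as a conjunction string."""
    conjuncts = [
        '{}({}, {})'.format(relation, source, target)
        for source, relation, target in triples
    ]
    # One conjunct per line, or all on a single line.
    delimiter = ' ^\n' if newlines else ' ^ '
    return delimiter.join(conjuncts)


triples = [
    ('b', 'instance', 'bark'),
    ('d', 'instance', 'dog'),
    ('b', 'ARG0', 'd'),
]
print(format_triples(triples, newlines=False))
# instance(b, bark) ^ instance(d, dog) ^ ARG0(b, d)
```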

Library Usage

>>> import penman
>>> g = penman.decode('(b / bark :ARG0 (d / dog))')
>>> g.triples()
[Triple(source='b', relation='instance', target='bark'), Triple(source='d', relation='instance', target='dog'), Triple(source='b', relation='ARG0', target='d')]
>>> print(penman.encode(g))
(b / bark
   :ARG0 (d / dog))
>>> print(penman.encode(g, top='d', indent=6))
(d / dog
      :ARG0-of (b / bark))
>>> print(penman.encode(g, indent=False))
(b / bark :ARG0 (d / dog))
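
In the example above, re-encoding with top='d' turns :ARG0 into :ARG0-of. The sketch below shows one simple way such an inversion could be computed on a single triple. The names invert_relation and invert_triple are hypothetical, and the suffix rule is an assumption for illustration; it is not necessarily how this module does it.

```python
def invert_relation(relation):
    """Return the inverse of a relation name (given without the colon)."""
    if relation.endswith('-of'):
        return relation[:-3]      # ARG0-of -> ARG0
    return relation + '-of'       # ARG0 -> ARG0-of


def invert_triple(triple):
    """Swap source and target and invert the relation."""
    source, relation, target = triple
    return (target, invert_relation(relation), source)


print(invert_triple(('b', 'ARG0', 'd')))
# ('d', 'ARG0-of', 'b')
```

A plain suffix rule like this misfires on relation names that end in -of without being inversions (AMR's :consist-of is one), so a real codec would likely need special cases.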

Script Usage

$ python penman.py --help
Penman

An API and utility for working with graphs in PENMAN notation.

Usage: penman.py [-h|--help] [-V|--version] [options]

Options:
  -h, --help                display this help and exit
  -V, --version             display the version and exit
  -i FILE, --input FILE     read graphs from FILE instead of stdin
  -o FILE, --output FILE    write output to FILE instead of stdout
  -t, --triples             print graphs as triple conjunctions
  --indent N                indent N spaces per level ("no" for no newlines)
  --amr                     use AMR codec instead of generic PENMAN one

$ python penman.py <<< "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go :ARG0 b))"
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go
            :ARG0 b))

Requirements

  • Python 2.7 or 3.3+
  • docopt (for script usage)

PENMAN Notation

The PENMAN project was a large effort at natural language generation, and what I'm calling "PENMAN notation" is more accurately "Sentence Plan Language" (SPL; [Kasper 1989]). I'll stick with "PENMAN notation" because it may be a more familiar name to modern users. It also sounds less specific to sentence representations, in case someone wants to use the format to encode arbitrary graphs.

This module expands the notation slightly to allow for untyped nodes (e.g., (x)) and anonymous relations (e.g., (x : (y))). It is also very permissive for the form of node identifiers (and other atoms). A PEG* definition for the notation is given below (for simplicity, whitespace is not explicitly included; assume all nonterminals can be surrounded by /\s+/):

Start    <- Node
Node     <- '(' NodeData ')'
NodeData <- Variable ('/' NodeType)? Edge*
NodeType <- Atom
Variable <- Atom
Edge     <- Relation Value
Relation <- /:[^\s(),]*/
Value    <- Node | Atom
Atom     <- String | Float | Integer | Symbol
String   <- /"[^"\\]*(?:\\.[^"\\]*)*"/
Float    <- /[-+]?(((\d+\.\d*|\.\d+)([eE][-+]?\d+)?)|\d+[eE][-+]?\d+)/
Integer  <- /[-+]?\d+/
Symbol   <- /[^\s()\/,]+/

* Note: I use | above for ordered-choice instead of / so that / can be used to surround regular expressions.
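
To show how the grammar above maps onto code, here is a minimal recursive-descent sketch that reads a single node and collects (source, relation, target) triples. It is not this module's implementation, and the names TOKEN, tokenize, and parse are hypothetical. As simplifications, atoms are kept as plain strings (Float and Integer are not converted), and characters that match no token, such as whitespace, are silently skipped.

```python
import re

# Token patterns taken from the PEG above; order matters (first match wins).
TOKEN = re.compile(r'''
    (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<SLASH>/)
  | (?P<RELATION>:[^\s(),]*)
  | (?P<STRING>"[^"\\]*(?:\\.[^"\\]*)*")
  | (?P<SYMBOL>[^\s()/,]+)
''', re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, text) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN.finditer(text)]


def parse(text):
    """Parse one node; return (top_variable, triples)."""
    tokens = tokenize(text)
    triples = []
    top, pos = _parse_node(tokens, 0, triples)
    if pos != len(tokens):
        raise ValueError('unexpected trailing tokens')
    return top, triples


def _kind(tokens, pos):
    return tokens[pos][0] if pos < len(tokens) else None


def _parse_atom(tokens, pos):
    if _kind(tokens, pos) in ('STRING', 'SYMBOL'):
        return tokens[pos][1], pos + 1
    raise ValueError('expected an atom at token {}'.format(pos))


def _parse_node(tokens, pos, triples):
    # Node <- '(' Variable ('/' NodeType)? Edge* ')'
    if _kind(tokens, pos) != 'LPAREN':
        raise ValueError('expected "(" at token {}'.format(pos))
    var, pos = _parse_atom(tokens, pos + 1)
    if _kind(tokens, pos) == 'SLASH':
        node_type, pos = _parse_atom(tokens, pos + 1)
        triples.append((var, 'instance', node_type))
    while _kind(tokens, pos) == 'RELATION':
        relation = tokens[pos][1][1:]  # drop the leading colon
        pos += 1
        if _kind(tokens, pos) == 'LPAREN':
            target, pos = _parse_node(tokens, pos, triples)
        else:
            target, pos = _parse_atom(tokens, pos)
        triples.append((var, relation, target))
    if _kind(tokens, pos) != 'RPAREN':
        raise ValueError('expected ")" at token {}'.format(pos))
    return var, pos + 1


top, triples = parse('(b / bark :ARG0 (d / dog))')
print(top)      # b
print(triples)  # [('b', 'instance', 'bark'), ('d', 'instance', 'dog'), ('b', 'ARG0', 'd')]
```

The two extensions mentioned above fall out of the grammar directly: parse('(x)') yields no triples, and parse('(x : (y))') yields a triple whose relation is the empty string.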

A more restricted variant of the grammar for AMR might make the ('/' NodeType) group required, and NodeTypes (maybe renamed "Concepts") could be given as a disjunction of allowed names. Similarly, Relations could be a disjunction of allowed names and possible inversions, or otherwise require at least one character after :. It might also restrict Variables to a form like /[a-z]+\d*/ and also restrict Atom values in some way. The included AMRCodec employs most of these restrictions and raises DecodeErrors for graphs it deems invalid. See also Nathan Schneider's PEG for AMR.
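
A few of the restrictions described above can be sketched as checks over (source, relation, target) triples. This is only an illustration under stated assumptions: check_triples is a hypothetical helper, it is not the AMRCodec's logic, and it inspects only nodes that appear as the source of at least one triple.

```python
import re

# Variable form suggested above: lowercase letters, then optional digits.
VARIABLE = re.compile(r'[a-z]+\d*\Z')


def check_triples(triples):
    """Return a list of problem descriptions (empty if none were found)."""
    problems = []
    sources = {source for source, _, _ in triples}
    typed = {source for source, rel, _ in triples if rel == 'instance'}
    for var in sorted(sources):
        if not VARIABLE.match(var):
            problems.append('bad variable: ' + var)
        if var not in typed:
            problems.append('untyped node: ' + var)
    for source, relation, _ in triples:
        if not relation:
            problems.append('empty relation on: ' + source)
    return problems


good = [('b', 'instance', 'bark'), ('d', 'instance', 'dog'), ('b', 'ARG0', 'd')]
print(check_triples(good))               # []
print(check_triples([('X1', '', 'y')]))
# ['bad variable: X1', 'untyped node: X1', 'empty relation on: X1']
```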

Disclaimer

This project is not affiliated with ISI, the PENMAN project, or the AMR project.
