Coder Social home page Coder Social logo

kiki's Introduction

Kiki

crates.io

Kiki is a minimalist parser generator for Rust.

Table of contents

How to use Kiki?

Read the quickstart.

Why use Kiki?

  • Easy to learn. If you've used other parser generators before (e.g., Bison/yacc), you can learn Kiki in under 10 minutes.
  • Easy to write. Tools like Bison or lalrpop force you to write a grammar, semantic actions, and syntax tree type definitions. Kiki lets you write only the type definitions. Kiki infers the grammar and semantic actions from the type definitions.
  • Easy to read. Kiki has a minimalist syntax. This makes it easy to learn, and easy to read.

Kiki's limitations

  • Kiki only supports LALR(1) grammars.
  • Kiki parses token sequences, not strings.
    • In other words, you must provide your own lexer. You can either implement the lexer by hand, or use a lexer generator (e.g., logos).

Example

In this section, we build a toy parser that recognizes the arithmetic expressions. For example:

  • 42
  • 42 + 53
  • 29 + (893 * 7)

For simplicity, this language does not have operator precedence. Instead, you must use parentheses (e.g., 29 + (893 * 7)).

Let's compare how we build the parser using Bison and Kiki.

With Bison

Suppose Bison hypothetically supported Rust (instead of only C/C++). Then you might write:

%{
    enum Expr {
        Num(i32),
        Op {
            left: Box<Expr>,
            kind: OpKind,
            right: Box<Expr>,
        },
    }

    enum OpKind {
        Add,
        Sub,
        Cons,
        Div,
    }
}

%token <i32> NUM

// Other than NUM, the rest of the tokens
// only have one possible value each.
// So, we set their type to the unit type (`()`).
%token <()> PLUS
%token <()> MINUS
%token <()> STAR
%token <()> SLASH
%token <()> LPAREN
%token <()> RPAREN

%start expr

%%

expr
    : term
        {
            $$ = $1;
        }
    | term PLUS term
        {
            $$ = Expr::Op {
                left: Box::new($1),
                kind: OpKind::Add,
                right: Box::new($3),
            };
        }
    | term MINUS term
        {
            $$ = Expr::Op {
                left: Box::new($1),
                kind: OpKind::Sub,
                right: Box::new($3),
            };
        }
    | term STAR term
        {
            $$ = Expr::Op {
                left: Box::new($1),
                kind: OpKind::Cons,
                right: Box::new($3),
            };
        }
    | term SLASH term
        {
            $$ = Expr::Op {
                left: Box::new($1),
                kind: OpKind::Div,
                right: Box::new($3),
            };
        }
;

term
    : NUM
        {
            $$ = Expr::Num($1);
        }
    | LPAREN expr RPAREN
        {
            $$ = $2;
        }
;

Observe that there are three things you must write:

  1. The grammar (i.e., expr : term ...; and term : NUM ...;).
  2. The semantic actions (e.g., $$ = Expr::Op {...};).
  3. The syntax tree type definitions (i.e., enum Expr {...} and enum OpKind {...}).

With Kiki

In Kiki, you write:

terminal Token {
    $Num: i32
    $Plus: ()
    $Minus: ()
    $Star: ()
    $Slash: ()
    $LParen: ()
    $RParen: ()
}

start Expr

enum Expr {
    Term(Term)
    Op {
        left: Expr
        kind: OpKind
        right: Expr
    }
}

enum OpKind {
    Add(_: $Plus)
    Sub(_: $Minus)
    Cons(_: $Star)
    Div(_: $Div)
}

enum Term {
    Num($Num)
    Parenthesized(
        _: $LParen
        Expr
        _: $RParen
    )
}

Observe this code is much simpler and shorter. Instead of having to write the grammar, semantic actions, and syntax tree type definition, you only need to write the syntax tree type definition. Kiki infers the grammar and semantic action from the type definition.

Guide

You can read the user guide here. The guide explains Kiki in detail. It is not a fast read.

If your goal is to grok Kiki as quickly as possible, click here.

Contributing

Contributions are welcome. Simply open a new issue or pull request, and I'll take a look. All forms of contribution (e.g., bugfixes, tests, documentation, typo correction) are helpful.

kiki's People

Contributors

kylejlin avatar

Stargazers

 avatar  avatar  avatar  avatar

Watchers

 avatar

kiki's Issues

Clarify behavior for non-empty unit-like structs

Problem

Suppose we have:

start Foo

struct Foo(_: $Ident)

terminal Token {
    $Ident: String
}

I didn't originally anticipate this edge case.
Currently, this is undefined behavior.
I think the emitted Rust will look something like

// ...
struct Foo();

Notice the empty ().
I think it makes more sense to output something like:

// ...
struct Foo;

Solution

Modify the code generator to emit

// ...
struct Foo;

When we add docs, we must make sure to clarify this.

Test parsing in end-to-end tests

Once we use Kiki to generate Rust code, we should run that Rust code on some strings and then examine the resulting parse tree (or parse error).
This requires writing a lexer.

Yank old releases

After we publish the next release, we need to yank the old releases. They are unacceptably buggy.

Move e2e tests to a separate crate

Problem

The current e2e tests test parsers generated by the latest version of Kiki on crates.io.
They do not test parsers generated by the current version of Kiki (i.e., the one in the repo).
This produces misleading results.

We need to build Kiki from the repo, and then use that built executable to generate the parsers.
The e2e tests would then reference those parsers.
Ideally, we would do this in build.rs.
However, build.rs runs before compiling the repo (that's the whole point), so build.rs cannot depend on the current repo.

Solution

Create a separate crate that depends on the crate in this repo.
We call the crate kiki_e2e_test.
kiki_e2e_test's build.rs can depend on the local version of kiki.
We do not pubish kiki_e2e_test to crates.io.

Fix rule reduction code for all-underscore fieldsets

If a fieldset has only underscore fields, the current reduction code is the same as for an empty fieldset.
This is incorrect, since we need to pop items from the parse stack (even though we don't do anything with those popped items).

Support `#![allow()]`

Problem

The generated code will probably violate many lint (e.g. Clippy) rules.
This will lead to extraneous warning messages each time you compile.

Solution

Let the user specify #![allow(...)] to disable the linter within the generated file.

Support inner attributes

An inner attribute is of the form #![...].
We already support outer attributes (i.e., #[...]--notice the lack of the !).

Add VS Code syntax highlighting

I will most likely create the extension in another repository. But until I do that, I'm using this repository to host the tracking issue (i.e., this issue).

Update files based on last write time

Currently, we compare the hash.
However, this requires us to read the file,
which is rather expensive.
I'm not sure how easy it is to get a file's last write time.
I'll look into this tomorrow.

If using the last write time doesn't work out,
an alternative is to check the hash but to read the existing generated Rust code as a stream.
This eliminates the need to read the whole Rust file.
We still need to read the entire Kiki file, in order to hash it.
But since the Rust file is almost always much larger than the grammar,
I'd imagine reading it would be the IO bottleneck.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.