
betty's Introduction

Betty

A more specific dialect of ArchieML. While working with editors and reporters, we often found that the format, while "forgiving," could be brittle, especially when combined with CommonMark content. Multiline fields were especially prone to breaking: they would either end up with no content or enthusiastically eat the next object in a list. As a result, Betty makes the following changes:

  • Lists start a new item whenever they see a key that has already been defined in the current item, not just when the first key of the object repeats.
  • Multiline fields are now less ambiguous: open them with key:: and close with ::key.
  • If you've opened multiple levels of objects, you can jump back out to a specific level by key: {/name} will close {name}. Note that the slash must be flush with the opening brace: { /name } will not close an object.
  • Similarly, you can exit a specific named list with [/key] instead of closing each level individually with repeated [] lines.
  • You can provide options for behavior:
    • verbose - set this to be overwhelmed with logging messages
    • onFieldName - provide a callback that accepts a string key for mutation and returns the transformed version. Useful for lower-casing keys when Google Docs tries to capitalize them.
    • onValue - provide a callback that accepts the value and field name, and returns the actual value to add to the object. Useful for automatically casting dates, booleans, and numbers.
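To make the first change concrete, here is a toy comparison of the two item-splitting rules over a flat sequence of key/value pairs. It is not Betty's source. `groupByFirstKey` is a simplified approximation of ArchieML's behavior, where a new item starts only when the first key repeats, and `groupByRepeatedKey` follows the rule described above. Both function names are made up for this sketch.

```javascript
// Toy comparison of two ways to split a flat run of key/value pairs into
// list items. Not Betty's source; function names are hypothetical.

// Simplified ArchieML-style rule: a new item starts only when the first key repeats.
function groupByFirstKey(pairs) {
  const items = [];
  let current = null;
  let firstKey = null;
  for (const [key, value] of pairs) {
    if (current === null || key === firstKey) {
      current = {};
      items.push(current);
      if (firstKey === null) firstKey = key;
    }
    current[key] = value; // any other repeated key overwrites the earlier value
  }
  return items;
}

// Betty-style rule: a new item starts whenever the key is already set on the current item.
function groupByRepeatedKey(pairs) {
  const items = [];
  let current = null;
  for (const [key, value] of pairs) {
    if (current === null || Object.prototype.hasOwnProperty.call(current, key)) {
      current = {};
      items.push(current);
    }
    current[key] = value;
  }
  return items;
}

const pairs = [
  ["name", "Alice"],
  ["role", "editor"],
  ["role", "reporter"], // "role" repeats before "name" does
  ["name", "Bob"],
];

console.log(groupByFirstKey(pairs));
console.log(groupByRepeatedKey(pairs));
```

With these pairs, the first-key rule silently overwrites `role` and produces `[{ name: "Alice", role: "reporter" }, { name: "Bob" }]`, while the repeated-key rule keeps both values: `[{ name: "Alice", role: "editor" }, { role: "reporter", name: "Bob" }]`.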

The module exports a single object with a parse() method, which accepts the text you want to parse and the options object.
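An options object might look like the following. Only the option names (`verbose`, `onFieldName`, `onValue`) and their described signatures come from this README. The specific casting rules are assumptions for illustration, including the convention that field names ending in "date" hold dates. The `betty` variable in the usage comment stands in for however you import the module.

```javascript
// Hypothetical options for parse(text, options). Option names come from the
// README; the casting rules are illustrative assumptions.
const options = {
  verbose: false,

  // Google Docs may capitalize keys, so normalize them to lower case.
  onFieldName: (key) => key.toLowerCase(),

  // Receives the value and its field name; returns the value to store.
  onValue: (value, fieldName) => {
    if (typeof value !== "string") return value; // leave non-strings alone
    const trimmed = value.trim();
    if (trimmed === "true") return true;
    if (trimmed === "false") return false;
    // Plain decimal numbers such as "42" or "-3.5".
    if (/^-?\d+(\.\d+)?$/.test(trimmed)) return Number(trimmed);
    // Assumed convention: fields whose name ends in "date" hold dates.
    if (/date$/i.test(fieldName) && !Number.isNaN(Date.parse(trimmed))) {
      return new Date(trimmed);
    }
    return value;
  },
};

// Usage, assuming `betty` is the imported module:
// const data = betty.parse(text, options);
```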

When adding new features or altering the parser, it's useful to make sure that you haven't broken anything. npm test will run a check against the files from the original specification repo where applicable, as well as a document containing the syntax extensions defined above. Although Betty is not fully compliant with the ArchieML spec, it should handle existing content reliably.

Behind the scenes

When you call parse(), Betty actually runs through three stages before producing a final JSON object.

  1. A tokenizer breaks the text into a stream of tagged chunks, consisting of either possible syntax characters (such as {, }, and :) or text.
  2. The parser takes the stream of tokens and turns them into higher level instructions for things like "enter an array," "set a value at key.path," or "buffer this text."
  3. The assembler takes those instructions, pre-processes them (merging buffered strings together), then runs through the final stream of operations to actually assemble the object.
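The three stages can be illustrated with a deliberately tiny pipeline. This is a toy, not Betty's implementation. It only understands `key: value`, `{name}` (which in this sketch opens a nested object), `{}` (which closes it), and `key::` ... `::key` multiline fields. The token names, instruction names, and nesting semantics are all invented for the example.

```javascript
// Toy tokenizer -> parser -> assembler pipeline. Not Betty's implementation:
// token names, instruction names, and {name}/{} nesting are assumptions.

// Stage 1: split text into syntax-character tokens and TEXT runs.
const SYNTAX = new Map([["{", "LEFT_BRACE"], ["}", "RIGHT_BRACE"], [":", "COLON"], ["\n", "NEWLINE"]]);

function tokenize(text) {
  const tokens = [];
  let buffer = "";
  for (const ch of text) {
    if (SYNTAX.has(ch)) {
      if (buffer) tokens.push({ type: "TEXT", value: buffer });
      buffer = "";
      tokens.push({ type: SYNTAX.get(ch), value: ch });
    } else {
      buffer += ch;
    }
  }
  if (buffer) tokens.push({ type: "TEXT", value: buffer });
  return tokens;
}

// Stage 2: match each line's token pattern and emit an instruction.
function parse(tokens) {
  const lines = [[]];
  for (const t of tokens) {
    if (t.type === "NEWLINE") lines.push([]);
    else lines[lines.length - 1].push(t);
  }
  const ops = [];
  let multi = null; // key of an open key:: ... ::key field
  for (const line of lines) {
    const sig = line.map((t) => t.type).join(" ");
    const text = line.map((t) => t.value).join("");
    if (multi !== null) {
      // Everything is buffered text until the matching ::key line.
      if (sig === "COLON COLON TEXT" && line[2].value.trim() === multi) {
        ops.push({ op: "closeMulti" });
        multi = null;
      } else {
        ops.push({ op: "buffer", text });
      }
    } else if (sig === "LEFT_BRACE RIGHT_BRACE") {
      ops.push({ op: "exit" });
    } else if (sig === "LEFT_BRACE TEXT RIGHT_BRACE") {
      ops.push({ op: "enter", key: line[1].value.trim() });
    } else if (sig === "TEXT COLON COLON") {
      multi = line[0].value.trim();
      ops.push({ op: "openMulti", key: multi });
    } else if (sig.startsWith("TEXT COLON") && /^\s*[\w.-]+\s*$/.test(line[0].value)) {
      const value = line.slice(2).map((t) => t.value).join("").trim();
      ops.push({ op: "set", key: line[0].value.trim(), value });
    } else {
      ops.push({ op: "buffer", text }); // free text, ignored outside multiline fields
    }
  }
  return ops;
}

// Stage 3: merge adjacent buffered text, then build the object.
function assemble(ops) {
  const merged = [];
  for (const op of ops) {
    const last = merged[merged.length - 1];
    if (op.op === "buffer" && last && last.op === "buffer") last.text += "\n" + op.text;
    else merged.push({ ...op });
  }
  const stack = [{}];
  let pendingKey = null;
  let buffered = "";
  for (const op of merged) {
    const target = stack[stack.length - 1];
    if (op.op === "set") {
      target[op.key] = op.value;
    } else if (op.op === "enter") {
      target[op.key] = {};
      stack.push(target[op.key]);
    } else if (op.op === "exit") {
      if (stack.length > 1) stack.pop();
    } else if (op.op === "openMulti") {
      pendingKey = op.key;
      buffered = "";
    } else if (op.op === "buffer") {
      if (pendingKey !== null) buffered = op.text;
    } else if (op.op === "closeMulti") {
      target[pendingKey] = buffered;
      pendingKey = null;
    }
  }
  return stack[0];
}

const toyParse = (text) => assemble(parse(tokenize(text)));
```

Calling `toyParse("title: Hello\n{meta}\nauthor: Ann\n{}\nbody::\nFirst line: with colon\nSecond line\n::body\n")` yields `{ title: "Hello", meta: { author: "Ann" }, body: "First line: with colon\nSecond line" }`. The colon inside the body survives as text because the parser is already in multiline mode when it reaches that line, which is the kind of context that is awkward to track in a single-pass, line-oriented parser.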

This is much more complex than the baseline ArchieML module. Personally, I find it easier this way to reason about some of the language's "quirks," such as the inconsistent behavior of :end or \ as an escape. Your mileage may vary.


betty's Issues

Potentially bless a file extension?

Would love to add this as a reader to quaff, but would need some way to say “treat this as a Betty file.”

If it passes all ArchieML tests and returns the same thing, I could also potentially include a flag that uses Betty instead.

Great work!

Keys should match against linebreaks

Since we don't parse against entire lines, we've had issues where keys were detected in the middle of a line. However, we can add newline characters to the token patterns our parser uses for keys, and that should fix it.
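A rough sketch of the idea at the token level: when a key is only "text followed by a colon," the `10` in `10:30` looks like a key. Requiring the text to sit at the start of the input or right after a NEWLINE token rules it out. The token names, the key pattern, and the helper functions are assumptions for illustration, not Betty's parser.

```javascript
// Hypothetical sketch: token names and the key pattern are assumptions.
function tokenize(text) {
  // The capture group keeps ":" and "\n" as separate pieces.
  return text
    .split(/(:|\n)/)
    .filter((piece) => piece !== "")
    .map((piece) =>
      piece === ":" ? { type: "COLON" }
      : piece === "\n" ? { type: "NEWLINE" }
      : { type: "TEXT", value: piece });
}

const KEY = /^\s*[\w.-]+\s*$/;

// A TEXT token that looks like a key and is followed by a COLON.
function looksLikeKey(tokens, i) {
  return tokens[i].type === "TEXT" && KEY.test(tokens[i].value) &&
    i + 1 < tokens.length && tokens[i + 1].type === "COLON";
}

// Without line context, keys can be detected in the middle of a line.
function keysAnywhere(tokens) {
  return tokens.flatMap((t, i) => (looksLikeKey(tokens, i) ? [t.value.trim()] : []));
}

// With line context, a key must start the input or follow a NEWLINE token.
function keysAtLineStart(tokens) {
  return tokens.flatMap((t, i) =>
    looksLikeKey(tokens, i) && (i === 0 || tokens[i - 1].type === "NEWLINE") ? [t.value.trim()] : []);
}

const tokens = tokenize("time: 10:30\nplace: City Hall");
console.log(keysAnywhere(tokens));    // includes the mid-line "10"
console.log(keysAtLineStart(tokens)); // only "time" and "place"
```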

Add extension points for defining new syntax

  • Document the tokens that are currently parsed (https://github.com/nprapps/betty/blob/master/parser.js#L59), and add a way to pass in new token types (e.g., @key(filename.json), used to load content from an external file, would require us to add @, (, and ) as token types). The tokenizer will probably need to be a class instance and not a pure function for this to work.
  • Add the ability to pass in additional tuples of a parsing function and token types (https://github.com/nprapps/betty/blob/master/parser.js#L59) that will be called in the context of the parser class and have access to its stack manipulation methods (e.g., [loadFileRef, "AT", "TEXT", "LEFT_PAREN", "TEXT", "RIGHT_PAREN"]).

Generally speaking, we shouldn't need to extend the syntax much as that's what the field name and value hooks are for. But it would be nice to have the option to be able to define new syntax without completely forking the code, and I think it's written in a clean enough way that we can do that solely through the tokenizer and parser.

The example above is not a great use case, but it does point out a real deficiency in the existing hooks: they're processed without any real context of each other. There's no good way to define a new value type that mutates the field name on use, as the @key(file) syntax would do. (Otherwise you could just write @key: filename, but then you'd have a key with an @ sign in it instead of the bare key stored directly.) New syntax offers us a way around that, because it lets us process the text before it reaches the hooks, which are called in the assembler stage.
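The two proposed extension points could look something like this sketch: a tokenizer class that accepts new single-character token types, and a parser that accepts `[handler, ...tokenTypes]` tuples in the shape the issue describes. The class names, `addToken`, `addRule`, and the instruction format are hypothetical. The `loadFileRef` handler here only records an instruction; in the real parser it would presumably use the stack manipulation methods instead, and it does not read any file.

```javascript
// Sketch of the proposed extension points. Class names, addToken/addRule,
// and the instruction shape are hypothetical; loadFileRef only records an
// instruction here and does not read any file.
class Tokenizer {
  constructor() {
    this.types = new Map([["{", "LEFT_BRACE"], ["}", "RIGHT_BRACE"], [":", "COLON"], ["\n", "NEWLINE"]]);
  }

  // Register a new single-character token type, e.g. "@" -> "AT".
  addToken(char, type) {
    this.types.set(char, type);
    return this;
  }

  tokenize(text) {
    const tokens = [];
    let buffer = "";
    for (const ch of text) {
      if (this.types.has(ch)) {
        if (buffer) tokens.push({ type: "TEXT", value: buffer });
        buffer = "";
        tokens.push({ type: this.types.get(ch), value: ch });
      } else {
        buffer += ch;
      }
    }
    if (buffer) tokens.push({ type: "TEXT", value: buffer });
    return tokens;
  }
}

class Parser {
  constructor() {
    this.rules = []; // tuples of [handler, ...tokenTypes]
    this.instructions = [];
  }

  addRule(handler, ...types) {
    this.rules.push([handler, ...types]);
    return this;
  }

  parse(tokens) {
    let i = 0;
    scan: while (i < tokens.length) {
      for (const [handler, ...types] of this.rules) {
        const slice = tokens.slice(i, i + types.length);
        if (slice.length === types.length && slice.every((t, j) => t.type === types[j])) {
          handler.call(this, ...slice); // handler runs with the parser as `this`
          i += types.length;
          continue scan;
        }
      }
      i += 1; // no rule matched; this sketch simply skips the token
    }
    return this.instructions;
  }
}

// The issue's example syntax: @key(filename.json)
function loadFileRef(at, key, open, file, close) {
  this.instructions.push({ op: "loadFile", key: key.value.trim(), file: file.value.trim() });
}

const tokenizer = new Tokenizer()
  .addToken("@", "AT")
  .addToken("(", "LEFT_PAREN")
  .addToken(")", "RIGHT_PAREN");
const parser = new Parser().addRule(loadFileRef, "AT", "TEXT", "LEFT_PAREN", "TEXT", "RIGHT_PAREN");
const instructions = parser.parse(tokenizer.tokenize("@chart(data.json)\n"));
```

Because the rule sees the key and the filename together, it can emit an instruction that stores the bare key (`chart`), which is exactly what the field-name and value hooks cannot coordinate on their own.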

Enter/exit hooks

Was just thinking about this today re: a conversation about making AML safer in a mixed-skill newsroom environment, and these would be easy hooks to add: one called with the keypath when the assembler adds an object or array to the tree, and another when it pops that branch off the stack. Especially in the latter case, when we exit the object, that's an ideal time to validate the item as a whole (checking for missing keys, validating user input, adding computed values).
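A minimal sketch of how such hooks could sit in an assembler loop, assuming a simplified instruction stream of `enter`, `set`, and `exit` operations that only builds objects. The `onEnter`/`onExit` option names and the instruction shapes are hypothetical; Betty does not currently define them. The example `onExit` hook flags a missing `headline` and adds a computed `slug` once each object is complete.

```javascript
// Sketch of enter/exit hooks in an assembler. Instruction shapes and the
// onEnter/onExit option names are hypothetical; only objects are handled.
function assemble(instructions, { onEnter = () => {}, onExit = () => {} } = {}) {
  const root = {};
  const stack = [{ path: [], value: root }];
  for (const ins of instructions) {
    const top = stack[stack.length - 1];
    if (ins.op === "set") {
      top.value[ins.key] = ins.value;
    } else if (ins.op === "enter") {
      const path = [...top.path, ins.key];
      const value = {};
      top.value[ins.key] = value;
      stack.push({ path, value });
      onEnter(path.join("."), value);
    } else if (ins.op === "exit" && stack.length > 1) {
      const { path, value } = stack.pop();
      // The branch is complete here, so it can be validated as a whole.
      onExit(path.join("."), value);
    }
  }
  return root;
}

const entered = [];
const problems = [];
const data = assemble(
  [
    { op: "enter", key: "story" },
    { op: "set", key: "headline", value: "Council Votes on Budget" },
    { op: "exit" },
    { op: "enter", key: "sidebar" },
    { op: "exit" },
  ],
  {
    onEnter(keypath) {
      entered.push(keypath);
    },
    onExit(keypath, value) {
      if (!("headline" in value)) problems.push(`${keypath}: missing headline`);
      else value.slug = value.headline.toLowerCase().replace(/[^a-z0-9]+/g, "-");
    },
  }
);
```

Here `problems` ends up as `["sidebar: missing headline"]`, and `data.story.slug` is `"council-votes-on-budget"`. Running validation in the exit hook rather than in `onValue` means the check sees every field of the object at once.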
