nprapps / betty

An unambiguous dialect of ArchieML

Betty

A more specific dialect of ArchieML. While working with editors and reporters, we often found that the format, while "forgiving," can be brittle (especially when combined with CommonMark content). In particular, keys with multiline values are prone to breaking (either containing no content, or enthusiastically eating the next object in a list). As a result, Betty makes the following changes:

- Lists will start a new item when they see any redefined key, not just the first key in an object.
- Multiline fields are now less ambiguous: open them with key:: and close with ::key.
- If you've opened multiple levels of object, you can jump back out to a specific level by key: {/name} will close {name}. Note that the slash must be flush with the opening brace in this syntax: { /name } will not close an object.
- Similarly, you can exit out of a specific named list with [/key] instead of needing to close individual levels with repeated [] lines.
You can provide options for behavior:
- verbose - set this to be overwhelmed with logging messages
- onFieldName - provide a callback that accepts a string key for mutation and returns the transformed version. Useful for lower-casing keys when Google Docs tries to capitalize them.
- onValue - provide a callback that accepts the value and field name, and returns the actual value to add to the object. Useful for automatically casting dates, booleans, and numbers.
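
To make the syntax changes listed earlier concrete, here is a small sample document held in a JavaScript string, followed by a lint-style helper. The document contents, the helper name unclosedMultilineFields, and the assumption that key:: and ::key markers sit at the start and end of a line are illustrative only and are not taken from Betty's source. The parsed output is deliberately not shown, since it depends on Betty itself.

```javascript
// Sample text using the extensions described above. Going by those rules,
// the second "title" redefines a key and so should start a new list item,
// and [/sections] and {/credits} close their scopes by name.
const sample = [
  "[sections]",
  "title: First section",
  "body::",
  "This text can span several lines,",
  "including lines that look like keys: or lists.",
  "::body",
  "title: Second section",
  "[/sections]",
  "",
  "{credits}",
  "author: Example Author",
  "{/credits}",
].join("\n");

// Hypothetical lint helper (not part of Betty): returns the names of
// multiline fields opened with key:: that are never closed with ::key.
function unclosedMultilineFields(text) {
  const open = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    const opener = /^([\w.-]+)::/.exec(trimmed);
    const closer = /::([\w.-]+)$/.exec(trimmed);
    if (opener) open.push(opener[1]);
    if (closer) {
      const index = open.lastIndexOf(closer[1]);
      if (index !== -1) open.splice(index, 1);
    }
  }
  return open;
}
```

A check like this could run before parsing to flag a reporter's unclosed field early, though Betty does not require it.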

The module exports a single object with a parse() method, which accepts the text you want to parse and the options object.
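
A minimal sketch of an options object for parse(), assuming that values arrive in onValue as strings and that the package is installed under the name @nprapps/betty. Both are assumptions, as are the casting rules themselves, so the call to Betty is left commented out.

```javascript
// Option callbacks in the shape described above. The casting rules are
// illustrative, not Betty defaults.
const bettyOptions = {
  verbose: false,
  // Lower-case keys that Google Docs may have auto-capitalized.
  onFieldName: (key) => key.toLowerCase(),
  // Cast obvious booleans and numbers; leave everything else untouched.
  onValue: (value, field) => {
    if (typeof value !== "string") return value;
    const trimmed = value.trim();
    if (trimmed === "true") return true;
    if (trimmed === "false") return false;
    if (trimmed !== "" && !Number.isNaN(Number(trimmed))) return Number(trimmed);
    return value;
  },
};

// Assumption: the module name below matches your installation.
// const betty = require("@nprapps/betty");
// const data = betty.parse("Title: Hello\ncount: 3", bettyOptions);
```

The field name is passed to onValue as the second argument, so a real callback could restrict casting to specific keys (dates, for example) instead of guessing from the text alone.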

When adding new features or altering the parser, it's useful to make sure that you haven't broken anything. Running npm test checks the parser against the files from the original specification repo where applicable, as well as a document containing the syntax extensions defined above. Although Betty is not fully compliant with the ArchieML spec, it should handle existing content reliably.

Behind the scenes

When you call parse(), Betty actually runs through three stages before producing a final JSON object.

- A tokenizer breaks the text into a stream of tagged chunks, consisting of either possible syntax characters (such as {, }, and :) or text.
- The parser takes the stream of tokens and turns them into higher level instructions for things like "enter an array," "set a value at key.path," or "buffer this text."
- The assembler takes those instructions, pre-processes them (merging buffered strings together), then runs through the final stream of operations to actually assemble the object.
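
The three stages can be sketched as a toy pipeline. This is not Betty's code: the function names, token type names, and instruction shapes are hypothetical, and the sketch only understands single-line key: value pairs.

```javascript
// Stage 1: tokenizer. Tag possible syntax characters; the rest is TEXT.
const SYNTAX = { "{": "LEFT_BRACE", "}": "RIGHT_BRACE", ":": "COLON", "\n": "NEWLINE" };

function tokenize(text) {
  const tokens = [];
  let buffer = "";
  const flush = () => {
    if (buffer) tokens.push({ type: "TEXT", value: buffer });
    buffer = "";
  };
  for (const char of text) {
    if (SYNTAX[char]) {
      flush();
      tokens.push({ type: SYNTAX[char], value: char });
    } else {
      buffer += char;
    }
  }
  flush();
  return tokens;
}

// Stage 2: parser. Turn each line of tokens into an instruction.
function toInstructions(tokens) {
  const instructions = [];
  let line = [];
  const endLine = () => {
    const [first, second, ...rest] = line;
    if (first && first.type === "TEXT" && second && second.type === "COLON") {
      // Later syntax characters (such as the colon in a URL) are just text.
      const value = rest.map((t) => t.value).join("").trim();
      instructions.push({ type: "value", key: first.value.trim(), value });
    } else if (line.length) {
      instructions.push({ type: "buffer", value: line.map((t) => t.value).join("") });
    }
    line = [];
  };
  for (const token of tokens) {
    if (token.type === "NEWLINE") endLine();
    else line.push(token);
  }
  endLine();
  return instructions;
}

// Stage 3: assembler. Merge adjacent buffers, then apply the instructions.
function assemble(instructions) {
  const merged = [];
  for (const step of instructions) {
    const last = merged[merged.length - 1];
    if (step.type === "buffer" && last && last.type === "buffer") {
      last.value += "\n" + step.value;
    } else {
      merged.push({ ...step });
    }
  }
  const result = {};
  for (const step of merged) {
    // This toy version drops loose text instead of attaching it anywhere.
    if (step.type === "value") result[step.key] = step.value;
  }
  return result;
}

const assembled = assemble(
  toInstructions(tokenize("title: Hello\nstray text\nurl: https://example.com"))
);
// assembled is { title: "Hello", url: "https://example.com" }
```

Keeping the stages separate means a decision such as "this colon is part of a value, not a key" lives in one place (the parser) and does not get tangled with object construction.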

This is much more complex than the baseline ArchieML module. I personally think it's easier this way to reason about the logic for some of the language's "quirks," such as the inconsistent behavior of :end or \ as an escape. Your mileage may vary.

betty's People

isabella232 thomaswilburn

betty's Issues

Potentially bless a file extension?

Would love to add this as a reader to quaff, but would need some way to say “treat this as a Betty file.”

If it passes all ArchieML tests and returns the same thing I could also potentially include a flag that uses Betty instead.

Great work!

Keys should match against linebreaks

Since we don't parse against entire lines, we've had issues where keys were detected in the middle of the line. However, we can add newline characters to our parser for keys, and that should work.
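
The difference can be illustrated with two regular expressions. Betty works on tokens, not on regexes, so this is only an analogy for the proposed fix, and the sample text and variable names are made up.

```javascript
const issueText = "note: see the summary: it is short\nbyline: Staff";

// Naive matcher: accepts "key:" anywhere, including mid-sentence.
const anywhere = /([\w.-]+):/g;

// Line-anchored matcher: the m flag lets ^ match right after a newline,
// so only keys at the start of a line are accepted.
const lineStart = /^[ \t]*([\w.-]+):/gm;

const naiveKeys = [...issueText.matchAll(anywhere)].map((m) => m[1]);
const anchoredKeys = [...issueText.matchAll(lineStart)].map((m) => m[1]);
// naiveKeys is ["note", "summary", "byline"]
// anchoredKeys is ["note", "byline"]
```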

Add extension points for defining new syntax

- Document the tokens that are currently parsed (https://github.com/nprapps/betty/blob/master/parser.js#L59), and add a way to pass in new token types (e.g., @key(filename.json), used to load content from an external file, would require us to add @, (, and ) as token types). The tokenizer will probably need to be a class instance and not a pure function for this to work.
- Add the ability to pass in additional tuples of a parsing function and token types (https://github.com/nprapps/betty/blob/master/parser.js#L59) that will be called in the context of the parser class and have access to its stack manipulation methods (e.g., [loadFileRef, "AT", "TEXT", "LEFT_PAREN", "TEXT", "RIGHT_PAREN"]).
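
One way such tuples could be matched against a token stream is sketched below. Nothing here exists in Betty today: matchRule, the token object shape, and the instruction that loadFileRef returns are all hypothetical, and the sketch leaves out the parser context that the proposal mentions.

```javascript
// Hypothetical handler: receives the matched tokens, returns an instruction.
const loadFileRef = (tokens) => ({
  type: "fileRef",
  key: tokens[1].value,
  file: tokens[3].value,
});

// Rules in the tuple shape proposed above: handler first, then token types.
const extensionRules = [
  [loadFileRef, "AT", "TEXT", "LEFT_PAREN", "TEXT", "RIGHT_PAREN"],
];

// Try each rule at the given position; report how many tokens it consumed.
function matchRule(tokens, position, rules) {
  for (const [handler, ...types] of rules) {
    const slice = tokens.slice(position, position + types.length);
    const matches =
      slice.length === types.length && slice.every((t, i) => t.type === types[i]);
    if (matches) {
      return { instruction: handler(slice), consumed: types.length };
    }
  }
  return null;
}

// Tokens a tokenizer might emit for "@key(filename.json)".
const fileRefTokens = [
  { type: "AT", value: "@" },
  { type: "TEXT", value: "key" },
  { type: "LEFT_PAREN", value: "(" },
  { type: "TEXT", value: "filename.json" },
  { type: "RIGHT_PAREN", value: ")" },
];

const fileRefMatch = matchRule(fileRefTokens, 0, extensionRules);
```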

Generally speaking, we shouldn't need to extend the syntax much as that's what the field name and value hooks are for. But it would be nice to have the option to be able to define new syntax without completely forking the code, and I think it's written in a clean enough way that we can do that solely through the tokenizer and parser.

The example above is not a great use case, but it does point out a real deficiency in the existing hooks, which is that they're processed without any real context of each other. There's no good way to define a new value type that mutates the field name on use (as the @key(file) syntax would do--otherwise you could just write @key: filename, but then you'd have a key with a sobachka in it instead of the key directly stored). New syntax offers us a way around that, as it provides ways for us to process the text before it reaches the hooks (which are called in the assembler stage).

Enter/exit hooks

Was just thinking about this today re: a conversation about making AML safer in a mixed-skill newsroom environment, and these would be easy hooks to add--one with the keypath when the assembler adds an object or array to the tree, and another when it pops that branch off the stack. Especially in the latter case, when we exit the object, that's an ideal time to do validation of the item as a whole (checking for missing keys, validating user input, adding computed values).
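
Until hooks like these exist in the assembler, the idea can be approximated with a walk over the finished object. The names walkWithHooks, onEnter, and onExit are hypothetical, and unlike the proposal this runs after parsing instead of during assembly.

```javascript
// Visit every object and array, calling onEnter before its children and
// onExit after them, with a dotted keypath ("" for the root).
function walkWithHooks(value, hooks = {}, path = []) {
  if (value === null || typeof value !== "object") return;
  const { onEnter = () => {}, onExit = () => {} } = hooks;
  const keypath = path.join(".");
  onEnter(keypath, value);
  for (const [key, child] of Object.entries(value)) {
    walkWithHooks(child, hooks, [...path, key]);
  }
  onExit(keypath, value);
}

// Example: on exit, flag any non-root object that is missing a title.
const problems = [];
walkWithHooks(
  { story: { title: "Hello" }, promo: {} },
  {
    onExit: (keypath, node) => {
      if (keypath && !Array.isArray(node) && !("title" in node)) {
        problems.push(keypath);
      }
    },
  }
);
// problems is ["promo"]
```

Validating on exit works because the whole item is complete at that point, which is the same reason the issue singles out the exit hook.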
