
archieml-js's Introduction

ArchieML

Parse Archie Markup Language (ArchieML) documents into JavaScript objects.

Read about the ArchieML specification at archieml.org.

The current version is v0.5.0.

Installation

npm install archieml

Usage

<script src="archieml.js"></script>

<script type="text/javascript">
  var parsed = archieml.load("key: value");
  // parsed is {"key": "value"}
</script>

Or, in Node.js:

var archieml = require('archieml');
var parsed = archieml.load("key: value");
// parsed is {"key": "value"}

Parser options

Inline comments are now deprecated in ArchieML. They will continue to be supported until 1.0, but are now disabled by default. They can be enabled by passing an options object as the second parameter in load:

archieml.load("key: value [comment]");
// returns {"key": "value [comment]"}

archieml.load("key: value [comment]", {comments: true});
// returns {"key": "value"}

Using with Google Documents

We use archieml at The New York Times to parse Google Documents containing AML. This requires a little upfront work to download the document and convert it into text that archieml can load.

The first step is authenticating with the Google Drive API, and accessing the document. For this, you will need a user account that is authorized to view the document you wish to download.

For this example, I'm going to use a simple node app built on Google's official googleapis npm package, but you can use another library or authentication method if you like. Whichever mechanism you choose, you'll need to be able to export the document as either text or HTML, and then run some of the post-processing shown in the example file at examples/google_drive.js.

You will need to set up a Google API application in order to authenticate yourself. Full instructions are available here. When you create your Client ID, you should list http://127.0.0.1:3000 as an authorized origin, and http://127.0.0.1:3000/oauth2callback as the callback url.

Then open examples/google_drive.js, enter the CLIENT_ID and CLIENT_SECRET from the API account you created, and run the server:

$ npm install archieml
$ npm install express
$ npm install googleapis
$ npm install htmlparser2
$ npm install html-entities

$ node examples/google_drive.js

You should then be able to go to http://127.0.0.1:3000/KEY, where KEY is the file id of the Google Drive document you want to parse. Make sure that the account you created has access to that document.

To start, you can use the public test document linked below. The app will ask you to authenticate your current session and will then return a JSON representation of the document. View the source of examples/google_drive.js for step-by-step instructions on what's being done.

http://127.0.0.1:3000/1JjYD90DyoaBuRYNxa4_nqrHKkgZf1HrUj30i3rTWX1s

Tests

A full shared test suite is included from the archieml.org repository, under /test. After running npm install, initialize the shared test submodules (git submodule init && git submodule update) and then run npm run test to execute the tests.

Changelog

  • 0.5.0 - Added support for implicit object nesting.
  • 0.4.2 - Fixes bug #19.
  • 0.4.1 - Fixes bug #21.
  • 0.4.0 - Updates to how dot-notation is handled in freeform arrays. Added Unicode key support.
  • 0.3.1 - Added support for freeform arrays.
  • 0.3.0 - Added support for nested arrays. Follows modifications in ArchieML CR-20150509.
  • 0.2.0 - Arrays that are redefined now overwrite the previous definition. Skips within multi-line values break up the value. Follows modifications in ArchieML CR-20150306.
  • 0.1.2 - More consistent handling of newlines. Fixes issue #4, around detecting the scope of multi-line values.
  • 0.1.1 - Fixes issue #1, removing comment backslashes.
  • 0.1.0 - Initial release supporting the first version of the ArchieML spec, published 2015-03-06.

archieml-js's People

Contributors

abstrctn, afischer, john-michael-murphy, kevinschaul, mihi-tr, minikomi, samjacoby

archieml-js's Issues

Cut npm release of v0.5.0?

I saw a new version of ArchieML got pushed out late last year on GitHub (glad to see it still bumping along!) but noticed 0.5.0 has never been released on npm. Is that something that's planned?

Thank you!

Support for numbers

It would be great if AML could parse numbers as numbers instead of strings. Even better would be support for booleans and dates.
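
Until something like this exists in the parser, one possible workaround is to post-process the parsed result. The sketch below is not part of archieml: coerceNumbers is a hypothetical helper, and the sample object only stands in for what archieml.load would return. It converts strings that look like plain decimal numbers and leaves everything else alone. Note that it would also turn values such as "007" into 7, which may not be what you want for codes or identifiers.

```javascript
// Hypothetical helper, not part of archieml: walk a parsed result and
// convert strings that look like plain decimal numbers into numbers.
function coerceNumbers(value) {
  if (Array.isArray(value)) {
    return value.map(coerceNumbers);
  }
  if (value !== null && typeof value === 'object') {
    var out = {};
    Object.keys(value).forEach(function (key) {
      out[key] = coerceNumbers(value[key]);
    });
    return out;
  }
  // Only strings such as "42", "-3.5" or "0.25" are converted.
  if (typeof value === 'string' && /^-?\d+(\.\d+)?$/.test(value)) {
    return Number(value);
  }
  return value;
}

// Stand-in for a parsed document; archieml returns every value as a string.
var sample = { count: '42', name: 'Archie', list: ['1', 'two'] };
console.log(coerceNumbers(sample));
// prints { count: 42, name: 'Archie', list: [ 1, 'two' ] }
```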

Add SECURITY.md

Hey there!

I belong to an open source security research community, and a member (@yetingli) has found an issue, but doesn’t know the best way to disclose it.

If not a hassle, might you kindly add a SECURITY.md file with an email, or another contact method? GitHub recommends this best practice to ensure security issues are responsibly disclosed, and it would serve as a simple instruction for security researchers in the future.

Thank you for your consideration, and I look forward to hearing from you!

(cc @huntr-helper)

Document crashes JS parser (but works in Ruby)

Here's a document that crashes the JS parser but works fine in Ruby:

{part1}

text1: foo bar

bar baz

* foo
* bar
* baz

:end
{}
$ node -e 'console.log(require("archieml").load(require("fs").readFileSync("test.aml", "utf-8")))'
/vg/google-drive-sync/node_modules/archieml/archieml.js:95
    stackScope.array.push('');
                    ^

TypeError: Cannot read property 'push' of null
    at parseArrayElement (/vg/google-drive-sync/node_modules/archieml/archieml.js:95:21)
    at Object.load (/vg/google-drive-sync/node_modules/archieml/archieml.js:52:7)
    at [eval]:1:33
    at Object.exports.runInThisContext (vm.js:54:17)
    at Object.<anonymous> ([eval]-wrapper:6:22)
    at Module._compile (module.js:413:34)
    at node.js:612:27
    at nextTickCallbackWith0Args (node.js:453:9)
    at process._tickCallback (node.js:382:13)
$ ruby -r archieml -e 'p Archieml.load(File.read("test.aml"))'
{"part1"=>{"text1"=>"foo bar\n\nbar baz\n\n* foo\n* bar\n* baz"}}
$

Multi-line values are created when they shouldn't be

I'm not sure what the expected result is but this seems a little odd:

key:value
multi-line-value
[singleword]
another multi line value
:end

Output:

{
  "key": "valueanother multi line value",
  "singleword": []
}

multi-line-value has disappeared, and there's also no newlines in the output.

When there is a multi-word value within the square brackets, it is treated as a comment and we get:

key:value
multi-line-value
[double word]
another multi line value
:end

{
  "key": "value\nmulti-line-value\n\nanother multi line value"
}

Which seems more consistent with the other rules.

Since commit a89c313... objects no longer "close" as they used to

Prior to commit a98c313 the following archieml produced two objects and a top-level key:

{colors}
red: #f00
{numbers}
one: 1
{}
key: value

Like so:

{ colors: { red: '#f00' }, numbers: { one: '1' }, key: 'value' }

And since commit a98c313 objects seem to nest:

{ colors: { red: '#f00', key: 'value' }, numbers: { one: '1' } }

Is this intentional behavior? I don't see it in the spec, but it's now in the de-facto documentation since archieml.org is using archieml-js to display example code.

Is there a json to AML method?

Is there an archieml-js method that I'm missing that can convert JSON into AML?

My use case involves a server that retrieves an AML Google Doc, parses it, and returns JSON, just like examples/google_drive.js, and an application that requests that JSON.

I want the application to be able to keep a cached local copy of that document in an AML format for situations where Google Drive goes down or connectivity is limited.
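
To make the request concrete, here is a rough sketch of what such a conversion could look like. It is not an archieml-js API: toAml is a hypothetical function, and it only handles string-like values, nested objects (written as dot-notation keys) and arrays of simple values. It does no escaping, so values that begin with AML syntax would need extra care.

```javascript
// Hypothetical serializer, not part of archieml-js.
// Nested objects become dot-notation keys; arrays become [name] blocks.
function toAml(obj) {
  var lines = [];

  function walk(node, prefix) {
    Object.keys(node).forEach(function (key) {
      var path = prefix ? prefix + '.' + key : key;
      var value = node[key];
      if (Array.isArray(value)) {
        lines.push('[' + path + ']');
        value.forEach(function (item) {
          lines.push('* ' + String(item));
        });
        lines.push('[]'); // close the array scope
      } else if (value !== null && typeof value === 'object') {
        walk(value, path);
      } else {
        var text = String(value);
        lines.push(path + ': ' + text);
        // Multi-line values must be terminated explicitly.
        if (text.indexOf('\n') !== -1) {
          lines.push(':end');
        }
      }
    });
  }

  walk(obj, '');
  return lines.join('\n');
}

console.log(toAml({ title: 'Hello', colors: { red: '#f00' }, tags: ['a', 'b'] }));
// prints:
// title: Hello
// colors.red: #f00
// [tags]
// * a
// * b
// []
```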

'Using with Google Documents' instructions, error

following these instructions: https://github.com/newsdev/archieml-js/tree/master#using-with-google-documents

set up google API, installed npm packages, then ran node examples/google_drive.js and I get error in console:

internal/modules/cjs/loader.js:1023
  throw err;
  ^

Error: Cannot find module 'O:\web\aml-test\examples\google_drive.js'
    at Function.Module._resolveFilename (internal/modules/cjs/loader.js:1020:15)
    at Function.Module._load (internal/modules/cjs/loader.js:890:27)
    at Function.executeUserEntryPoint [as runMain] (internal/modules/run_main.js:71:12)
    at internal/main/run_main_module.js:17:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}

the google_drive.js is definitely there in node_modules/archieml/examples/, also node node_modules/examples/google_drive.js doesn't work. I am in the project directory in the terminal. Using Windows and node v14.2.0 and npm 6.14.4

Support JSON-LD

I would like to write JSON-LD with AML. It's a nice fit, as JSON-LD is good at strong, yet simple semantics for identifier and type, while AML is good at not having lots of quotes and braces.

All I need is some way to create keys containing @.

To support the JSON-LD keywords, it would require:

-  var startKey = new RegExp('^\\s*([A-Za-z0-9-_\.]+)[ \t\r]*:[ \t\r]*(.*)');
+  var startKey = new RegExp('^\\s*([A-Za-z0-9-_\.@]+)[ \t\r]*:[ \t\r]*(.*)');
-  var scopePattern = new RegExp('^\\s*(\\[|\\{)[ \t\r]*([A-Za-z0-9-_\.]*)[ \t\r]*(?:\\]|\\})[ \t\r]*.*?(\n|\r|$)');
+  var scopePattern = new RegExp('^\\s*(\\[|\\{)[ \t\r]*([A-Za-z0-9-_\.@]*)[ \t\r]*(?:\\]|\\})[ \t\r]*.*?(\n|\r|$)');
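
The effect of the key-pattern change can be checked in isolation. The snippet below copies the two startKey expressions from the diff above into standalone variables (currentKey and proposedKey are names chosen for this illustration) and runs them against a JSON-LD style line. It says nothing about how the rest of the parser would treat the resulting key.

```javascript
// Patterns copied from the diff above; variable names are illustrative.
var currentKey = new RegExp('^\\s*([A-Za-z0-9-_\.]+)[ \t\r]*:[ \t\r]*(.*)');
var proposedKey = new RegExp('^\\s*([A-Za-z0-9-_\.@]+)[ \t\r]*:[ \t\r]*(.*)');

var line = '@id: Alice';

// The current pattern rejects keys that contain "@".
console.log(currentKey.exec(line)); // prints null

// The proposed pattern captures the key and the value.
var match = proposedKey.exec(line);
console.log(match[1], match[2]); // prints @id Alice
```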

In return for this, AML could now:

  • provide a strong representation of graphs
@id: Alice
[knows]
* bob

{@context}
foaf: http://xmlns.com/foaf/0.1/
{@context.knows}
@id: foaf:knows
@type: @id 
  • capture internationalized strings
@id: Carol
{jobTitle}
en: Doctor
de: Arzt

{@context}
sdo: http://schema.org/
{@context.jobTitle}
@id: sdo:jobTitle
@container: @lang

The : namespacing capability, as used on the right-hand side of some of the above entries, might be a bridge too far: however, if you are asking a user to whip up some linked data, you should provide them with a good, simply-keyed context.

Only one line in a multi-line string can be escaped; multiple should be possible

Every backslash-escaped line in the following example should have its backslash removed in the final output:

key: value
\:ignore
\:skip
\:endskip
:end

However, only the first backslash is getting removed. The regexp that we are using to remove leading backslashes does not have the g or m flags, both of which are necessary to correctly remove every backslash.
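
The role of the two flags can be seen with a simplified pattern. The expression below is an illustration only, not the regexp used inside archieml.js: it strips a single backslash at the start of a line.

```javascript
var value = '\\:ignore\n\\:skip\n\\:endskip';

// Without flags, ^ matches only at the start of the string and
// replace() stops after the first match.
var once = value.replace(/^\\/, '');

// With m, ^ matches at the start of every line; with g, every match
// is replaced rather than only the first.
var all = value.replace(/^\\/gm, '');

console.log(JSON.stringify(once)); // prints ":ignore\n\\:skip\n\\:endskip"
console.log(JSON.stringify(all));  // prints ":ignore\n:skip\n:endskip"
```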

Google autoformatting tends to break arrays of strings

When you type an asterisk before a string of text in Google Docs:

* This is a string.

It tries to turn it into a bulleted list:

  • This is a string.

Could archieml parse bullets the same way it parses asterisks? Is there an even better solution?
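
One option that needs no parser change is to normalize the text before loading it. The sketch below assumes the exported text contains a literal bullet character (U+2022) at the start of the affected lines, which may not hold for every export format. bulletsToAsterisks is a hypothetical helper, not part of archieml.

```javascript
// Hypothetical pre-processing step, not part of archieml: rewrite a
// leading bullet character (U+2022) as the asterisk AML expects.
function bulletsToAsterisks(text) {
  // Keep any indentation, replace the bullet and the spacing after it.
  return text.replace(/^([ \t]*)\u2022[ \t]*/gm, '$1* ');
}

var exported = '[list]\n  \u2022 This is a string.\n\u2022 Another string.\n[]';
console.log(bulletsToAsterisks(exported));
// prints:
// [list]
//   * This is a string.
// * Another string.
// []
```

The result can then be passed to archieml.load as usual.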

Freeforms shouldn't require trailing newlines to pick up last line of text

[+freeform]
Text

Should produce {"freeform": [{"type": "text", "value": "Text"}]}, but at the moment that happens only if the input document contains a trailing newline.

It appears this is because the last line of the file is currently parsed only if it contains a special character; normal text lines at the end of files are ignored.
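
Until the parser handles this case, a caller-side workaround is to make sure the input ends with a newline before loading it. ensureTrailingNewline below is a hypothetical helper written for this illustration; it does not change the parser.

```javascript
// Hypothetical workaround, not part of archieml: append a newline only
// when the document does not already end with one.
function ensureTrailingNewline(text) {
  return text.slice(-1) === '\n' ? text : text + '\n';
}

var doc = '[+freeform]\nText';
console.log(JSON.stringify(ensureTrailingNewline(doc)));
// prints "[+freeform]\nText\n"
```

The normalized string would then be passed to archieml.load in place of the raw document.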

Converting JSON to ArchieML

I had a use case for converting JSON to ArchieML. I've written an ArchieML generator and wanted to make it available to the ArchieML ecosystem. You can see the generator here:
https://www.npmjs.com/package/jughead

Would it make sense to add a pointer to it in the documentation or some reference somewhere?

Make inline comments configurable

Right now, anything enclosed in [brackets] is interpreted as a comment. But many newspapers use brackets in news copy to indicate words that were added by the writer to a quote for clarity.

Could the trigger be configurable? A multi-character combination would be especially helpful (so we can find something unlikely to clash and that doesn't interfere with our templating system).
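
As a rough illustration of the idea, comment stripping with caller-supplied delimiters could look like the sketch below. Both functions are hypothetical and are not part of archieml; the "((" and "))" delimiters are just an example of a multi-character combination that is unlikely to appear in news copy.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: remove single-line comments wrapped in the given
// open and close delimiters, leaving all other text untouched.
function stripInlineComments(text, open, close) {
  var pattern = new RegExp(
    escapeRegExp(open) + '[^\\n]*?' + escapeRegExp(close),
    'g'
  );
  return text.replace(pattern, '');
}

var copy = 'He said [the mayor] agreed ((check this quote)) yesterday.';
console.log(stripInlineComments(copy, '((', '))'));
// prints "He said [the mayor] agreed  yesterday."
```

Bracketed editorial insertions survive because only the configured delimiters are treated as comment markers.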

Adam
Tampa Bay Times

Test suite

I think it would make sense to add some real tests for archieml-js, maybe in the form of a proper vows.js test suite. I don't understand how the current phantomjs tests are supposed to work: running npm test does nothing, since there is no test/vendor/runner.js or test/index.html in the repo.

Any objections?

In-line JSON parsing

A really nice way to augment an Archie document is to add existing data. Would it be possible to add support for single-line, minified JSON to Archie?

Archie already recognizes braces as triggers. Perhaps it could recognize that a line is JSON and push the parsed value to the current scope.
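
The detection half of this idea can be sketched without touching the parser. parseInlineJson below is a hypothetical helper: it returns the parsed value when a line is valid JSON that starts with a brace or bracket, and undefined otherwise, so ordinary scope lines such as {colors} fall through. How the value would be attached to the current scope is left open.

```javascript
// Hypothetical helper, not part of archieml: try to read a single line
// as minified JSON. Returns undefined when the line is not valid JSON.
function parseInlineJson(line) {
  var trimmed = line.trim();
  var first = trimmed.charAt(0);
  if (first !== '{' && first !== '[') {
    return undefined;
  }
  try {
    return JSON.parse(trimmed);
  } catch (e) {
    return undefined;
  }
}

console.log(parseInlineJson('{"a": 1, "b": ["x", "y"]}'));
// prints { a: 1, b: [ 'x', 'y' ] }
console.log(parseInlineJson('{colors}'));
// prints undefined
```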
