pug-lexer

The pug lexer. This module is responsible for taking a string and converting it into an array of tokens.


Installation

npm install pug-lexer

Usage

var lex = require('pug-lexer');

lex(str, options)

Convert Pug string to an array of tokens.

options can contain the following properties:

  • filename (string): The name of the Pug file; it is used in error handling if provided.
  • plugins (array): An array of plugins, in the order they should be applied.
console.log(JSON.stringify(lex('div(data-foo="bar")', {filename: 'my-file.pug'}), null, '  '))
[
  {
    "type": "tag",
    "line": 1,
    "val": "div",
    "selfClosing": false
  },
  {
    "type": "attrs",
    "line": 1,
    "attrs": [
      {
        "name": "data-foo",
        "val": "\"bar\"",
        "escaped": true
      }
    ]
  },
  {
    "type": "eos",
    "line": 1
  }
]

new lex.Lexer(str, options)

Constructor for a Lexer class. This is not meant to be used directly unless you know what you are doing.

options may contain the following properties:

  • filename (string): The name of the Pug file; it is used in error handling if provided.
  • interpolated (boolean): whether the Lexer is created as a child lexer for inline tag interpolation (e.g. #[p Hello]). Defaults to false.
  • startingLine (integer): the real line number of the first line in the input. It is also used for inline tag interpolation. Defaults to 1.
  • plugins (array): An array of plugins, in the order they should be applied.

License

MIT

pug-lexer's People

Contributors

alubbe, evanw, forbeslindesay, hemanth, rzara, timothygu

pug-lexer's Issues

Should `text-html` capture any lines that contain HTML?

Currently `p This is <strong>html</strong> text` yields the following tokens:

[ { type: 'tag', line: 1, val: 'p', selfClosing: false }
, { type: 'text', line: 1, val: 'This is <strong>html</strong> text' }
, { type: 'eos', line: 1 }
]

But should the text token be text-html, or is text-html only reserved for lines of text that start with HTML tags?

Riot compatibility

Greetings, I hope this is the right place for this issue.

In Riot.js, we write <ul each={item in items}></ul>, but in Pug this code is invalid: ul(each={item in items}) {item}, so we have to wrap the Riot expression in a string like so: ul(each="{item in items}") {item}.

But then we lose linting.

Would it be possible to have an option that allows spaces within the braces, or some other solution, so I can have Riot + Pug + linting?

Many thanks in advance

Identifiers starting with 'of' in the 'each' value variable break the lexer

This was reported against my eslint pug plugin:
valpackett/eslint-plugin-pug#10

In the value-variable position, an identifier like offers (in fact, anything that matches of.+, i.e. not of itself) makes the lexer throw:

> const lex = require('pug-lexer')
> lex('each x in of')
[{"type":"each","loc":{"start":{"line":1,"column":1},"end":{"line":1,"column":13}},"val":"x","key":null,"code":"of"},{"type":"eos","loc":{"start":{"line":1,"column":13},"end":{"line":1,"column":13}}}]
> lex('each x in ofX')
Uncaught Error: Pug:1:14
  > 1| each x in ofX
--------------------^

The value variable for each must either be a valid identifier (e.g. `item`) or a pair of identifiers in square brackets (e.g. `[key, value]`).
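A plausible cause (purely a guess from the symptoms, not drawn from pug-lexer's actual source) is a keyword check for `of` that lacks a word boundary, so any identifier beginning with "of" is mistaken for the keyword. A minimal sketch of the difference, using hypothetical patterns:

```javascript
// Hypothetical illustration of the suspected bug: a regex that matches
// the "of" keyword without a word boundary also matches identifiers
// that merely start with "of". These patterns are illustrative only,
// not the lexer's actual source.
var loose  = /in\s+of/;    // also matches "in ofX" (false positive)
var strict = /in\s+of\b/;  // \b requires "of" to end at a word boundary

console.log(loose.test('each x in ofX'));   // true
console.log(strict.test('each x in ofX'));  // false
console.log(strict.test('each x in of'));   // true
```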

Attribute names starting with ":" are not allowed anymore

In vue.js templates, HTML-attributes starting with : are used for passing property values to a vue component.

An example is here:

https://github.com/vuejs/vue/blob/next/examples/select2/index.html (line 27)

With jade, they were accepted without any problems, but pug does not allow them anymore:

  1| 
  2| #app
> 3|   hello(:name='name')
--------------------^
  4| 

":" is not valid as the start or end of an un-quoted attribute.
at makeError (/home/egon/dev/electron-vue/node_modules/pug-error/index.js:32:13)
at Lexer.error (/home/egon/dev/electron-vue/node_modules/pug-lexer/index.js:52:15)
at Lexer.attrs (/home/egon/dev/electron-vue/node_modules/pug-lexer/index.js:1044:18)

Should new lines after pipeless text be reported?

e.g.

.
  foo
    bar
      baz
  .

yields

{"type":"dot","line":1}
{"type":"start-pipeless-text","line":1}
{"type":"text","line":2,"val":"foo"}
{"type":"newline","line":3}
{"type":"text","line":3,"val":"  bar"}
{"type":"newline","line":4}
{"type":"text","line":4,"val":"    baz"}
{"type":"newline","line":5}
{"type":"text","line":5,"val":"."}
{"type":"end-pipeless-text","line":5}
{"type":"eos","line":5}

I would expect the eos token to be preceded by a newline token, and the eos token to be reported as being on line 6, as per

p test

yields

{"type":"tag","line":1,"col":1,"val":"p","selfClosing":false}
{"type":"text","line":1,"col":3,"val":"test"}
{"type":"newline","line":2,"col":1}
{"type":"eos","line":2,"col":1}

case parsing

Reported by @neochrome in pugjs/pug#2235


Hi,
I run into problems parsing this kind of case/when

case "a:b"
  when "a:b"
    p a:b
  default
    p default

The error is expected "indent", but got "filter", but really it should handle "a:b" as a string literal, right?

This workaround is a bit of a kludge, but at least it works:

-var test = "a:b";
case "a:b"
  when test
    p a:b
  default
    p default

Adding source endings to tokens

Currently working on adding source location endings to all tokens, as discussed here. However, I wanted to move this to its own issue instead of hijacking the other.

Going to make it match babylon's loc format which looks like:

var token = {
    loc: {
        start: { line:1, column:1 },
        end: { line:1, column:13 }
    }
};

Question: which line number should an 'outdent' return?

@ForbesLindesay: While looking at the reported columns for tokens, I noticed a difference in the line reporting of 'outdent' tokens depending on where they occur, and was wondering whether this was intentional.

Take the following jade as an example:

foo
  bar
foz
  baz

Which yields the following tokens:

[ { type: 'tag', line: 1, val: 'foo', selfClosing: false },
  { type: 'indent', line: 2, val: 2 },
  { type: 'tag', line: 2, val: 'bar', selfClosing: false },
  { type: 'outdent', line: 3 },
  { type: 'tag', line: 3, val: 'foz', selfClosing: false },
  { type: 'indent', line: 4, val: 2 },
  { type: 'tag', line: 4, val: 'baz', selfClosing: false },
  { type: 'outdent', line: 4 },
  { type: 'eos', line: 4 } ]

The first outdent is reported as being on line 3, essentially before foz. However, the final outdent is reported as being on line 4, essentially after baz, which is correct as there is no line 5.

So is this intentional, or should the first outdent be reported as being on line 2?

Switch to generators

Currently, the lexer returns an array containing all the tokens. For very large jade files, this means that the array has to contain ALL of the tokens, and therefore has to use a lot of RAM. For instance, it takes 112 megabytes of RAM to lex and parse a 756-kilobyte test file created by concatenating mixin.attrs.jade (if we don't copy the tokens in Lexer#getTokens it still takes 100 megabytes), when measured using GNU time.

To reduce this memory usage, we could consider using ES2015 generator functions on platforms where these are supported. In my preliminary tests, the same file only takes 91 megabytes to lex and parse, while being only marginally slower (~2%).

What's your opinion on this? Do you think the gains are enough to warrant the additional complexity?


The diffs I used:

jade-lexer:

diff --git a/index.js b/index.js
index 54badd7..d76bc91 100644
--- a/index.js
+++ b/index.js
@@ -3,6 +3,7 @@
 var assert = require('assert');
 var characterParser = require('character-parser');
 var error = require('jade-error');
+var GeneratorFunction = require('generator-function');

 module.exports = lex;
 module.exports.Lexer = Lexer;
@@ -10,6 +11,14 @@ function lex(str, filename) {
   var lexer = new Lexer(str, filename);
   return JSON.parse(JSON.stringify(lexer.getTokens()));
 }
+if (GeneratorFunction) {
+  module.exports.lexIterator = Function('Lexer',
+    'return function* (str, filename) {\n' +
+    '  var lexer = new Lexer(str, filename);\n' +
+    '  yield* lexer.getIterator();\n' +
+    '}'
+  )(Lexer);
+}

 /**
  * Initialize `Lexer` with the given `str`.
@@ -1088,5 +1097,18 @@ Lexer.prototype = {
       this.advance();
     }
     return this.tokens;
-  }
+  },
+
+  getIterator: (function () {
+    if (GeneratorFunction) {
+      return GeneratorFunction('',
+        'while (!this.ended) {\n' +
+        '  this.advance();\n' +
+        '  if (this.tokens.length === 1) yield this.tokens[0];\n' +
+        '  else yield* this.tokens[Symbol.iterator]();\n' +
+        '  this.tokens = [];\n' +
+        '}'
+      );
+    }
+  })()
 };

token-stream:

--- lib/array.js        2015-10-11 10:38:15.384840871 -0700
+++ lib/iterator.js     2015-10-11 10:56:40.112840871 -0700
@@ -2,28 +2,41 @@

 module.exports = TokenStream;
 function TokenStream(tokens) {
-  if (!Array.isArray(tokens)) {
-    throw new TypeError('tokens must be passed to TokenStream as an array.');
+  if (!tokens || !tokens[Symbol.iterator]) {
+    throw new TypeError('tokens must be passed to TokenStream as an iterable.');
   }
-  this._tokens = tokens;
+  this._iterator = tokens[Symbol.iterator]();
+  this._tokens = [];
 }
 TokenStream.prototype.lookahead = function (index) {
   if (this._tokens.length <= index) {
-    throw new Error('Cannot read past the end of a stream');
+    var j = index + 1 - this._tokens.length;
+    while (j--) {
+      var res = this._iterator.next();
+      if (res.done) throw new Error('Cannot read past the end of a stream');
+      this._tokens.push(res.value);
+    }
   }
   return this._tokens[index];
 };
 TokenStream.prototype.peek = function () {
-  if (this._tokens.length === 0) {
-    throw new Error('Cannot read past the end of a stream');
+  if (this._tokens.length) {
+    return this._tokens[0];
+  } else {
+    var res = this._iterator.next();
+    if (res.done) throw new Error('Cannot read past the end of a stream');
+    this._tokens[0] = res.value;
+    return res.value;
   }
-  return this._tokens[0];
 };
 TokenStream.prototype.advance = function () {
-  if (this._tokens.length === 0) {
-    throw new Error('Cannot read past the end of a stream');
+  if (this._tokens.length) {
+    return this._tokens.shift();
+  } else {
+    var res = this._iterator.next();
+    if (res.done) throw new Error('Cannot read past the end of a stream');
+    return res.value;
   }
-  return this._tokens.shift();
 };
 TokenStream.prototype.defer = function (token) {
   this._tokens.unshift(token);

The json files under test/cases are not really json

The files are not really JSON; rather, each line is a separate JSON object.

{"type":"newline","line":3,"col":1}
{"type":"tag","line":3,"col":1,"val":"ul"}
{"type":"indent","line":4,"col":1,"val":2}
...

With a little trick you could turn them into valid JSON: wrap the lines in an array:

[
    {"type":"newline","line":3,"col":1},
    {"type":"tag","line":3,"col":1,"val":"ul"},
    {"type":"indent","line":4,"col":1,"val":2}
    ...
]

And JSON files can be read with the require() function. ;)
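The conversion the issue suggests is a one-liner per line. A minimal sketch (illustrative only, not part of pug-lexer):

```javascript
// Sketch: turn a line-delimited token dump into a real JSON array,
// as suggested above. Illustration only, not part of pug-lexer.
function linesToJsonArray(text) {
  var tokens = text
    .split('\n')
    .filter(function (line) { return line.trim() !== ''; })
    .map(function (line) { return JSON.parse(line); });
  return JSON.stringify(tokens, null, 4);
}

var dump =
  '{"type":"newline","line":3,"col":1}\n' +
  '{"type":"tag","line":3,"col":1,"val":"ul"}\n' +
  '{"type":"indent","line":4,"col":1,"val":2}\n';

console.log(linesToJsonArray(dump));
```

The result parses with JSON.parse and loads with require().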

Class name rules are too strict

From HTML 4.01 onwards, the class attribute is allowed to have weird values, including Unicode symbols. The following code is perfectly valid:

<p class="#">Foo.
<p class="##">Bar.
<p class="">Baz.
<p class="©">Inga.
<p class="{}">Lorem.
<p class="“‘’”">Ipsum.
<p class="⌘⌥">Dolor.
<p class="{}">Sit.
<p class="[attr=value]">Amet.

However, the pug lexer restricts class names to values beginning with -, _, or a letter, and only containing _, -, a-z, and 0-9. Is there a reason for it not being more lenient?

Plugin API

Add an extra option, plugins, which should be an array of plugin objects. Plugins can define methods that "override" any of the methods of the lexer, which normally return true or false to indicate whether they should fall through. For example, if you wanted to implement the "id" token as a plugin, you could use:

var opts = {
  plugins: [
    {
      advance: function (lexer) {
        var tok = lexer.scan(/^#([\w-]+)/, 'id');
        if (tok) {
          lexer.tokens.push(tok);
          lexer.incrementColumn(tok.val.length);
          return true;
        }
        if (/^#/.test(lexer.input)) {
          lexer.error('INVALID_ID', '"' + /.[^ \t\(\#\.\:]*/.exec(lexer.input.substr(1))[0] + '" is not a valid ID.');
        }
      }
    }
  ]
};

Plugins should be called in sequence until one of them returns true. In this way plugins can be combined relatively safely.
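The "call in sequence until one returns true" rule can be sketched as a small dispatch loop. callPlugins is a hypothetical helper for illustration, not the lexer's actual internals:

```javascript
// Sketch of the dispatch rule described above: call each plugin's
// override in order and stop at the first one that returns true.
// callPlugins is a hypothetical helper, not the lexer's internals.
function callPlugins(lexer, method, plugins) {
  for (var i = 0; i < plugins.length; i++) {
    var fn = plugins[i][method];
    if (fn && fn(lexer) === true) return true;
  }
  return false; // no plugin handled it; fall back to the built-in method
}

// Usage with three toy plugins: the first declines, the second handles,
// so the third never runs.
var calls = [];
var plugins = [
  { advance: function () { calls.push('a'); return false; } },
  { advance: function () { calls.push('b'); return true; } },
  { advance: function () { calls.push('c'); return true; } }
];
console.log(callPlugins({}, 'advance', plugins)); // true
console.log(calls.join(','));                     // "a,b"
```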

Nested template literals inside pug block return syntax error

I'm using babel-plugin-transform-react-pug in a React project where I'm passing dynamic class names. I use nested template literals like so:

return pug`
  div(className=${`${blockClass}__element`})
    | Whatever content  
`;

This works fine. But JSLint throws an error that seems to come from pug-lexer:

Syntax Error: Unexpected token
Error: Pug:1:16
  > 1| div(className=${`${blockClass}__element`})
----------------------^
    2|       | Whatever content

Syntax Error: Unexpected token
    at makeError (/home/deploy/homestars-www/node_modules/pug-error/index.js:32:13)
    at Lexer.error (/home/deploy/homestars-www/node_modules/pug-lexer/index.js:58:15)
    at Lexer.assertExpression (/home/deploy/homestars-www/node_modules/pug-lexer/index.js:86:12)
    at Lexer.attrs (/home/deploy/homestars-www/node_modules/pug-lexer/index.js:1089:18)
    at Lexer.callLexerFunction (/home/deploy/homestars-www/node_modules/pug-lexer/index.js:1319:23)
    at Lexer.advance (/home/deploy/homestars-www/node_modules/pug-lexer/index.js:1356:15)
    at Lexer.callLexerFunction (/home/deploy/homestars-www/node_modules/pug-lexer/index.js:1319:23)
    at Lexer.getTokens (/home/deploy/homestars-www/node_modules/pug-lexer/index.js:1375:12)
    at lex (/home/deploy/homestars-www/node_modules/pug-lexer/index.js:12:42)
    at findVariablesInTemplate (/home/deploy/homestars-www/node_modules/pug-uses-variables/lib/findVariablesInTemplate.js:31:20)

If I remove the nested literal, this error doesn't occur.

Is this just because of the difference in interpolation syntax between babel-plugin-transform-react-pug and pugjs? Namely:

babel-plugin-transform-react-pug: `${}`
pugjs: `#{}` 

Remove "pipeless" state

Rather than setting the various state options for pipeless text, we should just call pipelessText from tokens that we expect to be followed by pipeless text. This would simplify our state model and thus make plugins much less brittle.

Make attributes part of the token stream

Let's define three new token types:

{type: 'start-attributes'}
{type: 'attribute', name: 'string', val: 'string', mustEscape: true}
{type: 'end-attributes'}

This way, we can stop making attributes such a uniquely vast token, and it will be easy/obvious how to put line numbers on them.
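For concreteness, here is a sketch of how today's monolithic attrs token would map onto the three proposed token types. expandAttrsToken is an illustration of the proposed shape only, not the lexer's implementation:

```javascript
// Sketch of the proposed shape: expand a legacy "attrs" token into the
// three new token types defined above. Illustration only; field names
// follow the proposal, not the lexer's implementation.
function expandAttrsToken(tok) {
  var out = [{ type: 'start-attributes' }];
  tok.attrs.forEach(function (attr) {
    out.push({
      type: 'attribute',
      name: attr.name,
      val: attr.val,
      mustEscape: attr.escaped
    });
  });
  out.push({ type: 'end-attributes' });
  return out;
}

// The attrs token from the lex() example earlier in this README:
var legacy = {
  type: 'attrs',
  line: 1,
  attrs: [{ name: 'data-foo', val: '"bar"', escaped: true }]
};
console.log(JSON.stringify(expandAttrsToken(legacy)));
```

Each attribute then becomes its own token, so per-attribute line numbers fall out naturally.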
