No need for lexeme (makeup_erlang, open issue)

elixir-makeup commented on July 30, 2024
No need for lexeme

from makeup_erlang.

Comments (6)

tmbb commented on July 30, 2024

There is a way around using the lexeme() combinator. In the Elixir lexer, the lexeme() combinator is used in some places so that we can pattern match on binaries later. For example, every lowercase name is first lexed as a variable name and later turned into a keyword if the binary matches a keyword in the list.

This doesn't work in a simple way if the text part of the token ({type, meta, text}) is not guaranteed to be a binary. We'd have to keep in the source code a list of the iolists into which the keyword names are parsed. For example, case might be parsed into something like ['c', 'a', 's', 'e']. These would have to be entered into the source code manually, which is error-prone and boring.

The interesting way of solving this is to parse the keywords at compile time. For example:

@keywords Internal.parse_words_into_iolist(["do", "end", "def", ...])

That would require moving the actual parser into an ElixirLexer.Internal module so that it could be used at compile time in the ElixirLexer module. While it does complicate the code a little, I believe the performance benefits would be considerable, because it avoids building many binaries during lexing. We don't have to believe, though. We can measure it directly. My Schism library makes it particularly easy, and I'll benchmark it as soon as possible.
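
A minimal sketch of this module split, under stated assumptions: the KeywordSketch module names, lex_text/1, and keyword?/1 are hypothetical, and the "lexer" is a stand-in that simply returns a charlist. Only the name parse_words_into_iolist comes from the comment above; its body here is a guess at what it could do.

```elixir
defmodule KeywordSketch.Internal do
  # Hypothetical stand-in for the real lexer: returns only the text part of
  # a token, here as a charlist such as [?c, ?a, ?s, ?e].
  def lex_text(word) when is_binary(word), do: String.to_charlist(word)

  # Same name as the helper mentioned above; this body is an assumption.
  def parse_words_into_iolist(words), do: Enum.map(words, &lex_text/1)
end

defmodule KeywordSketch.Lexer do
  # Evaluated once, while this module is being compiled. This works because
  # KeywordSketch.Internal is already compiled and loaded at this point.
  @keywords KeywordSketch.Internal.parse_words_into_iolist(["do", "end", "def"])

  # Checks the token text against the precomputed lexer output.
  def keyword?(text), do: text in @keywords
end
```

The keyword list stays readable in the source, and whatever shape the lexer produces for each word is computed by the lexer itself rather than typed in by hand.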

tmbb commented on July 30, 2024

@josevalim and @mracos what do you think of this? Would you be in favor of making the implementation slightly more complex if it resulted in speed gains?

mracos commented on July 30, 2024

Will take a look on it 😄

tmbb commented on July 30, 2024

"Will take a look on it 😄"

You don't have to. I can do it myself when I have the time. But if you understood what I meant from the description above, you can try to implement it.

mracos commented on July 30, 2024

Actually, I should be more explicit: I will take a look at the problem and see if I can understand it 😅

tmbb commented on July 30, 2024

Basically, the problem is that for some tokens you want something you can match on. You can match on iolists just fine (just like you can match on binaries, for keywords, builtins, etc.). The problem is that it's hard to see which iolist will be generated by the lexer for a given token. The easiest way to see which iolist will be parsed from a token is to run the lexer on the keyword.

Let's say you want to parse the case keyword in Erlang or Elixir. Maybe your lexer returns something like {ttype, meta, [?c, ?a, ?s, ?e]} instead of {ttype, meta, "case"}. But in the source code, you want to write "case" instead of the complex alternative. The way to transform "case" into whatever is returned by the lexer is to "lex" the "case" binary and ignore the ttype and meta parts. That way, you can build the function heads at compile time. Because the function heads need to be built at compile time, you need to separate the actual lexer into another module, so that the public-facing module can call the lexer on the lists of keywords at compile time.
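
The function-head generation described here could be sketched roughly as follows. The HeadsSketch module names, the :name and :keyword token types, and postprocess/1 are hypothetical, and the stand-in lexer emits a single token whose text is a charlist; the real lexer would produce whatever shape it produces.

```elixir
defmodule HeadsSketch.Internal do
  # Hypothetical stand-in for the real lexer: emits one token whose text
  # part is a charlist, e.g. {:name, %{}, [?c, ?a, ?s, ?e]}.
  def lex(source) when is_binary(source) do
    [{:name, %{}, String.to_charlist(source)}]
  end
end

defmodule HeadsSketch.Lexer do
  alias HeadsSketch.Internal

  # Keywords are written as plain binaries in the source code.
  @keywords ["case", "if", "end"]

  def lex(source), do: source |> Internal.lex() |> Enum.map(&postprocess/1)

  # At compile time, lex each keyword, ignore ttype and meta, and generate
  # one function head that matches on exactly the text the lexer produced.
  for keyword <- @keywords do
    [{_ttype, _meta, text}] = Internal.lex(keyword)

    defp postprocess({:name, meta, unquote(text)}) do
      {:keyword, meta, unquote(text)}
    end
  end

  # Anything that is not a keyword is left untouched.
  defp postprocess(token), do: token
end
```

With these definitions, HeadsSketch.Lexer.lex("case") yields a :keyword token and HeadsSketch.Lexer.lex("foo") yields a :name token, without any binary being built for the token text.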

This might improve performance because you no longer need to build binaries when parsing. You can keep everything as an iolist instead of turning it into binaries to make postprocessing easier. I can't guarantee it will be faster, but it's worth trying. It might even be slower, because matching iolists might be slower than matching binaries (I have no idea; I really have to benchmark it).
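
As a rough way to check this, here is a timing sketch that uses :timer.tc from the Erlang standard library rather than a dedicated benchmarking tool. The BenchSketch module and its functions are hypothetical, the token text is assumed to be a charlist, and a micro-benchmark like this only hints at what a real lexer run would show.

```elixir
defmodule BenchSketch do
  # Matching on a binary, as is done after lexeme() has built one.
  def classify_binary("case"), do: :keyword
  def classify_binary(_other), do: :name

  # Matching directly on the charlist a lexer might emit.
  def classify_charlist(~c"case"), do: :keyword
  def classify_charlist(_other), do: :name

  # Returns the elapsed microseconds for n calls of fun.
  def time(n, fun) do
    {micros, _result} = :timer.tc(fn -> Enum.each(1..n, fn _ -> fun.() end) end)
    micros
  end

  # Compares "build a binary, then match" with "match the charlist as is".
  def run(n \\ 100_000) do
    chars = ~c"case"

    %{
      build_binary_then_match:
        time(n, fn -> chars |> IO.iodata_to_binary() |> classify_binary() end),
      match_charlist: time(n, fn -> classify_charlist(chars) end)
    }
  end
end
```

Calling BenchSketch.run() returns a map with the two timings in microseconds. The numbers will vary between machines and runs, so they should be read as a first indication only.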
