Comments (6)
There is a way around using the lexeme() combinator. For example, in the Elixir lexer the lexeme() combinator is used in some places so that we can pattern match on binaries later. For example, every lowercase name is parsed as a variable name and later turned into a keyword if the binary matches a keyword in the list.
This doesn't work in a simple way if the text part of the token ({type, meta, text}) is not guaranteed to be a binary. We'd have to keep in the source code a list of the iolists into which the keyword names are parsed. For example, case might be parsed into something like ['c', 'a', 's', 'e']. This would have to be entered into the source code manually, which is error-prone and boring.
The interesting way of solving this is to parse the keywords at compile time. For example:
@keywords Internal.parse_words_into_iolist(["do", "end", "def", ...])
That would require moving the actual parser into an ElixirLexer.Internal module so that it could be used in the ElixirLexer module. While it does complicate the code a little, I believe the performance benefits would be considerable, because it avoids building many binaries during lexing. We don't have to believe, though. We can measure it directly. My Schism library makes it particularly easy, and I'll benchmark it as soon as possible.
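To make the idea concrete, here is a minimal sketch of the two-module layout. It is not the actual makeup code: the SketchLexer names and the lex_word/1 helper are hypothetical, and the "lexer" here is only a stand-in that turns a word into a list of single-character binaries. The point is just that the keyword list in the public module is computed by running the internal parser at compile time.

```elixir
defmodule SketchLexer.Internal do
  # Hypothetical stand-in for the real parser: returns the token text a lexer
  # without lexeme() might produce, here a list of one-character binaries.
  def lex_word(word) when is_binary(word) do
    for <<char::utf8 <- word>>, do: <<char::utf8>>
  end

  # Runs the stand-in lexer over every keyword.
  def parse_words_into_iolist(words) when is_list(words) do
    Enum.map(words, &lex_word/1)
  end
end

defmodule SketchLexer do
  alias SketchLexer.Internal

  # Evaluated once, while this module is being compiled. This only works
  # because Internal is a separate module that is already compiled.
  @keywords Internal.parse_words_into_iolist(["do", "end", "def", "case"])

  # A name token becomes a keyword if its text is one of the pre-lexed keywords.
  def postprocess({:name, meta, text}) when text in @keywords do
    {:keyword, meta, text}
  end

  def postprocess(token), do: token
end
```

With this layout the source code only ever mentions the plain strings; whatever shape the lexer produces for them is derived automatically.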
from makeup_erlang.
@josevalim and @mracos what do you think of this? Would you be in favor of making the implementation slightly more complex if it resulted in speed gains?
from makeup_erlang.
Will take a look at it 😄
from makeup_erlang.
Will take a look at it 😄
You don't have to. I can do it myself when I have the time. But if you understood what I meant from the description above you can try to implement it.
from makeup_erlang.
Actually, I should be more explicit: I will take a look at the problem and see if I can understand it 😅
from makeup_erlang.
Basically the problem is that for some tokens you want something you can match on. You can match on iolists just fine (just like you can match on binaries, for keywords, builtins, etc). The problem is that it's hard to see which iolist will be generated by the lexer for a given token. The easiest way to see which iolist will be parsed from a token is to run the lexer on the keyword.
Let's say you want to parse the case keyword in Erlang or Elixir. Maybe your lexer returns something like {ttype, meta, [?c, ?a, ?s, ?e]} instead of {ttype, meta, "case"}. But in the source code, you want to write "case" instead of the complex alternative. The way to transform "case" into whatever is returned by the lexer is to "lex" the "case" binary and ignore the ttype and meta parts. That way, you can build the function heads at compile time. Because the function heads need to be built at compile time, you need to separate the actual lexer into another module, so that the public-facing module can call the lexer on the lists of keywords at compile time.
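A rough sketch of building the function heads, under loud assumptions: HeadsSketch, its lex/1 stub and keyword?/1 are hypothetical names, and the stub simply returns the token text as a list of codepoints (the [?c, ?a, ?s, ?e] shape mentioned above) rather than running a real lexer.

```elixir
defmodule HeadsSketch.Internal do
  # Hypothetical lexer stub: one token per input, with the text kept as a
  # list of codepoints instead of a binary.
  def lex(source) when is_binary(source) do
    [{:name, %{}, String.to_charlist(source)}]
  end
end

defmodule HeadsSketch do
  alias HeadsSketch.Internal

  # This loop runs while the module is compiled: each keyword is lexed,
  # ttype and meta are ignored, and the text becomes a literal pattern.
  for keyword <- ["case", "of", "end"] do
    [{_ttype, _meta, text}] = Internal.lex(keyword)

    defp keyword?(unquote(text)), do: true
  end

  # Fallback head for everything that is not a pre-lexed keyword.
  defp keyword?(_text), do: false

  def postprocess({:name, meta, text} = token) do
    if keyword?(text), do: {:keyword, meta, text}, else: token
  end
end
```

Calling HeadsSketch.postprocess({:name, %{}, [?c, ?a, ?s, ?e]}) then hits the generated head for "case" without any binary being built along the way.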
This might improve performance because you don't need to build binaries when parsing anymore. You can now keep everything as an iolist instead of turning it into binaries to make postprocessing easier. I can't guarantee it will be faster, but it's worth a try. It might even be slower, because matching iolists might be slower than matching binaries. I have no idea; I really have to benchmark it.
from makeup_erlang.