Comments (18)
@Platzer thanks for the link - will try to check it out :-)
@SuperJMN removing whitespace definitely does make constructing the parser a bit simpler 👍
from superpower.
:-) .. any help with this, even just figuring out the right test cases, would be awesome @SuperJMN
from superpower.
@nblumhardt TokenizerBuilder is an amazing feature 👍
Yesterday I refactored the tokenizer of a custom DSL from 200 lines of code to an easily readable 30 lines of TokenizerBuilder code in less than 40 minutes, and just 2 of ~200 tests are failing. It is really easy to get started!
from superpower.
TokenizerBuilder is in on dev. Leaving this open until one remaining TODO is covered - the tokenizer, if it fails, needs to report the error at the most accurate point; i.e. if one recognizer failed after consuming 0 chars, and another after 10 chars, the latter's results should be surfaced.
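The "most accurate point" rule described above can be pictured with a small stand-alone sketch. This is only an illustration of the selection logic, not Superpower's actual implementation; the `Failure` record and the `FarthestFailure` class are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one recognizer's failed attempt; not a Superpower type.
public record Failure(string Recognizer, int CharsConsumed, string Message);

public static class FarthestFailure
{
    // Returns the failure that progressed furthest into the input.
    // On a tie, the failure of the earlier recognizer is kept.
    public static Failure Select(IReadOnlyList<Failure> failures)
    {
        if (failures.Count == 0)
            throw new ArgumentException("At least one failure is required.", nameof(failures));

        var best = failures[0];
        foreach (var failure in failures.Skip(1))
        {
            if (failure.CharsConsumed > best.CharsConsumed)
                best = failure;
        }
        return best;
    }
}
```

Given one failure that consumed 0 chars and another that consumed 10 chars, `Select` returns the second one, which is the error a user would most likely want to see.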
from superpower.
"Motivated". Did you call me? ;)
from superpower.
I'm taking a look tomorrow! Promise :)
from superpower.
OK, I'm trying to translate one of my tokenizers to the TokenizerBuilder DSL :)
If I understand correctly, this tokenizer will match
- the equality operator '=='
- the assignment operator '='.
var tokenizer = new TokenizerBuilder<LangToken>()
    .Match(Span.EqualTo("=="), LangToken.DoubleEqual)
    .Match(Character.EqualTo('='), LangToken.Equal)
    .Build();
Does it make sense? :)
from superpower.
Another question that arises is the matching of keywords.
So far I've found that I can do it with this construction:
builder
    .Match(Span.EqualTo("if"), LangToken.If)
    .Match(Span.EqualTo("while"), LangToken.While)
    .Match(Span.EqualTo("do"), LangToken.Do)
    .Match(Span.EqualTo("for"), LangToken.For)
    ....
Is this the best way to match keywords? :)
from superpower.
I have just discovered another use case:
If you want the tokenizer to convert any sequence of whitespace characters into a single token, for example LangToken.Whitespace, should we do it like this?
builder
    .Match(Character.WhiteSpace.AtLeastOnce(), LangToken.Whitespace)
from superpower.
@nblumhardt Nicholas, using the Tokenizer Builder has relieved the pains of creating one manually A LOT! For me it's a big success. I've created an equivalent tokenizer using the Builder in less than 10 minutes. Wow!
from superpower.
BTW, the tokenizer I have right now is this. I've only added the tokens that I'm using right now, for my tests. It corresponds to a subset of tokens of a typical C language parser.
return new TokenizerBuilder<LangToken>()
    .Match(Character.WhiteSpace.AtLeastOnce(), LangToken.Whitespace)
    .Match(Span.EqualTo("=="), LangToken.DoubleEqual)
    .Match(Character.EqualTo('='), LangToken.Equal)
    .Match(Character.EqualTo('('), LangToken.LeftParenthesis)
    .Match(Character.EqualTo(')'), LangToken.RightParenthesis)
    .Match(Character.EqualTo('{'), LangToken.LeftBrace)
    .Match(Character.EqualTo('}'), LangToken.RightBrace)
    .Match(Character.EqualTo(';'), LangToken.Semicolon)
    .Match(Span.EqualTo("if"), LangToken.If, true)
    .Match(Span.Regex(@"\w[\w\d]*"), LangToken.Identifier, true)
    .Build();
from superpower.
Sorry for barging in... I've got a remark on @SuperJMN's comment about keyword parsing. How would you build a tokenizer for this:
if whileRunning == true
so it is generating these tokens:
- LangToken.If
- LangToken.Whitespace
- LangToken.Identifier
- LangToken.Whitespace
- LangToken.DoubleEqual
- LangToken.Whitespace
- LangToken.True
because (I think, didn't try)
builder
    ...
    .Match(Span.EqualTo("while"), LangToken.While)
    ...
will match the whileRunning identifier as LangToken.While?
from superpower.
@Platzer Good question! I've got it to work perfectly.
I think it's not only because the order of the Match calls matters, but also because I set the requireDelimiters option to true for both keywords and identifiers (it's false by default).
This is the test code I ran (xUnit):
using System.Collections.Generic;
using System.Linq;
using FluentAssertions;
using Superpower;
using Xunit;

public class TokenizerSpecs
{
    [Theory]
    [MemberData(nameof(TokenData))]
    public void TokenizationTest(string code, IEnumerable<LangToken> tokens)
    {
        var sut = CreateSut();
        var actual = sut.Tokenize(code).Select(t => t.Kind);
        var expected = tokens;
        // Note: BeEquivalentTo ignores element order by default;
        // Should().Equal(expected) would also assert the token order.
        actual.Should().BeEquivalentTo(expected);
    }

    // Each entry pairs an input string with the token kinds expected from it.
    public static IEnumerable<object[]> TokenData()
    {
        return new List<object[]>()
        {
            new object[] {"==", new List<LangToken>() {LangToken.DoubleEqual}},
            new object[] {"=", new List<LangToken>() {LangToken.Equal}},
            new object[] {"ifSomething", new List<LangToken>() {LangToken.Identifier}},
            new object[]
            {
                "if whileRunning == true",
                new List<LangToken>()
                {
                    LangToken.If,
                    LangToken.Whitespace,
                    LangToken.Identifier,
                    LangToken.Whitespace,
                    LangToken.DoubleEqual,
                    LangToken.Whitespace,
                    LangToken.True,
                },
            },
        };
    }

    private Tokenizer<LangToken> CreateSut()
    {
        return TokenizerFactory.Create();
    }
}
and the Tokenizer is this:
public static class TokenizerFactory
{
    public static Tokenizer<LangToken> Create()
    {
        return new TokenizerBuilder<LangToken>()
            .Match(Character.WhiteSpace.AtLeastOnce(), LangToken.Whitespace)
            .Match(Span.EqualTo("=="), LangToken.DoubleEqual)
            .Match(Character.EqualTo('='), LangToken.Equal)
            .Match(Character.EqualTo('('), LangToken.LeftParenthesis)
            .Match(Character.EqualTo(')'), LangToken.RightParenthesis)
            .Match(Character.EqualTo('{'), LangToken.LeftBrace)
            .Match(Character.EqualTo('}'), LangToken.RightBrace)
            .Match(Character.EqualTo(';'), LangToken.Semicolon)
            .Match(Span.EqualTo("if"), LangToken.If, true)
            .Match(Span.EqualTo("while"), LangToken.While, true)
            .Match(Span.EqualTo("true"), LangToken.True, true)
            .Match(Span.EqualTo("false"), LangToken.False, true)
            .Match(Span.Regex(@"\w[\w\d]*"), LangToken.Identifier, true)
            .Build();
    }
}
I hope you understand how xUnit works for tests :) Basically, it takes the object arrays returned by the static TokenData method and passes them as parameters to the [Theory] method. As you can see, each entry contains the input and the expected tokens. The latter is what you asked for :)
from superpower.
@SuperJMN looking good! Curious - why does your grammar need whitespace tokens?
from superpower.
@nblumhardt if you want to keep the user's formatting and highlight the errors with some squiggles, you would need to know how much whitespace the user entered (see: NDC London, "How to parse a file" - Matt Ellis).
from superpower.
@nblumhardt It seems I asked myself the wrong question: "how can my grammar ignore whitespace?" :) To tell the truth, I just added the whitespace tokens because leaving them out seemed unnatural at first glance, but now that you mention it, it's better to remove them because it will make my parsers simpler :)
from superpower.
That's great to hear, @Platzer!
from superpower.
A question, please. I have a situation similar to the following:
.Match(Span.EqualTo("while"), LangToken.While), which also matches 'whileRunning'.
I have created this helper method:
public static TextParser<TextSpan> BuildTextParserEqualTo(string equalTo)
{
    return Span.EqualTo(equalTo);
}
which I call from here:
var tokenizerBuilder = new TokenizerBuilder<InputToken>();
tokenizerBuilder.Ignore(Span.WhiteSpace);
tokenizerBuilder.Match(BuildTextParserEqualTo("and"), InputToken.And);
tokenizerBuilder.Match(Span.NonWhiteSpace, InputToken.None);
return tokenizerBuilder.Build();
But the "and" also gets matched in this text: "aaa bbb ccc andDDD", which is unwanted in my case.
How do I get it matched only in "aaa bbb ccc and DDD", i.e. only as a stand-alone word / token?
(I tried the Regex approach as well, but that also seems to ignore the $ at the end of my expression.)
Any help would be much appreciated.
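One possible direction, following the requireDelimiters approach described earlier in this thread, is sketched below. It is an untested sketch under two assumptions: that the `InputToken` enum looks like the hypothetical one declared here, and that a recognizer registered with `requireDelimiters: true` is only accepted when a delimiter (a recognizer registered without that option, such as the ignored whitespace) or the end of input follows. Under that reading, the catch-all `Span.NonWhiteSpace` match also needs the option, since it would otherwise act as a delimiter itself and let "and" match inside "andDDD".

```csharp
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

// Hypothetical token kinds, mirroring the names used in the question.
public enum InputToken { None, And }

public static class InputTokenizer
{
    public static Tokenizer<InputToken> Create()
    {
        return new TokenizerBuilder<InputToken>()
            // Ignored whitespace is the only delimiter in this tokenizer.
            .Ignore(Span.WhiteSpace)
            // "and" is intended to match only when whitespace or end of input follows.
            .Match(Span.EqualTo("and"), InputToken.And, requireDelimiters: true)
            // Catch-all for every other word, including words that merely start with "and".
            .Match(Span.NonWhiteSpace, InputToken.None, requireDelimiters: true)
            .Build();
    }
}
```

The order of the Match calls still matters: the keyword has to come before the catch-all so that a stand-alone "and" is tried as a keyword first.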
from superpower.