Comments (8)

iarna commented on August 29, 2024

Regarding the assertion: as it's a price paid equally by the different parsers, it's something I was prepared to pass on, but you are right that we'd get a better baseline without it. The benchmarks started as a tool for honing development of this library, not so much for comparing with others, and when that was the case it didn't matter so much.

It is absolutely a synthetic benchmark at the moment. I'd be interested in what it looks like with perf-dependent TOML cases. I know that the personal use case that drove me to write this simply can't be loaded with toml, and started causing crashes in toml-j0.4, which, while interesting, makes it unhelpful for benchmarking anything but this tool itself.

I'm a bit skeptical about digging through OSS projects for real-world examples (the truth is, most use cases for TOML aren't perf-dependent), but finding perf-dependent use cases would be awesome.

(As an end user, personally I think the spec compliance is way more important.)


iarna commented on August 29, 2024

For funsies, I ran the file that kicked off this project through. It weighs in at 568kb and 10,000 lines. It still makes toml-j0.4 crash with a "call stack exceeded" and toml grind to a crawl, though it is loading it now!

  • @iarna/toml: 20MB/sec (35.57 parses per second)
  • toml: 0MB/sec (0.07 parses per second)
  • @sgarciac/bombadil: 6.7MB/sec (11.68 parses per second)
  • @ltd/j-toml: 24MB/sec (42.56 parses per second)

So we see that for my use case @ltd/j-toml is actually a bit faster, probably due to its lightning-fast string handling, which you can see in the synthetic benchmarks. (And in fact, that is why I have synthetic benchmarks: to try to identify which components of the parser are working well and which aren't. @ltd/j-toml is astoundingly fast at the big string benchmarks.)
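For readers wondering how figures like "parses per second" and "MB/sec" relate, here is a minimal sketch of how such throughput numbers can be derived. This is my own illustration, not the repo's actual benchmark harness; the input filename and iteration count are made up, and it assumes @iarna/toml's standard `parse(str)` API.

```js
// Minimal throughput sketch (illustrative; not the actual benchmark suite).
// Times repeated parses of one file and derives parses/sec and MB/sec.
const fs = require('fs')
const TOML = require('@iarna/toml') // assumes the standard TOML.parse(str) API

const src = fs.readFileSync('big-input.toml', 'utf8') // hypothetical input file
const megabytes = Buffer.byteLength(src, 'utf8') / 1e6

const iterations = 50
const start = process.hrtime.bigint()
for (let i = 0; i < iterations; i++) TOML.parse(src)
const seconds = Number(process.hrtime.bigint() - start) / 1e9

const parsesPerSec = iterations / seconds
console.log(`${parsesPerSec.toFixed(2)} parses/sec, ${(parsesPerSec * megabytes).toFixed(1)} MB/sec`)
```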


iarna commented on August 29, 2024

So, the last thing I didn't comment on is the why of the equality assertion:

And that comes down to not wanting to run benchmarks for parsers that can't correctly parse the material. It doesn't matter how fast you are if you get the wrong answer. Yes, I do have the spec compliance suite, but that'd leave us entirely unable to compare parsers, as mine is the only one passing all of the suite (and essentially no one passes quite the same subset). Further, most parsers can correctly parse most TOML, and from a pragmatic standpoint, it's useful to know how that plays out speed-wise.

If a tested parser starts returning something that's not equal but is in fact correct, I'll revisit how that check is done.

If a parser starts including things like comments though, it's likely in a different category entirely and probably deserves to be benchmarked separately. That is, there's currently a dearth of document orientated TOML parsers, at least in JS.

I have updated the benchmark suite to only run the assertion once, and I'll be updating the repo and the results in the docs sometime in the next week or two.
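For anyone skimming the thread, the "run the assertion once" idea looks roughly like the sketch below. It is my own illustration, not the actual suite; the function name, parameters, and iteration count are hypothetical. The deep-equality check acts as a one-time correctness gate, and only the bare parse call is timed.

```js
// Illustrative sketch of "assert once, then time" (not the actual benchmark code).
const assert = require('assert')

function benchParser (name, parse, source, expected, iterations = 100) {
  // Correctness gate: run the (expensive) deep-equality assertion a single time,
  // so its cost is not charged to every timed iteration.
  assert.deepStrictEqual(parse(source), expected)

  const start = process.hrtime.bigint()
  for (let i = 0; i < iterations; i++) parse(source)
  const seconds = Number(process.hrtime.bigint() - start) / 1e9
  console.log(`${name}: ${(iterations / seconds).toFixed(2)} parses/sec`)
}

// Usage (hypothetical): benchParser('@iarna/toml', require('@iarna/toml').parse, src, expected)
```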


bd82 commented on August 29, 2024

I'm a bit skeptical about digging through OSS projects for real-world examples (the truth is, most use cases for TOML aren't perf-dependent).

I agree that most TOML use cases are not performance dependent.

(As an end user, personally I think the spec compliance is way more important.)

Also agree.

I ran the file that kicked off this project through. It weighs in at 568kb and 10,000 lines.

Is it possible to share this file? I could use it to benchmark my own TOML parser.

If a parser starts including things like comments though, it's likely in a different category entirely and probably deserves to be benchmarked separately. That is, there's currently a dearth of document orientated TOML parsers, at least in JS.

Yep, there is indeed a lack; that is why I built my own parser that outputs a CST (Concrete Syntax Tree) for implementing a TOML Prettier plugin.

I have updated the benchmark suite to only run the assertion once, and I'll be updating the repo and the results in the docs sometime in the next week or two.

Sounds good.
I think from your replies that we have a few concerns here regarding performance:

  • Demonstrating that iarna-toml is fast in general
  • Doing an in-depth analysis of different parsers' strengths

Perhaps those concerns should be separated?

As a user, when I open this repo's main readme I can get overwhelmed by the large performance data table; a casual user may just want basic numbers to compare the parsers. For example, see the JavaScript Parsing Libraries benchmark I've created:

Perhaps the detailed analysis could be presented on a separate page linked from the main readme?

Cheers.
Shahar.


bd82 commented on August 29, 2024

Closing this, thanks for the discussion 😄


iarna commented on August 29, 2024

Yeah, any time, thanks for bringing it up! I did move my benchmarks close to the bottom of the readme (and link to a separate page as well, as the table is getting a bit unwieldy). (Edit: It actually resulted in me more generally reorganizing the readme.)

One interesting area of difference between the parsers is that some have a high constant-time startup cost to initializing their parser (in particular, the Chevrotain-based Bombadil), and this means it scores very poorly on very small input files but is pretty competent on large documents. Does this matter? It depends on your use case.

I was curious if that was specific to Bombadil or a more general trait of Chevrotain and hacked together a quick JSON parser to compare to the Chevrotain benchmarks but… well, their parser is just a validating grammar, not something that produces any result, so it's not actually that comparable, alas.


bd82 commented on August 29, 2024

I was curious if that was specific to Bombadil or a more general trait of Chevrotain and hacked together a quick JSON parser to compare to the Chevrotain benchmarks but… well, their parser is just a validating grammar, not something that produces any result, so it's not actually that comparable, alas.

Chevrotain does indeed have a higher startup cost, but a parser instance can (and should) be reused, so that cost is mostly paid during program initialization (and partly on the first inputs). It is therefore often irrelevant for a long-running process.
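To make the reuse point concrete, here is a generic sketch of the pattern with a hypothetical parser class (not Chevrotain's actual API): the expensive construction happens once at startup, and every subsequent parse reuses the same instance.

```js
// Generic sketch of the instance-reuse pattern (hypothetical parser; not Chevrotain's real API).
class ExpensiveParser {
  constructor () {
    // Stand-in for the one-time grammar analysis work done at construction time.
    this.table = new Array(1e6).fill(0)
  }
  parse (text) {
    // Stand-in for the real parse; a real implementation would return a CST/AST.
    return { length: text.length }
  }
}

// Pay the construction cost once, during program initialization...
const parser = new ExpensiveParser()

// ...then reuse the same instance for every input, so the startup cost
// becomes irrelevant in a long-running process.
function parseToml (text) {
  return parser.parse(text)
}
```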

The reason for the initialization cost is that, like parser combinators, Chevrotain does not do any code generation, so all kinds of analysis (left-recursion detection, grammar building/resolving, ...) are done at runtime. There are ways to improve the cold-start performance characteristics.

But thinking about it now makes me realize that there may be low-hanging fruit in that scenario, so I will investigate it further. 😄
Chevrotain/chevrotain#907

Cheers.


LongTengDao commented on August 29, 2024

If a parser starts including things like comments though, it's likely in a different category entirely and probably deserves to be benchmarked separately. That is, there's currently a dearth of document orientated TOML parsers, at least in JS.

@iarna Hi, what do you mean by "document orientated parser"?

