Comments (8)

iarna commented on August 29, 2024

Regarding the assertion: as it's a price paid equally by the different parsers, it's something I was prepared to pass on, but you are right that we'd get a better baseline without it. The benchmarks started as a tool for honing development of this library, not so much for comparing with others, and when that was the case it didn't matter so much.

It is absolutely a synthetic benchmark at the moment. I'd be interested in what it looks like with perf-dependent TOML cases. I know that the personal use case that drove me to write this simply can't be loaded with toml, and started causing crashes in toml-j0.4, which, while interesting, makes it unhelpful for benchmarking anything but this tool itself.

I'm a bit skeptical about digging through OSS projects for real-world examples (the truth is, most use cases for TOML aren't perf-dependent), but finding perf-dependent use cases would be awesome.

(As an end user, personally I think the spec compliance is way more important.)


iarna commented on August 29, 2024

For funsies, I ran the file that kicked off this project through. It weighs in at 568kb and 10,000 lines. It still makes toml-j0.4 crash with a "call stack exceeded" and toml grind to a crawl, though it is loading it now!

  • @iarna/toml: 20MB/sec (35.57 parses per second)
  • toml: 0MB/sec (0.07 parses per second)
  • @sgarciac/bombadil: 6.7MB/sec (11.68 parses per second)
  • @ltd/j-toml: 24MB/sec (42.56 parses per second)

So we see that for my use case @ltd/j-toml is actually a bit faster, probably due to its lightning-fast string handling, which you can see in the synthetic benchmarks. (And in fact, that is why I have synthetic benchmarks: to try to identify which components of the parser are working well and which aren't. @ltd/j-toml is astoundingly fast at the big string benchmarks.)
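For readers wondering how figures like "parses per second" and "MB/sec" relate, here is a minimal sketch of how such throughput numbers can be derived. This is my own illustration, not the repo's actual benchmark harness; the input filename and iteration count are made up, and it assumes @iarna/toml's standard `parse(str)` API.

```js
// Minimal throughput sketch (illustrative; not the actual benchmark suite).
// Times repeated parses of one file and derives parses/sec and MB/sec.
const fs = require('fs')
const TOML = require('@iarna/toml') // assumes the standard TOML.parse(str) API

const src = fs.readFileSync('big-input.toml', 'utf8') // hypothetical input file
const megabytes = Buffer.byteLength(src, 'utf8') / 1e6

const iterations = 50
const start = process.hrtime.bigint()
for (let i = 0; i < iterations; i++) TOML.parse(src)
const seconds = Number(process.hrtime.bigint() - start) / 1e9

const parsesPerSec = iterations / seconds
console.log(`${parsesPerSec.toFixed(2)} parses/sec, ${(parsesPerSec * megabytes).toFixed(1)} MB/sec`)
```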


iarna commented on August 29, 2024

So, the last thing I didn't comment on is the why of the equality assertion:

And that comes down to not wanting to run benchmarks for parsers that can't correctly parse the material. It doesn't matter how fast you are if you get the wrong answer. Yes, I do have the spec compliance suite, but that'd leave us entirely unable to compare parsers, as mine is the only one passing all of the suite (and essentially no one passes quite the same subset). Further, most parsers can correctly parse most TOML, and from a pragmatic standpoint, it's useful to know how that plays out speed-wise.

If a tested parser starts returning something that's not equal but is in fact correct, I'll revisit how that check is done.

If a parser starts including things like comments though, it's likely in a different category entirely and probably deserves to be benchmarked separately. That is, there's currently a dearth of document orientated TOML parsers, at least in JS.

I have updated the benchmark suite to only run the assertion once, and I'll be updating the repo and the results in the docs sometime in the next week or two.
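For anyone skimming the thread, the "run the assertion once" idea looks roughly like the sketch below. It is my own illustration, not the actual suite; the function name, parameters, and iteration count are hypothetical. The deep-equality check acts as a one-time correctness gate, and only the bare parse call is timed.

```js
// Illustrative sketch of "assert once, then time" (not the actual benchmark code).
const assert = require('assert')

function benchParser (name, parse, source, expected, iterations = 100) {
  // Correctness gate: run the (expensive) deep-equality assertion a single time,
  // so its cost is not charged to every timed iteration.
  assert.deepStrictEqual(parse(source), expected)

  const start = process.hrtime.bigint()
  for (let i = 0; i < iterations; i++) parse(source)
  const seconds = Number(process.hrtime.bigint() - start) / 1e9
  console.log(`${name}: ${(iterations / seconds).toFixed(2)} parses/sec`)
}

// Usage (hypothetical): benchParser('@iarna/toml', require('@iarna/toml').parse, src, expected)
```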


bd82 commented on August 29, 2024

I'm a bit skeptical about digging through OSS projects for real-world examples (the truth is, most use cases for TOML aren't perf-dependent).

I agree that most TOML use cases are not performance dependent.

(As an end user, personally I think the spec compliance is way more important.)

Also agree.

I ran the file that kicked off this project through. It weighs in at 568kb and 10,000 lines.

Is it possible to share this file? I could use it to benchmark my own TOML parser.

If a parser starts including things like comments though, it's likely in a different category entirely and probably deserves to be benchmarked separately. That is, there's currently a dearth of document orientated TOML parsers, at least in JS.

Yep, there is indeed a lack; that is why I built my own parser that outputs a CST (Concrete Syntax Tree) for implementing a TOML Prettier plugin.

I have updated the benchmark suite to only run the assertion once, and I'll be updating the repo and the results in the docs sometime in the next week or two.

Sounds good.
I think from your replies that we have a few concerns here regarding performance:

  • Demonstrating that iarna-toml is fast in general
  • Doing an in-depth analysis of different parsers' strengths

Perhaps those concerns should be separated?

As a user, when I open this repo's main readme I can get overwhelmed by the large performance data table; a casual user may just want basic numbers to compare the parsers. For example, see the JavaScript Parsing Libraries benchmark I've created:

Perhaps the detailed analysis could be presented on a separate page linked from the main readme?

Cheers.
Shahar.


bd82 commented on August 29, 2024

Closing this, thanks for the discussion 😄


iarna commented on August 29, 2024

Yeah, any time, thanks for bringing it up! I did move my benchmarks close to the bottom of the readme (and link to a separate page as well, as the table is getting a bit unwieldy). (Edit: It actually resulted in me more generally reorganizing the readme.)

One interesting area of difference between the parsers is that some have a high constant-time startup cost to initializing their parser (in particular, the Chevrotain-based Bombadil), and this means it scores very poorly on very small input files but is pretty competent on large documents. Does this matter? It depends on your use case.

I was curious if that was specific to Bombadil or a more general trait of Chevrotain and hacked together a quick JSON parser to compare to the Chevrotain benchmarks but… well, their parser is just a validating grammar, not something that produces any result, so it's not actually that comparable, alas.


bd82 commented on August 29, 2024

I was curious if that was specific to Bombadil or a more general trait of Chevrotain and hacked together a quick JSON parser to compare to the Chevrotain benchmarks but… well, their parser is just a validating grammar, not something that produces any result, so it's not actually that comparable, alas.

Chevrotain does indeed have a higher startup cost, but a parser instance can (and should) be reused, so that cost is mostly paid during program initialization (and partly on the first inputs). It is therefore often irrelevant for a long-running process.
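To make the reuse point concrete, here is a generic sketch of the pattern with a hypothetical parser class (not Chevrotain's actual API): the expensive construction happens once at startup, and every subsequent parse reuses the same instance.

```js
// Generic sketch of the instance-reuse pattern (hypothetical parser; not Chevrotain's real API).
class ExpensiveParser {
  constructor () {
    // Stand-in for the one-time grammar analysis work done at construction time.
    this.table = new Array(1e6).fill(0)
  }
  parse (text) {
    // Stand-in for the real parse; a real implementation would return a CST/AST.
    return { length: text.length }
  }
}

// Pay the construction cost once, during program initialization...
const parser = new ExpensiveParser()

// ...then reuse the same instance for every input, so the startup cost
// becomes irrelevant in a long-running process.
function parseToml (text) {
  return parser.parse(text)
}
```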

The reason for the initialization cost is that, like parser combinators, Chevrotain does not do any code generation, so all kinds of analysis (left-recursion detection, grammar building/resolving, ...) are done at runtime. There are ways to improve the cold-start performance characteristics.

But thinking about it now makes me realize that there may be low-hanging fruit in that scenario, so I will investigate it further. 😄
Chevrotain/chevrotain#907

Cheers.


LongTengDao commented on August 29, 2024

If a parser starts including things like comments though, it's likely in a different category entirely and probably deserves to be benchmarked separately. That is, there's currently a dearth of document orientated TOML parsers, at least in JS.

@iarna Hi, what do you mean by "document orientated parser"?

