So, I've been keeping the topic of parsers combinators on the back of my head since ~2


Parsers combinators about trial.protocol HOT 7 OPEN

breese commented on August 29, 2024

Parsers combinators

from trial.protocol.

Comments (7)

breese commented on August 29, 2024

chunk_parser has a function that may help. next(const view_type&) instructs the parser to continue from a new view while preserving some internal state (e.g. nesting.)


vinipsmaker commented on August 29, 2024

next<class Token>() could have some of its checks converted from ec = token::code::error_* to assert(state != ...), which might be cheaper and generate less code (assuming clang is not already smart enough to evaluate much of that code when constants are used, which it usually is). It's not urgent, though. As you've said, next(const view_type&) can already be used. This issue is more of a place to discuss ideas.


breese commented on August 29, 2024

Maybe next(const view_type&) should be renamed to resume(view_type) to make its purpose clearer. We could also add this function to the normal reader.
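To make the intent of such a resume() concrete, here is a minimal sketch. It assumes a hypothetical toy_reader that only tracks bracket nesting; it is not the library's chunk_parser or reader, and every name in it is invented for illustration.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in, NOT trial.protocol's chunk_parser or json::reader.
class toy_reader {
public:
    explicit toy_reader(std::string_view input) : view_(input) {}

    // Consume one character and track bracket nesting.
    // Returns false when the current view is exhausted.
    bool next() {
        if (view_.empty()) return false;
        if (view_.front() == '[') ++depth_;
        if (view_.front() == ']') --depth_;
        view_.remove_prefix(1);
        return true;
    }

    // Continue from a new view; the nesting depth is deliberately kept.
    void resume(std::string_view input) { view_ = input; }

    int depth() const { return depth_; }
    std::string_view tail() const { return view_; }

private:
    std::string_view view_;
    int depth_ = 0;
};

// Usage: the nesting state survives the switch to a second chunk.
inline void resume_demo() {
    toy_reader reader("[[1,");
    while (reader.next()) {}
    assert(reader.depth() == 2);  // two arrays are still open
    reader.resume("2]]");         // e.g. the next chunk of input
    while (reader.next()) {}
    assert(reader.depth() == 0);  // both arrays closed across the boundary
}
```

The only point of the sketch is that resume() replaces the view and nothing else, which is what the proposed name is meant to convey.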


breese commented on August 29, 2024

The mental model we have been using so far is to parse JSON until it encounters unknown data, at which point we switch to customized parsing.

Instead we could check for the custom data format before parsing as JSON. For instance, we could add a skip_until() algorithm that takes a predicate and continues skipping until the predicate is true. The predicate takes the tail view so it can detect custom data.

Something like this:

auto after = skip_until(reader, [] (view_type input) { return !input.empty() && input.front() == '%'; });
// Do custom parsing now
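For illustration, skip_until() could be sketched roughly as below. This assumes a hypothetical toy_reader whose "tokens" are single characters and which exposes tail() and next(); the real reader's interface and token granularity may differ.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical stand-in for a pull-style reader, NOT trial.protocol's json::reader.
// A "token" here is a single character, purely to keep the sketch small.
class toy_reader {
public:
    explicit toy_reader(std::string_view input) : tail_(input) {}

    std::string_view tail() const { return tail_; }

    // Advance by one token. Returns false when the input is exhausted.
    bool next() {
        if (tail_.empty()) return false;
        tail_.remove_prefix(1);
        return true;
    }

private:
    std::string_view tail_;
};

// Advance the reader until pred(tail) is true or the input runs out.
// Returns the tail at the stopping point (empty if the predicate never matched).
// The predicate is only ever called with a non-empty tail.
template <typename Reader, typename Predicate>
std::string_view skip_until(Reader& reader, Predicate pred) {
    while (!reader.tail().empty() && !pred(reader.tail())) {
        if (!reader.next()) break;
    }
    return reader.tail();
}

// Usage: stop in front of the first '%' marker.
inline void skip_until_demo() {
    toy_reader reader("[1,2,%custom]");
    auto after = skip_until(reader, [](std::string_view input) {
        return input.front() == '%';
    });
    assert(after == "%custom]");
}
```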


vinipsmaker commented on August 29, 2024

This idea reminds me of the ambiguity problem. re2c solves ambiguity by using the following rule:

> If multiple rules match the same string, the earlier rule takes precedence.

> we could add a skip_until()

I don't even think a helper algorithm is necessary. The user could just try the mini-parser before the JSON parser. They can use the mini_parser.literal().size() value to know how much input in json_reader.tail() should be skipped. That is more or less already the way to combine multiple JSON parsers. For instance, I've mentioned using another chunked_parser to implement partial::{skip,parse}() in another message.
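A rough sketch of the "try the mini-parser first" approach, using toy matchers rather than the library's parsers. The hex-literal extension, match_hex, match_number, and next_token are all hypothetical; the only idea taken from the discussion is that the size of the matched literal tells how much of the tail to skip.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

enum class kind { hex, number, error };

struct token {
    kind type;
    std::string_view literal;  // matched text; its size() is how much to skip
};

// Hypothetical mini-parser: matches "0x" followed by at least one hex digit.
std::string_view match_hex(std::string_view input) {
    if (input.size() < 3 || input[0] != '0' || input[1] != 'x') return {};
    std::size_t n = 2;
    while (n < input.size() && std::isxdigit(static_cast<unsigned char>(input[n]))) ++n;
    return n > 2 ? input.substr(0, n) : std::string_view{};
}

// Toy stand-in for the builtin parser: matches a run of decimal digits.
std::string_view match_number(std::string_view input) {
    std::size_t n = 0;
    while (n < input.size() && std::isdigit(static_cast<unsigned char>(input[n]))) ++n;
    return input.substr(0, n);
}

// Ordered choice: the mini-parser is tried first, so it wins when both match.
// On "0x1F" the builtin rule alone would match only "0".
token next_token(std::string_view& tail) {
    kind type = kind::hex;
    std::string_view literal = match_hex(tail);
    if (literal.empty()) {
        type = kind::number;
        literal = match_number(tail);
    }
    if (literal.empty()) type = kind::error;
    tail.remove_prefix(literal.size());  // skip exactly what was matched
    return {type, literal};
}

// Usage: the ambiguous prefix "0" is resolved in favour of the mini-parser.
inline void precedence_demo() {
    std::string_view input = "0x1F";
    token t = next_token(input);
    assert(t.type == kind::hex);
    assert(t.literal == "0x1F");
    assert(input.empty());
}
```

Swapping the order of the two match calls would give the builtin rule precedence instead, which is the choice discussed in the following paragraphs.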

The chosen approach changes how the composed parser solves ambiguity. If multiple parsers match the same substream, which one should take precedence? That choice can be fixed or can be left to the user.

If no API is added then the user will have the choice to try the mini-parser first as you've pointed out. That's something he can already do. Then the mini-parser will always have precedence over the builtin rules.

If the JSON parser is changed in a way that makes error_unexpected_token a non-terminal state, then the user will have two choices that only affect the precedence (either the previous solution or trying the JSON parser first). Honestly, I don't think it's really important to let the user have a choice here. I can't really imagine how ambiguous rules could be a good way to extend the JSON syntax.

However, I'm only looking at the conceptual solution here. How do you think performance would be affected, for instance? That's another perspective, and I think you're in a better position to judge it reliably.


breese commented on August 29, 2024

I definitely believe that the user should be able to resolve ambiguity. This may require domain-specific knowledge which we do not have.

The mini-parser idea is actually how chunk_reader works. It copies its internal state so it can restore the state if parsing fails due to incomplete input. It also knows that it does not need a full copy of the nesting levels, as parsing failure can only mess with the top-most element. On the other hand, it does create a copy of the state on each next() call. In contrast, if the user creates his own copy (mini-parser) then he can continue parsing on that copy until it fails. So there will overall be fewer copies made. Keep in mind that sizeof(json::reader) is 328 bytes (on an ILP64 architecture), most of which is internally cached pointers to optimize parser performance.
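The difference between copying on every next() call and copying once per attempt can be sketched as follows. The toy_reader below is a hypothetical, much smaller stand-in for the real reader state; parse_value is likewise invented for illustration.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical, trivially copyable reader state; NOT the real json::reader.
struct toy_reader {
    std::string_view tail;
    int depth = 0;

    // Consume one token: '[' opens, ']' closes, anything else is a scalar.
    // Returns false when the input runs out or on a stray ']'.
    bool next() {
        if (tail.empty()) return false;
        const char c = tail.front();
        if (c == ']' && depth == 0) return false;
        if (c == '[') ++depth;
        if (c == ']') --depth;
        tail.remove_prefix(1);
        return true;
    }
};

// Parse one complete top-level value speculatively on a copy and commit
// only on success. One copy is made per attempt, not one per next() call.
bool parse_value(toy_reader& reader) {
    assert(reader.depth == 0);      // sketch only handles top-level values
    toy_reader attempt = reader;    // the user's own copy ("mini-parser")
    do {
        if (!attempt.next()) return false;  // reader untouched: implicit rollback
    } while (attempt.depth > 0);
    reader = attempt;               // commit the progress
    return true;
}
```

With incomplete input such as `[ab`, parse_value returns false and the original reader still points at the start of the value, so the caller can retry once more input has arrived.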

The biggest challenge with making error_unexpected_token a non-terminal state is that it is returned in different contexts, some of which are clearly JSON parsing failures, such as a missing JSON Object value (e.g. {"key":}). There is no show-stopper here, but it does take some work to figure out which is which.


vinipsmaker commented on August 29, 2024

> In contrast, if the user creates his own copy (mini-parser) then he can continue parsing on that copy until it fails. So there will overall be fewer copies made.

I've lost you here. Can you elaborate? What do you mean by the user creating his own copy? What scenario of combining parsers would be this exactly?

> The biggest challenge with making error_unexpected_token a non-terminal state [...] There is no show-stopper here, but it does take some work to figure out which is which.

Turning error_unexpected_token into a non-terminal state is useful for a limited scenario (you want parser combinators, and you want the main JSON parser to take precedence over your mini-parsers), but I haven't reached this scenario yet.

So far the only show-stopper for me is the inability to distinguish between error_unexpected_token and "need more input". I can't implement support for concatenated JSON streams in tabjson without this (otherwise at best I'd be buffering more of the stream endlessly). I think I can already work around everything else.
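To illustrate why the distinction matters for concatenated streams, here is a small sketch over a toy grammar (bracketed lists of digits). The status enum, scan_value, and drain are hypothetical names and do not correspond to the library's error codes or API.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

enum class status { complete, need_more_input, unexpected_token };

struct scan_result {
    status code;
    std::size_t consumed;  // size of the complete value, 0 otherwise
};

// Toy grammar: a value is a bracketed list such as "[1,[2,3]]".
// Running out of input and hitting an invalid character are reported separately.
scan_result scan_value(std::string_view buffer) {
    int depth = 0;
    for (std::size_t i = 0; i < buffer.size(); ++i) {
        const char c = buffer[i];
        if (c == '[') {
            ++depth;
        } else if (c == ']') {
            if (depth == 0) return {status::unexpected_token, 0};
            if (--depth == 0) return {status::complete, i + 1};
        } else if (depth == 0 ||
                   !(std::isdigit(static_cast<unsigned char>(c)) || c == ',')) {
            return {status::unexpected_token, 0};
        }
    }
    return {status::need_more_input, 0};
}

// Split as many complete values as possible off the front of the buffer.
// need_more_input: keep the remaining buffer and read more from the stream.
// unexpected_token: stop buffering and report the error.
status drain(std::string_view& buffer, std::vector<std::string_view>& out) {
    for (;;) {
        const scan_result r = scan_value(buffer);
        if (r.code != status::complete) return r.code;
        assert(r.consumed > 0);  // a complete value always makes progress
        out.push_back(buffer.substr(0, r.consumed));
        buffer.remove_prefix(r.consumed);
    }
}
```

If both outcomes were reported as the same error, the caller could not tell whether to wait for more data or give up, which is the endless-buffering problem described above.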

> The biggest challenge with making error_unexpected_token a non-terminal state is that it is returned in different contexts, some of which are clearly JSON parsing failures, such as a missing JSON Object value (e.g. {"key":}). There is no show-stopper here, but it does take some work to figure out which is which.

The design here is certainly tricky.
