
Parser combinators about trial.protocol (open)

breese commented on August 29, 2024
Parser combinators

from trial.protocol.

Comments (7)

breese commented on August 29, 2024

chunk_parser has a function that may help: next(const view_type&) instructs the parser to continue from a new view while preserving some internal state (e.g., the nesting levels).
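
To make "state that survives a new view" concrete, here is a minimal sketch. toy_chunk_parser is a hypothetical class, not trial.protocol's chunk_parser: it only tracks bracket nesting (it does not handle brackets inside strings) and keeps that nesting stack when next() is handed the following chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical toy, not trial.protocol's chunk_parser. It only tracks
// bracket nesting and ignores everything else, including the fact that
// brackets may legitimately appear inside JSON strings.
class toy_chunk_parser {
public:
    // Continue from a new view; the nesting stack from earlier chunks is kept.
    void next(std::string_view chunk) {
        for (char c : chunk) {
            if (c == '[' || c == '{') {
                nesting.push_back(c);
            } else if ((c == ']' || c == '}') && !nesting.empty()) {
                nesting.pop_back();
            }
        }
    }

    std::size_t depth() const { return nesting.size(); }

private:
    std::vector<char> nesting; // open brackets, outermost first
};
```

Feeding `[{"a":` and then `1}]` leaves the depth at 2 after the first chunk and at 0 after the second, because the stack lives in the parser rather than in the view.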


vinipsmaker commented on August 29, 2024

Some of the checks in next<Token>() could be converted from ec = token::code::error_* assignments to assert(state != ...), which might make it cheaper and shrink the generated code (unless clang is already smart enough to "execute" much of that code when constants are used, which it usually is). It's not urgent, though. As you've said, next(const view_type&) can already be used. This issue is more of a place to discuss ideas.
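
A sketch of the two checking styles, under stated assumptions: token, error_code and token_cursor are hypothetical stand-ins, not trial.protocol's types. next_checked<Token>() reports a mismatch through an error code, while next_unchecked<Token>() treats the match as a caller precondition and asserts it, so the check disappears in builds with NDEBUG defined.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; trial.protocol's real token/error types differ.
enum class token { begin_array, end_array, integer };
enum class error_code { none, unexpected_token };

struct token_cursor {
    std::vector<token> tokens;
    std::size_t pos = 0;

    // Runtime check: any Token may be requested, so a mismatch (or the
    // end of input) is reported through an error code.
    template <token Token>
    error_code next_checked() {
        if (pos >= tokens.size() || tokens[pos] != Token)
            return error_code::unexpected_token;
        ++pos;
        return error_code::none;
    }

    // Precondition: the caller guarantees the current token is Token.
    // The assert is compiled out when NDEBUG is defined.
    template <token Token>
    void next_unchecked() {
        assert(pos < tokens.size() && tokens[pos] == Token);
        ++pos;
    }
};
```

Whether the unchecked form actually produces smaller code depends on how much the optimizer already folds away when Token is a compile-time constant, which is the open question above.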


breese commented on August 29, 2024

Maybe next(const view_type&) should be renamed to resume(view_type) to make its purpose more clear. We could also add this function to the normal reader.


breese commented on August 29, 2024

The mental model we have been using so far is to parse JSON until the parser encounters unknown data, at which point we switch to custom parsing.

Instead, we could check for the custom data format before parsing the input as JSON. For instance, we could add a skip_until() algorithm that takes a predicate and keeps skipping until the predicate returns true. The predicate receives the tail view (the remaining input), so it can detect custom data.

Something like this:

auto after = skip_until(reader, [] (view_type input) { return !input.empty() && input.front() == '%'; });
// Do custom parsing now
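
A minimal sketch of how skip_until() could behave, assuming a reader that exposes tail() and next(). word_reader is hypothetical and simply splits its input on whitespace; it is not json::reader. The predicate is only called on a non-empty tail, so it may inspect front() safely.

```cpp
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical stand-in for a reader: splits input on whitespace and
// exposes the unparsed remainder through tail().
class word_reader {
public:
    explicit word_reader(std::string_view input) : rest(input) { skip_ws(); }

    std::string_view tail() const { return rest; }

    // Advance past one whitespace-delimited token.
    bool next() {
        if (rest.empty()) return false;
        while (!rest.empty() && !is_space(rest.front())) rest.remove_prefix(1);
        skip_ws();
        return true;
    }

private:
    static bool is_space(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    void skip_ws() {
        while (!rest.empty() && is_space(rest.front())) rest.remove_prefix(1);
    }
    std::string_view rest;
};

// Skip tokens until pred(tail) is true or the input runs out; returns the tail.
template <typename Reader, typename Predicate>
std::string_view skip_until(Reader& reader, Predicate pred) {
    while (!reader.tail().empty() && !pred(reader.tail())) {
        reader.next();
    }
    return reader.tail();
}
```

With input `1 2 %custom data` and a predicate that checks for a leading `%`, the returned tail is `%custom data`, which is where the custom parser would take over.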


vinipsmaker commented on August 29, 2024

This idea reminds me of the ambiguity problem. re2c resolves ambiguity using the following rule:

If multiple rules match the same string, the earlier rule takes precedence.
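
For reference, re2c combines this tie-breaking rule with a longest-match rule: the longest match wins, and among matches of the same length the earlier rule wins. A minimal sketch of that policy follows; rule and pick_rule are hypothetical names, and each rule returns how many characters it matches at the start of the input (0 for no match).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string_view>
#include <vector>

// A rule reports how many characters it matches at the start of input.
using rule = std::function<std::size_t(std::string_view)>;

// Longest match wins; on equal length the earliest rule wins.
// Returns the index of the winning rule, or nullopt if nothing matched.
std::optional<std::size_t> pick_rule(const std::vector<rule>& rules,
                                     std::string_view input) {
    std::optional<std::size_t> best;
    std::size_t best_len = 0;
    for (std::size_t i = 0; i < rules.size(); ++i) {
        std::size_t len = rules[i](input);
        if (len > best_len) { // strict '>' keeps the earlier rule on ties
            best = i;
            best_len = len;
        }
    }
    return best;
}
```

With a keyword rule for `true` listed before a generic lowercase-word rule, `true` goes to the keyword (same length, earlier rule) while `trueish` goes to the word rule (longer match).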


we could add a skip_until()

I don't even think a helper algorithm is necessary. The user could just try the mini-parser before the JSON parser and use mini_parser.literal().size() to know how much of json_reader.tail() to skip. That is essentially how multiple JSON parsers can already be combined. For instance, in another message I mentioned using another chunked_parser to implement partial::{skip,parse}().
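
A minimal sketch of trying the mini-parser first. percent_parser is a hypothetical mini-parser for a made-up `%name` extension; its literal() plays the role described above, and its size tells the caller how much of the reader's tail to skip before handing the rest to the JSON parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical mini-parser: matches '%' followed by lowercase letters.
class percent_parser {
public:
    explicit percent_parser(std::string_view input) {
        if (!input.empty() && input.front() == '%') {
            std::size_t n = 1;
            while (n < input.size() && input[n] >= 'a' && input[n] <= 'z') ++n;
            matched = input.substr(0, n);
        }
    }
    bool ok() const { return !matched.empty(); }
    std::string_view literal() const { return matched; }

private:
    std::string_view matched; // empty when the input is not custom data
};

// Try the mini-parser on the tail first; if it matches, skip its literal.
// Otherwise return the tail unchanged so the JSON parser can take over.
std::string_view try_custom_first(std::string_view tail) {
    percent_parser mini(tail);
    if (mini.ok()) tail.remove_prefix(mini.literal().size());
    return tail;
}
```

Because the mini-parser always runs first, it implicitly has precedence over the built-in JSON rules.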

The chosen approach changes how the composed parser resolves ambiguity. If multiple parsers match the same substream, which one should take precedence? That choice can be fixed or left to the user.

If no API is added, the user can choose to try the mini-parser first, as you've pointed out. That's something he can already do, and the mini-parser will then always take precedence over the built-in rules.

If the JSON parser is changed so that error_unexpected_token becomes a non-terminal state, the user will have two choices that only affect precedence: the previous solution, or trying the JSON parser first. Honestly, I don't think it's really important to give the user a choice here. I can't imagine how ambiguous rules could be a good idea for extending the JSON syntax.
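
The two precedence choices can be sketched as a single dispatch function. Everything here is hypothetical: the JSON side only recognizes the literals true, false and null, and the custom side accepts any bare lowercase word, so `null` is deliberately ambiguous between them. The json_first branch assumes error_unexpected_token has become recoverable, so a failed JSON attempt can fall through to the custom parser.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string_view>

enum class precedence { custom_first, json_first };
enum class owner { json, custom, none };

// Toy JSON recognizer: only the literals true, false and null.
std::size_t match_json_literal(std::string_view s) {
    for (std::string_view lit : {"true", "false", "null"}) {
        if (s.substr(0, lit.size()) == lit) return lit.size();
    }
    return 0;
}

// Toy custom recognizer: any run of lowercase letters.
std::size_t match_custom_word(std::string_view s) {
    std::size_t n = 0;
    while (n < s.size() && s[n] >= 'a' && s[n] <= 'z') ++n;
    return n;
}

// Decide which parser owns the input under the chosen precedence.
owner dispatch(std::string_view s, precedence p) {
    std::size_t json_len = match_json_literal(s);
    std::size_t custom_len = match_custom_word(s);
    if (p == precedence::custom_first) {
        if (custom_len) return owner::custom;
        return json_len ? owner::json : owner::none;
    }
    if (json_len) return owner::json;
    return custom_len ? owner::custom : owner::none;
}
```

Only ambiguous inputs such as `null` change owner when the policy flips; unambiguous inputs such as `nan` end up with the same parser either way.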

However, I'm only looking at the conceptual side here. How do you think performance would be affected, for instance? That's another perspective, and I think you're in a better position to judge it.


breese commented on August 29, 2024

I definitely believe that the user should be able to resolve ambiguity. This may require domain-specific knowledge which we do not have.

The mini-parser idea is actually how chunk_reader works. It copies its internal state so it can restore that state if parsing fails due to incomplete input. It also knows that it does not need a full copy of the nesting levels, because a parsing failure can only affect the top-most element. On the other hand, it creates a copy of the state on each next() call. In contrast, if the user creates his own copy (a mini-parser), he can continue parsing on that copy until it fails, so fewer copies are made overall. Keep in mind that sizeof(json::reader) is 328 bytes (on an ILP64 architecture), most of which consists of internally cached pointers that optimize parser performance.
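
A minimal sketch of the user-held copy, assuming a copyable reader. digit_reader is hypothetical (it reads comma-separated digit runs) and far smaller than json::reader. The reader is copied once per attempt rather than once per next() call; on failure the copy is dropped and the original is untouched, and on success the copy is committed.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical copyable reader over comma-separated runs of digits,
// standing in for a much larger reader such as json::reader.
struct digit_reader {
    std::string_view tail;
    std::size_t values = 0;

    // Consume one digit run plus an optional trailing comma. On failure
    // nothing is consumed and false is returned.
    bool next() {
        std::size_t n = 0;
        while (n < tail.size() && tail[n] >= '0' && tail[n] <= '9') ++n;
        if (n == 0) return false;
        tail.remove_prefix(n);
        if (!tail.empty() && tail.front() == ',') tail.remove_prefix(1);
        ++values;
        return true;
    }
};

// One copy per attempt: parse on the copy until it fails, then commit the
// copy only if all input was consumed. Otherwise the original is untouched.
bool try_parse_all(digit_reader& reader) {
    digit_reader mini = reader;
    while (mini.next()) {
    }
    if (!mini.tail.empty()) return false; // failure: discard the copy
    reader = mini;                         // success: commit
    return true;
}
```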

The biggest challenge with making error_unexpected_token a non-terminal state is that it is returned in different contexts, some of which are clearly JSON parsing failures, such as a missing JSON Object value (e.g. {"key":}). There is no show-stopper here, but it does take some work to figure out which is which.


vinipsmaker commented on August 29, 2024

In contrast, if the user creates his own copy (mini-parser) then he can continue parsing on that copy until it fails. So there will overall be fewer copies made.

You lost me here. Can you elaborate? What do you mean by the user creating his own copy? Which parser-combining scenario would this be, exactly?

The biggest challenge with making error_unexpected_token a non-terminal state [...] There is no show-stopper here, but it does take some work to figure out which is which.

Turning error_unexpected_token into a non-terminal state is useful only in a limited scenario (you want parser combinators, and you want the main JSON parser to take precedence over your mini-parsers), but I haven't run into that scenario yet.

So far the only show-stopper for me is the inability to distinguish between error_unexpected_token and "need more input". I can't implement support for concatenated JSON streams in tabjson without this (otherwise at best I'd be buffering more of the stream endlessly). I think I can already work around everything else.
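
A rough sketch of the distinction being asked for, as it applies to framing concatenated JSON. scan_first_value and scan_status are hypothetical: the function only tracks bracket depth and string escapes, does not validate the contents, and accepts only objects or arrays at the top level. An incomplete value yields need_more_input, while a stray closing bracket yields unexpected_token, so a caller never has to keep buffering on a genuine error.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

enum class scan_status { complete, need_more_input, unexpected_token };

struct scan_result {
    scan_status status;
    std::size_t size; // bytes from the start of buf through the first value
};

// Find where the first top-level object or array ends in buf.
scan_result scan_first_value(std::string_view buf) {
    std::size_t i = 0;
    while (i < buf.size() && (buf[i] == ' ' || buf[i] == '\n' || buf[i] == '\t' || buf[i] == '\r')) ++i;
    if (i == buf.size()) return {scan_status::need_more_input, 0};
    if (buf[i] != '{' && buf[i] != '[') return {scan_status::unexpected_token, 0};
    std::size_t depth = 0;
    bool in_string = false;
    bool escaped = false;
    for (; i < buf.size(); ++i) {
        char c = buf[i];
        if (in_string) {
            if (escaped) escaped = false;
            else if (c == '\\') escaped = true;
            else if (c == '"') in_string = false;
        } else if (c == '"') {
            in_string = true;
        } else if (c == '{' || c == '[') {
            ++depth;
        } else if (c == '}' || c == ']') {
            if (--depth == 0) return {scan_status::complete, i + 1};
        }
    }
    return {scan_status::need_more_input, 0}; // value not closed yet
}
```

For `{"a":[1]}{"b":2}` the first value is complete after 9 bytes, and the caller can resume scanning from there for the next value.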

The biggest challenge with making error_unexpected_token a non-terminal state is that it is returned in different contexts, some of which are clearly JSON parsing failures, such as a missing JSON Object value (e.g. {"key":}). There is no show-stopper here, but it does take some work to figure out which is which.

The design here is certainly tricky.
