
Comments (11)

breese commented on August 29, 2024

Are you considering using exceptions to improve performance?


vinipsmaker commented on August 29, 2024

Are you considering using exceptions to improve performance?

Not really, as an exception should be way slower. But I want to experiment. The compiler has access to the whole code, so maybe it'd do control-flow transformations and optimize the exception away.

In any case, the string collector API may be a vector::push_back() that throws because of bad_alloc.


breese commented on August 29, 2024

The collector is used in a critical part of the code where performance is important.

It is possible to work around the bad_alloc exception by querying the string size in advance using reader.literal().size() which will give you an overestimate of the final string size. This can be used to reserve sufficient memory before the collector is called.
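
In code, the workaround would look roughly like this (a sketch only, assuming the collector is a std::string as in the loop examples later in this thread, and a json::reader named reader):

std::string buffer;
// literal() exposes the raw, still-escaped token text, so its size is an
// upper bound on the decoded length; reserving up front moves any allocation
// (and therefore any bad_alloc) out of the collector call itself.
buffer.reserve(reader.literal().size());
reader.string(buffer);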

I acknowledge that this is different from your pattern matching use case.


vinipsmaker commented on August 29, 2024

The collector is used in a critical part of the code where performance is important.

I don't mean to make the collector API noexcept(false). I mean to make it conditionally noexcept, FWIW. Old use cases shouldn't be affected, and if a different implementation is required for the noexcept(false) case, std::enable_if should help.
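
To make this concrete, here's a rough sketch of what I mean by conditionally noexcept; the free function and the append() customization point are just illustrative, not the existing interface:

#include <cstddef>

// The exception specification mirrors the collector's: a throwing collector
// (e.g. one backed by std::string, which may throw std::bad_alloc) keeps the
// current behaviour, while a non-throwing collector makes the call noexcept.
template <typename Collector>
void collect_string(const char* data, std::size_t size, Collector& out)
    noexcept(noexcept(out.append(data, size)))
{
    out.append(data, size); // real code would unescape before appending
}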

It is possible to work around the bad_alloc exception by querying the string size in advance using reader.literal().size()

Fair enough.

I acknowledge that this is different from your pattern matching use case.

I still have to run benchmarks. Using value<std::string>() instead should be faster on some data (and I can reuse the storage across several calls on the same nesting level). The question is when to use each approach. Getting a good data set for the benchmark is challenging though, so I'm more inclined to just use a macro to select the approach.


vinipsmaker commented on August 29, 2024

It is possible to work around the bad_alloc exception by querying the string size in advance using reader.literal().size()

That's a killer argument for the noexcept case.

I'll think more about the problem.

In the meantime...

I acknowledge that this is different from your pattern matching use case.

What do you think about the collector returning false to stop further iteration? reader.string(collector) itself would return the collector's return value. Maybe surround the whole thing in enable_ifs so the current API doesn't break.
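
Something along these lines (purely illustrative; the per-chunk operator() signature is made up for this sketch, not the current collector interface):

#include <cstddef>

// A collector that gives up as soon as the decoded string can no longer match
// the expected key; returning false would stop further iteration, and
// reader.string(collector) would forward that false to the caller.
struct key_matcher {
    const char* expected;
    std::size_t pos = 0;

    bool operator()(const char* data, std::size_t size) {
        for (std::size_t i = 0; i < size; ++i) {
            if (expected[pos] == '\0' || expected[pos] != data[i])
                return false; // mismatch: no point decoding the rest
            ++pos;
        }
        return true; // keep going
    }
};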


vinipsmaker commented on August 29, 2024

I still have to run benchmarks. Using value<std::string>() instead should be faster on some data

Actually, there's a better way for my pattern matching. I must be stupid for not realizing this sooner.

I could collect the string once into a static array whose capacity is N + 1, where N is the size of the largest string in the static search set. I'll implement this change over the weekend. That way both downsides are avoided: dynamic allocation and decoding the same value multiple times.
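
Roughly this shape (just the idea; the actual implementation is linked in the next comment, and the append() signature here is only illustrative):

#include <cstddef>

// Capacity is N + 1, where N is the size of the largest string in the static
// search set: a decoded string that reaches N + 1 characters is already too
// long to match anything, so decoding can stop, and no allocation is needed.
template <std::size_t N>
struct bounded_collector {
    char buffer[N + 1];
    std::size_t size = 0;

    bool append(char c) {
        if (size == N + 1)
            return false; // longer than every candidate: give up
        buffer[size++] = c;
        return true;
    }
};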


vinipsmaker commented on August 29, 2024

I'll implement this change over the weekend.

Well, here is the new approach. I'm only posting the code for the record. This take is pretty straightforward and not worth discussing IMO.

https://github.com/vinipsmaker/trial.protocol/blob/0417cc8be422e93a5df73a025ea85f4775ec7d47/include/trial/protocol/json/scan.hpp#L40-L87


vinipsmaker commented on August 29, 2024

Here's a different idea.

Trial.Protocol splits the match and decode steps. This is beneficial when we're not interested in the decoded values, and the performance benefit is very real and very measurable. It's not only useful for performance; this flexibility also allows json::reader to fit into many different designs. Even better, it's the code that the user doesn't write that enables this optimization, so the user doesn't need to do anything to get it for free.

However, merging the match and decode steps can be useful for performance when you know you're interested in the value. Trial.Protocol's interface for matching is:

reader.next();

Could you keep the following in the back of your mind for a few days to decide what you think?

// new overload
reader.next(matching_options);

MatchingOptions is a template parameter of json::reader::next(), and only the members it defines would be used. One such member would be the string collector, with the exact same interface that is already available. It would allow the matching and decoding steps to be merged when desired.
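
For example, a user-defined MatchCollector could look roughly like this (the member names are illustrative, not a settled interface):

#include <cstddef>
#include <string>

// Bundles the customization points that next(matching_options) would consult.
// Only the string collector is defined here, so string tokens get decoded
// while they are matched; everything else keeps the current behaviour.
struct MatchCollector {
    std::string& out;

    explicit MatchCollector(std::string& target) : out(target) {}

    // string-collector hook with the same shape as the existing collector API
    void append(const char* data, std::size_t size) { out.append(data, size); }
};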

Loops such as:

for (;;) {
    // ...

    auto current_key = reader.value<std::string>();
    if (current_key == "foo") {
        if (!reader.next()) throw Error();
        reader.string(msg.foo);
        if (!reader.next()) throw Error();
    } else if (current_key == "bar") {
        if (!reader.next()) throw Error();
        reader.string(msg.bar);
        if (!reader.next()) throw Error();
    } else {
        if (!reader.next()) throw Error();
        json::partial::skip(reader);
    }
}

Could become:

for (;;) {
    // ...

    auto current_key = reader.value<std::string>();
    if (current_key == "foo") {
        // MatchCollector here is a user-defined class that already fills
        // the required user-customization points
        if (!reader.next(MatchCollector(msg.foo))) throw Error();
        if (reader.symbol() != json::token::symbol::string) throw Error();
        if (!reader.next()) throw Error();
    } else if (current_key == "bar") {
        if (!reader.next(MatchCollector(msg.bar))) throw Error();
        if (reader.symbol() != json::token::symbol::string) throw Error();
        if (!reader.next()) throw Error();
    } else {
        if (!reader.next()) throw Error();
        json::partial::skip(reader);
    }
}

Do notice that the same idea can be used to fill the current_key variable in the example, but I wanted to keep it simple.

Another useful member of MatchingOptions could be a tail() function that gets called with the current token before the actual matching step begins. This is useful so you can parse a group of tokens but still capture the offsets to build a string slice, which happens to be the same as the return value from json::partial::skip(). For instance:

auto head = /* ... */;
auto val = json::partial::parse(reader, my_match_options);
auto tail = my_match_options.tail;
std::string_view lit(head, tail - head);

I had this tail() idea while coding a recent gawk plug-in using Trial.Protocol. It'd allow greater encapsulation of responsibilities in a few cases. In the plug-in, I always store offsets for the interest set present in the JSON so I can easily rebuild a new JSON later by simple string concatenation. If I start using json::partial::parse() I'll lose the ability to capture the tail of the subtree, but this MatchingOptions interface would allow me to save the tail offset (json::partial::parse() should only use the user-provided matching_options object in the last call to reader.next(), the one that consumes the last token from the related group).


breese commented on August 29, 2024

Is the purpose of this API to improve usability or to get better performance?

The reason I ask is that most of the value<T>() code assumes that next() has found the end of the token and checked the format, so if next(MatchingOptions) is not simply a wrapper for a next() followed by a value<T>(), then we would need to rewrite a significant portion of the underlying decoder.


vinipsmaker commented on August 29, 2024

Is the purpose of this API to improve usability or to get better performance?

The idea is to avoid walking over the string/token twice, so performance. You can ignore the tail() thing for the moment.


vinipsmaker commented on August 29, 2024

I've kept cooking the next(/*callbacks=*/matching_options) idea in the back of my head, and I think it can also be used to provide an alternative way to offer the same feature I described in Appendix A of the email I sent to the Boost mailing list (i.e. you would not need to change/complicate the event set to support a string token split across different buffers).

The idea is for the MatchCollector to check and export nread for the partial token and modify the string in place (after the next() call finishes, of course) to discard the already-consumed string slice (not counting partial sequences that need to be fed into the next buffer, and so on). It's easier to explain with code, but that's the idea.

