Comments (14)
I think the "shape" system I briefly described in TkTech/pysimdjson#74 (comment) would work for the case you're describing. The shape is compiled into a simple bytecode in Python, and run in Cython/C. For many repeated structure transformations it's usually 1 or 2 orders of magnitude faster.
If you could provide a few examples of exactly how you're trying to transform the JSON, I can use it as an example use-case and ensure it performs well.
from aiodynamo.
Or https://github.com/TeskaLabs/cysimdjson for that matter.
Then, if that's promising, we could re-code dy2py in Cython.
Just letting us supply our own `loads`/`dumps` would be a great start; plugging in ujson or orjson is a large speedup in JSON parsing, and it's essentially free.
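A minimal sketch of what pluggable (de)serializers could look like. The `Client` class, its parameters, and `parse_response` are hypothetical names for illustration, not aiodynamo's actual API; the defaults stay on the stdlib so nothing changes unless the caller opts in:

```python
import json
from typing import Any, Callable

class Client:
    """Hypothetical client accepting user-supplied JSON loads/dumps callables."""

    def __init__(
        self,
        loads: Callable[[str], Any] = json.loads,
        dumps: Callable[[Any], str] = json.dumps,
    ) -> None:
        self.loads = loads
        self.dumps = dumps

    def parse_response(self, body: str) -> Any:
        # The client never hardcodes json.loads; it calls whatever was injected.
        return self.loads(body)

# Default stdlib behaviour:
client = Client()
assert client.parse_response('{"Count": 1}') == {"Count": 1}

# Drop-in replacement (note orjson.dumps returns bytes, hence the decode):
# import orjson
# client = Client(loads=orjson.loads, dumps=lambda o: orjson.dumps(o).decode())
```

The key design point is that the library calls `self.loads`/`self.dumps` everywhere instead of the module-level `json` functions, so swapping parsers requires zero changes to call sites.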
@Tinche nice stab, but alas, Python's json is good enough. What's slow is the conversion from Amazon-style `{"foo": {"S": "bar"}}` to native Python `{"foo": "bar"}`.
It may seem trivial, but it turns out that processing millions of conversions like this takes time.
A table scan (or query) delivers thousands of items, and each item often has dozens of fields, some of them compound (arrays, mappings), so in the end there are simply too many operations to perform at the Python level.
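For concreteness, the conversion in question looks roughly like this. This is a hedged sketch covering only the common attribute tags, not aiodynamo's actual implementation, which also handles binary types, number sets, and error cases:

```python
def deserialize(attr_value):
    """Convert one DynamoDB attribute value, e.g. {"S": "bar"}, to a native Python value."""
    # Each attribute value is a single-key dict: the key is the type tag.
    (tag, value), = attr_value.items()
    if tag == "S":
        return value
    if tag == "N":
        # DynamoDB sends numbers as strings.
        return float(value) if "." in value else int(value)
    if tag == "BOOL":
        return value
    if tag == "NULL":
        return None
    if tag == "L":
        return [deserialize(v) for v in value]
    if tag == "M":
        return {k: deserialize(v) for k, v in value.items()}
    if tag == "SS":
        return set(value)
    raise ValueError(f"unsupported tag {tag!r}")

item = {"foo": {"S": "bar"}, "n": {"N": "42"}, "tags": {"L": [{"S": "a"}]}}
assert {k: deserialize(v) for k, v in item.items()} == {"foo": "bar", "n": 42, "tags": ["a"]}
```

With dozens of fields per item and thousands of items per scan, this recursion runs millions of times, which is why it dominates over JSON parsing itself.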
@dimaqq I don't know how to optimize the JSON format Dynamo expects, but that's another issue. I do know ujson is significantly faster than the stdlib json, and there's a good chance it can be used as a drop-in with zero changes. So that's a free win.
There's a set of benchmarks in this project, some don't require a dynamo server: https://github.com/HENNGE/aiodynamo/tree/master/benchmarks/deserialize
@Tinche would you like to run these against the main branch and against a temporary branch that uses a different library and post the results?
@dimaqq Sure, sounds like a plan.
I went with orjson in this case. I actually adapted the signing benchmark, since that's the one that touches JSON.
```
$ pyperf compare_to before.json after.json
Mean +- std dev: [before] 71.6 us +- 1.7 us -> [after] 58.1 us +- 1.1 us: 1.23x faster
```
Seems worthwhile to me.
the signing benchmark is valid (e.g. if someone loads a lot of data into Dynamo), but not the most important one, in my opinion.
the important bit is deserialisation, because that is the bottleneck in query/scan operations, and it cannot be trivially parallelised.
the deserialise benchmark doesn't even use JSON; rather, it's about Python code mangling AWS-style dicts into Python-style dicts. thus the important benchmark is query, and that needs to be run against well-provisioned cloud Dynamo :pain:
(alternatively, we could set up a dummy HTTP server that returns precooked JSON responses... we don't have that now, but it could be done).
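A dummy server like that can be sketched with the stdlib alone. The payload shape below is a made-up stand-in for a DynamoDB Query/Scan reply, not an existing aiodynamo fixture:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Precooked response standing in for a DynamoDB Query/Scan reply (hypothetical payload).
PAYLOAD = json.dumps({"Items": [{"foo": {"S": "bar"}}] * 1000, "Count": 1000}).encode()

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):  # DynamoDB's HTTP API uses POST
        # Drain the request body, then always return the same canned response.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Type", "application/x-amz-json-1.0")
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):  # keep benchmark output clean
        pass

# Bind an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/"
req = urllib.request.Request(url, data=b"{}", method="POST")
body = urllib.request.urlopen(req).read()
assert json.loads(body)["Count"] == 1000
server.shutdown()
```

A benchmark can then point the client at `server.server_port` and measure deserialisation without the network or a real Dynamo instance in the loop.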
I'll see if I can run some benchmarks
Early results:
```
$ pyperf compare_to aiodynamo_mock-without-orjson.json aiodynamo_mock-with-orjson.json
Mean +- std dev: [aiodynamo_mock-without-orjson] 108 ms +- 5 ms -> [aiodynamo_mock-with-orjson] 110 ms +- 5 ms: 1.02x slower
```
basically there's no difference for query or scan, which are dominated by deserialisation rather than JSON parsing.
(tested with mock http client and ~3MB response)
I mean yeah, I guess if you use a 3 megabyte payload the majority of the time is spent elsewhere. The vast majority of my payloads are much, much smaller though, so JSON decoding does play a role.
We can talk about optimizing stuff in https://github.com/HENNGE/aiodynamo/blob/963a6baecb7782fb5820179e2ec0c041a527d02e/src/aiodynamo/utils.py in another ticket. I might be able to help shave off some microseconds on some of these, which adds up.
I imagine network latency would dominate for small payloads.
Consider AWS's built-in monitoring: server-side GetItem latency is typically 4~15 ms, and latencies under 1 ms aren't even reported.
Sure, but I can't do anything about network latency. Whereas if my JSON processing is a little more efficient, there's a little less latency for the endpoint, and my asyncio service is free to dedicate time to the next request sooner. After running asyncio at scale, I've found CPU time to be a very precious resource.
Look, I'm not going to pretend this is going to be a major speedup. I am going to claim it's a very minor win that's essentially free. If you folks feel that's not the case (either it's not a win at all or it's not essentially free) I can respect that decision, it's your call after all :)
I can tell you personally, the first thing I do for our services is look at replacing the use of the stdlib json with something else, since it's very easy to do and does provide a small speedup. (Unless the service is running Pypy, that's another story.)
I think this is pretty good now.