TkTech commented on May 27, 2024

I think the "shape" system I briefly described in TkTech/pysimdjson#74 (comment) would work for the case you're describing. The shape is compiled into a simple bytecode in Python, and run in Cython/C. For many repeated structure transformations it's usually 1 or 2 orders of magnitude faster.

If you could provide a few examples of exactly how you're trying to transform the JSON, I can use it as an example use-case and ensure it performs well.

from aiodynamo.

dimaqq commented on May 27, 2024

Or https://github.com/TeskaLabs/cysimdjson for that matter.
Then, if that's promising, we could re-code dy2py in cython 🚀

Tinche commented on May 27, 2024

Just letting us supply our own loads/dumps would be a great start; just plugging in ujson or orjson gives a large speedup in JSON parsing, and it's essentially free.
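A minimal sketch of what a pluggable `loads`/`dumps` hook could look like. `HypotheticalClient` and its methods are illustrative names, not aiodynamo's actual API; the idea is only that the client stores two callables and routes all encoding and decoding through them, normalising request bodies to bytes so that either a str-returning (stdlib `json`) or a bytes-returning (`orjson`) `dumps` works.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Union

# Hypothetical hook types: loads accepts str or bytes, dumps may return either.
Loads = Callable[[Union[str, bytes]], Any]
Dumps = Callable[[Any], Union[str, bytes]]


@dataclass
class HypotheticalClient:
    """Illustrative stand-in for a DynamoDB client with pluggable JSON hooks."""

    loads: Loads = json.loads
    dumps: Dumps = json.dumps

    def encode_request(self, payload: dict) -> bytes:
        body = self.dumps(payload)
        # Normalise to bytes so a str- or bytes-returning dumps both work.
        return body.encode("utf-8") if isinstance(body, str) else body

    def decode_response(self, raw: bytes) -> Any:
        # json.loads and orjson.loads both accept bytes directly.
        return self.loads(raw)
```

With orjson installed, one would pass `HypotheticalClient(loads=orjson.loads, dumps=orjson.dumps)`; without arguments the client keeps using the standard library.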

dimaqq commented on May 27, 2024

@Tinche nice stab, but alas, Python's json is good enough.
What's slow is the conversion from amazon-style {"foo": {"S": "bar"}} to native Python {"foo": "bar"}.

It may seem trivial, but it turns out that processing millions of conversions like this takes time.
A table scan (or query) delivers thousands of items, and each item often has dozens of fields, some of which may be compound (lists, maps), so in the end there are simply too many operations to perform at the Python level.
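To make the cost concrete, here is a minimal recursive converter from DynamoDB's typed JSON (`{"foo": {"S": "bar"}}`) to plain Python (`{"foo": "bar"}`). It is a sketch, not aiodynamo's implementation; in particular, mapping `N` to `Decimal` (rather than `int`/`float`) and the set types to Python `set` are assumptions. Every attribute value, including each element of a nested list or map, costs at least one Python-level function call plus a chain of tag comparisons, which is why thousands of items with dozens of fields add up.

```python
import base64
from decimal import Decimal
from typing import Any, Dict


def from_dynamo(value: Dict[str, Any]) -> Any:
    """Convert one DynamoDB-JSON attribute value, e.g. {"S": "bar"}, to Python."""
    # Each attribute value is a single-key dict: {type_tag: payload}.
    ((tag, payload),) = value.items()
    if tag == "S":
        return payload
    if tag == "N":
        return Decimal(payload)  # numbers arrive as strings on the wire
    if tag == "BOOL":
        return payload
    if tag == "NULL":
        return None
    if tag == "B":
        return base64.b64decode(payload)  # binary is base64-encoded in JSON
    if tag == "M":
        return {k: from_dynamo(v) for k, v in payload.items()}
    if tag == "L":
        return [from_dynamo(v) for v in payload]
    if tag == "SS":
        return set(payload)
    if tag == "NS":
        return {Decimal(n) for n in payload}
    if tag == "BS":
        return {base64.b64decode(b) for b in payload}
    raise ValueError(f"unknown DynamoDB type tag: {tag!r}")


def item_from_dynamo(item: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Convert a whole item (its top-level attribute map) to a plain dict."""
    return {key: from_dynamo(value) for key, value in item.items()}
```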

Tinche commented on May 27, 2024

@dimaqq I don't know how to optimize the JSON format Dynamo expects, but that's another issue. I do know ujson is significantly faster than Python's JSON, and there's a good chance it can be used in place with zero changes. So that's a free win.
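A sketch of the "drop-in" swap with a fallback to the standard library. It is not quite zero-change: `orjson.dumps` returns `bytes` rather than `str`, so this wrapper decodes it to keep the stdlib's return type. The module-level `loads`/`dumps` names are illustrative.

```python
import json
from typing import Any, Union

try:
    import orjson  # optional third-party dependency

    def loads(data: Union[str, bytes]) -> Any:
        return orjson.loads(data)

    def dumps(obj: Any) -> str:
        # orjson.dumps returns bytes, unlike json.dumps which returns str,
        # so decode to keep the stdlib-compatible return type.
        return orjson.dumps(obj).decode("utf-8")

except ImportError:  # orjson not installed: fall back to the standard library
    loads = json.loads

    def dumps(obj: Any) -> str:
        # Compact separators and raw UTF-8 to approximate orjson's output.
        return json.dumps(obj, separators=(",", ":"), ensure_ascii=False)
```

One further caveat: orjson does not serialise `decimal.Decimal` natively (it needs a `default=` callable), which matters if DynamoDB numbers are kept as `Decimal`.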

dimaqq commented on May 27, 2024

There's a set of benchmarks in this project, some don't require a dynamo server: https://github.com/HENNGE/aiodynamo/tree/master/benchmarks/deserialize

@Tinche would you like to run these against the main branch and against a temporary branch that uses a different library and post the results?

Tinche commented on May 27, 2024

@dimaqq Sure, sounds like a plan.

Tinche commented on May 27, 2024

I went with orjson in this case. I actually adapted the signing benchmark, since that's the one that touches JSON.

> pyperf compare_to before.json after.json
Mean +- std dev: [before] 71.6 us +- 1.7 us -> [after] 58.1 us +- 1.1 us: 1.23x faster

Seems worthwhile to me.

dimaqq commented on May 27, 2024

the signing benchmark is valid (e.g. if someone loads a lot of data into dynamo), but not the most important, in my opinion.

the important bit is deserialisation, because that is a bottleneck in query/scan operations that cannot be trivially parallelised.

the deserialise benchmark doesn't even use json; rather, it's about python code mangling aws-style dicts into python-style dicts. thus the important benchmark will be query, and that needs to be run against a well-provisioned cloud dynamo :pain:

(alternatively, we could set up a dummy http server that returns precooked json responses... we don't have that now, but it could be done).
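A minimal sketch of such a dummy server using only the standard library: it answers every POST with the same precooked JSON body, so a benchmark measures client-side parsing and conversion rather than DynamoDB latency. `CANNED_BODY` is a synthetic placeholder (a real setup would replay a captured Query/Scan response), and how a given client is pointed at the local endpoint depends on that client's configuration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical precooked body; a real benchmark would replay a captured response.
CANNED_BODY = json.dumps(
    {"Items": [{"pk": {"S": f"item-{i}"}, "n": {"N": str(i)}} for i in range(1000)],
     "Count": 1000}
).encode("utf-8")


class CannedHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # DynamoDB's API uses POST; drain the request body and ignore it.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        self.send_response(200)
        self.send_header("Content-Type", "application/x-amz-json-1.0")
        self.send_header("Content-Length", str(len(CANNED_BODY)))
        self.end_headers()
        self.wfile.write(CANNED_BODY)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging so benchmark output stays clean


def start_server() -> HTTPServer:
    # Port 0 lets the OS pick a free port; serve in a background thread.
    server = HTTPServer(("127.0.0.1", 0), CannedHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch(server: HTTPServer) -> bytes:
    # Bypass any proxy settings so the request goes straight to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    req = urllib.request.Request(
        f"http://127.0.0.1:{server.server_port}/", data=b"{}", method="POST"
    )
    with opener.open(req) as resp:
        return resp.read()
```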

I'll see if I can run some benchmarks 🤔

dimaqq commented on May 27, 2024

Early results:

> pyperf compare_to aiodynamo_mock-without-orjson.json aiodynamo_mock-with-orjson.json
Mean +- std dev: [aiodynamo_mock-without-orjson] 108 ms +- 5 ms -> [aiodynamo_mock-with-orjson] 110 ms +- 5 ms: 1.02x slower

basically there's no difference for query or scan, which are dominated by deserialisation.
(tested with mock http client and ~3MB response)
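One way to check where the time goes in such a run is to time JSON parsing and the type conversion separately on the same payload. This sketch uses `time.perf_counter` instead of pyperf for brevity, and both the payload shape (`make_payload`) and the minimal `convert` function are hypothetical; no measurements are claimed here.

```python
import json
import time
from typing import Any, Dict


def convert(value: Dict[str, Any]) -> Any:
    # Minimal converter covering only the tags used in the synthetic payload.
    ((tag, payload),) = value.items()
    if tag == "M":
        return {k: convert(v) for k, v in payload.items()}
    if tag == "L":
        return [convert(v) for v in payload]
    if tag == "N":
        return int(payload)
    return payload  # "S" (and any other tag) is passed through unchanged


def make_payload(n_items: int) -> str:
    # Synthetic Query-style response; shape and size are illustrative only.
    items = [
        {
            "pk": {"S": f"user#{i}"},
            "age": {"N": str(i % 90)},
            "tags": {"L": [{"S": "a"}, {"S": "b"}]},
            "profile": {"M": {"city": {"S": "Tokyo"}, "score": {"N": str(i)}}},
        }
        for i in range(n_items)
    ]
    return json.dumps({"Items": items, "Count": n_items})


def time_phases(raw: str) -> Dict[str, float]:
    # Time the two phases separately: text -> dicts, then typed -> plain dicts.
    t0 = time.perf_counter()
    parsed = json.loads(raw)
    t1 = time.perf_counter()
    items = [{k: convert(v) for k, v in item.items()} for item in parsed["Items"]]
    t2 = time.perf_counter()
    assert len(items) == parsed["Count"]
    return {"json.loads": t1 - t0, "convert": t2 - t1}
```

For example, `time_phases(make_payload(20_000))` builds a payload of a few megabytes and returns the two durations; replacing `json.loads` with another parser then isolates how much a faster decoder could help.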

Tinche commented on May 27, 2024

I mean yeah, I guess if you use a 3 megabyte payload the majority of the time is spent elsewhere. The vast majority of my payloads are much, much smaller though, so JSON decoding does play a role.

We can talk about optimizing stuff in https://github.com/HENNGE/aiodynamo/blob/963a6baecb7782fb5820179e2ec0c041a527d02e/src/aiodynamo/utils.py in another ticket. I might be able to help shave off some microseconds on some of these, which adds up.

dimaqq commented on May 27, 2024

I imagine the network latency would dominate for small payloads.

Consider AWS's built-in monitoring: server-side GetItem latency is typically 4~15 ms, and anything below 1 ms is not even reported.

Tinche commented on May 27, 2024

Sure, but I can't do anything about network latency. Whereas if my JSON processing is a little more efficient, it means there's a little less latency for the endpoint, and my asyncio service is free to dedicate more time to another request sooner. I find after running asyncio at scale CPU time is a very precious resource.

Look, I'm not going to pretend this is going to be a major speedup. I am going to claim it's a very minor win that's essentially free. If you folks feel that's not the case (either it's not a win at all or it's not essentially free) I can respect that decision, it's your call after all :)

I can tell you personally, the first thing I do for our services is look at replacing the use of the stdlib json with something else, since it's very easy to do and does provide a small speedup. (Unless the service is running PyPy; that's another story.)

dimaqq commented on May 27, 2024

I think this is pretty good now.
