
Comments (28)

Grokzen commented on May 25, 2024

@72squared I have been thinking about refactoring the pipeline object to do almost the same thing: store all commands, then on execute calculate which key/command should go to which node and send them one after another. That would make it similar to the normal pipeline implementation, compared to the first version I have in the code right now.
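For context, the routing calculation described above boils down to Redis Cluster's key slots: each key hashes to one of 16384 slots via CRC16 (honoring the {hash-tag} rule), and queued commands can then be bucketed by the node that owns each slot. A minimal self-contained sketch; the command tuples and slot_to_node table are hypothetical stand-ins for the client's internals:

```python
from collections import defaultdict

def crc16(data: bytes) -> int:
    # CRC16-CCITT (XMODEM), the checksum Redis Cluster uses for key slots
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def keyslot(key: str) -> int:
    # Hash-tag rule: if the key contains a non-empty {...} section,
    # only the part between the first braces is hashed
    k = key.encode()
    start = k.find(b"{")
    if start != -1:
        end = k.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            k = k[start + 1:end]
    return crc16(k) % 16384

def group_by_node(commands, slot_to_node):
    """Bucket queued (name, key, ...) command tuples by owning node,
    remembering each command's original position for reply ordering."""
    buckets = defaultdict(list)
    for pos, args in enumerate(commands):
        node = slot_to_node[keyslot(args[1])]
        buckets[node].append((pos, args))
    return buckets
```

With the buckets in hand, each node's list can be packed and sent as one batch, and the saved positions let the replies be reassembled in the caller's order.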

I like your suggestion about improving the MGET, MSET, and MSETNX commands, but please make a packing PR before you implement the thread pool, and do that as another PR?

I look forward to your solution :]

from redis-py-cluster.

72squared commented on May 25, 2024

Yeah, splitting it out into two work items makes sense. Parallel execution and threading can come later. I'll need to study the code a bit more before attempting anything. The tricky part will be correctly handling any commands where the server replies with MOVED.

Grokzen commented on May 25, 2024

Yeah, it will be hard to work out all the issues :] This is uncharted territory, so getting it right on the first try will be tough :D

72squared commented on May 25, 2024

Any progress on getting unit tests into the repo? I'd like to start work on the packed commands in pipelining, but I prefer to do test-driven development, or at least have some test infrastructure in place to verify I'm not breaking things.

If porting the redis-py tests is a non-starter, I'd be willing to write some tests specifically for redis-py-cluster. If you think about it, a lot of the corner cases covered by the redis-py tests address issues and regressions that are unlikely to spring up in redis-py-cluster. The bulk of the logic is still in redis-py, so if redis-py works, so will redis-py-cluster.

Porting over the redis-py tests might be a good start, but they could probably be trimmed down significantly to those that highlight issues with key routing and pipelining.

Grokzen commented on May 25, 2024

@72squared Yeah, I asked Andy and there was no problem with taking the test code from redis-py and integrating it into the cluster lib. It is just that I have to find time to get around to it :] work and IRL stuff and all that in the way.

I have the code finished; I only have to do some minor cleanup before I can commit it.

If you want to start before I commit my test code, just create a new test file in the tests/ folder, create a new RedisCluster object in your setUp(), and use it in your tests. I use py.test to run them. Nothing special, no config.
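A minimal test file along those lines might look like the sketch below. All names here are hypothetical, and the FakeClient stand-in exists only so the sketch runs without a live cluster; in a real test, setUp() would build a RedisCluster object as described, and py.test collects unittest-style cases without extra config:

```python
import unittest

class FakeClient:
    """Dict-backed stand-in so this sketch runs without a live cluster."""
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
        return True
    def get(self, key):
        return self._data.get(key)

class TestClusterBasics(unittest.TestCase):
    def setUp(self):
        # Against a real cluster this would be something like:
        #   from rediscluster import RedisCluster
        #   self.client = RedisCluster(startup_nodes=[{"host": "127.0.0.1", "port": 7000}])
        self.client = FakeClient()

    def test_set_then_get(self):
        self.assertTrue(self.client.set("key1", "value1"))
        self.assertEqual(self.client.get("key1"), "value1")
```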

I can adapt your code to my testing structure post-PR if you want, so that is no problem for me :]

72squared commented on May 25, 2024

OK, sounds good. Thanks!

72squared commented on May 25, 2024

I am starting work on this issue now and have a general theory of how it should work, but it's complicated by commands like MGET and DEL that operate on multiple keys. I'll probably break each key out into individual commands that can be packed per host, then re-aggregate them in the response, though that makes programming them slightly more difficult. I hope to have a patch for you in the next week or so (I'm also going on vacation, so I won't be around my computer as much).
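The split-and-reaggregate idea can be sketched as follows. node_for_key and mget_on_node are hypothetical stand-ins for the slot-routing and per-node I/O layers; the point is that replies must be stitched back into the caller's key order:

```python
from collections import defaultdict

def clustered_mget(keys, node_for_key, mget_on_node):
    """Split an MGET across nodes, then stitch replies back into key order.

    node_for_key(key) -> node and mget_on_node(node, keys) -> values are
    stand-ins; the real client would use the slot map and connection pool.
    """
    # Bucket keys per node, remembering each key's original position
    by_node = defaultdict(list)
    for pos, key in enumerate(keys):
        by_node[node_for_key(key)].append((pos, key))

    # Fetch each node's subset, then place values back in request order
    results = [None] * len(keys)
    for node, entries in by_node.items():
        values = mget_on_node(node, [k for _, k in entries])
        for (pos, _), value in zip(entries, values):
            results[pos] = value
    return results
```

The same position-tracking pattern works for DEL, where the per-node integer replies would be summed instead of interleaved.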


Grokzen commented on May 25, 2024

Sounds good :] I have some ideas on how to make it happen too, but I am looking forward to seeing your solution.

72squared commented on May 25, 2024

Here's a first-pass implementation of pipelining:

https://github.com/happybits/redis-py-cluster/tree/pipe-exp01

I didn't submit a pull request yet because I am not satisfied with a couple of things still, but tox passes and it does pack all the commands correctly, grouping the commands that should be routed to each node.

I'm trying to avoid having multiple spots in the code where we figure out MOVED and ASKED. Right now the way I do it is to make all commands, single or multiple, go through the same block of code:

https://github.com/happybits/redis-py-cluster/blob/pipe-exp01/rediscluster/rediscluster.py#L274-L358

But I'm thinking of instead breaking this monolithic function up into reusable parts, so that single commands can use parts of it without any code duplication.

Feedback is appreciated.

Grokzen commented on May 25, 2024

I will look through this in the next few days and get back with my comments.

72squared commented on May 25, 2024

One thing I experimented with was bypassing the pipeline object entirely here:

https://github.com/happybits/redis-py-cluster/blob/pipe-exp01/rediscluster/rediscluster.py#L311

Instead, I got the connection object from the connection pool for each node, packed the commands, and sent them directly. It worked and seemed more efficient, but unfortunately the Connection object's API changes between different versions of redis-py, so I couldn't find a way to do it that would pass all the tox tests and also work with the latest version of redis-py from GitHub.
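For illustration, "packing the commands" amounts to concatenating each command's RESP encoding into one buffer per node, so the whole batch goes out in a single socket write. This is the same wire format redis-py's Connection builds internally, sketched here without depending on any particular redis-py version:

```python
def pack_command(*args):
    """Encode one command in RESP: an array header followed by each
    argument as a length-prefixed bulk string."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        if isinstance(arg, str):
            arg = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)

def pack_pipeline(commands):
    # Concatenate every command's encoding so the pipeline for one node
    # can be flushed to its socket in a single write
    return b"".join(pack_command(*cmd) for cmd in commands)
```

After the single write, the replies for a node are read back in order, one per packed command.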


Grokzen commented on May 25, 2024

Okay, after a quick look at the code, my first impression is that I do not really like that you have combined the normal send_cluster_command with the pipeline implementation. The BasePipeline class should have almost nothing more to do than stack up all the commands and, when ready, fire them off.

If you look at how pipelines are done in redis-py, almost all pipeline-specific handling is done within the pipeline methods; executing the pipeline handles almost everything. I would really like to see (if it is possible, of course) this new code moved into the BasePipeline class. This may require some rewrite of send_cluster_command() to handle both cases correctly, and so on.

But take this with a grain of salt, because it is just a quick look, and I really have to get to a whiteboard and draw everything up to see how it all could/should work. It is a great start, and good job making the tests pass :]

72squared commented on May 25, 2024

Yeah, I agree about keeping pipeline logic in the pipeline class. My first design goal was to not duplicate the MOVED keyslot logic, but the answer there is to break that up into a reusable chunk of code that can be called by both the RedisCluster class and the pipeline class.

I also spotted a flaw in my logic on MOVED. Right now I batch a bunch of keys together by mapping keyslots to nodes and then packing the commands for each node together. But when I get a MOVED for an individual key, I am not breaking out the commands for just the moved keys to put only those into the retry loop, so what I have doesn't actually work. I'll address that first.

After I get that logic flaw fixed, I'll try to refactor the pipeline logic into the appropriate class.

Grokzen commented on May 25, 2024

When you do the retry, will you wait for the responses from all nodes, gather all the keys that got MOVED replies, and do another pipeline pack/send retry, or will you do a plain send for each separate key that got a MOVED response?

Grokzen commented on May 25, 2024

Have you given some thought to how you are going to solve the problem of overridden methods in the RedisCluster class? For example, rpoplpush has a custom implementation that makes two calls in sequence, where the second call relies on the first being done. If you tried that in your pipeline solution, I do not think it would work, because you put the command it wants to execute on the command stack and do not execute it until later.

One solution I can think of is to store some info about which method to call and with what args, and in the pipeline just call it when you get to that command in the command stack.

One problem with that approach is that you probably have to make multiple pipeline objects. If you want to execute the chain [set --> get --> rpoplpush --> set --> get] (for whatever reason O.o), you must preserve the calling order; you can't extract the rpoplpush command, call it first or last, and group all the other commands into one pipeline, because that would break the order the caller intended. I guess in that case you need to build the structure (Pipeline(set, get) --> rpoplpush --> Pipeline(set, get)) and call them sequentially.
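That segmenting idea can be sketched as a pure function over the command stack. The OVERRIDDEN set and tuple layout here are hypothetical; the point is that caller order is preserved by splitting pipeline-able runs around each non-atomic override:

```python
# Hypothetical set of commands with non-atomic custom implementations
OVERRIDDEN = {"RPOPLPUSH"}

def split_stack(stack):
    """Split a command stack into pipeline-able runs and standalone calls,
    preserving order, e.g.
    [SET, GET, RPOPLPUSH, SET, GET] -> [[SET, GET], RPOPLPUSH, [SET, GET]]
    """
    segments, run = [], []
    for cmd in stack:
        if cmd[0] in OVERRIDDEN:
            if run:                  # close the current pipeline-able run
                segments.append(run)
                run = []
            segments.append(cmd)     # the override executes on its own
        else:
            run.append(cmd)
    if run:
        segments.append(run)
    return segments
```

The executor would then walk the segments in order: pack-and-send each list as a pipeline, and call each standalone override directly before moving on.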

Another solution is to create some callback system, but I think that would be too complex.

Also, it came to my mind that we have to at least write tests for the overridden methods in RedisCluster that call them via a pipeline object, but it would be best if we had tests for all commands via a pipeline to ensure they work correctly. It may be possible to do parameterized fixtures for a normal RedisCluster object and a Pipeline object.

72squared commented on May 25, 2024

Regarding MOVED: yeah, I think I'll wait for the responses on all the keys from the first pass (it will all go in parallel some day instead of serially per node), then collect all the keys that have moved, do another pass with only those keys, and interweave the replies into the final response.


72squared commented on May 25, 2024

Regarding the overridden methods that are no longer atomic and are made up of multiple commands: callbacks that aggregate the results of a multi-command step, plus chained pipelines as you described, seem like the best way forward. I'm not quite prepared to tackle that yet; I want to get something working first. But yeah, eventually that's how it would need to be.

72squared commented on May 25, 2024

Agreed on needing tests for pipelining the overridden commands before this patch goes upstream, so we establish the correct behavior. I'll probably work on that patch next in a separate branch. Correct behavior comes first, then performance improvements.

72squared commented on May 25, 2024

Here's another experimental shot at packed pipeline execution:

https://github.com/happybits/redis-py-cluster/compare/packed-pipeline

I still need to handle connection errors, but I think this is pretty close. At some point I'd still like to do parallel execution of commands against different nodes, but that can come later.

I also need to add some tests to verify the ASK and MOVED exceptions are handled correctly.

72squared commented on May 25, 2024

Fixed a bug and added a test. I still need to test MOVED and ASK, and add connection error handling.

72squared commented on May 25, 2024

After the packed pipeline is complete, I want to work on adding optional support for the pipeline class to issue requests to multiple nodes in parallel. I should be able to do this with greenlets, but I'll only enable it if gevent is installed; otherwise each pipeline request will go to each node sequentially.

Does that sound like a reasonable plan?

Grokzen commented on May 25, 2024

It sounds like a plan, but I have a few requirements I want added so we are on the same page :]

  • It must be possible to enable/disable parallel execution at any time in the Pipeline object via some variable.
  • It should be possible to set this value when creating a pipeline object via def pipeline().
  • If gevent is not installed, it should default to sequential execution, like you said yourself.
  • If possible, handle that code in its own function/code path so that it is easy to move around, because I have plans to do a code restructure and move some functionality to other classes in the future. This could also open up other kinds of parallel execution methods if you do not want to, or can't, use gevent.
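A sketch matching those requirements: a per-call toggle, greenlet fan-out when gevent is importable, a sequential fallback otherwise, and the whole strategy isolated in one function so it can be moved or swapped later. send_to_node is a hypothetical stand-in for the per-node pack-and-send step:

```python
try:
    import gevent
    HAS_GEVENT = True
except ImportError:
    HAS_GEVENT = False

def execute_per_node(node_batches, send_to_node, use_parallel=True):
    """Fire each node's packed batch, via greenlets when gevent is
    available and parallelism is enabled, else one node at a time."""
    nodes = list(node_batches)
    if use_parallel and HAS_GEVENT:
        # Spawn one greenlet per node; socket I/O yields between them
        jobs = [gevent.spawn(send_to_node, n, node_batches[n]) for n in nodes]
        gevent.joinall(jobs)
        return {n: j.value for n, j in zip(nodes, jobs)}
    # Sequential fallback: same results, one node after another
    return {n: send_to_node(n, node_batches[n]) for n in nodes}
```

Because the strategy lives in one function, a Pipeline object could flip use_parallel at any time, or a future restructure could swap in a different executor without touching the packing code.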


72squared commented on May 25, 2024

Agreed on all the requirements you listed above.

Regarding other types of parallel execution methods, I could see the possibility of using built-in Python threads or a thread pool as well.
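A thread-pool variant could be isolated the same way, using only the stdlib (concurrent.futures); again, send_to_node is a hypothetical stand-in for the per-node I/O step:

```python
from concurrent.futures import ThreadPoolExecutor

def execute_per_node_threaded(node_batches, send_to_node, max_workers=None):
    """Send each node's batch from a worker thread; no gevent needed.

    Real socket I/O releases the GIL, so the per-node requests overlap
    even under plain threads.
    """
    if not node_batches:
        return {}
    with ThreadPoolExecutor(max_workers=max_workers or len(node_batches)) as pool:
        futures = {node: pool.submit(send_to_node, node, batch)
                   for node, batch in node_batches.items()}
        # .result() re-raises any exception from the worker thread
        return {node: fut.result() for node, fut in futures.items()}
```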


Grokzen commented on May 25, 2024

@72squared Any updates on this issue? I have started working on major refactoring for 0.2.0, so unless you want to port your changes into the refactoring, I will need them soonish :]

Grokzen commented on May 25, 2024

@72squared You can see the new code in the PR for #40.

72squared commented on May 25, 2024

I'll apply my changes post-refactor, most likely.

72squared commented on May 25, 2024

This issue has been addressed by these patches, accepted for the 0.2.0 target release:

#35
#46
#51

Closing.

Grokzen commented on May 25, 2024

Good :]
