@72squared I have been thinking about refactoring the pipeline object to do almost the same thing: store all commands, then on execute work out which node each key/command should go to, and send the batches one after another. That would make it similar to the normal pipeline implementation, compared to the first version I have in the code right now.
I like your suggestion about improving the MGET, MSET and MSETNX commands, but please submit the command packing as one PR and the threading pool as a separate PR.
I look forward to your solution :]
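The "store, route, then send" idea above can be sketched roughly like this. This is a minimal, self-contained sketch, not the library's code: `key_slot` uses crc32 as a stand-in for the real CRC16-based slot hash, and the two-node slot layout is made up for illustration.

```python
import zlib
from collections import defaultdict

def key_slot(key, num_slots=16384):
    # Stand-in hash: real Redis Cluster uses CRC16(key) % 16384;
    # crc32 is used here only to keep the sketch self-contained.
    return zlib.crc32(key.encode()) % num_slots

def group_by_node(commands, slot_to_node):
    """commands: list of (name, key, *args) tuples buffered by the
    pipeline; returns {node: [commands]} ready to pack and send."""
    grouped = defaultdict(list)
    for command in commands:
        grouped[slot_to_node[key_slot(command[1])]].append(command)
    return grouped

# hypothetical layout: lower half of the slot space on one node
slot_to_node = {s: ("node-a" if s < 8192 else "node-b") for s in range(16384)}
buffered = [("SET", "foo", "1"), ("GET", "bar"), ("INCR", "foo")]
grouped = group_by_node(buffered, slot_to_node)
```

Both commands on `"foo"` hash to the same slot, so they always land in the same node's batch, which is what lets them be packed into a single round trip.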
from redis-py-cluster.
Yeah, splitting it into two work items makes sense; parallel execution and threading can come later. I'll need to study the code a bit more before attempting anything. The tricky part will be correctly handling any commands where the server replies with MOVED.
Yeah, it will be hard to work out all the issues :] This is uncharted territory, so getting it right on the first try will be hard :D
Any progress on getting unit tests into the repo? I'd like to start work on the packed commands in pipelining, but I prefer to do test-driven development, or at least to have some test infrastructure in place to verify I'm not breaking things.
If porting the redis-py tests is a non-starter, I'd be willing to write some tests specifically for redis-py-cluster. If you think about it, a lot of the corner cases covered by the redis-py tests address issues and regressions that are unlikely to spring up in redis-py-cluster. The bulk of the logic is still in redis-py, so if redis-py works, so will redis-py-cluster.
Porting over the redis-py tests might be a good start, but the suite could probably be trimmed down significantly to the tests that highlight issues with key routing and pipelining.
@72squared Yeah, I asked Andy and there was no problem with taking the test code from redis-py and integrating it into the cluster lib. It is just that I have to find the time to get around to it :] Work and IRL stuff and all that in the way.
I have the code finished and only have to do some minor cleanup before I can commit it.
If you want to start before I commit my test code, just create a new test file in the tests/ folder, and in your setUp() create a new RedisCluster object and use it in your tests. I use py.test to run them; nothing special, no config.
I can adapt your code to my testing structure after the PR if you want, so that is no problem for me :]
K. sounds good. Thanks!
I am starting work on this issue now and have a general theory of how it should work, but it's complicated by commands like MGET and DELETE that operate on multiple keys. I'll probably break each key out into an individual command that can be packed per host, then re-aggregate the replies into the response, though that makes the implementation slightly more involved. I hope to have a patch for you in the next week or so (I'm also going on vacation, so I won't be at my computer as much).
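The split-and-re-aggregate approach for multi-key commands can be sketched like this. All names here are illustrative, not the library's API, and `route` is a toy stand-in for real slot-to-node routing: the point is only that each key remembers its original position so the per-node replies can be interwoven back into MGET order.

```python
def split_by_node(keys, route):
    """Break a multi-key command like MGET into per-node batches,
    tagging each key with its original position."""
    batches = {}
    for pos, key in enumerate(keys):
        batches.setdefault(route(key), []).append((pos, key))
    return batches

def merge_replies(batches, replies_by_node, total):
    """Interweave per-node reply lists back into the caller's order."""
    out = [None] * total
    for node, batch in batches.items():
        for (pos, _key), value in zip(batch, replies_by_node[node]):
            out[pos] = value
    return out

# toy routing: the key's first letter decides the node
route = lambda k: "A" if k < "m" else "B"
batches = split_by_node(["apple", "zebra", "grape"], route)
# fake per-node replies, aligned with each node's batch
replies = {node: ["v:" + k for _pos, k in batch] for node, batch in batches.items()}
merged = merge_replies(batches, replies, 3)  # → ["v:apple", "v:zebra", "v:grape"]
```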
Sounds good :] I have some ideas too on how to make it happen, but I am looking forward to seeing your solution.
Here's a first pass implementation of pipelining.
https://github.com/happybits/redis-py-cluster/tree/pipe-exp01
I didn't submit a pull request yet because I'm not satisfied with a couple of things, but tox passes and it does pack all the commands correctly by grouping the commands that should be routed to each node.
I'm trying to avoid having multiple spots in the code where we figure out MOVED and ASKED. Right now, all commands, single or multiple, go through the same block of code:
https://github.com/happybits/redis-py-cluster/blob/pipe-exp01/rediscluster/rediscluster.py#L274-L358
But I'm thinking of breaking this monolithic function up into reusable parts, so that single commands can use pieces of it without any code duplication.
Feedback is appreciated.
I will look through this in the next few days and get back to you with my comments.
One thing I experimented with is bypassing the pipeline object entirely here:
https://github.com/happybits/redis-py-cluster/blob/pipe-exp01/rediscluster/rediscluster.py#L311
Instead, I got the connection object from the connection pool for each node, packed the commands, and sent them directly. It worked and seemed more efficient, but unfortunately the Connection object's API changes between different versions of redis-py, so I couldn't find a way to do it that would both pass all the tox tests and work with the latest redis-py from GitHub.
Okay, after a quick look at the code, my first impression is that I don't really like that you have combined the normal send_cluster_command with the pipeline implementation. The BasePipeline class should have almost nothing more to do than stack up all the commands and, when ready, fire them off.
If you look at how pipelines are done in redis-py, almost all pipeline-specific handling is done within the pipeline methods; executing the pipeline handles nearly everything. I would really like to see (if it is possible, of course) this new code moved into the BasePipeline class. That may require some rewrite of send_cluster_command() to handle both cases correctly, and so on.
But take this with a grain of salt, because it is just a quick look; I really have to get to a whiteboard and draw everything up to see how it all could/should work. It is a great start, and good job making the tests pass :]
Yeah, I agree about keeping pipeline logic in the pipeline class. My first design goal was to not duplicate the MOVED keyslot logic, but the answer there is to break it up into a reusable chunk of code that can be called by both the RedisCluster class and the pipeline class.
I also spotted a flaw in my MOVED logic. Right now I batch keys together by mapping keyslots to nodes and then packing the commands for each node together, but when I get a MOVED for an individual key, I don't split out the commands for just the moved keys and put only those into the retry loop, so what I have doesn't actually work. I'll address that first.
After that logic flaw is fixed, I'll refactor to move the pipeline logic into the appropriate class.
When you do the retry, will you wait for responses from all nodes, gather all the keys that got MOVED replies, and do another pipeline pack/send retry? Or will you do a plain send for each separate key that got a MOVED reply?
Have you given some thought to how you are going to solve the problem of overridden methods in the RedisCluster class? For example, rpoplpush has a custom implementation that makes two calls in sequence, where the second call relies on the first having completed. If you tried that in your pipeline solution, I don't think it would work, because you put the command on the command stack and don't execute it until later.
One solution I can think of is that instead of storing the method plus command arguments, you store some info about which method to call and with what args, and in the pipeline you just call it when you reach that command in the command stack.
One problem with that approach is that you probably have to make multiple pipeline objects. If you want to execute the chain [set --> get --> rpoplpush --> set --> get] (for whatever reason O.o), you must preserve the calling order: you can't extract the rpoplpush command, call it first or last, and group all the other commands into one pipeline, because that breaks the order the caller intended. In that case you would need a structure like (Pipeline(set, get) --> rpoplpush --> Pipeline(set, get)) and call the pieces sequentially.
Another solution is to create some callback system, but I think that would be too complex.
Also, it occurs to me that we at least have to write tests for the overridden methods in RedisCluster that call them via a pipeline object, but it would almost be best to have tests for all commands via a pipeline to ensure they work correctly. It may be possible to use parametrized fixtures for a normal RedisCluster object and a Pipeline object.
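The (Pipeline(set, get) --> rpoplpush --> Pipeline(set, get)) structure described above can be sketched as a simple order-preserving split of the command stack. This is an illustration of the idea, not the library's code; which commands count as "non-pipelinable" is an assumption here.

```python
def build_execution_chain(commands, non_pipelinable=("RPOPLPUSH",)):
    """Split a command stack into sequential execution units: runs of
    ordinary commands stay batched as one pipeline, while overridden
    multi-step commands become standalone units, so the caller's
    ordering is preserved."""
    chain, batch = [], []
    for command in commands:
        if command[0] in non_pipelinable:
            if batch:  # flush the pipelined run before the special command
                chain.append(("pipeline", batch))
                batch = []
            chain.append(("single", command))
        else:
            batch.append(command)
    if batch:
        chain.append(("pipeline", batch))
    return chain

stack = [("SET", "a", "1"), ("GET", "a"),
         ("RPOPLPUSH", "src", "dst"),
         ("SET", "b", "2"), ("GET", "b")]
chain = build_execution_chain(stack)
# → [("pipeline", [set, get]), ("single", rpoplpush), ("pipeline", [set, get])]
```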
Regarding MOVED: yeah, I think I'll wait for the responses on all the keys from the first pass (some day it will all go in parallel instead of serially per node), then collect all the keys that have moved, do another pass with only those keys, and interweave their replies into the final response.
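That two-pass retry can be sketched as follows. `Moved` is a stand-in marker for the real MovedError reply, and `send` is an assumed transport callable; the key point is that retried replies are interwoven back by position.

```python
class Moved:
    """Stand-in for a MOVED reply pointing at a new node; real code
    parses a MovedError carrying the new node's address."""
    def __init__(self, node):
        self.node = node

def execute_with_retry(commands, send):
    """First pass sends everything; replies that come back as Moved
    are collected, re-sent in a second packed pass, and interwoven
    into the final response by original position."""
    replies = list(send(commands))
    moved = [(pos, cmd) for pos, (cmd, reply) in enumerate(zip(commands, replies))
             if isinstance(reply, Moved)]
    if moved:
        second = send([cmd for _pos, cmd in moved])
        for (pos, _cmd), reply in zip(moved, second):
            replies[pos] = reply
    return replies

# fake transport: key "x" is moved on the first pass only
calls = []
def fake_send(cmds):
    calls.append(cmds)
    if len(calls) == 1:
        return [Moved("node-b") if key == "x" else "v:" + key for _op, key in cmds]
    return ["v:" + key for _op, key in cmds]

result = execute_with_retry([("GET", "a"), ("GET", "x"), ("GET", "b")], fake_send)
# → ["v:a", "v:x", "v:b"], with only ("GET", "x") re-sent
```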
Regarding the overridden methods, which are no longer atomic and are made up of multiple commands: callbacks that aggregate the results of a multi-command step, plus chained pipelines as you described, seem like the best way forward. I'm not quite prepared to tackle that yet; I want to get something working first. But yeah, eventually that's how it would need to be.
Agreed on needing tests for pipelining the overridden commands before this patch goes upstream, so we establish the correct behavior. I'll probably work on that next in a separate branch. Correct behavior comes first, then performance improvements.
Here's another experimental shot at packed pipeline execution.
https://github.com/happybits/redis-py-cluster/compare/packed-pipeline
I still need to handle connection errors, but I think this is pretty close. At some point I'd still like to do parallel execution of commands against different nodes, but that can come later.
I also need to add some tests to verify that the ASK and MOVED exceptions are handled correctly.
Fixed a bug and added a test. I still need to test MOVED and ASK, and add connection error handling.
After packed pipelines are complete, I want to work on adding optional support for the pipeline class to issue requests to multiple nodes in parallel. I should be able to do this with greenlets, but I'll only enable it if gevent is installed; otherwise each pipeline request will go to each node sequentially.
Does that sound like a reasonable plan?
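A rough shape of the "greenlets if available, sequential otherwise" plan, under the assumption that per-node batches and a `send` callable exist (both names are illustrative):

```python
try:
    import gevent
    HAS_GEVENT = True
except ImportError:
    HAS_GEVENT = False

def execute_per_node(node_batches, send, parallel=True):
    """Send each node's packed commands: via greenlets when gevent is
    importable and parallel is requested, one node at a time otherwise."""
    if parallel and HAS_GEVENT:
        jobs = {node: gevent.spawn(send, node, cmds)
                for node, cmds in node_batches.items()}
        gevent.joinall(list(jobs.values()))
        return {node: job.get() for node, job in jobs.items()}
    # sequential fallback when gevent is missing or parallel is disabled
    return {node: send(node, cmds) for node, cmds in node_batches.items()}

batches = {"node-a": [("GET", "foo")], "node-b": [("GET", "bar")]}
fake_send = lambda node, cmds: [node + ":" + key for _op, key in cmds]
results = execute_per_node(batches, fake_send, parallel=False)
# → {"node-a": ["node-a:foo"], "node-b": ["node-b:bar"]}
```

Keeping the fallback in the same function means callers don't need to know whether gevent is present; they just get the per-node result map either way.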
It sounds like a plan, but I have a few requirements I want added so we are on the same page :]
- It must be possible to enable/disable parallel execution at any time on the Pipeline object via some variable.
- It should be possible to set this value when creating a pipeline object via pipeline().
- If gevent is not installed, it should default to sequential execution, as you said yourself.
- If possible, keep that code in its own function/code path so it is easy to move around, because I have plans to do a code restructure and move some functionality to other classes in the future. That would also open the door for other kinds of parallel execution if you don't want to, or can't, use gevent.
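The requirements above translate into a small API surface, sketched here with made-up class names (the real classes live in rediscluster): a flag that is settable at creation time via pipeline(), togglable afterwards, and defaults to sequential when gevent is missing.

```python
def gevent_available():
    """Feature detection: parallel execution only if gevent imports."""
    try:
        import gevent  # noqa: F401
        return True
    except ImportError:
        return False

class PipelineSketch:
    def __init__(self, use_parallel=True):
        # togglable at any time before execute(); silently falls back
        # to sequential when gevent is not installed
        self.use_parallel = use_parallel and gevent_available()

class ClusterClientSketch:
    def pipeline(self, use_parallel=True):
        # the flag is set at creation time, per the requirements above
        return PipelineSketch(use_parallel=use_parallel)
```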
Agreed on all the requirements you listed above.
Regarding other types of parallel execution, I could see the possibility of using built-in Python threads or a thread pool as well.
@72squared Any updates on this issue? I have started work on major refactoring for 0.2.0, so unless you want to port your changes onto the refactored code, I will need them soon-ish :]
@72squared You can see the new code in the PR for #40
I'll apply my changes post-refactor, most likely.
This issue has been addressed by these patches that were accepted for the 0.2.0 target release:
Closing.
Good :]