It seems that resharding is handled incorrectly up to v1.0.0. The relevant source code

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

About the change of error, see <a href="https://github.com/Grokzen/redis-py-cluster/bl

Can you share the test you have been running in a gist that i can access? Also

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Client fails during live resharding about redis-py-cluster HOT 13 CLOSED

grokzen commented on May 25, 2024

Client fails during live resharding

from redis-py-cluster.

Comments (13)

Grokzen commented on May 25, 2024

@milyenpabo Have you looked at the unstable branch and the code in there if the logic is the same?

from redis-py-cluster.

milyenpabo commented on May 25, 2024

@Grokzen As far as I can tell from the source, it has the same issue. I also ran a search in the repo for the ASKING command, but did not find anything relevant. Strangely, exceptions.py on the unstable branch has a new AskError class that describes the correct mechanism in a comment, but I found no other clues.

Do I miss something?

from redis-py-cluster.

Grokzen commented on May 25, 2024

@milyenpabo I think the client can handle ASK errors now after i rewrote the code to use a better parser https://github.com/Grokzen/redis-py-cluster/blob/unstable/rediscluster/client.py#L260 but it could be that ASKING command is not sent when that happens.

from redis-py-cluster.

milyenpabo commented on May 25, 2024

@Grokzen That's right, the exception is caught but ASKING is not sent, so in effect the ping-pong issue is still there. The difference in the unstable version is that it throws a different type of exception than in v1.0.0:

Traceback (most recent call last):
  File "redis-writer.py", line 33, in <module>
    r.set(d, d)
  File "build/bdist.linux-x86_64/egg/redis/client.py", line 1055, in set
  File "build/bdist.linux-x86_64/egg/rediscluster/utils.py", line 87, in inner
  File "build/bdist.linux-x86_64/egg/rediscluster/client.py", line 266, in execute_command
rediscluster.exceptions.ClusterError: TTL exhausted.

from redis-py-cluster.

Grokzen commented on May 25, 2024

@milyenpabo I will dig some during the weekend and if all goes well i might have a patch that adds this feature if it needs to be there.

The only thing is that i have never seen commands or client that breaks when it is running during a live reshard. Both me and @72squared have very recently had a test script (with threads) that performs operations during a reshard and after our fixes (thread related) we have not yet seen any problems.

Have you seen a acctual problem/exception somehow when you have been running a reshard that we can use to reproduce the problem, or this more a "it should comply to the docs" issue?

from redis-py-cluster.

Grokzen commented on May 25, 2024

About the change of error, see https://github.com/Grokzen/redis-py-cluster/blob/unstable/docs/Upgrading.md#100----next-release where it states that the error was changed due to the old one was not so descriptive.

from redis-py-cluster.

Grokzen commented on May 25, 2024

Can you share the test script you have been running in a gist that i can access? Also a short description about the redis-cluster you are running, number of nodes, locally/cloud/docker? How much data do you have in the server and how many nodes are you moving around? With that i might be able to reproduce the error.

from redis-py-cluster.

milyenpabo commented on May 25, 2024

@Grokzen I think the current implementation does not comply with the redis cluster specification. But anyhow, I can provide a test case. I have a redis-writer.py script that writes keys to a redis cluster (v3.0.4) with 3 masters and 1 replica per master. (See gist.)

from rediscluster import RedisCluster

startup_nodes = [{"host": "127.0.0.1", "port": "27002"}]
r = RedisCluster(startup_nodes=startup_nodes, max_connections=32, decode_responses=True)

N = 1000000
p = 0
pdiff = 1

print "Writing %d keys to redis" % N

for i in xrange(N):
    d = str(i)
    r.set(d, d)
    progress = 100.0*i/N
    if progress >= p:
        print "%.0f%%" % progress,
        p += pdiff
if progress < 100:
    print "100%"

First I run the script to populate redis:

python -u redis-writer.py

Then I re-run the script while resharding redis with the following command:

./redis-trib.rb reshard --from 412072cda05b27485348f3b83a5b5b09ba01b1ce --to 2f51725948ca46649048a05f4c4ac0bfa605bad5 --slots 2000 --yes 127.0.0.1:27002

If the number of slots to migrate (in my case 2000) and the number of keys/slot (in my case ~60) is large enough, the script will fail with a probability close to 100%. The above mentioned exception is thrown when the writer script hits a key that was already migrated, but the slot containing the key is still under migration. (For explanation see the issue-opening post.)

Please note that the error is not triggered deterministically. You probably did not see this happening because in your tests you did not hit an already moved key in a slot just under migration. So depending on your setup, you might need to re-run the test several times or tweak the parameters (number of slots to reshard and/or the number of keys/slot, i.e. the N variable in the script).

from redis-py-cluster.

Grokzen commented on May 25, 2024

very nice. I will play with it some during the weekend and see what i can come up with.

from redis-py-cluster.

Grokzen commented on May 25, 2024

@milyenpabo Do you think that only this will be enough? 4bf8e51

I am not really sure how to verify it works as intended more then that it do not break anything and your script still works as expected during the reshard operation. I do not however see the exception you got after that fix has been applied.

It feels stable enough for me to merge this.

from redis-py-cluster.

Grokzen commented on May 25, 2024

@milyenpabo I think the fix i pushed to unstable branch have fixed the problem. I compared to jedis so the logic would look the same and it does. If you think there is still problems with the implementation, you can reopen this issue and i will take another look at the problem.

from redis-py-cluster.

milyenpabo commented on May 25, 2024

@Grokzen I think that the fix is OK. I also made some tests, and now redirection seems to work fine. Thanks!

from redis-py-cluster.

Grokzen commented on May 25, 2024

@milyenpabo Another fix was just commited because the original implementation was not enough. It was found that it was still failing on some ASK errors. See commit 199ee0b

from redis-py-cluster.

Client fails during live resharding about redis-py-cluster HOT 13 CLOSED

Comments (13)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent