
doyoubi / undermoon

696 stars · 17 watchers · 37 forks · 2.4 MB

Modern Redis Cluster solution for easy operation.

License: Apache License 2.0

Rust 94.63% Makefile 0.21% Shell 0.65% Python 2.57% Go 0.53% Kotlin 0.15% Java 0.99% Jinja 0.26%
redis redis-cluster rust proxy slot migration redis-clusters redis-instances redis-protocol redis-proxy

undermoon's People

Contributors

cfeitong, dependabot[bot], doyoubi, traceming2


undermoon's Issues

Specify Node ID

Add another parameter to specify the node ID in CLUSTER NODES instead of using domains, as the prefixes of different domains may be the same.

FutureGroupHandle should signal in `drop` function.

The current FutureGroupHandle implementation requires it to be the outermost future fed directly into tokio::spawn.
To support nested groups, we need to send the signal in the `drop` function of FutureGroupHandle.
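
A minimal sketch of the idea, assuming a tokio oneshot channel carries the signal; the struct shape here is hypothetical, not the actual undermoon type:

```rust
use tokio::sync::oneshot;

// Hypothetical shape; the real FutureGroupHandle holds more state.
struct FutureGroupHandle {
    stop_signal: Option<oneshot::Sender<()>>,
}

impl Drop for FutureGroupHandle {
    fn drop(&mut self) {
        // Signal on drop, so the group no longer needs to be the outermost
        // future fed into tokio::spawn; nested groups still get notified.
        if let Some(sender) = self.stop_signal.take() {
            let _ = sender.send(());
        }
    }
}
```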

New Migration Process through SCAN command

Current Migration Limitations

There are two drawbacks to the current migration process, which works by triggering replication between the source Redis and the destination Redis:

  • We can only double the cluster size, because multiple FULL SYNCs to a single Redis node can't preserve all the data: a later sync removes some of the data correctly transferred by earlier ones.
  • We have to wait a fixed, long time and HOPE that the replication between the two Redis nodes is done, because there's no way to know for sure. (Well, actually we could use a block-and-check method.)

Solutions

There are two solutions:

  • (1) Implement the Redis Replication Protocol which consists of two parts:
    • RDB parsing (hard and time-consuming to maintain)
    • Replication network protocol (not that hard, and feasible)
  • (2) Use SCAN, DUMP, RESTORE to mimic the replication.

I don't want to use the first solution since there is just too much work in RDB parsing, and we would need to keep updating the code as the RDB format changes.

The second one should remain compatible with future versions of Redis.

The SCAN command has a useful guarantee: every key set before the first SCAN call will eventually be returned, though possibly multiple times. We can perform a three-stage migration to mimic replication:

  • Wait for all the commands to be finished by Redis.
  • Start the scanning and forward the data to the peer Redis (see the sketch after this list).
  • Redirect all the write operations after the first SCAN to the peer Redis.
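
A minimal sketch of the scan-and-forward stage, assuming the synchronous `redis` crate with direct connections; the real implementation would go through the proxy's queues:

```rust
// Copy all keys from the source to the destination with SCAN + DUMP + RESTORE.
fn scan_migrate(
    src: &mut redis::Connection,
    dst: &mut redis::Connection,
) -> redis::RedisResult<()> {
    let mut cursor: u64 = 0;
    loop {
        let (next, keys): (u64, Vec<String>) =
            redis::cmd("SCAN").arg(cursor).query(src)?;
        for key in keys {
            // DUMP returns nil if the key disappeared between SCAN and DUMP.
            let payload: Option<Vec<u8>> =
                redis::cmd("DUMP").arg(&key).query(src)?;
            if let Some(data) = payload {
                // REPLACE handles keys returned by SCAN more than once.
                redis::cmd("RESTORE")
                    .arg(&key)
                    .arg(0) // TTL 0: no expiry
                    .arg(data)
                    .arg("REPLACE")
                    .query::<()>(dst)?;
            }
        }
        if next == 0 {
            return Ok(());
        }
        cursor = next;
    }
}
```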

But it also has some problems:

  • It will have some impact on latency while scanning the data, but that should be tunable.
  • If a large collection key gets updated frequently, we need to store the large data from DUMP again and again, which could consume a large amount of memory.

Detailed Steps for Scanning

  • Redirect all the later commands to Queue_block1.
  • Wait for all the existing commands to be finished. We can maintain a counter for running commands to achieve that.
  • Start sending SCAN and DUMP to retrieve the data, and add RESTORE commands to Queue1. When the SCAN is done, mark Queue1 as SENT_FINISHED.
  • When the first SCAN command gets the reply, release all the commands in Queue_block1.
  • Route all the commands in Queue_block1 and the commands after the first SCAN to another send function which, for every write request, does the following atomically via a Lua script (see the sketch after this list):
    • apply the write operation
    • get the latest version of the key by DUMP and forward the new data to Queue2
  • Forward all the RESTORE commands in Queue1. Once Queue1 is marked SENT_FINISHED and becomes empty, start forwarding the RESTORE commands in Queue2.
  • When Queue1 is marked SENT_FINISHED and is empty, start blocking the commands in Queue_block2 and wait for all the commands in Queue2 to be sent.
  • The migration source proxy commits the process with the destination proxy. The destination starts handling the new slots. The source proxy starts redirecting requests for the migrated slots to the destination proxy.
  • Release the commands in Queue_block2. Redirect all the keys inside the migrated slots to the destination proxy.
  • INFOMGR starts to return success.
  • Wait for the Coordinator to commit the migration.
  • Delete the migrated-out data on the source proxy.
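
The atomic write-plus-DUMP step could look like this minimal sketch, assuming the `redis` crate's `Script` type; the key and the LPUSH command are illustrative, not undermoon's actual code:

```rust
use redis::Script;

// Apply the original write and DUMP the key in one Lua script, so no
// concurrent write can slip in between the two steps.
fn write_and_dump(con: &mut redis::Connection) -> redis::RedisResult<Option<Vec<u8>>> {
    let script = Script::new(
        r#"
        -- ARGV holds the original write command, e.g. LPUSH mykey value1
        redis.call(unpack(ARGV))
        -- Return the fresh serialized value, to be forwarded to Queue2
        return redis.call('DUMP', KEYS[1])
        "#,
    );
    script
        .key("mykey")
        .arg(&["LPUSH", "mykey", "value1"][..])
        .invoke(con)
}
```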

Add compatibility tests

We can replay the same command script against both undermoon and Redis and check whether the results are the same.
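
A minimal sketch of such a harness, assuming the synchronous `redis` crate; the addresses and the command script are illustrative:

```rust
// Replay the same commands against undermoon and a bare Redis, then compare.
fn compare_backends() -> redis::RedisResult<()> {
    let mut um = redis::Client::open("redis://127.0.0.1:5299/")?.get_connection()?;
    let mut rd = redis::Client::open("redis://127.0.0.1:6379/")?.get_connection()?;
    let script = vec![
        vec!["SET", "k", "v"],
        vec!["GET", "k"],
        vec!["INCR", "counter"],
    ];
    for args in script {
        let mut cmd = redis::cmd(args[0]);
        for a in &args[1..] {
            cmd.arg(*a);
        }
        let um_reply: redis::Value = cmd.query(&mut um)?;
        let rd_reply: redis::Value = cmd.query(&mut rd)?;
        assert_eq!(um_reply, rd_reply, "mismatch for {:?}", args);
    }
    Ok(())
}
```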

Support blocking command

Since #81, complex commands are much easier to implement than before.
We can implement blocking commands by transforming them into non-blocking commands.

For example, BLPOP can be implemented by repeatedly calling LPOP until it returns a non-Nil value.
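
A minimal sketch of that transformation, assuming the synchronous `redis` crate; the polling interval is a tuning knob, not a Redis setting:

```rust
use std::time::{Duration, Instant};

// Emulate BLPOP by polling LPOP until a value appears or the timeout expires.
fn blpop_emulated(
    con: &mut redis::Connection,
    key: &str,
    timeout: Duration,
) -> redis::RedisResult<Option<String>> {
    let deadline = Instant::now() + timeout;
    loop {
        let value: Option<String> = redis::cmd("LPOP").arg(key).query(con)?;
        if value.is_some() || Instant::now() >= deadline {
            return Ok(value);
        }
        std::thread::sleep(Duration::from_millis(10));
    }
}
```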

Report server proxy epoch to broker

The metadata storage needs to know the real epoch for server proxy for the following reasons:

  • For scaling down, the broker needs to know when the latest metadata has been synchronized to all the related server proxies, so it can safely remove server proxies from the cluster.
  • The metadata storage needs to know all the up-to-date server proxies to provide an API to query all the ready-to-use server proxy endpoints.
  • The metadata storage can use real epochs to detect inconsistent metadata.

Amend setting role.

Now the replicator module just keeps sending the SLAVEOF command to the backend Redis, triggering the following Redis log line again and again:

REPLICAOF would result into synchronization with the master we are already connected with. No operation performed.

Maybe we need to check whether the role is incorrect first. But the address in ROLE is the replica-announce-ip, so we need to use CONFIG GET to fetch replica-announce-ip from the peer master first.
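
A minimal sketch of the check, assuming the synchronous `redis` crate and parsing INFO replication instead of ROLE; the replica-announce-ip lookup via CONFIG GET is left out for brevity:

```rust
// Only send SLAVEOF when the node is not already replicating the intended
// master, to avoid re-triggering the log line above.
fn ensure_replica(
    con: &mut redis::Connection,
    master_ip: &str,
    master_port: u16,
) -> redis::RedisResult<()> {
    let info: String = redis::cmd("INFO").arg("replication").query(con)?;
    let field = |name: &str| {
        info.lines()
            .find_map(|l| l.strip_prefix(name).map(|v| v.trim().to_string()))
    };
    let already_replica = field("role:").as_deref() == Some("slave")
        && field("master_host:").as_deref() == Some(master_ip)
        && field("master_port:") == Some(master_port.to_string());
    if !already_replica {
        redis::cmd("SLAVEOF").arg(master_ip).arg(master_port).query::<()>(con)?;
    }
    Ok(())
}
```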

Let SETREPL trigger SLAVEOF command to redis directly

Now SETREPL only sets up an asynchronous task to periodically send the SLAVEOF command to Redis.
This could leave a short window during which a promoted master is still a slave but needs to serve requests.
We need to trigger SLAVEOF once directly before SETREPL returns.

Only one cluster in a server proxy

The original design of undermoon was to support multiple logical clusters in a single server-side proxy for multi-tenancy. That has turned out to be a bad idea for the following reasons:

  • Maintaining metadata for multiple clusters is not easy, especially when it comes to migration states.
  • A rolling upgrade becomes more difficult, as it affects multiple clusters at the same time.

I'd better remove the support for multiple logical clusters.

Track spawned futures

Build a future wrapper to track spawned futures and detect future leaks, analogous to goroutine leaks in Go.
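
A minimal sketch of such a wrapper, assuming a global counter is enough to surface leaks; the type name is hypothetical:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::task::{Context, Poll};

// Number of live tracked futures; a leak shows up as a count that never drops.
static LIVE_FUTURES: AtomicUsize = AtomicUsize::new(0);

struct Tracked<F> {
    inner: F,
}

impl<F: Future> Tracked<F> {
    fn new(inner: F) -> Self {
        LIVE_FUTURES.fetch_add(1, Ordering::Relaxed);
        Tracked { inner }
    }
}

impl<F> Drop for Tracked<F> {
    fn drop(&mut self) {
        LIVE_FUTURES.fetch_sub(1, Ordering::Relaxed);
    }
}

impl<F: Future> Future for Tracked<F> {
    type Output = F::Output;
    fn poll(self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<F::Output> {
        // SAFETY: we never move `inner` out of the pinned wrapper.
        unsafe { self.map_unchecked_mut(|t| &mut t.inner) }.poll(cx)
    }
}
```

Spawning everything through `Tracked::new` and periodically reporting LIVE_FUTURES would make a leak visible as a counter that never comes back down.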

v0.3 Roadmap

The main change in v0.3 will be the broker API, which will break compatibility. It will be done together with overmoon v0.2.

  • Broker API change
  • Add version to broker api
  • Make SlotRange an arbitrary range for better migration performance.
  • Build mem_broker and make it usable.
  • Add a host field in ProxyResource of mem_broker.
  • Save mem_broker data in a file.
  • Add an API to bump global epoch and cluster epoch to force proxies to update their metadata.
  • Add a broker api to check whether the current free proxies are sufficient for any host failure.
  • Support redis-cluster-proxy (not fully tested yet)
  • Amend INFO command
  • Balance masters
  • Remove `{:?}` debug output exposed to the users.
  • Support some blocking commands
  • More unit tests.
  • More docs.
  • Benchmark Graph
  • Migration Time Graph
  • Add docs about samaritan.
  • Add pagination for the API getting proxy addresses.
  • Compress metadata (done in v0.5)
  • Support Replication for memory broker
  • Support dynamically changing the broker address of the coordinator.
  • Add docs for config.
  • Support active redirection for Redis single instance clients.
  • Delete keys on the destination proxy during migration to make migration faster.
  • Need more docs on the migration.

Manage freed nodes

  • After the replicas get freed, they are not changed back to masters and keep replicating.
  • The data inside the freed nodes is not cleared.

Should it be managed by the coordinator and server_proxy?

Migration uses wrong offset field.

Now we use the lag field from INFO to determine whether the replication has finished, which is wrong. We should compare master_repl_offset with the offset of each replica.
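
A minimal sketch of that comparison, parsing the master's INFO replication output; the field parsing is simplified:

```rust
// Replication is done when every replica's offset has caught up with
// master_repl_offset, rather than when `lag` looks small.
fn replication_done(info: &str) -> bool {
    let mut master_offset: i64 = -1;
    let mut replica_offsets: Vec<i64> = Vec::new();
    for line in info.lines() {
        if let Some(v) = line.strip_prefix("master_repl_offset:") {
            master_offset = v.trim().parse().unwrap_or(-1);
        } else if line.starts_with("slave") {
            // e.g. slave0:ip=127.0.0.1,port=6380,state=online,offset=4096,lag=0
            for field in line.split(&[':', ','][..]) {
                if let Some(v) = field.strip_prefix("offset=") {
                    replica_offsets.push(v.trim().parse().unwrap_or(-1));
                }
            }
        }
    }
    master_offset >= 0
        && !replica_offsets.is_empty()
        && replica_offsets.iter().all(|&o| o >= master_offset)
}
```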

Rename Host to Proxy

At first, I thought we would only deploy one proxy per host, so host and proxy were the same and were used interchangeably in the code and API.
Now, to support clients and redis cluster proxies that do not support the AUTH command for the backend clusters, we need to deploy multiple proxies on the same machine to support multiple tenants.
I have changed the API in #21.
Later we need to change host in the code to proxy.

v0.2 Roadmap

Undermoon v0.2 focuses on supporting arbitrary slot migration, whose key functionality has been done in #65.

There are still two problems to solve:

  • (1) The migration speed is too slow: around 450 keys per second. (Now 4,000 keys per second, which could also be tunable.)
  • (2) The blocking phase to ensure consistency is not implemented yet.

The first one needs Redis pipelining to increase throughput. The second one needs complex synchronization. Neither is easy to implement without async/await.

Thus, in v0.2 Undermoon will change to futures-rs 0.3.

v0.2 Roadmap

  • Move to the new future api.
  • Optimize Resp by only storing the index in the data to eliminate data copy.
  • Let CmdTask support multiple commands as a single request.
  • Refactor the API from the executor to the backend sender. Make them return a Pin<Box> for future functionality such as MGET and blocking commands.
  • More unit tests. (moved to v0.3)
  • More docs. (moved to v0.3)

Optimization

  • Change RwLock on server proxy meta to
  • Batch sendto syscall.
  • Optimize memory copy
  • Let Resp objects just store the index of the Redis packet to reduce memory allocation.

Return peer proxies in GET /api/proxies/meta/<server_proxy_address>

Now, get_peer in the coordinator uses separate HTTP calls to get the peer server proxies:

  • get the cluster name from host metadata
  • get the cluster metadata

This could lead to inconsistent data.

We should return the metadata of peer proxies in /api/proxies/meta/<server_proxy_address> directly.

False Negative Failure After Recovering Proxy

When proxies are tagged as failed and then recover, the client pool in the coordinator might get a stale connection and fail to send PING, which causes a false negative failure report.
This is confusing but could be fine. It might be fixed later.

Migration could potentially recover deleted keys

Since during key migration a key can be written from the source shard to the destination shard multiple times, a key deleted by users could be recovered again.

The overall process is:

  • The key gets migrated to the destination by the RESTORE command.
  • Users delete the key.
  • The key gets migrated again via the RESTORE command, since SCAN can return the same key multiple times, or because the first migration was actively triggered by the destination shard. The deleted key is thus restored.

Compress metadata

When the metadata of a large cluster is synchronized from the HTTP broker to the coordinator, and from the coordinator to the server proxy, it may need to be compressed to reduce the data size.
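
A minimal sketch, assuming gzip via the flate2 crate; the actual codec choice is open:

```rust
use flate2::{read::GzDecoder, write::GzEncoder, Compression};
use std::io::{Read, Write};

// Compress serialized metadata (e.g. JSON) before sending it over HTTP.
fn compress_meta(raw: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut enc = GzEncoder::new(Vec::new(), Compression::default());
    enc.write_all(raw)?;
    enc.finish()
}

// Decompress it on the receiving side (coordinator or server proxy).
fn decompress_meta(gz: &[u8]) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    GzDecoder::new(gz).read_to_end(&mut out)?;
    Ok(out)
}
```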

Amend broker api

  • Rename host to proxy
  • Rename database to cluster
  • Rename db_name to cluster_name

Sharded Coordinator

If the whole undermoon cluster has more than 100k server proxies, the coordinator might not be able to hold that many connections.

We need to divide coordinators into different shards by clusters and server proxies.

Need to delete some part of the data after migration

When scaling, the proxies just migrate all the data from one node to another, leaving the two involved proxies each holding half of the data they don't own.
We need to delete this data after migration using the SCAN and DEL commands, as sketched below.
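
A minimal sketch of that cleanup, assuming the synchronous `redis` crate; `owned` is a caller-supplied predicate over slot numbers, and the slot function ignores `{...}` hash tags for brevity:

```rust
// Redis cluster's CRC16 (XMODEM variant: poly 0x1021, init 0).
fn crc16_xmodem(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &b in data {
        crc ^= (b as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ 0x1021 } else { crc << 1 };
        }
    }
    crc
}

fn key_slot(key: &str) -> u16 {
    crc16_xmodem(key.as_bytes()) % 16384
}

// Walk the keyspace and delete every key whose slot we no longer own.
fn delete_unowned_keys(
    con: &mut redis::Connection,
    owned: impl Fn(u16) -> bool,
) -> redis::RedisResult<()> {
    let mut cursor: u64 = 0;
    loop {
        let (next, keys): (u64, Vec<String>) = redis::cmd("SCAN")
            .arg(cursor)
            .arg("COUNT")
            .arg(100)
            .query(con)?;
        for key in keys {
            if !owned(key_slot(&key)) {
                redis::cmd("DEL").arg(&key).query::<()>(con)?;
            }
        }
        if next == 0 {
            return Ok(());
        }
        cursor = next;
    }
}
```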

Amend INFO command

  • Amend the formatting of INFO command.
  • Move UMCTL INFOREPL to INFO command.

Optimize UMFORWARD

Now we use the UMFORWARD command to carry additional attributes for implementing max_redirections, which results in command wrapping and unwrapping and suboptimal performance.
Maybe we can implement RESP3 and use its attributes to optimize this.

Task for deleting keys running with migration task

After migration, a task for deleting keys is started, which currently causes some problems:

(1) Data inconsistency when scaling up and down (fixed by #158)

If a cluster is scaling up and down frequently, a migration task could run concurrently with a key-deletion task covering the same slots, which could result in losing some keys.

PR #158 fixes it by checking whether there is any key-deletion task before starting a migration in the API.

(2) High CPU Usage

Improvements

  • Limit the number of threads.
  • Expose the inner metadata of server_proxy.
  • Set TCP_NODELAY.
  • Make the channel size configurable.
