Implement: CLUSTER REPLICATE NO ONE #1674

skolosov-snap · 2025-02-05T22:43:54Z

Currently, Valkey doesn't allow detaching a replica from its primary node. So, if you want to change the cluster topology, the only way to do it is to reset the node (CLUSTER RESET command). However, this removes the node from the cluster, which affects clients. All clients will keep sending traffic to this node (and getting inaccurate responses) until they refresh their topology.

In this change we implement support for a new argument to the CLUSTER REPLICATE command: CLUSTER REPLICATE NO ONE. When this command is called, the node is converted from a replica to an empty primary node but stays in the cluster. Thus, all traffic coming from clients to this node can be redirected to the correct node.
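A minimal sketch of how the new argument form might be told apart from the existing `<node-id>` form. This is an illustration only, not the code in src/cluster_legacy.c: `replicate_action`, `equals_ignore_case` and `classify_replicate_args` are hypothetical names, the arguments are assumed to be plain C strings that follow CLUSTER REPLICATE, and case-insensitive matching of NO ONE is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result type for this sketch; not taken from the Valkey source. */
typedef enum {
    REPLICATE_TARGET_NODE, /* CLUSTER REPLICATE <node-id> */
    REPLICATE_NO_ONE,      /* CLUSTER REPLICATE NO ONE */
    REPLICATE_SYNTAX_ERROR /* anything else */
} replicate_action;

/* Case-insensitive comparison of two NUL-terminated strings. */
static bool equals_ignore_case(const char *a, const char *b) {
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return false;
        a++;
        b++;
    }
    return *a == *b; /* equal only if both strings ended together */
}

/* Classify the arguments that follow "CLUSTER REPLICATE".
 * argc/argv cover only those trailing arguments (an assumption of this sketch). */
static replicate_action classify_replicate_args(int argc, const char **argv) {
    assert(argc == 0 || argv != NULL);
    if (argc == 2 && equals_ignore_case(argv[0], "NO") && equals_ignore_case(argv[1], "ONE")) {
        return REPLICATE_NO_ONE;
    }
    if (argc == 1) return REPLICATE_TARGET_NODE; /* node ID validated elsewhere */
    return REPLICATE_SYNTAX_ERROR;
}
```

With this shape, NO ONE is the only two-argument form, so a single argument is always treated as a node ID and validated separately.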

codecov · 2025-02-06T01:17:26Z

Codecov Report

Attention: Patch coverage is 81.81818% with 4 lines in your changes missing coverage. Please review.

Project coverage is 71.13%. Comparing base (e03b3f1) to head (9efee82).

Files with missing lines	Patch %	Lines
src/cluster_legacy.c	81.81%	4 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##           unstable    #1674      +/-   ##
============================================
+ Coverage     70.96%   71.13%   +0.16%     
============================================
  Files           123      123              
  Lines         65648    65669      +21     
============================================
+ Hits          46587    46711     +124     
+ Misses        19061    18958     -103

Files with missing lines	Coverage Δ
src/commands.def	`100.00% <ø> (ø)`
src/cluster_legacy.c	`85.88% <81.81%> (+0.14%)`	⬆️

... and 15 files with indirect coverage changes

src/cluster_legacy.c

hpatro

The cluster-replicate.json file should be updated; commands.def will then get updated as part of the build. Or, if it was accidentally not staged, please add it.

Also, could you run clang-format on your end to fix some of the formatting issues?

src/cluster_legacy.c

skolosov-snap · 2025-02-07T23:47:20Z

The cluster-replicate.json file should be updated; commands.def will then get updated as part of the build. Or, if it was accidentally not staged, please add it.

Also, could you run clang-format on your end to fix some of the formatting issues?

Updated.

zuiderkwast

This feature makes sense to me.

@valkey-io/core-team New arguments = major decision. Please approve or vote if you agree.

src/cluster_legacy.c

src/commands/cluster-replicate.json

zuiderkwast · 2025-02-10T17:12:17Z

The CI job "DCO" is failing. You need to use git commit -s. See the Details link next to the DCO job.

Why do we need it? See here: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#developer-certificate-of-origin Thanks!

src/commands/cluster-replicate.json

skolosov-snap · 2025-02-10T17:37:35Z

The CI job "DCO" is failing. You need to use git commit -s. See the Details link next to the DCO job.

Why do we need it? See here: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#developer-certificate-of-origin Thanks!

Done

src/cluster_legacy.c

skolosov-snap · 2025-02-12T17:54:09Z

Any objection to merging it?

zuiderkwast · 2025-02-12T18:58:28Z

We're busy making the 8.1.0 release candidate just now. This one will need to wait and get merged after that.

madolson · 2025-02-13T22:29:39Z

Any objection to merging it?

We should also have some tests validating this new behavior works as intended. Have a cluster, disconnect the replica, make sure slots/shards and all are still consistent and the rest of the cluster agrees on the state.

src/commands/cluster-replicate.json

src/cluster_legacy.c

PingXie · 2025-02-17T18:26:08Z

if you want to change the cluster topology, the only way to do it is to reset the node (CLUSTER RESET command). However, this removes the node from the cluster, which affects clients.

Can we introduce a new mode so it doesn't forget all the nodes in the cluster? I think conceptually we are discussing a form of reset still so it seems to me that the solution is too tactical. Maybe CLUSTER RESET SOFT?

BTW, I just noticed that the forget path is not always working. The reset node joined back to the cluster quickly.

In this change we implement support for a new argument to the CLUSTER REPLICATE command: CLUSTER REPLICATE NO ONE. When this command is called, the node is converted from a replica to an empty primary node but stays in the cluster. Thus, all traffic coming from clients to this node can be redirected to the correct node.

I don't see the implementation moving the node to a new shard. This would leave two primaries (one real and one empty) in the original shard, which will confuse the client.

hpatro · 2025-02-17T19:15:29Z

I don't see the implementation moving the node to a new shard. This would leave two primaries (one real and one empty) in the original shard, which will confuse the client.

Good catch!

skolosov-snap · 2025-02-19T20:21:14Z

if you want to change the cluster topology, the only way to do it is to reset the node (CLUSTER RESET command). However, this removes the node from the cluster, which affects clients.

Can we introduce a new mode so it doesn't forget all the nodes in the cluster? I think conceptually we are discussing a form of reset still so it seems to me that the solution is too tactical. Maybe CLUSTER RESET SOFT?

IMHO that is just a syntactic question. Whatever command name we come up with, its behavior would be the same: turn a replica into a primary while leaving it in the cluster. If you think the name CLUSTER RESET SOFT is better, I can support it under that name. My personal opinion is that CLUSTER RESET is not the best command for this feature, because the main thing CLUSTER RESET does is exclude the node from the cluster, and that is exactly what we want to avoid. On the other hand, CLUSTER REPLICATE NO ONE is consistent with the similar non-cluster command REPLICAOF NO ONE, which should not confuse clients and even gives them something familiar.

BTW, I just noticed that the forget path is not always working.

What do you mean by "forget path is not always working"? What forget path? We are not doing any forgetting here.

The reset node joined back to the cluster quickly.

AFAIU, if a node is reset it will not be added back to the cluster automatically, only when somebody does it explicitly. So if you want to remove a replica manually (e.g. the node is scheduled for maintenance), the only option is to reset it, which affects all clients (they start seeing errors).

In this change we implement support for a new argument to the CLUSTER REPLICATE command: CLUSTER REPLICATE NO ONE. When this command is called, the node is converted from a replica to an empty primary node but stays in the cluster. Thus, all traffic coming from clients to this node can be redirected to the correct node.

I don't see the implementation moving the node to a new shard. This would leave two primaries (one real and one empty) in the original shard, which will confuse the client.

What is a shard? Would you please shed some light on this term? Is it something specific to Valkey? AFAIU a shard is a primary with the replicas attached to it. So when we switch the role of the node from replica to empty primary, doesn't that mean we moved it to a separate new shard?

skolosov-snap · 2025-02-19T23:08:35Z

I believe I figured it out. There is an internal shards dict that needs to be updated.

zuiderkwast · 2025-02-20T00:10:04Z

@PingXie

Can we introduce a new mode so it doesn't forget all the nodes in the cluster? I think conceptually we are discussing a form of reset still so it seems to me that the solution is too tactical. Maybe CLUSTER RESET SOFT?

I agree with @skolosov-snap that CLUSTER REPLICATE NO ONE is better. The node is not leaving the cluster so RESET seems less intuitive.

BTW, I just noticed that the forget path is not always working. The reset node joined back to the cluster quickly.

To make the cluster forget a node, you need to send CLUSTER FORGET to the other nodes in the cluster. If you only reset a node, it will disconnect but the other nodes will find it again soon. I believe this is the documented behavior.

skolosov-snap · 2025-02-20T01:15:05Z

To make the cluster forget a node, you need to send CLUSTER FORGET to the other nodes in the cluster. If you only reset a node, it will disconnect but the other nodes will find it again soon. I believe this is the documented behavior.

Looking at the doc it should behave differently:

This command is mainly useful to re-provision a Valkey Cluster node in order to be used in the context of a new, different cluster.

So, I believe potentially it may be fixed in the future.

@PingXie, @zuiderkwast You are right about the current CLUSTER RESET behavior. However, it takes about 15 seconds to rejoin the cluster. So we are going to have 15 seconds of errors, which is pretty long in such scenarios:

27859:M 19 Feb 2025 17:11:14.472 * Connection with replica 127.0.0.1:17383 lost.
27859:M 19 Feb 2025 17:11:29.523 * Sending MEET packet to node 096e0230bddddb11d32e36497adcc4235bffa7d1 () because there is no inbound link for it
27859:M 19 Feb 2025 17:11:29.593 * Successfully completed handshake with 096e0230bddddb11d32e36497adcc4235bffa7d1 ()

PingXie · 2025-02-20T16:31:15Z

@skolosov-snap @zuiderkwast you have a good point. CLUSTER REPLICATE NO ONE works better.

BTW, a follow-up thought: should we consider a single token NONE as opposed to NO ONE? I understand the rationale of mimicking REPLICAOF NO ONE, but using two tokens for one concept seems quite arbitrary. I wonder if it is the lesser evil to go with NONE this time? So CLUSTER REPLICATE NONE perhaps?

zuiderkwast · 2025-02-20T16:34:54Z

a single token NONE as opposed to NO ONE?

I suggested it earlier in this PR in comment #1674 (comment) and I was already convinced that NO ONE is better. 😆

PingXie · 2025-02-20T16:37:58Z

a single token NONE as opposed to NO ONE?

I suggested it earlier in this PR in comment #1674 (comment) and I was already convinced that NO ONE is better. 😆

Hmm the comment link seems broken. What convinced you? :)

skolosov-snap · 2025-02-20T16:46:22Z

BTW, a follow-up thought: should we consider a single token NONE as opposed to NO ONE? I understand the rationale of mimicking REPLICAOF NO ONE, but using two tokens for one concept seems quite arbitrary. I wonder if it is the lesser evil to go with NONE this time? So CLUSTER REPLICATE NONE perhaps?

Not a big deal to me, but IMHO consistency/similarity between commands is better. What benefit will we get if we replace it with a single token?
Also, I understand that the probability of a node ID being NONE is near zero, but it is not zero. E.g. if in the future we allow the ID to be set by the user, or in some kind of test.
Again, not a big deal to me, but I don't see any benefit in making it a single token (except maybe a shorter if-statement in the code).

PingXie · 2025-02-20T17:03:54Z

Also, I understand that the probability of a node ID being NONE is near zero, but it is not zero.

The probability will be practically 0.

E.g. if in the future we allow the ID to be set by the user, or in some kind of test.

We have human_nodename for this purpose.

    sds human_nodename;                     /* The known human readable nodename for this node */

zuiderkwast · 2025-02-20T17:09:16Z

a single token NONE as opposed to NO ONE?

I suggested it earlier in this PR in comment #1674 (comment) and I was already convinced that NO ONE is better. 😆

Hmm the comment link seems broken. What convinced you? :)

Interesting, the link works for me. It's one of the resolved comments above on src/commands/cluster-replicate.json.

Quoting @skolosov-snap's comment which made sense to me:

I think similarity to REPLICAOF NO ONE is better than trying to keep the same arity for different cases. I think clients are not thinking about arity at all, but similarity between commands is easier for them to remember. In addition, for us it is easier to distinguish <node-id> from a special case.

skolosov-snap · 2025-02-26T19:21:43Z

@PingXie is command naming still a concern for you?

madolson · 2025-02-26T20:24:31Z

The probability will be practically 0.

No it's zero, the node ID is hex characters, you can't have N or O.

I also think it's worth keeping NO ONE because that just makes it mirror REPLICAOF.
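A small sketch of the point above, under stated assumptions: node IDs are taken to be fixed-length strings of lowercase hex characters, with a length of 40 as in the ID shown in the log excerpt earlier in this thread. `NODE_ID_LEN` and `looks_like_node_id` are hypothetical names for illustration, and this is not Valkey's actual validation routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed node ID length, matching the 40-character ID in the log above. */
#define NODE_ID_LEN 40

/* Return true if s has the assumed node ID shape: exactly NODE_ID_LEN
 * characters, each one of 0-9 or a-f. */
static bool looks_like_node_id(const char *s) {
    assert(s != NULL);
    if (strlen(s) != NODE_ID_LEN) return false;
    for (size_t i = 0; i < NODE_ID_LEN; i++) {
        char c = s[i];
        bool is_hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
        if (!is_hex) return false;
    }
    return true;
}
```

Under these assumptions neither NONE nor NO can pass as a node ID: both are too short, and the letters N and O are not hex digits.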

tests/cluster/tests/12-replication.tcl

PingXie · 2025-02-26T23:58:34Z

@PingXie is command naming still a concern for you?

I am still inclined to a single token but NO ONE is NOT a blocker for me. Can we take an explicit vote on the command name, for the record? @valkey-io/core-team

👍 for "NO ONE"

Signed-off-by: Sergey Kolosov <[email protected]>

enjoy-binbin

The new code LGTM, thanks.

enjoy-binbin · 2025-02-28T03:06:02Z

src/cluster_legacy.c

+            clusterCloseAllSlots();
+            resetManualFailover();
+
+            // moving new primary to its own shard.


@PingXie please also ack this logic.

Suggested change

// moving new primary to its own shard.

/* Moving new primary to its own shard. */

soloestoy · 2025-02-28T03:16:28Z

@PingXie is command naming still a concern for you?

I am still inclined to a single token but NO ONE is NOT a blocker for me. Can we take an explicit vote on the command name, for the record? @valkey-io/core-team

👍 for "NO ONE"

I prefer replicate no-one, to align with the original "node-id": "no-one" is just a special node-id.

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 91589d1 to ff96c0f · February 5, 2025 22:51

enjoy-binbin reviewed Feb 6, 2025: src/cluster_legacy.c (resolved)

hpatro reviewed Feb 7, 2025: src/cluster_legacy.c (outdated, resolved)

skolosov-snap force-pushed the skolosov/replicate-no-one branch 2 times, most recently from fde1ab6 to 4abbae8 · February 7, 2025 23:46

zuiderkwast added the major-decision-pending (Major decision pending by TSC team) label Feb 10, 2025

zuiderkwast reviewed Feb 10, 2025: src/cluster_legacy.c (outdated, resolved); src/commands/cluster-replicate.json (outdated, resolved)

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 4abbae8 to 85238e6 · February 10, 2025 16:12

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 85238e6 to 3789227 · February 10, 2025 17:24

zuiderkwast reviewed Feb 10, 2025: src/commands/cluster-replicate.json (outdated, resolved)

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 3789227 to e4e8b24 · February 10, 2025 17:36

enjoy-binbin reviewed Feb 11, 2025: src/cluster_legacy.c (outdated, resolved)

enjoy-binbin reviewed Feb 11, 2025: src/cluster_legacy.c (three comments, all outdated and resolved)

skolosov-snap force-pushed the skolosov/replicate-no-one branch from e4e8b24 to bed392b · February 11, 2025 16:31

enjoy-binbin reviewed Feb 14, 2025: src/commands/cluster-replicate.json (resolved)

enjoy-binbin reviewed Feb 14, 2025: src/cluster_legacy.c (outdated, resolved)

skolosov-snap force-pushed the skolosov/replicate-no-one branch from bed392b to b419cfb · February 19, 2025 23:04

skolosov-snap force-pushed the skolosov/replicate-no-one branch 2 times, most recently from b9ad4fd to 95c0b42 · February 20, 2025 00:06

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 95c0b42 to 445e078 · February 20, 2025 01:55

zuiderkwast mentioned this pull request Feb 20, 2025: CLUSTER RESET forgets other nodes, but they don't forget you #1757 (Open)

madolson reviewed Feb 26, 2025: tests/cluster/tests/12-replication.tcl (outdated, resolved)

skolosov-snap force-pushed the skolosov/replicate-no-one branch 2 times, most recently from c636c6b to fd65b48 · February 28, 2025 00:01

Commit 9efee82: Implement: CLUSTER REPLICATE NO ONE
Signed-off-by: Sergey Kolosov <[email protected]>

skolosov-snap force-pushed the skolosov/replicate-no-one branch from 9c3276f to 9efee82 · February 28, 2025 00:04

enjoy-binbin approved these changes Feb 28, 2025

enjoy-binbin added the release-notes (This issue should get a line item in the release notes) and needs-doc-pr (This change needs to update a documentation page. Remove label once doc PR is open.) labels Feb 28, 2025
