Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement: CLUSTER REPLICATE NO ONE #1674

Open
wants to merge 1 commit into
base: unstable
Choose a base branch
from

Conversation

skolosov-snap
Copy link

@skolosov-snap skolosov-snap commented Feb 5, 2025

Currently, ValKey doesn't allow to detach replica attached to primary node. So, if you want to change cluster topology the only way to do it is to reset (CLUSTER RESET command) the node. However, this results into removing node from the cluster what affects clients. All clients will keep sending traffic to this node (with getting inaccurate responses) until they refresh their topology.

In this change we implement supporting of new argument for CLUSTER REPLICATE command: CLUSTER REPLICATE NO ONE. When calling this command the node will be converted from replica to empty primary node but still staying in the cluster. Thus, all traffic coming from the clients to this node can be redirected to correct node.

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from 91589d1 to ff96c0f Compare February 5, 2025 22:51
Copy link

codecov bot commented Feb 6, 2025

Codecov Report

Attention: Patch coverage is 81.81818% with 4 lines in your changes missing coverage. Please review.

Project coverage is 71.13%. Comparing base (e03b3f1) to head (9efee82).

Files with missing lines Patch % Lines
src/cluster_legacy.c 81.81% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           unstable    #1674      +/-   ##
============================================
+ Coverage     70.96%   71.13%   +0.16%     
============================================
  Files           123      123              
  Lines         65648    65669      +21     
============================================
+ Hits          46587    46711     +124     
+ Misses        19061    18958     -103     
Files with missing lines Coverage Δ
src/commands.def 100.00% <ø> (ø)
src/cluster_legacy.c 85.88% <81.81%> (+0.14%) ⬆️

... and 15 files with indirect coverage changes

Copy link
Collaborator

@hpatro hpatro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cluster-replicate.json file should be updated and as part of the build commands.def will get updated. or if it was accidentally not staged, please add that.

Also, could you run the clang-format on your end to fix some of the formatting issue.

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch 2 times, most recently from fde1ab6 to 4abbae8 Compare February 7, 2025 23:46
@skolosov-snap
Copy link
Author

cluster-replicate.json file should be updated and as part of the build commands.def will get updated. or if it was accidentally not staged, please add that.

Also, could you run the clang-format on your end to fix some of the formatting issue.

Updated.

@zuiderkwast zuiderkwast added the major-decision-pending Major decision pending by TSC team label Feb 10, 2025
Copy link
Contributor

@zuiderkwast zuiderkwast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feature makes sense to me.

@valkey-io/core-team New arguments = major decision. Please approve or vote if you agree.

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from 4abbae8 to 85238e6 Compare February 10, 2025 16:12
@zuiderkwast
Copy link
Contributor

The CI job "DCO" is failing. You need to use git commit -s. See the Details link next to the DCO job.

Why we need it? See here: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#developer-certificate-of-origin thanks!

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from 85238e6 to 3789227 Compare February 10, 2025 17:24
@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from 3789227 to e4e8b24 Compare February 10, 2025 17:36
@skolosov-snap
Copy link
Author

The CI job "DCO" is failing. You need to use git commit -s. See the Details link next to the DCO job.

Why we need it? See here: https://github.com/valkey-io/valkey/blob/unstable/CONTRIBUTING.md#developer-certificate-of-origin thanks!

Done

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from e4e8b24 to bed392b Compare February 11, 2025 16:31
@skolosov-snap
Copy link
Author

Any objection to merge it?

@zuiderkwast
Copy link
Contributor

We're busy making the 8.1.0 release candidate just now. This one will need to wait and get merged after that.

@madolson
Copy link
Member

Any objection to merge it?

We should also have some tests validating this new behavior works as intended. Have a cluster, disconnect the replica, make sure slots/shards and all are still consistent and the rest of the cluster agrees on the state.

@PingXie
Copy link
Member

PingXie commented Feb 17, 2025

if you want to change cluster topology the only way to do it is to reset (CLUSTER RESET command) the node. However, this results into removing node from the cluster what affects clients.

Can we introduce a new mode so it doesn't forget all the nodes in the cluster? I think conceptually we are discussing a form of reset still so it seems to me that the solution is too tactical. Maybe CLUSTER RESET SOFT?

BTW, I just noticed that the forget path is not always working. The reset node joined back to the cluster quickly.

In this change we implement supporting of new argument for CLUSTER REPLICATE command: CLUSTER REPLICATE NO ONE. When calling this command the node will be converted from replica to empty primary node but still staying in the cluster. Thus, all traffic coming from the clients to this node can be redirected to correct node.

I don't see the implementation moves the node to a new shard. This would leave two primaries (one real and one empty) in the original shard, which will confuse the client.

@hpatro
Copy link
Collaborator

hpatro commented Feb 17, 2025

I don't see the implementation moves the node to a new shard. This would leave two primaries (one real and one empty) in the original shard, which will confuse the client.

Good catch!

@skolosov-snap
Copy link
Author

skolosov-snap commented Feb 19, 2025

if you want to change cluster topology the only way to do it is to reset (CLUSTER RESET command) the node. However, this results into removing node from the cluster what affects clients.

Can we introduce a new mode so it doesn't forget all the nodes in the cluster? I think conceptually we are discussing a form of reset still so it seems to me that the solution is too tactical. Maybe CLUSTER RESET SOFT?

IMHO that is just a syntactical question. Whatever command name we would come up with, the behavior of it would be the same: turn replica into primary with leaving it in the cluster. If you think the name of CLUSTER RESET SOFT is better I can support it in that name. My personal opinion is that CLUSTER RESET is not the best command to implement this feature, because the main thing what CLUSTER RESET does is excluding node from the cluster and that is exactly what we want to avoid. On the other hand CLUSTER REPLICATE NO ONE is consistent with similar non-cluster version of REPLICAOF NO ONE what should not confuse client but even give some kind of similarity.

BTW, I just noticed that the forget path is not always working.

What do you mean by "forget path is not always working"? What forget path? We are not doing any forgetting here.

The reset node joined back to the cluster quickly.

AFAIU if node is reset it will not be added back to the cluster automatically, but only when somebody does it explicitly. So if you want to remove replica manually (i.e. node is scheduled for maintenance) the only option for you is to reset with affecting all clients (start seeing errors).

In this change we implement supporting of new argument for CLUSTER REPLICATE command: CLUSTER REPLICATE NO ONE. When calling this command the node will be converted from replica to empty primary node but still staying in the cluster. Thus, all traffic coming from the clients to this node can be redirected to correct node.

I don't see the implementation moves the node to a new shard. This would leave two primaries (one real and one empty) in the original shard, which will confuse the client.

What is a shard? Would you please shed a light on this term? Is it something special for ValKey? AFAIU shard is primary with replicas attached to it. So, when we switched role of the node from replica to empty primary, doesn't it mean that we moved it to separate new shard?

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from bed392b to b419cfb Compare February 19, 2025 23:04
@skolosov-snap
Copy link
Author

I believe I figured it out. There is internal shards dict that needs to be updated.

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch 2 times, most recently from b9ad4fd to 95c0b42 Compare February 20, 2025 00:06
@zuiderkwast
Copy link
Contributor

@PingXie

Can we introduce a new mode so it doesn't forget all the nodes in the cluster? I think conceptually we are discussing a form of reset still so it seems to me that the solution is too tactical. Maybe CLUSTER RESET SOFT?

I agree with @skolosov-snap that CLUSTER REPLICATE NO ONE is better. The node is not leaving the cluster so RESET seems less intuitive.

BTW, I just noticed that the forget path is not always working. The reset node joined back to the cluster quickly.

To make the cluster forget a node, you need to send CLUSTER FORGET to the other nodes in the cluster. If you only reset a node, it will disconnect but the other nodes will find it again soon. I believe this is the documented behavior.

@skolosov-snap
Copy link
Author

skolosov-snap commented Feb 20, 2025

To make the cluster forget a node, you need to send CLUSTER FORGET to the other nodes in the cluster. If you only reset a node, it will disconnect but the other nodes will find it again soon. I believe this is the documented behavior.

Looking at the doc it should behave differently:

This command is mainly useful to re-provision a Valkey Cluster node in order to be used in the context of a new, different cluster.

So, I believe potentially it may be fixed in the future.

@PingXie, @zuiderkwast You are right about current CLUSTER RESET behavior. However, it takes about 15 secs to re-join the cluster. So, we gonna have 15 seconds of errors which is pretty long for errors in such scenarios:

27859:M 19 Feb 2025 17:11:14.472 * Connection with replica 127.0.0.1:17383 lost.
27859:M 19 Feb 2025 17:11:29.523 * Sending MEET packet to node 096e0230bddddb11d32e36497adcc4235bffa7d1 () because there is no inbound link for it
27859:M 19 Feb 2025 17:11:29.593 * Successfully completed handshake with 096e0230bddddb11d32e36497adcc4235bffa7d1 ()

@PingXie
Copy link
Member

PingXie commented Feb 20, 2025

@skolosov-snap @zuiderkwast you have a good point. CLUSTER REPLICATE NO ONE works better.

BTW, a follow up thought, should we consider a single token NONE as opposed to NO ONE? I understand the rationale of mimicking REPLICAOF NO ONE but it using two tokens for one concept seems quite arbitrary. I wonder if it is a lesser evil if we go with NONE this time? So CLUSTER REPLICATE NONE perhaps?

@zuiderkwast
Copy link
Contributor

a single token NONE as opposed to NO ONE?

I suggested it earlier in this PR in comment #1674 (comment) and I was already convinced that NO ONE is better. 😆

@PingXie
Copy link
Member

PingXie commented Feb 20, 2025

a single token NONE as opposed to NO ONE?

I suggested it earlier in this PR in comment #1674 (comment) and I was already convinced that NO ONE is better. 😆

Hmm the comment link seems broken. What convinced you? :)

@skolosov-snap
Copy link
Author

skolosov-snap commented Feb 20, 2025

BTW, a follow up thought, should we consider a single token NONE as opposed to NO ONE? I understand the rationale of mimicking REPLICAOF NO ONE but it using two tokens for one concept seems quite arbitrary. I wonder if it is a lesser evil if we go with NONE this time? So CLUSTER REPLICATE NONE perhaps?

Not a big deal to me, but IMHO consistency/similarity between commands is better. What benefit will we get if we replace it with single toke?
Also, I understand that probability of having NONE node-id is near the zero, but is not zero. I.e. in a future if we let say allow to set ID by user or in some kind of test.
Again, not a big deal to me, but I don't see any benefits from making a single token (except may be shorter if-statement in the code).

@PingXie
Copy link
Member

PingXie commented Feb 20, 2025

Also, I understand that probability of having NONE node-id is near the zero, but is not zero.

The probability will be practically 0.

I.e. in a future if we let say allow to set ID by user or in some kind of test.

We have human_nodename for this purpose.

    sds human_nodename;                     /* The known human readable nodename for this node */

@zuiderkwast
Copy link
Contributor

a single token NONE as opposed to NO ONE?

I suggested it earlier in this PR in comment #1674 (comment) and I was already convinced that NO ONE is better. 😆

Hmm the comment link seems broken. What convinced you? :)

Interesting, the link works for me. It's one of the resolved comments above on src/commands/cluster-replicate.json.

Quoting @skolosov-snap's comment which made sense to me:

I think similarity to REPLICAOF NO ONE is better than trying to keep the same arity for different cases. I think client are not thinking about arity at all but similarity between commands is easier for them remember. In addition, for us it is easier to distinguish <node-id> from a special case.

@skolosov-snap
Copy link
Author

@PingXie is command naming still concern for you?

@madolson
Copy link
Member

madolson commented Feb 26, 2025

The probability will be practically 0.

No it's zero, the node ID is hex characters, you can't have N or O.

I also think it's worth keeping NO ONE because that just makes in mirror replicaof.

@PingXie
Copy link
Member

PingXie commented Feb 26, 2025

@PingXie is command naming still concern for you?

I am still inclined to a single token but NO ONE is NOT a blocker for me. Can we take an explicit vote on the command name, for the record? @valkey-io/core-team

👍 for "NO ONE"

@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch 2 times, most recently from c636c6b to fd65b48 Compare February 28, 2025 00:01
@skolosov-snap skolosov-snap force-pushed the skolosov/replicate-no-one branch from 9c3276f to 9efee82 Compare February 28, 2025 00:04
Copy link
Member

@enjoy-binbin enjoy-binbin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code LGTM, thanks.

clusterCloseAllSlots();
resetManualFailover();

// moving new primary to its own shard.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PingXie please also ack this logic.

Suggested change
// moving new primary to its own shard.
/* Moving new primary to its own shard. */

@enjoy-binbin enjoy-binbin added release-notes This issue should get a line item in the release notes needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. labels Feb 28, 2025
@soloestoy
Copy link
Member

soloestoy commented Feb 28, 2025

@PingXie is command naming still concern for you?

I am still inclined to a single token but NO ONE is NOT a blocker for me. Can we take an explicit vote on the command name, for the record? @valkey-io/core-team

👍 for "NO ONE"

I prefer replicate no-one, align with the original "node-id", "no-one" is just a special node-id

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major-decision-pending Major decision pending by TSC team needs-doc-pr This change needs to update a documentation page. Remove label once doc PR is open. release-notes This issue should get a line item in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants