Save config file and broadcast a PONG when configEpoch changes
This is loosely related to valkey-io#974 and valkey-io#1777. Whenever the
epoch changes, we should save the configuration file and broadcast a PONG
in as many cases as possible.

For example, if a primary goes down after bumping the epoch, its replicas
may initiate a failover, but the other primaries may refuse to vote
because the replica's epoch has not been updated.
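
To illustrate the failover case above, here is a minimal sketch. It assumes a simplified voting rule in which a primary grants its vote only when the configEpoch claimed by the requesting replica is not older than the configEpoch the voter has recorded for the slot owner. The names `toyShouldGrantVote` and `toyFailoverScenario` and the epoch values 5 and 6 are hypothetical and are not part of src/cluster_legacy.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified vote check (an assumption of this sketch, not the
 * actual cluster code): grant the vote only if the epoch claimed by the
 * replica is at least the configEpoch the voter knows for the slot owner. */
int toyShouldGrantVote(uint64_t claimedConfigEpoch, uint64_t knownOwnerConfigEpoch) {
    return claimedConfigEpoch >= knownOwnerConfigEpoch;
}

/* Walk through the scenario: the primary bumps its epoch from 5 to 6 and then
 * goes down. The voter has learned about epoch 6. The replica either saw the
 * bump (replicaSawBump != 0) or still believes the epoch is 5.
 * Returns 1 if the vote is granted, 0 if it is refused. */
int toyFailoverScenario(int replicaSawBump) {
    uint64_t oldEpoch = 5, bumpedEpoch = 6; /* illustrative values only */
    uint64_t voterView = bumpedEpoch;
    uint64_t replicaView = replicaSawBump ? bumpedEpoch : oldEpoch;

    /* In this toy scenario the replica never knows more than the voter. */
    assert(replicaView <= voterView);
    return toyShouldGrantVote(replicaView, voterView);
}
```

In this model the replica that missed the bump is refused, while the replica that received the new epoch is not, which is why spreading the new epoch quickly matters.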

As another example, if we bump the epoch for some reason and the new
epoch does not reach the rest of the cluster in time, it may affect how
message staleness is judged.

Signed-off-by: Binbin <[email protected]>
enjoy-binbin committed Mar 4, 2025
1 parent 7a2d50a commit e925965
Showing 1 changed file with 4 additions and 3 deletions.
7 changes: 4 additions & 3 deletions src/cluster_legacy.c
@@ -1938,7 +1938,7 @@ int clusterBumpConfigEpochWithoutConsensus(void) {
     if (myself->configEpoch == 0 || myself->configEpoch != maxEpoch) {
         server.cluster->currentEpoch++;
         myself->configEpoch = server.cluster->currentEpoch;
-        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_FSYNC_CONFIG);
+        clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_FSYNC_CONFIG | CLUSTER_TODO_BROADCAST_ALL);
         serverLog(LL_NOTICE, "New configEpoch set to %llu", (unsigned long long)myself->configEpoch);
         return C_OK;
     } else {
@@ -2001,7 +2001,7 @@ void clusterHandleConfigEpochCollision(clusterNode *sender) {
     /* Get the next ID available at the best of this node knowledge. */
     server.cluster->currentEpoch++;
     myself->configEpoch = server.cluster->currentEpoch;
-    clusterSaveConfigOrDie(1);
+    clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_FSYNC_CONFIG | CLUSTER_TODO_BROADCAST_ALL);
     serverLog(LL_NOTICE, "configEpoch collision with node %.40s (%s). configEpoch set to %llu", sender->name,
               sender->human_nodename, (unsigned long long)myself->configEpoch);
 }
@@ -4776,7 +4776,7 @@ void clusterFailoverReplaceYourPrimary(void) {

     /* 3) Update state and save config. */
     clusterUpdateState();
-    clusterSaveConfigOrDie(1);
+    clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_FSYNC_CONFIG);

     /* 4) Pong all the other nodes so that they can update the state
      * accordingly and detect that we switched to primary role. */
@@ -4963,6 +4963,7 @@ void clusterHandleReplicaFailover(void) {
         /* Update my configEpoch to the epoch of the election. */
         if (myself->configEpoch < server.cluster->failover_auth_epoch) {
             myself->configEpoch = server.cluster->failover_auth_epoch;
+            clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_FSYNC_CONFIG | CLUSTER_TODO_BROADCAST_ALL);
             serverLog(LL_NOTICE, "configEpoch set to %llu after successful failover",
                       (unsigned long long)myself->configEpoch);
         }
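
The hunks above replace immediate `clusterSaveConfigOrDie(1)` calls with flags passed to `clusterDoBeforeSleep`, and add `CLUSTER_TODO_BROADCAST_ALL` wherever the configEpoch changes. The following is a toy model of such a deferred-flag scheme, not the actual implementation. It assumes that requested flags are OR-ed into a pending set and that each pending action runs at most once per event loop iteration. Every `toy`/`TOY_` name and every flag value here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for this sketch only; the real CLUSTER_TODO_*
 * definitions may differ. */
#define TOY_TODO_SAVE_CONFIG   (1 << 0)
#define TOY_TODO_FSYNC_CONFIG  (1 << 1)
#define TOY_TODO_BROADCAST_ALL (1 << 2)

/* Toy stand-in for the cluster state: the pending work plus counters that
 * record how many times each action actually ran. */
typedef struct {
    int todo_before_sleep;
    int saves;
    int fsyncs;
    int broadcasts;
} toyCluster;

/* Request work for the end of the current event loop iteration.
 * Flags are OR-ed together, so repeated requests collapse into one. */
void toyDoBeforeSleep(toyCluster *c, int flags) {
    assert(c != NULL);
    c->todo_before_sleep |= flags;
}

/* Run once per event loop iteration: perform each pending action at most
 * once, then leave the pending set empty. */
void toyBeforeSleep(toyCluster *c) {
    assert(c != NULL);
    int flags = c->todo_before_sleep;
    c->todo_before_sleep = 0;

    if (flags & TOY_TODO_SAVE_CONFIG) {
        c->saves++;
        /* In this model, fsync only makes sense together with a save. */
        if (flags & TOY_TODO_FSYNC_CONFIG) c->fsyncs++;
    }
    if (flags & TOY_TODO_BROADCAST_ALL) c->broadcasts++;
}
```

Under these assumptions, two epoch changes in the same iteration cost a single save and a single broadcast, which is one plausible reason to prefer deferred flags over an immediate synchronous save at each call site.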