Skip to content

Commit

Permalink
Fix reconfiguring sub-replica causing data loss when myself change sh…
Browse files Browse the repository at this point in the history
…ard_id

In this case, sender is myself's primary, when executing updateShardId,
not only the sender's shard_id is updated, but also the shard_id of
myself is updated, casuing the subsequent areInSameShard check, that is,
the full_sync_required check to fail.

This one follow valkey-io#885 and closes valkey-io#942.

Signed-off-by: Binbin <[email protected]>
  • Loading branch information
enjoy-binbin committed Aug 26, 2024
1 parent 9f4b1ad commit db05d64
Showing 1 changed file with 17 additions and 2 deletions.
19 changes: 17 additions & 2 deletions src/cluster_legacy.c
Original file line number Diff line number Diff line change
Expand Up @@ -3162,6 +3162,8 @@ int clusterProcessPacket(clusterLink *link) {

/* PING, PONG, MEET: process config information. */
if (type == CLUSTERMSG_TYPE_PING || type == CLUSTERMSG_TYPE_PONG || type == CLUSTERMSG_TYPE_MEET) {
int myself_shard_id_changed = 0;

serverLog(LL_DEBUG, "%s packet received: %.40s", clusterGetMessageTypeString(type),
link->node ? link->node->name : "NULL");

Expand Down Expand Up @@ -3312,10 +3314,16 @@ int clusterProcessPacket(clusterLink *link) {
if (sender->replicaof) clusterNodeRemoveReplica(sender->replicaof, sender);
serverLog(LL_NOTICE, "Node %.40s (%s) is now a replica of node %.40s (%s) in shard %.40s",
sender->name, sender->human_nodename, sender_claimed_primary->name,
sender_claimed_primary->human_nodename, sender->shard_id);
sender_claimed_primary->human_nodename, sender_claimed_primary->shard_id);
clusterNodeAddReplica(sender_claimed_primary, sender);
sender->replicaof = sender_claimed_primary;

/* The later updateShardId may change myself shard_id, and we
* need to remember whether this change has occurred. */
if (sender_claimed_primary->shard_id && myself != sender && myself->replicaof == sender) {
myself_shard_id_changed = 1;
}

/* Update the shard_id when a replica is connected to its
* primary in the very first time. */
updateShardId(sender, sender_claimed_primary->shard_id);
Expand Down Expand Up @@ -3398,7 +3406,14 @@ int clusterProcessPacket(clusterLink *link) {
* so we can try a psync. */
serverLog(LL_NOTICE, "I'm a sub-replica! Reconfiguring myself as a replica of %.40s from %.40s",
myself->replicaof->replicaof->name, myself->replicaof->name);
clusterSetPrimary(myself->replicaof->replicaof, 1, !areInSameShard(myself->replicaof->replicaof, myself));
if (myself_shard_id_changed) {
/* If myself shard_id changes during the clusterProcessPacket, myself
* will not be able to psync with the new shard. */
clusterSetPrimary(myself->replicaof->replicaof, 1, 1);
} else {
int are_in_same_shard = areInSameShard(myself->replicaof->replicaof, myself);
clusterSetPrimary(myself->replicaof->replicaof, 1, !are_in_same_shard);
}
clusterDoBeforeSleep(CLUSTER_TODO_SAVE_CONFIG | CLUSTER_TODO_UPDATE_STATE | CLUSTER_TODO_FSYNC_CONFIG);
}

Expand Down

0 comments on commit db05d64

Please sign in to comment.