Redis Sentinel retains outdated master node information after pod failure #1217

Open · trynocoding opened this issue Jan 24, 2025 · 0 comments
Labels: bug (Something isn't working)

What version of redis operator are you using?
redis-operator version: v0.19.0

What operating system and processor architecture are you using (kubectl version)?

[root@master redis]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.7", GitCommit:"07a61d861519c45ef5c89bc22dda289328f29343", GitTreeState:"clean", BuildDate:"2023-10-18T11:42:32Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.7", GitCommit:"07a61d861519c45ef5c89bc22dda289328f29343", GitTreeState:"clean", BuildDate:"2023-10-18T11:33:23Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
[root@master redis]# 

What did you do?

[root@master redis]# cat replication.yaml 
---
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisReplication
metadata:
  name: redis-replication
spec:
  clusterSize: 3
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
  kubernetesConfig: 
    image: quay.io/opstree/redis:v7.0.15
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 101m
        memory: 128Mi
      limits:
        cpu: 101m
        memory: 128Mi
  redisExporter:
    enabled: true
    image: quay.io/opstree/redis-exporter:v1.44.0
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 100m
        memory: 128Mi
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-path
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
[root@master redis]# cat sentinel.yaml 
---
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisSentinel
metadata:
  name: redis-sentinel
spec:
  clusterSize: 3
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
  redisSentinelConfig: 
    redisReplicationName: redis-replication
  kubernetesConfig:
    image: quay.io/opstree/redis-sentinel:v7.0.15
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 101m
        memory: 128Mi
      limits:
        cpu: 101m
        memory: 128Mi

Steps to reproduce:

1. kubectl apply -f replication.yaml
2. kubectl apply -f sentinel.yaml
3. Continuously delete the Redis master pod (redis-replication-0) to simulate node failures such as a node crash or shutdown.
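
Pod roles before the test (the labeled listing below was presumably captured with something like kubectl get pods --show-labels; the redis-role labels are set by the operator):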
redis-replication-0                                   2/2     Running            0                  5m49s   app=redis-replication,controller-revision-hash=redis-replication-66cccc9c9c,redis-role=master,redis_setup_type=replication,role=replication,statefulset.kubernetes.io/pod-name=redis-replication-0
redis-replication-1                                   2/2     Running            0                  5m42s   app=redis-replication,controller-revision-hash=redis-replication-66cccc9c9c,redis-role=slave,redis_setup_type=replication,role=replication,statefulset.kubernetes.io/pod-name=redis-replication-1
redis-replication-2                                   2/2     Running            0                  5m36s   app=redis-replication,controller-revision-hash=redis-replication-66cccc9c9c,redis-role=slave,redis_setup_type=replication,role=replication,statefulset.kubernetes.io/pod-name=redis-replication-2

For example:

watch -n1 "kubectl delete po redis-replication-0 --force"
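
While the deletions run, the master address Sentinel advertises can be watched in parallel, e.g. with the standard sentinel get-master-addr-by-name command (the master name myMaster matches the output below):

watch -n1 "kubectl exec redis-sentinel-sentinel-0 -- redis-cli -p 26379 sentinel get-master-addr-by-name myMaster"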
There seems to be a problem with Sentinel retaining residual information about the old master pod:
[root@master install_operator]# kubectl exec -it redis-sentinel-sentinel-0 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
redis-sentinel-sentinel-0:/sentinel-data$ redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=myMaster,status=ok,address=10.0.2.30:6379,slaves=2,sentinels=3
127.0.0.1:26379> sentinel replicas myMaster
1)  1) "name"
    2) "10.0.0.130:6379"
    3) "ip"
    4) "10.0.0.130"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave"
   11) "link-pending-commands"
   12) "101"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "95656"
   17) "last-ok-ping-reply"
   18) "95656"
   19) "last-ping-reply"
   20) "95656"
   21) "s-down-time"
   22) "65655"
   23) "down-after-milliseconds"
   24) "30000"
   25) "info-refresh"
   26) "0"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "95656"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
   43) "replica-announced"
   44) "1"
2)  1) "name"
    2) "10.0.1.191:6379"
    3) "ip"
    4) "10.0.1.191"
    5) "port"
    6) "6379"
    7) "runid"
    8) "f197fb15d6cc801c6796d65c3b4c4407306e2a77"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "72"
   19) "last-ping-reply"
   20) "72"
   21) "down-after-milliseconds"
   22) "30000"
   23) "info-refresh"
   24) "5320"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "95656"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "10.0.2.30"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "141477"
   41) "replica-announced"
   42) "1"
127.0.0.1:26379> 
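
Note that the first replica entry (10.0.0.130, flags s_down,slave, empty runid) is the IP of a pod that no longer exists; Sentinel never expires it on its own. As a manual workaround (not something the operator does today, as far as I can tell), the standard SENTINEL RESET command flushes Sentinel's state for the master so that only live replicas are rediscovered:

redis-sentinel-sentinel-0:/sentinel-data$ redis-cli -p 26379 SENTINEL RESET myMaster
(integer) 1

SENTINEL RESET returns the number of masters matching the pattern; it would have to be run against each Sentinel pod, and the replica list should repopulate after the next INFO refresh from the master.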

What did you expect to see?
When a master node fails, Sentinel should immediately update its state and clear the stale information for the old master node.

What did you see instead?
The old master pod's entry still exists in Sentinel: the deleted pod's IP is listed as a replica flagged s_down with an empty runid.
