Redis Sentinel retains outdated master node information after pod failure #1217

Open · trynocoding opened this issue Jan 24, 2025 · 0 comments
Labels: bug (Something isn't working)

What version of redis operator are you using?
redis-operator version: v0.19.0

What operating system and processor architecture are you using (kubectl version)?

[root@master redis]# kubectl version
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.7", GitCommit:"07a61d861519c45ef5c89bc22dda289328f29343", GitTreeState:"clean", BuildDate:"2023-10-18T11:42:32Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.7", GitCommit:"07a61d861519c45ef5c89bc22dda289328f29343", GitTreeState:"clean", BuildDate:"2023-10-18T11:33:23Z", GoVersion:"go1.20.10", Compiler:"gc", Platform:"linux/amd64"}
[root@master redis]# 

What did you do?

[root@master redis]# cat replication.yaml 
---
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisReplication
metadata:
  name: redis-replication
spec:
  clusterSize: 3
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
  kubernetesConfig: 
    image: quay.io/opstree/redis:v7.0.15
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 101m
        memory: 128Mi
      limits:
        cpu: 101m
        memory: 128Mi
  redisExporter:
    enabled: true
    image: quay.io/opstree/redis-exporter:v1.44.0
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 100m
        memory: 128Mi
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: local-path
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 1Gi
[root@master redis]# cat sentinel.yaml 
---
apiVersion: redis.redis.opstreelabs.in/v1beta1
kind: RedisSentinel
metadata:
  name: redis-sentinel
spec:
  clusterSize: 3
  securityContext:
    runAsUser: 1000
    fsGroup: 1000
  redisSentinelConfig: 
    redisReplicationName: redis-replication
  kubernetesConfig:
    image: quay.io/opstree/redis-sentinel:v7.0.15
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: 101m
        memory: 128Mi
      limits:
        cpu: 101m
        memory: 128Mi

Steps to reproduce:

1. kubectl apply -f replication.yaml
2. kubectl apply -f sentinel.yaml
3. Continuously delete the Redis master pod (redis-replication-0) to simulate node failures such as a node crash or shutdown.
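
Pod roles before the test (the labeled listing below was presumably captured with something like kubectl get pods --show-labels; the redis-role labels are set by the operator):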
redis-replication-0                                   2/2     Running            0                  5m49s   app=redis-replication,controller-revision-hash=redis-replication-66cccc9c9c,redis-role=master,redis_setup_type=replication,role=replication,statefulset.kubernetes.io/pod-name=redis-replication-0
redis-replication-1                                   2/2     Running            0                  5m42s   app=redis-replication,controller-revision-hash=redis-replication-66cccc9c9c,redis-role=slave,redis_setup_type=replication,role=replication,statefulset.kubernetes.io/pod-name=redis-replication-1
redis-replication-2                                   2/2     Running            0                  5m36s   app=redis-replication,controller-revision-hash=redis-replication-66cccc9c9c,redis-role=slave,redis_setup_type=replication,role=replication,statefulset.kubernetes.io/pod-name=redis-replication-2

For example:

watch -n1 "kubectl delete po redis-replication-0 --force"
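
While the deletions run, the master address Sentinel advertises can be watched in parallel, e.g. with the standard sentinel get-master-addr-by-name command (the master name myMaster matches the output below):

watch -n1 "kubectl exec redis-sentinel-sentinel-0 -- redis-cli -p 26379 sentinel get-master-addr-by-name myMaster"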
There seems to be a problem with Sentinel retaining residual information about the old master pod:
[root@master install_operator]# kubectl exec -it redis-sentinel-sentinel-0 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
redis-sentinel-sentinel-0:/sentinel-data$ redis-cli -p 26379
127.0.0.1:26379> info sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=myMaster,status=ok,address=10.0.2.30:6379,slaves=2,sentinels=3
127.0.0.1:26379> sentinel replicas myMaster
1)  1) "name"
    2) "10.0.0.130:6379"
    3) "ip"
    4) "10.0.0.130"
    5) "port"
    6) "6379"
    7) "runid"
    8) ""
    9) "flags"
   10) "s_down,slave"
   11) "link-pending-commands"
   12) "101"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "95656"
   17) "last-ok-ping-reply"
   18) "95656"
   19) "last-ping-reply"
   20) "95656"
   21) "s-down-time"
   22) "65655"
   23) "down-after-milliseconds"
   24) "30000"
   25) "info-refresh"
   26) "0"
   27) "role-reported"
   28) "slave"
   29) "role-reported-time"
   30) "95656"
   31) "master-link-down-time"
   32) "0"
   33) "master-link-status"
   34) "err"
   35) "master-host"
   36) "?"
   37) "master-port"
   38) "0"
   39) "slave-priority"
   40) "100"
   41) "slave-repl-offset"
   42) "0"
   43) "replica-announced"
   44) "1"
2)  1) "name"
    2) "10.0.1.191:6379"
    3) "ip"
    4) "10.0.1.191"
    5) "port"
    6) "6379"
    7) "runid"
    8) "f197fb15d6cc801c6796d65c3b4c4407306e2a77"
    9) "flags"
   10) "slave"
   11) "link-pending-commands"
   12) "0"
   13) "link-refcount"
   14) "1"
   15) "last-ping-sent"
   16) "0"
   17) "last-ok-ping-reply"
   18) "72"
   19) "last-ping-reply"
   20) "72"
   21) "down-after-milliseconds"
   22) "30000"
   23) "info-refresh"
   24) "5320"
   25) "role-reported"
   26) "slave"
   27) "role-reported-time"
   28) "95656"
   29) "master-link-down-time"
   30) "0"
   31) "master-link-status"
   32) "ok"
   33) "master-host"
   34) "10.0.2.30"
   35) "master-port"
   36) "6379"
   37) "slave-priority"
   38) "100"
   39) "slave-repl-offset"
   40) "141477"
   41) "replica-announced"
   42) "1"
127.0.0.1:26379> 
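
Note that the first replica entry (10.0.0.130, flags s_down,slave, empty runid) is the IP of a pod that no longer exists; Sentinel never expires it on its own. As a manual workaround (not something the operator does today, as far as I can tell), the standard SENTINEL RESET command flushes Sentinel's state for the master so that only live replicas are rediscovered:

redis-sentinel-sentinel-0:/sentinel-data$ redis-cli -p 26379 SENTINEL RESET myMaster
(integer) 1

SENTINEL RESET returns the number of masters matching the pattern; it would have to be run against each Sentinel pod, and the replica list should repopulate after the next INFO refresh from the master.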

What did you expect to see?
When a master node fails, Sentinel should immediately update its state and clear the stale information for the old master node.

What did you see instead?
The old master pod's entry still exists in Sentinel: the deleted pod's IP is listed as a replica flagged s_down with an empty runid.
