
[BUG] 1.0 Back-off restarting failed error occurred when creating pulsar cluster with the latest yaml #8467

Open
tianyue86 opened this issue Nov 15, 2024 · 0 comments
Labels
kind/bug Something isn't working

Describe the env

Kubernetes: v1.30.4-eks-a737599
KubeBlocks: 1.0.0-beta.2
kbcli: 1.0.0-alpha.1

To Reproduce
Steps to reproduce the behavior:

  1. Get the latest pulsar cluster yaml
helm template pulsarc2 ./addons-cluster/pulsar --version 1.0.0-alpha.0
---
# Source: pulsar-cluster/templates/cluster.yaml
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: pulsarc2
  namespace: default
  labels: 
    helm.sh/chart: pulsar-cluster-1.0.0-alpha.0
    app.kubernetes.io/version: "3.0.2"
    app.kubernetes.io/instance: pulsarc2
  annotations:
    resource.kubeblocks.io/ignore-constraint: "true"
    ## Todo: use cluster api to control the rendering logic of service in component definition
    kubeblocks.io/enabled-pod-ordinal-svc: broker
    "kubeblocks.io/extra-env": '{"KB_PULSAR_BROKER_NODEPORT": "false"}'
spec:
  terminationPolicy: Delete
  services:
    - name: broker-bootstrap
      serviceName: broker-bootstrap
      componentSelector: broker
      spec:
        type: ClusterIP
        ports:
          - name: pulsar
            port: 6650
            targetPort: 6650
          - name: http
            port: 80
            targetPort: 8080
          - name: kafka-client
            port: 9092
            targetPort: 9092
    - name: zookeeper
      serviceName: zookeeper
      componentSelector: zookeeper
      spec:
        type: ClusterIP
        ports:
          - name: client
            port: 2181
            targetPort: 2181
  componentSpecs:
    - name: proxy
      componentDef: pulsar-proxy
      
      
      replicas: 3
      resources:
        limits:
          cpu: 
          memory: "512Mi"
        requests:
          cpu: "200m"
          memory: "512Mi"
    - name: bookies-recovery
      componentDef: pulsar-bookies-recovery
      
      
      replicas: 1
      resources:
        limits:
          cpu: 
          memory: "512Mi"
        requests:
          cpu: "200m"
          memory: "512Mi"
    - name: broker
      componentDef: pulsar-broker
      
      
      replicas: 1
      resources:
        limits:
          cpu: 
          memory: "512Mi"
        requests:
          cpu: "200m"
          memory: "512Mi"
    - name: bookies
      componentDef: pulsar-bookkeeper
      
      
      replicas: 4
      resources:
        limits:
          cpu: 
          memory: "512Mi"
        requests:
          cpu: "200m"
          memory: "512Mi"
      volumeClaimTemplates:
        - name: ledgers
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 8Gi
        - name: journal
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 8Gi
    - name: zookeeper
      componentDef: pulsar-zookeeper
      replicas: 1
      resources:
        limits:
          cpu: 
          memory: "512Mi"
        requests:
          cpu: "100m"
          memory: "512Mi"
      volumeClaimTemplates:
        - name: data
          spec:
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 8Gi
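For reference, saving the rendered manifest and applying it can be done as follows; the output filename here is only an assumption, not part of the original report:

```shell
# Render the cluster manifest from the addon chart (same command as step 1)
helm template pulsarc2 ./addons-cluster/pulsar --version 1.0.0-alpha.0 > pulsarc2.yaml

# Apply the rendered manifest to create the cluster (filename is an assumption)
kubectl apply -f pulsarc2.yaml
```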
  2. Apply the yaml to create cluster
  3. Check the cluster status: Abnormal
NAMESPACE   NAME       CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
default     pulsarc2                        Delete               Abnormal   8m10s

  4. Check pod
k get pod
NAME                                                           READY   STATUS                  RESTARTS        AGE
dp-backup-redis-0-redis-slfirj-20241101082100-77b6d883-fl5lb   2/2     Running                 0               14d
mysql-synchronized-db-test-enable-gtid-ddc95dcf5-vwmsg         1/1     Running                 0               13d
pulsarc2-bookies-0                                             2/2     Running                 0               9m3s
pulsarc2-bookies-1                                             2/2     Running                 0               9m3s
pulsarc2-bookies-2                                             2/2     Running                 0               9m3s
pulsarc2-bookies-3                                             2/2     Running                 0               9m3s
pulsarc2-bookies-recovery-0                                    1/1     Running                 0               9m4s
pulsarc2-broker-0                                              0/2     Init:CrashLoopBackOff   6 (2m49s ago)   9m2s
pulsarc2-proxy-0                                               1/1     Running                 0               9m
pulsarc2-proxy-1                                               1/1     Running                 0               9m
pulsarc2-proxy-2                                               1/1     Running                 0               9m
pulsarc2-zookeeper-0                                           1/1     Running                 0               9m2s
test-db-client-connectionstress-kafka-jgqdoi                   0/1     Completed               0               5h7m
  5. Describe pod
k describe pod pulsarc2-broker-0
Events:
  Type     Reason     Age                     From               Message
  ----     ------     ----                    ----               -------
  Normal   Scheduled  9m52s                   default-scheduler  Successfully assigned default/pulsarc2-broker-0 to ip-172-31-7-55.ap-northeast-1.compute.internal
  Normal   Pulled     9m51s                   kubelet            Container image "docker.io/apecloud/pulsar:3.0.2" already present on machine
  Normal   Created    9m51s                   kubelet            Created container init-broker-cluster
  Normal   Started    9m51s                   kubelet            Started container init-broker-cluster
  Normal   Pulled     9m34s                   kubelet            Container image "docker.io/apecloud/pulsar:3.0.2" already present on machine
  Normal   Created    9m34s                   kubelet            Created container init-sysctl
  Normal   Started    9m34s                   kubelet            Started container init-sysctl
  Normal   Pulled     8m50s (x4 over 9m33s)   kubelet            Container image "docker.io/apecloud/pulsar:2.11" already present on machine
  Normal   Created    8m50s (x4 over 9m33s)   kubelet            Created container init-pulsar-tools
  Normal   Started    8m50s (x4 over 9m33s)   kubelet            Started container init-pulsar-tools
  Warning  BackOff    4m40s (x23 over 9m31s)  kubelet            Back-off restarting failed container init-pulsar-tools in pod pulsarc2-broker-0_default(02bd68fd-cb12-4d85-b1ae-80ef16d9ee3d)
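Note that the failing init-pulsar-tools container uses image docker.io/apecloud/pulsar:2.11 while the other containers in the pod use 3.0.2, which may be related to the crash. The init container's logs can be pulled for diagnosis (pod and container names taken from the events above):

```shell
# Logs of the crashing init container
kubectl logs pulsarc2-broker-0 -c init-pulsar-tools

# Logs from the previous restart, in case the container has already been recycled
kubectl logs pulsarc2-broker-0 -c init-pulsar-tools --previous
```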
  6. Check cmp
k get cmp
NAME                        DEFINITION                                SERVICE-VERSION   STATUS    AGE
pulsarc2-bookies            pulsar-bookkeeper-3-1.0.0-alpha.0         3.0.2             Running   10m
pulsarc2-bookies-recovery   pulsar-bookies-recovery-3-1.0.0-alpha.0   3.0.2             Running   10m
pulsarc2-broker             pulsar-broker-3-1.0.0-alpha.0             3.0.2             Failed    10m
pulsarc2-proxy              pulsar-proxy-3-1.0.0-alpha.0              3.0.2             Running   10m
pulsarc2-zookeeper          pulsar-zookeeper-3-1.0.0-alpha.0          3.0.2             Running   10m

k describe cmp pulsarc2-broker
Events:
  Type    Reason                    Age                From                  Message
  ----    ------                    ---                ----                  -------
  Normal  ComponentPhaseTransition  11m                component-controller  component is Creating
  Normal  Unavailable               11m                component-controller  the component phase is Creating
  Normal  ComponentPhaseTransition  10m                component-controller  component is Updating
  Normal  Unavailable               10m                component-controller  the component phase is Updating
  Normal  ComponentPhaseTransition  10m (x2 over 11m)  component-controller  component is Failed
  Normal  Unavailable               10m (x2 over 11m)  component-controller  the component phase is Failed

@tianyue86 tianyue86 added the kind/bug Something isn't working label Nov 15, 2024