[BUG] Mount pod does not exit cleanly #1250

Open
YunhuiChen opened this issue Jan 22, 2025 · 2 comments
Labels
kind/bug Something isn't working

Comments

@YunhuiChen

What happened:
Create a batch of application pods:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: dynamic-multi
  labels:
    app: dynamic-multi
spec:
  replicas: 30
  podManagementPolicy: Parallel
  serviceName: dynamic-multi
  selector:
    matchLabels:
      app: dynamic-multi
  template:
    metadata:
      labels:
        app: dynamic-multi
    spec:
      containers:
      - command:
        - sh
        - -c
          # - sleep infinity
        - "while true; do echo $(date -u) >> /data/out.txt; sleep 0.5; done"
        image: registry.cn-hangzhou.aliyuncs.com/juicedata/mount:ce-v1.2.0
        name: tools
        volumeMounts:
        - mountPath: /data
          name: dynamic-multi
          mountPropagation: HostToContainer
  volumeClaimTemplates:
  - metadata:
      name: dynamic-multi
    spec:
      accessModes: [ "ReadWriteMany" ]
      storageClassName: dynamic-ee
      resources:
        requests:
          storage: 1Gi

The mount pod was not deleted as expected, and its status changed to Error:

Name:                 juicefs-stage3-pvc-48832491-6215-4091-936e-37ebeb32a083-upcxav
Namespace:            kube-system
Priority:             2000001000
Priority Class Name:  system-node-critical
Service Account:      juicefs-csi-node-sa
Node:                 stage3/172.30.117.99
Start Time:           Thu, 16 Jan 2025 11:32:17 +0800
Labels:               app.kubernetes.io/name=juicefs-mount
                      chaostest=true
                      juicefs-hash=15a68d09a1b3c9dfb67ce31e876b502d0975d6b8a2456680332b6af03696e3e
                      volume-id=pvc-48832491-6215-4091-936e-37ebeb32a083
Annotations:          cni.projectcalico.org/containerID: b5994e0628e6f331fbac981d58dac649647398eb06b1775ac13bb54ad8fd28ee
                      cni.projectcalico.org/podIP:
                      cni.projectcalico.org/podIPs:
                      juicefs-delete-at: 2025-01-22 02:54:02
                      juicefs-delete-delay: 30s
                      juicefs-uniqueid: pvc-48832491-6215-4091-936e-37ebeb32a083
                      juicefs-uuid: stage-volume1
Status:               Failed
IP:
IPs:                  <none>
Containers:
  jfs-mount:
    Container ID:  docker://81a08e3a5e05557bccbe083dd1446ffc854680afc61f6b46c6f6249bb8298491
    Image:         registry-vpc.cn-hangzhou.aliyuncs.com/juicedata/mount:ee-5.1.7-e634965
    Image ID:      docker-pullable://registry-vpc.cn-hangzhou.aliyuncs.com/juicedata/mount@sha256:264270c44064954e246733df2e231facdbd80efa2db81678f6c341c9081670e7
    Port:          <none>
    Host Port:     <none>
    Command:
      sh
      -c
      cp /etc/juicefs/stage-volume1.conf /root/.juicefs
      exec /sbin/mount.juicefs stage-volume1 /jfs/pvc-48832491-6215-4091-936e-37ebeb32a083-upcxav -o foreground,no-update,cache-size=102400,enable-xattr,enable-acl,cache-dir=/data/stage-cache
    State:          Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Thu, 16 Jan 2025 11:32:18 +0800
      Finished:     Wed, 22 Jan 2025 10:56:03 +0800
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  5Gi
    Requests:
      cpu:     100m
      memory:  200Mi
    Environment Variables from:
      juicefs-pvc-48832491-6215-4091-936e-37ebeb32a083-secret  Secret  Optional: false
    Environment:
      JFS_FOREGROUND:  1
      JFS_SUPER_COMM:  /tmp/fuse_fd_csi_comm.sock
    Mounts:
      /data/stage-cache from cachedir-0 (rw)
      /etc/juicefs from init-config (rw)
      /etc/updatedb.conf from updatedb (rw)
      /jfs from jfs-dir (rw)
      /tmp from jfs-fuse-fd (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-qhwp6 (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   False
  Initialized                 True
  Ready                       False
  ContainersReady             False
  PodScheduled                True
Volumes:
  init-config:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  juicefs-pvc-48832491-6215-4091-936e-37ebeb32a083-secret
    Optional:    false
  jfs-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/juicefs/volume
    HostPathType:  DirectoryOrCreate
  jfs-fuse-fd:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/juicefs-csi/15a68d09a1b3c9dfb67ce31e876b502d0975d6b8a2456680332b6af03696e3e
    HostPathType:  DirectoryOrCreate
  updatedb:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/updatedb.conf
    HostPathType:  FileOrCreate
  cachedir-0:
    Type:          HostPath (bare host directory volume)
    Path:          /data/stage-cache
    HostPathType:  DirectoryOrCreate
  kube-api-access-qhwp6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists

Reproducible in CI: https://github.com/juicedata/jfs/actions/runs/12887089221/job/35970819230?pr=2322

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?

Environment:

  • JuiceFS CSI Driver version (which image tag did your CSI Driver use):
  • Kubernetes version (e.g. kubectl version):
  • Object storage (cloud provider and region):
  • Metadata engine info (version, cloud provider managed or self maintained):
  • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage):
  • Others:
@YunhuiChen YunhuiChen added the kind/bug Something isn't working label Jan 22, 2025
@zxh326 zxh326 changed the title from "[BUG] Mount pods do not exit cleanly when deleted after creating many mount pods and application pods" to "[BUG] Mount pod does not exit cleanly" on Jan 22, 2025
@zxh326
Member

zxh326 commented Jan 22, 2025

This should be unrelated to the number of pods.

The problem started appearing after this PR, so it is probably related to it. cc @zwwhdls

@zwwhdls
Member

zwwhdls commented Jan 22, 2025

After the mount pod's daemon process receives SIGTERM, it does not forward the signal to its child process, so the pod does not exit cleanly.
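
To illustrate the kind of signal forwarding described above, here is a minimal sketch in Python. It is not the JuiceFS implementation; the function name `run_with_signal_forwarding` and the choice of signals are assumptions made for illustration only. A supervising parent installs handlers for SIGTERM and SIGINT, relays whichever one it receives to the child it started, and then returns the child's exit status.

```python
import signal
import subprocess
import sys


def run_with_signal_forwarding(cmd, signals=(signal.SIGTERM, signal.SIGINT)):
    """Run cmd as a child process and relay the given signals to it.

    Hypothetical helper for illustration. Returns the child's exit code,
    or a negative signal number if the child was killed by a signal.
    Must be called from the main thread, because signal.signal requires it.
    """
    child = None

    def forward(signum, _frame):
        # Relay the signal to the child instead of swallowing it.
        # send_signal does nothing if the child has already been reaped.
        if child is not None:
            child.send_signal(signum)

    # Install the handlers before spawning so that an early signal is not
    # handled by the default action (which would kill only the parent).
    previous = {sig: signal.signal(sig, forward) for sig in signals}
    try:
        child = subprocess.Popen(cmd)
        # wait() resumes after the handler has run, so the parent stays
        # alive until the child has actually exited.
        return child.wait()
    finally:
        # Restore whatever handlers were in place before.
        for sig, handler in previous.items():
            signal.signal(sig, handler)
```

For example, `run_with_signal_forwarding([sys.executable, "-c", "import time; time.sleep(60)"])` blocks until the child exits. If the parent receives SIGTERM in the meantime, the child is terminated as well and the call returns `-signal.SIGTERM`. Without the `forward` handler, the parent would either die alone and leave the child running, or ignore the request entirely, which matches the symptom reported in this issue.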
