HAMi 2.4.0 + Ascend device plugin reports UnexpectedAdmissionError #654

Open
ymbZzz opened this issue Nov 28, 2024 · 4 comments · May be fixed by Project-HAMi/ascend-device-plugin#14
Labels
kind/bug Something isn't working


ymbZzz commented Nov 28, 2024

Symptom

UnexpectedAdmissionError 14m kubelet Allocate failed due to rpc error: code = Unknown desc = parse pod annotation error: unknown uuid: , which is unexpected

Possible cause

HAMi injects an annotation into the pod, and the second part of that annotation contains no UUID.
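The failure can be illustrated with a small decoder. This is a hypothetical Go sketch, not the ascend-device-plugin's actual parsing code. It assumes the layout visible in the hami.io/vgpu-devices-allocated value quoted later in this issue: container segments separated by ";", device entries terminated by ":", and fields "uuid,type,mem,cores". The names decodeAllocated and deviceEntry are made up for illustration. In strict mode it rejects the UUID-less entry with the same message as the kubelet event; in tolerant mode it treats that entry as "no device" for that container.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// deviceEntry is a hypothetical struct for one "uuid,type,mem,cores" entry.
type deviceEntry struct {
	UUID  string
	Type  string
	Mem   int
	Cores int
}

// decodeAllocated splits the annotation into one segment per container
// (separated by ";"), then into device entries (terminated by ":").
// With strict=true an entry with an empty UUID is an error, mimicking the
// reported failure; with strict=false it is skipped as a placeholder.
func decodeAllocated(value string, strict bool) ([][]deviceEntry, error) {
	var result [][]deviceEntry
	segments := strings.Split(value, ";")
	// A trailing ";" leaves an empty last segment, which is dropped.
	if len(segments) > 0 && segments[len(segments)-1] == "" {
		segments = segments[:len(segments)-1]
	}
	for _, seg := range segments {
		var devices []deviceEntry
		for _, entry := range strings.Split(seg, ":") {
			if entry == "" {
				continue // the trailing ":" produces an empty entry
			}
			fields := strings.Split(entry, ",")
			if len(fields) != 4 {
				return nil, fmt.Errorf("malformed device entry %q", entry)
			}
			if fields[0] == "" {
				if strict {
					return nil, errors.New("unknown uuid: , which is unexpected")
				}
				continue // placeholder for a container without devices
			}
			mem, err := strconv.Atoi(fields[2])
			if err != nil {
				return nil, err
			}
			cores, err := strconv.Atoi(fields[3])
			if err != nil {
				return nil, err
			}
			devices = append(devices, deviceEntry{fields[0], fields[1], mem, cores})
		}
		result = append(result, devices)
	}
	return result, nil
}

func main() {
	value := "GPU-ebe7c3f7-303d-558d-435e-99a160631fe4,NVIDIA,1000,10:;,,0,0:;"
	if _, err := decodeAllocated(value, true); err != nil {
		fmt.Println("strict:", err) // strict: unknown uuid: , which is unexpected
	}
	devs, _ := decodeAllocated(value, false)
	for i, d := range devs {
		fmt.Printf("container %d: %d device(s)\n", i, len(d))
	}
	// container 0: 1 device(s)
	// container 1: 0 device(s)
}
```

Skipping empty entries is only one possible tolerant behavior; whether Project-HAMi/ascend-device-plugin#14 handles it exactly this way is not confirmed here.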

Pod information

The key factor is the sidecar.istio.io/inject: "true" annotation.
This annotation dynamically injects an additional container into the pod.

ymbZzz added the kind/bug (Something isn't working) label on Nov 28, 2024.

lengrongfu (Member) commented:

/assign

lengrongfu (Member) commented:

@ymbZzz can you provide your pod YAML?

lengrongfu (Member) commented:

Below is a pod with two containers, only one of which uses HAMi. Its hami.io/vgpu-devices-allocated annotation value is GPU-ebe7c3f7-303d-558d-435e-99a160631fe4,NVIDIA,1000,10:;,,0,0:;

kind: Pod
apiVersion: v1
metadata:
  name: gpu-test-6f58db7c7c-nszdp
  generateName: gpu-test-6f58db7c7c-
  namespace: default
  uid: 81794181-4045-4880-86ba-5282d73056d7
  resourceVersion: '3276674'
  creationTimestamp: '2024-12-07T08:24:26Z'
  labels:
    app: gpu-test
    pod-template-hash: 6f58db7c7c
  annotations:
    cni.projectcalico.org/containerID: d95faeb288f6134b4cc65a3baeac1fa4d831b03cc92149bc6a641bc15d5d2537
    cni.projectcalico.org/podIP: 10.233.74.96/32
    cni.projectcalico.org/podIPs: 10.233.74.96/32
    hami.io/bind-phase: success
    hami.io/bind-time: '1733559866'
    hami.io/vgpu-devices-allocated: GPU-ebe7c3f7-303d-558d-435e-99a160631fe4,NVIDIA,1000,10:;,,0,0:;
    hami.io/vgpu-devices-to-allocate: ;,,0,0:;
    hami.io/vgpu-node: controller-node-1
    hami.io/vgpu-time: '1733559866'
spec:
  volumes:
    - name: kube-api-access-l46cm
      projected:
        sources:
          - serviceAccountToken:
              expirationSeconds: 3607
              path: token
          - configMap:
              name: kube-root-ca.crt
              items:
                - key: ca.crt
                  path: ca.crt
          - downwardAPI:
              items:
                - path: namespace
                  fieldRef:
                    apiVersion: v1
                    fieldPath: metadata.namespace
        defaultMode: 420
  containers:
    - name: container-1
      image: ubuntu:22.04
      command:
        - sleep
        - '1000000'
      env:
        - name: CUDA_TASK_PRIORITY
          value: '1'
      resources:
        limits:
          cpu: 250m
          memory: 512Mi
          nvidia.com/gpucores: '10'
          nvidia.com/gpumem: 1k
          nvidia.com/vgpu: '1'
        requests:
          cpu: 250m
          memory: 512Mi
          nvidia.com/gpucores: '10'
          nvidia.com/gpumem: 1k
          nvidia.com/vgpu: '1'
      volumeMounts:
        - name: kube-api-access-l46cm
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
    - name: container-2
      image: ubuntu:22.04
      command:
        - sleep
        - '1000000'
      resources:
        limits:
          cpu: 250m
          memory: 512Mi
        requests:
          cpu: 250m
          memory: 512Mi
      volumeMounts:
        - name: kube-api-access-l46cm
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  serviceAccountName: default
  serviceAccount: default
  nodeName: controller-node-1
  schedulerName: hami-scheduler
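The allocated annotation value in this pod can be reproduced with a hypothetical Go encoder that follows the same layout (each device as "uuid,type,mem,cores:", each container terminated by ";"). The names encodeAllocated and device are made up for illustration. Modeling container-2 as holding a single zero-value entry is an inference from the observed ",,0,0:" segment, not from HAMi's source; it shows how a container that requests no device (for example an injected sidecar) can still end up with a UUID-less entry.

```go
package main

import (
	"fmt"
	"strings"
)

// device is a hypothetical representation of one allocated device entry.
type device struct {
	UUID  string
	Type  string
	Mem   int
	Cores int
}

// encodeAllocated writes each device as "uuid,type,mem,cores:" and ends
// every container's list with ";", matching the layout seen in this pod.
func encodeAllocated(containers [][]device) string {
	var b strings.Builder
	for _, devices := range containers {
		for _, d := range devices {
			fmt.Fprintf(&b, "%s,%s,%d,%d:", d.UUID, d.Type, d.Mem, d.Cores)
		}
		b.WriteString(";")
	}
	return b.String()
}

func main() {
	containers := [][]device{
		// container-1 requests nvidia.com/vgpu, gpumem and gpucores.
		{{UUID: "GPU-ebe7c3f7-303d-558d-435e-99a160631fe4", Type: "NVIDIA", Mem: 1000, Cores: 10}},
		// container-2 requests no GPU; a zero-value entry yields ",,0,0:".
		{{}},
	}
	fmt.Println(encodeAllocated(containers))
	// prints: GPU-ebe7c3f7-303d-558d-435e-99a160631fe4,NVIDIA,1000,10:;,,0,0:;

	// Containers with truly empty device lists would yield only ";".
	fmt.Println(encodeAllocated([][]device{{}, {}}))
	// prints: ;;
}
```

Under this model, either omitting the zero-value entry when encoding or skipping UUID-less entries when decoding would avoid the error; which side the actual fix targets is not established in this thread.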

lengrongfu (Member) commented:

@ymbZzz I don't have an environment to reproduce this. Can you help me test PR Project-HAMi/ascend-device-plugin#14?
