This is a [Kubernetes][k8s] [device plugin][dp] implementation that registers Hygon DCUs in a container cluster for compute workloads. With the appropriate hardware and this plugin deployed in your Kubernetes cluster, you can run jobs that require Hygon DCUs. It supports DCU virtualization through hy-virtual, which is provided by dtk.
The flow of a vDCU job is as follows:
- dtk >= 24.04
- hy-smi == v1.6.0
- This plugin targets Kubernetes v1.18+.
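
The version requirements above can be checked with a small helper. This is a minimal sketch, not part of the plugin: `version_ge` is a hypothetical helper name, and the two version variables are placeholders to fill in by hand from your own installation. It relies on `sort -V` from GNU coreutils.

```shell
#!/usr/bin/env bash
# Succeeds (exit code 0) when version $1 is greater than or equal to version $2.
# Uses GNU sort -V (version sort): the smaller of the two versions sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Placeholders (assumptions): replace with the versions actually installed.
DTK_VERSION="24.04.1"
K8S_VERSION="1.18.0"

version_ge "$DTK_VERSION" "24.04" && echo "dtk OK"
version_ge "$K8S_VERSION" "1.18" && echo "kubernetes OK"
```

hy-smi needs an exact match on v1.6.0, so a plain string comparison is enough for it.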
# on the DCU node, create this directory:
$ mkdir /etc/vdev
# change dtk-xx.xx.x to your installed dtk version
$ cp -r /opt/dtk-xx.xx.x /opt/dtk
$ kubectl apply -f k8s-dcu-rbac.yaml
$ kubectl apply -f k8s-dcu-plugin.yaml
# replace NODE_NAME with your DCU node name
$ kubectl label node NODE_NAME dcu=on
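
Once the plugin is running on a labeled node, the node should advertise the `hygon.com/*` resources. The sketch below filters them out of `kubectl describe node` output. `dcu_resources` is a hypothetical helper name, and the usage line assumes `kubectl` is already configured for your cluster.

```shell
#!/usr/bin/env bash
# Print only the hygon.com/* resource lines from text read on stdin.
dcu_resources() {
  grep -E 'hygon\.com/'
}

# Usage (replace NODE_NAME with your DCU node name):
#   kubectl describe node NODE_NAME | dcu_resources
```

If nothing is printed, the plugin has probably not registered with the kubelet on that node yet.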
docker build .
apiVersion: v1
kind: Pod
metadata:
  name: alexnet-tf-gpu-pod-mem
  labels:
    purpose: demo-tf-amdgpu
spec:
  containers:
    - name: alexnet-tf-gpu-container
      image: ubuntu:20.04
      workingDir: /root
      command: ["sleep","infinity"]
      resources:
        limits:
          hygon.com/dcunum: 1 # requesting 1 DCU
          hygon.com/dcumem: 2000 # each DCU requires 2000 MiB of device memory
          hygon.com/dcucores: 15 # each DCU uses 15% of total compute cores
Inside the container, use hy-virtual to validate the allocation:
source /opt/hygondriver/env.sh
hy-virtual -show-device-info
The output will look like this:
Device 0:
Actual Device: 0
Compute units: 9
Global memory: 2097152000 bytes
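
The reported numbers can be cross-checked against the Pod's resource limits with a little arithmetic. This is a sketch under two assumptions that the text above does not state: the physical DCU has 60 compute units in total (an illustrative figure), and the plugin truncates to a whole number of units. `mib_to_bytes` and `expected_cus` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Convert a hygon.com/dcumem request (in MiB) to bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Compute units expected for a hygon.com/dcucores percentage.
# $1 = requested percentage, $2 = total compute units on the device (assumed).
expected_cus() {
  local percent=$1 total_cus=$2
  echo $(( total_cus * percent / 100 ))
}

mib_to_bytes 2000     # prints 2097152000, matching "Global memory"
expected_cus 15 60    # prints 9, matching "Compute units" if the device has 60 units
```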