DCU vGPU device plugin for HAMi

Introduction

This is a [Kubernetes][k8s] [device plugin][dp] implementation that registers Hygon DCUs in a container cluster for compute workloads. With the appropriate hardware and this plugin deployed in your Kubernetes cluster, you will be able to run jobs that require Hygon DCUs. It supports DCU virtualization through hy-virtual, which is provided by dtk.

Architecture

The flow of a vDCU job is shown in DCU_job_flow.png in the repository root.

Prerequisites

dtk >= 24.04
hy-smi == v1.6.0

Limitations

This plugin targets Kubernetes v1.18+.

Deployment

Prepare

# on the DCU node, create this directory:
$ mkdir /etc/vdev

# replace dtk-xx.xx.x with your installed dtk version
$ cp -r /opt/dtk-xx.xx.x /opt/dtk

$ kubectl apply -f k8s-dcu-rbac.yaml
$ kubectl apply -f k8s-dcu-plugin.yaml
# replace NODE_NAME with your dcu node name
$ kubectl label node NODE_NAME dcu=on
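After the steps above, it can be useful to confirm that the node is labelled and that it advertises DCU resources. The following is a minimal sketch, not part of the plugin: it assumes kubectl is already configured for the cluster, the function names are hypothetical, and it assumes the plugin exposes resources under the hygon.com/ prefix on the node, as the resource names in the Examples section suggest.

```shell
# Sketch only: assumes kubectl is configured for the cluster and that the
# node was labelled dcu=on as in the steps above.

# List the nodes that carry the dcu=on label (printed as node/<name>).
list_dcu_nodes() {
    kubectl get nodes -l dcu=on -o name
}

# Print the hygon.com/* lines from a node description.
# Assumption: the plugin advertises its resources under the hygon.com/ prefix.
# Usage: show_dcu_resources NODE_NAME
show_dcu_resources() {
    kubectl describe node "$1" | grep 'hygon.com/'
}
```

If show_dcu_resources prints nothing, checking the plugin pod's logs is a reasonable next step.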

Build

docker build .

Examples

apiVersion: v1
kind: Pod
metadata:
  name: alexnet-tf-gpu-pod-mem
  labels:
    purpose: demo-tf-amdgpu
spec:
  containers:
    - name: alexnet-tf-gpu-container
      image: ubuntu:20.04
      workingDir: /root
      command: ["sleep","infinity"]
      resources:
        limits:
          hygon.com/dcunum: 1 # request 1 DCU
          hygon.com/dcumem: 2000 # each DCU requires 2000 MiB of device memory
          hygon.com/dcucores: 15 # each DCU uses 15% of the total compute cores
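To try the example, the manifest can be saved to a file and submitted with kubectl. The helper below is a hedged sketch: the function name and the 120-second timeout are illustrative choices, and the manifest path is whatever file you saved the example to. Only the pod name comes from the manifest above.

```shell
# Sketch only: the manifest path is whatever file the example above was saved to.

# Create the pod, then block until it reports Ready or the timeout expires.
# Usage: run_dcu_example pod.yaml
run_dcu_example() {
    kubectl apply -f "$1" &&
    kubectl wait --for=condition=Ready pod/alexnet-tf-gpu-pod-mem --timeout=120s
}
```

Once the pod is Ready, `kubectl exec -it alexnet-tf-gpu-pod-mem -- bash` opens a shell inside the container.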

Validation

Inside the container, use hy-virtual to validate the allocation:

source /opt/hygondriver/env.sh
hy-virtual -show-device-info

The output should look like this:

Device 0:
	Actual Device: 0
	Compute units: 9
	Global memory: 2097152000 bytes
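The sample output above lines up with the request in the Examples section: 2000 MiB is 2000 * 1024 * 1024 = 2097152000 bytes. A small check along these lines could automate the comparison. This is a sketch under assumptions: the function names are hypothetical, it expects a single device in the output, and it relies on the "Global memory: N bytes" line format shown above.

```shell
# Convert a hygon.com/dcumem request (MiB) into bytes.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Read hy-virtual output on stdin and print the "Global memory" byte count.
# Assumes one device and the line format shown in the sample output.
reported_bytes() {
    awk -F': ' '/Global memory/ { split($2, parts, " "); print parts[1] }'
}

# Succeed only when the reported memory equals the requested MiB.
# Usage: hy-virtual -show-device-info | check_dcumem 2000
check_dcumem() {
    [ "$(reported_bytes)" = "$(mib_to_bytes "$1")" ]
}
```

The 9 compute units in the sample would be consistent with 15% of a 60-unit device, although the total unit count of the underlying DCU is not stated here.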

Maintainer

[email protected]
