Kueue Installation with ODH
- Prereqs
- Install Kueue
- Install Open Data Hub Operator
- Create the ODH namespace
- Deploy the DataScienceCluster
- Access the spawner page by going to your Open Data Hub dashboard
- Create Kueue cluster-wide ResourceFlavor and ClusterQueue
- Create the local Kueue
- Run a Kueue Sample job
- Cleanup Steps
0.1. You need an OpenShift cluster. Either a medium or a large QuickBurn Fyre cluster will work.
0.2. You need a default storage class. If you don't have one, you can install Portworx.
0.3. You will need the oc login details for your cluster so that you can log in from your laptop or from the cluster terminal.
For example:
oc login --token=sha256~lamzJ-exoR16UsbltkT-l0nKCL7XTSvLqqB4i54psBM --server=https://api.jimmed414.cp.fyre.ibm.com:6443
More info about Kueue here: https://github.com/kubernetes-sigs/kueue
oc apply --server-side -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.5.1/manifests.yaml
Check that it started:
oc get pods -n kueue-system
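Instead of re-running the pod listing until everything is up, the check can block until the controller is available. This is a minimal sketch, assuming the manifest creates a deployment named kueue-controller-manager (confirm with "oc get deploy -n kueue-system"); the wait_for_kueue helper and its 300s default timeout are illustrative, not part of Kueue.

```shell
# Block until the Kueue controller deployment reports Available.
# Assumption: the deployment is named kueue-controller-manager.
# Optional first argument: timeout (defaults to 300s).
wait_for_kueue() {
  oc wait --for=condition=Available deployment/kueue-controller-manager \
    -n kueue-system --timeout="${1:-300s}"
}
# Usage: wait_for_kueue 120s
```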
Using the OpenShift UI, navigate to:
Operators --> OperatorHub
and search for Open Data Hub Operator and install it using the fast channel. (It should be version 2.Y.Z)
You can check it with:
oc get pods -n openshift-operators
ODH_NS=opendatahub # Change this value if you want to use a different namespace
oc new-project ${ODH_NS}
cat << EOF | oc apply -f -
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  labels:
    app.kubernetes.io/created-by: opendatahub-operator
    app.kubernetes.io/instance: default
    app.kubernetes.io/managed-by: kustomize
    app.kubernetes.io/name: datasciencecluster
    app.kubernetes.io/part-of: opendatahub-operator
  name: example-dsc
  namespace: ${ODH_NS}
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Removed
    modelmeshserving:
      managementState: Removed
    ray:
      managementState: Managed
    workbenches:
      managementState: Managed
EOF
You'll end up with kuberay, Notebook-controller and the underlying dashboard, like this:
oc get pods -n ${ODH_NS}
Returns
NAME READY STATUS RESTARTS AGE
kuberay-operator-5d9567bdf4-gshxm 1/1 Running 0 79s
notebook-controller-deployment-6468bbf669-89gr8 1/1 Running 0 91s
odh-dashboard-649fdc86bb-4n9xb 2/2 Running 0 93s
odh-dashboard-649fdc86bb-5t7k4 2/2 Running 0 93s
odh-notebook-controller-manager-86d9b47b54-8jql7 1/1 Running 0 92s
Step 5. Access the spawner page by going to your Open Data Hub dashboard. It'll be in the format of:
https://odh-dashboard-${ODH_NS}.apps.<your cluster's uri>
You can find it with this command:
oc get route -n ${ODH_NS} | grep dash
For example: https://odh-dashboard-odh.apps.jimbig412.cp.fyre.ibm.com/
- If prompted, give it your kubeadmin user and password
- If prompted, grant it access as well
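If you would rather print the dashboard URL directly than pick it out of the grep output, a sketch along these lines may help. It assumes the dashboard route is named odh-dashboard, which is not stated in this guide; the dashboard_url helper is hypothetical, so check the route name in the "oc get route" listing first.

```shell
# Print the dashboard URL for the given namespace (defaults to opendatahub).
# Assumption: the route object is named odh-dashboard.
dashboard_url() {
  oc get route odh-dashboard -n "${1:-opendatahub}" \
    -o jsonpath='https://{.spec.host}{"\n"}'
}
# Usage: dashboard_url "${ODH_NS}"
```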
5.1 On the far left, click on "Data Science Projects" and then click on "Create a Data Science Project". (The project name becomes the name of a new namespace.)
For example:
Name: demo-dsp
Description: Demo's DSP
Then press "Create"
5.2 Within your new Data Science Project, select "Create workbench"
- give it a name, like "demo-wb"
- choose "Jupyter Data Science" for the image
- click "Create workbench" at the bottom.
5.3 You'll see the status as "Starting" initially.
- Once it's in the running status, click on the blue "Open" link in the workbench to get access to the notebook.
5.4 Click on the black "Terminal" tile under the "Other" section to open a terminal window.
Inside this terminal, run "oc login" so that the terminal has access to your OpenShift cluster. For example:
oc login --token=sha256~lamzJ-exoR16UsbltkT-l0nKCL7XTSvLqqB4i54psBM --server=https://api.jimmed414.cp.fyre.ibm.com:6443
5.5 Now you should be able to see the pods on your OpenShift cluster. For example:
oc get pods
This returns the pods in your newly created namespace:
NAME READY STATUS RESTARTS AGE
demo-wb-0 2/2 Running 0 14m
Some quick definitions:
ResourceFlavor:
An object that you can define to describe what resources are available in a cluster.
In this case, the resources in our cluster are homogeneous, so we use an empty ResourceFlavor.
Note: To associate a ResourceFlavor with a subset of nodes of your cluster, you can configure the .spec.nodeLabels field with matching node labels that uniquely identify the nodes.
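To make the note concrete, here is a sketch of a ResourceFlavor that targets a subset of nodes through .spec.nodeLabels. The render_flavor helper, the flavor name gpu-flavor, and the label example.com/pool=gpu are all hypothetical; substitute labels that actually exist on your nodes ("oc get nodes --show-labels").

```shell
# Render a ResourceFlavor bound to nodes that carry a given label.
# Args: flavor name, node label key, node label value (all hypothetical here).
render_flavor() {
  cat << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "$1"
spec:
  nodeLabels:
    $2: "$3"
EOF
}
# Usage: render_flavor gpu-flavor example.com/pool gpu | oc apply -f -
```

A ClusterQueue would then need to list this flavor by name under its resourceGroups for the quota to apply to those nodes.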
ClusterQueue:
A ClusterQueue is a cluster-scoped object that governs a pool of resources such as Pods, CPU, memory, and hardware accelerators. The ClusterQueue created below, cluster-queue, defines the quotas available for the default-flavor ResourceFlavor that it manages.
6.1 Apply the following yaml to create the Kueue cluster-wide ResourceFlavor and ClusterQueue:
cat << EOF | oc apply -f -
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: "default-flavor"
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: "cluster-queue"
spec:
  namespaceSelector: {} # match all.
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: "default-flavor"
      resources:
      - name: "cpu"
        nominalQuota: 9
      - name: "memory"
        nominalQuota: 36Gi
EOF
Two items get created at the cluster scope:
resourceflavor.kueue.x-k8s.io/default-flavor created
clusterqueue.kueue.x-k8s.io/cluster-queue created
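As a rough feel for what these quotas allow, the arithmetic below estimates how many identical jobs could be admitted at the same time. It is only a sketch: it assumes a job's demand is its pod count times the per-pod requests, that nothing else is using the queue, and that CPU requests are whole cores. The jobs_that_fit helper is illustrative; the numbers match the sample job used later in this guide (3 pods, 1 CPU and 200Mi each).

```shell
# Estimate how many identical jobs fit in a ClusterQueue's nominal quota at once.
# Args: quota_cpu quota_mem_mi pods_per_job cpu_per_pod mem_mi_per_pod
jobs_that_fit() {
  by_cpu=$(( $1 / ($3 * $4) ))
  by_mem=$(( $2 / ($3 * $5) ))
  # The scarcer resource decides the answer.
  if [ "$by_cpu" -lt "$by_mem" ]; then
    echo "$by_cpu"
  else
    echo "$by_mem"
  fi
}

# 9 CPU and 36Gi (36864Mi) of quota; 3 pods per job at 1 CPU and 200Mi each.
jobs_that_fit 9 36864 3 1 200   # prints 3 (CPU is the limiting resource)
```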
LocalQueue
A LocalQueue is a namespaced object that groups closely related Workloads that belong to a single namespace. A LocalQueue points to one ClusterQueue from which resources are allocated to run its Workloads.
To associate a Job with a LocalQueue, add the kueue.x-k8s.io/queue-name label under metadata.labels of the Job and create the Job in the same namespace as the LocalQueue.
7.1 In the notebook terminal, create a Kueue localqueue:
cat << EOF | oc apply -f -
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: "demo-queue"
spec:
  clusterQueue: "cluster-queue"
EOF
You will end up with one item created, for example:
localqueue.kueue.x-k8s.io/demo-queue created
The resourceflavor and clusterqueue are at the cluster scope. The yaml above does not set a namespace, so the localqueue is created in the terminal's current project (demo-dsp in this example):
oc get resourceflavor -A
NAME AGE
default-flavor 104s
oc get clusterqueue -A
NAME COHORT PENDING WORKLOADS
cluster-queue 0
oc get localqueue -A
NAMESPACE NAME CLUSTERQUEUE PENDING WORKLOADS ADMITTED WORKLOADS
demo-dsp demo-queue cluster-queue 0 0
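The LocalQueue manifest in step 7.1 carries no namespace, so it lands in whatever project the terminal is currently using ("oc project -q" prints it). If you would rather pin the namespace explicitly, a sketch like the following may be useful; the render_local_queue helper is hypothetical, and the names are the ones used in this guide.

```shell
# Render a LocalQueue with an explicit namespace.
# Args: namespace, LocalQueue name, ClusterQueue name.
render_local_queue() {
  cat << EOF
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: "$1"
  name: "$2"
spec:
  clusterQueue: "$3"
EOF
}
# Usage: render_local_queue demo-dsp demo-queue cluster-queue | oc apply -f -
```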
8.1 Run a Kueue sample job against your new local queue (adjust the queue name and namespace to match yours):
cat << EOF | oc apply -f -
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-job-1
  namespace: demo-dsp
  labels:
    kueue.x-k8s.io/queue-name: demo-queue
spec:
  parallelism: 3
  completions: 3
  suspend: true
  template:
    spec:
      containers:
      - name: dummy-job
        image: gcr.io/k8s-staging-perf-tests/sleep:v0.1.0
        args: ["30s"]
        resources:
          requests:
            cpu: 1
            memory: "200Mi"
      restartPolicy: Never
EOF
8.2. Check that the job and pods start:
watch oc get jobs,pods
The pods reach the Completed status after about 30 seconds, for example:
Every 2.0s: oc get jobs,pods jim-wb-0: Wed Jan 17 19:30:03 2024
NAME COMPLETIONS DURATION AGE
job.batch/sample-job-1 3/3 34s 40s
NAME READY STATUS RESTARTS AGE
pod/demo-wb-0 2/2 Running 0 22m
pod/sample-job-1-jc2z5 0/1 Completed 0 40s
pod/sample-job-1-pctkv 0/1 Completed 0 40s
pod/sample-job-1-tk88f 0/1 Completed 0 40s
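The job is submitted with suspend: true, and Kueue unsuspends it once the queue admits it. If the pods never appear, it can help to look at the Workload object that Kueue creates for the job. The helper below is a small sketch; show_workloads is a hypothetical name and demo-dsp is the namespace used in this guide.

```shell
# List the Kueue Workload objects that back the jobs in a namespace.
# Optional first argument: namespace (defaults to demo-dsp).
show_workloads() {
  oc get workloads.kueue.x-k8s.io -n "${1:-demo-dsp}"
}
# Usage: show_workloads demo-dsp
```

The PENDING WORKLOADS and ADMITTED WORKLOADS columns of "oc get localqueue" give the same information at queue level.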
8.3 Remove the job when you're done with it:
oc delete job sample-job-1
and it'll return:
job.batch "sample-job-1" deleted
9.1 Clean up your jobs, for example:
oc delete job sample-job-1
9.2 Delete your localqueue, for example:
oc delete localqueue demo-queue
9.3 Delete your cluster-wide Kueue items, for example:
oc delete clusterqueue cluster-queue
oc delete resourceflavor default-flavor
9.4 Exit out of the notebook and delete the notebook resources, for example:
oc delete notebook demo-wb
oc delete pvc demo-wb
9.5 Delete the dsc, for example:
oc delete dsc example-dsc
9.6 Delete Kueue itself, using the same manifest that installed it:
oc delete -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.5.1/manifests.yaml
9.7 Find the subscription and csv for ODH and delete them. For example:
oc get csv,sub -n openshift-operators
and then delete them:
oc delete csv opendatahub-operator.v2.4.0 -n openshift-operators; oc delete sub opendatahub-operator -n openshift-operators
9.8 Delete your Data Science Project, for example:
oc delete ns demo-dsp
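The cleanup steps above can be collected into one function, sketched here with --ignore-not-found so that a rerun does not fail on resources that are already gone. The cleanup_demo helper is hypothetical, and the resource names and the CSV version are the example values from this guide; check the actual CSV name with "oc get csv -n openshift-operators" before running it.

```shell
# Consolidated cleanup of the demo resources, in the same order as steps 9.1 to 9.8.
# Args: Data Science Project namespace, ODH CSV name (both default to the guide's examples).
cleanup_demo() {
  ns="${1:-demo-dsp}"
  csv="${2:-opendatahub-operator.v2.4.0}"
  oc delete job sample-job-1 -n "$ns" --ignore-not-found
  oc delete localqueue demo-queue -n "$ns" --ignore-not-found
  oc delete clusterqueue cluster-queue --ignore-not-found
  oc delete resourceflavor default-flavor --ignore-not-found
  oc delete notebook demo-wb -n "$ns" --ignore-not-found
  oc delete pvc demo-wb -n "$ns" --ignore-not-found
  oc delete dsc example-dsc --ignore-not-found
  oc delete --ignore-not-found -f https://github.com/kubernetes-sigs/kueue/releases/download/v0.5.1/manifests.yaml
  oc delete csv "$csv" -n openshift-operators --ignore-not-found
  oc delete sub opendatahub-operator -n openshift-operators --ignore-not-found
  oc delete ns "$ns" --ignore-not-found
}
# Usage: cleanup_demo demo-dsp opendatahub-operator.v2.4.0
```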