From 6a20315e0809677ef4fc75cae4f5a9ff84d813b7 Mon Sep 17 00:00:00 2001
From: Andreas Fritzler
Date: Tue, 17 Dec 2024 13:38:09 +0100
Subject: [PATCH] Add `ServerMaintenance` CRD

---
 docs/concepts/servermaintenance.md | 59 ++++++++++++++++++++++++++++++
 mkdocs.yml                         |  1 +
 2 files changed, 60 insertions(+)
 create mode 100644 docs/concepts/servermaintenance.md

diff --git a/docs/concepts/servermaintenance.md b/docs/concepts/servermaintenance.md
new file mode 100644
index 0000000..9538d3a
--- /dev/null
+++ b/docs/concepts/servermaintenance.md
@@ -0,0 +1,59 @@
+# ServerMaintenance
+
+`ServerMaintenance` represents a maintenance operation for a physical server. It transitions a `Server` from an
+operational state (e.g., Available/Reserved) into a Maintenance state. Each `ServerMaintenance` object tracks the
+lifecycle of a specific maintenance task, ensuring servers are properly taken offline, updated, and restored.
+
+## Key Points
+
+- `ServerMaintenance` is namespaced and can represent various maintenance types (e.g., BIOSUpdate, Cleanup).
+- Only one `ServerMaintenance` can be active for a given `Server` at a time. Others remain pending.
+- When the active `ServerMaintenance` completes, the next pending maintenance starts.
+- If no more maintenance tasks are pending, the `Server` returns to its previous operational state and can be
+  powered back on.
+- The `metal-operator` manages `ServerMaintenance` resources and updates the `Server` state accordingly.
+- A maintenance-related operator (e.g., `firmware-operator`) decides whether a `Server` needs maintenance, creates the
+  `ServerMaintenance` resource and a corresponding `ServerBootConfiguration`, and references the latter as the
+  MaintenanceBootConfiguration in the `Server` spec. It also handles powering servers on/off.
+
+## Workflow
+
+1. **Determining Maintenance:**
+   The maintenance operator creates a `ServerMaintenance` resource for the chosen `Server`.
+
+2. **Transition to Maintenance:**
+   The `metal-operator` notices the new `ServerMaintenance` and transitions the `Server` into `Maintenance` status.
+
+3. **Boot Configuration:**
+   The maintenance operator creates a `ServerBootConfiguration` resource and references it in the `Server` spec as the
+   MaintenanceBootConfiguration. This configuration is used, for example, to boot custom tooling that performs
+   BIOS/firmware updates or runs cleanup tasks on the `Server`.
+
+4. **Performing Maintenance:**
+   The maintenance operator powers off the `Server`, performs the required maintenance task, then updates the
+   `ServerMaintenance` to `Complete`.
+
+5. **Post-Maintenance:**
+   Once complete, if no more maintenance tasks are pending, the `metal-operator` restores the `Server` to its previous
+   state (e.g., Available/Reserved). The maintenance operator can then power the `Server` back on.
+
+## Example ServerMaintenance Resource
+
+```yaml
+apiVersion: metal.ironcore.dev/v1alpha1
+kind: ServerMaintenance
+metadata:
+  name: bios-update
+  namespace: ops
+spec:
+  type: BIOSUpdate
+  serverRef:
+    name: server-foo
+status:
+  state: Pending
+```
+
+Once conditions are met, the `metal-operator` transitions the `Server` to `Maintenance`. The maintenance operator
+powers the server off, applies the maintenance boot configuration, performs the maintenance, marks `ServerMaintenance`
+as `Complete`, and then powers the server on. If more `ServerMaintenance` objects exist, the next pending one starts;
+otherwise, the `Server` returns to its previous operational state.
diff --git a/mkdocs.yml b/mkdocs.yml
index dc84df9..7d0d06f 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -56,6 +56,7 @@ nav:
     - Servers: concepts/servers.md
     - ServerBootConfigurations: concepts/serverbootconfigurations.md
     - ServerClaims: concepts/serverclaims.md
+    - ServerMaintenance: concepts/servermaintenance.md
   - Usage:
     - metalctl: usage/metalctl.md
   - Development Guide:
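
The scheduling rule this patch documents (one active `ServerMaintenance` per `Server`, the rest pending, and the `Server` restored to its previous state once nothing is pending) can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the `metal-operator` implementation: the `Maintenance` and `Server` structs, the `Reconcile` function, and the `InMaintenance` state name are all hypothetical; only `Pending`, `Complete`, and the `Maintenance` server state come from the documentation itself.

```go
package main

import "fmt"

// MaintenanceState holds the lifecycle state of one maintenance task.
type MaintenanceState string

const (
	StatePending       MaintenanceState = "Pending"       // named in the doc
	StateInMaintenance MaintenanceState = "InMaintenance" // assumed name for "active"
	StateComplete      MaintenanceState = "Complete"      // named in the doc
)

// Maintenance is a hypothetical stand-in for a ServerMaintenance object.
type Maintenance struct {
	Name  string
	State MaintenanceState
}

// Server is a hypothetical stand-in for a Server object.
type Server struct {
	State         string // e.g. "Available", "Reserved", "Maintenance"
	PreviousState string // remembered so it can be restored afterwards
}

// Reconcile enforces the documented rule and returns the active maintenance,
// or nil when nothing is pending and the server has been restored.
func Reconcile(s *Server, queue []*Maintenance) *Maintenance {
	var next *Maintenance
	for _, m := range queue {
		switch m.State {
		case StateInMaintenance:
			return m // one task is already active; the others stay pending
		case StatePending:
			if next == nil {
				next = m // first pending task in queue order
			}
		}
	}
	if next != nil {
		if s.State != "Maintenance" {
			s.PreviousState = s.State
			s.State = "Maintenance"
		}
		next.State = StateInMaintenance
		return next
	}
	if s.State == "Maintenance" {
		s.State = s.PreviousState // queue drained: restore the operational state
	}
	return nil
}

func main() {
	server := &Server{State: "Available"}
	queue := []*Maintenance{
		{Name: "bios-update", State: StatePending},
		{Name: "cleanup", State: StatePending},
	}
	for {
		active := Reconcile(server, queue)
		if active == nil {
			break
		}
		fmt.Printf("server=%s active=%s\n", server.State, active.Name)
		active.State = StateComplete // in the doc, the maintenance operator sets this
	}
	fmt.Printf("server=%s\n", server.State)
}
```

Keeping the previous state on the `Server` stand-in is one simple way to model "returns to its previous operational state"; where a real controller stores that information is not stated in the patch.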