Add ServerMaintenance CRD
afritzler committed Dec 17, 2024
1 parent 3f646dd commit d6883cf
Showing 2 changed files with 60 additions and 0 deletions.
59 changes: 59 additions & 0 deletions docs/concepts/servermaintenance.md
@@ -0,0 +1,59 @@
# ServerMaintenance

`ServerMaintenance` represents a maintenance operation for a physical server. It transitions a `Server` from an
operational state (e.g., Available/Reserved) into a Maintenance state. Each `ServerMaintenance` object tracks the
lifecycle of a specific maintenance task, ensuring servers are properly taken offline, updated, and restored.

## Key Points

- `ServerMaintenance` is namespaced and can represent various maintenance types (e.g., BIOSUpdate, Cleanup).
- Only one `ServerMaintenance` can be active for a given `Server` at a time. Others remain pending.
- When the active `ServerMaintenance` completes, the next pending maintenance starts.
- If no more maintenance tasks are pending, the `Server` returns to its previous operational state and can be
powered back on.
- The `metal-operator` manages `ServerMaintenance` resources and updates the `Server` state accordingly.
- A maintenance-related operator (e.g., `firmware-operator`) decides whether a `Server` needs maintenance. It creates
the `ServerMaintenance` resource and a corresponding `ServerBootConfiguration`, references the latter as the
MaintenanceBootConfiguration in the `Server` spec, and handles powering the server on and off.
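
The queueing rule described in the points above can be illustrated with two resources that target the same `Server`. This is only a sketch: it reuses the fields from the example resource in this document, the resource names are chosen for illustration, and which of the two becomes active first is not specified here.

```yaml
# Two maintenance requests for the same Server (names are illustrative).
# Only one can be active at a time; the other remains pending until the
# active one completes.
apiVersion: metal.ironcore.dev/v1alpha1
kind: ServerMaintenance
metadata:
  name: bios-update
  namespace: ops
spec:
  type: BIOSUpdate
  serverRef:
    name: server-foo
---
apiVersion: metal.ironcore.dev/v1alpha1
kind: ServerMaintenance
metadata:
  name: cleanup
  namespace: ops
spec:
  type: Cleanup
  serverRef:
    name: server-foo
```

The `status` section is omitted on purpose, since it is expected to be written by the controller rather than by the creator of the resource.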

## Workflow

1. **Determining Maintenance:**
The maintenance operator creates a `ServerMaintenance` resource for the chosen `Server`.

2. **Transition to Maintenance:**
The `metal-operator` notices the new `ServerMaintenance` and transitions the `Server` into the `Maintenance` state.

3. **Boot Configuration:**
The maintenance operator creates a `ServerBootConfiguration` resource and references it in the `Server` spec as the
MaintenanceBootConfiguration. This configuration is used, for example, to boot custom tooling that performs
BIOS/firmware updates or runs cleanup tasks on the `Server`.

4. **Performing Maintenance:**
The maintenance operator powers off the `Server`, performs the required maintenance task, then updates the
`ServerMaintenance` to `Complete`.

5. **Post-Maintenance:**
Once complete, if no more maintenance tasks are pending, the `metal-operator` restores the `Server` to its previous
state (e.g., Available/Reserved). The maintenance operator can then power the `Server` back on.
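
As a rough sketch of step 3, the maintenance operator might create a boot configuration and reference it from the `Server`. The field names `spec.serverRef` and `spec.image` on the `ServerBootConfiguration`, the field name `spec.maintenanceBootConfigurationRef` on the `Server`, the image reference, and the resource names are all assumptions made for illustration. This document does not define them, so check the API reference before relying on any of them.

```yaml
# Hypothetical boot configuration that boots maintenance tooling.
apiVersion: metal.ironcore.dev/v1alpha1
kind: ServerBootConfiguration
metadata:
  name: bios-update-boot
  namespace: ops
spec:
  serverRef:                 # assumed to mirror the ServerMaintenance example
    name: server-foo
  image: registry.example.com/maintenance/bios-updater:latest  # assumed field and value
---
# Fragment of the Server, showing only the assumed reference field.
apiVersion: metal.ironcore.dev/v1alpha1
kind: Server
metadata:
  name: server-foo
spec:
  maintenanceBootConfigurationRef:  # assumed field name
    name: bios-update-boot
    namespace: ops
```

The second document is a fragment rather than a complete `Server` manifest; in practice the maintenance operator would patch this one field on the existing object.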

## Example ServerMaintenance Resource

```yaml
apiVersion: metal.ironcore.dev/v1alpha1
kind: ServerMaintenance
metadata:
  name: bios-update
  namespace: ops
spec:
  type: BIOSUpdate
  serverRef:
    name: server-foo
status:
  state: Pending
```
Once the conditions are met, the `metal-operator` transitions the `Server` to the `Maintenance` state. The maintenance
operator powers the server off, applies the maintenance boot configuration, performs the maintenance, marks the
`ServerMaintenance` as `Complete`, and then powers the server on. If other `ServerMaintenance` objects are pending for
the same `Server`, the next one starts; otherwise, the `Server` returns to its previous operational state.
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -56,6 +56,7 @@ nav:
- Servers: concepts/servers.md
- ServerBootConfigurations: concepts/serverbootconfigurations.md
- ServerClaims: concepts/serverclaims.md
- ServerMaintenance: concepts/servermaintenance.md
- Usage:
- metalctl: usage/metalctl.md
- Development Guide: