Skip to content

Commit

Permalink
Add ServerMaintenance CRD
Browse files Browse the repository at this point in the history
  • Loading branch information
afritzler committed Dec 18, 2024
1 parent 3f646dd commit e50974c
Show file tree
Hide file tree
Showing 2 changed files with 77 additions and 0 deletions.
76 changes: 76 additions & 0 deletions docs/concepts/servermaintenance.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# ServerMaintenance

`ServerMaintenance` represents a maintenance operation for a physical server. It transitions a `Server` from its
current operational state (e.g., Available/Reserved) into a Maintenance state. Each `ServerMaintenance` object tracks the lifecycle of a maintenance task, ensuring servers are properly taken offline, updated, and restored.

## Key Points

- `ServerMaintenance` is namespaced and may represent various maintenance operations.
- Only one `ServerMaintenance` can be active per `Server` at a time. Others remain pending.
- When the active `ServerMaintenance` completes, the next pending one (if any) starts.
- If no more maintenance tasks are pending, the `Server` returns to its previous operational state.
- `policy` determines how maintenance starts:
- **OwnerApproval:** Requires a label (e.g., `ok-to-maintenance: "true"`) on the `ServerClaim`.
- **Forceful:** Does not require owner approval.

## Workflow

1. A separate operator (e.g., `foo-maintenance-operator`) or user creates a `ServerMaintenance` resource referencing a
specific `Server`.
2. If `policy` is `OwnerApproval` and no `ok-to-maintenance` label is set on the `ServerClaim`, the `ServerMaintenance`
stays in `Pending`. The `Server` also remains unchanged.
3. If `policy` is `OwnerApproval` and the `ok-to-maintenance` label is present (or if `alwaysPerformMaintenance` is
enabled), or if the policy is `Forceful`, the `metal-operator` transitions the `Server` into `Maintenance` and
updates the `ServerMaintenance` state accordingly.
4. The maintenance operator powers off the `Server`, applies a `ServerBootConfiguration` (if needed), performs the
maintenance, and sets `ServerMaintenance` to `Complete`.
5. The `metal-operator` transitions the `Server` back to its prior state. If additional `ServerMaintenance` objects are
pending, the next one is processed.

## Sequence Diagram

```mermaid
sequenceDiagram
participant Ext as External entity (e.g. foo-maintenance-operator)
participant SC as ServerClaim
participant MO as metal-operator
participant SM as ServerMaintenance
participant S as Server
participant SBC as ServerBootConfiguration
Ext->>MO: Create SM (ServerMaintenance) resource for S
MO->>SC: Check for ok-to-maintenance label (if policy=OwnerApproval)
alt Policy = OwnerApproval AND no ok-to-maintenance label
Note over MO: SM stays Pending, Server remains unchanged
else Policy = OwnerApproval AND label present OR Policy = Forceful
MO->>S: Update Server state to "InMaintenance"
MO->>SM: Set SM state to "InMaintenance"
Ext->>S: Power off Server
Ext->>MO: Create SBC and reference in Server
Note over Ext: Perform maintenance
Ext->>SM: Update SM state to "Complete"
MO->>S: Return Server to Available/Reserved
Ext->>S: Power on Server
end
```

## Example

```yaml
apiVersion: metal.ironcore.dev/v1alpha1
kind: ServerMaintenance
metadata:
name: bios-update
namespace: ops
spec:
policy: OwnerApproval
serverRef:
name: server-foo
status:
state: Pending
```
If `policy: OwnerApproval` and no `ok-to-maintenance` label exists on the `ServerClaim`, this `ServerMaintenance`
remains `Pending`, and the `Server` stays as is. Once the label is added (or if the operator setting
`alwaysPerformMaintenance` is enabled), the `metal-operator` transitions the `Server` to `InMaintenance`, and the
maintenance operator performs the maintenance task.
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ nav:
- Servers: concepts/servers.md
- ServerBootConfigurations: concepts/serverbootconfigurations.md
- ServerClaims: concepts/serverclaims.md
- ServerMaintenance: concepts/servermaintenance.md
- Usage:
- metalctl: usage/metalctl.md
- Development Guide:
Expand Down

0 comments on commit e50974c

Please sign in to comment.