Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ServerMaintenance API type #208

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
76 changes: 76 additions & 0 deletions docs/concepts/servermaintenance.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,76 @@
# ServerMaintenance

`ServerMaintenance` represents a maintenance operation for a physical server. It transitions a `Server` from its
current operational state (e.g., Available/Reserved) into a Maintenance state. Each `ServerMaintenance` object tracks the lifecycle of a maintenance task, ensuring servers are properly taken offline, updated, and restored.

## Key Points

- `ServerMaintenance` is namespaced and may represent various maintenance operations.
- Only one `ServerMaintenance` can be active per `Server` at a time. Others remain pending.
- When the active `ServerMaintenance` completes, the next pending one (if any) starts.
- If no more maintenance tasks are pending, the `Server` returns to its previous operational state.
- `policy` determines how maintenance starts:
- **OwnerApproval:** Requires a label (e.g., `ok-to-maintenance: "true"`) on the `ServerClaim`.
- **Forceful:** Does not require owner approval.

## Workflow

1. A separate operator (e.g., `foo-maintenance-operator`) or user creates a `ServerMaintenance` resource referencing a
specific `Server`.
2. If `policy` is `OwnerApproval` and no `ok-to-maintenance` label is set on the `ServerClaim`, the `ServerMaintenance`
stays in `Pending`. The `Server` also remains unchanged.
3. If `policy` is `OwnerApproval` and the `ok-to-maintenance` label is present (or if `alwaysPerformMaintenance` is
enabled), or if the policy is `Forceful`, the `metal-operator` transitions the `Server` into `Maintenance` and
updates the `ServerMaintenance` state accordingly.
4. The maintenance operator powers off the `Server`, applies a `ServerBootConfiguration` (if needed), performs the
maintenance, and sets `ServerMaintenance` to `Complete`.
5. The `metal-operator` transitions the `Server` back to its prior state. If additional `ServerMaintenance` objects are
pending, the next one is processed.

## Sequence Diagram

```mermaid
sequenceDiagram
participant Ext as External entity (e.g. foo-maintenance-operator)
participant SC as ServerClaim
participant MO as metal-operator
participant SM as ServerMaintenance
participant S as Server
participant SBC as ServerBootConfiguration

Ext->>MO: Create SM (ServerMaintenance) resource for S
MO->>SC: Check for ok-to-maintenance label (if policy=OwnerApproval)
alt Policy = OwnerApproval AND no ok-to-maintenance label
Note over MO: SM stays Pending, Server remains unchanged
else Policy = OwnerApproval AND label present OR Policy = Forceful
MO->>S: Update Server state to "InMaintenance"
MO->>SM: Set SM state to "InMaintenance"
Ext->>S: Power off Server
Ext->>MO: Create SBC and reference in Server
Note over Ext: Perform maintenance
Ext->>SM: Update SM state to "Complete"
MO->>S: Return Server to Available/Reserved
Ext->>S: Power on Server
end
```

## Example

```yaml
apiVersion: metal.ironcore.dev/v1alpha1
kind: ServerMaintenance
metadata:
name: bios-update
namespace: ops
spec:
policy: OwnerApproval
serverRef:
name: server-foo
status:
state: Pending
```

If `policy: OwnerApproval` and no `ok-to-maintenance` label exists on the `ServerClaim`, this `ServerMaintenance`
remains `Pending`, and the `Server` stays as is. Once the label is added (or if the operator setting
`alwaysPerformMaintenance` is enabled), the `metal-operator` transitions the `Server` to `InMaintenance`, and the
maintenance operator performs the maintenance task.
1 change: 1 addition & 0 deletions mkdocs.yml
Original file line number Diff line number Diff line change
Expand Up @@ -56,6 +56,7 @@ nav:
- Servers: concepts/servers.md
- ServerBootConfigurations: concepts/serverbootconfigurations.md
- ServerClaims: concepts/serverclaims.md
- ServerMaintenance: concepts/servermaintenance.md
- Usage:
- metalctl: usage/metalctl.md
- Development Guide:
Expand Down
Loading