EPIC: Build a Kubernetes Operator capable of deploying and managing a Block Node and related components through the full lifecycle. #546

jsync-swirlds · 2025-01-27T18:53:34Z

Epic Goal

Produce a fully functional Kubernetes Operator implementing at least "Level IV" (Deep Insights) capabilities, as defined by the Operator Framework capability levels.

Major Tasks

Create a Level I Operator

Create the basic operator using the Operator SDK from the Operator Framework.
Add the capability to install a Block Node.
Add the capability to observe a newly installed Block Node until it reaches a healthy state.
Add the capability to convey readiness of each deployed Block Node via the custom resource status block.
Add the capability to set, monitor, and update configuration for each deployed Block Node.
- This includes keeping configurations synchronized with the custom resource spec block.
- Ensure all valid configuration values are fully documented and specified.
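The readiness and configuration-sync items above could be sketched roughly as follows. This is a minimal, framework-free illustration in Python, not the operator's actual design: `BlockNodeResource`, `config_drift`, `reconcile`, the `config` key under the spec, and the `Ready` condition reasons are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class BlockNodeResource:
    """Hypothetical in-memory stand-in for a Block Node custom resource."""
    name: str
    spec: dict                                   # desired state (the spec block)
    status: dict = field(default_factory=dict)   # observed state (the status block)


def config_drift(desired: dict, observed: dict) -> dict:
    """Return {key: (desired, observed)} for every key whose values differ.

    Keys present on only one side count as drift (assumption for this sketch).
    """
    keys = sorted(desired.keys() | observed.keys())
    return {
        k: (desired.get(k), observed.get(k))
        for k in keys
        if desired.get(k) != observed.get(k)
    }


def reconcile(resource: BlockNodeResource, observed_config: dict, healthy: bool) -> dict:
    """Compare spec to observed config and record readiness in the status block.

    Returns the drift so a caller could decide which settings to re-apply.
    """
    drift = config_drift(resource.spec.get("config", {}), observed_config)
    ready = healthy and not drift
    if ready:
        reason = "Healthy"
    elif drift:
        reason = "ConfigDrift"
    else:
        reason = "NotHealthy"
    resource.status["conditions"] = [
        {"type": "Ready", "status": "True" if ready else "False", "reason": reason}
    ]
    return drift
```

A real operator would run this kind of comparison inside its reconcile loop and write the condition back through the Kubernetes API; the sketch only shows the decision logic.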

Upgrade the Operator to Level II

Add the capability to upgrade a Block Node via the operator.
Add the capability to upgrade all managed Block Nodes when the operator is upgraded.
Add the capability to upgrade all (or any subset of) managed Block Nodes to a new Block Node version.
Add the capability to upgrade Block Nodes that were managed by an older version of the Operator to a Block Node version supported by the current version of the Operator.
Add the capability to upgrade the Operator without upgrading all managed Block Nodes.
- If some managed Block Nodes are too old to manage, the Operator may upgrade them to the oldest supported version when upgrading the Operator.
Prior to upgrading, report the inability to manage versions older than the supported range, and the pending upgrade of those versions, via the custom resource status block.
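One way to picture the "too old to manage" rule above is a planning step that runs before any upgrade and produces the entries to report in each node's status. The sketch below is an assumption-laden illustration: `parse_version`, `plan_upgrades`, the `manageable` and `pendingUpgradeTo` field names, and the dotted numeric version format are all hypothetical.

```python
def parse_version(text: str) -> tuple:
    """Parse a dotted numeric version such as "0.10.2" into a comparable tuple.

    Assumes purely numeric components; pre-release suffixes are out of scope.
    """
    return tuple(int(part) for part in text.split("."))


def plan_upgrades(node_versions: dict, oldest_supported: str) -> dict:
    """Decide, per Block Node, whether it is manageable as-is.

    Nodes older than `oldest_supported` are marked unmanageable with a pending
    upgrade to the oldest supported version; all others are left untouched.
    The returned dict is what a caller might copy into each status block
    before starting any upgrade.
    """
    floor = parse_version(oldest_supported)
    plan = {}
    for name, version in sorted(node_versions.items()):
        if parse_version(version) < floor:
            plan[name] = {"manageable": False, "pendingUpgradeTo": oldest_supported}
        else:
            plan[name] = {"manageable": True, "pendingUpgradeTo": None}
    return plan
```

Comparing parsed tuples rather than raw strings matters here: as strings, "0.10.0" sorts before "0.9.0", which would wrongly flag a newer node as too old.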

Note: we may wish to use the Operator Lifecycle Manager (OLM) to better support Level II and Level III capabilities.

Upgrade the Operator to Level III

Add the capability to create a backup of a Block Node.
Add the capability to restore a backup of a Block Node.
Add the capability to orchestrate complex re-configuration flows for a Block Node.

Items excluded from Level III

The following are not supported, because Block Nodes are not (currently) clustered resources with multiple instances of various components and dynamic scaling:

Add the capability to add/remove members from a clustered Block Node
Add the capability to fail-over and fail-back clustered Block Nodes
Add the capability for application-aware dynamic scaling of Block Nodes

Upgrade the Operator to Level IV

Add the capability to expose useful metrics for Operator health.
Add the capability to expose health and performance metrics for each Block Node.
- These should be collected by the Operator and published to OpenTelemetry endpoints from there.
Add the capability to collect and publish "useful" alerts from managed Block Nodes.
- "Useful" here refers to symptoms that are associated with end-user pain, rather than trying to catch every possible way that pain could be caused. Alerts should link to relevant consoles and make it easy to figure out which component is at fault.
Add the capability to emit custom events relating to alert conditions on the managed Block Nodes.
Add the capability to use Operator Metering to manage cluster resource consumption.
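The "useful alerts" idea above (symptom-based, linked to a console, naming the component at fault) might look something like the following sketch. Everything here is hypothetical and for illustration only: the `SymptomRule` and `Alert` types, the `evaluate` function, the metric names, thresholds, and the example.invalid console URL are not taken from the Block Node or from any alerting library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymptomRule:
    """A threshold on a user-visible symptom, not on an internal cause."""
    name: str
    metric: str        # hypothetical metric name
    threshold: float   # alert when the observed value exceeds this
    component: str     # component most likely at fault
    console_url: str   # where an on-call engineer should look first


@dataclass(frozen=True)
class Alert:
    node: str
    rule: str
    component: str
    console_url: str
    value: float


def evaluate(node: str, metrics: dict, rules: list) -> list:
    """Return one Alert per rule whose metric is present and above threshold."""
    alerts = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            alerts.append(
                Alert(node, rule.name, rule.component, rule.console_url, value)
            )
    return alerts


# Hypothetical rules: both describe pain a block stream consumer would feel.
RULES = [
    SymptomRule("SlowBlockDelivery", "block_delivery_latency_seconds", 5.0,
                "publisher", "https://example.invalid/consoles/publisher"),
    SymptomRule("HighRequestErrorRate", "request_error_ratio", 0.05,
                "api", "https://example.invalid/consoles/api"),
]
```

For example, `evaluate("bn-0", {"block_delivery_latency_seconds": 7.5}, RULES)` yields a single `SlowBlockDelivery` alert that names the `publisher` component and carries its console link; a metric missing from the sample is skipped rather than treated as a failure.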

jsync-swirlds added the Epic label Jan 27, 2025
