EPIC: Build a Kubernetes Operator capable of deploying and managing a Block Node and related components through the full lifecycle. #546

Open
22 tasks
jsync-swirlds opened this issue Jan 27, 2025 · 0 comments

Epic Goal

Produce a fully functional Kubernetes Operator implementing at least "Level IV" capabilities.

Major Tasks

Create a Level I Operator

  • Create the basic operator using the Operator SDK from the Operator Framework.
  • Add the capability to install a Block Node.
  • Add the capability to observe a newly installed Block Node until it reaches a healthy state.
  • Add the capability to convey readiness of each deployed Block Node via the custom resource status block.
  • Add the capability to set, monitor, and update configuration for each deployed Block Node.
    • This includes keeping configurations synchronized with the custom resource spec block.
    • Ensure all valid configuration values are fully documented and specified.
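
The Level I items above (install, observe until healthy, report readiness, keep configuration in sync with the spec) amount to a single reconcile pass. The following is a minimal, framework-free sketch of that idea, not the planned implementation. The names `BlockNodeSpec`, `ObservedBlockNode`, `config_drift`, and `reconcile`, as well as the status keys `ready` and `pendingActions`, are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BlockNodeSpec:
    """Desired state, as it might appear in the custom resource spec block (hypothetical)."""
    config: dict[str, str] = field(default_factory=dict)


@dataclass
class ObservedBlockNode:
    """State the operator observes for one Block Node (hypothetical)."""
    installed: bool
    healthy: bool
    config: dict[str, str] = field(default_factory=dict)


def config_drift(desired: dict[str, str], observed: dict[str, str]) -> dict[str, str | None]:
    """Map each out-of-sync key to its desired value, or to None if it should be removed."""
    drift: dict[str, str | None] = {}
    for key, value in desired.items():
        if observed.get(key) != value:
            drift[key] = value
    for key in observed:
        if key not in desired:
            drift[key] = None
    return drift


def reconcile(spec: BlockNodeSpec, observed: ObservedBlockNode) -> tuple[list[str], dict]:
    """Run one reconcile pass and return (actions to take, status to publish)."""
    actions: list[str] = []
    if not observed.installed:
        actions.append("install")
    elif config_drift(spec.config, observed.config):
        actions.append("update-config")
    # Ready only when the node is installed, healthy, and nothing is pending.
    ready = observed.installed and observed.healthy and not actions
    status = {"ready": ready, "pendingActions": list(actions)}
    return actions, status
```

A real operator would run this pass repeatedly, applying the returned actions and writing the returned status to the custom resource status block until `ready` is true.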

Upgrade the Operator to Level II

  • Add the capability to upgrade a Block Node via the operator.
  • Add the capability to upgrade all managed Block Nodes when the operator is upgraded.
  • Add the capability to upgrade all (or any subset of) managed Block Nodes to a new Block Node version.
  • Add the capability to upgrade versions of Block Node managed by an older version of the Operator to a version supported by the current version of the Operator.
  • Add the capability to upgrade the Operator without upgrading all managed Block Nodes.
    • If some managed Block Nodes are too old to manage, the Operator may upgrade them to the oldest supported version when upgrading the Operator.
  • Prior to upgrading, report via the custom resource status block both the inability to manage versions older than the supported range and the pending upgrade of those versions.
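
The version rules above can be sketched as a small planning step that runs before an Operator upgrade: any managed Block Node older than the oldest supported version is flagged as unmanageable and scheduled for upgrade to that oldest supported version. This is only an illustration under stated assumptions. It assumes plain `MAJOR.MINOR.PATCH` version strings, and the names `parse_version`, `plan_operator_upgrade`, and the status keys are hypothetical.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Parse 'MAJOR.MINOR.PATCH' into a tuple of ints so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def plan_operator_upgrade(node_versions: dict[str, str], oldest_supported: str) -> dict[str, dict]:
    """Build a per-node status entry describing what an Operator upgrade would do.

    node_versions maps a Block Node name to its currently deployed version.
    """
    floor = parse_version(oldest_supported)
    plan: dict[str, dict] = {}
    for name, version in sorted(node_versions.items()):
        if parse_version(version) < floor:
            # Too old to manage: report it and schedule the minimum upgrade.
            plan[name] = {
                "manageable": False,
                "pendingUpgradeTo": oldest_supported,
                "message": (
                    f"version {version} is older than the oldest "
                    f"supported version {oldest_supported}"
                ),
            }
        else:
            # Within the supported range: leave the node at its current version.
            plan[name] = {"manageable": True, "pendingUpgradeTo": None, "message": ""}
    return plan
```

Comparing parsed tuples rather than raw strings matters here: as strings, `"0.10.1"` sorts before `"0.5.0"`, which would wrongly mark a newer node as too old.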

Note: we may wish to use the Operator Lifecycle Manager to better support Level II and Level III capabilities.

Upgrade the Operator to Level III

  • Add the capability to create a backup of a Block Node.
  • Add the capability to restore a backup of a Block Node.
  • Add the capability to orchestrate complex re-configuration flows for a Block Node.

Items excluded from Level III

The following capabilities are excluded because Block Nodes are not (currently) clustered resources with multiple instances of various components and dynamic scaling:

  • Add the capability to add/remove members from a clustered Block Node
  • Add the capability to fail-over and fail-back clustered Block Nodes
  • Add the capability for application-aware dynamic scaling of Block Nodes

Upgrade the Operator to Level IV

  • Add the capability to expose useful metrics for Operator health.
  • Add the capability to expose health and performance metrics for each Block Node.
    • These should be collected by the Operator and published from there to OpenTelemetry endpoints.
  • Add the capability to collect and publish "useful" alerts from managed Block Nodes.
    • "Useful" here refers to symptoms associated with end-user pain, rather than an attempt to catch every possible cause of that pain. Alerts should link to relevant consoles and make it easy to identify which component is at fault.
  • Add the capability to emit custom events relating to alert conditions on the managed Block Nodes.
  • Add the capability to use Operator Metering to manage cluster resource consumption.
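
One way to read the "useful alerts" requirement above is as a small set of symptom rules, each tied to a component and a console link, evaluated against the metrics collected for a Block Node. The sketch below illustrates that reading only. The rule fields, the metric and component names in the usage example, the `BlockNodeAlert` reason, and the console URL layout are all hypothetical; only the `Warning` event type is borrowed from standard Kubernetes events.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SymptomRule:
    """A user-visible symptom, the metric that reveals it, and the component at fault (hypothetical)."""
    metric: str
    component: str
    threshold: float
    symptom: str


def evaluate_alerts(node: str, metrics: dict[str, float],
                    rules: list[SymptomRule], console_base_url: str) -> list[dict]:
    """Return one event-like record per rule whose metric exceeds its threshold."""
    events = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None or value <= rule.threshold:
            continue  # Metric missing or within bounds: no symptom to report.
        events.append({
            "type": "Warning",
            "reason": "BlockNodeAlert",
            "node": node,
            "component": rule.component,
            "message": f"{rule.symptom} ({rule.metric}={value}, threshold={rule.threshold})",
            # Link straight to the console for the component at fault.
            "console": f"{console_base_url}/{node}/{rule.component}",
        })
    return events
```

For example, with a rule such as `SymptomRule("block_stream_latency_seconds", "stream-publisher", 2.0, "Subscribers are receiving blocks late")`, a node reporting a latency of 3.5 would produce a single warning record naming the `stream-publisher` component and linking to its console page.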
jsync-swirlds changed the title from "EPIC: Build a Kubernetes Operator capable of deploying and _managing_ a Block Node and related components through the full lifecycle." to "EPIC: Build a Kubernetes Operator capable of deploying and managing a Block Node and related components through the full lifecycle." on Jan 27, 2025