Add possibility to trigger / suspend a Shoot reconciliation by KIM #605

Open
6 tasks
tobiscr opened this issue Jan 14, 2025 · 2 comments

tobiscr commented Jan 14, 2025

Description

We have seen cases where a cluster reconciliation was required, but KIM currently does not support triggering or suspending a reconciliation from outside.

We agreed to add this capability via an annotation on the RuntimeCR. KIM has to honor this annotation and reconcile the cluster. The annotation applies only to existing clusters; clusters that are still being created or deleted do not need to consider it.

Use-Case for triggering a cluster reconciliation:

  1. SRE / On-call engineer adds an annotation to the RuntimeCR (e.g. operator.kyma-project.io/force-shoot-reconciliation: true)
  2. KIM reconciles the Shoot spec
  3. KIM removes the annotation after reconciliation is completed

Use-Case for suspending a cluster reconciliation:

  1. SRE / On-call engineer adds an annotation to the RuntimeCR (e.g. operator.kyma-project.io/suspend-shoot-reconciliation: true)
  2. KIM will not reconcile the cluster as long as the annotation exists
  3. SRE removes the annotation from the RuntimeCR
  4. KIM will consider the cluster for reconciliations again
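
A rough sketch of the suspend check, under the same caveats: `Runtime`, `isSuspended`, and `reconcileUnlessSuspended` are hypothetical names for illustration, and treating only the string value "true" as suspended is an assumption.

```go
package main

import "fmt"

// Annotation key as proposed in this issue; the final name may differ.
const suspendAnnotation = "operator.kyma-project.io/suspend-shoot-reconciliation"

// Runtime is a hypothetical, simplified stand-in for the Runtime CR.
type Runtime struct {
	Name        string
	Annotations map[string]string
}

// isSuspended reports whether the suspend annotation is set to "true".
func isSuspended(rt Runtime) bool {
	return rt.Annotations[suspendAnnotation] == "true"
}

// reconcileUnlessSuspended skips reconcileShoot while the Runtime is suspended.
// The first return value reports whether the reconciliation was skipped.
func reconcileUnlessSuspended(rt Runtime, reconcileShoot func(Runtime) error) (bool, error) {
	if isSuspended(rt) {
		return true, nil
	}
	return false, reconcileShoot(rt)
}

func main() {
	rt := Runtime{
		Name:        "example-runtime",
		Annotations: map[string]string{suspendAnnotation: "true"},
	}
	reconcile := func(r Runtime) error {
		fmt.Println("reconciling", r.Name)
		return nil
	}

	// While the annotation exists, the cluster is not reconciled.
	skipped, err := reconcileUnlessSuspended(rt, reconcile)
	fmt.Println("skipped:", skipped, "err:", err)

	// Once the SRE removes the annotation, reconciliation resumes.
	delete(rt.Annotations, suspendAnnotation)
	skipped, err = reconcileUnlessSuspended(rt, reconcile)
	fmt.Println("skipped:", skipped, "err:", err)
}
```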

AC:

  • KIM reacts to an annotation (e.g. operator.kyma-project.io/force-shoot-reconciliation: true) and reconciles the Shoot spec
    • After the reconciliation is completed, KIM removes the annotation
  • KIM reacts to an annotation (e.g. operator.kyma-project.io/suspend-shoot-reconciliation: true) and does not reconcile the Shoot spec
    • When the annotation is removed, KIM considers the cluster for reconciliation again
    • Extend the KIM monitoring metrics to also report the number of currently suspended clusters
    • Add a counter to our KIM dashboard that shows how many clusters are currently suspended from reconciliation
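
The suspended-cluster metric from the AC could be derived roughly like this. It is a stdlib-only sketch: the metric name `kim_suspended_runtimes` and the helper names are assumptions made for illustration, and a real implementation would more likely register a gauge with the Prometheus client library than render the text format by hand.

```go
package main

import "fmt"

// Annotation key as proposed in this issue; the final name may differ.
const suspendAnnotation = "operator.kyma-project.io/suspend-shoot-reconciliation"

// Hypothetical metric name, not an existing KIM metric.
const metricName = "kim_suspended_runtimes"

// Runtime is a hypothetical, simplified stand-in for the Runtime CR.
type Runtime struct {
	Name        string
	Annotations map[string]string
}

// countSuspended returns how many runtimes carry the suspend annotation
// with the value "true".
func countSuspended(runtimes []Runtime) int {
	n := 0
	for _, rt := range runtimes {
		if rt.Annotations[suspendAnnotation] == "true" {
			n++
		}
	}
	return n
}

// renderSuspendedGauge formats the count as a gauge in the Prometheus
// text exposition format.
func renderSuspendedGauge(n int) string {
	return fmt.Sprintf(
		"# HELP %s Number of runtimes suspended from reconciliation.\n# TYPE %s gauge\n%s %d\n",
		metricName, metricName, metricName, n,
	)
}

func main() {
	runtimes := []Runtime{
		{Name: "a", Annotations: map[string]string{suspendAnnotation: "true"}},
		{Name: "b"},
		{Name: "c", Annotations: map[string]string{suspendAnnotation: "true"}},
	}
	fmt.Print(renderSuspendedGauge(countSuspended(runtimes)))
}
```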
@tobiscr tobiscr assigned koala7659 and unassigned koala7659 Jan 17, 2025

tobiscr commented Jan 17, 2025

first PR prepared: #606


tobiscr commented Jan 17, 2025

Taken over from the issue created in parallel by @Disper:

Description

When the cluster reaches the Failed state, it will not become ready unless we manually either:

  1. change the list of administrators
  2. change the value of the shoot spec

which might be problematic for PROD, where customers might not want us to mess around with their Shoot specification or administrators list.

It should be possible to force reconciliation e.g. by adding an annotation to the Runtime CR that would trigger the reconciliation (but only once!).

Attachments
.status.state of the Runtime CR before manual fixes.

    "status": {
        "conditions": [
            {
                "lastTransitionTime": "2025-01-08T12:10:08Z",
                "message": "ERR_INFRA_QUOTA_EXCEEDED",
                "reason": "ProcessingErr",
                "status": "False",
                "type": "Provisioned"
            },
            {
                "lastTransitionTime": "2025-01-07T05:11:14Z",
                "message": "Gardener Cluster CR is ready.",
                "reason": "GardenerClusterCRReady",
                "status": "True",
                "type": "KubeconfigReady"
            },
            {
                "lastTransitionTime": "2025-01-07T05:11:15Z",
                "message": "OIDC configuration completed",
                "reason": "OidcConfigured",
                "status": "True",
                "type": "OidcConfigured"
            },
            {
                "lastTransitionTime": "2025-01-07T05:11:20Z",
                "message": "Cluster admin configuration complete",
                "reason": "AdministratorsConfigured",
                "status": "True",
                "type": "Configured"
            }
        ],
        "state": "Failed"
    }

Example logs from a reconciliation that exits because there is nothing to patch:

{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"kcp-system/e2...994"}
{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"Reconciling Runtime","runtimeID":"e2...994","shootName":"c-...","requestID":1041,"Name":"e2...94","Namespace":"kcp-system"}
{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"Take snapshot state","runtimeID":"e2...4","shootName":"c-...","requestID":1041}
{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"switching state from github.com/kyma-project/infrastructure-manager/internal/controller/runtime/fsm.sFnTakeSnapshot to github.com/kyma-project/infrastructure-manager/internal/controller/runtime/fsm.sFnInitialize","runtimeID":"e2...94","shootName":"c-...","requestID":1041,"result":null,"err":null,"mFnIsNill":false}
{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"Gardener shoot exists, processing","runtimeID":"e2...994","shootName":"c-...","requestID":1041}
{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"switching state from github.com/kyma-project/infrastructure-manager/internal/controller/runtime/fsm.sFnInitialize to github.com/kyma-project/infrastructure-manager/internal/controller/runtime/fsm.sFnSelectShootProcessing","runtimeID":"e2...4","shootName":"c-...","requestID":1041,"result":null,"err":null,"mFnIsNill":false}
{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"Select shoot processing state","runtimeID":"e2...94","shootName":"c-...","requestID":1041}
{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"Stopping processing reconcile, exiting with no retry","runtimeID":"e2...4","shootName":"c-...","requestID":1041,"RuntimeCR":"e2...94","shoot":"c-...","function":"sFnSelectShootProcessing"}
{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"switching state from github.com/kyma-project/infrastructure-manager/internal/controller/runtime/fsm.sFnSelectShootProcessing to ","runtimeID":"e2...4","shootName":"c-...","requestID":1041,"result":null,"err":null,"mFnIsNill":true}
{"level":"info","ts":"2025-01-13T12:30:41Z","msg":"reconciliation done","runtimeID":"e2...94","shootName":"c-...","requestID":1041,"error"

@tobiscr tobiscr changed the title Add possibility to trigger a Shoot reconciliation by KIM Add possibility to trigger / cancel a Shoot reconciliation by KIM Jan 17, 2025
@tobiscr tobiscr changed the title Add possibility to trigger / cancel a Shoot reconciliation by KIM Add possibility to trigger / suspend a Shoot reconciliation by KIM Jan 17, 2025
@Disper Disper self-assigned this Jan 21, 2025