Race condition with node startupTaints being restored #1772
Labels
kind/bug
Categorizes issue or PR as related to a bug.
needs-triage
Indicates an issue or PR lacks a `triage/foo` label and requires one.
Description
Observed Behavior:
Karpenter restores the `startupTaints` on a node if they are removed too quickly at node startup. This leaves the node unusable. The node also never reaches a ready state, so Karpenter refuses to remove it: `Cannot disrupt Node: state node isn't initialized`
From AWS CloudWatch Logs Insights:
Expected Behavior:
Karpenter updates the existing taints on a node to remove `karpenter.sh/unregistered=NoExecute` without restoring startup taints removed by other controllers.
Reproduction Steps (Please include YAML):
This is an unpredictable race condition that is nearly impossible to reproduce on demand.
Might be related to this code: https://github.com/rschalo/karpenter/blob/a652a4aa95dbe92159bb273a3b64ff8837d92660/pkg/controllers/nodeclaim/lifecycle/registration.go#L87
Versions:
1.0.6
Kubernetes Version (`kubectl version`):