Scheduling simulation seems to take previous antiAffinity "topologyKey" instead of new updated one #1771
Comments
This issue is currently awaiting triage. If Karpenter contributors determine this is a relevant issue, they will accept it by applying the `triage/accepted` label.
I couldn't find any information about `failure-domain.beta.kubernetes.io/hostname`, so your concern might be invalid. However, I found similar keys that have been deprecated. You might want to refer to: https://kubernetes.io/docs/reference/labels-annotations-taints/#failure-domainbetakubernetesioregion
@Vacant2333 thanks for your help. However, I do not understand why Karpenter still refers to the previous topologyKey after I fix the deployment by replacing it with the valid one. Seems like a bug to me, but I am curious to know your thoughts on this.
Hi, can you show me the logs in detail? I will try to find the reason.
Sure @Vacant2333, here are the logs from the Karpenter pod just after I edit the deployment to update the topologyKey to `kubernetes.io/hostname`:
The relevant part being the one from my first message:
And the Events from the new "Pending" pod are:
Why does Karpenter still mention the previous topologyKey instead of the new one?
Did you find out what the reason is? It's strange because Karpenter uses the latest pod information for each scheduling simulation. Has the old pod been completely deleted? I can't find the reason in my environment because I can't reproduce it.
@Vacant2333 no, we still have the issue and did not find the reason. We are hitting it right now:
and if I look at the corresponding deployment, the topologyKey is the right one (`kubernetes.io/hostname`).
Additional information: on the node that Karpenter does not manage to consolidate, if I do a `kubectl drain` (without `--force`), the node is correctly drained.
Description
Observed Behavior:
A "debug" nodepool is configured with a taint.
A deployment is deployed to this nodepool with a podAntiAffinity using the deprecated topologyKey `failure-domain.beta.kubernetes.io/hostname` (and a toleration for the taint, so its pods can land on the "debug" nodes).
=> Why the deprecated key? It is an actual use case: while investigating issues with Karpenter node replacement after expiration, we found out that some of our users were still using this deprecated topologyKey in their antiAffinity config. As they are not able to fix this for now (production constraints), we are trying to find a way to unblock node replacement by replacing the deprecated topologyKey (`failure-domain.beta.kubernetes.io/hostname`) with the valid one (`kubernetes.io/hostname`) in one deployment, but here again Karpenter is not creating a new node. Hence the current issue.
Deployment:
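Not the exact manifest from our cluster, but a minimal sketch of such a deployment (app name, labels, and image are placeholders, and requiredDuringScheduling anti-affinity is assumed, which matches the behaviour described). The relevant parts are the toleration for the NodePool taint and the podAntiAffinity with the deprecated topologyKey.

```yaml
# Hypothetical "debug" deployment sketch: tolerates the NodePool taint and
# uses a required podAntiAffinity with the DEPRECATED topologyKey.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: debug
spec:
  replicas: 1
  selector:
    matchLabels:
      app: debug
  template:
    metadata:
      labels:
        app: debug
    spec:
      tolerations:
        - key: dedicated            # matches the placeholder NodePool taint
          value: debug
          effect: NoSchedule
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: debug
              topologyKey: failure-domain.beta.kubernetes.io/hostname   # deprecated key
      containers:
        - name: debug
          image: public.ecr.aws/docker/library/busybox:stable   # placeholder image
          command: ["sleep", "infinity"]
```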
Karpenter creates a "debug" node and the pod is scheduled on it.
Only one "debug" node exists at this point.
We edit the deployment to update the topologyKey: `topologyKey: failure-domain.beta.kubernetes.io/hostname` is replaced by `topologyKey: kubernetes.io/hostname`.
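After the edit, the affinity section of the pod template (same placeholder sketch as above) would read:

```yaml
# Edited affinity section of the pod template: only the topologyKey changed.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: debug                        # placeholder label from the sketch above
        topologyKey: kubernetes.io/hostname   # valid, current key
```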
A rolling update is triggered:
It looks like Karpenter still takes the old topologyKey into account in its scheduling simulation, even though we replaced it with the valid one.
The new pod stays blocked in the "Pending" state and the rolling update cannot succeed.
Expected Behavior:
I would understand that Karpenter does not want to create a node because of the deprecated label, but after we fixed this label I would expect Karpenter to create a new "debug" node (because of the antiAffinity) so that the new pod can be scheduled on it.
Reproduction Steps (Please include YAML):
debug-deploy.yaml
1. Initial status: no "debug" node is present.
2. Apply the deployment (`debug-deploy.yaml`); Karpenter creates a nodeClaim for the "debug" nodepool.
3. Wait for a debug node to be up and for the debug pod to be running on it.
4. A rolling update is triggered: the new pod is in Pending.
5. Karpenter does not create a new nodeClaim for a debug node and logs this.
6. Undo the rollout.
7. Edit the deployment and replace the topologyKey with the valid one, `topologyKey: kubernetes.io/hostname` (comment the deprecated one, uncomment the valid one — see the sketch after these steps).
8. This triggers a new rollout, but the new pod stays in Pending.
9. Karpenter does not create a new nodeClaim for a debug node and logs this.
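As referenced in step 7, a sketch of how the affinity section of `debug-deploy.yaml` looks after the comment/uncomment edit (placeholder labels as in the earlier sketch):

```yaml
# debug-deploy.yaml, affinity section after the edit: the deprecated key is
# commented out and the valid key is active.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: debug
        # topologyKey: failure-domain.beta.kubernetes.io/hostname   # deprecated, now commented
        topologyKey: kubernetes.io/hostname                         # valid, now active
```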
Notes:
So it seems that Karpenter does not take the new topologyKey into account in its scheduling simulation when we edit it after creation?
Versions:
Kubernetes Version (`kubectl version`): 1.27.14