Azure: in-memory size returns an incorrect value when a spot instance is deleted #7373

Open
magnetic5355 opened this issue Oct 9, 2024 · 3 comments
Labels
area/cluster-autoscaler kind/bug Categorizes issue or PR as related to a bug.

Comments

magnetic5355 commented Oct 9, 2024

Which component are you using?: cluster-autoscaler

What version of the component are you using?: 1.31

Component version: 1.31

What k8s version are you using (kubectl version)?: 1.30.5+k3s1

kubectl version Output
$ kubectl version

What environment is this in?: Azure

What did you expect to happen?: When a VMSS spot instance is deleted and its node is removed from the cluster, I expect the autoscaler to invalidate its cache.

What happened instead?: Schedulable pods are present. However, the in-memory size is 9, while the actual VMSS has only 7 instances.

1 filter_out_schedulable.go:78] Schedulable pods present
I1009 02:24:15.536067 1 static_autoscaler.go:557] No unschedulable pods
I1009 02:24:15.536082 1 azure_scale_set.go:217] VMSS: k8-agent-2, returning in-memory size: 0
I1009 02:24:15.536093 1 azure_scale_set.go:217] VMSS: k8-agent-d2ds_v5, returning in-memory size: 9

Eventually, the following is logged in a loop when the cluster tries to scale down:

I1009 02:31:59.254556 1 static_autoscaler.go:756] Decreasing size of k8-agent-d2ds_v5, expected=9 current=7 delta=-2
I1009 02:31:59.254570 1 azure_scale_set_instance_cache.go:77] invalidating instanceCache for k8-agent-d2ds_v5
I1009 02:31:59.254579 1 azure_scale_set.go:217] VMSS: k8-agent-d2ds_v5, returning in-memory size: 9
I1009 02:31:59.254594 1 static_autoscaler.go:469] Some node group target size was fixed, skipping the iteration
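
To compare the size the autoscaler reports against the real instance count, the "returning in-memory size" lines can be pulled out of the log. The following is a minimal sketch, assuming the message format shown in the log lines above. The function name `inMemorySizes` and the regular expression are illustrative and are not part of cluster-autoscaler.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// sizeLine matches the message format seen in the quoted logs.
// The pattern is an assumption based only on those lines.
var sizeLine = regexp.MustCompile(`VMSS: ([^,\s]+), returning in-memory size: (\d+)`)

// inMemorySizes (hypothetical helper) returns the last in-memory size
// reported for each VMSS name found in the given log lines.
func inMemorySizes(lines []string) map[string]int {
	sizes := make(map[string]int)
	for _, line := range lines {
		m := sizeLine.FindStringSubmatch(line)
		if m == nil {
			continue // not a size line
		}
		n, err := strconv.Atoi(m[2])
		if err != nil {
			continue // number too large to parse; skip the line
		}
		sizes[m[1]] = n
	}
	return sizes
}

func main() {
	lines := []string{
		"I1009 02:24:15.536067 1 static_autoscaler.go:557] No unschedulable pods",
		"I1009 02:24:15.536082 1 azure_scale_set.go:217] VMSS: k8-agent-2, returning in-memory size: 0",
		"I1009 02:31:59.254579 1 azure_scale_set.go:217] VMSS: k8-agent-d2ds_v5, returning in-memory size: 9",
	}
	sizes := inMemorySizes(lines)
	fmt.Println(sizes["k8-agent-d2ds_v5"]) // prints 9
}
```

The value printed for k8-agent-d2ds_v5 can then be checked by hand against the instance count shown in the Azure portal, which is 7 in this report.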

How to reproduce it (as minimally and precisely as possible):

1. Set up a K3s cluster (not using AKS).
2. Set the provider ID on the nodes to the proper format, i.e. aks:///
3. Set the kubernetes.azure.com/agentpool node label.
4. Add the autoscaler tags to the VMSS.
5. Increase the workload so that the autoscaler creates new nodes.
6. Delete a VMSS instance from Azure.

The in-memory size never refreshes, and new nodes are never created.

I have to restart the cluster-autoscaler pod to scale the cluster back up.

Anything else we need to know?:

magnetic5355 added the kind/bug label Oct 9, 2024
@adrianmoisey
Member

/kind cluster-autoscaler

@k8s-ci-robot
Contributor

@adrianmoisey: The label(s) kind/cluster-autoscaler cannot be applied, because the repository doesn't have them.

In response to this:

/kind cluster-autoscaler

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@adrianmoisey
Member

/area cluster-autoscaler
