
Update health checks to use Kube-Proxy when no other configuration is provided #622

Open · wants to merge 1 commit into base: master

Conversation

JoelSpeed (Contributor)

What type of PR is this?

/kind bug

What this PR does / why we need it:

When a service is using the default configuration (i.e. the user hasn't changed any of port, path, or protocol via annotations), this updates the health check to check the kube-proxy health endpoint rather than the traffic port.

This is deliberately done so that if a user has specified any change to the port/path/protocol, we do not change the behaviour of their health checks. But if they have no opinion at all, we give them an improved health check.
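
To illustrate the intended decision, here is a minimal sketch; the struct, helper, and annotation keys below are assumptions for the example, not this PR's actual implementation:

package main

import "fmt"

// healthCheck mirrors the fields being defaulted; the names are illustrative.
type healthCheck struct {
	Port, Path, Protocol string
}

// applyKubeProxyDefault switches the health check to kube-proxy's healthz
// endpoint only when the user has not customised port, path, or protocol via
// annotations, matching the behaviour described above. The annotation keys
// are assumed for the example.
func applyKubeProxyDefault(annotations map[string]string, hc *healthCheck) {
	keys := []string{
		"service.beta.kubernetes.io/aws-load-balancer-healthcheck-port",
		"service.beta.kubernetes.io/aws-load-balancer-healthcheck-path",
		"service.beta.kubernetes.io/aws-load-balancer-healthcheck-protocol",
	}
	for _, k := range keys {
		if _, ok := annotations[k]; ok {
			return // the user expressed an opinion; leave their health check alone
		}
	}
	hc.Port, hc.Path, hc.Protocol = "10256", "/healthz", "HTTP"
}

func main() {
	hc := &healthCheck{Port: "traffic-port", Path: "/", Protocol: "TCP"}
	applyKubeProxyDefault(map[string]string{}, hc)
	fmt.Printf("%+v\n", *hc) // {Port:10256 Path:/healthz Protocol:HTTP}
}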

Which issue(s) this PR fixes:

KEP-3458 was added to the AWS provider in release 1.27. This means that a node's presence in the load balancer backends is no longer determined by whether the instance is healthy or not.

We have observed long periods of disruption (15-20s) during upgrades with the AWS provider since this change went in. The issue is that the health check checks the traffic port, not the node's ability to route traffic to the backend.

This is specifically for externalTrafficPolicy: Cluster. In this topology, all nodes in the cluster accept connections for the service's backend and then route them internally. So, unless all endpoints are down, the exposed traffic port will always return a healthy result. When a node is going away, its ability to route traffic from the traffic port to the backend depends on whether kube-proxy is running, so much so that it actually makes more sense to health check kube-proxy directly.

When kube-proxy is being terminated, assuming graceful shutdown is configured, it will fail its health checks for a period before shutting down. This allows the AWS health check to observe that the node is going away before it loses the ability to route traffic to the backend.
As it stands today, the health check only fails after the node can no longer route traffic.
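
To make the failure mode concrete, here is a minimal sketch (not part of this PR) of the kind of probe being argued for: an HTTP check against kube-proxy's healthz endpoint on port 10256, which starts failing during graceful shutdown before the node stops proxying. The node IP and timeout are illustrative assumptions.

package main

import (
	"fmt"
	"net/http"
	"time"
)

// nodeCanProxy probes kube-proxy's healthz endpoint the way a load balancer
// health check would. During graceful shutdown kube-proxy returns non-200
// responses for a period, which is what lets the load balancer drain the node
// before it loses the ability to route traffic to the backend.
func nodeCanProxy(nodeIP string) bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(fmt.Sprintf("http://%s:10256/healthz", nodeIP))
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	// Example node IP, illustrative only.
	fmt.Println(nodeCanProxy("10.0.0.12"))
}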

I did a lot of research into this earlier in the year and wrote up kubernetes-sigs/cloud-provider-azure#3499, which explains the problem in a little more depth. Note as well that GCP uses kube-proxy-based health checks in this manner, and we are working to move Azure over too.

There's also KEP-3836, which suggests this is the way forward as well.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Update health checks to use Kube-Proxy when no other configuration is provided

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jun 28, 2023
@k8s-ci-robot (Contributor)

This issue is currently awaiting triage.

If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot requested a review from hakman June 28, 2023 11:04
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hakman for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 28, 2023
@hakman (Member)

hakman commented Jun 28, 2023

/cc @dims

@k8s-ci-robot k8s-ci-robot requested a review from dims June 28, 2023 12:41
@dims (Member)

dims commented Jun 28, 2023

@kmala can you please review this?

/assign @kmala

@k8s-ci-robot (Contributor)

@dims: GitHub didn't allow me to assign the following users: kmala.

Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

@kmala can you please review this?

/assign @kmala

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@JoelSpeed (Contributor, Author)

/retest

To check this out, I created a cluster manually and ran the test suite; it worked for me there, so I think this may be a flake in this case.

I could also see that it was creating the health check as expected.

I've been testing this change in OpenShift as well, and the CI we run for disruption of workload services is looking good with this patch included.

@haoranleo (Contributor)

/retest

@k8s-triage-robot

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 29, 2023
@kmala (Member)

kmala commented Jul 5, 2023

/test pull-cloud-provider-aws-e2e

@cartermckinnon (Contributor)

cartermckinnon commented Jul 20, 2023

/retest

Failure seemed like a test-infra issue; giving it another go before I dig in.

@JoelSpeed (Contributor, Author)

/retest

portModified := parseStringAnnotation(svc.Annotations, ServiceAnnotationLoadBalancerHealthCheckPort, &hc.Port)

// For a non-local service, we override the health check to use the kube-proxy port when no other overrides are provided.
// The kube-proxy port should be open on all nodes and allows the health check to check the node's ability to proxy traffic.
Member

The kube-proxy port should be open on all nodes and allows the health check to check the node's ability to proxy traffic.

My guess is that the assumption is not correct and the 10256 port is blocked by kOps.
You can get an idea of what security group rules are created by default:
https://github.com/kubernetes/kops/blob/master/tests/integration/update_cluster/aws-lb-controller/kubernetes.tf

Contributor Author

Reviewing the TF you've linked, I can't see where ingress is allowed for any NodePort to the workers. Unless I've misread, it looks like users already have to manually allow the ports to the worker nodes for Services of type LoadBalancer?

My comment here was more that every node will open this port, because every node should be running kube-proxy or an equivalent substitute. Whether the security group rules are configured or not wasn't intended to be captured.

Member

Nodes open ports to each other, but not to any other IP. I think the AWS CCM currently adds the necessary rules for node ports, so port 10256 should be added there.
We can also test by opening node ports during testing, by adding --set cluster.spec.nodePortAccess=0.0.0.0/0 here:

--create-args="--dns=none --zones=${ZONES} --node-size=m5.large --master-size=m5.large --override=cluster.spec.cloudControllerManager.cloudProvider=aws --override=cluster.spec.cloudControllerManager.image=${IMAGE_NAME}:${IMAGE_TAG}" \

Contributor Author

Ack, that's helpful. I'll add that hack for now and investigate updating the CCM to set up the rules correctly.

Contributor Author

Adding this nodePortAccess seems not to have worked; the latest CI result suggests it is creating a duplicate rule?

Contributor Author

Hmm, it seems that the security group for the ELB is creating an allow-all rule anyway, do you agree?

allProtocols := "-1"
permission := &ec2.IpPermission{}
permission.IpProtocol = &allProtocols
permission.UserIdGroupPairs = []*ec2.UserIdGroupPair{sourceGroupID}

I spun up a cluster and can see that all traffic is allowed between the load balancer security group and the worker security group. The health checks are working, so perhaps there's some other issue at play with the E2E?
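
(For comparison, a hedged sketch of what a rule scoped to the kube-proxy health check port would look like, as opposed to the all-protocols permission quoted above; aws-sdk-go v1 types, and the security group ID is a made-up placeholder.)

package main

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
)

func main() {
	// Placeholder ID of the load balancer's security group, for illustration only.
	loadBalancerSecurityGroupID := "sg-0123456789abcdef0"

	// An ingress permission scoped to the kube-proxy healthz port (10256/tcp),
	// rather than the all-protocols ("-1") permission built in the quoted code.
	permission := &ec2.IpPermission{
		IpProtocol: aws.String("tcp"),
		FromPort:   aws.Int64(10256),
		ToPort:     aws.Int64(10256),
		UserIdGroupPairs: []*ec2.UserIdGroupPair{
			{GroupId: aws.String(loadBalancerSecurityGroupID)},
		},
	}
	fmt.Println(permission)
}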

Member

Adding this nodePortAccess seems not to have worked; the latest CI result suggests it is creating a duplicate rule?

You are right, not sure how we can work around that.

Hmm, it seems that the security group for the ELB is creating an allow-all rule anyway, do you agree?

I see you are looking at the Classic LB related function. Look for updateInstanceSecurityGroupsForNLB().

I spun up a cluster...

How did you do that?

Contributor Author

I looked at the NLB and tested it live; the NLB version does the same thing security-group-wise.

How did you do that?

Not using kOps, but I had an AWS cluster that I could test this all on by other means.
I'll see if I can use the script from the e2e job to spin up a cluster using kOps.

Comment on lines 70 to 73

// We need the port as both string and int.
kubeProxyHealthCheckPort = "10256"
kubeProxyHealthCheckPortInt = int32(10256)
Member

Just a thought, it should be pretty easy to do strconv.Itoa(kubeProxyHealthCheckPort) where needed.

Suggested change:
-	// We need the port as both string and int.
-	kubeProxyHealthCheckPort = "10256"
-	kubeProxyHealthCheckPortInt = int32(10256)
+	kubeProxyHealthCheckPort = int32(10256)
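
(A hedged aside on the suggestion: strconv.Itoa takes an int, so with a single int32 constant the string form would be derived roughly as below; the helper name is illustrative, not part of the PR.)

package main

import (
	"fmt"
	"strconv"
)

// Single source of truth for the port, per the suggestion above.
const kubeProxyHealthCheckPort = int32(10256)

// kubeProxyHealthCheckPortString derives the string form where one is needed;
// strconv.Itoa expects an int, so the int32 constant is converted first.
func kubeProxyHealthCheckPortString() string {
	return strconv.Itoa(int(kubeProxyHealthCheckPort))
}

func main() {
	fmt.Println(kubeProxyHealthCheckPortString()) // "10256"
}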

@JoelSpeed (Contributor, Author)

JoelSpeed commented Aug 15, 2023

So on further investigation, it seems the issue right now is that when kOps configures Cilium to replace kube-proxy, it doesn't tell it to replace kube-proxy's health check endpoint by default, and Cilium by default doesn't expose the health check. It strikes me as odd that, with strict kube-proxy replacement, you wouldn't want to emulate the health check endpoint that is expected of kube-proxy.

Manually configuring the healthz port on Cilium to 0.0.0.0:10256 allows the tests to pass.

@JoelSpeed (Contributor, Author)

@hakman Coming back to this, I think we may want a softer approach to this change. I think it's the correct direction to take, but having discovered that kOps isn't configuring Cilium "correctly" right now, and having worked through the same change with the Azure folks, I'm going to propose another approach.

With Azure, we have added a way for users to choose between individual and shared health checks, with individual health checks remaining the default for now. That maintains the existing behaviour, but those in the know can update their health checks to use the new shared behaviour.

The intention there is to then go through a cycle of announcements and change the default over time, giving users the chance to make sure they have the kube-proxy health checks configured, and giving them a way to retain the old, broken behaviour if they desire.

Do you agree with moving forward with the same pattern here? If so, I'll get the PR updated to reflect what was done in Azure so there's a consistent experience across the two.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle stale
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 14, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the PR is closed

You can:

  • Mark this PR as fresh with /remove-lifecycle rotten
  • Close this PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 13, 2024
@JoelSpeed (Contributor, Author)

/remove-lifecycle rotten

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 13, 2024
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Aug 19, 2024
@JoelSpeed (Contributor, Author)

@kmala I've updated this now to reflect the changes that we made in the Azure provider, where the health checks are configurable at the cloud config level, choosing between the new Shared and the old ServiceNodePort health check options.

This will allow users to configure how they want, but keeps the existing behaviour as the default.

I suggest that over time we move the default across to Shared and make the ServiceNodePort behaviour a legacy option.
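
A rough sketch of how such a cloud-config switch could be wired up; the Shared and ServiceNodePort names come from the comment above, but the field, constants, and defaulting below are illustrative assumptions rather than this PR's actual API:

package main

import "fmt"

type healthCheckMode string

const (
	// ServiceNodePort keeps the existing behaviour: probe the service's own port.
	healthCheckModeServiceNodePort healthCheckMode = "ServiceNodePort"
	// Shared probes kube-proxy's healthz endpoint on every node instead.
	healthCheckModeShared healthCheckMode = "Shared"
)

// cloudConfig stands in for the provider's cloud config; the field name is assumed.
type cloudConfig struct {
	HealthCheckMode healthCheckMode
}

// healthCheckTarget picks the port and path the load balancer should probe.
func healthCheckTarget(cfg cloudConfig, servicePort int32) (int32, string) {
	if cfg.HealthCheckMode == healthCheckModeShared {
		return 10256, "/healthz" // kube-proxy's healthz endpoint
	}
	return servicePort, "/" // default: preserve today's per-service health check
}

func main() {
	port, path := healthCheckTarget(cloudConfig{HealthCheckMode: healthCheckModeShared}, 31234)
	fmt.Println(port, path) // 10256 /healthz
}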

@JoelSpeed (Contributor, Author)

Anyone around who can help get this across the line?

This adds an opt-in way to move the health checks towards an agreed KEP so that they actually make sense; today they don't actually work and don't gracefully drain nodes correctly.

In the future we should look to make these health checks the default, but this gives us the switch to allow people to start using them before we make that change.

I'm curious if the ALBO has implemented this KEP yet?

@kmala (Member)

kmala commented Nov 8, 2024

will take a look at this today

@kmala (Member)

kmala commented Nov 8, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2024
@kmala (Member)

kmala commented Nov 8, 2024

I'm curious if the ALBO has implemented this KEP yet?

I don't think so, @oliviassss can you confirm?
