Linkerd-destination: unable to connect to validator #11597
@matthiasdeblock hi, it sounds like the validator is detecting an erroneous configuration in your network stack. The validator attempts to connect to a server it creates, in order to test that iptables destination rewriting works as expected. I see that you're using Cilium. We have a cluster configuration section in our docs aimed at getting Linkerd to work with Cilium: Cilium's socket-level load balancing can sometimes mess up routing for other services. Can you check if that's affecting you here?
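To make the validator's round trip concrete, here is a minimal sketch of the idea described above: a local server hands out a random token, and a client connects and checks that it receives the same token within a timeout. This is a hypothetical simplification, not the linkerd-network-validator source. The token exchange is an assumption, and the real validator connects to an external address and relies on iptables to redirect that connection to its local server; here the client connects to the listener directly, so no iptables rules are exercised. The `timeout` parameter loosely corresponds to the validator timeout adjusted later in this thread.

```python
import secrets
import socket
import threading


def run_validation(timeout: float = 10.0) -> bool:
    """Simulate a validator-style round trip on localhost.

    Hypothetical: the client connects straight to the local listener instead
    of an external address redirected by iptables.
    """
    token = secrets.token_hex(16).encode()  # random 32-byte token
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # let the OS pick a free port
    server.listen(1)
    server.settimeout(timeout)  # bound how long accept() may block
    port = server.getsockname()[1]

    def serve() -> None:
        try:
            conn, _ = server.accept()
        except OSError:  # accept timed out or the socket was closed
            return
        with conn:
            conn.sendall(token)

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
            received = b""
            # Read until the full token arrives or the server closes.
            while len(received) < len(token):
                chunk = client.recv(len(token) - len(received))
                if not chunk:
                    break
                received += chunk
        return received == token
    except OSError:  # connection refused or timed out
        return False
    finally:
        worker.join(timeout)
        server.close()


if __name__ == "__main__":
    print("validation passed:", run_validation(timeout=5.0))
```

If the redirect were misconfigured in a real cluster, the client would reach some other endpoint (or nothing), and the token check or the timeout would fail, which is the kind of failure reported in this issue.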
Hi @mateiidavid
If you think linkerd-cni is the culprit, I'd suggest having a look at some logs. Specifically:
I'd perhaps start with the last one if it's easy. It might be that the configuration wasn't appended properly for some reason.
I'll retry it next week. I did check all of these, but I'll give them another look:
I'll verify this by the beginning of next week. Regards
@matthiasdeblock Any joy retrying this?
@matthiasdeblock Happy new year! Still curious if you got a chance to retry things? 🙂
Hi
Hi, as a colleague of @matthiasdeblock, I'd like to give some extra info about this issue.
It looks like the config doesn't get written to the file. Contents of /etc/cni/net.d/05-cilium.conflist:
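To check whether linkerd-cni actually made it into a node's CNI configuration, one can parse the .conflist and look for its plugin entry in the chain. This is a minimal sketch that assumes linkerd-cni appears as a plugin whose "type" is "linkerd-cni"; the function name and the example conflists below are hypothetical, so adjust the plugin type if your install differs.

```python
import json


def has_linkerd_cni(conflist_text: str, plugin_type: str = "linkerd-cni") -> bool:
    """Return True if the CNI .conflist chains a plugin of the given type.

    Assumption: linkerd-cni registers itself with "type": "linkerd-cni".
    """
    conflist = json.loads(conflist_text)
    # A .conflist holds an ordered chain of plugin configs under "plugins".
    return any(p.get("type") == plugin_type for p in conflist.get("plugins", []))


# Hypothetical examples: a Cilium-only chain and one with linkerd-cni appended.
cilium_only = '{"cniVersion": "0.3.1", "name": "cilium", "plugins": [{"type": "cilium-cni"}]}'
chained = (
    '{"cniVersion": "0.3.1", "name": "cilium", '
    '"plugins": [{"type": "cilium-cni"}, {"type": "linkerd-cni"}]}'
)

if __name__ == "__main__":
    print(has_linkerd_cni(cilium_only), has_linkerd_cni(chained))
```

On a node, the input would be the text of the conflist file itself, e.g. read from /etc/cni/net.d/05-cilium.conflist.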
Fixes linkerd/linkerd2#11597

When the CNI plugin is triggered, it validates that the proxy has been injected into the pod before setting up the iptables rules. It does so by looking for the "linkerd-proxy" container. However, when the proxy is injected as a native sidecar, it gets added as an _init_ container, so it was being disregarded here.

We don't have integration tests for validating native sidecars when using linkerd-cni because [Calico doesn't work in k3s since k8s 1.27](k3d-io/k3d#1375), and we require k8s 1.29 for using native sidecars. I did nevertheless successfully test this fix in an AKS cluster.
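The check described in this fix can be sketched as follows: search for the proxy container by name in both the regular containers and the init containers, since a native sidecar lives in the latter. This is an illustration over a pod spec represented as a plain dict in the shape of the Kubernetes PodSpec JSON, not the linkerd-cni code itself; the function name is hypothetical.

```python
from typing import Any, Dict


def proxy_injected(pod_spec: Dict[str, Any], name: str = "linkerd-proxy") -> bool:
    """Return True if a container with the proxy's name is in the pod spec.

    Native sidecars are listed under "initContainers", so both lists are
    searched; looking only at "containers" would miss them.
    """
    for key in ("containers", "initContainers"):
        # "or []" also covers keys that are present but set to None.
        if any(c.get("name") == name for c in pod_spec.get(key) or []):
            return True
    return False


if __name__ == "__main__":
    native_sidecar = {
        "initContainers": [{"name": "linkerd-proxy"}],
        "containers": [{"name": "app"}],
    }
    print(proxy_injected(native_sidecar))
```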
Hi, as our cluster is air-gapped, I noticed that 1.1.1.1 isn't a usable connect address. I've fixed this in our Helm chart, and we are now getting a bit further, but we're still running into an error:
(using the Kubernetes API server IP as the connect address). So it now connects, but it is still throwing an error. Regards
Thanks @matthiasdeblock for circling back. With version 2.14.9 we have added a cni-repair-controller component that should detect race conditions between the cluster's cni and linkerd-cni. You can enable it via the linkerd2-cni chart value repairController.enabled=true.
Hi, the cni-repair-controller just keeps restarting the Linkerd control plane; this isn't fixing the issue. You have also linked linkerd/linkerd2-proxy-init#362. Could that be the issue we are running into? Regards
I linked linkerd/linkerd2-proxy-init#362 by mistake. That should be unrelated unless you're using native sidecars too.
Hi
@mateiidavid, any news on this one?
@matthiasdeblock sorry, I think this was closed automatically when I hit the merge button on the PR above. Since it did not fix your issue, I'm going to re-open this.
Hi @mateiidavid, I have changed the timeout from 10s to 60s, and now I am getting a different error:
So it is still connecting to the same address, 172.24.214.93:6443, which is our Kubernetes API server, but it is now throwing a different error... Thank you!
Hi, I have upgraded Linkerd to the latest edge-24.5.5 and the CNI to 1.5.0, and I have also set the timeout to 30s. Still the same issue:
Hi, I've looked into this more closely myself and found the issue. It seems Cilium needed this config:
What this means: make Cilium take ownership over the /etc/cni/net.d directory on the node, renaming all non-Cilium CNI configurations to *.cilium_bak. This ensures no Pods can be scheduled using other CNI plugins during Cilium agent downtime.
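Since exclusive mode, as described above, renames non-Cilium CNI configurations to *.cilium_bak, a quick way to tell whether it has acted on a node is to look for that suffix in the CNI config directory. This is a minimal sketch; the function names are hypothetical, and the directory default is the /etc/cni/net.d path mentioned in this thread.

```python
from pathlib import Path
from typing import Iterable, List

BACKUP_SUFFIX = ".cilium_bak"  # suffix described for Cilium's exclusive mode


def backed_up_configs(names: Iterable[str]) -> List[str]:
    """Return the file names that carry the *.cilium_bak suffix, sorted."""
    return sorted(n for n in names if n.endswith(BACKUP_SUFFIX))


def scan_cni_dir(cni_dir: str = "/etc/cni/net.d") -> List[str]:
    """List configs in cni_dir that appear to have been renamed by Cilium."""
    return backed_up_configs(p.name for p in Path(cni_dir).iterdir())


if __name__ == "__main__":
    # Hypothetical directory listing for illustration.
    print(backed_up_configs(["05-cilium.conflist", "10-other.conflist.cilium_bak"]))
```

A non-empty result suggests another CNI configuration on that node has been set aside by Cilium.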
Thanks for the feedback, @matthiasdeblock! I've confirmed the fix and pushed some updates to our docs.
* Add notes about Cilium's exclusive mode

Closes linkerd/linkerd2#11597

Co-authored-by: Flynn <[email protected]>
Co-authored-by: William Morgan <[email protected]>
What is the issue?
Hi
After installing linkerd-cni, the Linkerd pods are unable to start due to the following error:
How can it be reproduced?
Install linkerd-cni and Linkerd on a Flatcar Kubernetes 1.28.3 cluster with Cilium as the CNI.
Logs, error output, etc
Output of linkerd check -o short:
Environment
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
maybe