Consistent error "proxy::h2 ping error: broken pipe" from ztunnel containers across cluster #1346

Open
jessebye opened this issue Oct 25, 2024 · 6 comments

Comments

@jessebye

We have rolled out Istio with ambient mode enabled and are seeing a consistently large volume of this error logged by the ztunnel container:

2024-10-25T19:47:50.381780Z	error	proxy::h2	ping error: broken pipe

In just the past 24 hours we observed this error happening 2.29K times (across 40-50 nodes).

Possibly unrelated, but we also discovered this error happening regularly:

2024-10-25T18:09:30.842925Z	error	access	connection complete	src.addr=10.0.81.228:58446 src.workload="pronode-7xxxxxx-xxxx" src.namespace="services" src.identity="spiffe://cluster.local/ns/services/sa/pronode" dst.addr=10.0.x.x:15008 dst.hbone_addr=10.0.x.x:8480 dst.service="portfolios-graphql.services.svc.cluster.local" dst.workload="portfolios-graphql-6xxxxxxx-xxxxx" dst.namespace="services" dst.identity="spiffe://cluster.local/ns/services/sa/portfolios-graphql" direction="outbound" bytes_sent=5226 bytes_recv=19440 duration="1546750ms" error="while closing connection: send: io error: stream closed because of a broken pipe"

We see corresponding 502 errors being logged from our services for these requests. We only began to observe the elevated 502 rate after switching to Ambient mode.

@howardjohn
Member

@jessebye
Author

@howardjohn we are on Istio 1.23.2 though, so we shouldn't be seeing those errors? Also note it's not a ping timeout, it's a broken pipe.

@howardjohn
Member

Thanks for the correction. That does seem different then.

Focusing on this "while closing connection: send: io error: stream closed because of a broken pipe" error since it gives a bit more info... this means that while we were attempting to close the connection, we found it was already closed. This is after ~25 minutes, so it's a long-lived connection.

Do we have any info on what is going on on the destination side? It seems plausible that the destination app or destination ztunnel shut down?
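
For reference, the ~25 minute figure comes from the `duration` field of the access log line. Below is a minimal Python sketch for pulling the key=value fields out of such a line. It assumes the format matches the single sample posted above; the helper names `parse_access_fields` and `duration_minutes` are hypothetical (not part of ztunnel), and the sample line is a trimmed copy of the one in this issue.

```python
import re

# Matches key=value pairs where the value is either "quoted" or a bare token.
# Assumption: quoted values never contain an embedded double quote.
FIELD_RE = re.compile(r'([\w.]+)=(?:"([^"]*)"|(\S+))')


def parse_access_fields(line: str) -> dict:
    """Return the key=value fields of one access log line as a dict."""
    fields = {}
    for key, quoted, bare in FIELD_RE.findall(line):
        fields[key] = quoted if quoted else bare
    return fields


def duration_minutes(fields: dict) -> float:
    """Convert a duration such as '1546750ms' to minutes."""
    raw = fields["duration"]
    if not raw.endswith("ms"):
        raise ValueError(f"unexpected duration unit: {raw!r}")
    return int(raw[:-2]) / 60_000


# Trimmed copy of the access log line from this issue.
sample = (
    '2024-10-25T18:09:30.842925Z\terror\taccess\tconnection complete\t'
    'src.addr=10.0.81.228:58446 '
    'dst.service="portfolios-graphql.services.svc.cluster.local" '
    'direction="outbound" bytes_sent=5226 bytes_recv=19440 '
    'duration="1546750ms" '
    'error="while closing connection: send: io error: '
    'stream closed because of a broken pipe"'
)

fields = parse_access_fields(sample)
print(f"{duration_minutes(fields):.1f} minutes")
print(fields["error"])
```

For the sample line this gives about 25.8 minutes, which is where the "long-lived connection" reading comes from.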

@jessebye
Author

Ok, yes I think these are long running requests timing out because the pod went away or something like that. In which case this is probably not a problem for ztunnel. However I am still wondering about those h2 pings that get a broken pipe error.

@howardjohn
Member

I suspect those have a similar cause. The backend is closing as we try to send a ping?

@jessebye
Author

That would be surprising, given it is happening so frequently. I know we have pods going up and down every few minutes maybe, but not 95x per hour 🤔
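
As a rough sanity check on that rate, here is the arithmetic in Python, using only the figures reported in this issue (2.29K errors in 24 hours, across 40-50 nodes). The count is rounded, so the results are approximate.

```python
# Figures as reported in the issue; 2.29K is a rounded count.
total_errors = 2290
hours = 24
nodes_low, nodes_high = 40, 50

# Cluster-wide error rate.
per_hour = total_errors / hours

# Per-node rate: more nodes means fewer errors per node, and vice versa.
per_node_per_hour_low = per_hour / nodes_high
per_node_per_hour_high = per_hour / nodes_low

print(f"cluster-wide: {per_hour:.1f} errors/hour")
print(f"per node: {per_node_per_hour_low:.1f} to {per_node_per_hour_high:.1f} errors/hour")
```

That works out to roughly 95 errors per hour cluster-wide, or about two per node per hour.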
