Consistent error "proxy::h2 ping error: broken pipe" from ztunnel containers across cluster #1346

jessebye · 2024-10-25T20:19:42Z

We have rolled out Istio with ambient mode enabled and are seeing a consistently large volume of this error logged by the ztunnel container:

2024-10-25T19:47:50.381780Z	error	proxy::h2	ping error: broken pipe

In just the past 24 hours we observed this error happening 2.29K times (across 40-50 nodes).

Possibly unrelated, but we also discovered this error happening regularly:

2024-10-25T18:09:30.842925Z	error	access	connection complete	src.addr=10.0.81.228:58446 src.workload="pronode-7xxxxxx-xxxx" src.namespace="services" src.identity="spiffe://cluster.local/ns/services/sa/pronode" dst.addr=10.0.x.x:15008 dst.hbone_addr=10.0.x.x:8480 dst.service="portfolios-graphql.services.svc.cluster.local" dst.workload="portfolios-graphql-6xxxxxxx-xxxxx" dst.namespace="services" dst.identity="spiffe://cluster.local/ns/services/sa/portfolios-graphql" direction="outbound" bytes_sent=5226 bytes_recv=19440 duration="1546750ms" error="while closing connection: send: io error: stream closed because of a broken pipe"

We see corresponding 502 errors being logged from our services for these requests. We only began to observe the elevated 502 rate after switching to Ambient mode.

howardjohn · 2024-10-25T20:20:51Z

Thanks for the report. See https://github.com/istio/istio/wiki/Troubleshooting-Istio-Ambient#scenario-ztunnel-logs-hbone-ping-timeouterror-and-ping-timeout

jessebye · 2024-10-25T20:25:17Z

@howardjohn We are on Istio 1.23.2 though, so we shouldn't be seeing those errors, should we? Also note it's not a ping timeout, it's a broken pipe.

howardjohn · 2024-10-25T20:28:50Z

Thanks for the correction. That does seem different then.

Focusing on the "while closing connection: send: io error: stream closed because of a broken pipe" error, since it gives a bit more info: this means that while we were attempting to close the connection, we found it was already closed. This happened after ~25 minutes, so it's a long-lived connection.

Do we have any info on what is going on on the destination side? It seems plausible that the destination app or destination ztunnel shut down?
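For anyone checking the "~25 minutes" figure, here is a minimal sketch of pulling the `duration` field out of a ztunnel access log line and converting it to minutes. The helper names (`parse_fields`, `duration_minutes`) are hypothetical, the sample line is an abbreviated copy of the one in the report, and the `key=value` / `key="value"` layout is assumed from that single line rather than from any documented log format.

```python
import re

# Matches key=value pairs where the value is either double-quoted or a bare token.
# Assumption: this layout is inferred from the one log line in the report.
FIELD_RE = re.compile(r'([\w.]+)=("([^"]*)"|\S+)')

# Abbreviated copy of the access log line from the report (some fields omitted).
SAMPLE = (
    '2024-10-25T18:09:30.842925Z\terror\taccess\tconnection complete\t'
    'src.addr=10.0.81.228:58446 '
    'dst.service="portfolios-graphql.services.svc.cluster.local" '
    'direction="outbound" bytes_sent=5226 bytes_recv=19440 '
    'duration="1546750ms" '
    'error="while closing connection: send: io error: '
    'stream closed because of a broken pipe"'
)


def parse_fields(line: str) -> dict[str, str]:
    """Return the key/value fields of a log line, with surrounding quotes removed."""
    fields = {}
    for key, raw, quoted in FIELD_RE.findall(line):
        fields[key] = quoted if raw.startswith('"') else raw
    return fields


def duration_minutes(fields: dict[str, str]) -> float:
    """Convert a duration such as '1546750ms' to minutes."""
    value = fields["duration"]
    if not value.endswith("ms"):
        raise ValueError(f"unexpected duration unit: {value!r}")
    return int(value[:-2]) / 60_000


fields = parse_fields(SAMPLE)
print(round(duration_minutes(fields), 1))  # 25.8
```

Running the same parse over all `connection complete` errors and grouping by `dst.workload` would be one way to see whether the failures cluster around specific destination pods.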

jessebye · 2024-10-25T20:30:58Z

OK, yes, I think these are long-running requests timing out because the pod went away or something like that. In that case, this is probably not a problem in ztunnel. However, I am still wondering about those h2 pings that get a broken pipe error.

howardjohn · 2024-10-25T20:34:47Z

I suspect those have a similar cause. The backend is closing as we try to send a ping?

jessebye · 2024-10-25T20:39:10Z

That would be surprising, given it is happening so frequently. I know we have pods going up and down every few minutes maybe, but not 95x per hour 🤔
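The "95x per hour" figure follows from the numbers in the original report, and the same arithmetic gives a per-node rate. A small sketch, treating "2.29K" as roughly 2290 events (an approximation) and using hypothetical helper names:

```python
def hourly_rate(total_events: int, hours: float) -> float:
    """Average number of events per hour over the observation window."""
    return total_events / hours


def per_node_range(rate: float, min_nodes: int, max_nodes: int) -> tuple[float, float]:
    """Lowest and highest per-node hourly rate for an uncertain node count."""
    return rate / max_nodes, rate / min_nodes


# Figures from the report: ~2.29K errors in 24 hours across 40-50 nodes.
rate = hourly_rate(2290, 24)
low, high = per_node_range(rate, 40, 50)

print(round(rate))                     # 95
print(round(low, 1), round(high, 1))   # 1.9 2.4
```

So the cluster-wide rate is about 95 per hour, or roughly two per node per hour. This only restates the reported numbers; it does not by itself show whether pod churn accounts for the errors.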
