Vector crashes on k8s if the sink goes down #22498
-
On a related note, I've been testing the integration between http client <> Vector http server source <> Vector Elasticsearch sink and found something rather interesting. According to the Elasticsearch sink buffer doc, events are flushed either when the timeout expires or when the batch size reaches the max bytes/events threshold. But what I'm seeing right now is that Vector, at times, doesn't send a batch until another request comes in with another batch. I enabled http logging on the Elasticsearch side to verify whether requests were coming through, and that's how I discovered this. I then enabled debug logging in Vector to try to understand what's going on.
Events pushed log
Events not pushed log
Based on the logs, I wasn't able to figure out why the events weren't pushed. All I can see is that when they were pushed, a connection was established with Elasticsearch and the events were sent. I'm not sure if this is expected behaviour or if there's an issue somewhere (in Vector or elsewhere).
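For reference, the flush behaviour I'm talking about is driven by the sink's `batch` settings. A minimal sketch of what I mean (sink/source names, endpoint and thresholds are placeholders, not my actual config):

```yaml
# Illustrative Elasticsearch sink showing the batch settings discussed above.
# Names, endpoint and thresholds are placeholders, not my real values.
sinks:
  es_out:
    type: elasticsearch
    inputs: ["http_in"]                       # assumed http server source name
    endpoints: ["http://elasticsearch:9200"]
    batch:
      timeout_secs: 5        # flush when this timer expires...
      max_events: 1000       # ...or when the batch reaches this many events...
      max_bytes: 10000000    # ...or this many bytes, whichever comes first
```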
-
I'm in a bit of a pickle. I'm running Vector as an aggregator on k8s, configured to consume data from an http server source and push to an Elasticsearch sink.
After running a few tests, it looks like if the sink goes down, but data is still flowing in, Vector pods will eventually crash and show the following error:
I've found the following related issues:
But I'm a bit confused by it, mainly because as soon as the sink goes down, the http server source starts returning non-200 http status codes, which makes it feel like the buffering mechanism either doesn't work or isn't intended to work in this case. If it did, I'd expect a 200 as long as the request made it through to Vector, and non-200 responses only once the buffer is full and can no longer accept any more data.
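For what it's worth, the behaviour I expected corresponds to something like the buffer sketch below (illustrative names and sizes only, not my actual config), where the sink buffers events and only pushes back once the buffer is full:

```yaml
# Illustrative sink buffer matching the behaviour I expected (placeholder values).
# With when_full: block, I'd expect the http server source to keep returning 200
# until this buffer fills up, and only then start rejecting requests.
sinks:
  es_out:
    type: elasticsearch
    inputs: ["http_in"]
    endpoints: ["http://elasticsearch:9200"]
    buffer:
      type: disk
      max_size: 1073741824   # ~1 GiB on-disk buffer
      when_full: block       # apply backpressure instead of dropping events
```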
I'm just looking for some clarification on what exactly the expected behaviour is:
I use the Vector Helm chart with the following config:
Config
NOTE: On the source side, we have agents pushing to the http endpoint. As soon as a non-200 response occurs, they stop sending new events and keep retrying (from the first failure) until they receive a 200, then they continue pushing. There's also a buffering mechanism on the agent side.
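For context, the overall shape of the chart values is roughly the sketch below (a trimmed, illustrative version only; the actual config is in the collapsed section above, and all names, addresses and ports are placeholders):

```yaml
# Rough, illustrative shape of the Helm values (not the actual config above).
role: Aggregator
customConfig:
  sources:
    http_in:
      type: http_server
      address: "0.0.0.0:8080"        # endpoint the agents push to
      decoding:
        codec: json
  sinks:
    es_out:
      type: elasticsearch
      inputs: ["http_in"]
      endpoints: ["http://elasticsearch:9200"]
```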