Chart name and version
chart: victoria-logs-single
version: v0.8.13
Describe the bug
TL;DR: Vector gets stuck, unable to send logs to VL after a restart of the VL StatefulSet, because Vector resolves the VL Pod's IP directly; the IP changes on restart and Vector never reconnects.
We have just deployed VictoriaLogs and Vector using the helm chart, with as many default values as possible.
In the cluster (GKE) we also run our own deployment of Istio (v1.24).
As defined in the helm chart, we get a StatefulSet and a headless Service (request: add a -headless suffix to the Service name, since this stumped us for a few minutes). It does not appear to be possible to make the Service non-headless (i.e. have it obtain a ClusterIP and thus let Kubernetes manage routing).
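For context, a headless Service is simply a Service with clusterIP: None, so DNS lookups resolve straight to the Pod IP instead of a stable virtual IP. What the chart renders looks roughly like the sketch below (names and selector labels abbreviated, not the exact rendered manifest):

apiVersion: v1
kind: Service
metadata:
  name: vls-victoria-logs-single-server
  namespace: logging-v2
spec:
  # headless: no virtual IP is allocated, DNS returns the Pod IP directly
  clusterIP: None
  selector:
    # abbreviated; the real chart uses its own label set
    app.kubernetes.io/name: victoria-logs-single
  ports:
    - name: http
      port: 9428
      targetPort: 9428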
By default the helm chart produces config for Vector with the endpoint statefulset-pod-name-0.victorialogs-namespace.svc.cluster.local. This makes Vector resolve the IP of statefulset-pod-name-0 directly. However, if statefulset-pod-name-0 is ever restarted or rescheduled for any reason, its IP will change. This change is not picked up by Vector (or possibly by the istio-proxy sidecar), leaving it stuck and unable to send logs until all Vector pods are restarted.
The logs emitted by Vector look like this:
2025-01-20T11:34:49.192561Z ERROR sink{component_kind="sink" component_id=vlogs component_type=elasticsearch}:request{request_id=272}: vector::sinks::elasticsearch::service: Response contained errors. error_code="http_response_503" response=Response { status: 503, version: HTTP/1.1, headers: {"content-length": "114", "content-type": "text/plain", "date": "Mon, 20 Jan 2025 11:34:48 GMT", "server": "envoy"}, body: b"upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout" }
2025-01-20T11:34:49.192619Z WARN sink{component_kind="sink" component_id=vlogs component_type=elasticsearch}:request{request_id=272}: vector::sinks::util::retries: Retrying after response. reason=503 Service Unavailable: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout internal_log_rate_limit=true
I'm not sure whether this is entirely Vector's fault (its docs say it performs a complete reconnect every time), Istio's fault (this issue may indicate it's not entirely unrelated: istio/istio#54539), or a combination.
Proposed fix
An option in the helm chart to generate a Service that is not headless, or to create an additional non-headless Service, would "fix" the problem.
I'm sure you have your reasons for using a headless Service with regard to HA, clustering, etc. But for the time being, with VictoriaLogs being a "single instance" service, I don't see any significant drawbacks to using a regular non-headless Service.
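For illustration only, an opt-in could look something like the hypothetical values below. These keys do not exist in the victoria-logs-single chart today; they merely sketch the shape of the request:

# Hypothetical values keys -- NOT present in the chart today, just illustrating the request.
server:
  service:
    # keep the current headless Service for the StatefulSet
    headless: true
    # additionally render a regular ClusterIP Service for clients like Vector
    extraClusterIPService:
      enabled: true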
Custom values
Relevant excerpts of our values.yaml. It contains a workaround for the problems above: the default Vector endpoint is overridden to point at a custom non-headless VictoriaLogs Service we've deployed (sketched after the values below).
global:
  cluster:
    # Override since trailing dot is not understood by istio
    dnsDomain: cluster.local

server:
  retentionPeriod: 14d
  # Disk space usage in gigabytes
  retentionDiskSpaceUsage: "25"
  persistentVolume:
    enabled: true
    storageClassName: "standard"
    size: 30Gi

vector:
  enabled: true
  podPriorityClassName: "high-priority-preempt"
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%
  sinks:
    vlogs:
      # Our own custom non-headless Service
      endpoints: ["http://vls-victoria-logs-single-server-clusterip.logging-v2.svc.cluster.local:9428/insert/elasticsearch"]
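For completeness, the custom non-headless Service referenced by the endpoint above is roughly the following sketch. The selector labels are approximate and must match whatever labels the chart puts on the server Pod:

apiVersion: v1
kind: Service
metadata:
  name: vls-victoria-logs-single-server-clusterip
  namespace: logging-v2
spec:
  # Regular ClusterIP Service: Kubernetes keeps routing working across Pod restarts.
  type: ClusterIP
  selector:
    # approximate labels; verify against the chart's rendered Pod labels
    app.kubernetes.io/name: victoria-logs-single
    app.kubernetes.io/instance: vls
  ports:
    - name: http
      port: 9428
      targetPort: 9428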