bug: Vector stops sending logs after StatefulSet restart due to headless service #1938

StianOvrevage · 2025-01-20T23:59:54Z

Chart name and version
chart: victoria-logs-single
version: v0.8.13

Describe the bug
TL;DR: Vector gets stuck, unable to send logs to VL, after the VL StatefulSet restarts. Vector uses the VL Pod's IP directly, that IP changes on restart, and Vector is unable to reconnect.

We have just deployed VictoriaLogs and Vector using the helm chart and as many default values as possible.

In the cluster (GKE) we also have our own deployment of istio (v1.24).

As defined in the helm chart, we get a StatefulSet and a headless Service (request: add -headless suffix to the Service since this stumped us for a few minutes). It does not appear to be possible to make the Service not headless (i.e. to have it obtain a ClusterIP and thus let K8s manage routing).

By default the helm chart produces config for Vector with endpoint statefulset-pod-name-0.victorialogs-namespace.svc.cluster.local. This makes Vector get the IP of statefulset-pod-name-0 directly. However, if statefulset-pod-name-0 is ever restarted or rescheduled for any reason, its IP will change. This change is not picked up by Vector (or possibly by the istio-proxy sidecar), leaving Vector stuck and unable to send logs until all Vector pods are restarted.

The logs emitted by Vector look like this:

2025-01-20T11:34:49.192561Z ERROR sink{component_kind="sink" component_id=vlogs component_type=elasticsearch}:request{request_id=272}: vector::sinks::elasticsearch::service: Response contained errors. error_code="http_response_503" response=Response { status: 503, version: HTTP/1.1, headers: {"content-length": "114", "content-type": "text/plain", "date": "Mon, 20 Jan 2025 11:34:48 GMT", "server": "envoy"}, body: b"upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout" }
2025-01-20T11:34:49.192619Z  WARN sink{component_kind="sink" component_id=vlogs component_type=elasticsearch}:request{request_id=272}: vector::sinks::util::retries: Retrying after response. reason=503 Service Unavailable: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout internal_log_rate_limit=true

I'm not sure if this is 100% Vector's fault (its docs say it does complete reconnects every time), istio's fault (istio/istio#54539 suggests it may not be entirely unrelated), or a combination.

Proposed fix
An option in the helm chart to generate a Service that is not headless, or to create an additional non-headless Service alongside the headless one, would "fix" the problem.

I'm sure you have your reasons for using a headless Service with regard to HA, clustering, etc. But for the time being, with VictoriaLogs being a "Single instance" service, I don't see any significant drawbacks to using a regular non-headless Service.

Custom values
Relevant excerpts of our values.yaml. This contains workarounds to the problems above by overriding the default Vector endpoint to a custom VictoriaLogs Service we've deployed.

global:
  cluster:
    # Override since trailing dot is not understood by istio
    dnsDomain: cluster.local

server:
  retentionPeriod: 14d
  # Disk space usage in gigabytes
  retentionDiskSpaceUsage: "25"
  persistentVolume:
    enabled: true
    storageClassName: "standard"
    size: 30Gi

vector:
  enabled: true
  podPriorityClassName: "high-priority-preempt"

  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%

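  # Parent key assumed to be customConfig; it is missing from the excerpt,
  # and without it sinks would parse as a child of updateStrategy.
  customConfig:
    sinks: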
      vlogs:
        # Our own custom non-headless Service
        endpoints: ["http://vls-victoria-logs-single-server-clusterip.logging-v2.svc.cluster.local:9428/insert/elasticsearch"]
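
The custom Service referenced by the endpoint above is not part of the excerpt. A minimal sketch of what such a non-headless Service could look like follows. The name, namespace and port are taken from the endpoint URL. The selector labels are assumptions and must match the labels actually set on the VictoriaLogs Pod (check with kubectl get pod --show-labels).

```yaml
# Sketch only: a regular ClusterIP Service in front of the VictoriaLogs Pod.
apiVersion: v1
kind: Service
metadata:
  name: vls-victoria-logs-single-server-clusterip
  namespace: logging-v2
spec:
  # No "clusterIP: None" here, so Kubernetes allocates a stable virtual IP
  # and routes to whichever Pod currently matches the selector.
  type: ClusterIP
  selector:
    # ASSUMPTION: these labels are illustrative, not taken from the chart.
    app.kubernetes.io/name: victoria-logs-single
    app.kubernetes.io/instance: vls
  ports:
    - name: http
      protocol: TCP
      port: 9428        # port used in the Vector endpoint URL
      targetPort: 9428
```

Because the Service name resolves to the ClusterIP rather than to the Pod IP, the address Vector sees stays the same when the StatefulSet Pod is restarted or rescheduled.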
