bug: Vector stops sending logs after StatefulSet restart due to headless service #1938

Open
StianOvrevage opened this issue Jan 20, 2025 · 0 comments


Chart name and version
chart: victoria-logs-single
version: v0.8.13

Describe the bug
TL;DR: After the VL StatefulSet restarts, Vector can no longer send logs to VL. Vector addresses the VL Pod by its direct IP, that IP changes on restart, and Vector never reconnects to the new one.

We have just deployed VictoriaLogs and Vector using the helm chart and as many default values as possible.

In the cluster (GKE) we also have our own deployment of istio (v1.24).

As defined in the helm chart, we get a StatefulSet and a headless Service (request: add a -headless suffix to the Service name, since this stumped us for a few minutes). There does not appear to be any way to make the Service non-headless (i.e. to have it obtain a ClusterIP and thus let K8s manage routing).

By default the helm chart produces config for Vector with endpoint statefulset-pod-name-0.victorialogs-namespace.svc.cluster.local. This makes Vector resolve the IP of statefulset-pod-name-0 directly. However, if statefulset-pod-name-0 is ever restarted or rescheduled for any reason, its IP will change. This change is not picked up by Vector (or possibly by the istio-proxy sidecar), so Vector stays stuck, unable to send logs, until all Vector pods are restarted.

The logs emitted by Vector look like this:

2025-01-20T11:34:49.192561Z ERROR sink{component_kind="sink" component_id=vlogs component_type=elasticsearch}:request{request_id=272}: vector::sinks::elasticsearch::service: Response contained errors. error_code="http_response_503" response=Response { status: 503, version: HTTP/1.1, headers: {"content-length": "114", "content-type": "text/plain", "date": "Mon, 20 Jan 2025 11:34:48 GMT", "server": "envoy"}, body: b"upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout" }
2025-01-20T11:34:49.192619Z  WARN sink{component_kind="sink" component_id=vlogs component_type=elasticsearch}:request{request_id=272}: vector::sinks::util::retries: Retrying after response. reason=503 Service Unavailable: upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection timeout internal_log_rate_limit=true

I'm not sure if this is 100% Vector's fault (its docs say it does a complete reconnect every time), istio's fault (istio/istio#54539 suggests it may not be entirely unrelated), or a combination.

Proposed fix
An option in the helm chart to generate a Service that is not headless, or to create an additional non-headless Service alongside the headless one, would "fix" the problem.

I'm sure you have your reasons for using a headless Service with regard to HA, clustering etc. But for the time being, with VictoriaLogs being a "Single instance" service, I don't see any significant drawbacks to using a regular non-headless Service.
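
For illustration, a non-headless Service along these lines is roughly what we mean. This is only a sketch, not what the chart generates: the name, namespace and port are taken from our workaround endpoint, and the selector labels are assumptions that would need to match the labels actually set on the VictoriaLogs Pod.

```yaml
# Sketch only: a regular ClusterIP Service in front of the VictoriaLogs Pod.
apiVersion: v1
kind: Service
metadata:
  name: vls-victoria-logs-single-server-clusterip
  namespace: logging-v2
spec:
  # No "clusterIP: None" here, so Kubernetes allocates a stable virtual IP
  # that survives Pod restarts and rescheduling.
  type: ClusterIP
  selector:
    # Hypothetical labels; copy the real ones from the StatefulSet's Pod template.
    app.kubernetes.io/name: victoria-logs-single
    app.kubernetes.io/instance: vls
  ports:
    - name: http
      protocol: TCP
      port: 9428
      targetPort: 9428
```

With a Service like this, Vector resolves the Service's stable ClusterIP rather than the Pod IP, so a restart of the VL Pod does not change the address Vector connects to.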

Custom values
Relevant excerpts of our values.yaml. These include our workaround for the problem above: overriding the default Vector endpoint to point at a custom non-headless VictoriaLogs Service that we deployed ourselves.

global:
  cluster:
    # Override since trailing dot is not understood by istio
    dnsDomain: cluster.local

server:
  retentionPeriod: 14d
  # Disk space usage in gigabytes
  retentionDiskSpaceUsage: "25"
  persistentVolume:
    enabled: true
    storageClassName: "standard"
    size: 30Gi

vector:
  enabled: true
  podPriorityClassName: "high-priority-preempt"

  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 100%

    sinks:
      vlogs:
        # Our own custom non-headless Service
        endpoints: ["http://vls-victoria-logs-single-server-clusterip.logging-v2.svc.cluster.local:9428/insert/elasticsearch"]