
Adding sidecars to nodepools for any additional functionality #928

Open · wants to merge 3 commits into main

Conversation

@samof76 commented Jan 6, 2025

Description

Adds sidecar functionality to nodepools.

Issues Resolved

No issue was created, but I will try to explain why we need sidecars. Since a pod's containers run in their own network namespace, that namespace cannot be monitored by a node-exporter running on the node as a DaemonSet. To monitor the pod's network namespace, it is necessary to run node-exporter as a sidecar inside the pod.

This is especially important in systems with very high network throughput.
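
For illustration, a nodepool with the proposed sidecars field could look roughly like this (the field name and its shape are only a sketch of the idea, not the final API):

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
spec:
  nodePools:
    - component: data
      replicas: 3
      # Proposed addition: extra containers appended to every pod of this
      # nodepool, next to the OpenSearch container.
      sidecars:
        - name: metrics-sidecar
          image: example/metrics-sidecar:latest
          ports:
            - containerPort: 9100
              name: metrics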

Check List

  • Commits are signed per the DCO using --signoff
  • Unittest added for the new/changed functionality and all unit tests are successful
  • Customer-visible features documented
  • No linter warnings (make lint)

If CRDs are changed:

  • CRD YAMLs updated (make manifests) and also copied into the helm chart
  • Changes to CRDs documented

Please refer to the PR guidelines before submitting this pull request.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@prudhvigodithi (Member)

Thanks @samof76. Adding @swoehrl-mw to please take a look and review.

@prudhvigodithi (Member) commented Jan 9, 2025

Thanks for your contribution @samof76. I understand the motivation of this PR is to support generic sidecars. Coming to the root problem with node-exporter: even when node-exporter is installed as a DaemonSet and can reach the OpenSearch pods via the cluster DNS, there are limitations in getting pod-level (pod-specific) metrics. I'm just curious whether there is a solution for this supported out of the box by Prometheus, or whether a sidecar is really the only way to solve it. I was reading some examples and saw https://github.com/lightstep/opentelemetry-prometheus-sidecar, https://docs.lightstep.com/docs/replace-prometheus-with-an-otel-collector-on-kubernetes and https://signoz.io/guides/how-to-monitor-custom-kubernetes-pod-metrics-using-prometheus/. I assume it should be possible to scrape the pod-specific metrics.
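
For example, with the prometheus-operator a PodMonitor can already scrape pod-level endpoints directly, something like the following (the label selector and port name are just placeholders):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: opensearch-pods
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: opensearch   # placeholder label
  podMetricsEndpoints:
    - port: metrics                        # placeholder port name on the pod
      path: /metrics

This only covers metrics that some container in the pod actually exposes, so it does not by itself answer the network-namespace question.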

Adding @eyenx @lukas-vlcek @getsaurabh02 @swoehrl-mw

@eyenx commented Jan 10, 2025

I am not sure this is how node exporter should be used. It's called node exporter for a reason. We'd rather have the application export proper metrics, either directly or via the Prometheus exporter.

@swoehrl-mw (Collaborator)

The ability to configure sidecars is one I can support. But I agree with @eyenx: node-exporter does not belong in a sidecar. OpenSearch-specific metrics can be collected via the dedicated exporter that can be configured through the monitoring feature; other metrics should come from Kubernetes or node-level exporters.
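
(For reference, the monitoring feature is enabled on the cluster spec roughly like this; field names as I remember them from the operator user guide, please double-check against the current docs:)

apiVersion: opensearch.opster.io/v1
kind: OpenSearchCluster
metadata:
  name: my-cluster
spec:
  general:
    monitoring:
      enable: true   # lets the operator manage the OpenSearch Prometheus exporter for the cluster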

@samof76
Please rework the example in the docs into a meaningful example that has nothing to do with node-exporter or fluent-bit (both belong on the node level and I don't want either presented as a good example in the docs).
Also, you did not copy the changed CRD YAMLs into the operator chart.

@samof76 (Author) commented Jan 28, 2025

@eyenx @swoehrl-mw

Since all of a pod's containers run in the pod's isolated network namespace, it is not easy to get those metrics from a node-exporter running as a DaemonSet on the node. So we run a node-exporter sidecar in all our high-throughput pods to collect the TCP metrics, with something like this...

apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
spec:
  ...
  ...
  template:
    ...
    ...
    spec:
      ...
      ...
      containers:
      - image: foo
        ...
        ...
      - image: node-exporter
        name: node-exporter
        args:
        - --web.listen-address=0.0.0.0:9100
        - --collector.disable-defaults
        - --web.disable-exporter-metrics
        - --collector.conntrack
        - --collector.filefd
        - --collector.netstat
        - --collector.sockstat
        ...
        ...
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          allowPrivilegeEscalation: false
          capabilities:
            drop: ["all"]
        ...
        ...

And it is currently the only way to get the pod's network namespace metrics. So supporting this pattern is critical, as it will help monitor OpenSearch more effectively.
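
To have Prometheus pick up such a sidecar, we rely on the usual prometheus.io annotations on the pod template (these only take effect if the Prometheus scrape config has the matching kubernetes_sd_configs/relabeling rules, so treat this as an assumption about the monitoring setup):

  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9100"
        prometheus.io/path: "/metrics"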

@eyenx commented Jan 30, 2025

And it is currently the only way to get the pod's network namespace metrics.

I disagree. We use the kubelet metrics to gather series like

container_network_receive_bytes_total
container_network_transmit_bytes_total
...

These metrics are scraped from the kubelet via cAdvisor:

https://www.cloudforecast.io/blog/cadvisor-and-kubernetes-monitoring-guide/
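
For completeness, a typical scrape job for the kubelet's cAdvisor endpoint looks roughly like this (details such as TLS and relabeling vary per distribution and Prometheus setup):

scrape_configs:
  - job_name: kubernetes-cadvisor
    scheme: https
    metrics_path: /metrics/cadvisor
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node

The container_network_* series it exposes are reported per pod, since all containers in a pod share the same network namespace.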
