We have systems with NVIDIA GPUs running Kubernetes and the nvidia-dcgm-exporter pod. We're collecting the exporter's metrics into our Datadog instance via agent v7.62.0 with the documented annotation:
This does collect the metrics, but the kube_namespace tag on them includes the namespace of the nvidia-dcgm-exporter pod itself.
So when I query DCGM metrics and group by kube_namespace, I get the metrics grouped as:
kube_system, project-foo
kube_system, project-bar
gpu-operator, project-foo
gpu-operator, project-bar
(the exporter is running in the kube_system namespace in one type of cluster and in the gpu-operator namespace in another type of cluster)
I found PR #18654, which seems to be trying to address this via the IGNORED_TAGS setting, but it doesn't seem to be working as intended.
Followup: I believe the issue here is that in dcgm/datadog_checks/dcgm/check.py, in get_default_config(), the tags to ignore are labeled ignored_tags, instead of ignore_tags, and therefore the config is - ironically - being ignored.
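To illustrate the suspected failure mode, here is a toy model of a default config whose key is misspelled. It is not the real integrations-core code: `build_config`, `submitted_tags`, the endpoint URL, and the exact-match filtering are all hypothetical simplifications, used only to show why a default stored under `ignored_tags` would never be seen by code that reads `ignore_tags`.

```python
# Toy model only: NOT the actual dcgm check or OpenMetrics base check code.
# All function names and values here are hypothetical.

def build_config(defaults, instance):
    """Merge check defaults with the user's instance config (instance wins)."""
    merged = dict(defaults)
    merged.update(instance)
    return merged


def submitted_tags(config, tags):
    """Drop every tag whose key is listed under the option the consumer reads."""
    # The consumer only ever looks up "ignore_tags"; any other key is inert.
    ignored = set(config.get("ignore_tags", []))
    return [t for t in tags if t.split(":", 1)[0] not in ignored]


tags = ["kube_namespace:gpu-operator", "gpu:0"]
instance = {"openmetrics_endpoint": "http://localhost:9400/metrics"}  # assumed URL

# Default registered under the wrong key: silently has no effect.
misnamed = build_config({"ignored_tags": ["kube_namespace"]}, instance)
# Default registered under the key the consumer actually reads.
correct = build_config({"ignore_tags": ["kube_namespace"]}, instance)

print(submitted_tags(misnamed, tags))  # both tags survive
print(submitted_tags(correct, tags))   # kube_namespace is dropped
```

Nothing fails loudly in the misnamed case, which would be consistent with the symptom described above: metrics still arrive, only the tag filtering is missing.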
Adding an ignore_tags stanza to our annotation, like so, works around the problem for us.
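For readers who generate pod annotations programmatically, the following is a rough sketch of building such an annotation with an ignore_tags entry. It is not the exact annotation from this report: the helper `dcgm_checks_annotation`, the container name, the endpoint URL, and the choice of `kube_namespace` as the ignored tag are assumptions for illustration, and the key layout assumes the `ad.datadoghq.com/<container>.checks` Autodiscovery annotation style.

```python
import json


def dcgm_checks_annotation(container, endpoint, ignore_tags):
    """Return a (key, value) pair for a hypothetical dcgm check annotation."""
    key = f"ad.datadoghq.com/{container}.checks"
    value = {
        "dcgm": {
            "instances": [
                {
                    "openmetrics_endpoint": endpoint,
                    # Set explicitly, rather than relying on the check's defaults.
                    "ignore_tags": list(ignore_tags),
                }
            ]
        }
    }
    return key, json.dumps(value, indent=2)


# All three arguments below are assumed values, not taken from the report.
key, value = dcgm_checks_annotation(
    container="nvidia-dcgm-exporter",
    endpoint="http://%%host%%:9400/metrics",
    ignore_tags=["kube_namespace"],
)
print(key)
print(value)
```

Setting the option on the instance sidesteps the check's built-in defaults entirely, which is why a workaround of this kind would not depend on how those defaults are named.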