Allow all semaphore kube clients 2xx response codes to avoid spam alerts #142

Merged: 1 commit, Jan 23, 2025
6 changes: 3 additions & 3 deletions common/all.yaml.tmpl
@@ -511,7 +511,7 @@ groups:

If requeued items are not being processed promptly then this indicates a persistent issue. The mirror services are likely to be in an incorrect state.
- alert: SemaphoreServiceMirrorKubeClientErrors
-expr: 'sum(rate(semaphore_service_mirror_kube_http_request_total{code!="200"}[10m])) / sum(rate(semaphore_service_mirror_kube_http_request_total[10m])) > 0.1'
+expr: 'sum(rate(semaphore_service_mirror_kube_http_request_total{code!~"2.."}[10m])) / sum(rate(semaphore_service_mirror_kube_http_request_total[10m])) > 0.1'
for: 5m
labels:
team: infra
@@ -530,13 +530,13 @@ groups:

If requeued items are not being processed promptly then this indicates a persistent issue. The xDS configuration served to clients is likely to be in an incorrect state.
- alert: SemaphoreXDSKubeClientErrors
-expr: 'increase(semaphore_xds_kube_http_request_total{code!="200"}[5m]) > 0'
+expr: 'increase(semaphore_xds_kube_http_request_total{code!~"2.."}[5m]) > 0'
for: 10m
labels:
team: infra
annotations:
summary: "{{ $labels.app }} kubernetes client reports errors speaking to apiserver at {{ $labels.host }} for more than 10 minutes"
-description: "Kubernetes client requests returning code different than 200 for longer than 10 minutes. Check the pods logs for further information."
+description: "Kubernetes client requests returning code different than 2xx for longer than 10 minutes. Check the pods logs for further information."
logs: '<https://grafana.$ENVIRONMENT.aws.uw.systems/explore?left=["now-1h","now","Loki",{"expr":"{kubernetes_cluster=\"{{$labels.kubernetes_cluster}}\",kubernetes_namespace=\"{{$labels.namespace}}\",container=\"{{$labels.container}}\"}"}]|link>'
- alert: SemaphoreXDSNoZoneEndpoint
expr: 'sum(semaphore_xds_snapshot_endpoint{locality_zone="none"}) > 0'
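
The change from `code!="200"` to `code!~"2.."` relies on Prometheus regex matchers being fully anchored, so `2..` only matches a three-character value that starts with `2`. The sketch below is a rough Python approximation of that behaviour, not Prometheus code: the function names `old_matcher_flags` and `new_matcher_flags` are hypothetical, and `re.fullmatch` stands in for the anchored RE2 match. It shows that responses such as 201 or 204 were counted as errors by the old matcher and are ignored by the new one.

```python
import re


def old_matcher_flags(code: str) -> bool:
    """Approximates code!="200": any value other than exactly "200" is selected."""
    return code != "200"


def new_matcher_flags(code: str) -> bool:
    """Approximates code!~"2..": selected only when the whole value does NOT match 2.."""
    # re.fullmatch mimics the implicit ^...$ anchoring of Prometheus regex matchers.
    return re.fullmatch(r"2..", code) is None


if __name__ == "__main__":
    for code in ["200", "201", "204", "301", "404", "500"]:
        print(code, old_matcher_flags(code), new_matcher_flags(code))
```

Under this approximation only non-2xx codes reach the alert expression, which is the stated goal of the PR: success responses such as 201 (Created) no longer trigger the alerts.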