Commit

alerts6
benzekrimaha committed Aug 20, 2024
1 parent 3acfc95 commit ad04f2e
Showing 1 changed file with 6 additions and 9 deletions.
15 changes: 6 additions & 9 deletions monitoring/pra/alerts.yaml
@@ -8,9 +8,6 @@ x-inputs:
- name: kafka_connect_sink_job
type: config
value: artesca-data-dr-sink-base-queue-connector-metrics
- - name: threshold
-   type: constant
-   value: 10 # to be defined
- name: zenko_sink_instance
type: config
value: artesca-data-dr
@@ -44,28 +41,28 @@ groups:

- alert: KafkaConnectOutageSource
expr: |
- rate(sum(kafka_connect_task_error_total_record_errors{job="${kafka_connect_src_job}"})[$__rate_interval]) > ${threshold}
+ rate(sum(kafka_connect_task_error_total_record_errors{job="${kafka_connect_src_job}"})[$__rate_interval]) > 0
OR
- rate(sum(kafka_connect_task_error_total_record_failures{job="${kafka_connect_src_job}"})[$__rate_interval]) > ${threshold}
+ rate(sum(kafka_connect_task_error_total_record_failures{job="${kafka_connect_src_job}"})[$__rate_interval]) > 0
for: 1m
labels:
severity: critical
annotations:
description: >-
- Kafka-connect on source is not working nominally. The rate of errors or failures has exceeded the threshold. This could lead DR to get out of sync if not addressed promptly.
+ Kafka-connect on source is not working nominally. The rate of errors or failures has exceeded 0. This could lead DR to get out of sync if not addressed promptly.
summary: 'Kafka Connect not working'

- alert: KafkaConnectOutageSink
expr: |
- rate(sum(kafka_connect_task_error_total_record_errors{job="${kafka_connect_sink_job}"})[$__rate_interval]) > ${threshold}
+ rate(sum(kafka_connect_task_error_total_record_errors{job="${kafka_connect_sink_job}"})[$__rate_interval]) > 0
OR
- rate(sum(kafka_connect_task_error_total_record_failures{job="${kafka_connect_sink_job}"})[$__rate_interval]) > ${threshold}
+ rate(sum(kafka_connect_task_error_total_record_failures{job="${kafka_connect_sink_job}"})[$__rate_interval]) > 0
for: 1m
labels:
severity: critical
annotations:
description: >-
- Kafka-connect on sink is not working nominally. The rate of errors or failures has exceeded the threshold. This could lead to data loss if not addressed promptly.
+ Kafka-connect on sink is not working nominally. The rate of errors or failures has exceeded 0. This could lead to data loss if not addressed promptly.
summary: 'Kafka Connect not working'

- alert: WriteTimesLatency
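
For comparison, here is a minimal sketch of the sink alert written as a standalone Prometheus rule. It is not part of this commit. The metric names and the job value are taken from the diff above; the group name, the fixed `5m` window (in place of `$__rate_interval`) and the literal job label (in place of the `${kafka_connect_sink_job}` input) are assumptions made for illustration. The sketch applies `rate()` to each series before `sum()`, which is the order the Prometheus documentation recommends when combining the two.

```yaml
# Hypothetical standalone rule file; names marked "assumed" are not from the commit.
groups:
  - name: kafka-connect-sketch  # assumed group name
    rules:
      - alert: KafkaConnectOutageSink
        # Fires when any record error or failure is observed over the window.
        expr: |
          sum(rate(kafka_connect_task_error_total_record_errors{job="artesca-data-dr-sink-base-queue-connector-metrics"}[5m])) > 0
          or
          sum(rate(kafka_connect_task_error_total_record_failures{job="artesca-data-dr-sink-base-queue-connector-metrics"}[5m])) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: 'Kafka Connect not working'
```

A file in this shape can be validated with `promtool check rules <file>` before it is deployed.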