Add initial runbook entries for AM's alerts #577
Signed-off-by: Douglas Camata <[email protected]>
docs/sop/observatorium.md
Outdated
### Impact

For users this means that their most recent update to alerts might not be currently in use. Ultimately, this means some of the alerts they have configured may not be firing as expected. Subsequente updates to Alertmanager configuration won't be picked up until the reload succeeds.
Subsequente -> subsequent (language typo? :) )
docs/sop/observatorium.md
Outdated
### Steps

- In the OSD console for the affected cluster, find the Alertmanager Route. Check that it correctly points to the Alertmanager Service. Check that the Service correctly points to **all** the Alertmanager pods. Open the Route's address, go to the "Status" tab, and note the IP addresses of the discovered Alertmanager instances. Check that they match the addresses of **all** the Alertmanager pods; none should be missing or mismatched.
Is the Route really relevant here? I would have thought the impact here would be due to issues with the internal network and peering.
The mention of the Route there is only to find and open the Alertmanager UI, nothing else.
OK, I think that makes more sense. There is probably some kubectl command we could use to achieve the same, but this is fine too in that case.
FYI I'm making the text clearer that the purpose of finding the route is to open the AM UI.
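For illustration, the comparison in that step (discovered Alertmanager instances versus pod addresses) boils down to a set difference. The sketch below is not part of the runbook: the function name `compare_peers`, the sample IPs, and the assumption that the "Status" tab lists peers as IPv4 `host:port` strings are all hypothetical.

```python
def compare_peers(pod_ips, discovered_peers):
    """Return (missing, unexpected) addresses as sorted lists.

    pod_ips: IPs of the Alertmanager pods, collected by hand or by tooling.
    discovered_peers: addresses noted from the "Status" tab, assumed here
    to be IPv4 "host:port" strings (bare hosts are accepted too).
    """
    pods = set(pod_ips)
    # Drop the port, if any, so both sides compare on the host part only.
    peers = {addr.rsplit(":", 1)[0] if ":" in addr else addr
             for addr in discovered_peers}
    missing = sorted(pods - peers)      # pods that no instance discovered
    unexpected = sorted(peers - pods)   # peers that match no current pod
    return missing, unexpected


# Hypothetical sample data: one pod is missing and one stale peer is listed.
missing, unexpected = compare_peers(
    ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
    ["10.0.0.1:9094", "10.0.0.2:9094", "10.0.0.9:9094"],
)
print(missing, unexpected)  # ['10.0.0.3'] ['10.0.0.9']
```

Both lists being empty corresponds to the "none should be missing or mismatching" condition in the step.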
docs/sop/observatorium.md
Outdated
### Summary

One of the Alertmanager instances in the cluster cannot send alerts to integrations.
integrations -> receivers
Signed-off-by: Douglas Camata <[email protected]>
LGTM mod @philipgough's comments.
Really wish we could import this somehow, similar to mixin.
Signed-off-by: Douglas Camata <[email protected]>
Some additional reformatting happened because the TOC generation (which is automatic) has changed the way it writes markdown lists.
This is partially based on https://runbooks.prometheus-operator.dev/runbooks/alertmanager.