-
Notifications
You must be signed in to change notification settings - Fork 536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add heartbeat endpoint for background services monitoring (Fixes #20602) #20603
Conversation
b8c7145
to
8bb9272
Compare
f41fa8b
to
a218561
Compare
I suspect this will not work because among all AMO components only the celery workers have access to the remote-settings writer, but the monitor endpoint runs on the web component. |
I can't recall, is that by design? |
The heartbeat already checks connectivity with Autograph, and I followed what was there, thinking Remote Settings would be in a similar position. That being said, I can move the implementation of the connectivity check to the worker, and execute it "synchronously" from the monitor view. |
Where do you think the tests should live now that the test is done in a task? In |
@bqbn how often do we call Moving the check in a task is an interesting idea but I'm worried that sometimes the task might take too long to be processed... |
@diox what would be your pessimistic estimate here? More than 30 seconds? From what @ahoneiser could see in Pingdom, heartbeat endpoints are pulled every minute. |
Long comment about legacy decisions: The main issue is that the The best option in this case is going to be the Since the goal here is not to test how fast tasks queues are being processed, maybe we should add code to ensure we don't wait too long, and add a soft fail mode that doesn't page anyone if it times out? TL;DR: use |
We don't monitor this endpoint actually. The backend services this endpoint reports, such as memcache, database, elasticsearch and etc. are monitored separately. |
What is by design? Only celery workers can access the remote-settings writer, or the monitor endpoint runs on the web component? |
Sorry, it's actually an alias for |
That only celery workers can access the |
We monitor |
It's more like to follow the least privilege access principle because the celery workers are the only component that needs to access the remote-settings writer. |
After more discussion with @bqbn : we want to keep the So autograph, rabbitmq and, more relevant to this PR, remote settings monitoring should move to a separate endpoint (and can use a celery task on the |
Oh, but that's our Dockerflow standard 😕 We rely on those standard endpoints for alerting and troubleshooting in other systems like https://delivery-checks.prod.mozaws.net
Ok. Makes sense!
Isn't it the role of |
Agreed, and I think it would be useful to monitor that endpoint on AMO going forward.
If you want to take care of it, then I think it's fine to just repurpose the issue you opened about remote settings, editing its title/description. Then moving your new probe to that new endpoint.
|
I wondered:
|
We should probably just get rid of it entirely, so don't worry about it.
It should catch everything - ideally we don't want the endpoint to ever raise a 500 for anything. |
No idea why yet, but when trying this locally, I get:
I think we at least need the following patch ( diff --git a/src/olympia/blocklist/tasks.py b/src/olympia/blocklist/tasks.py
index 42d3e6e70c..cc4ba5c5bd 100644
--- a/src/olympia/blocklist/tasks.py
+++ b/src/olympia/blocklist/tasks.py
@@ -64,8 +64,8 @@ def process_blocklistsubmission(multi_block_submit_id, **kw):
raise exc
-@task
-def monitor_remote_settings():
+@task(ignore_result=False)
+def monitor_remote_settings(*args, **kwargs):
# check Remote Settings connection
client = RemoteSettings(
settings.REMOTE_SETTINGS_WRITER_BUCKET, But that wasn't enough to fix it for me... |
This is our fault, a problem with the way our https://github.com/mozilla/django-post-request-task/ behaves |
Hopefully fixed in leplatrem@a0a6da8 |
Great! 🎉
I haven't fully understood the purpose of https://docs.celeryq.dev/en/stable/userguide/configuration.html#std-setting-task_ignore_result |
Agreed, added in leplatrem@3d97f16 |
|
Fixes mozilla/addons#9152
This pull-requests adds a simple monitor check that will read the heartbeat of the configured Remote Settings server
PRs open for the same issue.
Fixes #ISSUENUM
at the top of your PR.Add before and after screenshots (Only for changes that impact the UI).