"auto_reschedule_checks" Causes Indefinite Delay in Service Checks #947

Open
benbyr opened this issue Feb 26, 2024 · 1 comment

benbyr commented Feb 26, 2024

I've encountered an issue where enabling the auto_reschedule_checks option in Nagios causes some services not to run as expected. The "Next Scheduled Check" for the affected services keeps getting pushed forward indefinitely, so these checks never execute on their intended schedule. It also leaves some newly added services stuck in a "pending" state indefinitely. I understand that this option is still considered experimental, but it's the only option that effectively reduces the monitoring load on our system.

Expected Behavior:
When auto_reschedule_checks is enabled, all services should continue to run at their scheduled intervals, with reasonable adjustments to distribute the load evenly. The "Next Scheduled Check" should be rescheduled within a practical timeframe, ensuring timely execution of all checks.

Actual Behavior:
For some services, the checks stop running once auto_reschedule_checks is enabled, and their "Next Scheduled Check" time is postponed indefinitely. Some newly added services also remain stuck in a "pending" state indefinitely. The problem persists for the affected service checks, leaving gaps in monitoring in which critical issues could go unnoticed. This is particularly concerning given that auto_reschedule_checks is the only option that significantly reduces the monitoring load on our systems.

Nagios Version:
Nagios Core 4.4.10

Nagios Config:
auto_reschedule_checks=1
auto_rescheduling_interval=30
auto_rescheduling_window=180
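According to the Nagios Core documentation, auto_rescheduling_interval sets how often (in seconds) a rescheduling pass runs, and auto_rescheduling_window limits each pass to checks due within the next N seconds. The symptom above (a "Next Scheduled Check" that keeps moving forward) can be reproduced in a toy model built on those two settings. This is a minimal sketch under a loudly stated assumption: the rescheduler here is purely hypothetical (it spreads every due check evenly across the window) and is not the actual Nagios Core scheduling code; `respread` and `simulate` are illustrative names. Under that assumption, any check whose new slot lands after the next pass gets picked up and moved again, so it never runs.

```python
"""Toy model of periodic check rescheduling.

HYPOTHETICAL: this is not Nagios Core's scheduler. It only illustrates how a
rescheduling pass that keeps moving due checks forward can starve them.
"""

AUTO_RESCHEDULING_INTERVAL = 30  # seconds between rescheduling passes (issue's config)
AUTO_RESCHEDULING_WINDOW = 180   # only checks due within this many seconds are moved


def respread(next_checks, now, window):
    """Hypothetical rescheduler: spread every check due in [now, now + window]
    evenly across the window, keeping their current order.

    Slot i lands at now + (i + 1) * window / (n + 1), so no slot is 'now'.
    """
    due = sorted((t, name) for name, t in next_checks.items() if now <= t <= now + window)
    n = len(due)
    for i, (_, name) in enumerate(due):
        next_checks[name] = now + (i + 1) * window / (n + 1)


def simulate(next_checks, check_interval, horizon,
             interval=AUTO_RESCHEDULING_INTERVAL, window=AUTO_RESCHEDULING_WINDOW):
    """Mutate next_checks over time; return how often each check ran before horizon."""
    runs = {name: 0 for name in next_checks}
    now = 0
    while now < horizon:
        respread(next_checks, now, window)
        pass_end = now + interval
        # A check runs only if its slot comes due before the next rescheduling pass.
        for name, t in list(next_checks.items()):
            if t < pass_end:
                runs[name] += 1
                next_checks[name] = t + check_interval
        now = pass_end
    return runs


if __name__ == "__main__":
    checks = {"svc": 10}
    print(simulate(checks, check_interval=300, horizon=3600))  # {'svc': 0}
    print(checks["svc"])  # 3660.0: the next check keeps moving past the horizon
    print(simulate({"svc": 10}, check_interval=300, horizon=300, window=40))  # {'svc': 1}
```

With the issue's values, a lone check is always re-placed 90 s ahead (half the window), which is later than the next pass 30 s away, so it runs zero times while its next-check time keeps growing. Shrinking the window to 40 s in the same model lets it run. This only shows that the pass interval and window *can* interact to starve checks under the stated assumption; it does not establish the actual cause of this issue.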

Note that we have not seen this issue on all of our monitoring nodes; it has only occurred on servers from one specific provider. This inconsistency is confusing, especially since the option works correctly most of the time and all of our nodes are deployed identically.

benbyr commented Feb 26, 2024

This looks like it might be related to #893.

sawolf added the Bug label Feb 27, 2024