[FEAT] Enable Baseline Exclusion for SMART Errors in Scrutiny Reports #713

Closed
jacobalberty opened this issue Nov 12, 2024 · 6 comments

@jacobalberty

Is your feature request related to a problem? Please describe.
I'm encountering false failure reports from Scrutiny due to specific SMART errors. The drive itself is in good condition; the errors were likely caused by past unsafe shutdowns. These errors are static and do not increase over time, yet the collector still flags the drive as failed.

Describe the solution you'd like
I would like a feature in the Scrutiny collector configuration to set a baseline for specific SMART errors, such as media errors, so that only new or increased errors trigger a failure report. For example, if my current media error count is 6, the software should flag the drive only if this count rises above 6. That would let me monitor new issues without persistent false reports.
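A minimal sketch of the requested check, assuming raw SMART counts are available as a name-to-value mapping and baselines are stored the same way. The function name and attribute keys are hypothetical and not part of Scrutiny's code or configuration.

```python
def attributes_over_baseline(readings: dict[str, int], baselines: dict[str, int]) -> list[str]:
    """Return the names of attributes whose raw count has risen above its baseline.

    Hypothetical helper. Attributes with no recorded baseline are compared
    against 0, so they behave as if no baseline were configured at all.
    """
    return sorted(
        name for name, value in readings.items() if value > baselines.get(name, 0)
    )


baselines = {"media_errors": 6}  # the count recorded when the drive was known-good
print(attributes_over_baseline({"media_errors": 6}, baselines))  # []
print(attributes_over_baseline({"media_errors": 7}, baselines))  # ['media_errors']
```

With a baseline of 6, the drive is flagged only once the count reaches 7, which matches the example in the request.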

Additional context
This would be helpful for managing drives with a history of non-progressive errors, as it would allow Scrutiny to focus on tracking changes rather than reporting known static issues.

@AnalogJ
Owner

AnalogJ commented Jan 4, 2025

This feature already exists: #547

@AnalogJ AnalogJ closed this as completed Jan 4, 2025
@jacobalberty
Author

@AnalogJ #547 appears to affect only notifications, not device health status: with it enabled, the drive's status is still shown as failed. I don't believe #547 implements this feature at this time.

@AnalogJ
Owner

AnalogJ commented Jan 5, 2025

Hey @jacobalberty
So the failure thresholds are based on the Backblaze data. If that data indicates an attribute value is likely to lead to drive failure, Scrutiny flags it.

I think you have 2 options:

@jacobalberty
Author

So this is a common issue with this drive in particular. When the system is shut down suddenly (power loss), the drive adds an entry to the media errors table, so it's common to end up with a few spurious errors here and there. However, if they start increasing without a known cause, that indicates the drive is dying.
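One way to separate the benign entries from a real trend is sketched below. It assumes, based only on the comment above and not on any documented drive behavior, that each unsafe shutdown adds at most one media error. The field names mirror the NVMe health log's "unsafe shutdowns" and "media errors" counters, but the function itself is hypothetical.

```python
def unexplained_media_errors(baseline: dict[str, int], current: dict[str, int]) -> int:
    """Count media errors added since the baseline that new unsafe shutdowns
    cannot account for.

    ASSUMPTION: each unsafe shutdown adds at most one media error entry.
    A nonzero result suggests errors with no known cause.
    """
    new_errors = current["media_errors"] - baseline["media_errors"]
    new_shutdowns = current["unsafe_shutdowns"] - baseline["unsafe_shutdowns"]
    return max(0, new_errors - new_shutdowns)


baseline = {"media_errors": 6, "unsafe_shutdowns": 40}
print(unexplained_media_errors(baseline, {"media_errors": 7, "unsafe_shutdowns": 41}))  # 0
print(unexplained_media_errors(baseline, {"media_errors": 9, "unsafe_shutdowns": 41}))  # 2
```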

I'm proposing a configurable baseline that would be subtracted from the raw value before comparing it against the Scrutiny threshold. The drive is healthy right now, but I'm forced to ignore a better source of data (the Scrutiny thresholds) because I also have these benign media error entries that don't actually indicate a failure.
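The proposed comparison can be sketched as follows. A single numeric `failure_threshold` stands in for Scrutiny's real per-attribute logic, which is derived from Backblaze failure data; the function name and its parameters are hypothetical.

```python
def status_with_baseline(raw_value: int, baseline: int, failure_threshold: int) -> str:
    """Subtract the baseline from the raw count, then compare against a threshold.

    Hypothetical: `failure_threshold` is the largest adjusted count still treated
    as healthy. The adjusted value is clamped at 0 so a counter that is lower than
    its baseline (e.g. after a reset) never goes negative.
    """
    adjusted = max(0, raw_value - baseline)
    return "failed" if adjusted > failure_threshold else "passed"


print(status_with_baseline(6, baseline=0, failure_threshold=0))  # failed (no baseline)
print(status_with_baseline(6, baseline=6, failure_threshold=0))  # passed
print(status_with_baseline(7, baseline=6, failure_threshold=0))  # failed
```

Setting the baseline to 0 reproduces the behavior without a baseline, so the feature would be opt-in per attribute.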

Some way of storing the current SMART data and treating it as our 0, instead of starting from actual zero, would work: either a snapshot to diff against or configurable offsets in the collector. Either approach would allow users to enable the Scrutiny thresholds instead of relying on the SMART threshold.
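The snapshot option could look roughly like this. It is a sketch under assumptions, not an existing Scrutiny feature: the JSON file, its layout (drive serial mapped to attribute counts), the function names, and the serial `EXAMPLE-SERIAL` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def save_baseline(path: Path, serial: str, readings: dict[str, int]) -> None:
    """Record a drive's current raw counts as its baseline ("our 0")."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data[serial] = dict(readings)
    path.write_text(json.dumps(data, indent=2, sort_keys=True))


def diff_from_baseline(path: Path, serial: str, readings: dict[str, int]) -> dict[str, int]:
    """Return each attribute's change since the stored snapshot.

    Attributes (or drives) with no stored baseline return their raw value.
    """
    data = json.loads(path.read_text()) if path.exists() else {}
    baseline = data.get(serial, {})
    return {name: value - baseline.get(name, 0) for name, value in readings.items()}


with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp) / "baselines.json"
    save_baseline(store, "EXAMPLE-SERIAL", {"media_errors": 6})
    print(diff_from_baseline(store, "EXAMPLE-SERIAL", {"media_errors": 8}))  # {'media_errors': 2}
```

Configurable offsets would be the same subtraction with the baseline read from the collector's configuration instead of a snapshot file; the snapshot variant avoids typing values by hand, while offsets make the baseline explicit and reviewable.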

@jacobalberty
Author

#729 is an example of this behavior

@zwimer

zwimer commented Jan 10, 2025

This is related to: #553
