[FEAT] Enable Baseline Exclusion for SMART Errors in Scrutiny Reports #713

jacobalberty · 2024-11-12T20:21:54Z

Is your feature request related to a problem? Please describe.
I'm encountering false failure reports from Scrutiny due to specific SMART errors. The drive itself is in good condition, with the errors likely caused by past unsafe power downs. These errors are static, not increasing over time, yet the collector still flags the drive as failed.

Describe the solution you'd like
I would like a feature in the Scrutiny collector configuration to set a baseline for specific SMART errors—such as media errors—so that only new or increased errors trigger a failure report. For example, if my current media error count is 6, the software should only flag the drive if this count rises above 6, allowing me to monitor new issues without persistent false reports.
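A minimal sketch of the check described above, not Scrutiny's actual code. The function name `exceeds_baseline` and the `baselines` mapping are hypothetical and stand in for whatever a collector configuration might expose:

```python
def exceeds_baseline(current: int, baseline: int = 0) -> bool:
    """Return True only when the counter has grown past the recorded baseline."""
    return current > baseline


# Hypothetical per-attribute baselines, as they might appear in a collector config.
baselines = {"media_errors": 6}

# A static count of 6 would not be flagged; a rise to 7 would be.
print(exceeds_baseline(6, baselines["media_errors"]))  # False
print(exceeds_baseline(7, baselines["media_errors"]))  # True
```

With no baseline configured the default of 0 applies, so any non-zero count is flagged, as it is today.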

Additional context
This would be helpful for managing drives with a history of non-progressive errors, as it would allow Scrutiny to focus on tracking changes rather than reporting known static issues.

AnalogJ · 2025-01-04T23:28:41Z

This feature already exists: #547

jacobalberty · 2025-01-04T23:51:18Z

@AnalogJ #547 appears to affect only notifications, not device health status: with it enabled, I still see the drive's status as failed. I don't believe #547 implements this feature at this time.

AnalogJ · 2025-01-05T00:23:16Z

Hey @jacobalberty
So the failure thresholds are based on the Backblaze data. If they determine that it's likely to cause a failure, Scrutiny flags it.

I think you have 2 options:

1. Use the new functionality introduced in #547 (Add support for disabling repeat notifications if the values haven't changed) to ignore those notifications if nothing has changed.
2. Ignore the Backblaze data completely and just depend on the SMART tests to determine the health of the disk (see the settings page to enable this).

jacobalberty · 2025-01-05T00:33:47Z

So this is a common issue with this drive in particular. When the system is shut down suddenly (power loss), it generates an entry in the media errors table, so it's common to end up with a few spurious errors here and there. However, if they start increasing with no known cause, it indicates the drive is dying.

I'm proposing a configurable baseline that would be subtracted out when comparing against the Scrutiny threshold. The drive as it is right now is healthy, but I'm forced to ignore a better source of data because I also have these benign media error entries that don't actually indicate a failure.

Some way of storing the current SMART data and treating it as our zero, instead of starting from actual zero, would work: either a snapshot to diff against or configurable offsets in the collector. Either would allow users to enable the Scrutiny thresholds instead of relying on the SMART thresholds.
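Roughly what I have in mind, as a sketch only. The names `apply_baseline` and `failed_attributes`, the attribute keys, and the "fail when the adjusted value exceeds the threshold" rule are all assumptions for illustration, not how Scrutiny evaluates thresholds internally:

```python
from typing import Dict, List


def apply_baseline(current: Dict[str, int], baseline: Dict[str, int]) -> Dict[str, int]:
    """Subtract the stored snapshot from the current values, clamping at zero."""
    return {
        name: max(value - baseline.get(name, 0), 0)
        for name, value in current.items()
    }


def failed_attributes(
    current: Dict[str, int],
    baseline: Dict[str, int],
    thresholds: Dict[str, int],
) -> List[str]:
    """List attributes whose baseline-adjusted value exceeds its threshold.

    Attributes without a threshold entry are ignored.
    """
    adjusted = apply_baseline(current, baseline)
    return [
        name
        for name, value in adjusted.items()
        if name in thresholds and value > thresholds[name]
    ]


# Hypothetical values: the snapshot was taken while the drive had 6 media errors.
snapshot = {"media_errors": 6}
thresholds = {"media_errors": 0}

print(failed_attributes({"media_errors": 6}, snapshot, thresholds))  # []
print(failed_attributes({"media_errors": 8}, snapshot, thresholds))  # ['media_errors']
```

The snapshot and the configurable-offset variants differ only in where `snapshot` comes from: captured from the drive at a point in time, or typed into the collector config by hand.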

jacobalberty · 2025-01-05T00:34:26Z

#729 is an example of this behavior

zwimer · 2025-01-10T00:46:22Z

This is related to: #553

AnalogJ closed this as completed Jan 4, 2025
