Disk and filesystem error metrics #3005
Comments
This would be very useful for us as well. Any update on this?
Hi @sasa-tomic, I am currently working on a PR for this.
Wow, that will be very cool.
PR: #3047 - first draft
So #3047 ended up being moved to prometheus/procfs#651, which was merged, from what I gathered in #3047. What's the next step here? :) As I mentioned in #3113, it's not clear to me how the procfs and node exporter packages interact. Does an implementation in procfs automatically end up in node exporter, or are we missing some shim here?
We need to wait for a new release of procfs. Then we can import the new package here in node_exporter and merge it, @anarcat.
I recently had a disk fail on a system, which I found out from errors in dmesg (blk_update_request: critical medium error). I wanted to set up some alerts in Prometheus so I would be notified the next time the same thing happens, but I couldn't find any metric from node exporter on the machine that indicated anything was wrong. The only disk-error-related metric I found is node_filesystem_device_error, which just reports the errors returned by the statfs syscall.
I went digging around in sysfs on the machine and found data about ext4 filesystem errors in these files:
/sys/fs/ext4/<partition>/errors_count: number of ext4 errors (commit)
/sys/fs/ext4/<partition>/warning_count: number of ext4 warning log messages (commit)
/sys/fs/ext4/<partition>/msg_count: number of other ext4 log messages
...and SCSI disk errors in these files (hexadecimal):
/sys/block/<disk>/device/ioerr_cnt: number of SCSI commands that completed with an error
/sys/block/<disk>/device/iodone_cnt: number of completed or rejected SCSI commands
I think node exporter should export these metrics. Maybe something like this:
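The metric listing that followed is not reproduced here. As a rough illustration only, reading the counters named above could look something like the Go sketch below. It is not the code from #3047 or prometheus/procfs#651. The function names (parseCounter, readCounter, collectDiskErrors) and the metric names in the map keys are hypothetical. It assumes the ext4 counters are printed in decimal and the SCSI counters in hexadecimal with an optional "0x" prefix.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseCounter converts the text of a sysfs counter file to a uint64.
// Assumption: base is 10 for the ext4 files and 16 for the SCSI files,
// where the value may carry a leading "0x".
func parseCounter(raw string, base int) (uint64, error) {
	s := strings.TrimSpace(raw)
	if base == 16 {
		s = strings.TrimPrefix(s, "0x")
	}
	return strconv.ParseUint(s, base, 64)
}

// readCounter reads one counter file and parses it in the given base.
func readCounter(path string, base int) (uint64, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return parseCounter(string(data), base)
}

// collectDiskErrors walks a sysfs root (normally "/sys") and returns the
// counters keyed by a hypothetical metric name plus a device label.
// Files that cannot be read or parsed are skipped.
func collectDiskErrors(sysfs string) map[string]uint64 {
	out := map[string]uint64{}

	// ext4: <sysfs>/fs/ext4/<partition>/{errors_count,warning_count,msg_count}
	for _, name := range []string{"errors_count", "warning_count", "msg_count"} {
		paths, _ := filepath.Glob(filepath.Join(sysfs, "fs", "ext4", "*", name))
		for _, p := range paths {
			partition := filepath.Base(filepath.Dir(p))
			if v, err := readCounter(p, 10); err == nil {
				key := fmt.Sprintf("ext4_%s{device=%q}", name, partition)
				out[key] = v
			}
		}
	}

	// SCSI: <sysfs>/block/<disk>/device/{ioerr_cnt,iodone_cnt}
	for _, name := range []string{"ioerr_cnt", "iodone_cnt"} {
		paths, _ := filepath.Glob(filepath.Join(sysfs, "block", "*", "device", name))
		for _, p := range paths {
			disk := filepath.Base(filepath.Dir(filepath.Dir(p)))
			if v, err := readCounter(p, 16); err == nil {
				key := fmt.Sprintf("scsi_%s{device=%q}", name, disk)
				out[key] = v
			}
		}
	}
	return out
}
```

Calling collectDiskErrors("/sys") on a Linux host would return whatever counters are present. A real collector would register proper Prometheus metrics instead of building string keys.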