[BUG] Cannot pull metrics from aacraid raid #608

Open
wjbridge opened this issue Mar 19, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@wjbridge

Describe the bug
Scrutiny can no longer pull metrics from my aacraid RAID. The collector runs smartctl --xall --json --device sat /dev/sda instead of smartctl --xall --json --device aacraid,0,0,0 /dev/sda.

The command smartctl --xall --json --device sat /dev/sda returns an error, and I can reproduce this inside the container (i.e. via docker exec). If I run smartctl --xall --json --device aacraid,0,0,0 /dev/sda1 instead, it works great.

Expected behavior
Pull metrics from aacraid raid.

Screenshots

time="2024-03-18T20:06:00-04:00" level=info msg="Verifying required tools" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --scan --json" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Checking Influxdb & Sqlite health" type=web
time="2024-03-18T20:06:00-04:00" level=info msg="127.0.0.1 - 4f824e999d62 [18/Mar/2024:20:06:00 -0400] \"GET /api/health\" 200 16 \"\" \"curl/7.88.1\" (1ms)" clientIP=127.0.0.1 hostname=4f824e999d62 latency=1 method=GET path=/api/health referer= respLength=16 statusCode=200 type=web userAgent=curl/7.88.1
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --info --json --device aacraid,0,0,0 /dev/sda" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Generating WWN" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --info --json --device aacraid,0,0,1 /dev/sda" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Generating WWN" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --info --json --device aacraid,0,0,2 /dev/sda" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Generating WWN" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --info --json --device aacraid,0,0,3 /dev/sda" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Generating WWN" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --info --json --device auto /dev/nvme0n1" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Using WWN Fallback" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Sending detected devices to API, for filtering & validation" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="127.0.0.1 - 4f824e999d62 [18/Mar/2024:20:06:00 -0400] \"POST /api/devices/register\" 200 2827 \"\" \"Go-http-client/1.1\" (1ms)" clientIP=127.0.0.1 hostname=4f824e999d62 latency=1 method=POST path=/api/devices/register referer= respLength=2827 statusCode=200 type=web userAgent=Go-http-client/1.1
time="2024-03-18T20:06:00-04:00" level=info msg="Collecting smartctl results for sda\n" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --xall --json --device sat /dev/sda" type=metrics
time="2024-03-18T20:06:00-04:00" level=error msg="smartctl returned an error code (2) while processing sda\n" type=metrics
time="2024-03-18T20:06:00-04:00" level=error msg="smartctl could not open device" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Publishing smartctl results for 0x5000cca2abeb2d07\n" type=metrics
time="2024-03-18T20:06:01-04:00" level=info msg="Successfully sent notifications. Check logs for more information." type=web
time="2024-03-18T20:06:01-04:00" level=info msg="Collecting smartctl results for sda\n" type=metrics
time="2024-03-18T20:06:01-04:00" level=info msg="Executing command: smartctl --xall --json --device sat /dev/sda" type=metrics
time="2024-03-18T20:06:01-04:00" level=error msg="smartctl returned an error code (2) while processing sda\n" type=metrics
time="2024-03-18T20:06:01-04:00" level=error msg="smartctl could not open device" type=metrics
time="2024-03-18T20:06:01-04:00" level=info msg="Publishing smartctl results for 0x5000cca2b6f39395\n" type=metrics
time="2024-03-18T20:06:01-04:00" level=info msg="Successfully sent notifications. Check logs for more information." type=web
time="2024-03-18T20:06:01-04:00" level=info msg="Collecting smartctl results for sda\n" type=metrics
time="2024-03-18T20:06:01-04:00" level=info msg="Executing command: smartctl --xall --json --device sat /dev/sda" type=metrics
time="2024-03-18T20:06:01-04:00" level=error msg="smartctl returned an error code (2) while processing sda\n" type=metrics
time="2024-03-18T20:06:01-04:00" level=error msg="smartctl could not open device" type=metrics
time="2024-03-18T20:06:01-04:00" level=info msg="Publishing smartctl results for 0x5000cca2eccb1d14\n" type=metrics
time="2024-03-18T20:06:01-04:00" level=info msg="Successfully sent notifications. Check logs for more information." type=web
time="2024-03-18T20:06:01-04:00" level=info msg="Collecting smartctl results for sda\n" type=metrics
time="2024-03-18T20:06:01-04:00" level=info msg="Executing command: smartctl --xall --json --device sat /dev/sda" type=metrics
time="2024-03-18T20:06:02-04:00" level=error msg="smartctl returned an error code (2) while processing sda\n" type=metrics
time="2024-03-18T20:06:02-04:00" level=error msg="smartctl could not open device" type=metrics
time="2024-03-18T20:06:02-04:00" level=info msg="Publishing smartctl results for 0x5000cca2b6f61045\n" type=metrics
time="2024-03-18T20:06:02-04:00" level=info msg="Successfully sent notifications. Check logs for more information." type=web
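The mismatch in the log above can be checked mechanically. The following is a minimal sketch, not part of Scrutiny: it assumes the logrus-style `msg="..."` layout shown in the log, and the helper name `device_types_by_phase` is made up for illustration. It groups the `--device` values used for /dev/sda by whether the command was an `--info` call (detection) or an `--xall` call (collection).

```python
import re
import shlex

# Matches the msg="..." field of a logrus-style line; escaped quotes may appear inside.
MSG_RE = re.compile(r'msg="((?:[^"\\]|\\.)*)"')
PREFIX = "Executing command: "


def device_types_by_phase(log_text):
    """Return the --device values used for /dev/sda, grouped by smartctl mode."""
    phases = {"--info": [], "--xall": []}
    for line in log_text.splitlines():
        match = MSG_RE.search(line)
        if not match or not match.group(1).startswith(PREFIX):
            continue  # not an "Executing command" line
        argv = shlex.split(match.group(1)[len(PREFIX):])
        if "--device" not in argv or argv[-1] != "/dev/sda":
            continue  # e.g. "smartctl --scan --json" or another disk
        device_type = argv[argv.index("--device") + 1]
        for phase in phases:
            if phase in argv:
                phases[phase].append(device_type)
    return phases


# A few lines copied from the log in this report.
SAMPLE = '''\
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --scan --json" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --info --json --device aacraid,0,0,0 /dev/sda" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --info --json --device aacraid,0,0,1 /dev/sda" type=metrics
time="2024-03-18T20:06:00-04:00" level=info msg="Executing command: smartctl --xall --json --device sat /dev/sda" type=metrics
'''

if __name__ == "__main__":
    print(device_types_by_phase(SAMPLE))
```

On these sample lines the `--info` group contains the aacraid types while the `--xall` group contains only `sat`, which is the discrepancy this report describes.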

Log Files
Docker Config

  #-------------------------------------------
  # Scrutiny - WebUI for smartd S.M.A.R.T monitoring
  # https://github.com/AnalogJ/scrutiny/blob/master/docker/example.omnibus.docker-compose.yml
  #-------------------------------------------
  scrutiny:
    image: ghcr.io/analogj/scrutiny:master-omnibus
    container_name: scrutiny
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/api/health"]
      interval: 5s
      timeout: 10s
      retries: 20
      start_period: 10s
    networks:
      - $T2_NETWORK
    depends_on:
      - $PROXY
    cap_add:
      - SYS_RAWIO
      - SYS_ADMIN
    environment:
      - PUID=$PUID
      - PGID=$PGID
      - TZ=$TZ
    volumes:
      - $DOCKERCFG/Scrutiny/config:/opt/scrutiny/config
      - $DOCKERCFG/Scrutiny/influxdb:/opt/scrutiny/influxdb
      - /run/udev:/run/udev:ro
    devices:
      - /dev/nvme0n1:/dev/nvme0n1
      - /dev/sda:/dev/sda
      - /dev/aac0:/dev/aac0
    labels:
      - com.centurylinklabs.watchtower.enable=true
@wjbridge wjbridge added the bug Something isn't working label Mar 19, 2024
@wjbridge wjbridge changed the title from "[BUG]" to "[BUG] Cannot pull metrics from aacraid raid" Mar 19, 2024
@AnalogJ
Owner

AnalogJ commented Mar 19, 2024

Have you created a collector config file?

https://github.com/AnalogJ/scrutiny/blob/master/example.collector.yaml#L42-L50

If smartctl is returning an error, you need to provide a config file to override/configure the smartctl command for your disks.

@thomashilzendegen

I have the same problem. I tracked it down to changes in release 0.7.3; with 0.7.2 it still works. As a workaround, I will stay on 0.7.2.

@wjbridge
Author

wjbridge commented Mar 21, 2024

Yes, I have created the config file. Here is my collector file. I can also confirm everything works correctly with v0.7.2.

######################################################################
# Version
#
# version specifies the version of this configuration file schema, not
# the scrutiny binary. There is only 1 version available at the moment
version: 1

# The host id is a label used for identifying groups of disks running on the same host
# Primarily used for hub/spoke deployments (can be left empty if using all-in-one image).
host:
  id: ""

# This block allows you to override/customize the settings for devices detected by
# Scrutiny via `smartctl --scan`
# See the "--device=TYPE" section of https://linux.die.net/man/8/smartctl
# type can be a 'string' or a 'list'
devices:
# examples showing how to force smartctl to detect disks inside a raid array/virtual disk
  - device: /dev/sda
    type:
      - aacraid,0,0,0
      - aacraid,0,0,1
      - aacraid,0,0,2
      - aacraid,0,0,3

# example for forcing device type detection for a single disk
  - device: /dev/nvme0n1
    type: 'auto'

########################################################################################################################
# FEATURES COMING SOON
#
# The following commented out sections are a preview of additional configuration options that will be available soon.
#
########################################################################################################################

I am not sure how to override/configure the smartctl command for my disks so that --device sat is replaced with --device aacraid,0,0,0. I thought that setting came from the collector file.
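To make the expected behavior concrete, here is a minimal sketch of how the `devices` block above would map to collection commands if each configured type were honored. This is an illustration only, not Scrutiny's actual code: the function names `expand_device_types` and `xall_argv` are hypothetical, and the config is written as a Python literal mirroring the YAML rather than parsed from the file.

```python
def expand_device_types(devices):
    """Yield one (device, type) pair per configured type.

    `type` may be a single string or a list, as the config comments describe.
    """
    for entry in devices:
        types = entry["type"]
        if isinstance(types, str):
            types = [types]
        for device_type in types:
            yield entry["device"], device_type


def xall_argv(device, device_type):
    """Build the smartctl collection command for one (device, type) pair."""
    return ["smartctl", "--xall", "--json", "--device", device_type, device]


# Mirrors the `devices` section of the collector file in this comment.
CONFIG_DEVICES = [
    {
        "device": "/dev/sda",
        "type": ["aacraid,0,0,0", "aacraid,0,0,1", "aacraid,0,0,2", "aacraid,0,0,3"],
    },
    {"device": "/dev/nvme0n1", "type": "auto"},
]

if __name__ == "__main__":
    for device, device_type in expand_device_types(CONFIG_DEVICES):
        print(" ".join(xall_argv(device, device_type)))
```

Under this assumption, /dev/sda would be collected four times, once per aacraid type, and none of the commands would use `--device sat`.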
