Error handling while running the Gluster commands
Add a new metric, `glusterfs_error_count`, to capture metrics collection
errors. Its "name" label identifies the metrics collector that failed.
Example:

```prometheus
glusterfs_error_count{name="list_volumes"} 1.0
glusterfs_error_count{name="list_peers"} 1.0
```

On error, the affected volume and peer metrics are not collected. They
start appearing again once Glusterd comes back online or the underlying
issue is fixed.
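
Because the gauge is set to 1 for each failed collector, it can drive an alert. The rule below is only a sketch: the group name, alert name, `for` duration and `severity` label are hypothetical choices, not part of this commit.

```yaml
# Hypothetical Prometheus alerting rule; all names and thresholds are assumptions.
groups:
  - name: gluster-metrics-exporter
    rules:
      - alert: GlusterMetricsCollectionError
        # Fires while any collector (e.g. list_volumes, list_peers) reports an error.
        expr: glusterfs_error_count > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Gluster metrics collector {{ $labels.name }} is failing"
```

The `name` label from the metric is reused in the annotation so that the alert identifies which collector failed.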

The DW_FORM_data16 exception is fixed in the latest version of
Crystal (Ref: crystal-lang/crystal#12744).

Fixes: #38
Signed-off-by: Aravinda Vishwanathapura <[email protected]>
aravindavk committed May 31, 2024
1 parent 910492f commit b84f710
Showing 4 changed files with 48 additions and 9 deletions.
10 changes: 5 additions & 5 deletions shard.lock
@@ -2,23 +2,23 @@ version: 2.0
 shards:
   backtracer:
     git: https://github.com/sija/backtracer.cr.git
-    version: 1.2.1
+    version: 1.2.2

   crometheus:
     git: https://github.com/darwinnn/crometheus.git
-    version: 0.3.0+git.commit.8394850abd90b976aa205f58de49a886c91aa802
+    version: 0.3.0+git.commit.c71a13174d02e3767b62faa1771132d4245e1ce5

   exception_page:
     git: https://github.com/crystal-loot/exception_page.git
-    version: 0.2.2
+    version: 0.4.1

   glustercli:
     git: https://github.com/aravindavk/glustercli-crystal.git
-    version: 0.2.0+git.commit.0881f270471be31d4f6d06f45d896e17adace0db
+    version: 0.2.0+git.commit.fb2f36881d79523990ca8b5f23212d4181718c8f

   kemal:
     git: https://github.com/kemalcr/kemal.git
-    version: 1.2.0
+    version: 1.5.0

   radix:
     git: https://github.com/luislavena/radix.git
19 changes: 19 additions & 0 deletions src/metrics/errors.cr
@@ -0,0 +1,19 @@
+module GlusterMetricsExporter
+  Crometheus.alias ErrorGauge = Crometheus::Gauge[:name]
+
+  @@error_count = ErrorGauge.new(:error_count, "Metrics collection errors")
+
+  def self.clear_error_metrics
+    @@error_count.clear
+  end
+
+  handle_metrics(["error"]) do |metrics_data|
+    # Reset all Metrics to avoid stale data. Careful if
+    # counter type is used
+    clear_error_metrics
+
+    metrics_data.errors.each do |err|
+      @@error_count[name: err.name].set(1)
+    end
+  end
+end
26 changes: 23 additions & 3 deletions src/metrics/helpers.cr
@@ -1,13 +1,23 @@
 require "glustercli"

 module GlusterMetricsExporter
+  struct MetricError
+    include JSON::Serializable
+
+    property name = ""
+
+    def initialize(@name)
+    end
+  end
+
   class MetricsData
     include JSON::Serializable

     property volumes = [] of GlusterCLI::VolumeInfo,
       peers = [] of GlusterCLI::NodeInfo,
       local_metrics = Hash(String, GlusterCLI::LocalMetrics).new,
-      exporter_health = Hash(String, Int32).new
+      exporter_health = Hash(String, Int32).new,
+      errors = [] of MetricError

     def self.collect
       data = MetricsData.new
@@ -24,11 +34,21 @@ module GlusterMetricsExporter
           status_collect = true
         end

-        data.volumes = cli.list_volumes(status: status_collect)
+        begin
+          data.volumes = cli.list_volumes(status: status_collect)
+        rescue ex : GlusterCLI::CommandException
+          data.errors << MetricError.new("list_volumes")
+          Log.error &.emit("Error while collecting the Volumes metrics", error: ex.message)
+        end
       end

       if GlusterMetricsExporter.config.enabled?("peer")
-        data.peers = cli.list_peers
+        begin
+          data.peers = cli.list_peers
+        rescue ex : GlusterCLI::CommandException
+          data.errors << MetricError.new("list_peers")
+          Log.error &.emit("Error while collecting the Peers metrics", error: ex.message)
+        end
       end

       # TODO: API calls concurrently
2 changes: 1 addition & 1 deletion src/metrics/peer.cr
@@ -20,7 +20,7 @@ module GlusterMetricsExporter

     metrics_data.peers.each do |peer|
       # Peer State 1 => Connected, 0 => Disconnected/Unknown
-      state = peer.connected ? 1 : 0
+      state = peer.connected? ? 1 : 0
       @@peer_state[hostname: peer.hostname].set(state)
     end
   end