How to configure Argos
Argos is built on Akka; to configure the application logs, refer to the Akka documentation. The Argos-specific configuration lives in the argos section, which contains three main entries:
- metrics : this section defines the JMX connectivity
- sentinel : this section defines the configuration of each sentinel
- notifiers : this section defines the configuration of each notification process (currently, the only notifier type is mail)
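Two additional parameters are available at the root of the argos section: scheduler-interval defines the delay between two sentinel checks, and cassandra-version declares the version of the monitored Cassandra node: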
Parameter | Type | Description |
---|---|---|
scheduler-interval | Duration | Duration between two metrics validations (Default: 60 seconds) |
cassandra-version | Double | The Cassandra version x.y (ex: 2.1) (Default: 3.0) |
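As an illustration, the two root-level parameters could be set as in the sketch below. The values are arbitrary examples, not recommendations; the placement directly under argos follows the full configuration sample.

```hocon
argos {
  # run the sentinel checks every 30 seconds instead of the default 60 (example value)
  scheduler-interval = 30 seconds
  # version of the monitored Cassandra node, in x.y form
  cassandra-version = 2.1
}
```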
The sentinels rely on metrics exposed by the JMX interface of the Cassandra node. The metrics section configures the JMX connection parameters.
Parameter | Type | Description |
---|---|---|
jmx-host | String | IP on which the instance of Cassandra binds the JMX server (Default: 127.0.0.1) |
jmx-port | Integer | Listening port of the JMX Server (Default: 7199) |
Because the metrics provider can send a notification when the connection to the Cassandra node is lost and when it is re-established, some optional parameters control these notifications.
Parameter | Type | Description |
---|---|---|
node-down-level | String | Level of the notification when the Cassandra node is unreachable (Default: CRITIC) |
node-down-label | String | Label used in the notification title (Default: Cassandra node is DOWN) |
node-up-level | String | Level of the notification when the Cassandra node comes back online (Default: INFO) |
node-up-label | String | Label used in the notification title (Default: Cassandra node is UP) |
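A minimal sketch of a metrics section combining the connection parameters and the node up/down options. It assumes that the up/down options are declared in the same metrics section as the JMX parameters, and that each *-level key carries the severity while each *-label key carries the title text. The values shown are the documented defaults.

```hocon
argos {
  metrics {
    # JMX endpoint of the local Cassandra node
    jmx-host = "127.0.0.1"
    jmx-port = 7199

    # assumption: the node up/down options sit next to the JMX parameters
    node-down-level = "CRITIC"
    node-down-label = "Cassandra node is DOWN"
    node-up-level = "INFO"
    node-up-label = "Cassandra node is UP"
  }
}
```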
A sentinel is an actor that checks a specific piece of information provided by the JMX interface of the Cassandra server. If the information is considered "wrong", the sentinel sends a notification message to the notifiers.
The sentinel name is defined by the child key of the argos.sentinel entry.
Some sentinels use a buffer to keep track of the metrics. The threshold validation applies to every element of the buffer, and all of them must exceed the configured value for a notification to be sent.
The load-avg sentinel examines the load average.
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
threshold | Float | The maximum value authorized for the LoadAvg metrics (default: 16.0) |
level | String | Level of the notification (default: CRITIC) |
label | String | Label used in the notification title (default: "Load Average") |
window-size | Integer | The size of the buffer (default: 1) |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
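For example, the following sketch widens the buffer so that a single spike does not raise an alert. The load-avg key is taken from the full configuration sample; the window size and the message text are illustrative values only.

```hocon
argos {
  sentinel {
    load-avg {
      enabled = true
      threshold = 16.0
      # keep the last 3 measurements; all 3 must exceed the threshold to notify
      window-size = 3
      level = "CRITIC"
      label = "Load Average"
      # hypothetical text, prepended to the message generated by argos
      message = "Check for running compactions or repairs."
    }
  }
}
```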
These sentinels examine the number of dropped messages. If that number changes between two checks, a notification is triggered. There is one sentinel per message type:
- dropped-counter
- dropped-mutation
- dropped-read
- dropped-read-repair
- dropped-range-slice
- dropped-request-response
- dropped-page-range
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: WARNING) |
label | String | Label used in the notification title (default: the name of the sentinel) |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
These sentinels examine the number of ReadRepair tasks (background and blocking) and send a notification if the 1MinuteRate is different from 0.
There are two sentinels, one per ReadRepair type:
- consistency-repaired-blocking
- consistency-repaired-background
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
threshold | Integer | The maximum value authorized (default: 0) |
level | String | Level of the notification (default: INFO) |
label | String | Label used in the notification title (default: the name of the sentinel) |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
This sentinel examines the number of connection timeouts and sends a notification if the 1MinuteRate is different from 0.
- connection-timeouts
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: WARNING) |
label | String | Label used in the notification title (default: "Connection Timeouts") |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
These sentinels examine the number of blocked tasks and send a notification if the result is different from 0. There is one sentinel per type of ThreadPool:
- blocked-stage-counter
- blocked-stage-gossip
- blocked-stage-internal
- blocked-stage-memtable
- blocked-stage-mutation
- blocked-stage-read
- blocked-stage-read-repair
- blocked-stage-request-response
- blocked-stage-compaction
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: WARNING) |
label | String | Label used in the notification title (default: the name of the sentinel) |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
These sentinels examine the number of pending tasks and send a notification if the result exceeds the threshold. There is one sentinel per type of ThreadPool. Because a high number of pending tasks is not necessarily an issue, these sentinels accumulate the ThreadPool state in a buffer and send a notification only if all the states in the buffer exceed the limit:
- pending-stage-counter
- pending-stage-gossip
- pending-stage-internal
- pending-stage-memtable
- pending-stage-mutation
- pending-stage-read
- pending-stage-read-repair
- pending-stage-request-response
- pending-stage-compaction
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: INFO) |
label | String | Label used in the notification title (default: the name of the sentinel) |
window-size | Integer | The size of the buffer (default: 5) |
threshold | Integer | The maximum value authorized (default: 25) |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
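To make the buffering concrete, here is a hedged sketch for one pending-tasks sentinel, using the documented defaults. With these values, a notification should only be raised once five consecutive checks have each reported more than 25 pending tasks. How the sentinel behaves before the buffer is full is not specified here.

```hocon
argos {
  sentinel {
    pending-stage-mutation {
      enabled = true
      level = "INFO"
      label = "Stage Mutation - pending"
      # maximum number of pending tasks tolerated per check
      threshold = 25
      # number of consecutive states kept in the buffer
      window-size = 5
    }
  }
}
```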
The storage-exception sentinel examines the number of storage exceptions and sends a notification if the result is different from 0.
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: CRITIC) |
label | String | Label used in the notification title (default: "Cassandra Storage Exception") |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
The storage-hints sentinel examines the number of hinted handoffs and sends a notification if this number increases between two checks.
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: WARNING) |
label | String | Label used in the notification title (default: "Network partition") |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
The storage-space sentinel examines the used space of each directory declared in cassandra.yaml. If the available space is too low, a notification is triggered.
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: CRITIC) |
label | String | Label used in the notification title (default: "Few Disk Space") |
threshold | Integer | Percentage of available space required for the data directories (default: 50) |
commitlog-threshold | Integer | Percentage of available space required for the commitlog directory (default: 5) |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
The gc-inspector sentinel examines the duration of garbage collections (GC). If a GC takes longer than the threshold, a notification is triggered.
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: WARNING) |
label | String | Label used in the notification title (default: "GC Inspector - too long GC") |
threshold | Integer | Maximum duration (in ms) for a GC (default: 200 ms) |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
The notification-jmx sentinel sends a notification if the JMX listener is informed of an ERROR or an ABORTED operation (such as a repair).
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: INFO) |
label | String | Label used in the notification title (default: "Progress Event") |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
The consitency-level sentinel sends a notification if the declared ConsistencyLevel can't be reached for a TokenRange.
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: false) |
level | String | Level of the notification (default: CRITIC) |
label | String | Label used in the notification title (default: "Consitency Level") |
keyspaces | List | List of objects defining the expected ConsistencyLevel for each keyspace |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
Each keyspace object has two attributes:
- name : identifies the keyspace
- cl : defines the expected ConsistencyLevel
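A short sketch of the keyspaces list. The keyspace names are hypothetical; the consitency-level key (spelled as in the full configuration sample) and the lower-case consistency level values are taken from that sample.

```hocon
argos {
  sentinel {
    consitency-level {
      # this sentinel is disabled by default
      enabled = true
      level = "CRITIC"
      keyspaces = [
        # hypothetical keyspace names, one expected level per keyspace
        { name = "orders", cl = "quorum" },
        { name = "audit", cl = "local_one" }
      ]
    }
  }
}
```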
Because many metrics are available through the JMX interface, you can define your own sentinels with the custom sentinel. To do so, create one section per sentinel under argos.sentinel.custom-sentinels. The configuration key of each section is used as the sentinel name. Here is an example:
```hocon
custom-sentinels {
  mysentinel {
    enabled = true
    level = "WARNING"
    label = "Test label"
    objectName = "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate"
    objectAttr = "Value"
    threshold = 0.64
  }
}
```
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the sentinel is activated (default: true) |
level | String | Level of the notification (default: WARNING) |
label | String | Label used in the notification title (default: the name of the sentinel) |
objectName | String | Name of the JMX object |
objectAttr | String | Attribute name of the JMX object that contains the metrics value |
window-size | Integer | The size of the buffer (default: 1) |
message | String | A custom message for the notification. This message will be prepended to the static message generated by argos. (default: None) |
threshold | Double | The maximum value authorized (default: 0.0) |
precision | Double | The precision used to test whether two double values are equal (default: 0.01) |
Each sentinel may configure the period between two notifications to avoid flooding the alert receiver. By default, the period between two alerts is 15 minutes (except for disk capacity, where it is 4 hours).
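A sketch of overriding the period for one sentinel. The period key and its duration syntax are taken from the full configuration sample; the 30 minutes value is only an example.

```hocon
argos {
  sentinel {
    load-avg {
      enabled = true
      # send at most one load-avg notification every 30 minutes
      period = 30 minutes
    }
  }
}
```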
A notifier is an object that manages the notification messages sent by the sentinels.
Each notifier configuration must have the providerClass key, which specifies the implementation of the NotifierProvider trait.
With this trait, the provider has to implement a props method that returns the Props object used to create the Notifier actor.
Whatever the notifier implementation, there are two common configuration entries available to filter the notifications:
Parameter | Type | Description |
---|---|---|
white-list | List[String] | The notifier will manage only the notifications sent by the sentinels present in this list |
black-list | List[String] | The notifier will ignore the notifications sent by the sentinels present in this list |
If these lists are empty, the filtering is disabled and all notifications will be managed by the Notifier implementation. If both lists are filled, only the white list will be used.
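As a sketch, a notifier restricted to two sentinels could look like this. The sentinel names come from the full configuration sample; the other mail settings are omitted for brevity and would still be required.

```hocon
argos {
  notifiers {
    mail {
      providerClass = "io.argos.agent.notifiers.MailNotifierProvider"
      # only notifications from these sentinels reach the mail notifier
      white-list = ["consitency-level", "notification-jmx"]
      # not used here, because the white list is filled
      black-list = ["load-avg"]
    }
  }
}
```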
Currently, there is only one notifier, named mail.
Parameter | Type | Description |
---|---|---|
providerClass | String | The provider class io.argos.agent.notifiers.MailNotifierProvider |
smtp-host | String | Hostname of the SMTP service |
smtp-port | String | Port of the SMTP service |
from | String | The email address specified in the from header |
recipients | List[String] | List of recipients that will receive the notification |
To configure your Argos instance in local mode, configure the Akka system with the local provider.
```hocon
akka {
  actor {
    provider = local
  }
}
```
The orchestrator and the gateway must be switched off.
```hocon
argos {
  orchestrator {
    enable = false
  }
  gateway {
    enable = false
  }
}
```
To configure your Argos instance in remote mode, configure the Akka system with the remote provider, specifying the hostname and TCP port on which the process will bind and listen.
```hocon
akka {
  actor {
    provider = remote
  }
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "a.b.c.d"
      port = 9000
    }
  }
}
```
The orchestrator and the gateway must be switched on.
The orchestrator exposes two configuration attributes in order to specify the HTTP endpoint configuration :
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the orchestrator is activated |
http-hostname | String | The hostname/IP address to which the HTTP endpoint will be bound |
http-port | Integer | The HTTP port on which the process will listen |
The gateway exposes two configuration attributes in order to specify the orchestrator contact point configuration :
Parameter | Type | Description |
---|---|---|
enabled | Boolean | Specifies whether the gateway is activated |
orchestrator-hostname | String | The hostname/IP address of the orchestrator (the orchestrator should be unique, so this may be a remote IP: the one defined in the akka remote section) |
orchestrator-port | Integer | The TCP port on which the orchestrator process is listening (the one defined in the akka remote section) |
```hocon
argos {
  orchestrator {
    enable = true
    http-hostname = "a.b.c.d"
    http-port = 8080
  }
  gateway {
    enable = true
    http-hostname = "a.b.c.d"
    http-port = 9000
  }
}
```
```hocon
akka {
  loglevel = "INFO"
  actor {
    provider = remote
  }
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "a.b.c.d"
      port = 9000
    }
  }
}
argos {
  scheduler-interval = 5 seconds
  cassandra-version = "2.1"
  metrics {
    jmx-host = "127.0.0.1"
    jmx-port = 7100
  }
  orchestrator {
    enable = true
    http-hostname = "a.b.c.d"
    http-port = 8080
  }
  gateway {
    enable = true
    http-hostname = "a.b.c.d"
    http-port = 9000
  }
  sentinel {
    load-avg {
      enabled = true
      threshold = 20.0
      level = "CRITIC"
      label = "Load Average"
    }
    consitency-level {
      enabled = true
      period = 5 minutes
      level = "CRITIC"
      label = "Consitency Level"
      keyspaces = [
        {
          name = "excelsior"
          cl = "quorum"
        },
        {
          name = "excelsior"
          cl = "local_one"
        },
        {
          name = "excelsior"
          cl = "all"
        }
      ]
    }
    consistency-repaired-blocking {
      enabled = true
      level = "WARNING"
      label = "Blocking Read repairs"
    }
    consistency-repaired-background {
      enabled = true
      level = "WARNING"
      label = "Background Read repairs"
    }
    connection-timeouts {
      enabled = true
      level = "WARNING"
      label = "Connection Timeouts"
    }
    dropped-counter {
      enabled = true
      level = "WARNING"
      label = "Dropped Counter Mutation"
    }
    dropped-mutation {
      enabled = true
      level = "WARNING"
      label = "Dropped Mutation"
    }
    dropped-read {
      enabled = true
      level = "WARNING"
      label = "Dropped Read"
    }
    dropped-read-repair {
      enabled = true
      level = "WARNING"
      label = "Dropped ReadRepair"
    }
    dropped-range-slice {
      enabled = true
      level = "WARNING"
      label = "Dropped Range Slice"
    }
    dropped-request-response {
      enabled = true
      level = "WARNING"
      label = "Dropped Request Response"
    }
    dropped-page-range {
      enabled = true
      level = "WARNING"
      label = "Dropped Page Range"
    }
    storage-space {
      enabled = true
      level = "CRITIC"
      label = "Few Disk Space"
      threshold = 50
      commitlog-threshold = 5
    }
    storage-exception {
      enabled = true
      level = "CRITIC"
      label = "Cassandra Storage Exception"
    }
    storage-hints {
      enabled = true
      level = "CRITIC"
      label = "Network partition"
    }
    gc-inspector {
      enabled = true
      level = "WARNING"
      label = "GC too long"
      threshold = 200
    }
    blocked-stage-counter {
      enabled = true
      level = "WARNING"
      label = "Stage counter mutation"
    }
    blocked-stage-gossip {
      enabled = true
      level = "WARNING"
      label = "Stage gossip"
    }
    blocked-stage-internal {
      enabled = true
      level = "WARNING"
      label = "Stage Internal Response"
    }
    blocked-stage-memtable {
      enabled = true
      level = "WARNING"
      label = "Stage Memtable Write Flusher"
    }
    blocked-stage-mutation {
      enabled = true
      level = "WARNING"
      label = "Stage Mutation"
    }
    blocked-stage-read {
      enabled = true
      level = "WARNING"
      label = "Stage Read"
    }
    blocked-stage-read-repair {
      enabled = true
      level = "WARNING"
      label = "Stage ReadRepair"
    }
    blocked-stage-request-response {
      enabled = true
      level = "WARNING"
      label = "Stage Request Response"
    }
    blocked-stage-compaction {
      enabled = true
      level = "WARNING"
      label = "Stage Compaction Executor"
    }
    notification-jmx {
      enabled = true
      level = "INFO"
      label = "Progress Event"
    }
    pending-stage-counter {
      enabled = true
      level = "INFO"
      label = "Stage counter mutation - pending"
      threshold = 25
      window-size = 10
    }
    pending-stage-gossip {
      enabled = true
      level = "INFO"
      label = "Stage gossip - pending"
      threshold = 25
      window-size = 10
    }
    pending-stage-internal {
      enabled = true
      level = "INFO"
      label = "Stage Internal Response - pending"
      threshold = 25
      window-size = 10
    }
    pending-stage-memtable {
      enabled = true
      level = "INFO"
      label = "Stage Memtable Write Flusher - pending"
      threshold = 25
      window-size = 10
    }
    pending-stage-mutation {
      enabled = true
      level = "INFO"
      label = "Stage Mutation - pending"
      threshold = 25
      window-size = 10
    }
    pending-stage-read {
      enabled = true
      level = "INFO"
      label = "Stage Read - pending"
      threshold = 25
      window-size = 10
    }
    pending-stage-read-repair {
      enabled = true
      level = "INFO"
      label = "Stage ReadRepair - pending"
      threshold = 25
      window-size = 10
    }
    pending-stage-request-response {
      enabled = true
      level = "INFO"
      label = "Stage Request Response - pending"
      threshold = 25
      window-size = 10
    }
    pending-stage-compaction {
      enabled = true
      level = "INFO"
      label = "Stage Compaction Executor - pending"
      threshold = 25
      window-size = 10
    }
  }
  notifiers {
    mail {
      providerClass = "io.argos.agent.notifiers.MailNotifierProvider"
      smtp-host = "127.0.0.1"
      smtp-port = "25"
      from = "cassandra-agent@no-reply"
      recipients = ["[email protected]", "[email protected]"]
      //white-list= ["consitency-level", "notification-jmx"]
      //black-list= ["consitency-level", "notification-jmx"]
    }
  }
}
```