boulder-observer (letsencrypt#5315)
Add configuration driven Prometheus black box metric exporter
beautifulentropy authored Mar 29, 2021
1 parent 1e5d89e commit 97e393d
Showing 19 changed files with 1,495 additions and 0 deletions.
216 changes: 216 additions & 0 deletions cmd/boulder-observer/README.md
@@ -0,0 +1,216 @@
# boulder-observer

A modular, configuration-driven approach to black box monitoring with
Prometheus.

* [boulder-observer](#boulder-observer)
  * [Usage](#usage)
    * [Options](#options)
    * [Starting the boulder-observer daemon](#starting-the-boulder-observer-daemon)
  * [Configuration](#configuration)
    * [Root](#root)
      * [Schema](#schema)
      * [Example](#example)
    * [Monitors](#monitors)
      * [Schema](#schema-1)
      * [Example](#example-1)
    * [Probers](#probers)
      * [DNS](#dns)
        * [Schema](#schema-2)
        * [Example](#example-2)
      * [HTTP](#http)
        * [Schema](#schema-3)
        * [Example](#example-3)
  * [Metrics](#metrics)
    * [obs_monitors](#obs_monitors)
    * [obs_observations](#obs_observations)
  * [Development](#development)
    * [Starting Prometheus locally](#starting-prometheus-locally)
    * [Viewing metrics locally](#viewing-metrics-locally)

## Usage

### Options

```shell
$ ./boulder-observer -help
  -config string
    	Path to boulder-observer configuration file (default "config.yml")
```

### Starting the boulder-observer daemon

```shell
$ ./boulder-observer -config test/config-next/observer.yml
I152525 boulder-observer _KzylQI Versions: main=(Unspecified Unspecified) Golang=(go1.16.2) BuildHost=(Unspecified)
I152525 boulder-observer q_D84gk Initializing boulder-observer daemon from config: test/config-next/observer.yml
I152525 boulder-observer 7aq68AQ all monitors passed validation
I152527 boulder-observer yaefiAw kind=[HTTP] success=[true] duration=[0.130097] name=[https://letsencrypt.org-[200]]
I152527 boulder-observer 65CuDAA kind=[HTTP] success=[true] duration=[0.148633] name=[http://letsencrypt.org/foo-[200 404]]
I152530 boulder-observer idi4rwE kind=[DNS] success=[false] duration=[0.000093] name=[[2606:4700:4700::1111]:53-udp-A-google.com-recurse]
I152530 boulder-observer prOnrw8 kind=[DNS] success=[false] duration=[0.000242] name=[[2606:4700:4700::1111]:53-tcp-A-google.com-recurse]
I152530 boulder-observer 6uXugQw kind=[DNS] success=[true] duration=[0.022962] name=[1.1.1.1:53-udp-A-google.com-recurse]
I152530 boulder-observer to7h-wo kind=[DNS] success=[true] duration=[0.029860] name=[owen.ns.cloudflare.com:53-udp-A-letsencrypt.org-no-recurse]
I152530 boulder-observer ovDorAY kind=[DNS] success=[true] duration=[0.033820] name=[owen.ns.cloudflare.com:53-tcp-A-letsencrypt.org-no-recurse]
...
```

## Configuration

Configuration is provided via a YAML file.

### Root

#### Schema

`debugaddr`: The Prometheus scrape port prefixed with a single colon
(e.g. `:8040`).

`buckets`: List of floats representing Prometheus histogram buckets (e.g.
`[.001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10]`).

`syslog`: Map of log levels, see schema below.

- `stdoutlevel`: Log level for stdout, see legend below.
- `sysloglevel`: Log level for syslog, see legend below.

`0`: *EMERG* `1`: *ALERT* `2`: *CRIT* `3`: *ERR* `4`: *WARN* `5`:
*NOTICE* `6`: *INFO* `7`: *DEBUG*

`monitors`: List of monitors, see [monitors](#monitors) for schema.

#### Example

```yaml
debugaddr: :8040
buckets: [.001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10]
syslog:
  stdoutlevel: 6
  sysloglevel: 6
monitors:
  -
    ...
```

### Monitors

#### Schema

`period`: Interval between probing attempts (e.g. `1s` `1m` `1h`).

`kind`: Kind of prober to use, see [probers](#probers) for schema.

`settings`: Map of prober settings, see [probers](#probers) for schema.

#### Example

```yaml
monitors:
  -
    period: 5s
    kind: DNS
    settings:
      ...
```
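
The `period` rule can be tried out in isolation. The sketch below is a minimal illustration, not the project's code: `parsePeriod` is a hypothetical helper, and it assumes period strings follow Go's `time.ParseDuration` syntax, which matches the `1s`, `1m`, `1h` examples above. The 1µs lower bound mirrors the check in `validatePeriod`.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// parsePeriod is a hypothetical helper (not part of boulder-observer).
// It parses a period string such as "5s" and rejects anything shorter
// than 1µs, the same lower bound that validatePeriod enforces.
func parsePeriod(s string) (time.Duration, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, fmt.Errorf("invalid period %q: %w", s, err)
	}
	if d < time.Microsecond {
		return 0, errors.New("period must be at least 1µs")
	}
	return d, nil
}

func main() {
	// "1ns" is too short and "soon" is not a duration; both yield errors.
	for _, s := range []string{"5s", "1m", "1ns", "soon"} {
		d, err := parsePeriod(s)
		fmt.Printf("input=%q duration=%v err=%v\n", s, d, err)
	}
}
```

Note that an omitted `period` unmarshals to a zero duration, so it fails the same lower-bound check.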

### Probers

#### DNS

##### Schema

`protocol`: Protocol to use, options are: `udp` or `tcp`.

`server`: Hostname, IPv4 address, or IPv6 address surrounded with
brackets + port of the DNS server to send the query to (e.g.
`example.com:53`, `1.1.1.1:53`, or `[2606:4700:4700::1111]:53`).

`recurse`: Bool indicating if recursive resolution is desired.

`query_name`: Name to query (e.g. `example.com`).

`query_type`: Record type to query, options are: `A`, `AAAA`, `TXT`, or
`CAA`.

##### Example

```yaml
monitors:
  -
    period: 5s
    kind: DNS
    settings:
      protocol: tcp
      server: '[2606:4700:4700::1111]:53'
      recurse: false
      query_name: letsencrypt.org
      query_type: A
```
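
The bracket requirement for IPv6 addresses in `server` follows the usual `host:port` convention. As a rough sketch, Go's `net.SplitHostPort` accepts the three forms listed in the schema and rejects an unbracketed IPv6 address. `splitServer` is a hypothetical helper used only for illustration; the prober's actual validation may differ.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// splitServer is a hypothetical helper (not part of boulder-observer).
// It splits a `server` setting into host and port and checks that the
// port is a number in the range 1-65535.
func splitServer(server string) (string, int, error) {
	host, portStr, err := net.SplitHostPort(server)
	if err != nil {
		return "", 0, err
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return "", 0, fmt.Errorf("invalid port %q", portStr)
	}
	return host, port, nil
}

func main() {
	servers := []string{
		"example.com:53",
		"1.1.1.1:53",
		"[2606:4700:4700::1111]:53",
		"2606:4700:4700::1111:53", // missing brackets, rejected
	}
	for _, s := range servers {
		host, port, err := splitServer(s)
		fmt.Printf("server=%q host=%q port=%d err=%v\n", s, host, port, err)
	}
}
```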

#### HTTP

##### Schema

`url`: Scheme + hostname, optionally followed by a path, to send a
request to (e.g. `https://example.com` or `http://example.com/foo`).

`rcodes`: List of expected HTTP response codes.

##### Example

```yaml
monitors:
  -
    period: 2s
    kind: HTTP
    settings:
      url: http://letsencrypt.org/FOO
      rcodes: [200, 404]
```
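
To make the `rcodes` semantics concrete, here is a minimal sketch of a probe that succeeds only when the response status is one of the expected codes. `probeHTTP` and `probeLocal` are hypothetical names, and the use of a plain GET request is an assumption; the real HTTP prober may behave differently. A throwaway `httptest` server stands in for the target so the example runs without external network access.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// probeHTTP is a hypothetical stand-in for an HTTP prober. It sends a
// GET request and reports whether the status code is in rcodes.
func probeHTTP(url string, rcodes []int, timeout time.Duration) bool {
	client := http.Client{Timeout: timeout}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	for _, code := range rcodes {
		if resp.StatusCode == code {
			return true
		}
	}
	return false
}

// probeLocal starts a local test server that always answers with
// status, probes it, and shuts the server down again.
func probeLocal(status int, rcodes []int) bool {
	srv := httptest.NewServer(http.HandlerFunc(
		func(w http.ResponseWriter, _ *http.Request) {
			w.WriteHeader(status)
		}))
	defer srv.Close()
	return probeHTTP(srv.URL, rcodes, time.Second)
}

func main() {
	fmt.Println("404 with rcodes [200 404]:", probeLocal(404, []int{200, 404}))
	fmt.Println("500 with rcodes [200 404]:", probeLocal(500, []int{200, 404}))
}
```

Listing `404` in `rcodes`, as the example above does, lets a monitor treat an expected "not found" response as a successful probe.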

## Metrics

Observer provides the following metrics.

### obs_monitors

Count of configured monitors.

**Labels:**

`kind`: Kind of Prober the monitor is configured to use.

`valid`: Bool indicating whether settings provided could be validated
for the `kind` of Prober specified.

### obs_observations

Histogram of probe durations, in seconds, for each monitor.

**Labels:**

`name`: Name of the monitor.

`kind`: Kind of prober the monitor is configured to use.

`success`: Bool indicating whether the result of the probe attempt was
successful.

**Bucketed response times:**

The duration of each probe, in seconds, is the observed value rather
than a label. The bucket boundaries are configurable, see `buckets`
under [root/schema](#schema).
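
Prometheus histogram buckets are cumulative: an observation is counted in every bucket whose upper bound is greater than or equal to the observed value, plus the implicit `+Inf` bucket. The sketch below shows how a probe duration maps onto a bucket list. `smallestBucket` is a hypothetical helper written with the standard library only, not the Prometheus client, and it assumes the list is sorted in ascending order.

```go
package main

import (
	"fmt"
	"sort"
)

// smallestBucket is a hypothetical helper (not part of boulder-observer).
// It returns the smallest upper bound in buckets that is >= seconds.
// The second result is false when seconds exceeds every bound, meaning
// the observation is only counted in the implicit +Inf bucket.
// buckets must be sorted in ascending order.
func smallestBucket(buckets []float64, seconds float64) (float64, bool) {
	i := sort.SearchFloat64s(buckets, seconds)
	if i == len(buckets) {
		return 0, false
	}
	return buckets[i], true
}

func main() {
	buckets := []float64{.001, .002, .005, .01, .02, .05, .1, .2, .5, 1, 2, 5, 10}
	for _, d := range []float64{0.000093, 0.130097, 12} {
		le, ok := smallestBucket(buckets, d)
		fmt.Printf("duration=%f le=%v bounded=%v\n", d, le, ok)
	}
}
```

Choosing bounds that bracket the expected probe durations keeps the resulting latency distribution informative.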

## Development

### Starting Prometheus locally

Please note, this assumes you've installed a local Prometheus binary.

```shell
prometheus --config.file=boulder/test/prometheus/prometheus.yml
```

### Viewing metrics locally

When developing with a local Prometheus instance you can use this link
to view metrics: [link](http://0.0.0.0:9090)
35 changes: 35 additions & 0 deletions cmd/boulder-observer/main.go
@@ -0,0 +1,35 @@
package main

import (
	"flag"
	"io/ioutil"

	"github.com/letsencrypt/boulder/cmd"
	"github.com/letsencrypt/boulder/observer"
	"gopkg.in/yaml.v2"
)

func main() {
	configPath := flag.String(
		"config", "config.yml", "Path to boulder-observer configuration file")
	flag.Parse()

	configYAML, err := ioutil.ReadFile(*configPath)
	cmd.FailOnError(err, "failed to read config file")

	// Parse the YAML config file.
	var config observer.ObsConf
	err = yaml.Unmarshal(configYAML, &config)
	if err != nil {
		cmd.FailOnError(err, "failed to parse YAML config")
	}

	// Make an `Observer` object.
	observer, err := config.MakeObserver()
	if err != nil {
		cmd.FailOnError(err, "config failed validation")
	}

	// Start the `Observer` daemon.
	observer.Start()
}
63 changes: 63 additions & 0 deletions observer/mon_conf.go
@@ -0,0 +1,63 @@
package observer

import (
	"errors"
	"strings"
	"time"

	"github.com/letsencrypt/boulder/cmd"
	"github.com/letsencrypt/boulder/observer/probers"
	"gopkg.in/yaml.v2"
)

// MonConf is exported to receive YAML configuration in `ObsConf`.
type MonConf struct {
	Period   cmd.ConfigDuration `yaml:"period"`
	Kind     string             `yaml:"kind"`
	Settings probers.Settings   `yaml:"settings"`
}

// validatePeriod ensures the received `Period` field is at least 1µs.
func (c *MonConf) validatePeriod() error {
	if c.Period.Duration < 1*time.Microsecond {
		return errors.New("period must be at least 1µs")
	}
	return nil
}

// unmarshalConfigurer constructs a `Configurer` by marshaling the
// value of the `Settings` field back to bytes, then passing it to the
// `UnmarshalSettings` method of the `Configurer` type specified by the
// `Kind` field.
func (c MonConf) unmarshalConfigurer() (probers.Configurer, error) {
	kind := strings.Trim(strings.ToLower(c.Kind), " ")
	configurer, err := probers.GetConfigurer(kind)
	if err != nil {
		return nil, err
	}
	settings, _ := yaml.Marshal(c.Settings)
	configurer, err = configurer.UnmarshalSettings(settings)
	if err != nil {
		return nil, err
	}
	return configurer, nil
}

// makeMonitor constructs a `monitor` object from the contents of the
// bound `MonConf`. If the `MonConf` cannot be validated, an error
// appropriate for end-user consumption is returned instead.
func (c MonConf) makeMonitor() (*monitor, error) {
	err := c.validatePeriod()
	if err != nil {
		return nil, err
	}
	probeConf, err := c.unmarshalConfigurer()
	if err != nil {
		return nil, err
	}
	prober, err := probeConf.MakeProber()
	if err != nil {
		return nil, err
	}
	return &monitor{c.Period.Duration, prober}, nil
}
33 changes: 33 additions & 0 deletions observer/mon_conf_test.go
@@ -0,0 +1,33 @@
package observer

import (
	"testing"
	"time"

	"github.com/letsencrypt/boulder/cmd"
)

func TestMonConf_validatePeriod(t *testing.T) {
	type fields struct {
		Period cmd.ConfigDuration
	}
	tests := []struct {
		name    string
		fields  fields
		wantErr bool
	}{
		{"valid", fields{cmd.ConfigDuration{Duration: 1 * time.Microsecond}}, false},
		{"1 nanosecond", fields{cmd.ConfigDuration{Duration: 1 * time.Nanosecond}}, true},
		{"none supplied", fields{cmd.ConfigDuration{}}, true},
	}
	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			c := &MonConf{
				Period: tt.fields.Period,
			}
			if err := c.validatePeriod(); (err != nil) != tt.wantErr {
				t.Errorf("MonConf.validatePeriod() error = %v, wantErr %v", err, tt.wantErr)
			}
		})
	}
}
40 changes: 40 additions & 0 deletions observer/monitor.go
@@ -0,0 +1,40 @@
package observer

import (
	"strconv"
	"time"

	blog "github.com/letsencrypt/boulder/log"
	"github.com/letsencrypt/boulder/observer/probers"
)

type monitor struct {
	period time.Duration
	prober probers.Prober
}

// start spins off a 'Prober' goroutine on an interval of `m.period`
// with a timeout of half `m.period`
func (m monitor) start(logger blog.Logger) {
	ticker := time.NewTicker(m.period)
	timeout := m.period / 2
	go func() {
		for {
			select {
			case <-ticker.C:
				// Attempt to probe the configured target.
				success, dur := m.prober.Probe(timeout)

				// Produce metrics to be scraped by Prometheus.
				histObservations.WithLabelValues(
					m.prober.Name(), m.prober.Kind(), strconv.FormatBool(success),
				).Observe(dur.Seconds())

				// Log the outcome of the probe attempt.
				logger.Infof(
					"kind=[%s] success=[%v] duration=[%f] name=[%s]",
					m.prober.Kind(), success, dur.Seconds(), m.prober.Name())
			}
		}
	}()
}