
kedark3/cpa

Latest commit

0382386 · Nov 23, 2021

CPA - Continuous Performance Analysis

This tool allows OpenShift users to run a watcher for Prometheus queries and define thresholds (in a yaml file) to observe the performance of an OpenShift cluster during performance testing. It could be generalized to run constantly against a cluster and alert you when the cluster looks unhealthy. It may sound like other monitoring and alerting solutions, but it is meant to be simple, scalable and user-friendly.
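To make "define thresholds" concrete, here is a minimal Go sketch of checking one query result against a comparison taken from the yaml file. The Threshold type, its Op/Limit fields and the set of operators are hypothetical; this README does not document the actual schema of config/queries.yaml.

```go
package main

import "fmt"

// Threshold is a hypothetical representation of one expected outcome from
// the queries file: the query result must satisfy "<observed> <Op> <Limit>".
type Threshold struct {
	Op    string  // one of "<", "<=", ">", ">=", "==", "!="
	Limit float64 // value the query result is compared against
}

// Holds reports whether the observed value satisfies the threshold.
// An unknown operator is reported as an error rather than silently passing.
func (t Threshold) Holds(observed float64) (bool, error) {
	switch t.Op {
	case "<":
		return observed < t.Limit, nil
	case "<=":
		return observed <= t.Limit, nil
	case ">":
		return observed > t.Limit, nil
	case ">=":
		return observed >= t.Limit, nil
	case "==":
		return observed == t.Limit, nil
	case "!=":
		return observed != t.Limit, nil
	default:
		return false, fmt.Errorf("unsupported operator %q", t.Op)
	}
}

func main() {
	// e.g. a hypothetical rule "leader changes in the last 5m must stay below 1"
	t := Threshold{Op: "<", Limit: 1}
	ok, err := t.Holds(0)
	fmt.Println(ok, err)
}
```

A false result from Holds is what would count as a failed check, i.e. the "False" value that triggers notifications.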

Why use CPA:

  • Runs externally to the cluster and can work with any Prometheus instance
  • Can be extended to run queries against sources other than Prometheus, such as Elasticsearch, or simple oc CLI commands
  • The history of each run can be stored in log files
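Because CPA talks to Prometheus from outside the cluster, everything it needs is an HTTP endpoint and a token. The sketch below shows one way to issue an instant query against the standard Prometheus HTTP API (GET /api/v1/query) with a bearer token and decode a vector result. The helper names newQueryRequest and parseVector are hypothetical, and CPA's own client code may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// Sample is one element of an instant-vector result.
type Sample struct {
	Metric map[string]string
	Value  float64
}

// newQueryRequest builds a GET request for the Prometheus instant-query
// endpoint, authenticated with a bearer token.
func newQueryRequest(baseURL, token, promQL string) (*http.Request, error) {
	u := strings.TrimSuffix(baseURL, "/") + "/api/v1/query?" + url.Values{"query": {promQL}}.Encode()
	req, err := http.NewRequest(http.MethodGet, u, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

// parseVector decodes a query response whose resultType is "vector".
func parseVector(body []byte) ([]Sample, error) {
	var resp struct {
		Status string `json:"status"`
		Error  string `json:"error"`
		Data   struct {
			ResultType string `json:"resultType"`
			Result     []struct {
				Metric map[string]string `json:"metric"`
				Value  [2]interface{}    `json:"value"` // [unix timestamp, "value as string"]
			} `json:"result"`
		} `json:"data"`
	}
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	if resp.Status != "success" {
		return nil, fmt.Errorf("query failed: %s", resp.Error)
	}
	if resp.Data.ResultType != "vector" {
		return nil, fmt.Errorf("unexpected resultType %q", resp.Data.ResultType)
	}
	samples := make([]Sample, 0, len(resp.Data.Result))
	for _, r := range resp.Data.Result {
		s, ok := r.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("sample value is not a string")
		}
		v, err := strconv.ParseFloat(s, 64) // also accepts "NaN" and "+Inf"
		if err != nil {
			return nil, err
		}
		samples = append(samples, Sample{Metric: r.Metric, Value: v})
	}
	return samples, nil
}

func main() {
	// Placeholder URL and token; sending would be http.DefaultClient.Do(req).
	req, _ := newQueryRequest("https://prometheus.example.com", "TOKEN", "up")
	fmt.Println(req.URL.String())
}
```

Prometheus returns sample values as strings, which is why the value is parsed with strconv.ParseFloat rather than decoded as a number.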

Design:

                        ┌─────────────────────┐                             ┌───────────────────────────┐
                        │                     │                             │           OpenShift       │
                        │ Benchmark Job       │                             │                           │
                        │                     │                             │     ┌───────────────┐     │
                        │ (optional)          │                             │     │ Prometheus    │     │
                        └────────▲────────────┘                             │     │       ▲       │     │
                        └────────▲────────────┘                             │     │       ▲       │     │
                         Ability to kill benchmark job                      │             │             │
                                 │                                          │             │             │
                                 │                                          └─────────────┼─────────────┘
                        ┌────────┴────────────┐                                           │
┌─────────────────┐     │                     │        Determines Url and Token           │
│                 │     │ Continuous Perf     ├───────────────────────────────────────────┘
│  Slack Notifs.  ◄─────┤   Analysis - CPA    │            Runs Queries
│                 │     │                     │
│                 │     │                     │                              ┌──────────────────────────┐
└─────────────────┘     └───────┬─────────────┘                              │                          │
                                │                                            │                          │
                                │         Requires Url and Token             │  Prometheus - external   │
                                └───────────────────────────────────────────►│                          │
                                              Runs Queries                   │                          │
                                                                             └──────────────────────────┘
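The diagram shows CPA either determining the Prometheus URL and token from the OpenShift cluster or being given them for an external Prometheus. A minimal sketch of that decision in Go, assuming values from the config file take precedence and discovery only fills in what is missing: resolveEndpoint, Endpoint and discoverWithOC are hypothetical names, and reading the prometheus-k8s route in openshift-monitoring plus oc whoami --show-token is one plausible discovery strategy, not necessarily the one CPA uses.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// Endpoint holds what CPA needs to talk to Prometheus.
type Endpoint struct {
	URL   string
	Token string
}

// resolveEndpoint prefers values supplied in the config file and only calls
// discover when the URL or the token is missing; configured values still
// override whatever discovery returns.
func resolveEndpoint(fromConfig Endpoint, discover func() (Endpoint, error)) (Endpoint, error) {
	if fromConfig.URL != "" && fromConfig.Token != "" {
		return fromConfig, nil
	}
	found, err := discover()
	if err != nil {
		return Endpoint{}, err
	}
	if fromConfig.URL != "" {
		found.URL = fromConfig.URL
	}
	if fromConfig.Token != "" {
		found.Token = fromConfig.Token
	}
	return found, nil
}

// discoverWithOC is an assumed strategy: read the in-cluster Prometheus route
// host and the current user's token via the oc CLI (which honours KUBECONFIG).
func discoverWithOC() (Endpoint, error) {
	host, err := exec.Command("oc", "get", "route", "prometheus-k8s",
		"-n", "openshift-monitoring", "-o", "jsonpath={.spec.host}").Output()
	if err != nil {
		return Endpoint{}, fmt.Errorf("looking up Prometheus route: %w", err)
	}
	token, err := exec.Command("oc", "whoami", "--show-token").Output()
	if err != nil {
		return Endpoint{}, fmt.Errorf("reading token: %w", err)
	}
	return Endpoint{
		URL:   "https://" + strings.TrimSpace(string(host)),
		Token: strings.TrimSpace(string(token)),
	}, nil
}

func main() {
	ep, err := resolveEndpoint(Endpoint{}, discoverWithOC)
	if err != nil {
		fmt.Println("discovery failed:", err)
		return
	}
	fmt.Println("using Prometheus at", ep.URL)
}
```

Passing discovery in as a function keeps the precedence rule testable without a live cluster.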

Features:

  • Create an oc CLI connection to OpenShift/Kubernetes using a kubeconfig
  • Determine the Prometheus URL and bearer token for OpenShift
  • If the Prometheus URL and bearer token are already included in the yaml, use those
  • Define a yaml format for queries and their expected outcomes (read into a struct)
  • Spawn a goroutine to run queries and analyze the results
  • Spawn a goroutine to receive a notification when a query yields a "False" value
  • Update to the latest Go and recompile
  • Add a CLI to the program
    • Add a parameter to read different query files in the config dir
    • Add a parameter for clearing/not clearing the screen
    • Add a parameter for the timeout
  • Add a Makefile
  • Log the output to a file
  • Print output to the screen simultaneously, even when logging is enabled
  • Let the user decide the query frequency
  • Slack notifications
  • Notify or take action (e.g. pause/kill benchmark jobs to preserve the cluster) when results don't match the conditions
  • Spawn goroutines that keep running queries and evaluating results to handle scale: e.g. when the yaml file contains a very large number of queries, they can be divided up and run concurrently
  • If the Slack config is not set, it is ignored and no attempt is made to notify via Slack
  • Debug/verbose mode
  • Enhance log files to include a UUID/timestamp
  • Use env vars
  • RFE: come up with a basic "cluster health" profile that anyone can use: operator monitoring plus some best-practice monitors from the dittybopper dashboards
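Several features above combine into one pattern: worker goroutines evaluate queries concurrently, failures flow to a single receiver, and Slack is used only when configured. The sketch below is a minimal Go illustration under those assumptions. evaluateAll and slackNotifier are hypothetical names, check stands in for "run the query and compare it with its threshold", and the Slack side assumes a standard incoming webhook that accepts a JSON body with a "text" field.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"sync"
)

// Result is the outcome of evaluating one query against its threshold.
type Result struct {
	Name string
	OK   bool
}

// evaluateAll fans queries out to a fixed pool of worker goroutines. The
// calling goroutine is the single receiver: it collects failed results and
// passes each one to notify. It returns the names of the failed queries.
func evaluateAll(names []string, workers int, check func(string) bool, notify func(string)) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(chan Result)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for name := range jobs {
				results <- Result{Name: name, OK: check(name)}
			}
		}()
	}
	go func() { // feed the queue, then signal that no more jobs are coming
		for _, n := range names {
			jobs <- n
		}
		close(jobs)
	}()
	go func() { // close results once every worker has finished
		wg.Wait()
		close(results)
	}()
	var failed []string
	for r := range results {
		if !r.OK {
			failed = append(failed, r.Name)
			notify(r.Name)
		}
	}
	return failed
}

// slackNotifier posts to a Slack incoming webhook. An empty URL yields a
// no-op, so an unset Slack config is simply ignored.
func slackNotifier(webhookURL string) func(string) {
	if webhookURL == "" {
		return func(string) {}
	}
	return func(query string) {
		body, _ := json.Marshal(map[string]string{"text": "CPA: query failed: " + query})
		resp, err := http.Post(webhookURL, "application/json", bytes.NewReader(body))
		if err != nil {
			fmt.Println("slack notification failed:", err)
			return
		}
		resp.Body.Close()
	}
}

func main() {
	queries := []string{"etcd_leader_changes", "api_latency", "node_ready"} // hypothetical query names
	check := func(name string) bool { return name != "api_latency" }      // stand-in for a real check
	failed := evaluateAll(queries, 2, check, slackNotifier(""))          // empty URL: Slack disabled
	fmt.Println("failed:", failed)
}
```

With more than one worker the order of failures is not guaranteed; only the set of failed queries is.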

Usage:

  • Build the binary with the Makefile: make build, or update your binary with make update. You can clean an existing binary with make clean, or clean and then update/build in one step with make all.
  • Set the KUBECONFIG environment variable, and make sure to review config/queries.yaml.
  • You can then run the following command:
./bin/cpa -t 60s -h
Usage: cpa [--noclrscr] [--queries QUERIES] [--query-frequency QUERY-FREQUENCY] [--timeout TIMEOUT] [--log-output] [--terminate-benchmark TERMINATE-BENCHMARK]

Options:
  --noclrscr             Do not clear screen after each iteration. Clears screen by default. [default: false]
  --queries QUERIES, -q QUERIES
                         queries file to use [default: queries.yaml]
  --query-frequency QUERY-FREQUENCY, -f QUERY-FREQUENCY
                         How often do we run queries. You can pass values like 4h or 1h10m10s [default: 20s]
  --timeout TIMEOUT, -t TIMEOUT
                         Duration to run Continuous Performance Analysis. You can pass values like 4h or 1h10m10s [default: 4h]
  --log-output, -l       Output will be stored in a log file(cpa.log) in addition to stdout. [default: false]
  --terminate-benchmark TERMINATE-BENCHMARK, -k TERMINATE-BENCHMARK
                         When CPA is running in parallel with benchmark job, let CPA know to kill benchmark if any query fail. (E.g. -k <processID>) Helpful to preserve cluster for further analysis.
  --help, -h             display this help and exit