Prometheus monitoring for Kitex


Abstract

The approximate workflow of Prometheus:

The Prometheus server periodically pulls metrics from configured jobs or exporters (pull mode), pulls metrics that short-lived jobs have pushed to a Pushgateway (push mode), or fetches metrics from other Prometheus servers (federation).
The Prometheus server stores the collected metrics locally and evaluates the configured rules (alert.rules), either recording new time series or sending alerts to Alertmanager.
Alertmanager processes the alerts it receives according to its configuration file and sends notifications.
The collected data can be visualized in a graphical interface, for example by integrating with Grafana.

Data Model

The data stored in Prometheus consists of time series, each uniquely identified by a metric name and a set of labels (key-value pairs); every distinct combination of label values forms a separate time series.

name: Typically describes what the metric measures; metric names consist of ASCII letters, digits, underscores, and colons, and must match the regular expression [a-zA-Z_:][a-zA-Z0-9_:]*.
tag (label): Identifies feature dimensions used for filtering and aggregation, for example PSM and method information. Label names consist of ASCII letters, digits, and underscores (no colons), and must match the regular expression [a-zA-Z_][a-zA-Z0-9_]*.
sample: A single data point in a time series, consisting of a float64 value and a timestamp with millisecond precision.
metric: Represented in the following format: <metric name>{<label name>=<label value>, ...}
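The naming rules and the series notation above can be checked mechanically. The sketch below validates a metric name and its label names against the two regular expressions, then renders the series in the <metric name>{<label name>=<label value>, ...} form. formatSeries and the label values are hypothetical, and Go's %q quoting is close to, but not identical with, Prometheus label-value escaping.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

var (
	// Metric names may contain colons; label names may not.
	metricNameRE = regexp.MustCompile(`^[a-zA-Z_:][a-zA-Z0-9_:]*$`)
	labelNameRE  = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)
)

// formatSeries renders name{label="value", ...}, sorting label names so the
// same label set always produces the same string.
func formatSeries(name string, labels map[string]string) (string, error) {
	if !metricNameRE.MatchString(name) {
		return "", fmt.Errorf("invalid metric name %q", name)
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		if !labelNameRE.MatchString(k) {
			return "", fmt.Errorf("invalid label name %q", k)
		}
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, len(keys))
	for i, k := range keys {
		pairs[i] = fmt.Sprintf("%s=%q", k, labels[k])
	}
	return name + "{" + strings.Join(pairs, ", ") + "}", nil
}

func main() {
	s, err := formatSeries("kitex_client_throughput", map[string]string{
		"method": "Echo", "status": "succeed", "callee": "EchoServer",
	})
	fmt.Println(s, err)
	// kitex_client_throughput{callee="EchoServer", method="Echo", status="succeed"} <nil>

	_, err = formatSeries("kitex_client_throughput", map[string]string{"bad:label": "x"})
	fmt.Println(err) // invalid label name "bad:label"
}
```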

Metric Types

Counter

A cumulative metric that only increases (it resets to zero when the process restarts); typical applications include counting requests, completed tasks, and errors.
Corresponds to gopkg/metrics' EmitCounter.

Gauge

A metric that represents a current value; typical applications include the number of running goroutines.
It can be increased or decreased arbitrarily.
Corresponds to gopkg/metrics' EmitStore.
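Counter and Gauge differ mainly in which updates they allow. The sketch below models that difference with hypothetical Counter and Gauge types in plain Go; it is not the gopkg/metrics or Prometheus client API (the official Prometheus Go client, for example, panics on a negative counter increment instead of returning an error).

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Counter is a hypothetical monotonically increasing metric.
type Counter struct {
	mu sync.Mutex
	v  float64
}

// Add rejects negative deltas, because a counter must never decrease.
func (c *Counter) Add(delta float64) error {
	if delta < 0 {
		return errors.New("counter cannot decrease")
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.v += delta
	return nil
}

func (c *Counter) Value() float64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.v
}

// Gauge is a hypothetical metric that may move in either direction.
type Gauge struct {
	mu sync.Mutex
	v  float64
}

func (g *Gauge) Set(v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.v = v
}

func (g *Gauge) Add(delta float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.v += delta
}

func (g *Gauge) Value() float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	return g.v
}

func main() {
	var requests Counter
	_ = requests.Add(1)
	_ = requests.Add(2)
	if err := requests.Add(-1); err != nil {
		fmt.Println("rejected:", err) // rejected: counter cannot decrease
	}

	var goroutines Gauge
	goroutines.Set(10)
	goroutines.Add(-3)

	fmt.Println(requests.Value(), goroutines.Value()) // 3 7
}
```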

Histogram

Samples observations (for example, request latencies) and counts them in configurable buckets for statistical analysis of their distribution; typical applications include pct99 latency and averages.
Alongside the bucket counts, it also records the sum and count of all observed values.
Corresponds to gopkg/metrics' EmitTimer.
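To make the bucket idea concrete, here is a minimal sketch of how a Prometheus-style histogram accumulates observations: each bucket counts observations less than or equal to its upper bound ("le"), an implicit +Inf bucket catches everything, and the sum and count are kept alongside. The Histogram type and the bucket bounds are hypothetical; real client libraries implement this for you.

```go
package main

import (
	"fmt"
	"sort"
)

// Histogram keeps per-bucket counts like a Prometheus histogram.
type Histogram struct {
	bounds []float64 // finite upper bounds, ascending
	counts []uint64  // per-bucket counts; the last slot is the +Inf bucket
	sum    float64
	count  uint64
}

func NewHistogram(bounds []float64) *Histogram {
	b := append([]float64(nil), bounds...)
	sort.Float64s(b)
	return &Histogram{bounds: b, counts: make([]uint64, len(b)+1)}
}

// Observe puts v into the first bucket whose upper bound is >= v
// (index len(bounds) means the +Inf bucket).
func (h *Histogram) Observe(v float64) {
	i := sort.SearchFloat64s(h.bounds, v)
	h.counts[i]++
	h.sum += v
	h.count++
}

// Cumulative returns running totals, which is how _bucket{le="..."} series are exported.
func (h *Histogram) Cumulative() []uint64 {
	out := make([]uint64, len(h.counts))
	var running uint64
	for i, c := range h.counts {
		running += c
		out[i] = running
	}
	return out
}

func main() {
	h := NewHistogram([]float64{100, 500, 1000}) // hypothetical bounds in microseconds
	for _, v := range []float64{50, 120, 480, 900, 2500} {
		h.Observe(v)
	}
	fmt.Println(h.Cumulative(), h.sum, h.count) // [1 3 4 5] 4050 5
}
```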

Summary

Similar to Histogram, it provides the count and sum of observed values.
It also provides quantiles (percentiles) of the tracked observations.
Summary quantiles are calculated directly on the client side, so querying them via PromQL is cheap, but the client bears the computation cost and the resulting quantiles cannot be meaningfully aggregated across instances. Histogram quantiles are instead computed on the Prometheus server at query time (with histogram_quantile), which costs more when querying but less on the client.

Labels

type - Request type
- pingpong - Single request, single response
- oneway - Single request, no response
- streaming - Multiple requests, multiple responses
caller - Requesting service name
callee - Requested service name
method - Request method name
status - Status after a complete RPC:
- succeed - Request successful
- error - Request failed

Metrics

Total number of requests handled by the Client:
- Name: kitex_client_throughput
- Type: Counter
- Tags: type, caller, callee, method, status
Latency of request handling at the Client (response received time minus request initiation time, in microseconds):
- Name: kitex_client_latency_us
- Type: Histogram
- Tags: type, caller, callee, method, status
Total number of requests handled by the Server:
- Name: kitex_server_throughput
- Type: Counter
- Tags: type, caller, callee, method, status
Latency of request handling at the Server (processing completion time minus request received time, in microseconds):
- Name: kitex_server_latency_us
- Type: Histogram
- Tags: type, caller, callee, method, status

Useful Examples

For Prometheus query syntax, refer to the "Querying basics" page of the Prometheus documentation. Here are some commonly used examples:

Client throughput of successful requests

sum(rate(kitex_client_throughput{status="succeed"}[1m])) by (callee,method)
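The query above combines rate(), which turns an ever-increasing counter into a per-second value, with sum ... by, which adds the per-series rates within each group of label values. The sketch below mimics that pipeline in plain Go on in-memory samples. It is simplified: unlike PromQL's rate(), it does not extrapolate to the window boundaries, and the sample values, series, and function names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sample is one scraped counter value at a timestamp in seconds.
type sample struct {
	ts, v float64
}

// series is one time series: a label set plus its samples inside the window.
type series struct {
	labels  map[string]string
	samples []sample
}

// simpleRate returns the per-second increase across samples, treating any
// drop in value as a counter reset. Unlike PromQL rate(), no extrapolation.
func simpleRate(s []sample) float64 {
	if len(s) < 2 {
		return 0
	}
	dt := s[len(s)-1].ts - s[0].ts
	if dt <= 0 {
		return 0
	}
	var inc float64
	for i := 1; i < len(s); i++ {
		d := s[i].v - s[i-1].v
		if d < 0 {
			d = s[i].v // counter restarted from zero
		}
		inc += d
	}
	return inc / dt
}

// sumRateBy mimics sum(rate(metric{status="succeed"}[window])) by (keys...).
func sumRateBy(all []series, keys ...string) map[string]float64 {
	out := make(map[string]float64)
	for _, s := range all {
		if s.labels["status"] != "succeed" {
			continue
		}
		parts := make([]string, len(keys))
		for i, k := range keys {
			parts[i] = k + "=" + s.labels[k]
		}
		out[strings.Join(parts, ",")] += simpleRate(s.samples)
	}
	return out
}

func main() {
	data := []series{
		{map[string]string{"caller": "a", "callee": "echo", "method": "Echo", "status": "succeed"},
			[]sample{{0, 100}, {30, 130}, {60, 160}}},
		{map[string]string{"caller": "b", "callee": "echo", "method": "Echo", "status": "succeed"},
			[]sample{{0, 10}, {60, 70}}},
		{map[string]string{"caller": "a", "callee": "echo", "method": "Echo", "status": "error"},
			[]sample{{0, 0}, {60, 600}}},
	}
	res := sumRateBy(data, "callee", "method")
	groups := make([]string, 0, len(res))
	for g := range res {
		groups = append(groups, g)
	}
	sort.Strings(groups)
	for _, g := range groups {
		fmt.Printf("%s -> %.2f req/s\n", g, res[g]) // callee=echo,method=Echo -> 2.00 req/s
	}
}
```

Grouping only by callee and method is what merges the two callers into one line; adding caller to the by clause would keep them apart.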

Client latency pct99 of successful requests

histogram_quantile(0.99,
sum(rate(kitex_client_latency_us_bucket{status="succeed"}[1m])) by (caller,callee,method,le)
)

Server throughput of successful requests

sum(rate(kitex_server_throughput{status="succeed"}[1m])) by (caller,callee,method)

Server latency pct99 of successful requests

histogram_quantile(0.99,
sum(rate(kitex_server_latency_us_bucket{status="succeed"}[5m])) by (caller,callee,method,le)
)

Pingpong request error rate

sum(rate(kitex_server_throughput{type="pingpong",status="error"}[1m])) by (callee,method)

Usage Example

Client

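import (
    "log"

    kClient "github.com/cloudwego/kitex/client"
    prometheus "github.com/kitex-contrib/monitor-prometheus"
)

...
	// testClient is the client package generated by Kitex for your IDL.
	// The tracer exposes client metrics at :9091/kitexclient for Prometheus to scrape.
	client, err := testClient.NewClient(
		"DestServiceName",
		kClient.WithTracer(prometheus.NewClientTracer(":9091", "/kitexclient")))
	if err != nil {
		log.Fatal(err)
	}

	resp, err := client.Send(ctx, req)
	if err != nil {
		log.Println(err)
	}
...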

Server

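import (
    "log"

    kServer "github.com/cloudwego/kitex/server"
    prometheus "github.com/kitex-contrib/monitor-prometheus"
)

func main() {
...
	// api is the server package generated by Kitex; myServiceImpl implements your service.
	// The tracer exposes server metrics at :9092/kitexserver for Prometheus to scrape.
	svr := api.NewServer(
		&myServiceImpl{},
		kServer.WithTracer(prometheus.NewServerTracer(":9092", "/kitexserver")))
	if err := svr.Run(); err != nil {
		log.Println(err)
	}
...
}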

Visualization Interface

Installing Prometheus

Download and install the Prometheus server by following the official documentation.
Edit prometheus.yml and modify the scrape_configs section:

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"

# Scrape configurations: one job per Kitex tracer endpoint.
# targets and metrics_path must match the address and path passed to NewClientTracer / NewServerTracer.
scrape_configs:
- job_name: 'kitexclient'
  scrape_interval: 1s
  metrics_path: /kitexclient
  static_configs:
  - targets: ['localhost:9091'] # scrape data endpoint
- job_name: 'kitexserver'
  scrape_interval: 1s
  metrics_path: /kitexserver
  static_configs:
  - targets: ['localhost:9092'] # scrape data endpoint

Start Prometheus:

prometheus --config.file=prometheus.yml --web.listen-address="0.0.0.0:9090"

Open http://localhost:9090/targets in your browser to check the status of the configured scrape targets.
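Besides the web UI, Prometheus exposes an HTTP API, and instant queries can be sent to its /api/v1/query endpoint. The sketch below builds such a request for one of the example queries and assumes Prometheus is listening on localhost:9090 as started above; the queryURL helper is hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"time"
)

// queryURL builds an instant-query URL for the Prometheus HTTP API.
func queryURL(base, promql string) string {
	v := url.Values{}
	v.Set("query", promql) // url.Values escapes braces, quotes, and spaces
	return base + "/api/v1/query?" + v.Encode()
}

func main() {
	u := queryURL("http://localhost:9090",
		`sum(rate(kitex_client_throughput{status="succeed"}[1m])) by (callee,method)`)

	c := &http.Client{Timeout: 5 * time.Second}
	resp, err := c.Get(u)
	if err != nil {
		fmt.Println("query failed (is Prometheus running?):", err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Println(resp.Status)
	fmt.Println(string(body)) // JSON response from Prometheus
}
```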

Installing Grafana

Download and install Grafana by following the official website.
Open http://localhost:3000 in your browser; the default username and password are both admin.
Configure the data source via Configuration -> Data Sources -> Add data source, choose Prometheus, and set its URL to http://localhost:9090. Then click Save & Test to verify that the connection works.
Create a monitoring dashboard via Create -> Dashboard and add panels for metrics such as throughput and pct99 as needed. The queries in the "Useful Examples" section above can serve as a starting point.
