Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(metrics): separate metrics recorder impl from the server #2561

Merged
merged 10 commits into from
Oct 19, 2024
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

14 changes: 4 additions & 10 deletions bin/torii/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@

use anyhow::Context;
use clap::{ArgAction, Parser};
use dojo_metrics::{metrics_process, prometheus_exporter};
use dojo_metrics::exporters::prometheus::PrometheusRecorder;
use dojo_utils::parse::{parse_socket_address, parse_url};
use dojo_world::contracts::world::WorldContractReader;
use sqlx::sqlite::{
Expand Down Expand Up @@ -296,16 +296,10 @@
}

if let Some(listen_addr) = args.metrics {
let prometheus_handle = prometheus_exporter::install_recorder("torii")?;

info!(target: LOG_TARGET, addr = %listen_addr, "Starting metrics endpoint.");
prometheus_exporter::serve(
listen_addr,
prometheus_handle,
metrics_process::Collector::default(),
Vec::new(),
)
.await?;
let prometheus_handle = PrometheusRecorder::install("torii")?;
let server = dojo_metrics::Server::new(prometheus_handle).with_process_metrics();
tokio::spawn(server.start(listen_addr));

Check warning on line 302 in bin/torii/src/main.rs

View check run for this annotation

Codecov / codecov/patch

bin/torii/src/main.rs#L300-L302

Added lines #L300 - L302 were not covered by tests
}

let engine_handle = tokio::spawn(async move { engine.start().await });
Expand Down
30 changes: 13 additions & 17 deletions crates/katana/node/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,8 @@
use config::metrics::MetricsConfig;
use config::rpc::{ApiKind, RpcConfig};
use config::{Config, SequencingConfig};
use dojo_metrics::prometheus_exporter::PrometheusHandle;
use dojo_metrics::{metrics_process, prometheus_exporter, Report};
use dojo_metrics::exporters::prometheus::PrometheusRecorder;
use dojo_metrics::{Report, Server as MetricsServer};
use hyper::{Method, Uri};
use jsonrpsee::server::middleware::proxy_get_request::ProxyGetRequestLayer;
use jsonrpsee::server::{AllowHosts, ServerBuilder, ServerHandle};
Expand Down Expand Up @@ -81,7 +81,6 @@
pub pool: TxPool,
pub db: Option<DbEnv>,
pub task_manager: TaskManager,
pub prometheus_handle: PrometheusHandle,
pub backend: Arc<Backend<BlockifierFactory>>,
pub block_producer: BlockProducer<BlockifierFactory>,
pub rpc_config: RpcConfig,
Expand All @@ -98,23 +97,19 @@
let chain = self.backend.chain_spec.id;
info!(%chain, "Starting node.");

// TODO: maybe move this to the build stage
if let Some(ref cfg) = self.metrics_config {
let addr = cfg.addr;
let mut reports = Vec::new();
let mut reports: Vec<Box<dyn Report>> = Vec::new();

Check warning on line 102 in crates/katana/node/src/lib.rs

View check run for this annotation

Codecov / codecov/patch

crates/katana/node/src/lib.rs#L102

Added line #L102 was not covered by tests

if let Some(ref db) = self.db {
reports.push(Box::new(db.clone()) as Box<dyn Report>);
}

prometheus_exporter::serve(
addr,
self.prometheus_handle.clone(),
metrics_process::Collector::default(),
reports,
)
.await?;
let exporter = PrometheusRecorder::current().expect("qed; should exist at this point");
kariy marked this conversation as resolved.
Show resolved Hide resolved
let server = MetricsServer::new(exporter).with_process_metrics().with_reports(reports);

Check warning on line 109 in crates/katana/node/src/lib.rs

View check run for this annotation

Codecov / codecov/patch

crates/katana/node/src/lib.rs#L108-L109

Added lines #L108 - L109 were not covered by tests

info!(%addr, "Metrics endpoint started.");
self.task_manager.task_spawner().build_task().spawn(server.start(cfg.addr));
info!(addr = %cfg.addr, "Metrics server started.");

Check warning on line 112 in crates/katana/node/src/lib.rs

View check run for this annotation

Codecov / codecov/patch

crates/katana/node/src/lib.rs#L111-L112

Added lines #L111 - L112 were not covered by tests
}

let pool = self.pool.clone();
Expand Down Expand Up @@ -156,9 +151,11 @@
/// This returns a [`Node`] instance which can be launched with the all the necessary components
/// configured.
pub async fn build(mut config: Config) -> Result<Node> {
// Metrics recorder must be initialized before calling any of the metrics macros, in order
// for it to be registered.
let prometheus_handle = prometheus_exporter::install_recorder("katana")?;
if config.metrics.is_some() {
// Metrics recorder must be initialized before calling any of the metrics macros, in order
// for it to be registered.
let _ = PrometheusRecorder::install("katana")?;

Check warning on line 157 in crates/katana/node/src/lib.rs

View check run for this annotation

Codecov / codecov/patch

crates/katana/node/src/lib.rs#L157

Added line #L157 was not covered by tests
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ohayo, sensei! Remove unnecessary assignment to _

Assigning the result to _ is unnecessary when using the ? operator. You can directly call the function and let the ? operator handle any errors.

Apply this diff:

-let _ = PrometheusRecorder::install("katana")?;
+PrometheusRecorder::install("katana")?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
let _ = PrometheusRecorder::install("katana")?;
PrometheusRecorder::install("katana")?;

}

// --- build executor factory

Expand Down Expand Up @@ -223,7 +220,6 @@
pool,
backend,
block_producer,
prometheus_handle,
rpc_config: config.rpc,
metrics_config: config.metrics,
messaging_config: config.messaging,
Expand Down
1 change: 1 addition & 0 deletions crates/metrics/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ metrics-derive = "0.1"
metrics-exporter-prometheus = "0.15.3"
metrics-process = "2.1.0"
metrics-util = "0.17.0"
tokio-util.workspace = true

[target.'cfg(not(windows))'.dependencies]
jemalloc-ctl = { version = "0.5.0", optional = true }
Expand Down
7 changes: 7 additions & 0 deletions crates/metrics/src/exporters/mod.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
pub mod prometheus;

/// Trait for metrics recorder whose metrics can be exported.
pub trait Exporter: Clone + Send + Sync {
/// Export the metrics that have been recorded by the metrics thus far.
fn export(&self) -> String;
}
Comment on lines +3 to +7
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Ohayo again, sensei! The Exporter trait looks solid, with a small suggestion.

The Exporter trait is well-defined and documented. The Clone + Send + Sync bounds ensure thread safety, which is crucial for metrics systems. The export method signature is clear and straightforward.

One small suggestion to consider:

You might want to make the export method more flexible by allowing parameters. For example:

fn export(&self, format: Option<ExportFormat>) -> String;

This would allow users to specify different export formats if needed in the future, while maintaining backwards compatibility with the current implementation.

52 changes: 52 additions & 0 deletions crates/metrics/src/exporters/prometheus.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
//! Prometheus exporter

use std::sync::OnceLock;

use metrics_exporter_prometheus::PrometheusBuilder;
pub use metrics_exporter_prometheus::PrometheusHandle as Prometheus;
use metrics_util::layers::{PrefixLayer, Stack};
use tracing::info;

use crate::exporters::Exporter;
use crate::Error;

static PROMETHEUS_HANDLE: OnceLock<Prometheus> = OnceLock::new();

/// Prometheus exporter recorder.
#[derive(Debug)]
pub struct PrometheusRecorder;

impl PrometheusRecorder {
/// Installs Prometheus as the metrics recorder.
///
/// ## Arguments
///
/// * `prefix` - Apply a prefix to all metrics keys.
pub fn install(prefix: &str) -> Result<Prometheus, Error> {
let recorder = PrometheusBuilder::new().build_recorder();
let handle = recorder.handle();

// Build metrics stack and install the recorder
Stack::new(recorder)
.push(PrefixLayer::new(prefix))
.install()
.map_err(|_| Error::GlobalRecorderAlreadyInstalled)?;

Check warning on line 33 in crates/metrics/src/exporters/prometheus.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/exporters/prometheus.rs#L25-L33

Added lines #L25 - L33 were not covered by tests

info!(target: "metrics", %prefix, "Prometheus recorder installed.");

Check warning on line 35 in crates/metrics/src/exporters/prometheus.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/exporters/prometheus.rs#L35

Added line #L35 was not covered by tests

let _ = PROMETHEUS_HANDLE.set(handle.clone());
kariy marked this conversation as resolved.
Show resolved Hide resolved

Ok(handle)
}

Check warning on line 40 in crates/metrics/src/exporters/prometheus.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/exporters/prometheus.rs#L37-L40

Added lines #L37 - L40 were not covered by tests

/// Get the handle to the installed Prometheus recorder (if any).
pub fn current() -> Option<Prometheus> {
PROMETHEUS_HANDLE.get().cloned()
}

Check warning on line 45 in crates/metrics/src/exporters/prometheus.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/exporters/prometheus.rs#L43-L45

Added lines #L43 - L45 were not covered by tests
}

impl Exporter for Prometheus {
fn export(&self) -> String {
self.render()
}

Check warning on line 51 in crates/metrics/src/exporters/prometheus.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/exporters/prometheus.rs#L49-L51

Added lines #L49 - L51 were not covered by tests
}
25 changes: 24 additions & 1 deletion crates/metrics/src/lib.rs
Original file line number Diff line number Diff line change
@@ -1,4 +1,8 @@
pub mod prometheus_exporter;
pub mod exporters;
mod process;
mod server;

use std::net::SocketAddr;

#[cfg(all(feature = "jemalloc", unix))]
use jemallocator as _;
Expand All @@ -8,16 +12,35 @@
pub use metrics_derive::Metrics;
/// Re-export the metrics-process crate
pub use metrics_process;
pub use server::*;

// We use jemalloc for performance reasons
#[cfg(all(feature = "jemalloc", unix))]
#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

#[derive(Debug, thiserror::Error)]

Check warning on line 22 in crates/metrics/src/lib.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/lib.rs#L22

Added line #L22 was not covered by tests
pub enum Error {
#[error("global metrics recorder already installed.")]
GlobalRecorderAlreadyInstalled,

#[error("could not bind to address: {addr}")]
FailedToBindAddress { addr: SocketAddr },

#[error(transparent)]
Server(#[from] hyper::Error),
}

/// A helper trait for reporting metrics.
///
/// This is meant for types that require a specific trigger to register their metrics.
pub trait Report: Send + Sync {
/// Report the metrics.
fn report(&self);
}

impl Report for ::metrics_process::Collector {
fn report(&self) {
self.collect();
}

Check warning on line 45 in crates/metrics/src/lib.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/lib.rs#L43-L45

Added lines #L43 - L45 were not covered by tests
}
122 changes: 122 additions & 0 deletions crates/metrics/src/process.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,122 @@
use metrics::{describe_gauge, gauge};

const LOG_TARGET: &str = "metrics";

#[cfg(all(feature = "jemalloc", unix))]
pub fn collect_memory_stats() {
use jemalloc_ctl::{epoch, stats};

if epoch::advance()
.map_err(|error| {
tracing::error!(

Check warning on line 11 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L6-L11

Added lines #L6 - L11 were not covered by tests
target: LOG_TARGET,
error = %error,
"Advance jemalloc epoch."

Check warning on line 14 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L14

Added line #L14 was not covered by tests
)
})
.is_err()

Check warning on line 17 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L16-L17

Added lines #L16 - L17 were not covered by tests
{
return;
}

Check warning on line 20 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L19-L20

Added lines #L19 - L20 were not covered by tests
kariy marked this conversation as resolved.
Show resolved Hide resolved

if let Ok(value) = stats::active::read().map_err(|error| {
tracing::error!(

Check warning on line 23 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L22-L23

Added lines #L22 - L23 were not covered by tests
target: LOG_TARGET,
error = %error,
"Read jemalloc.stats.active."

Check warning on line 26 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L26

Added line #L26 was not covered by tests
)
}) {
gauge!("jemalloc.active").increment(value as f64);
kariy marked this conversation as resolved.
Show resolved Hide resolved
}

Check warning on line 30 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L28-L30

Added lines #L28 - L30 were not covered by tests

if let Ok(value) = stats::allocated::read().map_err(|error| {
tracing::error!(

Check warning on line 33 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L32-L33

Added lines #L32 - L33 were not covered by tests
target: LOG_TARGET,
error = %error,
"Read jemalloc.stats.allocated."

Check warning on line 36 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L36

Added line #L36 was not covered by tests
)
}) {
gauge!("jemalloc.allocated").increment(value as f64);
}

Check warning on line 40 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L38-L40

Added lines #L38 - L40 were not covered by tests

if let Ok(value) = stats::mapped::read().map_err(|error| {
tracing::error!(

Check warning on line 43 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L42-L43

Added lines #L42 - L43 were not covered by tests
target: LOG_TARGET,
error = %error,
"Read jemalloc.stats.mapped."

Check warning on line 46 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L46

Added line #L46 was not covered by tests
)
}) {
gauge!("jemalloc.mapped").increment(value as f64);
}

Check warning on line 50 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L48-L50

Added lines #L48 - L50 were not covered by tests

if let Ok(value) = stats::metadata::read().map_err(|error| {
tracing::error!(

Check warning on line 53 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L52-L53

Added lines #L52 - L53 were not covered by tests
target: LOG_TARGET,
error = %error,
"Read jemalloc.stats.metadata."

Check warning on line 56 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L56

Added line #L56 was not covered by tests
)
}) {
gauge!("jemalloc.metadata").increment(value as f64);
}

Check warning on line 60 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L58-L60

Added lines #L58 - L60 were not covered by tests

if let Ok(value) = stats::resident::read().map_err(|error| {
tracing::error!(

Check warning on line 63 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L62-L63

Added lines #L62 - L63 were not covered by tests
target: LOG_TARGET,
error = %error,
"Read jemalloc.stats.resident."

Check warning on line 66 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L66

Added line #L66 was not covered by tests
)
}) {
gauge!("jemalloc.resident").increment(value as f64);
}

Check warning on line 70 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L68-L70

Added lines #L68 - L70 were not covered by tests

if let Ok(value) = stats::retained::read().map_err(|error| {
tracing::error!(

Check warning on line 73 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L72-L73

Added lines #L72 - L73 were not covered by tests
target: LOG_TARGET,
error = %error,
"Read jemalloc.stats.retained."

Check warning on line 76 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L76

Added line #L76 was not covered by tests
)
}) {
gauge!("jemalloc.retained").increment(value as f64);
}
kariy marked this conversation as resolved.
Show resolved Hide resolved
}

Check warning on line 81 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L78-L81

Added lines #L78 - L81 were not covered by tests

#[cfg(all(feature = "jemalloc", unix))]
pub fn describe_memory_stats() {

Check warning on line 84 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L84

Added line #L84 was not covered by tests
describe_gauge!(
"jemalloc.active",
metrics::Unit::Bytes,
"Total number of bytes in active pages allocated by the application"
);
describe_gauge!(
"jemalloc.allocated",
metrics::Unit::Bytes,
"Total number of bytes allocated by the application"
);
describe_gauge!(
"jemalloc.mapped",
metrics::Unit::Bytes,
"Total number of bytes in active extents mapped by the allocator"
);
describe_gauge!(
"jemalloc.metadata",
metrics::Unit::Bytes,
"Total number of bytes dedicated to jemalloc metadata"
);
describe_gauge!(
"jemalloc.resident",
metrics::Unit::Bytes,
"Total number of bytes in physically resident data pages mapped by the allocator"
);
describe_gauge!(
"jemalloc.retained",
metrics::Unit::Bytes,
"Total number of bytes in virtual memory mappings that were retained rather than being \
returned to the operating system via e.g. munmap(2)"
);
}

Check warning on line 116 in crates/metrics/src/process.rs

View check run for this annotation

Codecov / codecov/patch

crates/metrics/src/process.rs#L116

Added line #L116 was not covered by tests

#[cfg(not(all(feature = "jemalloc", unix)))]
pub fn collect_memory_stats() {}

#[cfg(not(all(feature = "jemalloc", unix)))]
pub fn describe_memory_stats() {}
Loading
Loading