kafka/server: add metrics and config for consumer lag reporting #24977

Open, wants to merge 3 commits into base: dev

IoannisRP
Contributor

@IoannisRP IoannisRP commented Jan 29, 2025

Implements: https://redpandadata.atlassian.net/browse/CORE-8914

Introduce the "enable_consumer_group_lag_metrics" config property, which controls whether the consumer lag metrics are active. It can be changed at runtime, without a restart.

Introduce the scaffolding needed for metrics that can be enabled or disabled at runtime.
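
To illustrate what toggling metrics at runtime involves, here is a minimal Python sketch. It is not the C++ code in this PR: `Binding`, `LagProbe`, and the dictionary used as a registry are hypothetical stand-ins for a watchable config value and a metrics probe. Only the metric names are taken from the description above.

```python
from typing import Callable, Dict, List, Tuple


class Binding:
    """Hypothetical stand-in for a live config value that notifies watchers."""

    def __init__(self, value: bool) -> None:
        self._value = value
        self._watchers: List[Callable[[], None]] = []

    def __call__(self) -> bool:
        return self._value

    def watch(self, callback: Callable[[], None]) -> None:
        self._watchers.append(callback)

    def set(self, value: bool) -> None:
        # Simulates a config change applied while the process keeps running.
        self._value = value
        for callback in self._watchers:
            callback()


class LagProbe:
    """Hypothetical probe that registers or clears its gauges on demand."""

    def __init__(self, group: str, enabled: Binding,
                 registry: Dict[Tuple[str, str], Callable[[], int]]) -> None:
        self._group = group
        self._enabled = enabled
        self._registry = registry
        self.lag_max = 0
        self.lag_sum = 0
        enabled.watch(self._sync)  # re-evaluate whenever the value changes
        self._sync()               # apply the initial state

    def _sync(self) -> None:
        gauges = {
            ("redpanda_kafka_consumer_group_lag_max", self._group):
                lambda: self.lag_max,
            ("redpanda_kafka_consumer_group_lag_sum", self._group):
                lambda: self.lag_sum,
        }
        if self._enabled():
            self._registry.update(gauges)
        else:
            for key in gauges:
                self._registry.pop(key, None)


registry: Dict[Tuple[str, str], Callable[[], int]] = {}
enabled = Binding(False)
probe = LagProbe("my-group", enabled, registry)  # starts with nothing exported
enabled.set(True)                                # gauges appear without a restart
```

The point of the sketch is that the probe owns both the registration and the teardown of its gauges, so flipping the value is the only action a caller needs to take.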

| Metric | Type | Description | Labels | Aggregation labels |
|---|---|---|---|---|
| redpanda_kafka_consumer_group_lag_max | gauge | Maximum consumer group lag across all partitions in a group | group, shard | |
| redpanda_kafka_consumer_group_lag_sum | gauge | Sum of consumer group lag for all partitions in a group | group, shard | |
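
As a rough illustration of what the two gauges are meant to report, the sketch below computes both values from per-partition offsets. It assumes the usual definition of lag (the partition's latest offset minus the group's committed offset, clamped at zero). This PR defers the population logic to a later commit, so the function name and input shape here are hypothetical.

```python
from typing import Dict, Tuple


def group_lag(
    offsets: Dict[Tuple[str, int], Tuple[int, int]]
) -> Tuple[int, int]:
    """Return (lag_max, lag_sum) for one consumer group.

    offsets maps (topic, partition) -> (latest_offset, committed_offset).
    """
    lags = [max(latest - committed, 0)
            for latest, committed in offsets.values()]
    # default=0 covers a group with no committed partitions.
    return max(lags, default=0), sum(lags)


example = {
    ("orders", 0): (120, 100),  # lag 20
    ("orders", 1): (80, 80),    # lag 0
    ("orders", 2): (50, 45),    # lag 5
}
lag_max, lag_sum = group_lag(example)
```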

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

Note that this commit contains only the metric infrastructure, i.e. the
probe and the mechanism to dynamically enable/disable these metrics.
A subsequent commit will implement the logic to populate the consumer
lag metrics data.
@IoannisRP IoannisRP requested review from BenPope and a team January 29, 2025 15:13
@IoannisRP IoannisRP requested a review from a team as a code owner January 29, 2025 15:13
@IoannisRP IoannisRP changed the title from "Core 8914/consumer lag config" to "kafka/server: add metrics and config for consumer lag reporting" Jan 29, 2025
@vbotbuildovich
Collaborator

CI test results

test results on build#61359
| test_id | test_kind | job_url | test_status | passed |
|---|---|---|---|---|
| rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery | ducktape | https://buildkite.com/redpanda/redpanda/builds/61359#0194b2e4-6b16-498b-a3a6-1735b6488617 | FLAKY | 1/2 |
| rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=False.mixed_versions=False.with_tiered_storage=False.with_iceberg=True.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 | ducktape | https://buildkite.com/redpanda/redpanda/builds/61359#0194b2e9-78d9-4746-a266-103519385a06 | FLAKY | 1/2 |

Contributor

@michael-redpanda michael-redpanda left a comment


nice!

@@ -965,6 +967,7 @@ class group final : public ss::enable_lw_shared_from_this<group> {
chunked_hash_map<model::topic_partition, offset_metadata>
_pending_offset_commits;
enable_group_metrics _enable_group_metrics;
config::binding<bool> _enable_consumer_lag_metrics;
Contributor


nit: you aren't using or assigning this in this commit so it should be moved to a different commit

Comment on lines +681 to +684
# wait for some messages
wait_until(
lambda: ConsumerGroupTest.group_consumed_at_least(
consumers, 50 * len(consumers)), 30, 2)
Contributor


nit: add an error message in wait_until
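
For reference, a sketch of what this nit is asking for. The `wait_until` below is a simplified stand-in written for this example so that it runs on its own. It mirrors the `err_msg` keyword of ducktape's `wait_until` but raises the built-in `TimeoutError`, which is an assumption of the sketch. The counter values and the message text are illustrative only.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_sec: float,
               backoff_sec: float = 0.1, err_msg: str = "") -> None:
    """Simplified stand-in: poll condition until it holds or time runs out."""
    deadline = time.time() + timeout_sec
    while True:
        if condition():
            return
        if time.time() >= deadline:
            # The message is what turns a bare timeout into a useful failure.
            raise TimeoutError(err_msg)
        time.sleep(backoff_sec)


consumed = 10        # illustrative: messages consumed so far
expected = 50 * 2    # 50 messages per consumer, two consumers

try:
    wait_until(lambda: consumed >= expected,
               timeout_sec=0.2,
               backoff_sec=0.05,
               err_msg=f"consumers did not consume {expected} messages in time")
except TimeoutError as exc:
    failure = str(exc)
```

With a message attached, a timed-out wait names the condition that was never met instead of failing with an empty error.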
