Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for the Strimzi Metrics Reporter to Kafka brokers/controllers components #11051

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

OwenCorrigan76
Copy link
Contributor

@OwenCorrigan76 OwenCorrigan76 commented Jan 16, 2025

Type of change

  • Enhancement / new feature

Description

This patch adds support for the Strimzi Metrics Reporter to Kafka brokers/controllers components as described by the following proposal:

https://github.com/strimzi/proposals/blob/main/064-prometheus-metrics-reporter.md

Related to #10753

Support for Kafka Connect and MirrorMaker2 will be added in subsequent PRs.

Checklist

  • Write tests
  • Make sure all tests pass
  • Update documentation
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Reference relevant issue(s) and close them after merging
  • Update CHANGELOG.md

@OwenCorrigan76 OwenCorrigan76 requested review from a team and mimaison January 16, 2025 17:15
@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 2 times, most recently from e569be2 to 1180ee2 Compare January 16, 2025 17:23
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments after an initial pass. You should also update the names / description / CHANGELOG to make it clear what this really does. Kafka compoenents are brokers/controllers, Connect and MM2. You add this only for brokers/controllers. It should be clear from the CHANGELOG and PR name / desc.

CHANGELOG.md Outdated
@@ -6,6 +6,7 @@
* Support for MirrorMaker 1 has been removed
* Added support to configure `dnsPolicy` and `dnsConfig` using the `template` sections.
* Store Kafka node certificates in separate Secrets, one Secret per pod.
* Added support for Strimzi Metrics Reporter to the Kafka component.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Kafka component?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback @scholzj. I will address this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not addressed yet I guess?

Suggested change
* Added support for Strimzi Metrics Reporter to the Kafka component.
* Added support for Strimzi Metrics Reporter to the Kafka brokers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to Kafka brokers / controllers

@@ -62,7 +62,8 @@ public class KafkaClusterSpec implements HasConfigurableMetrics, HasConfigurable
+ "cruise.control.metrics.topic, cruise.control.metrics.reporter.bootstrap.servers, "
+ "node.id, process.roles, controller., metadata.log.dir, zookeeper.metadata.migration.enable, " // KRaft options
+ "client.quota.callback.static.kafka.admin., client.quota.callback.static.produce, client.quota.callback.static.fetch, "
+ "client.quota.callback.static.storage.per.volume.limit.min.available., client.quota.callback.static.excluded.principal.name.list";
+ "client.quota.callback.static.storage.per.volume.limit.min.available., client.quota.callback.static.excluded.principal.name.list, "
+ "kafka.metric.reporters, prometheus.metrics.reporter.";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we disabling kafka.metric.reporters? What if someone uses it? If we want to disable it, it is definitely worth adding to CHANGELOG. But maybe it should be kept configurable?

Copy link
Contributor Author

@OwenCorrigan76 OwenCorrigan76 Jan 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming here is a little unfortunate. We are not disabling metric.reporters but we are locking down kafka.metrics.reporters as the proposal requests. metric.reporters is a Kafka config that is configurable but kafka.metrics.reporters is a Reporter config that is not configurable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no option metrics.reporters. You are disabling the option kafka.metric.reporters that is a standard broker configuration option and might be used by the users. If nothing else, this deserves more discussion as it has backward compatibility implications. Was it part of the proposal? I do not remember that.

From the Kafka docs, kafka.metric.reporters seems to be a list. So are there any reasons why we can't have custom and our reporters?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have to remove kafka.metric.reporters here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed this. I will fix

@@ -67,7 +67,7 @@ public static Map<String, String> generateMetricsAndLogConfigMapData(Reconciliat
data.put(supportsLogging.logging().configMapKey(), supportsLogging.logging().loggingConfiguration(reconciliation, metricsAndLogging.loggingCm()));
}

if (model instanceof SupportsMetrics supportMetrics) {
if (model instanceof SupportsMetrics supportMetrics && supportMetrics.metrics() != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now refactored

Comment on lines 240 to 236
private MetricsModel jmxExporterMetrics;
private StrimziMetricsReporterModel strimziMetricsReporter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have MetricsModel for a reason. We should try to unify both metrics types into a single class (e.g. have MetricsModel as abstract class or interface and then two implementations for the different metric types).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The latest commit deals with this and provides and interface with two implementations.

/**
* The configuration field name for Kafka metric reporters.
*/
public static final String KAFKA_METRIC_REPORTERS_CONFIG_FIELD = "metric.reporters";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not even used here. So why is it defined here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I defined it here because this is where the other Kafka config variables were defined. I will move it into the class where it is used and make it private.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from 1180ee2 to f943007 Compare January 17, 2025 12:00
@OwenCorrigan76
Copy link
Contributor Author

@scholzj I am currently working on the changes I did not comment on yet.

@fvaleri fvaleri added this to the 0.46.0 milestone Jan 17, 2025
@OwenCorrigan76 OwenCorrigan76 changed the title Add support for the Strimzi Metrics Reporter to Kafka component Add support for the Strimzi Metrics Reporter to Kafka brokers/controllers components Jan 17, 2025
@EqualsAndHashCode()
@ToString
public class StrimziMetricsReporterValues implements UnknownPropertyPreserving {
private static final String DEFAULT_REGEX = ".*";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should have the default tailor-made for each component. (Or it should be required) And the allow list should default to null to allow you to distinguish between not set and set to .*.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still Todo

@@ -31,7 +30,6 @@ public class JmxPrometheusExporterMetrics extends MetricsConfig {
private ExternalConfigurationReference valueFrom;

@Description("ConfigMap entry where the Prometheus JMX Exporter configuration is stored. ")
@JsonProperty(required = true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC, you had to make it optional because of how the API is constructed. But it is required. So, we have to make sure:

  • It is handled in the code
  • Use the CEL validation (Use CEL validation in our CRDs #9417) to make this required that way (but this would need to be done right away). If you want, I can look at that when I get back to the office, but that would definitely delay this PR.

Copy link
Member

@scholzj scholzj Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I opened #11068 for the CEL validation. Once/if merged, you would need to add the corresponding rule here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CEL Validation added and working well

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 2 times, most recently from 84fb016 to 8f5b28f Compare February 13, 2025 11:16
Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some more comments. There also seem to be many unaddressed older comments. Not sure if that was intentional or not at this point.

CHANGELOG.md Outdated
@@ -6,6 +6,7 @@
* Support for MirrorMaker 1 has been removed
* Added support to configure `dnsPolicy` and `dnsConfig` using the `template` sections.
* Store Kafka node certificates in separate Secrets, one Secret per pod.
* Added support for Strimzi Metrics Reporter to the Kafka component.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not addressed yet I guess?

Suggested change
* Added support for Strimzi Metrics Reporter to the Kafka component.
* Added support for Strimzi Metrics Reporter to the Kafka brokers.

@@ -237,6 +249,10 @@ public static CruiseControl fromCrd(
}
}

private boolean hasMetricsConfig() {
return metrics != null && metrics.isEnabled();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would there be some situation when it is not null and yet it is disabled?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. && metrics.isEnabled() now removed.

@@ -499,7 +515,7 @@ public HashLoginServiceApiCredentials apiCredentials() {
/**
* @return Metrics Model instance for configuring Prometheus metrics
*/
public MetricsModel metrics() {
public JmxPrometheusExporterModel metrics() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this not return the generic class?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still returning the concrete class. Please change to MetricsModel.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should return the generic class. Now refatored.

*
* @param spec Custom resource section configuring metrics.
*/
public StrimziMetricsReporterModel(HasConfigurableMetrics spec) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the way this is constructed. You should check the type you have outside and create the right class given we have always only one type of metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored. Does it look better now?

*
* @param spec Custom resource section configuring metrics
*/
public JmxPrometheusExporterModel(HasConfigurableMetrics spec) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for the other class. I don't understand the way this is constructed. You should check the type you have outside and create the right class given we have always only one type of metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored. Does it look better now?

*/
public static final String METRICS_PORT_NAME = "tcp-prometheus";

public interface MetricsModel {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this work better as an abstract class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scholzj Wondering, why this would be better as an abstract class? Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit hard to explain and sometimes you need to write the code to see what works better. But I feel like having an abstract class would better encapsulate the shared logic and shared fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will return to this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This directory likely needs some better name + some README with explanation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to strimzi-metrics-reporter and added a brief README

@fvaleri
Copy link
Contributor

fvaleri commented Feb 17, 2025

There also seem to be many unaddressed older comments. Not sure if that was intentional or not at this point.

@scholzj I'm supporting Owen with this feature. He is working on and will eventually address all comments.

@OwenCorrigan76
Copy link
Contributor Author

@scholzj I know progress is going quite slowly. I will address the previous comments shortly. I'm currently working on the CEL validation. When that is committed, I'll go back and tackle the older comments and work through them.

@scholzj
Copy link
Member

scholzj commented Feb 17, 2025

@scholzj I know progress is going quite slowly. I will address the previous comments shortly. I'm currently working on the CEL validation. When that is committed, I'll go back and tackle the older comments and work through them.

No worries Owen, I was just unsure what the expected state is. I did not intend to put any pressure on your. Sorry.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch 3 times, most recently from ece3b1f to 75a829f Compare February 24, 2025 13:23
@OwenCorrigan76
Copy link
Contributor Author

The build failed with the following tests:
io.strimzi.api.kafka.model.kafka.KafkaCrdIT.testKafkaWithNullMaintenance
Currently investigating why.

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @OwenCorrigan76, we still need to do some changes and address the remaining comments, but it works fine.

For the KafkaCrdIT.testKafkaWithNullMaintenance failure, I think we simply need to update the expected error messages removing the invalid: prefix. Now, you get invalid: [, rather than invalid: . With CEL validation rules, we get the additional error message "validation rules were not checked".

@@ -62,7 +62,8 @@ public class KafkaClusterSpec implements HasConfigurableMetrics, HasConfigurable
+ "cruise.control.metrics.topic, cruise.control.metrics.reporter.bootstrap.servers, "
+ "node.id, process.roles, controller., metadata.log.dir, zookeeper.metadata.migration.enable, " // KRaft options
+ "client.quota.callback.static.kafka.admin., client.quota.callback.static.produce, client.quota.callback.static.fetch, "
+ "client.quota.callback.static.storage.per.volume.limit.min.available., client.quota.callback.static.excluded.principal.name.list";
+ "client.quota.callback.static.storage.per.volume.limit.min.available., client.quota.callback.static.excluded.principal.name.list, "
+ "kafka.metric.reporters, prometheus.metrics.reporter.";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You still have to remove kafka.metric.reporters here.

@@ -499,7 +515,7 @@ public HashLoginServiceApiCredentials apiCredentials() {
/**
* @return Metrics Model instance for configuring Prometheus metrics
*/
public MetricsModel metrics() {
public JmxPrometheusExporterModel metrics() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still returning the concrete class. Please change to MetricsModel.

Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OwenCorrigan76 I was wrong on the "Unsupported metrics type", please correct.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from 4304607 to f88ec2c Compare February 26, 2025 10:51
Copy link
Contributor

@fvaleri fvaleri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes looks good to me for now, but we still have some comments to address.

@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from f88ec2c to 3c94f43 Compare February 27, 2025 15:39
This patch adds support for the Strimzi Metrics Reporter to brokers and controllers as described by the following proposal:

https://github.com/strimzi/proposals/blob/main/064-prometheus-metrics-reporter.md

Signed-off-by: ocorriga <[email protected]>
ocorriga added 7 commits March 4, 2025 11:55
Address initial feedback

Signed-off-by: ocorriga <[email protected]>

Reverse dashboards and fix build issue

Signed-off-by: ocorriga <[email protected]>

Update derived resources

Signed-off-by: ocorriga <[email protected]>

Trigger build

Signed-off-by: Federico Valeri <[email protected]>

Add MetricsModel interface and refactor affected classes

Signed-off-by: ocorriga <[email protected]>
Signed-off-by: ocorriga <[email protected]>
Signed-off-by: ocorriga <[email protected]>
Signed-off-by: ocorriga <[email protected]>
@OwenCorrigan76 OwenCorrigan76 force-pushed the integrate_metrics_reporter branch from 3c94f43 to 3969049 Compare March 4, 2025 12:21
@OwenCorrigan76 OwenCorrigan76 requested a review from fvaleri March 4, 2025 14:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants