
Delete dry-run and get manager image free #1234

Closed
a-thaler opened this issue Jul 2, 2024 · 0 comments
Labels: area/logs LogPipeline kind/feature

a-thaler commented Jul 2, 2024

Description
As part of #767, the manager should be freed of the validating webhook for the LogPipeline, which performs some basic validation and also executes the pipeline validation using the fluentbit dry-run mode.

The dry-run mode was initially introduced to provide advanced validation for the unsupported mode, that is, when users provide free-style fluentbit filters and outputs. In that case, the dry-run mode adds value because it checks the free-style text for syntax and semantic problems.
The price for that feature is a hard dependency on the fluentbit binary inside the manager container, including the use of a Debian base image.
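
For context, here is a minimal sketch of what the dry-run check amounts to, assuming the manager shells out to a bundled fluent-bit binary using its `--dry-run` and `--config` flags; the package name and paths are illustrative, not the actual manager code:

```go
package dryrun

import (
	"context"
	"fmt"
	"os/exec"
)

// ValidateConfig runs the bundled fluent-bit binary in dry-run mode against a
// rendered pipeline config and surfaces its exit status. This is roughly the
// capability proposed for removal, since it forces the fluent-bit binary (and
// a Debian base image) into the manager container.
func ValidateConfig(ctx context.Context, fluentBitPath, configPath string) error {
	cmd := exec.CommandContext(ctx, fluentBitPath, "--dry-run", "--config", configPath)
	out, err := cmd.CombinedOutput()
	if err != nil {
		// fluent-bit exits non-zero on syntax or semantic problems in the config.
		return fmt.Errorf("fluent-bit dry-run failed: %w: %s", err, string(out))
	}
	return nil
}
```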

When the dry-run mode was introduced, no e2e tests were written, so it is hard to say whether the functionality still delivers the expected outcome; manual tests have been positive so far. However, the feedback it gives is very poor (`error: logpipelines.telemetry.kyma-project.io "cls" is invalid`): there is no detailed message to figure out what is wrong, and there is no place to look it up.

Also, when leveraging the otel-collector in the future, there will no longer be an unsupported mode offering free-text possibilities.

Instead of moving this logic into a validation phase inside the reconciler, we should remove the feature entirely, which greatly reduces complexity and runtime dependencies. In return, we should improve the agent health status to reflect startup problems in a meaningful way. If a pod exits with code 255, for example, we could use a dedicated reason with a message indicating to check the pod logs, where a more detailed message can be found.
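
As a rough illustration of the proposed status handling (not existing code), a hedged sketch that derives an AgentHealthy condition from the agent pod's container statuses; the reason and message strings are placeholders:

```go
package status

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// agentHealthyCondition derives an AgentHealthy condition from the agent pod.
// If a container terminated with a non-zero exit code (for example 255 on a
// fluentbit config error), the condition points the user to the pod logs,
// where the detailed error message can be found.
func agentHealthyCondition(pod *corev1.Pod) metav1.Condition {
	for _, cs := range pod.Status.ContainerStatuses {
		term := cs.State.Terminated
		if term == nil {
			term = cs.LastTerminationState.Terminated
		}
		if term != nil && term.ExitCode != 0 {
			return metav1.Condition{
				Type:    "AgentHealthy",
				Status:  metav1.ConditionFalse,
				Reason:  "AgentContainerCrashed", // placeholder reason
				Message: "Agent container " + cs.Name + " exited with a non-zero code; check the agent pod logs for details",
			}
		}
	}
	return metav1.Condition{
		Type:   "AgentHealthy",
		Status: metav1.ConditionTrue,
		Reason: "AgentReady",
	}
}
```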

Criteria

  • The dry-run functionality is removed
  • The manager base image is "from scratch"
  • If a configuration mistake happens in a custom component, the AgentHealthy condition indicates a problem with the agent, and its message points the user to the agent logs