-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[system] - Add a flag to the system module to control whether metrics failures mark agent as degraded #42160
[system] - Add a flag to the system module to control whether metrics failures mark agent as degraded #42160
Conversation
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
metricbeat/metricbeat.reference.yml
Outdated
# Enable below config to mark metricset as degraded if partial metrics are emitted | ||
#degrade_on_partial: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This either needs more documentation or a more specific name. Users reading this will have no idea what partial metrics mean or what can lead to this happening.
I like the hostmetrics receiver's mute_*
settings as a model: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver#process
Maybe we can't detect in code the specific type of error yet, but we can at least document the error classes this actually covers. We can also expect we may need or want to extend this to be more specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I'll document this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmacknz I've documented in detail in reference.yml. I'll also open up a separate ticket to update https://www.elastic.co/guide/en/beats/metricbeat/current/metricbeat-metricset-system-process.html, once we're satisfied with the wordings here. Let me know what do you think
…into add-degrading-config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, a minor nit on the wording but in reviewing this again I realized documenting this for standalone Beats doesn't really make sense. It has to be documented and exposed in the system integration or nobody can use it.
Co-authored-by: Craig MacKenzie <[email protected]>
@cmacknz I've removed references of config from beats, as it doesn't make sense for standalone beats. I'll work on a followup PR for agent. Let me know if you're good with this PR! Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nit on the sleep in one of the unit tests.
… failures mark agent as degraded (#42160) * chore: initial commit * doc and test cases * lint * remove errors.Is * chore: notice * update docs * Update metricbeat/mb/event.go Co-authored-by: Craig MacKenzie <[email protected]> * remove references * remove sleep --------- Co-authored-by: Craig MacKenzie <[email protected]> (cherry picked from commit 7a2f8d4)
Add a new config to mark metricsets as degarded if partial metrics are enabled.
Currently, it's disabled by default and we will only enable it in integration test cases in elastic-agent. Once we're confident enough, we will enable this feature by default in beats and eventually, remove the flag.
Closes #40543