Configuration for the StackHPC fork of Redfish Exporter #1530
base: stackhpc/2024.1
Branch updated from 79f8936 to 690623c.
Very nice, many thanks for adding this.
env: "{{ kayobe_environment | default('openstack') }}"
group: "{{ hostvars[host]['redfish_exporter_scrape_group'] | default('overcloud') }}"
{% endfor %}
- job_name: redfish-exporter-collectlog
I wondered if we should put this behind a redfish_exporter_collect_logs flag so we can easily disable it at sites if it causes issues. Having said that, it should be a lot more robust now that it lives in a separate scrape job. Many thanks for adding it.
I think it's more nuanced than that. I couldn't quite get my head around it when I made this PR, but I think it's a bit clearer to me now.
There are two scrape styles (currently, anyway):
- [iDRAC style] Scrape normally in a single job without setting collectlog, so the exporter's defaults apply. This is what we've always done and should be the default, IMO.
- [Lenovo XCC style] Two jobs: one with collectlog=true, and a more frequent one with collectlog=false.
I think we should put the second style behind a feature flag as you suggest.
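A minimal sketch of how the two scrape styles above could be gated in the Prometheus scrape-config template, assuming a hypothetical boolean `redfish_exporter_separate_log_scrape` (not an existing Kayobe variable) and assuming the exporter reads `collectlog` as a query parameter, which Prometheus sends via the `params` field. Job names other than `redfish-exporter-collectlog`, the intervals and the timeout are placeholders; targets and relabelling are left as in the existing jobs.

```yaml
# Sketch only: `redfish_exporter_separate_log_scrape` is a hypothetical flag,
# and `collectlog` being a query parameter is an assumption about the exporter.
{% if redfish_exporter_separate_log_scrape | default(false) | bool %}
# Lenovo XCC style: a frequent scrape that skips logs...
- job_name: redfish-exporter
  scrape_interval: 1m          # placeholder
  params:
    collectlog: ['false']
  # metrics_path, static_configs and relabel_configs as in the existing job
# ...plus an infrequent scrape that also collects logs.
- job_name: redfish-exporter-collectlog
  scrape_interval: 15m         # placeholder
  scrape_timeout: 10m          # Prometheus requires scrape_timeout <= scrape_interval
  params:
    collectlog: ['true']
  # metrics_path, static_configs and relabel_configs as in the existing job
{% else %}
# iDRAC style (default): one job, collectlog left unset so exporter defaults apply.
- job_name: redfish-exporter
  # metrics_path, static_configs and relabel_configs as in the existing job
{% endif %}
```

Keeping the single-job branch as the `else` case means sites that never set the flag keep today's behaviour, while a site with slow log fetches can opt in to the split.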
That sounds good: a limp-mode flag enabling style 2 for cases where the logs take too long to fetch, with style 1 as the default. On SMSlab (iDRAC) I noticed that most of the fetched logs are actually entries from logging into and out of the BMC, so I am hoping the scrape time will improve once we switch to persistent sessions. Scraping an iDRAC there takes about 5 minutes, which easily causes trouble.
For me, a bunch of things don't work with Dell. I will try to fix up a few bits. We have also lost the health summary. Did that not work on Lenovo? That was one of the more useful bits for me.
This is just up as a record of things that worked on Lenovo; unfortunately I don't have the systems to make the dashboards work on both types of hardware :(. On top of the metric and panel names not really matching up, I didn't make any real attempt to remain compatible with the Dell metrics. I also had to remove some parts of the dashboard because of the Angular deprecation, though I don't remember whether the health summary was one of those.
The dashboard needs testing for compatibility with metrics produced by other manufacturers' Redfish implementations.