Add command line options to disable specific collectors #60

Open
ostertagconrad opened this issue Oct 21, 2022 · 14 comments

Comments

@ostertagconrad

When we scrape some of our Redfish servers, a single scrape takes up to 40 seconds.

We tried disabling some collectors whose metrics we don't need and got down to around 10 to 20 seconds. To disable them, we simply removed them from the source code and rebuilt the exporter.

It would be very helpful if we could disable specific collectors via a command line option or a config file. As a start, it would probably be enough to be able to disable one or two of the three collectors (manager, chassis, system). As a next step, a more fine-grained configuration, for example to disable memory and storage in the system collector, would be nice, but that is definitely more work to implement.
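To make the request concrete, here is a minimal Go sketch of how such an option might look. It is not the exporter's actual code: the flag name `collector.disable` and the helper `enabledCollectors` are hypothetical, and only the three collector names (manager, chassis, system) come from this issue.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// knownCollectors lists the three collectors named in this issue.
var knownCollectors = []string{"manager", "chassis", "system"}

// enabledCollectors returns the known collectors that are not named in the
// comma-separated disabled list. Names are matched case-insensitively and
// an unknown name is reported as an error. (Hypothetical helper.)
func enabledCollectors(disabled string) ([]string, error) {
	known := map[string]bool{}
	for _, name := range knownCollectors {
		known[name] = true
	}

	skip := map[string]bool{}
	for _, name := range strings.Split(disabled, ",") {
		name = strings.ToLower(strings.TrimSpace(name))
		if name == "" {
			continue
		}
		if !known[name] {
			return nil, fmt.Errorf("unknown collector %q", name)
		}
		skip[name] = true
	}

	var enabled []string
	for _, name := range knownCollectors {
		if !skip[name] {
			enabled = append(enabled, name)
		}
	}
	return enabled, nil
}

func main() {
	// Hypothetical flag; the exporter does not necessarily offer it.
	disabled := flag.String("collector.disable", "",
		"comma-separated list of collectors to skip (manager, chassis, system)")
	flag.Parse()

	enabled, err := enabledCollectors(*disabled)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("enabled collectors:", strings.Join(enabled, ", "))
}
```

Running the sketch with `-collector.disable=system` would leave only the manager and chassis collectors enabled. The finer-grained variant (memory, storage) could reuse the same list format with more names.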

@stmcginnis

Out of curiosity, do you happen to have the names of the resources you removed that were taking a long time? There is some work recently added to make collection fetching parallel in the gofish library. Right now it is limited to one type that a user saw a need to speed up, but if there are a set of resources that generally take a longer time to collect, maybe we can expand that pattern in the library to speed up some of these other cases.

@ostertagconrad
Author

We have now disabled the whole system collector and the scrape duration went down from 40 to 4 seconds. A colleague did some more testing and found that the PCI functions and devices took the most time (18 and 13 seconds), probably because our servers have about 60 PCI devices and 80 PCI functions. We don't know whether the exporter makes one long API call to collect all the info or many fast calls, but the many PCI devices and functions seem to be the main problem.

@tazend

tazend commented Oct 24, 2022

Hi, I'm the colleague of @ostertagconrad

Yeah, especially the PCI stuff took a long time. As @ostertagconrad said, that is because our servers have so many of them, and it seems that each PCI device/function has to be fetched with its own separate API call.

Just for reference, this is the place in the gofish library where a separate call is made for each PCIDevice. I guess fetching these in parallel would reduce the execution time a lot.
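As an illustration of the idea (not the gofish implementation), the following Go sketch fetches a list of resource links with a bounded number of concurrent calls. The function `fetchAll`, the link paths, and the simulated 50 ms latency are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchAll calls fetch once per link, running at most limit calls at the
// same time, and returns the results in the same order as links.
// It returns the first error found, if any. (Hypothetical helper.)
func fetchAll(links []string, limit int, fetch func(string) (string, error)) ([]string, error) {
	if limit < 1 {
		limit = 1 // avoid blocking forever on an unbuffered semaphore
	}
	results := make([]string, len(links))
	errs := make([]error, len(links))
	sem := make(chan struct{}, limit) // counting semaphore

	var wg sync.WaitGroup
	for i, link := range links {
		wg.Add(1)
		go func(i int, link string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it when done
			// Each goroutine writes only to its own index, so no lock is needed.
			results[i], errs[i] = fetch(link)
		}(i, link)
	}
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	// 60 made-up links, mirroring the roughly 60 PCI devices mentioned above.
	links := make([]string, 60)
	for i := range links {
		links[i] = fmt.Sprintf("/hypothetical/PCIeDevices/%d", i)
	}

	// Stand-in for one slow API call per resource.
	slowFetch := func(link string) (string, error) {
		time.Sleep(50 * time.Millisecond)
		return "payload for " + link, nil
	}

	begin := time.Now()
	results, err := fetchAll(links, 10, slowFetch)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("fetched %d resources in %s\n", len(results), time.Since(begin).Round(time.Millisecond))
}
```

The limit matters because a BMC may not cope well with dozens of simultaneous requests; the value 10 here is arbitrary and would need tuning against real hardware.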

@stmcginnis

No worries if this isn't something you have time for, but would love it if you could try an updated gofish with this change to see if it makes anything better.

@tazend

tazend commented Oct 26, 2022

Thanks @stmcginnis, this looks good. I will try it out with our servers some time this week.

@tazend

tazend commented Oct 30, 2022

Hi @stmcginnis

I have now tried to test your changes, but ran into some problems.
The redfish-exporter uses the functions PCIeDevices and PCIeFunctions from here, which contains separate implementations to get all the PCIe devices / PCIe functions of the systems, instead of using the one you updated in pciedevices.go.

In that same place I then tried to replace the code to simply use the ListReferencedPCIeDevices function in the same manner as the other functions do.

However, the problem is that computersystems.pcieDevices is a []string, while ListReferencedPCIeDevices expects just a string, so I don't know what exactly should be passed to it or how to fix it.

I assume the computersystems.pcieDevices member could be updated to just be a string? (Which would then simply be the link to the PCIeDevice resource in the REST API?)

@stmcginnis

@tazend sorry for taking so long to get back to this. I've updated stmcginnis/gofish#210 to make all collection retrieval happen with some parallelism. Would you be able to try out these changes?

@tazend

tazend commented Nov 27, 2022

Hi @stmcginnis, no worries - looks good! I will try to check it very soon.

@rfpronk

rfpronk commented Dec 12, 2022

I would love this feature as well. We are getting all kinds of errors from our Dell and HPE servers, from components whose metrics we don't need anyway. Ideally the vendors would fix their firmware, but pragmatically, disabling those components solves the problem for me.
I'm now using a custom build of the exporter and the gofish library that disables most components, and that works fine.

I've also included the branch/PR that makes fetching work in parallel, and that seems to work fine, but for me it only improves the speed; it does not solve my problem.

@stmcginnis

Hey @tazend, just wanted to check if you ever had a chance to try out the changes. I may go ahead and merge it and watch for any reported issues, but wanted to quickly check back here first. Thanks!

@stmcginnis

Or @rfpronk - you mention using a fork with these changes included. Can you confirm things are working as expected against your hardware?

@tazend

tazend commented Feb 20, 2023

Hi @stmcginnis

sorry, I haven't tried out the changes yet - but I plan to do so soon. I'll let you know.

@rfpronk

rfpronk commented Apr 28, 2023

> Or @rfpronk - you mention using a fork with these changes included. Can you confirm things are working as expected against your hardware?

You mean stmcginnis/gofish#210?
If so, yes: the custom build I mentioned above does include that PR and it works fine, but it only improves speed and doesn't solve the errors that I get with specific collectors (which is what this issue is about).

@Supermathie

FWIW, I'm also interested in this since a scrape on our new HP ProLiant DL325 Gen11 takes minutes.

Here are my timings:

2024/11/07 19:55:25  info scraping target host      app=redfish_exporter target=serverilo
2024/11/07 19:55:27  info collector scrape started  Manager=1 app=redfish_exporter collector=ManagerCollector target=serverilo
2024/11/07 19:55:28  info collector scrape started  Chassis=DE041000 app=redfish_exporter collector=ChassisCollector target=serverilo
2024/11/07 19:55:28  info no thermal data found     Chassis=DE041000 app=redfish_exporter collector=ChassisCollector operation=chassis.Thermal() target=serverilo
2024/11/07 19:55:28  info no power data found       Chassis=DE041000 app=redfish_exporter collector=ChassisCollector operation=chassis.Power() target=serverilo
2024/11/07 19:55:28  info no network adapters data found Chassis=DE041000 app=redfish_exporter collector=ChassisCollector operation=chassis.NetworkAdapters() target=serverilo
2024/11/07 19:55:28  info no log services found     Chassis=DE041000 app=redfish_exporter collector=ChassisCollector operation=chassis.LogServices() target=serverilo
2024/11/07 19:55:28  info collector scrape completed Chassis=DE041000 app=redfish_exporter collector=ChassisCollector target=serverilo
2024/11/07 19:55:28  info collector scrape started  Chassis=1 app=redfish_exporter collector=ChassisCollector target=serverilo
2024/11/07 19:55:28  info collector scrape started  System=1 app=redfish_exporter collector=SystemCollector target=serverilo
2024/11/07 19:55:38  info no log services found     Chassis=1 app=redfish_exporter collector=ChassisCollector operation=chassis.LogServices() target=serverilo
2024/11/07 19:55:38  info collector scrape completed Chassis=1 app=redfish_exporter collector=ChassisCollector target=serverilo
2024/11/07 19:55:47  info no drive data found       System=1 app=redfish_exporter collector=SystemCollector operation=system.Drives() storage=DA000005 target=serverilo
2024/11/07 19:55:47  info no drive data found       System=1 app=redfish_exporter collector=SystemCollector operation=system.Drives() storage=DA000004 target=serverilo
2024/11/07 19:56:01  info no PCI-E device data found System=1 app=redfish_exporter collector=SystemCollector operation=system.PCIeDevices() target=serverilo
2024/11/07 19:56:09  info no simple storage data found System=1 app=redfish_exporter collector=SystemCollector operation=system.SimpleStorages() target=serverilo
2024/11/07 19:56:09  info no PCI-E device function data found System=1 app=redfish_exporter collector=SystemCollector operation=system.PCIeFunctions() target=serverilo
2024/11/07 19:59:00  info collector scrape completed Manager=1 app=redfish_exporter collector=ManagerCollector target=serverilo
2024/11/07 19:59:24  info collector scrape completed System=1 app=redfish_exporter collector=SystemCollector target=serverilo

I haven't yet tried anything, I'm literally just getting started.
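The per-collector durations can be read off the "collector scrape started" and "collector scrape completed" pairs in the log above. The Go sketch below does that arithmetic on those eight lines, copied verbatim. It assumes the timestamp occupies the first 19 characters of each line and that the first key=value field after the message identifies the collector instance; `collectorDurations` is a hypothetical helper, not part of the exporter.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"time"
)

const layout = "2006/01/02 15:04:05" // matches "2024/11/07 19:55:27"

// collectorDurations pairs "started" and "completed" lines by the first
// key=value field after the message (e.g. "System=1") and returns the
// elapsed time for each pair. (Hypothetical helper.)
func collectorDurations(lines []string) (map[string]time.Duration, error) {
	started := map[string]time.Time{}
	durations := map[string]time.Duration{}
	for _, line := range lines {
		if len(line) < len(layout) {
			continue
		}
		ts, err := time.Parse(layout, line[:len(layout)])
		if err != nil {
			return nil, err
		}
		for _, marker := range []string{"collector scrape started", "collector scrape completed"} {
			idx := strings.Index(line, marker)
			if idx < 0 {
				continue
			}
			fields := strings.Fields(line[idx+len(marker):])
			if len(fields) == 0 {
				continue
			}
			key := fields[0]
			if strings.HasSuffix(marker, "started") {
				started[key] = ts
			} else if begin, ok := started[key]; ok {
				durations[key] = ts.Sub(begin)
			}
		}
	}
	return durations, nil
}

// logLines holds the started/completed lines from the log above.
var logLines = []string{
	"2024/11/07 19:55:27  info collector scrape started  Manager=1 app=redfish_exporter collector=ManagerCollector target=serverilo",
	"2024/11/07 19:55:28  info collector scrape started  Chassis=DE041000 app=redfish_exporter collector=ChassisCollector target=serverilo",
	"2024/11/07 19:55:28  info collector scrape completed Chassis=DE041000 app=redfish_exporter collector=ChassisCollector target=serverilo",
	"2024/11/07 19:55:28  info collector scrape started  Chassis=1 app=redfish_exporter collector=ChassisCollector target=serverilo",
	"2024/11/07 19:55:28  info collector scrape started  System=1 app=redfish_exporter collector=SystemCollector target=serverilo",
	"2024/11/07 19:55:38  info collector scrape completed Chassis=1 app=redfish_exporter collector=ChassisCollector target=serverilo",
	"2024/11/07 19:59:00  info collector scrape completed Manager=1 app=redfish_exporter collector=ManagerCollector target=serverilo",
	"2024/11/07 19:59:24  info collector scrape completed System=1 app=redfish_exporter collector=SystemCollector target=serverilo",
}

func main() {
	durations, err := collectorDurations(logLines)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	keys := make([]string, 0, len(durations))
	for key := range durations {
		keys = append(keys, key)
	}
	sort.Strings(keys)
	for _, key := range keys {
		fmt.Printf("%-18s %s\n", key, durations[key])
	}
}
```

From the timestamps, Manager=1 takes 3m33s and System=1 takes 3m56s, while the two chassis entries finish within 10 seconds, so the manager and system collectors account for nearly all of the scrape time on this machine.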


5 participants