[BUG] Events and Counters not initialized with cc-metric-collector #621

Open
Joehoch2 opened this issue Apr 23, 2024 · 5 comments
@Joehoch2

Joehoch2 commented Apr 23, 2024

Dear Likwid-Team,

I have built a new version of cc-metric-collector and tried to update LIKWID to 5.3.0. When I start cc-metric-collector with the -debug option, I get the following messages:

cc-metric-collector[27367]: ERROR - [/root/likwid-5.3.0/src/perfmon.c:perfmon_init:2109] No such file or directory.  
cc-metric-collector[27367]: Failed to initialize event and counter lists for Intel Xeon Broadwell EN/EP/EX processor  
cc-metric-collector[27367]: ERROR 2024/04/23 13:24:57 [LikwidCollector|/root/cc-metric-collector/collectors/likwidMetric.go:781] [failed to initialize library, error -22]

The error does not occur with version 5.2.2 of likwid. So I thought maybe this is the right contact point to get help.

To Reproduce

  • LIKWID command and/or API usage
    Unfortunately, I have not yet understood which commands or API requests cc-metric-collector uses to query the metrics. When I run a command with likwid-perfctr, it actually seems to work:
    likwid-perfctr -H -g CLOCK
    likwid-perfctr -e

  • LIKWID version and download source (GitHub, FTP, package manager, ...)
    5.3.0, from GitHub as a tarball

  • Operating system
    CentOS 7.9

  • Architecture
    Broadwell EP

I hope this information is enough. If not, please let me know.

Joehoch2 added the bug label Apr 23, 2024
@TomTheBear
Member

Thanks for the issue, but it seems to have been filed in the wrong repository. Since LIKWID itself seems to work (likwid-perfctr -e works), the problem is probably on the cc-metric-collector side.

The case is odd: this error usually appears when multiple LIKWID versions are installed and the wrong one is picked up at runtime, but the output clearly states 5.3.0 (/root/likwid-5.3.0/src/perfmon.c:perfmon_init:2109). And for a LIKWID library not to support Broadwell EP, it would have to be really ancient.

@Joehoch2
Author

I dug deeper into this issue.
At first I remembered that building cc-metric-collector downloads an older LIKWID version and copies its header files. So I changed the Makefile to download the latest version, but without success.
After that I compared src/perfmon.c between LIKWID 5.3.0 and 5.2.2 and found the code block that emits the error:

    ret = perfmon_init_maps();
    if (ret < 0)
    {
        ERROR_PRINT(Failed to initialize event and counter lists for %s, cpuid_info.name);
        HPMfinalize();
        return ret;
    }

I replaced the whole code block with a bare perfmon_init_maps(); call and it works as in version 5.2.2, but I don't think that is the intended fix, because I guess the return code should be 0 if the function succeeded. I hope I got that right.

Nevertheless, thank you for the response. Shall I copy this issue to the other repository?

@TomTheBear
Member

Might be fixed with ba8b2dc. It would be great if you could test it. I don't have the failing setup.

@TomTheBear
Member

I saw a similar error lately inside of containerized environments. Are you running the cc-metric-collector inside a container?

@Joehoch2
Author

> I saw a similar error lately inside of containerized environments. Are you running the cc-metric-collector inside a container?

No, it's running outside of a container.

> Might be fixed with ba8b2dc. It would be great if you could test it. I don't have the failing setup.

Unfortunately, I no longer have access to the hardware because I changed job locations, but I will reach out to them and see if they can test it. Thanks for letting me know!
