Bug description
likwid-perfctr incorrectly reports some metrics by adding up core- or socket-local values. This happens, e.g., with:
clock frequency
CPI
runtime
operational intensity
These are "intensive" quantities, i.e., they do not scale with the size of the machine but need to be "averaged" (not literally, of course) in the proper way. In contrast, "extensive" quantities like energy consumption, memory data volume, etc, can be added across the machine to yield a useful number.
To Reproduce
LIKWID command and/or API usage
likwid-perfctr -g MEM_DP -C M0:0@M1:0 likwid-bench -t triad_avx -W N:2GB:2 on a dual-socket Ice Lake 6326
Operational intensity is correct on each domain separately, but the overall reported value is twice as high.
The same holds for clock, runtime, and CPI (but on a per-HW-thread basis, so the deviation gets even stronger with more threads).
LIKWID version: 5.2.2
Operating system: Ubuntu 22.04 LTS
Are you using the MarkerAPI (CPU code instrumentation) or the NvMarkerAPI (Nvidia GPU code instrumentation)?
Yes, but that does not matter.
Suggestion
Generalize the formulas by which metrics are calculated and make them configurable as to how different entities (threads, sockets, ...) are handled. For example, operational intensity could be calculated as something like "sum(flops, all cores)/sum(traffic, all domains)". Clock could be "sum(cycles, all HW threads)/(time * noOfThreads)", CPI could be "sum(cycles, all HW threads)/(noOfThreads * sum(instructions, all HW threads))", etc. This will reduce hard-coded stuff but will make the config files more complex.
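A rough sketch of how such configurable aggregation rules could look, written in Python purely for illustration (the function and parameter names are made up and this is not LIKWID's internal calculator); it implements the clock and operational-intensity rules quoted above:

```python
# Illustrative sketch of the proposed aggregation rules (not LIKWID code).

def clock_ghz(cycles_per_hw_thread, runtime_s):
    # "sum(cycles, all HW threads) / (time * noOfThreads)"
    n_threads = len(cycles_per_hw_thread)
    return sum(cycles_per_hw_thread) / (runtime_s * n_threads) / 1e9

def operational_intensity(flops_per_core, traffic_bytes_per_domain):
    # "sum(flops, all cores) / sum(traffic, all domains)"
    return sum(flops_per_core) / sum(traffic_bytes_per_domain)

# Invented example values: two HW threads at ~2.5 GHz over 10 s
print(clock_ghz([2.5e10, 2.5e10], 10.0))            # -> 2.5
print(operational_intensity([4e10, 4e10], [8e11]))  # -> 0.1
```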
Thanks for your suggestion. I thought about it, but it will not make it into the upcoming 5.3 version.
While the internal calculator would already support functions like SUM(X,Y,Z) or MIN(X,Y,Z), integrating data from other threads can be problematic, especially in the MarkerAPI, where each thread updates its own values. One has to synchronize the threads after the counter readings to ensure valid metric values.
In order to reduce the changes to the internal calculator, one could use a two-step approach. When creating the internal group structure, we could expand the proposed syntax SUM(<countername>, <topological-info>) to SUM(<countername>_<hw0>, <countername>_<hw1>, ...), with <hw*> being the HW threads responsible for the topological level. This way, we can still use the internal calculator for the final calculation. Of course, it still increases the work for each metric evaluation because we would need to fill the variables map (countername -> value) with the values of all HW threads. On modern systems with hundreds of HW threads, this will cause quite some overhead.
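A minimal sketch of that two-step expansion, with invented names (this is not the actual group-parser code):

```python
# Illustrative sketch of the two-step expansion (invented names, not LIKWID code).

def expand_aggregate(counter, hw_threads):
    """Rewrite SUM(<countername>, <topological-info>) into
    SUM(<countername>_<hw0>, <countername>_<hw1>, ...)."""
    return "SUM(" + ",".join(f"{counter}_{hw}" for hw in hw_threads) + ")"

def fill_variable_map(counter, hw_threads, values):
    """Populate the countername -> value map with one entry per HW thread;
    filling this for every metric is the overhead mentioned above."""
    return {f"{counter}_{hw}": v for hw, v in zip(hw_threads, values)}

print(expand_aggregate("PMC0", [0, 1, 2, 3]))
# -> SUM(PMC0_0,PMC0_1,PMC0_2,PMC0_3)
print(fill_variable_map("PMC0", [0, 1, 2, 3], [1.0e9, 1.1e9, 0.9e9, 1.0e9]))
```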
Moreover, it does not change the way the statistics table is calculated, and it is questionable whether that table is still required at all. All threads would have the same CPI, Clock, etc. Calculating min, max, and mean does not make sense for those, or one has to magically transform SUM(cycles, all HW threads) into, e.g., MIN(cycles, all HW threads) and re-calculate for the statistics table.
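For illustration, that "magic" transformation for the statistics table would amount to something like the following rewrite of an already expanded formula (sketch only, invented helper name):

```python
# Illustrative sketch only: swap the aggregate function in an expanded
# formula to re-derive a statistics-table row (invented helper name).
def stat_variant(expanded_formula, aggregator):
    return expanded_formula.replace("SUM(", aggregator + "(")

print(stat_variant("SUM(PMC0_0,PMC0_1,PMC0_2,PMC0_3)/time", "MIN"))
# -> MIN(PMC0_0,PMC0_1,PMC0_2,PMC0_3)/time
```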