How should we report old benchmark runs? #272
-
Typing out loud ... For the current/latest view of performance across the compilers and recommended configurations, I think we should just always show the latest, but we can retain historical results in the flat files for transparency, or in case someone really wants to look. For the over-time view, you highlight a few types of changes:
In all cases, any updates to the data would still be in git history for transparency. Any time we overwrite data, we'd have a PR that explains why.

Extra rambling thoughts: Now as for implementing such workflows, no recommendation quite yet. But relative to #210, this does make me lean toward having the benchmarks in a separate versioned repo (though we could still embed images from it in the UCC README). I'll think more on that, but mixing code development of
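To make the "always show the latest, keep history in the flat files" idea concrete, here is a minimal sketch of how a plotting script could select only the most recent run per compiler/benchmark while leaving every older row untouched. The file name and column names are assumptions for illustration, not our actual schema:

```python
# Hypothetical sketch: keep every run in the flat file, but plot only the
# most recent run per (compiler, benchmark) pair. Column names are assumptions.
import pandas as pd

# All historical results stay in the repo for transparency.
df = pd.read_csv("results.csv", parse_dates=["run_timestamp"])

# For the "current view" plot, take the latest run for each compiler/benchmark.
latest = (
    df.sort_values("run_timestamp")
      .groupby(["compiler", "benchmark"], as_index=False)
      .tail(1)
)

# Older rows are never deleted; they remain in the CSV and in git history,
# and any rewrite of the file would go through a PR explaining why.
latest.to_csv("latest_view.csv", index=False)
```

The point of the sketch is that the "current" view becomes a derived artifact, so retiring or correcting data never requires throwing away the underlying rows.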
-
With the recent change #266, we would essentially replace our old Pytket benchmark implementation (a manual, lightweight set of minimal passes) with a more robust and heavyweight optimization pass. Per the discussion in that PR, IMO we should just report the Pytket data that uses `FullPeepHoleOptimize` and/or `KAKOptimize`, and remove the older Pytket data from our plots. Presumably, as we go along, we will continue to encounter situations where we may want to sunset or replace aspects of our benchmarking suite (e.g. the potential switch from parallelized benchmarks to single-threaded runs, as discussed in #251, or the discovery that a previously reported datapoint was erroneous, as came up in the same issue).
How do we want to record these changes on our plots? Ideally we don't want our graph legend to be full of defunct old compilers, but I also don't think we want to junk all data from before a specific infrastructure change was made (which is, incidentally, what we do now by not plotting any data run before we set up the GitHub Actions benchmarking automation). To maintain transparency and balance these considerations going forward, what do we think is the best approach here? @Misty-W @bachase @natestemen
Also relevant is @bachase's broader discussion on refactoring the benchmarking suite, #235.
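For illustration only, one possible shape such bookkeeping could take is a small registry of benchmark configurations with a status flag, so retired entries (like the old lightweight Pytket passes) disappear from the plot legend while their data stays in the flat files and git history. All names here are hypothetical, not existing UCC code:

```python
# Hypothetical sketch: track each benchmarked configuration with a status flag
# so retired entries can be dropped from the legend without deleting data.
from dataclasses import dataclass

@dataclass
class CompilerConfig:
    name: str        # label used in the plot legend
    status: str      # "active" or "retired"
    note: str = ""   # e.g. a pointer to the PR/issue that retired it

CONFIGS = [
    CompilerConfig("ucc", "active"),
    CompilerConfig("pytket (FullPeepHoleOptimize)", "active"),
    CompilerConfig("pytket (legacy minimal passes)", "retired",
                   note="superseded in #266"),
]

def legend_entries(configs):
    """Only active configurations appear in the legend; retired ones remain
    in the flat files and git history for transparency."""
    return [c.name for c in configs if c.status == "active"]
```

A registry like this would also give us a natural place to record why something was retired, instead of burying that context in plot-generation code.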