Benchmarks feature requests #493

Open · 3 tasks
Hrovatin opened this issue Feb 24, 2025 · 9 comments

Comments

@Hrovatin
Collaborator

Hrovatin commented Feb 24, 2025

Created this issue to keep track of things I would like to see for the benchmarks. I may add more topics in the future as needed.

  • The planned 12 h limit will not suffice for benchmarks that are as comprehensive as my local tests w.r.t. N MC iterations and N domains. Instead, running each benchmark case in parallel (each with a 6/12 h limit) would be nice.
  • Would it be possible to log, during a benchmark run, which benchmark class is currently running and how long it has been running? Otherwise, when the benchmark terminates due to the time limit, it is hard to figure out why it did.
  • Since the same datasets may be used across benchmarks (e.g. for features on different branches), I would really like to see easier re-use of lookups and search spaces. I made a quick-and-dirty implementation in my own code, but having this in a more general form could be beneficial:
    E.g. a general benchmark defining a data domain for TL, which is then imported into a benchmark on a new feature branch (see the sketch below).
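A minimal sketch of that third point, assuming a hypothetical shared module (the file names, function names, and column names below are placeholders, not existing benchmark code): the lookup and domain definition live in one place, and a feature-branch benchmark only imports them.

```python
# benchmarks/domains/tl_shared.py  -- hypothetical shared module
import pandas as pd


def load_tl_lookup(path: str = "data/tl_lookup.csv") -> pd.DataFrame:
    """Load the lookup table shared by all TL benchmarks (path is a placeholder)."""
    return pd.read_csv(path)


def make_tl_domain(lookup: pd.DataFrame, target: str = "target") -> dict:
    """Bundle everything that defines the data domain for a TL benchmark."""
    return {
        "lookup": lookup,
        "parameters": sorted(c for c in lookup.columns if c != target),
        "target": target,
    }


# benchmarks/tl_new_feature.py  -- hypothetical feature-branch benchmark
# from benchmarks.domains.tl_shared import load_tl_lookup, make_tl_domain
# domain = make_tl_domain(load_tl_lookup())
# ...build the campaign here, passing the new feature's arguments and reusing `domain`...
```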

@AdrianSosic @Scienfitz @AVHopp @fabianliebig

@Scienfitz
Collaborator

Scienfitz commented Feb 24, 2025

@AVHopp @fabianliebig can you chime in on how we can increase the runtime or possibly achieve the parallelization requested above?

@Hrovatin
can you elaborate on what you mean by the last point? Why is there a kernel_presets folder in the domains folder?

@Hrovatin
Collaborator Author

kernel_presets is a folder that uses the botorch kernel presets for testing the botorch preset feature. As I understood it, we decided that I should start running new-feature benchmarks on branches instead of locally, as I did before.

@Scienfitz
Collaborator

I don't think that's necessary. From what I understood: once we have all benchmarks implemented, we will have two results:

  • main: runs all benchmarks with the current settings
  • another_branch: this branch just changes the default kernels in the code; it does not alter the benchmark code at all

Those two will be compared in the dashboard. No code adjustments to the benchmarks are needed.

@Hrovatin
Collaborator Author

Some features add new arguments to the code, so the benchmark must be changed. E.g. using the botorch kernel factory was not added as a default, but is used similarly to how one would use EDBO.

@Scienfitz
Collaborator

Well, this would result in a complicated way of comparing results. Why would you prefer that instead of just changing the default and triggering the benchmark action on the feature branch? Then we check the result and, depending on that, keep or discard the default. Also, even if the benchmark code changes, there is no reason to make copies and maintain the unchanged benchmarks in the same branch, as they always have their comparison in the reference branch. I think this makes the third point somewhat obsolete.

@Hrovatin
Collaborator Author

The issue arises, for example, when we add many small changes, which would mean that for each one we need to create a new branch and set the change as the default (e.g. a StratifiedScaler that can optionally be used for the botorch MultiTaskGP). Then branch management gets really hard, as in the above example we would need to create two branches with the new MultiTaskGP feature, one with and one without the StandardScaler, and I would need to constantly make sure they are synchronised.

@Scienfitz
Collaborator

It is not intended to check every small change. Once per PR / feature proposal is fine, e.g. once the potential prior change is fully implemented.

I'm not entirely sure, but I think you can also compare them based on commits, so even if you wanted two snapshots from the same branch, that should be no problem.

@fabianliebig
Collaborator

Hi @Hrovatin, many thanks for those ideas. Sorry for my late reply. I have to confess (even though we talked already) that I'm not sure I understand the full scope of your requirements. My thoughts on your points are as follows:

  • Increasing the runtime to at least 24 h per job is possible and only requires one additional line in the job description. However, I cannot say whether more than 24 h is feasible, since the GITHUB_TOKEN expires after that time period and I have not yet found clear documentation on whether that impacts our use case. If not, a runtime of up to 35 days is theoretically possible.
  • Parallelization is certainly possible. From what I saw regarding CPU utilization, we should be safe to run two benchmarks in one container. Besides that, we can also start as many containers as we want, since the workflow itself is completely independent, as long as the results are distinguishable by date, commit hash, name, or branch; otherwise, they override each other. I will have to look into the details, but I plan to come up with more concrete ideas in the upcoming week (see the sketch after this list).
  • We can log the name of the benchmark right before it starts if that helps. The simulation will provide a progress bar showing the number of performed iterations and, afterwards, the runtime. Also, since the benchmarks are executed in the order of the list, if you know how many finished in time (by observing the progress bar of the simulation package, for example), you can link that directly to the list's order.
  • You can also separate the results by commit; it might be hard to remember the hash, to be honest, but it would be an alternative to branch management. Would it help to have some kind of command-line option that can be used to separate things more clearly? FYI: you can also change the function description (docstring), as this is stored and displayed in the dashboard, if you need to describe a small code change for your observation.
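To make the parallelization and logging points above more concrete, here is a minimal sketch of running benchmark cases in parallel while logging each case's name and wall-clock time. It uses only the standard library; the (name, callable) pairs and the default of two workers per container (matching the CPU-utilization estimate above) are assumptions, not the actual benchmarking module.

```python
# Minimal sketch, standard library only. The benchmark callables are placeholders
# and must be picklable (defined at module level) for process-based execution.
import logging
import time
from concurrent.futures import ProcessPoolExecutor, as_completed

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("benchmarks")


def run_one(name, fn):
    """Run a single benchmark case, logging its name before it starts and its runtime after."""
    log.info("starting benchmark %s", name)
    start = time.monotonic()
    result = fn()
    log.info("finished benchmark %s after %.1f s", name, time.monotonic() - start)
    return name, result


def run_all(benchmarks, max_workers=2):
    """Run benchmark cases in parallel processes, two at a time by default."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(run_one, name, fn) for name, fn in benchmarks]
        return dict(f.result() for f in as_completed(futures))
```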

Sorry for the long comment. Please let me know if I missed something regarding your requirements. We may also talk about your workflow at some point, as I have the impression that more local functionality for the benchmarking module would also help :)

@fabianliebig
Collaborator

I was curious and wanted to test what happens if a job exceeds 24 h. Well, the container just kept running, so I would guess it will work as long as the GITHUB_TOKEN is not used.
