Measurement addition order #487

Closed · brandon-holt opened this issue Feb 14, 2025 · 5 comments
Labels
question Further information is requested

Comments

@brandon-holt
Contributor

Is there a difference in the quality of recommendations that would come from

A) a naive model trained on dataset A, which then recommends datapoint X; X is tested and added to the model, which then makes the final recommendations,

vs.

B) a naive model trained directly on dataset A + the measured datapoint X, which then makes the final recommendations?

Or should these be mathematically equivalent?

Assume the surrogate model is a Gaussian process and the acquisition function is expected improvement.

@Scienfitz added the question label Feb 14, 2025
@Scienfitz
Collaborator

Hey @brandon-holt

In General
I think they should be mathematically equivalent. The situation is simple: when trained for providing the final recommendations, the surrogate has seen both A and X in both cases. Hence, it produces the same posterior and the same acquisition function, so both variants are equivalent.
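To make this concrete, here is a minimal sketch of the two variants using BayBE's Campaign API (the toy search space, the measurements, and the assumption that the first recommendation equals X are all invented for illustration):

```python
import pandas as pd

from baybe import Campaign
from baybe.objectives import SingleTargetObjective
from baybe.parameters import NumericalDiscreteParameter
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget

# Toy setup; parameter values and measurements are made up for illustration.
searchspace = SearchSpace.from_product(
    [NumericalDiscreteParameter(name="x", values=tuple(float(i) for i in range(10)))]
)
objective = SingleTargetObjective(NumericalTarget(name="y", mode="MAX"))

dataset_a = pd.DataFrame({"x": [0.0, 3.0, 7.0], "y": [1.2, 0.4, 2.1]})
point_x = pd.DataFrame({"x": [5.0], "y": [1.8]})  # the tested recommendation X

# Variant A: train on A, recommend X, measure and add it, then recommend again.
campaign_a = Campaign(searchspace=searchspace, objective=objective)
campaign_a.add_measurements(dataset_a)
_ = campaign_a.recommend(batch_size=1)  # assume this returned point X
campaign_a.add_measurements(point_x)
final_a = campaign_a.recommend(batch_size=1)

# Variant B: train directly on A + X, then recommend.
campaign_b = Campaign(searchspace=searchspace, objective=objective)
campaign_b.add_measurements(pd.concat([dataset_a, point_x], ignore_index=True))
final_b = campaign_b.recommend(batch_size=1)

# With a stateless recommender, both final surrogates are fit on identical
# data, so the acquisition functions (and hence final_a and final_b) agree.
```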

Exotic Cases and Statefulness
There are other things that can change this, however, in particular anything that makes the recommendation or the recommender stateful. Consider this exotic scenario:
For some hypothetical reason, the model believes a point in X should be measured again even though it has already been ingested. Most of the time this will not happen, but in theory it is possible if X contains an extremely promising point. If the model then sees a need to remeasure that point, for instance because of noisy measurements, it will recommend a point inside X again in the final round.

Here the two variants could differ if you disallowed the recommendation of previously recommended points via allow_recommending_already_recommended. In the first variant, X was already recommended in the first round, so any points in X will not be considered again for the final recommendation. In the second variant, you also added X, but it was never recommended before, hence the final recommendation might recommend a point in X again.
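As a sketch of where that flag enters the picture (where exactly it is passed may differ between versions, so treat the constructor details as an assumption; searchspace and objective are reused from the sketch above):

```python
from baybe import Campaign
from baybe.recommenders import BotorchRecommender, TwoPhaseMetaRecommender

campaign = Campaign(
    searchspace=searchspace,
    objective=objective,
    recommender=TwoPhaseMetaRecommender(
        recommender=BotorchRecommender(
            # Disallow re-recommending points this campaign already suggested.
            allow_recommending_already_recommended=False,
        ),
    ),
)
# With this setting, variant A has "used up" the points in X (they were
# recommended in round 1), while variant B never recommended them, so only
# variant B can still return a point from X in the final round.
```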

Computational Aspects
Moreover, there is also the computational aspect: even though the situation is mathematically equivalent, in practice the results can still differ if small numerical differences have an influence (e.g., when acqf values are all very close, I could imagine the order changing).
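A toy, library-independent illustration of how such near-ties can flip an ordering (pure floating-point arithmetic, nothing BayBE-specific):

```python
import numpy as np

# Two mathematically identical sums that differ in the last bit.
a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)  # 0.6
print(a == b)  # False

# If two candidates have acqf values this close, the argmax (i.e., which
# point gets recommended) is decided by rounding noise, not by the math.
acqf_values = np.array([a, b])
print(np.argmax(acqf_values))  # 0, although both values are "equal" on paper
```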

@brandon-holt
Contributor Author

brandon-holt commented Feb 14, 2025

Hey @Scienfitz! I see! But when searching for a more general framing of this topic:

Let's assume I have a dataset of 40 separate measurements in the search space. Should I add them all at once, or add them one by one, updating the acquisition function and requesting recommendations after each point is added?

My reading suggests adding them one by one allows the acquisition function to adapt dynamically after each new observation, potentially leading to a more efficient exploration-exploitation tradeoff:

• If the 40 measurements are diverse and well-distributed across the search space, adding them all at once is reasonable.
• If they are clustered or biased, sequential updating allows the model to refine its predictions dynamically.

Is this accurate?

Edit: after further reading, I think I'm misinterpreting some things; the acquisition function should be the same at the end regardless of the order in which the points are added (or whether they are added all at once). Just want to confirm this is true!

@Scienfitz
Collaborator

Yes, I think in the situation you described in the first post, the acqf should be identical, hence an identical outcome (unless some stateful aspects are present). When you already have the 40 measurements, it does not make a difference whether you add them all at once or one by one.

But be careful, this is similar to the related question of batching: "Does it make a difference if I measure 40 points at once and then update the campaign, OR should I do 40 rounds of 1 point and update the model after each measurement?" The latter is generally the way to go and will have a different trajectory than the former. The important difference to your original situation is that there you follow the 40 single recommendations instead of having the 40 points pre-selected at the beginning.
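A sketch of the contrast, reusing the toy searchspace and objective from above (measure and all_40_measurements are hypothetical placeholders):

```python
# Case 1: 40 pre-selected, already-measured points. The final model is the
# same whether they arrive as one chunk or one row at a time.
campaign = Campaign(searchspace=searchspace, objective=objective)
campaign.add_measurements(all_40_measurements)  # hypothetical DataFrame

# Case 2: sequential design. Each recommendation depends on everything
# measured so far, so the visited points differ from any pre-selected set.
campaign = Campaign(searchspace=searchspace, objective=objective)
for _ in range(40):
    rec = campaign.recommend(batch_size=1)
    rec["y"] = measure(rec)  # hypothetical: run the actual experiment
    campaign.add_measurements(rec)
```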

@brandon-holt
Contributor Author

Right, in my case it's the same pre-selected 40 points, so the endpoint should be the same whatever order I add them in.

Thanks!

@AdrianSosic
Collaborator

Exactly, no difference if the measurements have already been collected. But as @Scienfitz correctly pointed out, it absolutely makes a difference if you inform the campaign about new measurements WHILE you are still in the process of collecting the data. Here's an excerpt from the campaign user guide, in case you haven't seen it yet:

[Image: excerpt from the campaign user guide]
