Measurement addition order #487

Closed · brandon-holt opened this issue Feb 14, 2025 · 5 comments
Labels
question Further information is requested

Comments

@brandon-holt
Contributor

Is there a difference in the quality of recommendations that would come from

A) a naive model trained on dataset A, which then recommends datapoint X; X is tested and added to the model, which then makes the final recommendations,

vs.

B) a naive model trained directly on dataset A + the measured datapoint X, which then makes the final recommendations?

Or should these be mathematically equivalent?

Assume the surrogate model is a Gaussian process and the acquisition function is expected improvement.

@Scienfitz added the question label Feb 14, 2025
@Scienfitz
Collaborator

Hey @brandon-holt

In General
I think they should be mathematically equivalent. The situation is simple: when trained for providing the final recommendations, the surrogate has seen both A and X in both cases. Hence, it produces the same posterior and the same acquisition function, so both variants are equivalent.
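To make this concrete, here is a minimal sketch of the two variants using BayBE's Campaign API (the toy search space, the measurements, and the assumption that the first recommendation equals X are all invented for illustration):

```python
import pandas as pd

from baybe import Campaign
from baybe.objectives import SingleTargetObjective
from baybe.parameters import NumericalDiscreteParameter
from baybe.searchspace import SearchSpace
from baybe.targets import NumericalTarget

# Toy setup; parameter values and measurements are made up for illustration.
searchspace = SearchSpace.from_product(
    [NumericalDiscreteParameter(name="x", values=tuple(float(i) for i in range(10)))]
)
objective = SingleTargetObjective(NumericalTarget(name="y", mode="MAX"))

dataset_a = pd.DataFrame({"x": [0.0, 3.0, 7.0], "y": [1.2, 0.4, 2.1]})
point_x = pd.DataFrame({"x": [5.0], "y": [1.8]})  # the tested recommendation X

# Variant A: train on A, recommend X, measure and add it, then recommend again.
campaign_a = Campaign(searchspace=searchspace, objective=objective)
campaign_a.add_measurements(dataset_a)
_ = campaign_a.recommend(batch_size=1)  # assume this returned point X
campaign_a.add_measurements(point_x)
final_a = campaign_a.recommend(batch_size=1)

# Variant B: train directly on A + X, then recommend.
campaign_b = Campaign(searchspace=searchspace, objective=objective)
campaign_b.add_measurements(pd.concat([dataset_a, point_x], ignore_index=True))
final_b = campaign_b.recommend(batch_size=1)

# With a stateless recommender, both final surrogates are fit on identical
# data, so the acquisition functions (and hence final_a and final_b) agree.
```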

Exotic Cases and Statefulness
There are other things that can change this, however, in particular anything that makes the recommendation or the recommender stateful. Consider this exotic scenario:
For some hypothetical reason, the model believes a point in X should be measured again even though it has already been ingested. Most of the time this will not happen, but in theory it is possible if X contains an extremely promising point. If the model then sees a need to remeasure that point, for instance because of noisy measurements, it will recommend a point inside X again in the final round.

Here the two variants could differ if you disallowed the recommendation of previously recommended points via allow_recommending_already_recommended. In the first variant, X was already recommended in the first round, so any points in X will not be considered again for the final recommendation. In the second variant, you also added X, but it was never recommended before, hence the final recommendation might recommend a point in X again.
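As a sketch of where that flag enters the picture (where exactly it is passed may differ between versions, so treat the constructor details as an assumption; searchspace and objective are reused from the sketch above):

```python
from baybe import Campaign
from baybe.recommenders import BotorchRecommender, TwoPhaseMetaRecommender

campaign = Campaign(
    searchspace=searchspace,
    objective=objective,
    recommender=TwoPhaseMetaRecommender(
        recommender=BotorchRecommender(
            # Disallow re-recommending points this campaign already suggested.
            allow_recommending_already_recommended=False,
        ),
    ),
)
# With this setting, variant A has "used up" the points in X (they were
# recommended in round 1), while variant B never recommended them, so only
# variant B can still return a point from X in the final round.
```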

Computational Aspects
Moreover, there is also the computational aspect: even though the situation is mathematically equivalent, in practice the results can still differ if small numerical differences have an influence (e.g., when acqf values are all very close, I could imagine the order changing).
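A toy, library-independent illustration of how such near-ties can flip an ordering (pure floating-point arithmetic, nothing BayBE-specific):

```python
import numpy as np

# Two mathematically identical sums that differ in the last bit.
a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)  # 0.6
print(a == b)  # False

# If two candidates have acqf values this close, the argmax (i.e., which
# point gets recommended) is decided by rounding noise, not by the math.
acqf_values = np.array([a, b])
print(np.argmax(acqf_values))  # 0, although both values are "equal" on paper
```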

@brandon-holt
Contributor Author

brandon-holt commented Feb 14, 2025

Hey @Scienfitz! I see! But when searching for a more general framing of this topic:

Let's assume I have a dataset of 40 separate measurements in the search space. Should I add them all at once, or add them one by one, updating the acquisition function and requesting recommendations after each point is added?

My reading suggests adding them one by one allows the acquisition function to adapt dynamically after each new observation, potentially leading to a more efficient exploration-exploitation tradeoff:

• If the 40 measurements are diverse and well-distributed across the search space, adding them all at once is reasonable.
• If they are clustered or biased, sequential updating allows the model to refine its predictions dynamically.

Is this accurate?

Edit: after further reading, I think I'm misinterpreting some things; the acquisition function should be the same at the end regardless of the order in which the points are added (or whether they are added all at once). Just want to confirm this is true!

@Scienfitz
Collaborator

Yes, I think in the situation you described in the first post, the acqf should be identical, hence an identical outcome (unless some stateful aspects are present). When you already have the 40 measurements, it does not make a difference whether you add them all at once or one by one.

But be careful, this is similar to the related question of batching: "Does it make a difference if I measure 40 points at once and then update the campaign, OR should I do 40 rounds of 1 point and update the model after each measurement?" The latter is generally the way to go and will have a different trajectory than the former. The important difference to your original situation is that there you follow the 40 single recommendations instead of having the 40 points pre-selected at the beginning.
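A sketch of the contrast, reusing the toy searchspace and objective from above (measure and all_40_measurements are hypothetical placeholders):

```python
# Case 1: 40 pre-selected, already-measured points. The final model is the
# same whether they arrive as one chunk or one row at a time.
campaign = Campaign(searchspace=searchspace, objective=objective)
campaign.add_measurements(all_40_measurements)  # hypothetical DataFrame

# Case 2: sequential design. Each recommendation depends on everything
# measured so far, so the visited points differ from any pre-selected set.
campaign = Campaign(searchspace=searchspace, objective=objective)
for _ in range(40):
    rec = campaign.recommend(batch_size=1)
    rec["y"] = measure(rec)  # hypothetical: run the actual experiment
    campaign.add_measurements(rec)
```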

@brandon-holt
Contributor Author

Right, in my case it's the same pre-selected 40 points, so the endpoint should be the same whatever order I add them in.

Thanks!

@AdrianSosic
Collaborator

Exactly, no difference if the measurements have already been collected. But as @Scienfitz correctly pointed out, it absolutely makes a difference if you inform the campaign about new measurements WHILE you are still in the process of collecting the data. Here's an excerpt from the campaign user guide, in case you haven't seen it yet:

[Image: excerpt from the campaign user guide]
