Suggestions for implementing a composition-based optimization (i.e. fractional portion of ingredients) #727
Wow @sgbaird, this is one exemplary GitHub issue! : ) I think your idea about representing parameters as [0.0, 1.0] ranges is exactly right for this case. There are two challenges, though:

1. Constraints are complicated to represent here. I think the types of constraints you are looking for are not currently supported by Ax, at least not out of the box. Just to make sure, my understanding is that the three constraints you'd like to have are the ones listed in your problem statement. Constraint 3) here is easy: it will in fact be two constraints. Constraints 1) and 2) are challenging, though. For constraint 2), which is an exact equality constraint, we have this wishlist issue open currently: #510.

2. The real use case involves around 100 components and 4 classes of them. This is too many parameters and constraints for standard Ax models, so this will also not work out of the box. Let me discuss this with @dme65, @Balandat, and @eytan, and get back to you on whether we can find a way to represent your problem in Ax.

Before we delve into this further, a high-level question: how expensive will each `predict` function call be in your case? How many total trials could you run, and how important is running as few trials as possible vs. finding the optimal outcome? Depending on that, we might want to break up the search space into sub-search-spaces with fewer components (a total of 100 is a lot).
Hi @lena-kashtelyan, I'm glad you like the issue 😄 I really appreciate your thorough and timely response. This is great! I have some more thoughts on the constraints; I'm curious whether the "components/composition" version seems problematic (esp. with respect to symmetry concerns); and last, I tried to answer the higher-level questions you asked.

Constraints

Constraint 3 doesn't seem bad at all. Thank you! I think I have a workaround for 2, and I have a few more thoughts for 1.

Prevalence sums to 1

Perhaps a workaround for constraint 2 would be the following two constraints:

```
"filler_A + filler_B + resin_A + resin_B + resin_C <= 1.000001"
"filler_A + filler_B + resin_A + resin_B + resin_C >= 0.999999"
```

as a substitute for the exact equality constraint.
Limit the total number of components in any given formula

For constraint 1, maybe the L2 norm could be used as a proxy for the "complexity" of the suggested formula.

Components/Composition version

Do you see any potential issues with using the "components/composition" version (i.e. a sort of hard-coded upper limit of 3 components)? Symmetry might be an issue. Is the main issue with this kind of symmetry just that it takes longer to search the space? Or is the worry that the search space might become unsmooth?

Additional Context

@ramseyissa may wish to comment more about the priorities of the project, but here are some thoughts to the best of my understanding:
Thanks @sgbaird for raising the issue, and thank you @lena-kashtelyan for the quick response and feedback! Hopefully I can clear up some of the project goals and issues that I am facing.

Project Overview

Useful Information

Thank you both again; I am open to any suggestions or references.
So if the `predict` function is cheap to evaluate, you could simply evaluate it a very large number of times. There are also quite a few challenges regarding the setup (in terms of the dimensionality of the inputs, limiting the number of components, etc.) that would not be straightforward to resolve. But even if we could resolve these, you might actually end up getting better results by using an optimization algorithm that can just evaluate tons of different configurations. Maybe some kind of evolutionary algorithm could work well here?
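As a toy illustration of the "evaluate tons of configurations" idea (a sketch, reusing the `RFR` surrogate that appears later in this thread):

```python
import numpy as np

# score a large batch of random candidate compositions with the cheap surrogate
rng = np.random.default_rng(0)
candidates = rng.dirichlet(np.ones(5), size=100_000)  # each row sums to 1
scores = RFR.predict(candidates)
best_candidate = candidates[np.argmax(scores)]
```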
Brief drive-by comment regarding the constraints in this problem: @sgbaird, correct me if I'm wrong, but you really only have 4 free parameters in this case, since the requirement that they sum to 1 removes a degree of freedom. For this reason, it might be useful for you to set a constraint:

```
"filler_A + filler_B + resin_A + resin_B <= 1"
```

and then, for each parameterization, compute the remaining component as `resin_C = 1 - (filler_A + filler_B + resin_A + resin_B)`. Of course this may end up being moot per @Balandat's comment above, but if you were to end up using Ax this might be a better way to compose your search space. Does that make sense?
@Balandat that's a good point. A brute-force method or an EA might be more suitable. Do you have any suggestions on implementing the acquisition function (i.e. expected improvement) with a custom model (i.e. the `RandomForestRegressor`)?

@bernardbeckerman that's exactly right. I hadn't thought of that. If we try something out in Ax, I think we'll use that parameterization. Thanks for the suggestions!
I am not sure what exactly you're getting at here. If you use brute force or an EA, why would you need to implement an acquisition function?
In the adaptive design scheme (even if this involves a brute-force or EA approach), we are still looking for a happy medium between exploitation and exploration. With brute force or an EA, it would be straightforward to try to maximize the objective function (strength, for example), but it's less clear to me how we would incorporate the "exploration" component, which is why I mentioned an acquisition function. Perhaps a better phrasing of the question is: with brute force or an EA, how would you suggest incorporating a trade-off between exploration and exploitation similar to Bayesian optimization? My questions here could very well stem from a misunderstanding on my part about the theory, or from not explaining the problem clearly 😅. It might also be worth mentioning that while we can run the `strength` model cheaply, the actual laboratory experiments are slow and costly.
I see, maybe we misunderstood the problem to some extent. It seems to me that your `strength` model is essentially already a surrogate model fit to experimental data. The way Ax is typically used is by building its own (typically Gaussian process) surrogate model internally. It's generally possible to swap out that model (though that will be some work). My main question is whether your model provides uncertainty estimates alongside its predictions.
@Balandat thank you for your response. That's correct; if I'm using the right terminology, our model is itself a surrogate trained on experimental data.

Our use case is very much a "suggest-next-experiment" adaptive design scheme for a limited/costly laboratory experimental dataset. For the current implementation (`RandomForestRegressor`), from what I can tell, there is nothing built-in or super straightforward for uncertainty quantification (UQ). But I am looking at modifying CrabNet (via my refactor) in the near future to work with arbitrary "ingredients lists" rather than only chemical formulas. We're open to using Ax's built-in GPR model with the existing small dataset (#743) in the short term while I'm implementing the CrabNet-based surrogate.

Would love to hear your updated thoughts on how to approach this.
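In the meantime, one workaround I'm considering for RF uncertainty (a sketch, not a built-in sklearn API) is to use the spread of the per-tree predictions as a crude uncertainty estimate:

```python
import numpy as np

def rf_mean_std(rfr, X):
    """Mean and std of per-tree predictions as a rough UQ proxy."""
    per_tree = np.stack([tree.predict(X) for tree in rfr.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)

# e.g., with the RFR and X_train defined below:
# mu, sigma = rf_mean_std(RFR, X_train.values)
```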
Also, we did get a working implementation, but it probably needs to be heavily modified in light of recent discussion. In particular, since our use case falls into a human-in-the-loop category (i.e. manual laboratory experiments), we probably need the Developer API. I'm kicking myself for not reading https://ax.dev/docs/api.html and looking at the simple case comparisons, and also for dismissing the human-in-the-loop tutorial as being irrelevant to me (when I think of human-in-the-loop, I think of a human participating in the surrogate function evaluation, and the word "Developer" deterred me). While the following "works" (runs to completion and spits out `best_parameters`), the training data is generated using a slightly modified version of the setup code from the original post:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Ax imports for the generation strategy and Loop API used below
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.service.managed_loop import optimize
# training data
X_train = np.array([
[0.4, 0.4, 0. , 0. , 0.2],
[0.5, 0. , 0. , 0.5, 0. ],
[0.5, 0.3, 0. , 0.2, 0. ],
[0.5, 0. , 0. , 0.5, 0. ],
[0. , 0.6, 0.4, 0. , 0. ],
[0.6, 0. , 0.4, 0. , 0. ],
[0. , 0.6, 0.2, 0.2, 0. ]])
X_train = pd.DataFrame(X_train, columns=["filler_A", "filler_B", "resin_A", "resin_B", "resin_C"])
y_train = 100 * np.random.rand(len(X_train))  # random toy strength values
unique_components = list(X_train.columns)
# surrogate model
RFR = RandomForestRegressor(max_depth=7, random_state=0)
RFR.fit(X_train.values, y_train)
def strength(x):
x = np.reshape(x, (-1, x.shape[0]))
y = RFR.predict(x)[0]
return y
# Ax-specific code
def strength_evaluation_function(parameterization):
x = np.array(
[parameterization.get(component) for component in unique_components[:-1]]
)
last_component = 1 - sum(x)
x = np.append(x, last_component)
return {"strength": strength(x)}
parameters = [
{"name": component, "type": "range", "bounds": [0.0, 1.0]}
for component in unique_components[:-1]
]
separator = " + "
composition_constraint = separator.join(unique_components[:-1]) + " <= 1"
gs = GenerationStrategy(
steps=[
# 1. Initialization step (does not require pre-existing data and is well-suited for
# initial sampling of the search space)
GenerationStep(
model=Models.SOBOL,
num_trials=5, # How many trials should be produced from this generation step
min_trials_observed=3, # How many trials need to be completed to move to next model
max_parallelism=5, # Max parallelism for this step
model_kwargs={
"seed": 999,
"fallback_to_sample_polytope": True,
}, # Any kwargs you want passed into the model
model_gen_kwargs={}, # Any kwargs you want passed to `modelbridge.gen`
),
# 2. Bayesian optimization step (requires data obtained from previous phase and learns
# from all data available at the time of each new candidate generation call)
GenerationStep(
model=Models.GPEI,
num_trials=-1, # No limitation on how many trials should be produced from this step
max_parallelism=3, # Parallelism limit for this step, often lower than for Sobol
# More on parallelism vs. required samples in BayesOpt:
# https://ax.dev/docs/bayesopt.html#tradeoff-between-parallelism-and-total-number-of-trials
),
]
)
best_parameters, values, experiment, model = optimize(
parameters=parameters,
experiment_name="composition_test",
objective_name="strength",
evaluation_function=strength_evaluation_function,
parameter_constraints=[
composition_constraint, # compositions sum to 1
],
total_trials=30,
generation_strategy=gs,
)
# Add the last degree of freedom back in
x = np.array([best_parameters.get(component) for component in unique_components[:-1]])
last_component = 1 - sum(x)
x = np.append(x, last_component)
next_experiment = best_parameters
next_experiment[unique_components[-1]] = last_component
```
I adapted what I had from a Loop into a Service API and fixed some of the theory/understanding issues on my part in the linked example: #743 (comment), such that I'm generating a real suggested next experiment. I'm still struggling with the "limit the total number of components" constraint. I'm also still confused about how I would replace the GPR surrogate model with my own (which has a built-in uncertainty output).
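For reference, the Service API loop looks roughly like this (a sketch reusing the names from the Loop example above, not the exact code from the linked comment):

```python
from ax.service.ax_client import AxClient

ax_client = AxClient()
ax_client.create_experiment(
    name="composition_test",
    parameters=parameters,
    objective_name="strength",
    parameter_constraints=[composition_constraint],
)

for _ in range(30):
    trial_params, trial_index = ax_client.get_next_trial()
    # in the real human-in-the-loop setting, this would be a lab measurement
    ax_client.complete_trial(
        trial_index=trial_index, raw_data=strength_evaluation_function(trial_params)
    )

best_parameters, metrics = ax_client.get_best_parameters()
```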
For using my own surrogate model, should I be looking at https://ax.dev/tutorials/modular_botax.html? EDIT: and if so, can this be passed into a `GenerationStep` within a `GenerationStrategy`?
I think that's likely, but will let @Balandat chime in on that too.
I think we're getting close. For the "limit the total number of components" constraint (still outstanding), we can at least temporarily prune off one idea.
I dug a bit deeper into how to swap out a built-in surrogate model for a custom one.
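Based on the modular BoTorch tutorial, it appears to look something like this (a sketch; `SingleTaskGP` stands in for a custom BoTorch `Model`):

```python
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models
from ax.models.torch.botorch_modular.surrogate import Surrogate
from botorch.acquisition.monte_carlo import qExpectedImprovement
from botorch.models.gp_regression import SingleTaskGP

gs = GenerationStrategy(
    steps=[
        GenerationStep(model=Models.SOBOL, num_trials=5),
        GenerationStep(
            model=Models.BOTORCH_MODULAR,
            num_trials=-1,
            model_kwargs={
                # a custom BoTorch Model subclass could be dropped in here
                "surrogate": Surrogate(SingleTaskGP),
                "botorch_acqf_class": qExpectedImprovement,
            },
        ),
    ]
)
```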
Here is a roadmap of some outstanding items, as well as other features/topics that came up along the way. I'll plan on updating these later if the status changes (along with an "EDIT" keyword). If I may, I'm also cc'ing the people who have been participating or tagged in these discussions to bring attention to this "bird's-eye" issue. Thank you to everyone who has helped me out so far. Your responses have been incredibly useful and informative. Personally, it's been very rewarding to get a better understanding of BO through the lens of a practical use case while using what I consider to be an excellent platform for it. @lena-kashtelyan @Balandat @bernardbeckerman @eytan @saitcakmak @Ryan-Rhys @qingfeng10 @dme65 (@ramseyissa, who is lead on the project)
@sgbaird, thank you for this summary, it's very helpful!
Great meeting with @lena-kashtelyan, @Balandat, and @dme65. No need to respond until after the break. I thought it might be good to try a few visualizations of the one-hot encoding scheme. There are four compositions (
```python
def best_feasible_objective(  # pragma: no cover
    optimization_config: OptimizationConfig, values: Dict[str, np.ndarray]
) -> np.ndarray:
    """Compute the best feasible objective value found by each iteration.

    Args:
        optimization_config: Optimization config.
        values: Dictionary from metric name to array of value at each
            iteration. If optimization config contains outcome constraints,
            values for them must be present in `values`.

    Returns: Array of cumulative best feasible value.
    """
    # Get objective at each iteration
    objective = optimization_config.objective
    f = values[objective.metric.name]
    # Set infeasible points to have infinitely bad values
    infeas_val = np.Inf if objective.minimize else -np.Inf
    for oc in optimization_config.outcome_constraints:
        if oc.relative:
            raise ValueError(  # pragma: no cover
                "Benchmark aggregation does not support relative constraints"
            )
        g = values[oc.metric.name]
        feas = g <= oc.bound if oc.op == ComparisonOp.LEQ else g >= oc.bound
        f[~feas] = infeas_val
    # Get cumulative best
    minimize = objective.minimize
    accumulate = np.minimum.accumulate if minimize else np.maximum.accumulate
    return accumulate(f)
```
I realized that there is an issue with using Sobol sampling together with the one-fewer-degrees-of-freedom style compositional constraint, and I wanted to make a record of it in this issue. Take the case of:

```
x1 + x2 + x3 == 1
```

reparameterized to:

```
x1 + x2 <= 1
```

where `x3 = 1 - (x1 + x2)`.
In the context of #769, this isn't an issue if I pass an equality constraint (i.e. the original constraint) directly to the optimizer (presumably BoTorch's `optimize_acqf`, which accepts `equality_constraints`).
Actually, passing a linear equality constraint may be the only way to prevent bias in the Sobol sampling relative to the "true" search space, because the Sobol sampling still needs to respect the constraints (i.e. it has to see some constraint). If I pass a linear inequality constraint rather than a linear equality constraint, the bias will occur.
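As an aside, BoTorch can also sample the simplex directly, which satisfies the equality constraint by construction (a sketch using `sample_simplex`, which I believe lives in `botorch.utils.sampling`):

```python
import torch
from botorch.utils.sampling import sample_simplex

# 128 quasi-random points with x1 + x2 + x3 == 1 by construction
X = sample_simplex(d=3, n=128, qmc=True)
print(X.sum(dim=-1))  # all ones (up to floating point)
```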
@jduerholt recently suggested the use of `optimize_acqf_mixed` within BoTorch, based on the combinations from an NchooseK-style constraint (i.e. only up to a certain number of elements allowed to be active for a given experiment). In my case, 20 choose 6 = 38760 combinations might be unwieldy without HPC resources; however, this suggestion seems like the most straightforward and rigorous option for cases where it is computationally tractable. This seems like a great option to compare against the nonlinear NchooseK constraint approximation in #769, specifically in https://github.com/sparks-baird/optimization-benchmark (sparks-baird/optimization-benchmark#7).
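A self-contained sketch of that suggestion (toy sizes, with a stand-in GP and EI; the real case would enumerate the 38760 patterns, and the sum-to-1 constraint would still need handling, e.g. via `equality_constraints`):

```python
from itertools import combinations

import torch
from botorch.acquisition import ExpectedImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf_mixed
from gpytorch.mlls import ExactMarginalLogLikelihood

n_components, n_active = 5, 3  # toy sizes (the real case is 20 choose 6)
train_X = torch.rand(10, n_components, dtype=torch.double)
train_X = train_X / train_X.sum(dim=-1, keepdim=True)  # compositions sum to 1
train_Y = torch.rand(10, 1, dtype=torch.double)

gp = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(gp.likelihood, gp))
acqf = ExpectedImprovement(gp, best_f=train_Y.max())

bounds = torch.stack(
    [torch.zeros(n_components), torch.ones(n_components)]
).to(torch.double)

# one fixed-features dict per NchooseK pattern: inactive components pinned to 0
fixed_features_list = [
    {i: 0.0 for i in range(n_components) if i not in active}
    for active in combinations(range(n_components), n_active)
]

candidate, acq_value = optimize_acqf_mixed(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=5,
    raw_samples=64,
    fixed_features_list=fixed_features_list,
)
```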
For starters, my experience with Ax is running the Loop tutorial once and reading through some of the documentation, such as the parameter types (i.e. I'm fairly new). I also have some familiarity with Bayesian optimization.

The actual use case is slightly different and more complicated, but I think the following is a suitable toy example. I go over the problem statement, some setup code, and possible solutions. Would love to hear some feedback.
Problem Statement
Take a composite material with the following `class: ingredient` combinations:

- filler (`filler_A`)
- filler (`filler_B`)
- resin (`resin_A`)
- resin (`resin_B`)
- resin (`resin_C`)

Take some toy data of components and their fractional prevalences (various combinations of fillers and resins, and various numbers of components) along with their objective (training data), and some model which takes arbitrary input parameters and predicts the objective (strength), which we wish to maximize.
For constraints, I'm thinking:

1. a limit on the total number of components in any given formula
2. the composition sums to 1 (i.e. `abs(1-sum(composition)) <= tol`)
3. a third, simpler bound-style constraint (which, per the discussion above, reduces to two inequality constraints)

Setup Code
To make it more concrete, it might look like the following:
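Here's a sketch consistent with the training-data and model code quoted later in the thread:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# toy training data: rows are formulations, columns are component prevalences
X_train = pd.DataFrame(
    [
        [0.4, 0.4, 0.0, 0.0, 0.2],
        [0.5, 0.0, 0.0, 0.5, 0.0],
        [0.0, 0.6, 0.4, 0.0, 0.0],
    ],
    columns=["filler_A", "filler_B", "resin_A", "resin_B", "resin_C"],
)
y_train = 100 * np.random.rand(len(X_train))  # toy strength objective

# toy model that takes arbitrary compositions and predicts strength
RFR = RandomForestRegressor(max_depth=7, random_state=0)
RFR.fit(X_train.values, y_train)
```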
Possible Solutions
One-hot-like prevalence encoding and components/composition
One-hot-like prevalence encoding
I've thought about trying a sort of "one-hot encoding" (assuming I'm using that term correctly), such that each component gets its own prevalence as a variable, which I think would look like the following:
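(A sketch mirroring the parameter setup in the Ax code later in this thread, extended to all five components:)

```python
components = ["filler_A", "filler_B", "resin_A", "resin_B", "resin_C"]
parameters = [
    {"name": component, "type": "range", "bounds": [0.0, 1.0]}
    for component in components
]
```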
However, this could easily lead to compositions where all of the components have a finite prevalence, which can be problematic from an experimental perspective.
components/composition
As I mentioned in the constraints, I've also thought about setting an upper limit on the number of components in a formula, which I think might look something like the following:
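(A hypothetical slot-based parameterization; the names `component_{i}`/`composition_{i}` are mine:)

```python
components = ["filler_A", "filler_B", "resin_A", "resin_B", "resin_C"]
parameters = []
for i in range(3):  # hard-coded upper limit of 3 components per formula
    parameters.append(
        {"name": f"component_{i}", "type": "choice", "values": components}
    )
    parameters.append(
        {"name": f"composition_{i}", "type": "range", "bounds": [0.0, 1.0]}
    )
```

The symmetry concern is that, e.g., assigning `filler_A` to slot 0 and `resin_A` to slot 1, or the reverse, describes the same formula.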
How would you suggest implementing this use case in Ax? If it would help, I'd be happy to flesh this out into a full MWE or try out any suggestions. The real use case involves ~100 different components across 4 different classes, and the idea is to (eventually) use this in an experimental adaptive design scheme.
(tagging @ramz-i, who is in charge of this project in our research group; post here if you have anything to add)
#706