
Suggestions for implementing a composition-based optimization (i.e. fractional portion of ingredients) #727

Closed · sgbaird opened this issue Nov 19, 2021 · 24 comments
Labels: question (Further information is requested)

@sgbaird (Contributor) commented Nov 19, 2021

For context, my experience with Ax consists of running the Loop tutorial once and reading through some of the documentation, such as the parameter types (i.e., I'm fairly new to it). I also have some familiarity with Bayesian optimization.

The actual use-case is slightly different and more complicated, but I think the following is a suitable toy example. I go over the problem statement, some setup code, and possible solutions. I'd love to hear some feedback.

Problem Statement

Take a composite material with the following class/ingredient combinations:

  • Filler: Colloidal Silica (filler_A)
  • Filler: Milled Glass Fiber (filler_B)
  • Resin: Polyurethane (resin_A)
  • Resin: Silicone (resin_B)
  • Resin: Epoxy (resin_C)

Take some toy data consisting of components and their fractional prevalences (various combinations of fillers and resins, with varying numbers of components) along with their objective values (training data), and some model which takes arbitrary input parameters and predicts the objective (strength), which we wish to maximize.

For constraints, I'm thinking:

  • limit the total number of components in any given "formula" (e.g. max of 3 components)
  • naturally, that the compositions sum to 1 (or that abs(1-sum(composition)) <= tol)
  • there has to be at least one filler and at least one resin (if feasible)

Setup Code

To make it more concrete, it might look like the following:

choices = ["filler_A", "filler_B", "resin_A", "resin_B", "resin_C", "dummy"]

data = [
        [["filler_A", "filler_B", "resin_C"], [0.4, 0.4, 0.2]],
        [["filler_A", "resin_A", "resin_B"], [0.6, 0.2, 0.2]],
        [["filler_A", "filler_B", "resin_B"], [0.5, 0.3, 0.2]],
        [["filler_A", "resin_B", "dummy"], [0.5, 0.5, 0.0]],
        [["filler_B", "resin_C", "dummy"], [0.6, 0.4, 0.0]],
        [["filler_A", "filler_B", "resin_A"], [0.2, 0.2, 0.6]],
        [["filler_B", "resin_A", "resin_B"], [0.6, 0.2, 0.2]],
        ] # made-up data

def predict(objects, composition):
    """Predict the objective (strength) for a list of components and their fractional prevalences."""
    ...
    return obj
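
For illustration, here is a small hypothetical helper (my own naming, not part of the real code) that converts the (components, fractions) pairs above into fixed-length composition vectors over choices, which is essentially the encoding used in the next section:

import numpy as np

def to_vector(components, fractions):
    # place each fraction at the index of its component in `choices`
    vec = np.zeros(len(choices))
    for name, frac in zip(components, fractions):
        vec[choices.index(name)] = frac
    return vec

X = np.array([to_vector(c, f) for c, f in data])[:, :-1]  # drop the trailing "dummy" column; shape (7, 5)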

Possible Solutions

One-hot-like prevalence encoding and components/composition

One-hot-like prevalence encoding

I've thought about trying to do a sort of "one-hot encoding" (assuming I'm using this term correctly), such that each component gets its own composition as a variable:

filler_A  filler_B  resin_A  resin_B  resin_C
0.4       0.4       --       --       0.2
0.6       0.0       0.2      0.2      --
0.5       0.3       --       0.2      --
0.5       --        --       0.5      --
--        0.6       --       --       0.4
0.2       0.2       0.6      --       --
--        0.6       0.2      0.2      --

which I think would look like the following:

from ax.service.managed_loop import optimize  # Loop API

best_parameters, values, experiment, model = optimize(
    parameters=[
        {
            "name": "filler_A",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "filler_B",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "resin_A",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "resin_B",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "resin_C",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
    ],
    experiment_name="composition_test",
    objective_name="strength",
    evaluation_function=predict,
    parameter_constraints=["abs(1 - (filler_A + filler_B + resin_A + resin_B + resin_C)) <= 1e-6", "filler_A + filler_B > 0", "resin_A + resin_B + resin_C > 0"], # not sure if I can use `abs` here
    total_trials=30,
)

However, this could easily lead to compositions where all of the components have a nonzero prevalence, which can be problematic from an experimental perspective.

components/composition

As I mentioned in the constraints, I've also thought about setting an upper limit to the number of components in a formula, which I think might look something like the following:

best_parameters, values, experiment, model = optimize(
    parameters=[
        {
            "name": "object1",
            "type": "choice",
            "bounds": choices,
        },
        {
            "name": "object2",
            "type": "choice",
            "bounds": choices,
        },
        {
            "name": "object3",
            "type": "choice",
            "bounds": choices,
        },
        {
            "name": "composition1",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "composition2",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
        {
            "name": "composition3",
            "type": "range",
            "bounds": [0.0, 1.0],
        },
    ],
    experiment_name="composition_test",
    objective_name="strength",
    evaluation_function=predict,
    parameter_constraints=["abs(1 - (composition1 + composition2 + composition3)) <= 1e-6"],
    total_trials=30,
)

How would you suggest implementing this use-case in Ax? If it would help, I'd be happy to flesh this out into a full MWE or try out any suggestions. The real use-case involves ~100 different components across 4 different classes, and the idea is to (eventually) use this in an experimental adaptive design scheme.

(tagging @ramz-i, who is the individual in charge of this project in our research group; please post here if you have anything to add)

#706

@lena-kashtelyan lena-kashtelyan self-assigned this Nov 19, 2021
@lena-kashtelyan lena-kashtelyan added the question Further information is requested label Nov 19, 2021
@lena-kashtelyan (Contributor) commented Nov 19, 2021

Wow @sgbaird, this is one exemplary GitHub issue! : )

I think your idea about representing parameters as [0.0, 1.0] ranges is exactly right for this case. There are two challenges though:

1. Constraints are complicated to represent here.

I think the types of constraints you are looking for are not currently supported by Ax, at least not out-of-the box.

Just to make sure, my understanding is that the three constraints you'd like to have are:

  1. Limit the total number of components in any given "formula" (e.g. max of 3 components),
  2. The prevalences in compositions sum to 1,
  3. There has to be at least one filler and at least one resin (if feasible).

Constraint 3) here is easy: it will in fact be two constraints: "filler_A + filler_B + ... >= 1e-6" and "resin_A + resin_B + ... >= 1e-6".

Constraints 1) and 2) are challenging though. For constraint 2), which is an exact equality constraint, we currently have this wishlist issue open: #510.

2. The real use-case involving around 100 components and 4 classes of them.

This is too many parameters and constraints for standard Ax models, so this will also not work out-of-the-box.

Let me discuss this with @dme65, @Balandat, and @eytan, and get back to you with whether we can find a way to represent your problem in Ax.

Before we delve into this further, a high-level question: how expensive will each predict function call be in your case? How many total trials could you run, and how important is running as few trials as possible vs. finding the optimal outcome? Depending on that, we might want to break up the search space into sub-search spaces with fewer components (as a total of 100 is a lot).

@sgbaird (Contributor, Author) commented Nov 19, 2021

Hi @lena-kashtelyan, I'm glad you like the issue 😄 I really appreciate your thorough and timely response. This is great!

I have some more thoughts on the constraints; I'm curious whether the "components/composition" version above seems problematic (especially with respect to symmetry concerns); and lastly, I've tried to answer the higher-level questions you asked.

Constraints

Constraint 3 doesn't seem bad at all. Thank you! I think I have a workaround for 2, and I have a few more thoughts for 1.

Prevalence sums to 1

Perhaps a workaround for constraint 2 would be the following two constraints:

"filler_A + filler_B + resin_A + resin_B + resin_C <= 1.000001"
"filler_A + filler_B + resin_A + resin_B + resin_C >= 0.999999"

as a substitute for "abs(1-sum(...)) <= 1e-6". Does this seem like it would work OK? I'm worried it might be a bit hacky and have some unintended consequences.

EDIT:
After reading #510 (comment), I see that the tighter the tolerance in the "hack" above, the tougher it is on the rejection sampling, which would either result in an error or (more likely) bloat the runtime.

raise SearchSpaceExhausted(
ax.exceptions.core.SearchSpaceExhausted: Rejection sampling error (specified maximum draws (10000) exhausted, without finding sufficiently many (1) candidates). This likely means that there are no new points left in the search space.

Limit the total number of components in any given formula

For constraint 1, maybe the L2 norm constraint could be used as a proxy for "complexity" of the suggested formula. For example:
norm([0.2 0.2 0.2 0.2 0.2]) --> 0.45 (higher complexity, lower score)
norm([0.4 0.4 0.2 0.0 0.0]) --> 0.60 (lower complexity, higher score)
If I'm thinking about this correctly, incorporating this into the objective function (predict) could bias the model towards less complex systems, but it probably pushes the problem into a multi-objective optimization (MOO) scheme and doesn't actually enforce an upper bound on the number of components. Rather than fiddle with the L2-norm constraint, at that point it might make more sense to just use MOO and have the second objective be the number of components that exhibit a prevalence above some tolerance, e.g. count_components([0.4 0.4 0.2 1e-8 1e-7]) --> 3.
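
For concreteness, a minimal sketch of that count_components idea (the name and tolerance are just illustrative):

import numpy as np

def count_components(composition, tol=1e-6):
    # count the components whose prevalence exceeds a small tolerance
    return int(np.sum(np.asarray(composition) > tol))

count_components([0.4, 0.4, 0.2, 1e-8, 1e-7])  # --> 3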

Components/Composition version

Do you see any potential issues with using the "components/composition" version (i.e. sort of a hard-coded upper limit of 3 components)? Symmetry might be an issue (A B C --> 0.5 0.3 0.2 is degenerate relative to A C B --> 0.5 0.2 0.3), so I wondered if using an ordering constraint such as composition1 > composition2 > composition3 would remove most of the symmetry. But then there's still the issue of (A B C --> 0.41 0.39 0.2) being distant in parameter space from (B A C --> 0.41 0.39 0.2) despite the two actually being quite close. This seems very relevant to #710

e.g. when parameters (1,2) actually means the same as (2,1), we want to disallow one of them.

Is the main issue with this kind of symmetry just that it takes longer to search the space? Or is the worry that the search space might become unsmooth?

Additional Context

@ramseyissa may wish to comment more about the priorities of the project, but here are some thoughts to the best of my understanding:

  • A single predict function call should be very inexpensive: i.e. millions of calls should be trivial
  • It shouldn't be a problem to select a subset of the higher-priority components (e.g. 10-50 components) rather than the full ~100 (curse of dimensionality, makes sense); I think that's what he's doing right now just to simplify the problem
  • My guess is that finding the optimal outcome is more important than running as few as possible since it's a few months into a 4-year project. IIRC, the experimental throughput is ~10/week and the current dataset is on the order of a few hundred. Right now @ramseyissa has been using a custom script that suggests 10 new formulas to try each week by looking at 1D slices (using a subset of components) in composition space where the data is sparse. I think he also incorporated some chemical intuition into the search, but he's looking for something more sophisticated that considers the full parameter space and the target property.

@ramseyissa commented Nov 20, 2021

Thanks @sgbaird for raising the issue and thank you @lena-kashtelyan for the quick response and feedback!

Hopefully I can clear up some of the project goals / issues that I am facing.
To begin, I am fairly new to Ax and its complete capabilities.

Project Overview

  • The dataset is close to 80 dimensions of different types of items that can be combined to make a single formulation. This high-dimensional dataset is divided into 4 sections: SectionA/SectionB/SectionC/SectionD. We can think of these sections as divided up evenly for simplicity, so 20 items per section. From each section, 5 items are selected in various percentages summing to 1. To reduce the dimensionality of the dataset, I am holding SectionB/C/D constant and only looking to optimize SectionA. So ideally, what I would like to do is: of the 20 items in SectionA, pick 5 items at a time at various percentages and predict for that combination (the 5 items always summing to 1). Each prediction can be any combination of the 20 in SectionA (as long as it chooses 5).

Useful Information

  • As @sgbaird stated, the predict function call is not of concern as it is inexpensive to run.
  • I am mainly concerned with finding the optimal prediction.
  • A one-dimensional search through the dataset in regions that are sparse has already produced a higher-performing sample, so I believe a higher-level search through the dataset using BO would be ideal and could definitely help optimize this problem.

Thank you both again and I am open to any suggestions or references.

@Balandat (Contributor) commented:

So if the predict function is cheap to evaluate and you can potentially do millions of evaluations, I think that Ax might just not be the right tool for this problem, as it is primarily designed for very costly settings where you may only be able to get a few hundred evaluations.

There are also quite a few challenges regarding the setup (in terms of the dimensionality of the inputs, limiting the number of components, etc.) that would not be straightforward to resolve. But even if we could resolve these, you might actually end up getting better results by using an optimization algorithm that can just evaluate tons of different configurations. Maybe some kind of evolutionary algorithm could work well here?

@bernardbeckerman (Contributor) commented:

Brief drive-by comment regarding the constraints in this problem: @sgbaird, correct me if I'm wrong, but you really only have 4 free parameters in this case, since the requirement that they sum to 1 removes a degree of freedom. For this reason, it might be useful for you to set a constraint:

"filler_A + filler_B + resin_A + resin_B <= 1"

and then for each parameterization resin_C = 1 - (filler_A + filler_B + resin_A + resin_B).
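
In code, that reparameterization might look like the following (a minimal sketch using the toy parameter names from above; full_composition is just an illustrative name):

def full_composition(params):
    # recover the full 5-component composition from the 4 free parameters
    free = ["filler_A", "filler_B", "resin_A", "resin_B"]
    x = [params[name] for name in free]
    return x + [1.0 - sum(x)]  # resin_C is determined by the sum-to-1 constraint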

Of course this may end up being moot per @Balandat's comment above, but if you were to end up using Ax this might be a better way to compose your search space. Does that make sense?

@sgbaird (Contributor, Author) commented Nov 22, 2021

@Balandat that's a good point. A brute force method or an EA might be more suitable. Do you have any suggestions on implementing the acquisition function (i.e. expected improvement) with a custom model (i.e. predict)?

@bernardbeckerman that's exactly right. I hadn't thought of that. If we try something out in Ax, I think we'll use that parameterization.

Thanks for the suggestions!

@Balandat (Contributor) commented:

A brute force method or an EA might be more suitable. Do you have any suggestions on implementing the acquisition function (i.e. expected improvement) with a custom model (i.e. predict)?

I am not sure what exactly you're getting at here. If you use a brute force or EA, why would you need to implement an acquisition function?

@sgbaird (Contributor, Author) commented Nov 23, 2021

In the adaptive design scheme (even if this involves a brute force or EA approach), we are still looking to find a happy medium between exploitation and exploration. With brute force or EA, it would be straightforward to try to maximize the objective function, strength for example, but it's less clear to me how we would incorporate the "exploration" component which is why I mentioned an acquisition function. Perhaps a better phrasing of the question is: with brute force or EA, how would you suggest incorporating a trade-off between exploration and exploitation similar to Bayesian optimization? My questions here could very well stem from a misunderstanding on my part about the theory or not explaining the problem clearly 😅.
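
Concretely, is the idea something like the following sketch (assuming the surrogate exposes a predictive mean and standard deviation; pick_next and surrogate_mean_std are hypothetical names)?

import numpy as np

def ucb(mean, std, beta=2.0):
    # upper confidence bound: trade off exploitation (mean) against exploration (std)
    return mean + beta * std

def pick_next(candidates, surrogate_mean_std):
    # brute-force an acquisition function over a large pool of candidate compositions
    mean, std = surrogate_mean_std(candidates)  # arrays of shape (n_candidates,)
    return candidates[np.argmax(ucb(mean, std))]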

It might also be worth mentioning that while we can run the predict function millions of times (e.g. random forest or neural network regression), we are probably limited to a few hundred adaptive design iterations per year (make a new sample, characterize it, retrain the predict model, determine next experiment, repeat).

@lena-kashtelyan (Contributor) commented Nov 30, 2021

cc @Balandat and @dme65, who I know was also thinking through this

@Balandat (Contributor) commented Dec 5, 2021

It might also be worth mentioning that while we can run the predict function millions of times (e.g. random forest or neural network regression), we are probably limited to a few hundred adaptive design iterations per year (make a new sample, characterize it, retrain the predict model, determine next experiment, repeat).

I see, maybe we misunderstood the problem to some extent. It seems to me that the predict function really is your surrogate model, and the thing you're actually trying to optimize is the "outer loop" of optimizing the design where an evaluation is the fabrication and characterization of a sample, is that correct?

The way Ax is typically used is by building its own (typically Gaussian process) surrogate model internally. It's generally possible to swap out that model (though that will be some work). My main question is whether your predict function has any uncertainty quantification (i.e. how confident it is about the prediction and what the prediction variance is)?

@sgbaird (Contributor, Author) commented Dec 7, 2021

@Balandat thank you for your response. That's correct, if I'm using the right terminology, predict is our surrogate model (and my choice of description probably added to the misunderstanding), and yes, the "outer loop" we're interested in is the fabrication/characterization of a sample. I think there are a few things I may need more clarification on. For experimentalists using Ax, what are the typical analogs that correspond to some of the Loop tutorial functions:

  1. Single evaluation of hartmann6 (is this typically a surrogate function, e.g. Gaussian process regression; a user-defined surrogate function, e.g. random forest; or an experiment/characterization iteration?)
  2. Single evaluation of hartmann_evaluation_function
  3. Single "trial" (e.g. out of total_trials=30) within optimize function

Our use case is very much a "suggest-next-experiment" adaptive design scheme for a limited/costly laboratory experimental dataset. For the current implementation (RandomForestRegressor), from what I can tell, there is nothing built-in or super straightforward for uncertainty quantification (UQ). But I am looking at modifying CrabNet (via my refactor) in the near future to work with arbitrary "ingredients lists" rather than only chemical formulas. CrabNet has built-in UQ via a transformer architecture. From what I understand, this is in the form of a standard deviation for regression and a confidence score if the task is classification. Does that seem sufficient for the regression case?

We're open to using Ax's built-in GPR model with the existing small dataset (#743) in the short term while I'm implementing the CrabNet modifications.

Would love to hear your updated thoughts on how to approach this.

@sgbaird (Contributor, Author) commented Dec 7, 2021

Also, we did get a working implementation, but it probably needs to be heavily modified in light of the recent discussion. In particular, since our use-case falls into a human-in-the-loop category (i.e. manual laboratory experiments), we probably need the Developer API. I'm kicking myself for not reading https://ax.dev/docs/api.html and looking at the simple case comparisons, and also for dismissing the human-in-the-loop tutorial as being irrelevant to me (when I think of human-in-the-loop, I think of a human participating in the surrogate function evaluation, and the word "Developer" deterred me).

While the following "works" (runs to completion and spits out best_parameters), it's round-about to the point of being inane, and doesn't really "work" in the sense of giving us our Bayesian-optimized "next experiment to run" for laboratory chemical synthesis. However, I think it's been a good deal of progress in terms of fleshing out the problem statement (esp. the constraints).

The training data is generated using a slightly modified version of components and compositions as described in the initial post #727 (comment), as well as some dummy data for the target values. The following is a MWE that considers n-1 degrees of freedom (n being the total number of components) #727 (comment) and is capable of extending at least to ~20 components #740 #694 (comment) via a custom generation strategy. It's also based primarily on the Loop API tutorial.

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Ax imports used below
from ax.service.managed_loop import optimize
from ax.modelbridge.generation_strategy import GenerationStep, GenerationStrategy
from ax.modelbridge.registry import Models

# training data
X_train = np.array([
       [0.4, 0.4, 0. , 0. , 0.2],
       [0.5, 0. , 0. , 0.5, 0. ],
       [0.5, 0.3, 0. , 0.2, 0. ],
       [0.5, 0. , 0. , 0.5, 0. ],
       [0. , 0.6, 0.4, 0. , 0. ],
       [0.6, 0. , 0.4, 0. , 0. ],
       [0. , 0.6, 0.2, 0.2, 0. ]])
X_train = pd.DataFrame(X_train, columns=["filler_A", "filler_B", "resin_A", "resin_B", "resin_C"])
y_train = 100 * np.random.rand(len(X_train))  # made-up target values

unique_components = list(X_train.columns)

# surrogate model
RFR = RandomForestRegressor(max_depth=7, random_state=0)
RFR.fit(X_train.values, y_train)

def strength(x):
    x = np.reshape(x, (-1, x.shape[0]))
    y = RFR.predict(x)[0]
    return y

# Ax-specific code
def strength_evaluation_function(parameterization):
    x = np.array(
        [parameterization.get(component) for component in unique_components[:-1]]
    )
    last_component = 1 - sum(x)
    x = np.append(x, last_component)
    return {"strength": strength(x)}

parameters = [
    {"name": component, "type": "range", "bounds": [0.0, 1.0]}
    for component in unique_components[:-1]
]

separator = " + "
composition_constraint = separator.join(unique_components[:-1]) + " <= 1"

gs = GenerationStrategy(
    steps=[
        # 1. Initialization step (does not require pre-existing data and is well-suited for
        # initial sampling of the search space)
        GenerationStep(
            model=Models.SOBOL,
            num_trials=5,  # How many trials should be produced from this generation step
            min_trials_observed=3,  # How many trials need to be completed to move to next model
            max_parallelism=5,  # Max parallelism for this step
            model_kwargs={
                "seed": 999,
                "fallback_to_sample_polytope": True,
            },  # Any kwargs you want passed into the model
            model_gen_kwargs={},  # Any kwargs you want passed to `modelbridge.gen`
        ),
        # 2. Bayesian optimization step (requires data obtained from previous phase and learns
        # from all data available at the time of each new candidate generation call)
        GenerationStep(
            model=Models.GPEI,
            num_trials=-1,  # No limitation on how many trials should be produced from this step
            max_parallelism=3,  # Parallelism limit for this step, often lower than for Sobol
            # More on parallelism vs. required samples in BayesOpt:
            # https://ax.dev/docs/bayesopt.html#tradeoff-between-parallelism-and-total-number-of-trials
        ),
    ]
)

best_parameters, values, experiment, model = optimize(
    parameters=parameters,
    experiment_name="composition_test",
    objective_name="strength",
    evaluation_function=strength_evaluation_function,
    parameter_constraints=[
        composition_constraint,  # compositions sum to 1
    ],
    total_trials=30,
    generation_strategy=gs,
)

# Add the last degree of freedom back in
x = np.array([best_parameters.get(component) for component in unique_components[:-1]])
last_component = 1 - sum(x)
x = np.append(x, last_component)
next_experiment = best_parameters
next_experiment[unique_components[-1]] = last_component

@sgbaird (Contributor, Author) commented Dec 7, 2021

I adapted what I had from a Loop into a Service API and fixed some of the theory/understanding issues on my part in the linked example: #743 (comment), such that I'm now generating a real suggested next_experiment. The main thing I had been missing was that a single evaluation of hartmann6 in the examples corresponds to a wet-lab synthesis/characterization iteration for us.

I'm still struggling with the n_components < max_components constraint #745.

I'm also still confused about how I would replace the GPR surrogate model with my own (which has a built-in uncertainty output).

@sgbaird (Contributor, Author) commented Dec 7, 2021

For using my own surrogate model, should I be looking at https://ax.dev/tutorials/modular_botax.html?

EDIT: and if so, can this be passed into a GenerationStep model kwarg?
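
(If it helps clarify the question, I'm imagining something along these lines, based on that tutorial, where SingleTaskGP stands in for a custom BoTorch model class:)

from ax.modelbridge.generation_strategy import GenerationStep
from ax.modelbridge.registry import Models
from ax.models.torch.botorch_modular.surrogate import Surrogate
from botorch.acquisition import qExpectedImprovement
from botorch.models import SingleTaskGP  # placeholder for a custom botorch Model subclass

step = GenerationStep(
    model=Models.BOTORCH_MODULAR,
    num_trials=-1,
    model_kwargs={
        "surrogate": Surrogate(SingleTaskGP),
        "botorch_acqf_class": qExpectedImprovement,
    },
)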

@lena-kashtelyan (Contributor) commented:

For using my own surrogate model, should I be looking at https://ax.dev/tutorials/modular_botax.html?

I think that's likely, but will let @Balandat chime in on that too.

@sgbaird (Contributor, Author) commented Dec 8, 2021

I think we're getting close. For the "limit the total number of components" constraint (still outstanding), we can at least temporarily prune off one idea: using outcome_constraint as a hacky workaround for being unable to use non-linear parameter constraints (#745).

@sgbaird (Contributor, Author) commented Dec 8, 2021

I dug a bit deeper into how to swap out a built-in Surrogate model with a custom model (#748), but it seems a custom model would need to be able to sample from its posterior distribution. The extent of UQ for my custom model (CrabNet) seems to be just a scalar uncertainty associated with the prediction. I figured that discussion about using a custom model might be best on a separate issue (#748). If it turns out to help the composition-based optimization discussed in this issue, then I can bring the main takeaway back here.

@sgbaird (Contributor, Author) commented Dec 10, 2021

Here is a roadmap of some outstanding items as well as other features/topics that came up along the way. I'll plan on updating these later if the status changes (along with an "EDIT" keyword). If I may, I'm also cc'ing the people who have been participating or tagged in these discussions to bring attention to the "birds-eye" issue. Thank you to everyone who has helped me out so far. Your responses have been incredibly useful and informative. Personally, it's been very rewarding for me to get a better understanding of BO through the lens of a practical use case while using what I consider to be an excellent platform for it.

@lena-kashtelyan @Balandat @bernardbeckerman @eytan @saitcakmak @Ryan-Rhys @qingfeng10 @dme65 (@ramseyissa who is lead on the project)

  • n_components < max_components constraint
  • "Use my own surrogate model"
  • Adding multiple outcome measurements for fixed parameters as separate trials
  • Incorporate input/parameter uncertainty
    For example, when you mix a bunch of components together, there is some uncertainty in the final composition of each component (e.g. instrument resolution, losses during synthesis).
  • Multi-objective optimization (e.g. strength, hardness)
    While I haven't mentioned this one yet, we are interested in converting this to a MOO scheme. Note: this is different from what I mentioned above, where I suggested using MOO to implement the n_components < max_components constraint. In this case, the MOO is for "real" outcomes (e.g. strength, hardness).
      • ✔️/❔ Following the MOO tutorial should be fairly straightforward, but I'm not sure if MOO will be incompatible with any of the above-mentioned features.

@lena-kashtelyan (Contributor) commented:

@sgbaird, thank you for this summary, it's very helpful!

@sgbaird (Contributor, Author) commented Dec 23, 2021

Great meeting with @lena-kashtelyan, @Balandat, and @dme65. No need to respond until after the break. I thought it might be good to try a few visualizations of the one-hot encoding scheme. There are four compositions (x, y, z, and color, where color = 1 - x - y - z). Then for different levels of max_components:

  • max_components = 4: a simplex
  • max_components = 3: a hollow simplex
  • max_components = 2: a wireframe simplex
  • max_components = 1: simplex vertices

Takeaway

The constraint that n_components < max_components progressively removes higher dimensional facets of a simplex, leaving only the lower-dimensional constructs (e.g. "edges"). These facets are generally connected to each other, but the feasible region is similar to a hollow shape or a wireframe. It should be straightforward to determine the distance to the feasible region, but this also requires locating within the code where the feasibility weighting takes place.

Maybe here (except it doesn't seem to be called when using gen_next_trial; it didn't stop on a breakpoint):

Ax/ax/core/utils.py

Lines 125 to 155 in 3f10ec9

def best_feasible_objective( # pragma: no cover
    optimization_config: OptimizationConfig, values: Dict[str, np.ndarray]
) -> np.ndarray:
    """Compute the best feasible objective value found by each iteration.

    Args:
        optimization_config: Optimization config.
        values: Dictionary from metric name to array of value at each
            iteration. If optimization config contains outcome constraints, values
            for them must be present in `values`.

    Returns: Array of cumulative best feasible value.
    """
    # Get objective at each iteration
    objective = optimization_config.objective
    f = values[objective.metric.name]
    # Set infeasible points to have infinitely bad values
    infeas_val = np.Inf if objective.minimize else -np.Inf
    for oc in optimization_config.outcome_constraints:
        if oc.relative:
            raise ValueError(  # pragma: no cover
                "Benchmark aggregation does not support relative constraints"
            )
        g = values[oc.metric.name]
        feas = g <= oc.bound if oc.op == ComparisonOp.LEQ else g >= oc.bound
        f[~feas] = infeas_val
    # Get cumulative best
    minimize = objective.minimize
    accumulate = np.minimum.accumulate if minimize else np.maximum.accumulate
    return accumulate(f)

@lena-kashtelyan (Contributor) commented:

I'll close this issue in favor of #769 for now, since that issue will host discussion on a more narrow part of this work that we intend to focus on first. We can reopen this when/if there is a need; feel free to do so at any time, @sgbaird!

@sgbaird (Contributor, Author) commented Mar 29, 2022

I realized that there is an issue with using Sobol sampling and the one-fewer-degrees-of-freedom style compositional constraint, and wanted to make a record of it in this issue.

Take the case of:

x1+x2+x3 == 1

reparameterized to:

x1+x2 <= 1

where x3 == 1 - x1 - x2. When the Sobol candidates are generated, the sampler is completely unaware of the original representation involving x3. If the bounds on x1 and x2 are [0,1], then it will preferentially favor exploring higher values of x1 and x2 relative to the unseen x3, if I'm not mistaken.
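
A quick way to probe this empirically (a sketch using SciPy's Sobol sampler rather than Ax internals) is to mimic the rejection step and compare the marginals of x1, x2, and the recovered x3:

import numpy as np
from scipy.stats import qmc

sobol = qmc.Sobol(d=2, scramble=True, seed=0)
pts = sobol.random(2**14)              # Sobol points in [0, 1]^2
keep = pts.sum(axis=1) <= 1.0          # rejection step mimicking the inequality constraint
x12 = pts[keep]
x3 = 1.0 - x12.sum(axis=1)             # recover the unseen component
print(x12.mean(axis=0), x3.mean())     # compare empirical means of x1, x2, and x3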

I'm guessing that to work around this, I need to manually generate the Sobol points using e.g. ax.modelbridge.factory.get_sobol with the original parameterization, remove x3, and then attach them as if they were manually generated trials (with the caveat that they will appear as MANUAL trials rather than SOBOL trials). Does that workflow seem reasonable? Let me know if you have thoughts on a better way. (EDIT: see comment below)

In the context of #769, this isn't an issue if I pass an equality constraint (i.e. the original constraint) directly to botorch rather than using SumConstraint #769 (comment), subject to the caveat mentioned that it doesn't play well with transforms (but shouldn't really matter if the search space is [0, 1]^d anyway).

@sgbaird (Contributor, Author) commented Apr 2, 2022

Actually, passing a linear equality constraint may be the only way to prevent bias in the Sobol sampling relative to the "true" search space, because the Sobol sampling still needs to respect the constraints (i.e. it has to see some constraint). If I pass a linear inequality constraint rather than a linear equality constraint, the bias will occur.
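
For reference, here is the sum-to-1 constraint written in BoTorch's (indices, coefficients, rhs) format, which optimize_acqf accepts via equality_constraints (a sketch assuming n components on a [0, 1]^n search space):

import torch

n = 5
equality_constraints = [
    (torch.arange(n), torch.ones(n, dtype=torch.double), 1.0)  # sum_i x_i == 1
]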

@sgbaird (Contributor, Author) commented Dec 3, 2022

@jduerholt recently suggested the use of optimize_acqf_mixed within BoTorch, based on the combinations from an NchooseK-style constraint (i.e. only up to a certain number of elements are allowed to be active for a given experiment). In my case, 20 choose 6 = 38,760 combinations might be unwieldy without HPC resources; however, this suggestion seems like the most straightforward and rigorous option for cases where it is computationally tractable.

This seems like a great option to compare against the nonlinear nchoosek constraint approximation #769. Specifically in https://github.com/sparks-baird/optimization-benchmark (sparks-baird/optimization-benchmark#7)
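
For reference, a toy-scale sketch of the optimize_acqf_mixed mechanics (5 components, at most 2 active, made-up training data; the sum-to-1 constraint is omitted for brevity, and this is not a drop-in solution for the 20-choose-6 case):

from itertools import combinations

import torch
from botorch.acquisition import ExpectedImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf_mixed
from gpytorch.mlls import ExactMarginalLogLikelihood

n, k = 5, 2
train_X = torch.rand(10, n, dtype=torch.double)
train_X = train_X / train_X.sum(dim=-1, keepdim=True)  # toy compositions summing to 1
train_Y = 100 * train_X[:, :1]                          # made-up "strength" values

model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
acqf = ExpectedImprovement(model, best_f=train_Y.max())

bounds = torch.stack([torch.zeros(n), torch.ones(n)]).double()
# one dict per allowed combination: fix every *inactive* component to 0.0
fixed_features_list = [
    {i: 0.0 for i in range(n) if i not in active}
    for active in combinations(range(n), k)
]
candidate, value = optimize_acqf_mixed(
    acq_function=acqf,
    bounds=bounds,
    q=1,
    num_restarts=4,
    raw_samples=64,
    fixed_features_list=fixed_features_list,
)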
