min_grid() results when a postprocessor is being tuned #975

Closed
topepo opened this issue Dec 18, 2024 · 2 comments

Comments

topepo commented Dec 18, 2024

I think there is an issue with min_grid() when we have postprocessors being tuned (found while working on #974).

Reminder: min_grid() is about submodels. It figures out which tuning parameter combinations actually need to be fit. For example, if a boosted tree evaluates trees = c(1, 5, 10), we only need to fit the model with trees = 10 and can then use that model to predict the results for the smaller numbers of trees.
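As a minimal sketch of that reminder (a standalone illustration, not part of the original reprex), the grid below tunes only trees. The object names spec_trees, trees_only, and res are made up for this example, and the default engine for boost_tree() is assumed.

```r
library(tidymodels)

# Hypothetical spec for illustration: only the submodel parameter is tuned
spec_trees <- boost_tree(trees = tune(), mode = "regression")

# Three candidate values for the submodel parameter
trees_only <- tibble(trees = c(1L, 5L, 10L))

# min_grid() should keep only the largest value as the model to fit and
# record the smaller values as submodels to predict from that one fit
res <- min_grid(spec_trees, trees_only)
res

# The values that are predicted from the single fit instead of being refit
res$.submodels[[1]]
```

Only one model fit is requested here; the other two candidates ride along in the .submodels list column.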

If there are preprocessing tuning parameters, it finds the combinations of preprocessing and model tuning parameters that should be fit.

Since we are now adding postprocessors, we are treating those as preprocessing parameters when we find the unique values of the tuning parameters that should be fit.

Here’s an example with no preprocessor, where two model parameters and one postprocessor parameter are being tuned.

library(tidymodels)
library(tailor)
mod_tune_bst <- boost_tree(trees = tune(), min_n = tune(), mode = "regression")

adjust_tune_min <-
    tailor() %>%
    adjust_numeric_range(lower_limit = tune())
wflow <- workflow(mpg ~ ., mod_tune_bst, adjust_tune_min)

# Fill in the missing range for the lower limit
prm <-
    extract_parameter_set_dials(wflow) %>%
    update(lower_limit = lower_limit(c(0, 1)))

grid <-
    grid_regular(prm) %>%
    # This will make the submodel parameter (trees) unbalanced across some
    # combinations of the other parameters.
    slice(seq(1, 240, by = 7))

fit_looping <- 
    wflow %>% 
    extract_spec_parsnip() %>% 
    min_grid(grid)

fit_looping
#> # A tibble: 4 × 4
#>   trees min_n lower_limit .submodels
#>   <int> <int>       <dbl> <list>    
#> 1     1     2         0   <list [0]>
#> 2  2000    21         0.5 <list [0]>
#> 3     1    21         1   <list [0]>
#> 4  1000    40         0   <list [0]>

Created on 2024-12-18 with reprex v2.1.0

Rows 2 and 3 should not be split out since they have the same min_n; the tibble should have three rows since we only need to fit three preprocessing and model combinations.

It makes four rows since it takes lower_limit into account. However, that parameter is used after the model fit, so it should not affect what happens before the model.

What I’m not sure about is what should happen instead, i.e., where to put lower_limit? Do we need that column in the results of min_grid() at all? etc...

More to come.

topepo added a commit that referenced this issue Dec 19, 2024

topepo commented Dec 20, 2024

The solution is to remove the postprocessing parameter columns from the grid, determine the distinct candidates in the remainder, then run min_grid() on this partial grid.

The postprocessing parameters go into their respective sub-tibbles prior to this so they are not affected.
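A minimal sketch of that approach, assuming the grid printed in the reprex above. This only illustrates the idea and is not the code from the referenced commits; partial_grid is a name introduced here, and the grid is rebuilt by hand so the snippet runs on its own.

```r
library(tidymodels)

mod_tune_bst <- boost_tree(trees = tune(), min_n = tune(), mode = "regression")

# The four candidates from the reprex output above, typed in by hand
grid <- tribble(
  ~trees, ~min_n, ~lower_limit,
      1L,     2L,          0,
   2000L,    21L,          0.5,
      1L,    21L,          1,
   1000L,    40L,          0
)

# Drop the postprocessing column, then keep the distinct candidates that remain
partial_grid <-
    grid %>%
    select(-lower_limit) %>%
    distinct()

# Run min_grid() on the partial grid only; lower_limit no longer splits rows
fit_looping <- min_grid(mod_tune_bst, partial_grid)
fit_looping
```

With lower_limit out of the way, the two candidates that share min_n = 21 should collapse into a single fit at trees = 2000 with trees = 1 as its submodel, which gives the three rows expected in the first comment.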

topepo closed this as completed Dec 20, 2024

github-actions bot commented Jan 4, 2025

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

github-actions bot locked and limited conversation to collaborators Jan 4, 2025