
Tune schedule #974

Open · wants to merge 36 commits into main

Conversation

@topepo (Member) commented Dec 18, 2024

This draft PR is for testing a new data structure that governs how tune will efficiently process grids. Two new functions are currently exported for debugging but will not be exported in the final version:

  • get_tune_schedule() produces a tibble with nested levels that describes which tuning parameter values should be used at each stage of execution (see the sketch below). This PR is focused on making sure that this structure accounts for all of the tuning scenarios.
  • loopy() is a temporary function that emulates tune_grid_loop_iter() (without the bells and whistles); it is used to test get_tune_schedule() and to explore how we will incorporate postprocessors into the mix.
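
For intuition, here is a minimal sketch (not code from this PR) of the kind of nested structure the schedule is meant to capture; the column names, parameters, and nesting are assumptions for illustration only:

```r
# A hand-built illustration of a nested schedule: one row per preprocessing
# candidate, a nested tibble of model-stage candidates inside each row, and a
# nested tibble of prediction-stage (submodel) candidates inside each of those.
library(dplyr)
library(tidyr)

grid <- expand_grid(
  num_comp = c(2, 4),            # preprocessing parameter (e.g., PCA components)
  mixture  = c(0.5, 1.0),        # model parameter
  penalty  = c(0.001, 0.01, 0.1) # submodel parameter handled at prediction time
)

schedule <- grid |>
  nest(model_stage = c(mixture, penalty)) |>
  mutate(
    model_stage = lapply(model_stage, \(x) nest(x, predict_stage = penalty))
  )

schedule
#> Two rows (one per num_comp value); each model_stage tibble nests a
#> predict_stage tibble per unique mixture value.
```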

@topepo marked this pull request as ready for review on January 9, 2025
@topepo requested review from hfrick and simonpcouch on January 9, 2025
@hfrick (Member) commented Jan 13, 2025

My high-level thoughts on get_schedule()

  • Introducing pre_stage
    • We don't strictly need another nesting level, since the preprocessing stage is the "top-level" stage in this context, but I wonder if it would be a good idea nonetheless: currently, all the other stages have a named (nested) tibble associated with them, even if that tibble is empty (i.e., has zero rows). That makes the object structure easier to understand, which is my main motivation for bringing this up. Bonus: it makes all the stages feel the same / have a corresponding structure.
  • Breaking up get_schedule()
    • Into schedule_preproc(), schedule_model_stage(), and schedule_predict_stage(), which would each schedule "their" stage and push the remaining candidate values into a tibble for the next stage. We'd call them recursively (see the sketch after this list).
    • This should make it easier to test the scheduling components separately, and then to test how they nest within each other.
    • It would also make it easier to add a schedule_postproc() if we ever want to get more complex about how to schedule that stage. (I'm thinking about whether or not we allow more flexibility in the order of post-processing adjustments and/or add more adjustments.) Or to tinker with how to schedule the preprocessing in a future where we have more information on the order of recipe steps.
  • Keeping the scheduling direction linear
    • This is something that already struck me when reading the corresponding write-up.
    • I'd prefer to start "outside" with preprocessing and end with post-stage, not go from model-stage to post-stage and then do the pre-stage.
    • Part of that would be to use min_grid() only on the model parameters, without preprocessing or post-processing parameters, essentially using it as, or creating a, minimal_model_grid().
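
A rough sketch of what that recursive breakup could look like, assuming we already know which grid columns belong to which stage; the function bodies are illustrative only, not proposed implementations:

```r
library(dplyr)
library(tidyr)

# Each function schedules "its" stage and pushes the remaining candidate
# columns into a nested tibble for the next stage (all names hypothetical).
schedule_preproc <- function(grid, model_cols, predict_cols) {
  grid |>
    nest(model_stage = all_of(c(model_cols, predict_cols))) |>
    mutate(
      model_stage = lapply(model_stage, schedule_model_stage, predict_cols = predict_cols)
    )
}

schedule_model_stage <- function(stage_grid, predict_cols) {
  stage_grid |>
    nest(predict_stage = all_of(predict_cols)) |>
    mutate(predict_stage = lapply(predict_stage, schedule_predict_stage))
}

schedule_predict_stage <- function(stage_grid) {
  # Nothing left to push down; a real version might apply the submodel trick here.
  stage_grid
}

grid <- expand_grid(num_comp = c(2, 4), mixture = c(0.5, 1), penalty = c(0.01, 0.1))
schedule_preproc(grid, model_cols = "mixture", predict_cols = "penalty")
```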

@topepo (Member, Author) commented Jan 14, 2025

Note: let's take a look at how this structure can work with agua, which does the model fits en masse via h2o.

@hfrick (Member) commented Jan 14, 2025

Another note: we currently use the parameter set to determine which parameter (i.e., which column in the grid) is associated with the preprocessing, model, or post-processing stage (via the $source column). With #660 in mind, we might want to get to that information without adding another dependency on the parameter set. Using tune_args(wflow) might be a solution, although, ideally, we want something that tells us the location of the tune() placeholders in the workflow, and judging from #660 that's not always what tune_args() does either. (But maybe it should? That's a different issue, though.)
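
For reference, here is a small example of the two places that stage information could come from, assuming a standard tidymodels setup (exact output shapes may differ by version):

```r
library(tidymodels)

rec <- recipe(mpg ~ ., data = mtcars) |>
  step_pca(all_numeric_predictors(), num_comp = tune())

wflow <- workflow(rec, linear_reg(penalty = tune(), engine = "glmnet"))

# Current approach: the dials parameter set carries a `source` column
# ("recipe" vs "model_spec") for each tuning parameter.
extract_parameter_set_dials(wflow)$source

# Possible alternative: tune_args() on the workflow also reports a `source`
# column, without constructing the full dials parameter set.
tune_args(wflow)
```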

@topepo (Member, Author) commented Jan 15, 2025

For agua... the current routine for processing the grid normally is here and the agua equivalent is here. Basically, agua consolidates the work that tune_grid() does via a loop into a single function call.

For loopy(), I think that we would have to take a similar strategy. With loopy(), there isn't a top-level postprocessing loop within the model loop. That's because of the three cases mentioned here. For agua, we are likely to need a post-processing loop since we have all of the model results at once, and the post-actions will happen on a model-by-model basis.

TL;DR

agua will work similarly to tune, except that the model loop it drops is offset by an added post-processing loop. It should still be very efficient.
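
To make the two loop shapes concrete, here is a toy, runnable sketch (placeholder stubs only, not tune or agua internals):

```r
# Hypothetical candidate configurations and stub functions for illustration.
model_stage <- list(list(trees = 500), list(trees = 1000))
fit_one     <- function(cfg)  paste("fit with", cfg$trees, "trees")
fit_all     <- function(cfgs) lapply(cfgs, fit_one)  # stands in for one h2o call
postprocess <- function(fit)  paste("postprocessed:", fit)

# tune-style: iterate over model candidates; post-processing happens inside
# the model iteration, so there is no separate top-level post loop.
tune_style <- lapply(model_stage, function(cfg) postprocess(fit_one(cfg)))

# agua-style: all fits come back from a single call, so the model loop is
# replaced by a post-processing loop over the returned fits.
agua_style <- lapply(fit_all(model_stage), postprocess)
```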
