step_dummy() doesn't capture contrasts #1349

Open
EmilHvitfeldt opened this issue Jul 12, 2024 · 0 comments
Labels
bug an unexpected problem or unintended behavior

Comments

@EmilHvitfeldt
Member

step_dummy() doesn't capture the contrasts in effect at prep() time, so bake() fails if the contrasts option is different between prep() and bake() time.

library(recipes)

rec_normal <- recipe(~ ., data = iris) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep()

param <- getOption("contrasts")
go_helmert <- param
go_helmert["unordered"] <- "contr.helmert"
options(contrasts = go_helmert)

rec_helmert <- recipe(~ ., data = iris) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep()

rec_helmert %>%
  bake(iris)
#> # A tibble: 150 × 6
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species_X1 Species_X2
#>           <dbl>       <dbl>        <dbl>       <dbl>      <dbl>      <dbl>
#>  1          5.1         3.5          1.4         0.2         -1         -1
#>  2          4.9         3            1.4         0.2         -1         -1
#>  3          4.7         3.2          1.3         0.2         -1         -1
#>  4          4.6         3.1          1.5         0.2         -1         -1
#>  5          5           3.6          1.4         0.2         -1         -1
#>  6          5.4         3.9          1.7         0.4         -1         -1
#>  7          4.6         3.4          1.4         0.3         -1         -1
#>  8          5           3.4          1.5         0.2         -1         -1
#>  9          4.4         2.9          1.4         0.2         -1         -1
#> 10          4.9         3.1          1.5         0.1         -1         -1
#> # ℹ 140 more rows
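For context, the `-1`/`-1` values for the setosa rows above are the first row of base R's Helmert contrast matrix, whereas the default treatment contrasts give 0/1 indicator columns named after the non-reference levels. A quick comparison using only `stats` functions (a sketch of the underlying contrast matrices, not of recipes internals):

```r
# Contrast matrices for the three levels of iris$Species.
# Rows are factor levels; columns are the encoded dummy/contrast columns.
species_levels <- levels(iris$Species)

treatment <- contr.treatment(species_levels)  # default for unordered factors
helmert <- contr.helmert(species_levels)      # what go_helmert switches to

print(treatment)  # 0/1 columns named "versicolor" and "virginica"
print(helmert)    # unnamed columns; setosa row is -1, -1
```

Because the two encodings differ in both values and column names, a recipe prepped under one option cannot simply be baked under the other.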

Everything works up until now. But rec_normal was prepped using contr.treatment, so it breaks when baked while the option is set to contr.helmert:

rec_normal %>%
  bake(iris)
#> Error in `vctrs::vec_locate_matches()`:
#> ! Each value of `needles` must have a match in `haystack`.
#> ✖ Location 5 of `needles` does not have a match.
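Until the step itself captures the contrasts, one possible workaround is to reinstate the prep-time option around bake(). A minimal sketch, assuming R's default contrasts were in effect at prep() time; `bake_with_contrasts()` and `prep_contrasts` are hypothetical names for illustration, not part of recipes:

```r
library(recipes)

# Contrasts in effect when the recipe is prepped (R's defaults, assumed here).
prep_contrasts <- c(unordered = "contr.treatment", ordered = "contr.poly")

old <- options(contrasts = prep_contrasts)
rec_normal <- recipe(~ ., data = iris) %>%
  step_dummy(all_nominal_predictors()) %>%
  prep()

# Simulate a prediction environment with different global contrasts.
options(contrasts = c(unordered = "contr.helmert", ordered = "contr.poly"))

# Hypothetical helper: temporarily restore the prep-time contrasts for bake(),
# then put back whatever option was set before the call.
bake_with_contrasts <- function(object, new_data, contrasts) {
  previous <- options(contrasts = contrasts)
  on.exit(options(previous), add = TRUE)
  bake(object, new_data = new_data)
}

baked <- bake_with_contrasts(rec_normal, iris, prep_contrasts)

# Restore the session's original contrasts option.
options(old)
```

This only works if the prediction environment knows which contrasts were used at prep() time, which is exactly the information the recipe does not currently store.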

I can imagine this breaking some production applications where prepping was done using contr.poly but the prediction environment (vetiver, for example) has no idea.

Suggested fix

I think this bug gives enough motivation to add a contrast argument to step_dummy() such that:

  • it can store the contrast information at prep() time
  • the code is simplified a little
  • long term: we can start deprecating the use of global options in steps
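The idea of storing the contrast at prep() time can be sketched in base R for a single factor. This is a hypothetical illustration only: `prep_dummy()` and `bake_dummy()` are made-up names, not the recipes API, and the real step would need to handle multiple columns, naming, and new levels.

```r
# Hypothetical sketch (not recipes code): the "prep" stage records the levels
# and the contrast function name, so the "bake" stage no longer depends on
# getOption("contrasts").
prep_dummy <- function(x, contrasts = "contr.treatment") {
  list(levels = levels(x), contrasts = contrasts)
}

bake_dummy <- function(info, x) {
  x <- factor(x, levels = info$levels)
  contrast_fun <- match.fun(info$contrasts)
  cm <- contrast_fun(info$levels)
  # Each observation gets the row of the stored contrast matrix for its level.
  out <- cm[as.integer(x), , drop = FALSE]
  rownames(out) <- NULL
  out
}

info <- prep_dummy(iris$Species, contrasts = "contr.treatment")

# Changing the global option afterwards does not affect bake_dummy().
old <- options(contrasts = c(unordered = "contr.helmert", ordered = "contr.poly"))
dummies <- bake_dummy(info, iris$Species)
options(old)
```

Passing the contrast explicitly makes the baked output a function of the prepped object alone, which is the property the suggested argument would provide.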
@EmilHvitfeldt EmilHvitfeldt added the bug an unexpected problem or unintended behavior label Jul 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

1 participant