Saving lm model via vetiver object takes a lot of space
#264
This is a great question @lschneiderbauer. It's more about butcher than vetiver, so I plan to move this issue over there. You can see what specifically we remove from an `lm` object. I wonder if we should consider two levels of butchering: one that retains the ability to make all kinds of predictions, and one that is less conservative and only retains the ability to make a very simple prediction.

In the meantime, if I were you, I would probably use the butcher infrastructure to remove the components you want before creating a vetiver model, something like this:

```r
library(butcher)
library(vetiver)
more_cars <- mtcars[rep(1:32, each = 1e4),]
cars_lm <- lm(mpg ~ ., data = more_cars)
weigh(cars_lm)
#> # A tibble: 25 × 2
#> object size
#> <chr> <dbl>
#> 1 qr.qr 54.0
#> 2 residuals 28.4
#> 3 fitted.values 28.4
#> 4 effects 5.12
#> 5 model.mpg 2.56
#> 6 model.cyl 2.56
#> 7 model.disp 2.56
#> 8 model.hp 2.56
#> 9 model.drat 2.56
#> 10 model.wt 2.56
#> # ℹ 15 more rows
axe_custom <- function(x) {
  old <- x
  ## you probably don't want residuals either:
  ## (exchange() is internal to butcher, hence the `:::`)
  x <- butcher:::exchange(x, "residuals", numeric(0))
  ## replace the large QR matrix with a tiny placeholder
  x$qr <- butcher:::exchange(x$qr, "qr", matrix(0))
  x
}
axed_lm <- axe_custom(cars_lm)
weigh(axed_lm)
#> # A tibble: 25 × 2
#> object size
#> <chr> <dbl>
#> 1 fitted.values 28.4
#> 2 effects 5.12
#> 3 model.mpg 2.56
#> 4 model.cyl 2.56
#> 5 model.disp 2.56
#> 6 model.hp 2.56
#> 7 model.drat 2.56
#> 8 model.wt 2.56
#> 9 model.qsec 2.56
#> 10 model.vs 2.56
#> # ℹ 15 more rows
v <- vetiver_model(axed_lm, "custom-butchered-lm")
weigh(v)
#> # A tibble: 37 × 2
#> object size
#> <chr> <dbl>
#> 1 model.effects 5.12
#> 2 model.model.mpg 2.56
#> 3 model.model.cyl 2.56
#> 4 model.model.disp 2.56
#> 5 model.model.hp 2.56
#> 6 model.model.drat 2.56
#> 7 model.model.wt 2.56
#> 8 model.model.qsec 2.56
#> 9 model.model.vs 2.56
#> 10 model.model.am 2.56
#> # ℹ 27 more rows
```

Created on 2023-11-30 with reprex v2.0.2
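One way to confirm that this kind of axing keeps point predictions intact while shrinking what gets saved is to compare predictions and serialized sizes before and after. Below is a minimal base-R sketch. It assumes an ordinary unweighted `lm` fit whose formula was created at the top level; if the formula was created inside a function, its environment may be serialized along with the model. `axe_for_point_predictions()` and `serialized_size()` are hypothetical helpers that mirror the `axe_custom()` idea without calling butcher's internal `exchange()`. Standard errors and confidence intervals from `predict()` will not work on an object stripped this way, because they read `residuals` and the R factor stored in `qr$qr`.

```r
# Hypothetical helper: drop the length-n pieces of an lm fit that
# predict.lm() does not read when making point predictions on new data.
axe_for_point_predictions <- function(fit) {
  fit$qr$qr <- matrix(0)          # n x p QR matrix, replaced by a placeholder
  fit$residuals <- numeric(0)
  fit$fitted.values <- numeric(0)
  fit$effects <- numeric(0)
  fit$model <- NULL               # copy of the training data
  fit
}

# Hypothetical helper: size in bytes of the uncompressed serialized object
serialized_size <- function(x) length(serialize(x, connection = NULL))

more_cars <- mtcars[rep(1:32, each = 1e4), ]
cars_lm <- lm(mpg ~ ., data = more_cars)
axed_lm <- axe_for_point_predictions(cars_lm)

# Point predictions on new data should be unchanged
same_predictions <- isTRUE(all.equal(
  predict(cars_lm, newdata = head(mtcars)),
  predict(axed_lm, newdata = head(mtcars))
))

print(c(full = serialized_size(cars_lm), axed = serialized_size(axed_lm)))
print(same_predictions)
```

Keeping `coefficients`, `rank`, `qr$pivot`, `terms`, and `xlevels` is what lets `predict()` rebuild the model matrix for new data and apply the coefficients.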
Oops, no, I can't transfer an issue from the rstudio org to the tidymodels org. I'll open a new issue over there.

Please feel free to add any details over at tidymodels/butcher#272 @lschneiderbauer! 🙌
Hi,

Thank you for putting effort into trying to make life easier for ML people. :)

I am just experimenting with the `vetiver` package to see if we can make use of it, and am stumbling over some issues. I set up a simple tidymodels workflow, fitted some data (~14 million records), created a vetiver object, and tried to persist it with `vetiver_pin_write()`. The problem I have is that the result takes ~1.4 GB on my hard disk.

Is this intentional? In our use case we really only need the (stored) model to make predictions and provide confidence intervals. For that, storing the fitted coefficients and associated uncertainties should be enough, and I don't see why that should take 1.4 GB of space.

I tried to experiment with the `model = FALSE` parameter for `lm()`, but that only reduced the file size by half or so. It seems to have something to do with the `fit$qr$qr` object inside the fitted model. I can remove that manually, and the file size drops to an acceptable size, but neither `vetiver` nor `butcher` does so automatically.

Do I have to live with the fact that trained models will take up a lot of space, or are there some measures I can take to get them to a size on the order of a couple of KB?
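Since this use case also needs confidence intervals, it matters which parts `predict.lm()` reads for them on new data: the residual variance (computed from `residuals` unless `scale` is supplied) and the upper-triangular R factor, which `qr.R()` takes from only the leading rows of `qr$qr`. A minimal sketch under those assumptions: keep just those leading rows, record the residual standard error, and pass it back through the `scale` and `df` arguments of `predict.lm()`. `slim_lm_for_intervals()` and the `sigma_hat` field are hypothetical names, the sketch assumes a plain unweighted, full-rank `lm` fit (as in the reprex above, not a tidymodels workflow), and the extra `scale`/`df` arguments have to be supplied wherever `predict()` is called.

```r
# Hypothetical helper: shrink an lm fit but keep what predict.lm() needs
# for confidence intervals on new data (assumes no weights, full rank).
slim_lm_for_intervals <- function(fit) {
  # Residual standard error, as predict.lm() would derive it from residuals
  fit$sigma_hat <- sqrt(sum(fit$residuals^2) / fit$df.residual)
  # qr.R() only reads the leading min(n, p) rows of qr$qr, so drop the rest
  k <- min(dim(fit$qr$qr))
  fit$qr$qr <- fit$qr$qr[seq_len(k), , drop = FALSE]
  # Remove the remaining length-n components
  fit$residuals <- numeric(0)
  fit$fitted.values <- numeric(0)
  fit$effects <- numeric(0)
  fit$model <- NULL
  fit
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
slim <- slim_lm_for_intervals(fit)
new_cars <- data.frame(wt = c(2.5, 3.0), hp = c(110, 150))

full_ci <- predict(fit, newdata = new_cars, interval = "confidence")
# Supply the stored residual SE and residual df instead of the residuals
slim_ci <- predict(slim, newdata = new_cars, interval = "confidence",
                   scale = slim$sigma_hat, df = slim$df.residual)
print(all.equal(full_ci, slim_ci))
```

The trade-off is that anything else reading the dropped components, such as `residuals()` or `summary()`, will no longer give meaningful results on the slimmed object.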