step_lencode_glm()
I think I found some evidence that we can improve the speed of `step_lencode_glm()` significantly.
The following shows a rough benchmark. To note: `x`. The new method has the same speed.
```r
library(embed)

n_obs <- 500000

data <- tibble(
  outcome = rnorm(n_obs),
  x = factor(sample(seq_len(100), n_obs, TRUE))
)

tictoc::tic("old")
res <- recipe(outcome ~ x, data = data) |>
  step_lencode_glm(x, outcome = vars(outcome)) |>
  prep()
tictoc::toc()
#> old: 8.327 sec elapsed

tictoc::tic("new")
tmp <- data |>
  summarise(value = mean(outcome), .by = x)
tictoc::toc()
#> new: 0.007 sec elapsed
```
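The benchmark only makes sense if the two approaches give the same estimates. A minimal sketch of that check, assuming the encoding for a numeric outcome amounts to a no-intercept Gaussian GLM on the factor (an assumption about the model, not a statement about embed's internals). The data sizes and object names below are illustrative only, and the code uses base R.

```r
# Small simulated data set; sizes are arbitrary and chosen for illustration.
set.seed(1)
n_obs <- 1000
x <- factor(sample(seq_len(10), n_obs, replace = TRUE))
outcome <- rnorm(n_obs)

# No-intercept Gaussian GLM: one coefficient per level of x.
fit <- glm(outcome ~ 0 + x, family = gaussian())
glm_est <- unname(coef(fit))

# Per-level means, in the same (factor level) order as the coefficients.
mean_est <- as.vector(tapply(outcome, x, mean))

# The two sets of estimates agree up to numerical tolerance.
stopifnot(isTRUE(all.equal(glm_est, mean_est)))
```

With an identity link and one dummy column per level, the least-squares solution for each coefficient is the mean of the outcome within that level, which is why a grouped `mean()` can stand in for the model fit.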
```r
fast_lencode_glm <- function(x, y, wts = NULL) {
  data <- tibble::new_tibble(
    list(..level = x, values = y, wts = wts)
  )

  # Per-level mean of the outcome, weighted if case weights are supplied
  if (is.null(wts)) {
    res <- dplyr::summarise(data, ..value = mean(values), .by = ..level)
  } else {
    res <- dplyr::summarise(data, ..value = weighted.mean(values, wts), .by = ..level)
  }

  # Value used for levels not seen during training: trimmed mean of the estimates
  unseen <- tibble::new_tibble(
    list(
      ..level = "..new",
      ..value = mean(res$..value, trim = 0.1)
    )
  )

  dplyr::bind_rows(res, unseen)
}
```
They should be based on the number of rows, to make sure they always go over the estimate.
My proposal: calculate the probability to be `(2 * n - 1) / (2 * n)` instead of 1.
```r
n <- 100
p <- (n - 1) / n
log(p / (1 - p))
#> [1] 4.59512

n <- 1000
p <- (n - 1) / n
log(p / (1 - p))
#> [1] 6.906755

n <- 1000
p <- (n - 1) / n
p1 <- (2 * n - 1) / (2 * n)
log(p / (1 - p))
#> [1] 6.906755
log(p1 / (1 - p1))
#> [1] 7.600402
```
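A minimal sketch of how such a cap could be applied when computing log-odds. `capped_logit()` is a hypothetical helper and not part of embed. The mirrored lower cap `1 / (2 * n)` for a proportion of 0 is an assumption added for symmetry, and whether `n` should be the total row count or the per-level count is left open here.

```r
# Hypothetical helper (not part of embed): log-odds with the proportion
# clamped away from 0 and 1, using a cap that depends on the row count n.
capped_logit <- function(p, n) {
  upper <- (2 * n - 1) / (2 * n)  # proposed replacement for p = 1
  lower <- 1 / (2 * n)            # assumption: mirrored cap for p = 0
  p <- pmin(pmax(p, lower), upper)
  log(p / (1 - p))
}

n <- 1000
largest_finite <- capped_logit((n - 1) / n, n)  # largest proportion below 1
separated <- capped_logit(1, n)                 # level with only successes

# The capped value sits above the largest finite estimate.
stopifnot(separated > largest_finite)
```

With `n` rows, the largest proportion strictly below 1 is `(n - 1) / n`, and `(2 * n - 1) / (2 * n)` is always larger than that, so a fully separated level gets a log-odds value above any finite estimate.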