Feature Request: kfold_cv() Wrapper Function for Terminology Consistency and Usability #554

Open
msberends opened this issue Nov 5, 2024 · 0 comments

Dear Tidymodels Development Team,

First, thank you for the excellent work on the rsample package and the entire tidymodels ecosystem. Your contributions have significantly improved accessibility and usability for data science and machine learning in R, and your consistent, high-quality work is truly appreciated.

Feature Request: kfold_cv() Wrapper Function

I would like to suggest adding a kfold_cv() function as a wrapper for vfold_cv(). This request is aimed at enhancing both terminology consistency and user accessibility, as "k-fold cross-validation" is overwhelmingly the more common term in the literature and among practitioners.

It could simply be implemented as:

#' @rdname vfold_cv
#' @param v,k The number of partitions of the data set
#' @export
kfold_cv <- function(data, k = 10, repeats = 1, strata = NULL, breaks = 4, pool = 0.1, ...) {
  # Thin wrapper: forward all arguments to vfold_cv(), mapping k to v
  vfold_cv(data = data, v = k, repeats = repeats, strata = strata, breaks = breaks, pool = pool, ...)
}
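
For illustration, a call to the wrapper would be interchangeable with the existing function; only the argument name differs. (kfold_cv() is of course hypothetical here and refers to the sketch above.)

library(rsample)

# Hypothetical usage of the proposed wrapper (not part of rsample today;
# it refers to the sketch above):
set.seed(123)
folds_k <- kfold_cv(mtcars, k = 5, repeats = 2)

# The equivalent call with the existing function:
set.seed(123)
folds_v <- vfold_cv(mtcars, v = 5, repeats = 2)

# With the same seed, both calls produce the same resampling splits;
# only the argument name (k vs. v) differs.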

Rationale

  1. Standard Terminology Usage: The term "k-fold cross-validation" is widely recognized as the standard across publications and textbooks on machine learning and statistics. Here are some authoritative sources that use the term consistently:

    • Hastie, Tibshirani, and Friedman (2009) in The Elements of Statistical Learning specifically refer to "k-fold cross-validation" (p. 222, Springer) as a foundational resampling method.
    • James, Witten, Hastie, and Tibshirani (2013) in An Introduction to Statistical Learning also use "k-fold cross-validation" (p. 176, Springer), reflecting the term’s adoption in foundational texts.
    • Goodfellow, Bengio, and Courville (2016) in Deep Learning further emphasize "k-fold cross-validation" as a core method in machine learning (MIT Press, p. 120).

    The popularity of "k-fold" over "v-fold" is also evident in applied contexts, including online courses, tutorials, and the documentation of other machine learning libraries such as scikit-learn in Python, whose cross-validation splitter class is explicitly named KFold.

  2. User Familiarity and Accessibility: Most users, especially those new to tidymodels, might be more familiar with the term "k-fold" and may not immediately recognize "v-fold" as equivalent. This can lead to confusion, especially for those coming from other languages or tools where "k-fold" is standard. A kfold_cv() function could help bridge this gap, enhancing the user experience without altering any functionality.

  3. Maintaining Code Consistency and Readability: Many users adapt code from textbooks or other languages, where k is typically the identifier for the number of folds. Allowing both vfold_cv() and kfold_cv() would support more readable code for these users and may help streamline transitions for those migrating workflows into R and tidymodels (a short sketch follows this list).
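
As a brief sketch of point 3, consider a script ported from a textbook or another language where the number of folds is already named k. With the proposed (hypothetical) wrapper the ported line keeps its vocabulary, whereas today it requires a mental translation from k to v:

library(rsample)

# k is typically the existing identifier in code ported from textbooks
# or from Python:
k <- 10

# With the proposed wrapper, the ported line reads naturally:
folds <- kfold_cv(mtcars, k = k)

# Today, the same line must be translated to the v-based argument:
folds <- vfold_cv(mtcars, v = k)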

Thank you for considering this suggestion, and for your commitment to improving the tools available to the R and data science communities. Your work has set a high standard, and small enhancements like this can make an even greater impact on usability.
