Object (variable) naming conventions for tidymodels [my suggestion]

ver. 2021-12-18

names [suggestion]	objects
ames	data(ames)
ames_split	initial_split()
ames_train	training()
ames_test	testing()
ames_cv	mc_cv() vfold_cv() bootstraps() validation_split()
ames_rec	recipes objects finalize_recipe()
ames_spec	parsnip models finalize_model()
ames_wflow	workflows objects finalize_workflow() update()
ames_fit	fit() fit_members()
ames_cv_res	fit_resamples()
ames_grid	grid_*() crossing()
ames_tune_res	tune_*()
ames_last_res	last_fit()
*_param	select_best() parameters()
*_perf	yardstick result collect_metrics() rank_results()
Don't create	control_*() prep() bake() metric_set()

Please let me know if you have any suggestions for improvement.

Object naming conventions in R

Object naming conventions are sometimes a subject of debate.

https://en.wikipedia.org/wiki/Naming_convention_(programming)

You would probably rather name things freely, right?
To be honest, I get tired of following rules too.
But naming conventions still exist.
They exist to reduce the stress of writing code and to make the code easier to understand when you re-read it later.
If the name of an object does not correspond to its content, you have to re-read all the code to understand the relationship between the object and its content.
If you are not the only one who might read the code, arbitrary names are confusing.
The object name and its content must be linked, like your email address and yourself.

Some naming conventions are given in the style guides published by Google and the tidyverse.

https://google.github.io/styleguide/Rguide.html

https://style.tidyverse.org/

Some of the rules are:

Variable names should be concise.
Function names should be verbs.
Variable names should be nouns.
Variable names should not contain a dot (.).

And so on.

Different style guides also recommend different rules.

Use BigCamelCase for function names [Google].
Use lowercase letters and underscores (_) in variable names [tidyverse].

These are only conventions; there is no single right answer.
Although not listed here, some long-time R users follow the older convention of using a dot (.) in variable names.
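
To make the difference concrete, here is a small base R sketch of the three styles. All object names below are made up by me for illustration. The last part shows one reason the dot can be confusing: R's S3 system also uses the dot to join a generic and a class name.

```r
# tidyverse style: lowercase letters and underscores for object names
ames_row_count <- 10L

# Google style: BigCamelCase marks a function name
CountRows <- function(x) nrow(x)

# Older dotted style for an object name (hypothetical example object)
ames.note <- structure("sale prices are in USD", class = "ames_note")

# S3 also uses the dot: this is the print() method for class "ames_note",
# so a dotted name can be mistaken for "generic.class"
print.ames_note <- function(x, ...) {
  cat("note:", unclass(x), "\n")
  invisible(x)
}

print(ames.note)  # dispatches to print.ames_note via the class attribute
```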

What is the meaning of the "." (dot) in R?

The style guide itself admits that these rules are not easy to follow:

Generally, variable names should be nouns and function names should be verbs. Strive for names that are concise and meaningful (this is not easy!).
-- The tidyverse style guide [2.1 Object names]

Rules (principles) for tidymodels

tidymodels, like the tidyverse, is a collection of packages led by RStudio.
The members working on it are all familiar with the tidyverse and R rules.

Tidy Modeling with R (a.k.a. TMwR) also says a little about these rules and the philosophy behind them.

https://www.tmwr.org/tidyverse.html#principles

However, when you read the code in TMwR, you may find that objects are named differently even though their contents are the same.
Therefore, I would like to organize the relationship between names and contents in TMwR and suggest naming conventions to help you when you feel like writing tidy code.

Object name and contents table

I mainly checked this by reading TMwR, but I also referred to blogs by people who write tidymodels code.
The names and objects were related as follows.

names	objects	names	objects
*_split *_spl	initial_split()	*_train	training()
spl	smooth.spline()	*_test	testing()
resample	mc_cv()	*_mod *_model *_spec	parsnip models
*_folds	vfold_cv()	*_rec *_recipe	recipes objects
rs	bootstraps()	*_rec_trained	prep()
*_val_set	validation_split()	*_processed	bake()
*_metrics	metric_set()	*_wfl *_wflow	workflows objects
ctrl keep_pred ctrl_AAA	control_AAA()	*_param	parameters()
*_grid	grid_AAA()	*_tune	tune_grid()
grid_AAA	tune_race_AAA()	*_fit	fit()
*_res	collect_metrics() collect_predictions() fit_resamples() last_fit() and mode...

I have been reading tidymodels code written by various people, and I noticed the following:

Cross-validation objects are named in many different ways.
The role of res is unclear.
Variables are created even for detailed settings.

I think these are the reasons why object names become so complicated.

So I will try to simplify them by dividing them into a few groups.
To make the following table easier to understand, I will assume that you are using the ames data.

ver. 2021-12-18

names [suggestion]	objects
ames	data(ames)
ames_split	initial_split()
ames_train	training()
ames_test	testing()
ames_cv	mc_cv() vfold_cv() bootstraps() validation_split()
ames_rec	recipes objects finalize_recipe()
ames_spec	parsnip models finalize_model()
ames_wflow	workflows objects finalize_workflow() update()
ames_fit	fit() fit_members()
ames_cv_res	fit_resamples()
ames_grid	grid_*() crossing()
ames_tune_res	tune_*()
ames_last_res	last_fit()
*_param	select_best() parameters()
*_perf	yardstick result collect_metrics() rank_results()
Don't create	control_*() prep() bake() metric_set()
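
As a rough sketch of how the names in the table line up in practice, here is a small pipeline on ames up to ames_fit. The predictors, the seed, the split proportion, the number of folds, and the lm engine are my own assumptions, chosen only to keep the example short.

```r
library(tidymodels)

data(ames, package = "modeldata")                 # ames
set.seed(2021)                                    # arbitrary seed (assumption)

ames_split <- initial_split(ames, prop = 0.8)     # initial_split()
ames_train <- training(ames_split)                # training()
ames_test  <- testing(ames_split)                 # testing()
ames_cv    <- vfold_cv(ames_train, v = 5)         # any resampling function

# recipes object; the chosen columns are only for illustration
ames_rec <- recipe(Sale_Price ~ Gr_Liv_Area + Year_Built + Bldg_Type,
                   data = ames_train) %>%
  step_log(Gr_Liv_Area) %>%
  step_dummy(all_nominal_predictors())

# parsnip model specification (no data attached yet)
ames_spec <- linear_reg() %>%
  set_engine("lm")

# workflows object that bundles the recipe and the specification
ames_wflow <- workflow() %>%
  add_recipe(ames_rec) %>%
  add_model(ames_spec)

# fit() on the training data
ames_fit <- fit(ames_wflow, data = ames_train)
```

Every object keeps the ames prefix, so the suffix alone tells you which step of the workflow produced it.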

Here is an explanation of why the table above is the way it is.

1. In parsnip, the idea is to separate the data from the specification, so we use *_spec.

I use the term "spec" rather than "model", because the object only describes the model and holds no data until it is fit.
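
A minimal sketch of the difference, assuming a linear regression with the lm engine and a single predictor (both are my choices for illustration):

```r
library(parsnip)

data(ames, package = "modeldata")

# The specification: model type and engine only, no data yet
ames_spec <- linear_reg() %>%
  set_engine("lm")

# The data enters only at fit(); the result gets the *_fit name
ames_fit <- fit(ames_spec, Sale_Price ~ Gr_Liv_Area, data = ames)

class(ames_spec)  # includes "model_spec"
class(ames_fit)   # includes "model_fit"
```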

2. Results of prep() and bake() are temporary; run them at the point of use.

While modeling with tidymodels, I used the data created by prep() and bake() to check the data rather than to create models,
so I did not set a naming convention for it.
If you are going to use it as model input, I recommend a name ending in *_train or *_test.
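
Here is a small sketch of both cases. The recipe steps and the name ames_baked_train are hypothetical; the only point is that the checked data is not stored, while data used as model input gets a *_train name.

```r
library(tidymodels)

data(ames, package = "modeldata")
set.seed(2021)                                   # arbitrary seed (assumption)
ames_split <- initial_split(ames, prop = 0.8)
ames_train <- training(ames_split)

ames_rec <- recipe(Sale_Price ~ Gr_Liv_Area + Bldg_Type, data = ames_train) %>%
  step_log(Gr_Liv_Area) %>%
  step_dummy(all_nominal_predictors())

# Checking the processed data: run prep() and bake() on the spot, store nothing
ames_rec %>%
  prep() %>%
  bake(new_data = NULL) %>%
  head()

# Only when the processed data itself is the model input:
# keep it under a name that ends in _train (hypothetical name)
ames_baked_train <- ames_rec %>%
  prep() %>%
  bake(new_data = NULL)
```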

3. Write control_*() directly in the argument.

You should type control_*() directly in the control argument.
The line of code gets longer, but since there are several kinds of control_*(), this avoids creating a set of complicated variables.

4. Do not create objects from metric_set(); write it directly in the argument.

Selecting more metrics makes your code longer, but I don't think that is a big problem, so please write metric_set() directly in the metrics argument instead of creating a variable.
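
The sketch below shows rules 3 and 4 together in a small tuning run. The decision tree with the rpart engine, the two predictors, the grid size, and the name best_param (one possible *_param name) are my assumptions for illustration.

```r
library(tidymodels)

data(ames, package = "modeldata")
set.seed(2021)                                   # arbitrary seed (assumption)
ames_split <- initial_split(ames, prop = 0.8)
ames_train <- training(ames_split)
ames_cv    <- vfold_cv(ames_train, v = 3)

ames_spec <- decision_tree(cost_complexity = tune(), tree_depth = tune()) %>%
  set_engine("rpart") %>%
  set_mode("regression")

ames_wflow <- workflow() %>%
  add_formula(Sale_Price ~ Gr_Liv_Area + Year_Built) %>%
  add_model(ames_spec)

# grid_*() result
ames_grid <- grid_regular(cost_complexity(), tree_depth(), levels = 2)

# metric_set() and control_grid() are written inline, not stored in variables
ames_tune_res <- tune_grid(
  ames_wflow,
  resamples = ames_cv,
  grid      = ames_grid,
  metrics   = metric_set(rmse, rsq),
  control   = control_grid(save_pred = TRUE)
)

# select_best() result, then the finalized workflow keeps the *_wflow name
best_param <- select_best(ames_tune_res, metric = "rmse")
ames_wflow <- finalize_workflow(ames_wflow, best_param)
```

The tune_grid() call is long, but everything that configures it can be read in one place.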

5. Use *_cv for objects that are for cross-validation purposes, including bootstraps.

Although bootstraps can be used in a variety of ways, for the purpose of machine learning let's treat them as cross-validation data.
Neither bootstraps() nor validation_split() has cv in its function name, but both return objects of the "rset" class, and what they have in common is that they can be passed to fit_resamples() and similar functions. As a side note, initial_split() returns an object of the "rsplit" class.
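
The class claim can be checked with a few lines of rsample code. The list name candidates and the numbers of resamples are hypothetical; I left validation_split() out of the sketch because newer rsample versions may warn that it is deprecated.

```r
library(rsample)

data(ames, package = "modeldata")
set.seed(2021)                                   # arbitrary seed (assumption)
ames_split <- initial_split(ames, prop = 0.8)
ames_train <- training(ames_split)

# Different resampling functions (hypothetical list, for comparison only)
candidates <- list(
  vfold = vfold_cv(ames_train, v = 5),
  mc    = mc_cv(ames_train, times = 5),
  boot  = bootstraps(ames_train, times = 5)
)

# All of them are "rset" objects, so any of them can be named ames_cv
sapply(candidates, inherits, what = "rset")
ames_cv <- candidates$boot

# initial_split() is different: it returns an "rsplit", not an "rset"
inherits(ames_split, "rsplit")
```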

6. Use *_res for results using cross-validated data.

This defines how res is used: the values returned by tune_*() and fit_resamples() are the results of fitting on cross-validation data.
The result of last_fit() has the "last_fit" class and can be passed to collect_metrics().
Objects of the "resample_results" and "tune_results" classes can be passed to collect_metrics() as well.
In addition, they can be used as input for stacks::add_candidates() and tidyposterior::perf_mod().

7. Use *_perf if you are checking the performance.

Metric values computed from a *_res object could also be called "results", so I distinguish them by using the name "performance" (*_perf).
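
A short sketch of the *_res and *_perf split, assuming a linear regression workflow with two predictors (my choice, only for illustration). The names ames_cv_perf, ames_last_perf, and ames_test_perf are examples of the *_perf pattern.

```r
library(tidymodels)

data(ames, package = "modeldata")
set.seed(2021)                                   # arbitrary seed (assumption)
ames_split <- initial_split(ames, prop = 0.8)
ames_cv    <- vfold_cv(training(ames_split), v = 5)

ames_wflow <- workflow() %>%
  add_formula(Sale_Price ~ Gr_Liv_Area + Year_Built) %>%
  add_model(linear_reg() %>% set_engine("lm"))

# *_res: objects that still hold the resampling or last-fit results
ames_cv_res <- fit_resamples(
  ames_wflow,
  resamples = ames_cv,
  metrics   = metric_set(rmse, rsq)
)
ames_last_res <- last_fit(
  ames_wflow,
  split   = ames_split,
  metrics = metric_set(rmse, rsq)
)

# *_perf: tables of metric values taken from the *_res objects
ames_cv_perf   <- collect_metrics(ames_cv_res)
ames_last_perf <- collect_metrics(ames_last_res)

# A yardstick result computed from predictions is also a *_perf
ames_test_perf <- ames_last_res %>%
  collect_predictions() %>%
  rmse(truth = Sale_Price, estimate = .pred)
```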

Finally.

Please let me know if I have missed anything.
Since tidymodels functions may change, this is not a final decision and will need to be revised over time.

Update history

2021-12-19: First version submitted.
