update ch 25 (#127)

* update ch 25
* Update 25-functions.Rmd add chunk options to remove warnings
r4ds · Apr 2, 2024 · ee1571d
1 parent d69d657
commit ee1571d
Showing 1 changed file with 81 additions and 52 deletions.
diff --git a/25-functions.Rmd b/25-functions.Rmd
@@ -6,14 +6,13 @@
 
 We are going to learn about three useful types of functions:
 
-- Vector functions take one or more vectors as input and return a vector as output.
+- *Vector functions* take one or more vectors as input and return a vector as output.
 
-- Data frame functions take a data frame as input and return a data frame as output.
+- *Data frame functions* take a data frame as input and return a data frame as output.
 
-- Plot functions that take a data frame as input and return a plot as output.
+- *Plot functions* take a data frame as input and return a plot as output.
 
-## Prerequisites
-```{r}
+```{r echo=FALSE, warning = FALSE}
 library(tidyverse) |> suppressPackageStartupMessages()
 library(nycflights13)
 ```
@@ -40,31 +39,18 @@ Key steps in creating a function:
 
 1.  Pick a **name** that makes it clear what the function does
 
-2.  **Arguments**, or inputs, go inside `function`, like so `function(arguments)`.
+2.  **Arguments**, or input variable(s), go inside `function`, like so `function(arguments)`.
 
 3.  The **code** goes inside curly braces `{ }`, after `function()`.
 
 4.  Check your function with a few inputs to make sure it's working.
 
-## Functions are for computers *and* humans
-
-Be consistent in your naming and coding of functions
-
-**Names:**
-
-Functions should be verbs (action, state, or occurrence), arguments should be nouns (people places or things).
-
-Be consistent in using snake_case or camelCase.
-
-For sets of functions, use a common prefix
-
-Don't overwrite existing function
-
-**Comments:**
-
-Use comments to explain the 'why' of the code
+```{r eval=FALSE}
+name <- function(arguments) {
+  code
+}
+```
 
-Use lines of - or = to break up code into sections
 
 ## Vector functions
 ```{r}
@@ -88,7 +74,8 @@ df |> mutate(
 
 # Can you spot the error in the above code?
 ```
-## Writing a function
+
+## Writing a vector function
 
 ```{r,eval=FALSE}
 (a - min(a, na.rm = TRUE)) / (max(a, na.rm = TRUE) - min(a, na.rm = TRUE))
@@ -105,12 +92,14 @@ To make this a bit clearer we can replace the bit that varies with █:
 
 To turn this into a function you need three things:
 
-- **A name**. Here we’ll use rescale01 because this function rescales a vector to lie between 0 and 1.
+- **A name**. Here we’ll use `rescale01` because this function rescales a vector to lie between 0 and 1.
 
-- **The arguments**. The arguments are things that vary across calls and our analysis above tells us that we have just one. We’ll call it x because this is the conventional name for a numeric vector.
+- **The arguments**. We have just one argument, which we’ll call `x` because this is the conventional name for a numeric vector.
 
 - **The body**. The body is the code that’s repeated across all the calls.
 
+## Using the `rescale01()` function
+
 ```{r}
 rescale01 <- function(x) {
   (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
@@ -124,7 +113,9 @@ rescale01(c(-10, 0, 10))
 rescale01(c(1, 2, 3, NA, 5))
 ```
 
-Then you can rewrite the call to mutate() as:
+## Using the `rescale01()` function (cont.) 
+
+Then you can rewrite the call to `mutate()` as:
 
 ```{r}
 df |> mutate(
@@ -135,7 +126,9 @@ df |> mutate(
 )
 ```
 
-We may  want to strip percent signs, commas, and dollar signs from a string before converting it into a number:
+## Other vector functions
+
+Here, we want to strip percent signs, commas, and dollar signs from a string before converting it into a number:
 
 ```{r}
 # https://twitter.com/NVlabormarket/status/1571939851922198530
@@ -159,9 +152,15 @@ clean_number("45%")
 
 When you notice yourself copying and pasting multiple verbs multiple times, you might think about writing a data frame function. 
 
-Data frame functions work like dplyr verbs: they take a data frame as the first argument, some extra arguments that say what to do with it, and return a data frame or vector.
+Data frame functions work like dplyr verbs: 
+
+- they take a data frame as the first argument,
+- they take some extra arguments that say what to do with it,
+- and they return a data frame or vector.
+
+##  The problem of indirection
 
-##  Indirection and tidy evaluation
+When you start writing functions that use dplyr verbs you rapidly hit the problem of indirection. 
 
 ```{r}
 grouped_mean <- function(df, group_var, mean_var) {
@@ -176,6 +175,8 @@ diamonds |>
   grouped_mean(cut, carat)
 ```
 
+## The problem of indirection explained
+
 - To make the problem a bit more clear, we can use a made up data frame:
 
 ```{r}
@@ -195,10 +196,13 @@ df |>
   grouped_mean(group, y)
 ```
 
-- Regardless of how we call grouped_mean() it always does df |> group_by(group_var) |> summarize(mean(mean_var)), instead of df |> group_by(group) |> summarize(mean(x)) or df |> group_by(group) |> summarize(mean(y)). This is a problem of indirection, and it arises because dplyr uses tidy evaluation to allow you to refer to the names of variables inside your data frame without any special treatment.
+- Regardless of how we call `grouped_mean()` it always does `df |> group_by(group_var) |> summarize(mean(mean_var))`, instead of `df |> group_by(group) |> summarize(mean(x))` or `df |> group_by(group) |> summarize(mean(y))`. 
+- This is a problem of *indirection*, and it arises because dplyr uses **tidy evaluation** to allow you to refer to the names of variables inside your data frame without any special treatment.
 
+## Tidy evaluation and embracing
 
-- To overcome this problem we are going to use Tidy evaluation which have a solution to this problem called embracing 🤗. Embracing a variable means to wrap it in braces so (e.g.) var becomes {{ var }}.
+- Tidy evaluation makes our data analyses very concise because we never have to say which data frame a variable comes from, but the downside comes when we want to wrap up repeated tidyverse code into a function.
+- The solution to this problem is called **embracing** 🤗. Embracing a variable means wrapping it in double braces, so (e.g.) `var` becomes `{{ var }}`.
 
 ```{r}
 grouped_mean <- function(df, group_var, mean_var) {
@@ -213,11 +217,11 @@ df |>
 
 ## When to embrace?
 
-So the key challenge in writing data frame functions is figuring out which arguments need to be embraced.
+So the key challenge in writing data frame functions is figuring out which arguments need to be embraced. There are two terms to look for in the docs which correspond to the two most common sub-types of tidy evaluation:
 
-- Data-masking: this is used in functions like arrange(), filter(), and summarize() that compute with variables.
+- **Data-masking:** this is used in functions like `arrange()`, `filter()`, and `summarize()` that *compute* with variables.
 
-- Tidy-selection: this is used for functions like select(), relocate(), and rename() that select variables.
+- **Tidy-selection:** this is used for functions like `select()`, `relocate()`, and `rename()` that *select* variables.
 
 ## Common use cases
 
@@ -246,6 +250,17 @@ diamonds |>
 
 ## Plot functions
 
+```{r eval=FALSE}
+diamonds |> 
+  ggplot(aes(x = carat)) +
+  geom_histogram(binwidth = 0.1)
+
+diamonds |> 
+  ggplot(aes(x = carat)) +
+  geom_histogram(binwidth = 0.05)
+```
+
+You can take the code above and turn it into a function, keeping in mind that `aes()` is a data-masking function, so you'll need to embrace the variable.
 ```{r}
 histogram <- function(df, var, binwidth = NULL) {
   df |> 
@@ -257,15 +272,17 @@ diamonds |>
   histogram(carat, 0.1)
 ```
 
-```{r}
+Note that because `histogram()` returns a ggplot2 plot, you can still add on additional components if you want. Just remember to switch from `|>` to `+`:
+
+```{r eval=FALSE}
 diamonds |> 
   histogram(carat, 0.1) +
   labs(x = "Size (in carats)", y = "Number of diamonds")
 ```
 
-## More variables
+## Adding more variables to plot functions
 
-It’s straightforward to add more variables to the mix. For example, maybe you want an easy way to eyeball whether or not a dataset is linear by overlaying a smooth line and a straight line:
+Here, we want an easy way to eyeball whether or not a dataset is linear by overlaying a smooth line and a straight line:
 
 ```{r}
 # https://twitter.com/tyler_js_smith/status/1574377116988104704
@@ -284,6 +301,9 @@ starwars |>
 
 ## Combining with other tidyverse
 
+We can combine a dash of data manipulation with ggplot2, as seen below.
+
+You'll notice we have to use a new operator here, `:=`, because we are generating the variable name based on user-supplied data. Variable names go on the left-hand side of `=`, but R’s syntax doesn’t allow anything to the left of `=` except for a single literal name. To work around this, we use `:=`, which tidy evaluation treats in exactly the same way as `=`.
 ```{r}
 sorted_bars <- function(df, var) {
   df |> 
@@ -295,22 +315,11 @@ sorted_bars <- function(df, var) {
 diamonds |> 
   sorted_bars(clarity)
 ```
-- We have to use a new operator here, :=, because we are generating the variable name based on user-supplied data. Variable names go on the left hand side of =, but R’s syntax doesn’t allow anything to the left of = except for a single literal name. 
-
-```{r}
-conditional_bars <- function(df, condition, var) {
-  df |> 
-    filter({{ condition }}) |> 
-    ggplot(aes(x = {{ var }})) + 
-    geom_bar()
-}
-
-diamonds |> 
-  conditional_bars(cut == "Good", clarity)
-```
 
 ## Labeling
 
+Here, we label the histogram with the variable and the bin width that were used. To do so, we go under the covers of tidy evaluation with `rlang::englue()`. rlang is a low-level package that’s used by just about every other package in the tidyverse because it implements tidy evaluation (as well as many other useful tools). `englue()` works similarly to `str_glue()`, so any value wrapped in `{ }` will be inserted into the string, but it also understands `{{ }}`, which automatically inserts the appropriate variable name.
+
 ```{r}
 histogram <- function(df, var, binwidth) {
   label <- rlang::englue("A histogram of {{var}} with binwidth {binwidth}")
@@ -325,6 +334,26 @@ diamonds |>
   histogram(carat, 0.1)
 ```
 
+## Style: Making functions readable
+
+Be consistent in your naming and coding of functions.
+
+**Names:**
+
+- Function names should be verbs (action, state, or occurrence); argument names should be nouns (people, places, or things).
+
+- Be consistent in using snake_case or camelCase.
+
+- For sets of functions, use a common prefix.
+
+- Don't overwrite existing functions.
+
+**Comments:**
+
+- Use comments to explain the 'why' of the code.
+
+- Use lines of `-` or `=` to break up code into sections.
+
 ## Summary
 
 - In this chapter we learned how to write functions for three useful scenarios: **creating a vector**, **creating a data frame**, or **creating a plot**.
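
As a hedged illustration of the two kinds of tidy evaluation named under "When to embrace?", the sketch below embraces one data-masking argument and one tidy-selection argument in the same function. `summarize_by()`, its argument names, and the `toy` tibble are hypothetical and are not part of the chapter or of this commit. The sketch assumes dplyr is installed and R is version 4.1 or later (for `|>` and `\(x)`).

```r
library(dplyr)

# Hypothetical helper (not from the chapter): mean of selected columns per group.
summarize_by <- function(df, group_var, cols) {
  df |>
    # group_by() is data-masking, so the grouping variable is embraced
    group_by({{ group_var }}) |>
    # the first argument of across() uses tidy-selection, so `cols` is embraced too
    summarize(
      across({{ cols }}, \(x) mean(x, na.rm = TRUE)),
      .groups = "drop"
    )
}

# Made-up data for illustration only
toy <- tibble(
  g = c("a", "a", "b"),
  x = c(1, 3, 5),
  y = c(2, 4, 6)
)

result <- toy |> summarize_by(g, c(x, y))
result
```

Both arguments are embraced in the same way. What differs is the kind of expression the caller may pass: a computation on columns for `group_var`, and a selection such as `c(x, y)` or `starts_with("x")` for `cols`.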