Commit

update ch 25 (#127)
* update ch 25

* Update 25-functions.Rmd

add chunk options to remove warnings
lgibson7 authored Apr 2, 2024
1 parent d69d657 commit ee1571d
Showing 1 changed file with 81 additions and 52 deletions.
133 changes: 81 additions & 52 deletions 25-functions.Rmd
@@ -6,14 +6,13 @@

We are going to learn about three useful types of functions:

- Vector functions take one or more vectors as input and return a vector as output.
- *Vector functions* take one or more vectors as input and return a vector as output.

- Data frame functions take a data frame as input and return a data frame as output.
- *Data frame functions* take a data frame as input and return a data frame as output.

- Plot functions take a data frame as input and return a plot as output.
- *Plot functions* take a data frame as input and return a plot as output.

## Prerequisites
```{r}
```{r echo=FALSE, warning = FALSE}
library(tidyverse) |> suppressPackageStartupMessages()
library(nycflights13)
```
@@ -40,31 +39,18 @@ Key steps in creating a function:

1. Pick a **name** that makes it clear what the function does.

2. **Arguments**, or inputs, go inside `function`, like so `function(arguments)`.
2. **Arguments**, or input variable(s), go inside `function`, like so `function(arguments)`.

3. The **code** goes inside curly braces `{ }`, after `function()`.

4. Check your function with a few inputs to make sure it's working.
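As a rough illustration of these four steps, here is a minimal sketch. The function name `center_mean()` and its test inputs are made up for this example and are not part of the chapter.

```{r}
# 1. Name: `center_mean` (hypothetical) says what the function does.
# 2. Argument: a single numeric vector `x`.
center_mean <- function(x) {
  # 3. Code: subtract the mean, ignoring missing values.
  x - mean(x, na.rm = TRUE)
}

# 4. Check the function with a few inputs.
center_mean(c(1, 2, 3))
center_mean(c(2, NA, 4))
```

Missing values are ignored when computing the mean but stay missing in the result, which is usually what you want when the function is used inside `mutate()`.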

## Functions are for computers *and* humans

Be consistent in your naming and coding of functions.

**Names:**

Functions should be verbs (action, state, or occurrence); arguments should be nouns (people, places, or things).

Be consistent in using snake_case or camelCase.

For sets of functions, use a common prefix.

Don't overwrite existing functions.

**Comments:**

Use comments to explain the 'why' of the code
```{r eval=FALSE}
name <- function(arguments) {
code
}
```

Use lines of - or = to break up code into sections

## Vector functions
```{r}
@@ -88,7 +74,8 @@ df |> mutate(
# Can you spot the error in the above code?
```
## Writing a function

## Writing a vector function

```{r,eval=FALSE}
(a - min(a, na.rm = TRUE)) / (max(a, na.rm = TRUE) - min(a, na.rm = TRUE))
@@ -105,12 +92,14 @@ To make this a bit clearer we can replace the bit that varies with █:

To turn this into a function you need three things:

- **A name**. Here we’ll use rescale01 because this function rescales a vector to lie between 0 and 1.
- **A name**. Here we’ll use `rescale01` because this function rescales a vector to lie between 0 and 1.

- **The arguments**. The arguments are things that vary across calls and our analysis above tells us that we have just one. We’ll call it x because this is the conventional name for a numeric vector.
- **The arguments**. We have just one argument that we’ll call `x` because this is the conventional name for a numeric vector.

- **The body**. The body is the code that’s repeated across all the calls.

## Using the `rescale01()` function

```{r}
rescale01 <- function(x) {
(x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
@@ -124,7 +113,9 @@ rescale01(c(-10, 0, 10))
rescale01(c(1, 2, 3, NA, 5))
```
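One way to carry out the "check your function" step is to compare the results against values worked out by hand. The sketch below repeats the `rescale01()` definition so that it runs on its own; the expected values are computed from the formula, not taken from the chapter.

```{r}
rescale01 <- function(x) {
  (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))
}

# -10, 0, 10 has a minimum of -10 and a range of 20, so we expect 0, 0.5, 1.
stopifnot(all.equal(rescale01(c(-10, 0, 10)), c(0, 0.5, 1)))

# NA is ignored when finding the min and max, but stays NA in the output.
stopifnot(all.equal(rescale01(c(1, 2, 3, NA, 5)), c(0, 0.25, 0.5, NA, 1)))
```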

Then you can rewrite the call to mutate() as:
## Using the `rescale01()` function (cont.)

Then you can rewrite the call to `mutate()` as:

```{r}
df |> mutate(
@@ -135,7 +126,9 @@ df |> mutate(
)
```

We may want to strip percent signs, commas, and dollar signs from a string before converting it into a number:
## Other vector functions

Here, we want to strip percent signs, commas, and dollar signs from a string before converting it into a number:

```{r}
# https://twitter.com/NVlabormarket/status/1571939851922198530
@@ -159,9 +152,15 @@ clean_number("45%")

When you notice yourself copying and pasting multiple verbs multiple times, you might think about writing a data frame function.

Data frame functions work like dplyr verbs: they take a data frame as the first argument, some extra arguments that say what to do with it, and return a data frame or vector.
Data frame functions work like dplyr verbs:

- they take a data frame as the first argument,
- some extra arguments that say what to do with it,
- and return a data frame or vector.
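A minimal sketch of that shape, assuming only dplyr. The helper `add_row_id()` and the tibble `toy` are hypothetical; the helper deliberately takes no column names as arguments, so it does not run into the indirection problem discussed next.

```{r}
library(dplyr)

# Hypothetical helper: data frame first, an extra argument `n` that says
# what to do with it, and a data frame as the return value.
add_row_id <- function(df, n = 5) {
  df |>
    slice_head(n = n) |>
    mutate(row_id = row_number())
}

toy <- tibble(x = c(10, 20, 30))
toy |> add_row_id(n = 2)
```

Because the data frame is the first argument, the function can be used in a pipe just like a dplyr verb.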

## The problem of indirection

## Indirection and tidy evaluation
When you start writing functions that use dplyr verbs you rapidly hit the problem of indirection.

```{r}
grouped_mean <- function(df, group_var, mean_var) {
@@ -176,6 +175,8 @@ diamonds |>
grouped_mean(cut, carat)
```

## The problem of indirection explained

- To make the problem a bit clearer, we can use a made-up data frame:

```{r}
@@ -195,10 +196,13 @@ df |>
grouped_mean(group, y)
```

- Regardless of how we call grouped_mean() it always does df |> group_by(group_var) |> summarize(mean(mean_var)), instead of df |> group_by(group) |> summarize(mean(x)) or df |> group_by(group) |> summarize(mean(y)). This is a problem of indirection, and it arises because dplyr uses tidy evaluation to allow you to refer to the names of variables inside your data frame without any special treatment.
- Regardless of how we call `grouped_mean()` it always does `df |> group_by(group_var) |> summarize(mean(mean_var))`, instead of `df |> group_by(group) |> summarize(mean(x))` or `df |> group_by(group) |> summarize(mean(y))`.
- This is a problem of *indirection*, and it arises because dplyr uses **tidy evaluation** to allow you to refer to the names of variables inside your data frame without any special treatment.

## Tidy evaluation and embracing

- To overcome this problem, tidy evaluation offers a solution called embracing 🤗. Embracing a variable means wrapping it in braces so that (e.g.) var becomes {{ var }}.
- Tidy evaluation makes our data analyses very concise as you never have to say which data frame a variable comes from, but the downside comes when we want to wrap up repeated tidyverse code into a function.
- Our solution to this problem is called **embracing** 🤗. Embracing a variable means wrapping it in braces so that (e.g.) `var` becomes `{{ var }}`.

```{r}
grouped_mean <- function(df, group_var, mean_var) {
@@ -213,11 +217,11 @@ df |>

## When to embrace?

So the key challenge in writing data frame functions is figuring out which arguments need to be embraced.
So the key challenge in writing data frame functions is figuring out which arguments need to be embraced. There are two terms to look for in the docs which correspond to the two most common sub-types of tidy evaluation:

- Data-masking: this is used in functions like arrange(), filter(), and summarize() that compute with variables.
- **Data-masking:** this is used in functions like `arrange()`, `filter()`, and `summarize()` that *compute* with variables.

- Tidy-selection: this is used for functions like select(), relocate(), and rename() that select variables.
- **Tidy-selection:** this is used for functions like `select()`, `relocate()`, and `rename()` that *select* variables.
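The two sub-types can be sketched side by side. The helpers `mean_of()` and `keep_cols()` and the tibble `toy` are hypothetical names used only for illustration; in both cases the argument that names a column is embraced.

```{r}
library(dplyr)

# Data-masking: `summarize()` computes with the column, so embrace it.
mean_of <- function(df, var) {
  df |> summarize(mean = mean({{ var }}, na.rm = TRUE))
}

# Tidy-selection: `select()` picks columns, so embrace the selection.
keep_cols <- function(df, cols) {
  df |> select({{ cols }})
}

toy <- tibble(g = c("a", "b"), x = c(1, 3), y = c(10, 20))

toy |> mean_of(x)
toy |> keep_cols(c(g, y))
```

The help page of each dplyr verb should say whether an argument uses data-masking or tidy-selection, which tells you whether embracing applies.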

## Common use cases

@@ -246,6 +250,17 @@ diamonds |>

## Plot functions

```{r eval=FALSE}
diamonds |>
ggplot(aes(x = carat)) +
geom_histogram(binwidth = 0.1)
diamonds |>
ggplot(aes(x = carat)) +
geom_histogram(binwidth = 0.05)
```

You can take the code above and create a function, keeping in mind that `aes()` is a data-masking function, so you'll need to embrace the variable.
```{r}
histogram <- function(df, var, binwidth = NULL) {
df |>
@@ -257,15 +272,17 @@ diamonds |>
histogram(carat, 0.1)
```

```{r}
Note that `histogram()` returns a ggplot2 plot, meaning you can still add on additional components if you want. Just remember to switch from `|>` to `+`:

```{r eval=FALSE}
diamonds |>
histogram(carat, 0.1) +
labs(x = "Size (in carats)", y = "Number of diamonds")
```

## More variables
## Adding more variables to plot functions

It’s straightforward to add more variables to the mix. For example, maybe you want an easy way to eyeball whether or not a dataset is linear by overlaying a smooth line and a straight line:
Here, we want an easy way to eyeball whether or not a dataset is linear by overlaying a smooth line and a straight line:

```{r}
# https://twitter.com/tyler_js_smith/status/1574377116988104704
@@ -284,6 +301,9 @@ starwars |>

## Combining with other tidyverse

We can combine a dash of data manipulation with ggplot2, as seen below.

You'll notice we have to use a new operator here, `:=`, because we are generating the variable name based on user-supplied data. Variable names go on the left-hand side of `=`, but R’s syntax doesn’t allow anything to the left of `=` except for a single literal name.
```{r}
sorted_bars <- function(df, var) {
df |>
@@ -295,22 +315,11 @@ sorted_bars <- function(df, var) {
diamonds |>
sorted_bars(clarity)
```
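The role of `:=` can also be seen in isolation, without any plotting. In the sketch below, the helper `shout()` and the tibble `toy` are made up for illustration: the column to overwrite is supplied by the caller, so its name has to be embraced on the left of `:=`.

```{r}
library(dplyr)

# Hypothetical helper: overwrite the column `var` with its upper-case version.
# `{{ var }} := ...` is needed because the column name comes from the caller.
shout <- function(df, var) {
  df |> mutate({{ var }} := toupper({{ var }}))
}

toy <- tibble(id = 1:2, name = c("ann", "bo"))
toy |> shout(name)
```

Writing `{{ var }} = toupper({{ var }})` instead would be a syntax error, which is why the tidyverse uses `:=` here.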
- We have to use a new operator here, :=, because we are generating the variable name based on user-supplied data. Variable names go on the left-hand side of =, but R’s syntax doesn’t allow anything to the left of = except for a single literal name.

```{r}
conditional_bars <- function(df, condition, var) {
df |>
filter({{ condition }}) |>
ggplot(aes(x = {{ var }})) +
geom_bar()
}
diamonds |>
conditional_bars(cut == "Good", clarity)
```

## Labeling

Here, we label the output with the variable and the bin width used in our previous histogram, using `rlang::englue()` to go under the covers of tidy evaluation. `rlang` is a low-level package that’s used by just about every other package in the tidyverse because it implements tidy evaluation (as well as many other useful tools). `englue()` works similarly to `str_glue()`, so any value wrapped in `{ }` will be inserted into the string.

```{r}
histogram <- function(df, var, binwidth) {
label <- rlang::englue("A histogram of {{var}} with binwidth {binwidth}")
@@ -325,6 +334,26 @@ diamonds |>
histogram(carat, 0.1)
```
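To look at the labeling step on its own, the sketch below builds only the string. The helper `hist_label()` is hypothetical, and the code assumes rlang 1.0.0 or later (where `englue()` is available) with the glue package installed.

```{r}
# Hypothetical helper that builds the plot label and nothing else.
hist_label <- function(var, binwidth) {
  # {{ var }} inserts the name of the variable the caller passed in;
  # {binwidth} inserts the value of `binwidth`, as str_glue() would.
  rlang::englue("A histogram of {{ var }} with binwidth {binwidth}")
}

hist_label(carat, 0.1)
```

Note the difference between the double braces, which capture the variable's name, and the single braces, which insert an ordinary value.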

## Style: Making functions readable

Be consistent in your naming and coding of functions.

**Names:**

- Functions should be verbs (action, state, or occurrence); arguments should be nouns (people, places, or things).

- Be consistent in using snake_case or camelCase.

- For sets of functions, use a common prefix.

- Don't overwrite existing functions.

**Comments:**

- Use comments to explain the 'why' of the code

- Use lines of - or = to break up code into sections
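A short sketch of how these conventions might look together. The `rescale_` prefix and both function names are hypothetical, chosen only to show a common prefix, snake_case names, "why" comments, and a section divider.

```{r}
# Rescaling helpers --------------------------------------------------

rescale_01 <- function(x) {
  # Use the observed range so the result always spans 0 to 1.
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

rescale_z <- function(x) {
  # Center before scaling so the result has mean 0 and sd 1.
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}
```

With a shared prefix, typing `rescale_` and pressing Tab in an editor with autocomplete lists the whole family of functions.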

## Summary

- In this chapter we learned how to write functions for three useful scenarios: **creating a vector**, **creating a data frame**, or **creating a plot**.