_posts/2022-07-12-european-flights/european-flights.Rmd

---
title: "How to write a function in R and apply it to a data frame using map functions from {purrr}"
description: |
  Writing a function and applying it to a data frame using the
  #TidyTuesday data set for week 28 of 2022
    (12/7/2022): "European Flights"
author:
  - name: Ronan Harrington
    url: https://github.com/rnnh/
date: 2022-07-12
repository_url: https://github.com/rnnh/TidyTuesday/
preview: european-flights_files/figure-html5/fig1-1.png
output:
  distill::distill_article:
    self_contained: false
    toc: true
---


```{r knitr, include=FALSE}
knitr::opts_chunk$set(include = TRUE)
knitr::opts_chunk$set(fig.height = 6)
knitr::opts_chunk$set(fig.width = 9)
```

## Introduction

In this post, the [European Flights](https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-07-12/readme.md) data set is used to illustrate defining a function in [R](https://www.r-project.org/) and applying it to a data frame using map functions from [{purrr}](https://cran.r-project.org/web/packages/purrr/index.html).
The full source for this blog post is [available on GitHub](https://github.com/rnnh/TidyTuesday).

## Setup

Loading the [R](https://www.r-project.org/) libraries and
[data set](https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-07-12/readme.md).

```{r setup}
# Loading libraries
library(tidytuesdayR)
library(tidyverse)
library(tidytext)
library(ggthemes)

# Loading data
tt <- tt_load("2022-07-12")
```

## Defining a function to tidy flight types and applying it with purrr::map

In this section, we tidy the flight types in the data set by reshaping it from a wide format to a long format: more rows, fewer columns.
For a given airport on a given day, instead of separate columns for arrivals, departures and total flights, we want one column naming the flight type (e.g. arrivals or departures) and one column holding the number of flights of that type.
This gives the data set a [tidy structure](https://tidyr.tidyverse.org/articles/tidy-data.html).

```{r map}
# Printing a summary of the flights data frame
tt$flights
# Printing a summary of the shape of the data frame
paste("tt$flights has", nrow(tt$flights), "rows and", ncol(tt$flights),
  "columns.")

# Defining a function to tidy the flights data set
tidy_flights_per_airport <- function(input_flight_type){
  tt$flights %>% 
    # Selecting columns, including the column with the name "input_flight_type"
    ## "all_of()" is used for error handling: if a column with the name matching
    ## "input_flight_type" is not available in tt$flights, the function will return an error
    select(FLT_DATE, APT_NAME, all_of(input_flight_type)) %>% 
    # Adding a "flight_type" column, with "input_flight_type" as a string for each row
    mutate(flight_type = as.character(input_flight_type)) %>% 
    # Renaming the "input_flight_type" column to "number_of_flights"
    ## "all_of()" makes it explicit that "input_flight_type" holds a column name
    rename(number_of_flights = all_of(input_flight_type))
}

# Selecting column names with flight types (arrivals, departures, total flights)
flight_types <- colnames(tt$flights)[8:13]
# Printing the flight types
flight_types

# Applying the tidying function to the flight types vector using purrr::map()
tidy_flights_list <- map(flight_types, tidy_flights_per_airport)
```
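
To make the shape of the `map()` result concrete, here is a minimal sketch on a made-up data frame. The `toy_flights` data and the `tidy_one_type()` helper are hypothetical and only mirror the idea of `tidy_flights_per_airport()`; unlike the function above, the helper takes the data frame as an argument rather than reading `tt$flights` directly.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data: two airports on one day, two flight-count columns
toy_flights <- tibble(
  FLT_DATE = as.Date(c("2022-01-01", "2022-01-01")),
  APT_NAME = c("Airport A", "Airport B"),
  FLT_DEP_1 = c(10, 20),
  FLT_ARR_1 = c(12, 18)
)

# Hypothetical helper: keeps one flight-count column, renames it to
# "number_of_flights", and records which column it came from
tidy_one_type <- function(data, type_col) {
  data %>%
    select(FLT_DATE, APT_NAME, number_of_flights = all_of(type_col)) %>%
    mutate(flight_type = type_col)
}

# map() calls the helper once per column name and returns a list of data frames
toy_list <- map(c("FLT_DEP_1", "FLT_ARR_1"), ~ tidy_one_type(toy_flights, .x))

# One list element per flight type, each with the same four columns
length(toy_list)
names(toy_list[[1]])
```

Passing the data frame as an argument is a design choice for the sketch: it lets the helper be tried on any data frame with matching column names.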

## Binding the tidied flight type rows into a data frame with purrr::map_df

The `map()` call in the previous section returned a list of data frames, one per flight type: `tidy_flights_per_airport()` was applied to each item in `flight_types` individually, and each result was stored as an element of `tidy_flights_list`.
In this section, `map_df()` applies `rbind()` to each element of `tidy_flights_list` and binds the results by row, creating a single data frame with all of the tidied flight types.

```{r map_df}
# Binding the tidy version of each flight type by row using purrr::map_df
tidy_flights <- map_df(tidy_flights_list, rbind)

# Printing a summary of the tidy flights data frame
tidy_flights
# Printing a summary of the shape of the data frame
paste("tidy_flights has", nrow(tidy_flights), "rows and", ncol(tidy_flights),
  "columns.")
```
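
As a rough illustration of what the row binding does, the sketch below runs `map_df()` on a small, made-up list of data frames and compares it with `dplyr::bind_rows()`, which binds a list of data frames by row directly. The `pieces` list is hypothetical. Note that recent versions of {purrr} mark `map_df()` as superseded, so `bind_rows()` may be the simpler choice in new code.

```r
library(dplyr)
library(purrr)

# Hypothetical list of already-tidied pieces, one data frame per flight type
pieces <- list(
  tibble(APT_NAME = "Airport A", number_of_flights = 10, flight_type = "FLT_DEP_1"),
  tibble(APT_NAME = "Airport A", number_of_flights = 12, flight_type = "FLT_ARR_1")
)

# map_df() applies the function to each element, then binds the results by row
via_map_df <- map_df(pieces, rbind)

# bind_rows() binds the list elements by row without an intermediate function
via_bind_rows <- bind_rows(pieces)

# Both results have one row per input row and the same columns
nrow(via_map_df)
nrow(via_bind_rows)
```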

The `tidy_flights` data frame is now in a [tidy format](https://tidyr.tidyverse.org/articles/tidy-data.html).
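
For comparison, a similar long format can be sketched with `tidyr::pivot_longer()`. This is not the approach used in this post; the example runs on a made-up `toy_flights` data frame, and the row order of the result differs from the `map()` approach because `pivot_longer()` expands each input row in turn.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two airports on one day, two flight-count columns
toy_flights <- tibble(
  FLT_DATE = as.Date(c("2022-01-01", "2022-01-01")),
  APT_NAME = c("Airport A", "Airport B"),
  FLT_DEP_1 = c(10, 20),
  FLT_ARR_1 = c(12, 18)
)

# Moving the two count columns into a name column and a value column
toy_long <- toy_flights %>%
  pivot_longer(
    cols = c(FLT_DEP_1, FLT_ARR_1),
    names_to = "flight_type",
    values_to = "number_of_flights"
  )

# Two airports times two flight types gives four rows
nrow(toy_long)
```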

## Plotting the distribution of arrivals and departures across the top six airports

```{r fig1, fig.cap = "Box plots of daily arrival and departure distribution across top six airports."}
## Selecting the top 6 airports by total number of flights on the latest flight
## date
top_airports <- tidy_flights %>%
  filter(flight_type == "FLT_TOT_1") %>%
  filter(FLT_DATE == max(FLT_DATE)) %>%
  slice_max(order_by = number_of_flights, n = 6)

# Changing "flight_type" to a factor with descriptive levels
tidy_flights$flight_type <- as.factor(tidy_flights$flight_type)
levels(tidy_flights$flight_type) <- c("Arrivals", "Arrivals (Airport Operator)",
  "Departures", "Departures (Airport Operator)", "Total", "Total (Airport Operator)")

# Plotting the distribution of arrivals and departures for the top airports
tidy_flights %>%
  filter(APT_NAME %in% top_airports$APT_NAME) %>%
  filter(flight_type %in% c("Arrivals", "Departures")) %>%
  ggplot(aes(x = APT_NAME, y = number_of_flights, colour = flight_type)) +
  geom_boxplot() +
  theme_solarized() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_colour_discrete() +
  labs(title = "Distribution of daily arrivals and departures across six airports",
    x = "Airport", y = "Flights", colour = "Flight type")
```
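
The relabelling in the chunk above assigns new level names by position, so it relies on `as.factor()` sorting the original column names in the expected order. A minimal alternative sketch maps each original name to its label explicitly. The `flight_type_labels` vector and the toy input are hypothetical, and the column names `FLT_ARR_1` and `FLT_DEP_1` are assumed by analogy with `FLT_TOT_1`.

```r
# Hypothetical lookup: original column name -> descriptive label
# (FLT_ARR_1 and FLT_DEP_1 are assumed names, by analogy with FLT_TOT_1)
flight_type_labels <- c(
  FLT_ARR_1 = "Arrivals",
  FLT_DEP_1 = "Departures",
  FLT_TOT_1 = "Total"
)

# Toy input in arbitrary order
toy_types <- c("FLT_TOT_1", "FLT_ARR_1", "FLT_DEP_1", "FLT_ARR_1")

# Pairing levels with labels explicitly, so the result does not depend on sort order
toy_factor <- factor(
  toy_types,
  levels = names(flight_type_labels),
  labels = unname(flight_type_labels)
)

as.character(toy_factor)
# "Total" "Arrivals" "Departures" "Arrivals"
```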

## See also

- [Reshaping data using pivot functions](https://tidytuesday.netlify.app/posts/2022-07-05-sf-rents/)