11.pivoting.qmd

---
title: "11. Pivoting"
author: "Patrick Spauster"
format:
  html:
    code-copy: true
    toc: true
    toc-location: right
editor: visual
---

## Video Tutorial

```{=html}
<div style="max-width:608px"><div style="position:relative;padding-bottom:66.118421052632%"><iframe id="kaltura_player" src="https://cdnapisec.kaltura.com/p/1674401/sp/167440100/embedIframeJs/uiconf_id/23435151/partner_id/1674401?iframeembed=true&playerId=kaltura_player&entry_id=1_l6mfvz2o&flashvars[streamerType]=auto&amp;flashvars[localizationCode]=en&amp;flashvars[sideBarContainer.plugin]=true&amp;flashvars[sideBarContainer.position]=left&amp;flashvars[sideBarContainer.clickToClose]=true&amp;flashvars[chapters.plugin]=true&amp;flashvars[chapters.layout]=vertical&amp;flashvars[chapters.thumbnailRotator]=false&amp;flashvars[streamSelector.plugin]=true&amp;flashvars[EmbedPlayer.SpinnerTarget]=videoHolder&amp;flashvars[dualScreen.plugin]=true&amp;flashvars[LeadWithHLSOnFlash]=true&amp;flashvars[Kaltura.addCrossoriginToIframe]=true&amp;&wid=1_hktaq1d5" width="608" height="402" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" sandbox="allow-downloads allow-forms allow-same-origin allow-scripts allow-top-navigation allow-pointer-lock allow-popups allow-modals allow-orientation-lock allow-popups-to-escape-sandbox allow-presentation allow-top-navigation-by-user-activation" frameborder="0" title="13. pivoting" style="position:absolute;top:0;left:0;width:100%;height:100%"></iframe></div></div>
```
## "Wide" and "Long" Data

A dataset can be written as wide and long. A wide format contains values that do not repeat in the first column, whereas long format contains values that do repeat in the first column. Imagine keeping statistics for a basket ball game. You could do it two ways:

![](https://www.statology.org/wp-content/uploads/2021/12/wideLong1-1-768x543.png)

Long data is **tidy** - Every column is a **variable** Every row is an **observation**. Every cell is a single **value**.

We can use pivoting to convert data between wide and long formats. Here's a visual for what pivoting does.

![](https://fromthebottomoftheheap.net/assets/img/posts/tidyr-longer-wider.gif)

## Pivoting Longer

For this demo, we want to make a timeline of rent stabilized units in NYC.

So, let's start by loading in some data from https://github.com/talos/nyc-stabilization-unit-counts.

The dataset we want is [here](https://taxbillsnyc.s3.amazonaws.com/joined.csv).

You can save this file in your project folder and read it in how we're used to. Or we can read it directly into our environment, a cool feature of read_csv() that lets you pull directly from a weblink.

```{r}
library(tidyverse)
rent_stab_raw <- read_csv("https://taxbillsnyc.s3.amazonaws.com/joined.csv")
```

This project scraped data from PDFs of property tax documents to get estimates for rent stabilized units counts in buildings across NYC. You can read up on the various field names at the Github project page:

https://github.com/talos/nyc-stabilization-unit-counts#user-content-data-usage.

For this demo, we only want to look at rent stabilized unit counts, which according to the Github doccumentation corresponds to column names that end in "uc". Let's also grab BBL (which is a unique identifier for NYC buildings) and Borough while we're at it:

```{r}
rent_stab <- rent_stab_raw %>% select(borough, ucbbl, ends_with("uc"))

# starts_with(…) and ends_with(…) are neat selector functions to help you 
# grab names that fit a certain pattern 
```

Annoyingly, the data separates unit counts for different years into different columns... to make a timeline, we need all of the yearly data to be stored in one column.

We can use the `pivot_longer` function included in tidyverse to transform our data accordingly. Here is how we apply the \`pivot_longer\` function to our data:

```{r}
rs_long <- rent_stab %>% 
  pivot_longer(
    ends_with("uc"),  # The multiple column names we want to mush into one column
    names_to = "year", # The title for the new column of names we're generating
    values_to = "units" # The title for the new column of values we're generating
  )
```

Now we have data that is "tidy" there is one row for each year for each building. So each observation is a bbl-year pair.

## Pivoting Wider

Additionally, you may have data that is in this "long" format and wish to transform it into the "wide" format we are used to. Luckily, there is an analogous function called \`pivot_wider\` that does just that:

```{r}
rs_wide <- rs_long %>%
  pivot_wider(
  names_from = year, # The current column containing our future column names
  values_from = units # The current column containing the values for our future columns
  )
```