-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path12.ggplot.qmd
148 lines (115 loc) · 6.73 KB
/
12.ggplot.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
---
title: "12. ggplot"
author: "Patrick Spauster"
format:
html:
code-copy: true
toc: true
toc-location: right
editor: visual
---
## Video Tutorial
```{=html}
<div style="max-width:608px"><div style="position:relative;padding-bottom:66.118421052632%"><iframe id="kaltura_player" src="https://cdnapisec.kaltura.com/p/1674401/sp/167440100/embedIframeJs/uiconf_id/23435151/partner_id/1674401?iframeembed=true&playerId=kaltura_player&entry_id=1_1z2obv8k&flashvars[streamerType]=auto&flashvars[localizationCode]=en&flashvars[sideBarContainer.plugin]=true&flashvars[sideBarContainer.position]=left&flashvars[sideBarContainer.clickToClose]=true&flashvars[chapters.plugin]=true&flashvars[chapters.layout]=vertical&flashvars[chapters.thumbnailRotator]=false&flashvars[streamSelector.plugin]=true&flashvars[EmbedPlayer.SpinnerTarget]=videoHolder&flashvars[dualScreen.plugin]=true&flashvars[LeadWithHLSOnFlash]=true&flashvars[Kaltura.addCrossoriginToIframe]=true&&wid=1_peanbvbw" width="608" height="402" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" sandbox="allow-downloads allow-forms allow-same-origin allow-scripts allow-top-navigation allow-pointer-lock allow-popups allow-modals allow-orientation-lock allow-popups-to-escape-sandbox allow-presentation allow-top-navigation-by-user-activation" frameborder="0" title="14. ggplot" style="position:absolute;top:0;left:0;width:100%;height:100%"></iframe></div></div>
```
## Preparing Data for ggplot
Data for ggplot needs to be in "long" format, meaning that all the **values** need to be in the same column and all the variables need to be in another column.
Let's use that rent stabilization data to create a plot. In this case the data is wide by year, meaning there is a column for each year with different values. We want a dataset where each year has its own row - where each row is a year-building combination - that we can plot.
```{r warning=F, message=F}
library(tidyverse)
library(janitor)
rent_stab_long <- read_csv("https://taxbillsnyc.s3.amazonaws.com/joined.csv") %>%
select(borough, ucbbl, ends_with("uc")) %>%
pivot_longer(
ends_with("uc"), # The multiple column names we want to mush into one column
names_to = "year", # The title for the new column of names we're generating
values_to = "units" # The title for the new column of values we're generating
)
```
## ggplot syntax
ggplots use similar syntax to regular R operations - they are groups of functions beginning with `ggplot()`. Instead of the pipe, ggplot uses `+` between different functions to build layers on top of each other.
Typically, ggplot will start with `ggplot(dataframe) +` a geometric function like `geom_col()` and an aesthetics or `aes` argument that indicates which variables to plot.
![](images/ggcheatsheet1.jpeg)
![](images/ggcheatsheet2.jpeg)
After that, ggplot has a ton of options to specify the labels, scales, axes, themes, legends,and more. It's best shown through examples
## Sample Line Chart
Line charts are often a good fit for time-series data. Let's summarize the long data by year and borough and count the number of units.
```{r}
rs_long_manhattan_summary <- rent_stab_long %>%
filter(borough %in% c("MN","BK") # Filter only Manhattan and Brooklyn values
& !is.na(units)) %>% # Filter out null unit count values
mutate(year = as.numeric(gsub("uc","", year))) %>% # Remove "uc" from year values
select(year, borough, units) %>%
# Grouping by 2 columns means each row will have a unique pair of the two columns' values.
# Our rows will look like: 2007 MN, 2007 BK, 2008 MN...
group_by(year, borough) %>%
summarise(total_units = sum(units))
```
Now let's plot it. We start with the data, then add aesthetics, our chart type (line), axis specifications, and labels.
```{r}
rs_over_time_graph <- ggplot(rs_long_manhattan_summary) +
# Note these arguments inside 'geom_line' :
geom_line(aes(x=year, y=total_units, color=borough)) +
# Restyle the Y-axis labels:
scale_y_continuous(
limits = c(0,300000),
labels = scales::unit_format(scale = 1/1000, unit="K")) +
scale_x_continuous(breaks = seq(2007, 2017, by = 1))+
# Restyle the Legend:
scale_fill_discrete(
name="Borough",
breaks=c("BK", "MN"),
labels=c("Brooklyn", "Manhattan")) +
labs(
title = "Total Rent Stabilized Units over Time",
subtitle = "Manhattan and Brooklyn, 2007 to 2017",
x = "Year",
y = "Total Rent Stabilized Units",
caption = "Source: taxbills.nyc"
)+
theme_minimal()
rs_over_time_graph
```
## Sample Bar Chart + Faceting
Let's make some bar charts out of the table of \# of children by the languages they speak at home by borough.
We'll learn how we pulled this data in an upcoming lesson, but for now just download [this csv](https://github.com/pspauster/learning_R/blob/master/langs_by_boro.csv). Or run `read_csv("https://raw.githubusercontent.com/pspauster/learning_R/master/langs_by_boro.csv")`
```{r message=F, warning=F}
langs_by_boro_for_graphing <- read_csv("langs_by_boro.csv") %>%
mutate(
labels = # Add a new column with neat category labels
case_when(
variable == 'englishkids' ~ 'English',
variable == 'spanishkids' ~ 'Spanish',
variable == 'indoeurkids' ~ 'Indo-European',
variable == 'apikids' ~ 'Asian & Pacific Islander',
variable == 'otherkids' ~ 'Other Languages'
),
labels = fct_reorder(labels, estimate)
) %>% # Overwrite the variable column of langs_by_boro with a function that tells R to order the variable column by the estimate column when it gets plotted (like a sort). If I wanted it ordered in the other direction I would put estimate inside of desc().
filter(variable != 'totalkids')
```
Produce a horizontal bar chart
```{r}
# Note - this builds progressively; you can run all the code before any + sign as a step toward the final result.
barchart <- ggplot(langs_by_boro_for_graphing) +
aes(x = estimate, y = labels) +
geom_col() +
scale_x_continuous(labels = scales::comma) + # format count labels with commas and thousands
theme_minimal() +
theme(panel.grid.major.y = element_blank()) +
labs(
title = 'What languages do NYC kids speak at home?',
subtitle = 'Number of children ages 5-17 by the language spoken at home',
x = NULL,
y = NULL,
caption = "Source: Census American Community Survey 2020 5-Year Estimates, Table B16007"
)
barchart
```
Now let's facet it by borough
```{r fig.height=6}
barchart +
facet_wrap( ~ name, ncol = 1)
```
## Additional Resources
This [guide from the Urban Institute](https://urbaninstitute.github.io/r-at-urban/graphics-guide.html) has a number of helpful tutorials for how to create a wide number of graphs with ggplot.