---
title: "16. Census Data with tidycensus"
author: "Patrick Spauster"
format:
  html:
    code-copy: true
    toc: true
    toc-location: right
editor: visual
---
## Video Tutorial
```{=html}
<div style="max-width:608px"><div style="position:relative;padding-bottom:66.118421052632%"><iframe id="kaltura_player" src="https://cdnapisec.kaltura.com/p/1674401/sp/167440100/embedIframeJs/uiconf_id/23435151/partner_id/1674401?iframeembed=true&playerId=kaltura_player&entry_id=1_csrnp2qh&flashvars[streamerType]=auto&flashvars[localizationCode]=en&flashvars[sideBarContainer.plugin]=true&flashvars[sideBarContainer.position]=left&flashvars[sideBarContainer.clickToClose]=true&flashvars[chapters.plugin]=true&flashvars[chapters.layout]=vertical&flashvars[chapters.thumbnailRotator]=false&flashvars[streamSelector.plugin]=true&flashvars[EmbedPlayer.SpinnerTarget]=videoHolder&flashvars[dualScreen.plugin]=true&flashvars[LeadWithHLSOnFlash]=true&flashvars[Kaltura.addCrossoriginToIframe]=true&&wid=1_g8pwrokb" width="608" height="402" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" sandbox="allow-downloads allow-forms allow-same-origin allow-scripts allow-top-navigation allow-pointer-lock allow-popups allow-modals allow-orientation-lock allow-popups-to-escape-sandbox allow-presentation allow-top-navigation-by-user-activation" frameborder="0" title="16. Tidycensus" style="position:absolute;top:0;left:0;width:100%;height:100%"></iframe></div></div>
```
## Intro to TidyCensus
Tidycensus is incredibly powerful and gives you access to a huge amount of census data. Once you get the hang of it, you'll be able to grab a lot of data quickly.
https://walker-data.com/census-r/an-introduction-to-tidycensus.html
```{r warning=F, message=F}
library(tidycensus)
library(tidyverse)
library(janitor)
```
Request your own API key here: https://api.census.gov/data/key_signup.html
### Install the API key:
```{r}
census_api_key("8524147f6edf7fe4b7c85681397fe5acd6993d62")
```
You can use this key for practicing this demo, but please request your own for your projects and future use.
When you get your own key, install it with `census_api_key("key", install = TRUE)` so you won't have to call this function again in future sessions.
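
Installing the key writes it to your `.Renviron` file under the name `CENSUS_API_KEY`. Here is a minimal sketch for checking whether a key is available in the current session without printing it. The object name `key_is_set` is just for illustration.

```{r}
# Look up the environment variable that census_api_key(install = TRUE) writes
key_is_set <- nzchar(Sys.getenv("CENSUS_API_KEY"))

if (key_is_set) {
  message("Census API key found.")
} else {
  message("No key found. Install one, then restart R or run readRenviron(\"~/.Renviron\").")
}
```
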
### Browse available variables:
Use `load_variables()` to browse and find the variables of interest.
```{r}
v20 <- load_variables(2020, "acs5", cache = TRUE)
```
AGE BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER
- B16007_001 Estimate!!Total:
- B16007_002 Estimate!!Total:!!5 to 17 years:
- B16007_003 Estimate!!Total:!!5 to 17 years:!!Speak only English
- B16007_004 Estimate!!Total:!!5 to 17 years:!!Speak Spanish
- B16007_005 Estimate!!Total:!!5 to 17 years:!!Speak other Indo-European languages
- B16007_006 Estimate!!Total:!!5 to 17 years:!!Speak Asian and Pacific Island languages
- B16007_007 Estimate!!Total:!!5 to 17 years:!!Speak other languages
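
Scrolling through thousands of variables is slow, so it can help to filter the variable table by its `concept` column. The sketch below assumes the table has `name`, `label`, and `concept` columns, as `v20` does. It uses a small hand-typed stand-in (`vars_demo`, a hypothetical name) so it runs without an API call.

```{r}
library(dplyr)
library(stringr)
library(tibble)

# Toy stand-in for the table returned by load_variables()
vars_demo <- tribble(
  ~name,        ~label,                                               ~concept,
  "B16007_001", "Estimate!!Total:",                                   "AGE BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER",
  "B16007_002", "Estimate!!Total:!!5 to 17 years:",                   "AGE BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER",
  "B16007_003", "Estimate!!Total:!!5 to 17 years:!!Speak only English", "AGE BY LANGUAGE SPOKEN AT HOME FOR THE POPULATION 5 YEARS AND OVER",
  "B01003_001", "Estimate!!Total",                                    "TOTAL POPULATION"
)

lang_vars <- vars_demo %>%
  # keep only the rows whose table title mentions language spoken at home
  filter(str_detect(concept, "LANGUAGE SPOKEN AT HOME")) %>%
  # make the nested labels easier to read
  mutate(label = str_replace_all(label, "!!", " > "))

lang_vars
```

With the real table, the same `filter()` would be applied to `v20` instead of `vars_demo`.
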
## Pull data
I use the [basic usage of tidycensus webpage](https://walker-data.com/tidycensus/articles/basic-usage.html) to find the right argument names to use.
```{r}
langs_by_puma <- get_acs(
  geography = "public use microdata area",
  variables = c(
    totalkids = "B16007_002",
    englishkids = "B16007_003",
    spanishkids = "B16007_004",
    indoeurkids = "B16007_005",
    apikids = "B16007_006",
    otherkids = "B16007_007"
  ),
  state = "New York",
  year = 2020,
  survey = "acs5"
) %>%
  filter(str_detect(NAME, "NYC")) %>%
  mutate(moeshare = moe / estimate)
```
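
The `moeshare` column above compares each margin of error to its estimate, which is a quick way to spot noisy numbers. The sketch below shows the same idea on made-up values (`long_demo` is hypothetical, not real ACS data). It also adds a coefficient of variation, using the fact that ACS margins of error are published at the 90% confidence level. The 0.5 cutoff is an arbitrary choice for illustration.

```{r}
library(dplyr)
library(tibble)

# Made-up numbers in the long format that get_acs() returns by default
long_demo <- tribble(
  ~NAME,    ~variable,     ~estimate, ~moe,
  "PUMA A", "totalkids",   20000,     1000,
  "PUMA A", "spanishkids", 5000,      800,
  "PUMA B", "totalkids",   10000,     900,
  "PUMA B", "spanishkids", 400,       300
)

reliability_demo <- long_demo %>%
  mutate(
    moeshare = moe / estimate,
    # dividing a 90% margin of error by 1.645 approximates the standard error
    cv = (moe / 1.645) / estimate,
    # arbitrary illustrative cutoff for flagging unreliable estimates
    noisy = moeshare > 0.5
  )

reliability_demo
```
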
```{r}
langs_by_boro <- get_acs(
  geography = "county",
  variables = c(
    totalkids = "B16007_002",
    englishkids = "B16007_003",
    spanishkids = "B16007_004",
    indoeurkids = "B16007_005",
    apikids = "B16007_006",
    otherkids = "B16007_007"
  ),
  state = "New York",
  year = 2020,
  survey = "acs5"
) %>%
  filter(
    NAME == "Kings County, New York" |
      NAME == "Queens County, New York" |
      NAME == "New York County, New York" |
      NAME == "Bronx County, New York" |
      NAME == "Richmond County, New York"
  ) %>%
  clean_names()
```
Write `langs_by_boro` as a CSV file into the same folder.
```{r}
write_csv(langs_by_boro, 'langs_by_boro.csv')
#this output might look familiar from our ggplot lesson!
```
Tidycensus also has an `output = "wide"` option, which puts each variable in its own column and makes calculations such as percentages easier.
```{r}
get_acs(
  geography = "county",
  variables = c(
    totalkids = "B16007_002",
    englishkids = "B16007_003",
    spanishkids = "B16007_004",
    indoeurkids = "B16007_005",
    apikids = "B16007_006",
    otherkids = "B16007_007"
  ),
  state = "New York",
  year = 2020,
  survey = "acs5",
  output = "wide" #look here!
) %>%
  filter(
    NAME == "Kings County, New York" |
      NAME == "Queens County, New York" |
      NAME == "New York County, New York" |
      NAME == "Bronx County, New York" |
      NAME == "Richmond County, New York"
  ) %>%
  clean_names() %>%
  mutate(
    pct_englishkids = englishkids_e / totalkids_e, #_e is the estimate and _m is the margin of error
    pct_spanishkids = spanishkids_e / totalkids_e,
    pct_indoeurkids = indoeurkids_e / totalkids_e,
    pct_aapikids = apikids_e / totalkids_e,
    pct_otherkids = otherkids_e / totalkids_e
  ) %>%
  select(name, starts_with("pct"))
```
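
To see what the wide layout does, here is a sketch that reshapes a small long table by hand with `pivot_wider()`. The numbers and county names are made up. The column names also differ from tidycensus: `pivot_wider()` produces `estimate_` and `moe_` prefixes here, whereas tidycensus adds `E` and `M` suffixes that `clean_names()` turns into `_e` and `_m`.

```{r}
library(dplyr)
library(tidyr)
library(tibble)

# Made-up long-format data: one row per county and variable
long_demo <- tribble(
  ~name,      ~variable,     ~estimate, ~moe,
  "County A", "totalkids",   1000,      50,
  "County A", "spanishkids", 250,       30,
  "County B", "totalkids",   400,       40,
  "County B", "spanishkids", 200,       25
)

wide_demo <- long_demo %>%
  # one row per county, one column per variable and statistic
  pivot_wider(names_from = variable, values_from = c(estimate, moe)) %>%
  # with numerator and denominator side by side, the share is a simple division
  mutate(pct_spanishkids = estimate_spanishkids / estimate_totalkids)

wide_demo
```

In the long layout the same percentage would need a grouped calculation or a join, which is why the wide layout is convenient for shares.
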
## Crosswalks and matching census geographies
Let's say I wanted to compare a number of statistics by Brooklyn neighborhood. But oh no! The census doesn't provide data at the neighborhood level. Fear not, you can use something called a crosswalk to match data from different geographies. Census data makes this easy: tables come with a standard `GEOID` column that lets you match data across geographies.
Census data is powerful because you can join it with data on whatever you are interested in at almost any geographic level.
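
Part of what makes `GEOID` useful is that the codes are nested. A tract `GEOID` is 11 characters: a 2-digit state code, a 3-digit county code, and a 6-digit tract code. The sketch below splits two example tract IDs into those pieces. The IDs are typed in by hand for illustration, and the new column names are my own.

```{r}
library(dplyr)
library(stringr)
library(tibble)

# Example tract GEOIDs, kept as character so no digits are lost
geoid_demo <- tibble(GEOID = c("36047000100", "36061000100")) %>%
  mutate(
    state_fips = str_sub(GEOID, 1, 2),   # "36" is New York State
    county_geoid = str_sub(GEOID, 1, 5), # "36047" is Kings County
    tract_code = str_sub(GEOID, 6, 11)
  )

geoid_demo
```

Keeping these IDs as character rather than numeric matters when joining, because states with codes like "01" would otherwise lose the leading zero.
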
First I'll grab some statistics at the tract level in Brooklyn.
```{r}
pop_data <- get_acs(
  geography = "tract",
  variables = c(
    total_population = "B01003_001",
    median_household_income = "B19019_001",
    race_eth_denom = "B03002_001",
    white_nonhsp = "B03002_003"
  ),
  year = 2021,
  state = "New York",
  county = "Kings"
)
```
Then I'll read in this [crosswalk between neighborhoods and tracts](https://data.cityofnewyork.us/City-Government/2020-Census-Tracts-to-2020-NTAs-and-CDTAs-Equivale/hm78-6dwm) from the NYC Open Data portal. By joining the two on `GEOID`, I can summarize at the neighborhood level.
```{r}
nta_tract <- read_csv(
  "https://data.cityofnewyork.us/resource/hm78-6dwm.csv",
  col_types = cols(geoid = col_character())
) %>%
  filter(boroname == "Brooklyn")

pop_data_joined <- nta_tract %>%
  full_join(pop_data, by = c("geoid" = "GEOID"))

pop_data_joined %>%
  filter(variable == "total_population") %>%
  group_by(ntaname) %>%
  summarize(total_population = sum(estimate))
```
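
When joining on a crosswalk, it is worth checking which rows fail to match before summarizing. The sketch below uses made-up tracts and neighborhood names (`crosswalk_demo` and `tract_demo` are hypothetical), with `anti_join()` to list tracts that have no crosswalk entry. It uses `inner_join()` rather than `full_join()` so that unmatched rows drop out of the totals.

```{r}
library(dplyr)
library(tibble)

# Made-up crosswalk: which neighborhood each tract belongs to
crosswalk_demo <- tribble(
  ~geoid,        ~ntaname,
  "36047000100", "Neighborhood A",
  "36047000200", "Neighborhood A",
  "36047000301", "Neighborhood B"
)

# Made-up tract data in the long format returned by get_acs()
tract_demo <- tribble(
  ~GEOID,        ~variable,          ~estimate,
  "36047000100", "total_population", 4000,
  "36047000200", "total_population", 1500,
  "36047000301", "total_population", 3000,
  "36047999999", "total_population", 10
)

# Tracts in the data that the crosswalk does not cover
unmatched <- tract_demo %>%
  anti_join(crosswalk_demo, by = c("GEOID" = "geoid"))

# Sum tract populations up to the neighborhood level
nta_totals <- crosswalk_demo %>%
  inner_join(tract_demo, by = c("geoid" = "GEOID")) %>%
  group_by(ntaname) %>%
  summarize(total_population = sum(estimate))

nta_totals
```

Summing works for counts like population. Medians such as household income cannot be added up across tracts this way.
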
What if I just wanted to compare one neighborhood to the rest of Brooklyn? Using the `pull()` function, which extracts a single column from a data frame as a vector, I can mutate my way to that summary.
```{r}
gowanus_tracts <- nta_tract %>%
  mutate(geoid = as.character(geoid)) %>%
  filter(ntaname == "Carroll Gardens-Cobble Hill-Gowanus-Red Hook") %>%
  pull(geoid)

pop_data_clean <- pop_data %>%
  mutate(gowanus = if_else(GEOID %in% gowanus_tracts, "Gowanus", "Rest of BK"))

pop_data_clean %>%
  filter(variable == "total_population") %>%
  group_by(gowanus) %>%
  summarize(total = sum(estimate))
```
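
The same pattern on a tiny made-up example shows what each step produces: `pull()` returns a plain character vector, and `%in%` checks each `GEOID` against it. All of the names and numbers here are hypothetical.

```{r}
library(dplyr)
library(tibble)

# Made-up crosswalk with short fake tract IDs
crosswalk_demo <- tribble(
  ~geoid, ~ntaname,
  "001",  "Target",
  "002",  "Target",
  "003",  "Elsewhere"
)

# pull() returns the column as a vector rather than a one-column data frame
target_tracts <- crosswalk_demo %>%
  filter(ntaname == "Target") %>%
  pull(geoid)

tract_demo <- tibble(
  GEOID = c("001", "002", "003", "004"),
  estimate = c(10, 20, 30, 40)
)

comparison <- tract_demo %>%
  # label each tract by whether its ID appears in the target vector
  mutate(area = if_else(GEOID %in% target_tracts, "Target", "Rest")) %>%
  group_by(area) %>%
  summarize(total = sum(estimate))

comparison
```
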