diff --git a/modules/Factors/lab/Factors_Lab.Rmd b/modules/Factors/lab/Factors_Lab.Rmd index 5d301e82..eb02060c 100644 --- a/modules/Factors/lab/Factors_Lab.Rmd +++ b/modules/Factors/lab/Factors_Lab.Rmd @@ -13,17 +13,20 @@ library(tidyverse) ### 1.0 -Load the Youth Tobacco Survey data and `select` "Sample_Size", "Education", and "LocationAbbr". Name this data "yts". +Load the CalEnviroScreen dataset and use `select` to choose the `CaliforniaCounty`, `ImpWaterBodies`, and `ZIP` variables. Then subset this data using `filter` to include only the California counties Amador, Napa, San Francisco, and Ventura. Name this data "ces". + +`ImpWaterBodies`: measure of the number of pollutants across all impaired water bodies within a given distance of populated areas. ```{r} -yts <- - read_csv("https://daseh.org/data/Youth_Tobacco_Survey_YTS_Data.csv") %>% - select(Sample_Size, Education, LocationAbbr) +ces <- + read_csv("https://daseh.org/data/CalEnviroScreen_data.csv") %>% + select(CaliforniaCounty, ImpWaterBodies, ZIP) %>% + filter(CaliforniaCounty %in% c("Amador", "Napa", "Ventura", "San Francisco")) ``` ### 1.1 -Create a boxplot showing the difference in "Sample_Size" between Middle School and High School "Education". **Hint**: Use `aes(x = Education, y = Sample_Size)` and `geom_boxplot()`. +Create a boxplot showing the difference in impaired water body pollution (`ImpWaterBodies`) among Amador, Napa, San Francisco, and Ventura counties (`CaliforniaCounty`). **Hint**: Use `aes(x = CaliforniaCounty, y = ImpWaterBodies)` and `geom_boxplot()`. ```{r 1.1response} @@ -31,7 +34,7 @@ Create a boxplot showing the difference in "Sample_Size" between Middle School a ### 1.2 -Use `count` to count up the number of observations of data for each "Education" group. +Use `count` to count up the number of observations of data for each `CaliforniaCounty` group. 
```{r 1.2response} @@ -39,7 +42,7 @@ Use `count` to count up the number of observations of data for each "Education" ### 1.3 -Make "Education" a factor using the `mutate` and `factor` functions. Use the `levels` argument inside `factor` to reorder "Education". Reorder this variable so that "Middle School" comes before "High School". Assign the output the name "yts_fct". +Make `CaliforniaCounty` a factor using the `mutate` and `factor` functions. Use the `levels` argument inside `factor` to reorder `CaliforniaCounty`. Reorder this variable so the order is now San Francisco, Ventura, Napa, and Amador. Assign the output the name "ces_fct". ```{r 1.3response} @@ -47,7 +50,7 @@ Make "Education" a factor using the `mutate` and `factor` functions. Use the `le ### 1.4 -Repeat question 1.1 and 1.2 using the "yts_fct" data. You should see different ordering in the plot and `count` table. +Repeat questions 1.1 and 1.2 using the "ces_fct" data. You should see different ordering in the plot and `count` table. ```{r 1.4response} @@ -57,8 +60,7 @@ Repeat question 1.1 and 1.2 using the "yts_fct" data. You should see different o # Practice on Your Own! ### P.1 - -Convert "LocationAbbr" (state) in "yts_fct" into a factor using the `mutate` and `factor` functions. Do not add a `levels =` argument. +Subset `ces_fct` so that it only includes data from Ventura County. Then convert `ZIP` (zip code) into a factor using the `mutate` and `factor` functions. Do not add a `levels =` argument. Assign the output the name "ces_Ventura". ```{r P.1response} @@ -66,11 +68,11 @@ Convert "LocationAbbr" (state) in "yts_fct" into a factor using the `mutate` and ### P.2 -We want to create a new column that contains the group-level median sample size. +We want to create a new column that contains the group-level median values for `ImpWaterBodies`. -- Using the "yts_fct" data, `group_by` "LocationAbbr". -- Then, use `mutate` to create a new column "med_sample_size" that is the median "Sample_Size". 
-- **Hint**: Since you have already done `group_by`, a median "Sample_Size" will automatically be created for each unique level in "LocationAbbr". Use the `median` function with `na.rm = TRUE`. +- Using the "ces_Ventura" data, group the data by `ZIP` using `group_by`. +- Then, use `mutate` to create a new column `med_ImpWaterBodies` that is the median of `ImpWaterBodies`. +- **Hint**: Since you have already done `group_by`, a median `ImpWaterBodies` will automatically be created for each unique level in `ZIP`. Use the `median` function with `na.rm = TRUE`. ```{r P.2response} @@ -78,18 +80,18 @@ We want to create a new column that contains the group-level median sample size. ### P.3 -We want to plot the "LocationAbbr" (state) by the "med_sample_size" column we created above. Using the `forcats` package, create a plot that: +We want to make a plot of the `med_ImpWaterBodies` column we created above in the `ces_Ventura` data, separated by `ZIP`. Using the `forcats` package, create a plot that: -- Has "LocationAbbr" on the x-axis -- Uses the `mapping` argument and the `fct_reorder` function to order the x-axis by "med_sample_size" -- Has "Sample_Size" on the y-axis +- Has `ZIP` on the x-axis +- Uses the `mapping` argument and the `fct_reorder` function to order the x-axis by `med_ImpWaterBodies` +- Has `med_ImpWaterBodies` on the y-axis - Is a boxplot (`geom_boxplot`) -- Has the x axis label of `State` +- Has the x axis label of "Zipcode" (Don't worry if you get a warning about not being able to plot `NA` values.) Save your plot using `ggsave()` with a width of 10 and height of 3. -Which state has the largest median sample size? +Which zipcode has the largest median measure of water pollution? 
```{r P.3response} diff --git a/modules/Factors/lab/Factors_Lab_Key.Rmd b/modules/Factors/lab/Factors_Lab_Key.Rmd index 1dd97b8c..75721109 100644 --- a/modules/Factors/lab/Factors_Lab_Key.Rmd +++ b/modules/Factors/lab/Factors_Lab_Key.Rmd @@ -13,114 +13,118 @@ library(tidyverse) ### 1.0 -Load the Youth Tobacco Survey data and `select` "Sample_Size", "Education", and "LocationAbbr". Name this data "yts". +Load the CalEnviroScreen dataset and use `select` to choose the `CaliforniaCounty`, `ImpWaterBodies`, and `ZIP` variables. Then subset this data using `filter` to include only the California counties Amador, Napa, San Francisco, and Ventura. Name this data "ces". + +`ImpWaterBodies`: measure of the number of pollutants across all impaired water bodies within a given distance of populated areas. ```{r} -yts <- - read_csv("https://daseh.org/data/Youth_Tobacco_Survey_YTS_Data.csv") %>% - select(Sample_Size, Education, LocationAbbr) +ces <- + read_csv("https://daseh.org/data/CalEnviroScreen_data.csv") %>% + select(CaliforniaCounty, ImpWaterBodies, ZIP) %>% + filter(CaliforniaCounty %in% c("Amador", "Napa", "Ventura", "San Francisco")) ``` ### 1.1 -Create a boxplot showing the difference in "Sample_Size" between Middle School and High School "Education". **Hint**: Use `aes(x = Education, y = Sample_Size)` and `geom_boxplot()`. +Create a boxplot showing the difference in impaired water body pollution (`ImpWaterBodies`) among Amador, Napa, San Francisco, and Ventura counties (`CaliforniaCounty`). **Hint**: Use `aes(x = CaliforniaCounty, y = ImpWaterBodies)` and `geom_boxplot()`. ```{r 1.1response} -yts %>% - ggplot(mapping = aes(x = Education, y = Sample_Size)) + +ces %>% + ggplot(mapping = aes(x = CaliforniaCounty, y = ImpWaterBodies)) + geom_boxplot() ``` ### 1.2 -Use `count` to count up the number of observations of data for each "Education" group. +Use `count` to count up the number of observations of data for each `CaliforniaCounty` group. 
```{r 1.2response} -yts %>% - count(Education) +ces %>% + count(CaliforniaCounty) ``` ### 1.3 -Make "Education" a factor using the `mutate` and `factor` functions. Use the `levels` argument inside `factor` to reorder "Education". Reorder this variable so that "Middle School" comes before "High School". Assign the output the name "yts_fct". +Make `CaliforniaCounty` a factor using the `mutate` and `factor` functions. Use the `levels` argument inside `factor` to reorder `CaliforniaCounty`. Reorder this variable so the order is now San Francisco, Ventura, Napa, and Amador. Assign the output the name "ces_fct". ```{r 1.3response} -yts_fct <- - yts %>% mutate(Education = factor(Education, - levels = c("Middle School", "High School") +ces_fct <- + ces %>% mutate(CaliforniaCounty = factor(CaliforniaCounty, + levels = c("San Francisco", "Ventura", "Napa", "Amador") )) ``` ### 1.4 -Repeat question 1.1 and 1.2 using the "yts_fct" data. You should see different ordering in the plot and `count` table. +Repeat questions 1.1 and 1.2 using the "ces_fct" data. You should see different ordering in the plot and `count` table. ```{r 1.4response} -yts_fct %>% - ggplot(mapping = aes(x = Education, y = Sample_Size)) + +ces_fct %>% + ggplot(mapping = aes(x = CaliforniaCounty, y = ImpWaterBodies)) + geom_boxplot() -yts_fct %>% - count(Education) +ces_fct %>% + count(CaliforniaCounty) ``` # Practice on Your Own! ### P.1 - -Convert "LocationAbbr" (state) in "yts_fct" into a factor using the `mutate` and `factor` functions. Do not add a `levels =` argument. +Subset `ces_fct` so that it only includes data from Ventura County. Then convert `ZIP` (zip code) into a factor using the `mutate` and `factor` functions. Do not add a `levels =` argument. Assign the output the name "ces_Ventura". 
```{r P.1response} -yts_fct <- yts_fct %>% mutate(LocationAbbr = factor(LocationAbbr)) +ces_Ventura <- ces_fct %>% + filter(CaliforniaCounty == "Ventura") %>% + mutate(ZIP = factor(ZIP)) ``` ### P.2 -We want to create a new column that contains the group-level median sample size. +We want to create a new column that contains the group-level median values for `ImpWaterBodies`. -- Using the "yts_fct" data, `group_by` "LocationAbbr". -- Then, use `mutate` to create a new column "med_sample_size" that is the median "Sample_Size". -- **Hint**: Since you have already done `group_by`, a median "Sample_Size" will automatically be created for each unique level in "LocationAbbr". Use the `median` function with `na.rm = TRUE`. +- Using the "ces_Ventura" data, group the data by `ZIP` using `group_by`. +- Then, use `mutate` to create a new column `med_ImpWaterBodies` that is the median of `ImpWaterBodies`. +- **Hint**: Since you have already done `group_by`, a median `ImpWaterBodies` will automatically be created for each unique level in `ZIP`. Use the `median` function with `na.rm = TRUE`. ```{r P.2response} -yts_fct <- yts_fct %>% - group_by(LocationAbbr) %>% - mutate(med_sample_size = median(Sample_Size, na.rm = TRUE)) +ces_Ventura <- ces_Ventura %>% + group_by(ZIP) %>% + mutate(med_ImpWaterBodies = median(ImpWaterBodies, na.rm = TRUE)) ``` ### P.3 -We want to plot the "LocationAbbr" (state) by the "med_sample_size" column we created above. Using the `forcats` package, create a plot that: +We want to make a plot of the `med_ImpWaterBodies` column we created above in the `ces_Ventura` data, separated by `ZIP`. 
Using the `forcats` package, create a plot that: -- Has "LocationAbbr" on the x-axis -- Uses the `mapping` argument and the `fct_reorder` function to order the x-axis by "med_sample_size" -- Has "Sample_Size" on the y-axis +- Has `ZIP` on the x-axis +- Uses the `mapping` argument and the `fct_reorder` function to order the x-axis by `med_ImpWaterBodies` +- Has `med_ImpWaterBodies` on the y-axis - Is a boxplot (`geom_boxplot`) -- Has the x axis label of `State` +- Has the x axis label of "Zipcode" (Don't worry if you get a warning about not being able to plot `NA` values.) Save your plot using `ggsave()` with a width of 10 and height of 3. -Which state has the largest median sample size? +Which zipcode has the largest median measure of water pollution? ```{r P.3response} library(forcats) -yts_fct_plot <- yts_fct %>% +ces_Ventura_plot <- ces_Ventura %>% drop_na() %>% ggplot(mapping = aes( x = fct_reorder( - LocationAbbr, med_sample_size + ZIP, med_ImpWaterBodies ), - y = Sample_Size + y = med_ImpWaterBodies )) + geom_boxplot() + - labs(x = "State") + labs(x = "Zipcode") ggsave( - filename = "yts_fct.png", # will save in working directory - plot = yts_fct_plot, + filename = "ces_Ventura.png", # will save in working directory + plot = ces_Ventura_plot, width = 10, height = 3 ) ``` diff --git a/resources/dictionary.txt b/resources/dictionary.txt index 7f29b493..1d90a257 100644 --- a/resources/dictionary.txt +++ b/resources/dictionary.txt @@ -1,309 +1,614 @@ -aes -airquality -al -Alameda -AlexsLemonade -Alightings -Altmann -anewma -annualDosage -anonymize -ava -ay -ayeimanol -baltimore -bday -BikeBaltimore -bioinformatics -Biostatistics -Biostats -Birla -birthweight -boardings -Boardings -Bonferroni -bou -bp -BSPH -bw -byquarter -calenviroscreen -CalEnviroScreen -CDS -CEBS -CES -ces -CESPctlMoreThan -ChatGPT -cheatsheet -circ -Circulator -claragranell -Claremont -Clif -climatological -Cmd -CMD -codeexample -codesmall -collegial -CoursePlus -CoV -covid -Covid -cran 
-csavone -css -csv -Ctrl -ctrl -custimization -customizable -cwrigh -DaSEH -DaSEH's -daseh -dasehr -Dataquest -Dayananda -de -DenverSummerHeat -DESeq -df -Diedrich -dieselPM -DieselPM -dotdash -dplyr -dropdown -dropdowns -econd -edu -Epi -ERs -esc -esquisse -Esquisse -et -EthicsPoint -eval -exe -FALSEs -FASAP -fct -Fewings -fhdsl -Fleetwood -forcats -fredhutch -freepik -fwang -Gardiner -gdp -geoms -Gerd -gg -ggplot -ggpubr -ggthemes -github -GitHub -HAA -Hadley -Haloacetic -haloacetic -healthdata -hoffman -Hotline -hrbr -http -HTTPS -https -Humphries -HW -hydrocodone -HydroShare -ide -ifelse -Ihaka -IMG -inclusivity -inute -io -Irizarry -IsBadBuy -IsOnlineSale -it's -JHED -jhmi -JHSAP -JHSPH -JHU -jhu -jhur -jitter -Juneteenth -Kaggle -knitr -Lawlor -Leanpub -lefthand -lightblue -lightgreen -Linetype -lmfit -LocationAbbr -logfit -longdash -LowBirthWeight -LowBirthWeightPctl -lubridate -macosx -McKenna -mclaire -MHS -microplastics -misspecification -Moderna -MonthlyPrice -mort -MSc -mtcars -na -naniar -NCDC -NCEI -nd -NIEHS -Nissans -NISSANs -nizovatina -NOAA -NOAA's -NonCommercial -nonconfidential -NWSS -obert -ocs -OCSdata -olumn -Ombudsperson -onth -opencasestudies -opioid -oss -ot -OTTR -oxycodone -Padmashri -pch -Pctl -Pilani -Pixabay -PLoS -plotly -png -Posit's -POSIXct -pre -programmatically -psarava -psuedo -px -quartiles -Rcmdr -Rdata -rds -readr -recode -Recode -recoding -Recoding -REDCap -reddit -relevel -Replicability -Replicable -reportee -Reportee -REpro -reproducibility -Reproducibility -rladies -rmarkdown -RMarkdown -Rmd -Rproj -rpubs -rstudio -RStudio -RStudio's -Rtools -Rupshikha -Sagar -Saravanan -SARS -Savonen -sessionInfo -setosa -ShareAlike -skimr -SRC -StackOverflow -Stata -stringr -subclasses -Subclasses -subdir -Substringing -summarization -Summarization -superfund -tabset -TAs -th -ThemePark -thon -tibble -tidycensus -tidyr -tidyselect -tidyverse -Tidyverse -Toxicogenomics -TrafficPctl -TRUEs -TSV -twodash -ug -UHS -Un -Uncheck -Ungroup 
-Unsplash -uri -UseR -useR -vacc -vailable -varepsilon -vectorize -Veh -VehBCost -VehicleAge -VehOdo -VehYear -wastewater -wb -WhitePerc -Wickham -Wickham's -wikipedia -Wilk -www -xls -xlsx -XLSX -youtube -yts +aes +airquality +al +AlexsLemonade +Alightings +Altmann +Amador +anewma +annualDosage +anonymize +ava +ay +ayeimanol +baltimore +bday +BikeBaltimore +bioinformatics +Biostatistics +Biostats +Birla +birthweight +boardings +Boardings +Bonferroni +bou +bp +BSPH +bw +byquarter +calenviroscreen +CalEnviroScreen +calenviroscreen +CDS +CES +ces +CESPctlMoreThan +ChatGPT +cheatsheet +circ +Circulator +claragranell +Claremont +Clif +Cmd +CMD +codeexample +codesmall +collegial +CoursePlus +covid +Covid +CoV +cran +csavone +css +csv +Ctrl +ctrl +custimization +customizable +cwrigh +daseh +dasehr +DaSEH +DaSEH's +Dataquest +Dayananda +de +DenverSummerHeat +DESeq +df +Diedrich +dieselPM +DieselPM +dotdash +dplyr +dropdown +dropdowns +econd +edu +Epi +ERs +esc +esquisse +Esquisse +et +EthicsPoint +eval +exe +FALSEs +FASAP +fct +Fewings +fhdsl +Fleetwood +forcats +fredhutch +freepik +fwang +Gardiner +gdp +geoms +Gerd +gg +ggplot +ggpubr +ggthemes +github +GitHub +GitHub +HAA +Hadley +Haloacetic +haloacetic +healthdata +hoffman +Hotline +hrbr +http +HTTPS +https +Humphries +HW +hydrocodone +ide +ifelse +Ihaka +IMG +inclusivity +inute +io +Irizarry +IsBadBuy +IsOnlineSale +JHED +jhmi +JHSAP +JHSPH +JHU +jhu +jhur +jitter +Juneteenth +Kaggle +knitr +Lawlor +Leanpub +lefthand +lightblue +lightgreen +Linetype +lmfit +LowBirthWeight +LowBirthWeightPctl +LocationAbbr +logfit +longdash +lubridate +macosx +McKenna +mclaire +MHS +microplastics +misspecification +Moderna +MonthlyPrice +mort +MSc +mtcars +na +naniar +nd +Napa +Nissans +NISSANs +nizovatina +NonCommercial +nonconfidential +NWSS +obert +ocs +OCSdata +oldname +olumn +Ombudsperson +onth +opencasestudies +opioid +oss +ot +OTTR +oxycodone +Padmashri +pch +Pctl +Pilani +Pixabay +PLoS +plotly +png +POSIXct +pre 
+programmatically +psarava +px +quartiles +Rcmdr +Rdata +rds +readr +recode +Recode +recoding +Recoding +REDCap +reddit +relevel +Replicability +Replicable +reportee +Reportee +REpro +REpro +reproducibility +Reproducibility +rladies +rmarkdown +RMarkdown +Rmd +Rproj +rpubs +rstudio +RStudio +RStudio's +Rtools +Rupshikha +Sagar +Saravanan +SARS +Savonen +sessionInfo +setosa +sessionInfo +ShareAlike +skimr +SRC +StackOverflow +Stata +stringr +subclasses +Subclasses +subdir +Substringing +summarization +Summarization +tabset +TAs +th +ThemePark +thon +tibble +tidyr +tidyselect +tidyverse +Tidyverse +TrafficPctl +TRUEs +TSV +twodash +UHS +ug +Un +Uncheck +Ungroup +uri +Unsplash +UseR +useR +vacc +vailable +varepsilon +vectorize +Veh +VehBCost +VehicleAge +VehOdo +VehYear +Ventura +wastewater +wb +WhitePerc +Wickham +Wickham's +wikipedia +Wilk +www +xls +xlsx +XLSX +youtube +yts +YTS +zipcode +Zipcode +======= +aes +airquality +al +Alameda +AlexsLemonade +Alightings +Altmann +anewma +annualDosage +anonymize +ava +ay +ayeimanol +baltimore +bday +BikeBaltimore +bioinformatics +Biostatistics +Biostats +Birla +birthweight +boardings +Boardings +Bonferroni +bou +bp +BSPH +bw +byquarter +calenviroscreen +CalEnviroScreen +CDS +CEBS +CES +ces +CESPctlMoreThan +ChatGPT +cheatsheet +circ +Circulator +claragranell +Claremont +Clif +climatological +Cmd +CMD +codeexample +codesmall +collegial +CoursePlus +CoV +covid +Covid +cran +csavone +css +csv +Ctrl +ctrl +custimization +customizable +cwrigh +DaSEH +DaSEH's +daseh +dasehr +Dataquest +Dayananda +de +DenverSummerHeat +DESeq +df +Diedrich +dieselPM +DieselPM +dotdash +dplyr +dropdown +dropdowns +econd +edu +Epi +ERs +esc +esquisse +Esquisse +et +EthicsPoint +eval +exe +FALSEs +FASAP +fct +Fewings +fhdsl +Fleetwood +forcats +fredhutch +freepik +fwang +Gardiner +gdp +geoms +Gerd +gg +ggplot +ggpubr +ggthemes +github +GitHub +HAA +Hadley +Haloacetic +haloacetic +healthdata +hoffman +Hotline +hrbr +http +HTTPS +https +Humphries +HW 
+hydrocodone +HydroShare +ide +ifelse +Ihaka +IMG +inclusivity +inute +io +Irizarry +IsBadBuy +IsOnlineSale +it's +JHED +jhmi +JHSAP +JHSPH +JHU +jhu +jhur +jitter +Juneteenth +Kaggle +knitr +Lawlor +Leanpub +lefthand +lightblue +lightgreen +Linetype +lmfit +LocationAbbr +logfit +longdash +LowBirthWeight +LowBirthWeightPctl +lubridate +macosx +McKenna +mclaire +MHS +microplastics +misspecification +Moderna +MonthlyPrice +mort +MSc +mtcars +na +naniar +NCDC +NCEI +nd +NIEHS +Nissans +NISSANs +nizovatina +NOAA +NOAA's +NonCommercial +nonconfidential +NWSS +obert +ocs +OCSdata +olumn +Ombudsperson +onth +opencasestudies +opioid +oss +ot +OTTR +oxycodone +Padmashri +pch +Pctl +Pilani +Pixabay +PLoS +plotly +png +Posit's +POSIXct +pre +programmatically +psarava +psuedo +px +quartiles +Rcmdr +Rdata +rds +readr +recode +Recode +recoding +Recoding +REDCap +reddit +relevel +Replicability +Replicable +reportee +Reportee +REpro +reproducibility +Reproducibility +rladies +rmarkdown +RMarkdown +Rmd +Rproj +rpubs +rstudio +RStudio +RStudio's +Rtools +Rupshikha +Sagar +Saravanan +SARS +Savonen +sessionInfo +setosa +ShareAlike +skimr +SRC +StackOverflow +Stata +stringr +subclasses +Subclasses +subdir +Substringing +summarization +Summarization +superfund +tabset +TAs +th +ThemePark +thon +tibble +tidycensus +tidyr +tidyselect +tidyverse +Tidyverse +Toxicogenomics +TrafficPctl +TRUEs +TSV +twodash +ug +UHS +Un +Uncheck +Ungroup +Unsplash +uri +UseR +useR +vacc +vailable +varepsilon +vectorize +Veh +VehBCost +VehicleAge +VehOdo +VehYear +wastewater +wb +WhitePerc +Wickham +Wickham's +wikipedia +Wilk +www +xls +xlsx +XLSX +youtube +yts YTS \ No newline at end of file