-
Notifications
You must be signed in to change notification settings - Fork 40
/
vocabulary.Rmd
191 lines (168 loc) · 7.16 KB
/
vocabulary.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
# (APPENDIX) Appendices {-}
# Appendix A: Vocabulary
You will be responsible for knowing the following functions and vocabulary for the weekly quizzes.
## Quiz 1---R Preliminaries
- Grading policies for the course
- Course requirements / policies for in-class quizzes
- Free and open source software
- "Free as in beer" versus "free as in speech"
- Advantages and disadvantages of interpreted languages compared to
"Point-and-click" software
- Advantages and of interpreted languages compared to compiled
languages and assembly languages
- Difference between R and RStudio
- R session
- R console
- Function arguments (including required versus optional arguments)
- Accessing a function's helpfile using `?`
- Mathematical operators: `+`, `-`, `*`, `-`
- R objects and object names
- "gets arrow": `<-`
- Rules and style guidelines for naming objects
- `ls()`
- R scripts
- `#` comment character
- R packages
- CRAN
- Installing packages
- `install.packages()`
- Loading a package
- `library()`
- Types of package documentation: vignettes and helpfiles
- `vignette()`, option `package = `
- Vectors
- `c()`
- Two of the basic classes of vectors: character and numeric
- `class()`
- Square bracket indexing for vectors: `[...]`
- Dataframes
- `tibble()` function from the `dplyr` package
- `select` and `slice` functions to extract values from dataframes
- `read_csv`, option `skip = `
- `str()`
- `summary()`
- `dim()`
- `ncol()`
- `nrow()`
- Nate Silver
- FiveThirtyEight
- `NA` for missing values
- `$` to get a column from a dataframe
- `paste()`, option `sep =`
- `paste0()`
- `=` vs. `<-` for assignment expressions
- `package::function()` notation
## Quiz 2---Entering / cleaning data #1
- What kinds of data can be read into R?
- delimited files (csv, tsv)
- fixed width files
- delimiter
- `read_delim`, options `delim = `, `skip = `, `n_max = `, `col_names = `
- `read_fwf`
- `read_csv`, options `skip = `, `n_max = `, `col_names = `
- `readxl` package and its `read_excel()` function
- `haven` package and its `read_sas()` function
- `NA`
- Computer directory structure
- working directory
- `getwd()`
- `list.files()`
- relative pathnames
- absolute pathnames
- shorthand for pathnames: `.`, `..`, `../data`, etc.
- Reading in data from either a local or online flat file
- `paste()`, option `sep = `
- `paste0()`
- How to read flat files of data that are online directly into R
- `dplyr` package
- `rename()`
- Why you might want to rename column names (e.g., uppercase, long, unusual characters)
- `select()`
- `slice()`
- `mutate()`
- `filter()`
- `arrange()`, including with `desc()`
- `%>%`, advantages of piping
## Quiz 3
- Main types of vector classes in R: character, numeric, factor, date, logical
- `lubridate` functions, including `ymd`, `ymd_hm`, `mdy`, `wday`, and `mday`
- Common logical operators in R (`==`, `!=`, `%in%`, `is.na()`, `&`, `|`)
- `data()` (with and without the name of a dataset as an option)
- `library()` (with and without an argument in the parentheses)
- logical vectors, including running `sum` on a logical vector
- What the bang operator (`!`) does to a logical operator
- The tidyverse
- `min()`
- `max()`
- `mean()`
- `median()`
- `summarize()`
- Special functions to use with `summarize()`: `n()`, `n_distinct()`, `first()`, `last()`
- Using `group_by()` before using `summarize()`
- The three basic elements of a `ggplot` plot: data, aesthetics, and geoms
- `aes` function and common aesthetics, including `color`, `shape`, `x`, `y`, `alpha`, `size`, and `fill`
- Mapping an aesthetic to a column in the data versus setting it to a constant value
- Some common geoms: `geom_histogram`, `geom_points`, `geom_lines`, `geom_boxplot()`
- The difference between "statistical" geoms (e.g., `geom_boxplot`, `geom_smooth`) and "non-statistical" (e.g., `geom_point`, `geom_line`)
- Common additions to `ggplot` objects: `ggtitle`, `labs`, `xlim`, `ylim`, `expand_limits`
## Quiz 4
- Guidelines for good graphics
- Data density / data-to-ink ratio
- Small multiples
- Edward Tufte
- Hadley Wickham
- Where to put the `+` in ggplot statements to avoid problems (ends of lines instead of starts of new lines)
- Can you save a ggplot object as an R object that you can reference later? If so, how would you add elements on to that object? How would you print it when you were ready to print the graph to your RStudio graphics window?
- `geom_hline()`, `geom_vline()`
- `geom_text()`
- `facet_grid()`, `facet_wrap()`
- `grid.arrange()` from the `gridExtra` package
- `ggthemes` package, including `theme_few()` and `theme_tufte()`
- Setting point color for `geom_point()` both as a constant (all points red) and as a way to show the level of a factor for each observation
- `size`, `alpha`, `color`
- Re-naming and re-ordering factors
- **Note:** If you read this and find and bring in an example of a "small multiples" graph (from a newspaper, a website, an academic paper), you can get one extra point on this quiz
## Quiz 5
- Reproducible research, including what it is and advantages to aiming to make your research reproducible
- R style guidelines on variable names, `<-` vs. `=`, line length, spacing, semicolons, commenting, indentation, and code grouping
- Markup languages (concept and examples)
- Basic conventions for Markdown (bold, italics, links, headers, lists)
- Literate programming
- What working directory R uses for code in an .Rmd document
- Basic syntax for RMarkdown chunks, including how to name them
- Options for RMarkdown chunks: `echo`, `eval`, `messages`, `warnings`, `include`, `fig.width`, `fig.height`, `results`
- Difference between global options and chunk options, and which takes precendence
- What inline code is and how to write it in RMarkdown
- How to set global options
- Why style is important in coding
- RPubs
## Quiz 6
- Three characteristics of tidy data
- Five common problems with tidy data and how to resolve them (make sure you understand the examples shown, which you can find out more about in the Hadley Wickham paper I reference in the slides)
- `group_by` with `mutate`, `slice`, and `arrange`
- `separate` and `unite`
- `pivot_longer` and `pivot_wider`
- The `*_join` family of functions (`left_join`, `right_join`, `inner_join`, `full_join`, `anti_join`, `semi_join`)
## Quiz 7
- regular expressions
- difference in the output between `str_extract` and `str_detect`
- `str_to_lower`, `str_to_upper`, `str_to_title`, `str_trim`
- lists
- indexing from lists (`[[` and `$`)
- exploring lists (`class`, `names`, `str` functions)
- exploring lists with `jsonedit` from the `listviewer` package
- `tidy` function from `broom` applied to the output from a statistical test function
- `forcats` package, including `fct_recode`, `fct_infreq`, `fct_reorder`, and `fct_lump`
- Bioconductor
- "accessor" functions for Bioconductor (e.g., `get_variable`)
- `select` with `starts_with`, `select_if`
## Quiz 8
- Review of previous weeks (especially tidyverse packages: `dplyr`, `tidyr`, `ggplot2`,
`stringr`, and `forcats`)
- Writing a formula with `~` syntax
- Regression modeling with `lm`, `glm`
- Using functions from `broom` to tidy model output (`augment`, `tidy`, `glance`)
## Quiz 9
- Nesting with `group_by` and `nest` and unnesting with `unnest`
- A list-column in a dataframe
- Review of previous quizzes