-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathindex.qmd
163 lines (101 loc) · 9.97 KB
/
index.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
# About this book {.unnumbered}
::: callout-note
**This work is under constant revision**. It is written using [Quarto](https://quarto.org/docs/guide/) and the source is available on [Github](https://github.com/utdata/rwdir).
This version is being revised for spring 2025.
:::
Reporting with Data in R is a series of lessons and instructions used for courses in the School of Journalism and Media, Moody College of Communication at the University of Texas at Austin. This book is the brainchild of Associate Professor of Practice [Christian McDonald](https://moody.utexas.edu/faculty/christian-mcdonald), but there are two massive contributors: Assistant Professor [Jo Lukito](https://moody.utexas.edu/faculty/jo-lukito) began collaborating and teaching sections of the Reporting with Data in spring 2022 and Anastasia Goodwin taught a section in fall 2023 and they have both have provided valuable edits to this work. I am indebted to both of them.
## Some words from Prof. McDonald {.unnumbered}
I'm a strong proponent of what I call Scripted Journalism, a method of committing data journalism that is programmatic, repeatable, transparent and annotated. There are a myriad of programming languages that further this, including Python ( with [pandas](https://pandas.pydata.org/) and [Jupyter](https://jupyter.org/)) and JavaScript ([Observable](https://beta.observablehq.com/)), but we'll be using [R](https://www.r-project.org/), [Quarto](https://quarto.org/docs/guide/) and [RStudio](https://www.rstudio.com/).
R is a super powerful, open-source, data-centric programming language that is deep with features and an awesome community of users who build upon it. No matter the challenge before you in your data storytelling, there is probably a package available to help you solve that challenge. Probably more than one.
There is always more than one way to do things in R. This book is a [Tidyverse](https://www.tidyverse.org/)-oriented, opinionated collection of lessons intended to teach students new to programming and R for the expressed act of committing journalism. I consider the audience a beginner, hense I strive to make steps as simple as possible, which means I may not go into detail about alternative (and possibly better) ways to accomplish tasks in favor of staying in the Tidyverse and reducing options to simplify understanding. I rarely discuss differences from base R; Tidyverse is my default.
Programming languages evolve constantly, and R has seen some significant changes in the past few years, many of them driven by Posit, the company that makes Rstudio and maintains the [Tidyverse](https://www.tidyverse.org/) packages.
- The introduction of [Quarto](https://quarto.org/docs/get-started/hello/rstudio.html) in mid-2022. This modern implementation of RMarkdown has driven major changes in how I present data analysis in general and this book specifically, for the better.
- The introduction of the base R pipe `|>` in 2021. Posit developers began using the `|>` in favor of the magrittr pipe `%>%` in 2022, and this book follows their lead. The two implementations work interchangeably and you'll see plenty of `%>%` in the wild.
- The use of YAML code chunk options. I first noticed this style when I started using Quarto in 2023. Adoption of this style will take time, I'm sure, but I'm a fan.
## Conventions and styles in this book {.unnumbered}
I will try to be consistent in the way I write documentation and lessons, but I am human and sometimes I break my own rules. In general, keep the following in mind:
### Things to do {.unnumbered}
Things to DO are in ordered lists:
1. Do this thing.
2. Then do this thing.
Explanations are usually in text, like this very paragraph.
Sometimes details will be explained in lists:
- This is the first thing I want you to know.
- This is the second. You don't have to DO these things, just know about them.
### Code blocks {.unnumbered}
This book often runs the code that is shown, so you'll see the code and the result of that code below it.
```{r}
1 + 1
```
#### Copying code blocks
When you see R code in the instructions, you can roll your cursor over the right-corner and **click on the copy icon** to copy the code clock content to your clipboard.
![](images/index-copy-clipboard.png)
You can then paste the code inside your R chunk.
That said, typing code yourself has many, many benefits. You learn better when you type yourself, make mistakes and have to fix them. **I encourage you to always type short code snippets.** Leave the copying to long ones.
#### Hidden code
Sometimes I want to include code in the book but not display it so you can try the to write the code yourself first. When I do this, it will look like this:
```{r}
#| code-fold: true
#| code-summary: "Click here to show the code"
1 + 1
```
If you click on the triangle or the words that follow, you'll reveal the code. Click again to hide it.
#### Annotated code
Sometimes when I am explaining code it is helpful to match lines of code to the explanation about them, which I do through annotated code.
```{r}
mtcars |> # <1>
head() # <2>
```
1. First I take the Motor Trend Car Road Tests data set AND THEN ...
2. I pipe into the head() command, which gives us the "top" of the data.
When there are annotations like this you have to be careful if you are copying code from the book. Either copy it one line at a time or use the copy icon noted above.
#### Fenced code
Sometimes I need to show code chunk options that are added, like when explaining how to name chunks. In those cases, you may see the code chunk with all the tick marks, etc. like this:
```{r block-named}
#| echo: fenced
1 + 1
```
or
```{r}
#| echo: fenced
#| label: block-named-yaml
1 + 1
```
You can still copy/paste these blocks, but you'll get the entire code block, not just the contents.
### Notes, some important {.unnumbered}
I will use callouts to set off a less important aside:
::: callout
Markdown was developed by JOHN GRUBER, as outlined on his [Daring Fireball blog](https://daringfireball.net/projects/markdown/).
:::
But sometimes those asides are important. I usually indicate that:
::: callout-important
You really should learn how to use [Markdown](https://rmarkdown.rstudio.com/) as you will use it the whole semester, and hopefully for the rest of your life.
:::
## About the authors {.unnumbered}
### Christian McDonald {.unnumbered}
I'm a career journalist who most recently served as data and projects editor at the Austin American-Statesman before joining the University of Texas at Austin faculty full-time in fall 2018. I've taught data-related course at UT since 2013. I also serve as the innovation director of the [Dallas Morning News Journalism Innovation Endowment](https://journalism.utexas.edu/innovation).
- The UT Data Github: [utdata](https://github.com/utdata)
- Threads: [@critmcdonald](https://www.threads.net/@critmcdonald) \| Mastodon [crit](https://newsie.social/@crit) \| Bluesky: @crit
- Email: [christian.mcdonald\@utexas.edu](mailto:[email protected]){.email}
Dr. Lukito is such an important part of this this text, this class and my life I'd like to give her a moment to introduce herself:
### Jo Lukito {.unnumbered}
I'm an aspiring-journalist-turned-academic who studies journalism and digital media. To make a long story short (tl;dr): I trained to be a journalist as an undergraduate student, but just fell in love with **researching and supporting** journalism. I completed my Ph.D in 2020, and my dissertation focused on international trade reporting (which relies on plenty o' data). I also do a ton of social media research (especially in politics and disinformation), so if you're interested in the social media beat, I'm your gal!
- Prof. Jo Lukito's git: [jlukito](https://github.com/jlukito)
- Twitter: [JosephineLukito](https://twitter.com/JosephineLukito)
- Email: [jlukito\@utexas.edu](mailto:[email protected]){.email}
- Website: <https://www.jlukito.com/>
## License {.unnumbered}
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" alt="Creative Commons License" style="border-width:0"/></a><br /><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
Let's just say this is free on the internet. I don't make any money from it and you shouldn't either.
## Other resources {.unnumbered}
This text stands upon the shoulders of giants and by design does not cover all aspects of using R. Here are some other useful books, tutorials and sites dedicated to R. There are other task-specific tutorials and articles sprinkled throughout the book in the Resources section of select chapters.
- [R for Data Science](https://r4ds.hadley.nz/)
- The [Tidyverse](https://www.tidyverse.org/) site, which has tons of documentation and help.
- The [RStudio Cheatsheets](https://posit.co/resources/cheatsheets/).
- [Posit Recipes](https://posit.cloud/learn/recipes) include a lot of code snippets and short tutorials based on tidyverse principles.
- [ggplot2: Elegant Graphics for Data Analysis](https://ggplot2-book.org/index.html)
- [R Graphics Cookbook](https://r-graphics.org/index.html)
- [The R Graph Gallery](https://r-graph-gallery.com/) another place to see examples.
- [Practical R for Journalism](https://www.crcpress.com/Practical-R-for-Mass-Communication-and-Journalism/Machlis/p/book/9781138726918) by Sharon Machlis, an editor with PC World and related publications. Sharon is a longtime proponent of using R in journalism.
- [Sports Data Analysis and Visualization](http://mattwaite.github.io/sports/) and [Data Journalism with R and the Tidyverse](http://mattwaite.github.io/datajournalism/) by Matt Waite, a professor at the University of Nebraska-Lincoln.
- [R for Journalists](http://learn.r-journalism.com/en/) site by Andrew Tran, a reporter at the Washington Post and University of Texas alum. A series of videos and tutorials on using R in a journalism setting.