diff --git a/_bookdown.yml b/_bookdown.yml
index 0185d18c..55035535 100644
--- a/_bookdown.yml
+++ b/_bookdown.yml
@@ -3,5 +3,5 @@ language:
ui:
chapter_name: "Chapter "
delete_merged_file: true
-output_dir: "docs/v1"
+output_dir: "docs"
new_session: yes
diff --git a/docs/01-install.md b/docs/01-install.md
new file mode 100644
index 00000000..7a72dbd2
--- /dev/null
+++ b/docs/01-install.md
@@ -0,0 +1,46 @@
+# Install Party {#install}
+
+Let's get this party started.
+
+> NOTE: R and RStudio are already installed on lab computers.
+
+## Mac vs PC
+
+I use a Mac in class and in my examples. I'm a big fan of using keyboard commands to do operations in any program, but I reference them from a Mac perspective. So if I say use *Cmd+S* or *Command+S* to save, that might be *Ctrl+S* or *Control+S* on a PC. The keys may not be the same on a PC, but you can usually figure out the PC command by looking at the menu items in RStudio.
+
+We will install R and RStudio. It might take some time depending on your Internet connection.
+
+**If you are doing this on your own** you might follow [this tutorial](https://learnr-examples.shinyapps.io/ex-setup-r/). But below you'll find the basic steps.
+
+## Installing R
+
+Our first task is to install the R programming language onto your computer.
+
+- Go to the .
+- Click on the link for your operating system.
+- The following steps will differ slightly based on your operating system.
+ + For Macs, you want the "latest package" unless you have an "M1" Mac (Nov. 2020 or newer), in which case choose the **arm64.pkg** version.
+ + For Windows, you want the "base" package. You'll need to decide whether you want the 32- or 64-bit version. (Unless you've got a pretty old system, chances are you'll want 64-bit.)
+
+Here's hoping it will be self-explanatory after that.
+
+You'll never "launch" R as a program in a traditional sense, but you need it on your computer. We'll use RStudio, which is next.
+
+## Installing RStudio
+
+[RStudio](https://www.rstudio.com/) is an "integrated development environment" -- or IDE -- for programming in R. Basically, it's the program you will use when doing work for this class.
+
+- Go to .
+- Scroll down to the versions and find **RStudio Desktop** and click on the **Download** button.
+- It should take you down the page to the version you need for your computer.
+- Install it. Should be like installing any other program on your computer.
+
+## Class project folder
+
+To keep things consistent and help with troubleshooting, I'd like you to save your work in the same location all the time.
+
+- On both Mac and Windows, every user has a "Documents" folder. Open that folder. (If you don't know where it is, ask me to help you find it.)
+- Create a new folder called "rwd". Use all lowercase letters.
+
+When we create new "Projects", I want you to always save them in the `Documents/rwd` folder. This just keeps us all on the same page.
+
diff --git a/docs/02-intro.md b/docs/02-intro.md
new file mode 100644
index 00000000..e1fe52cb
--- /dev/null
+++ b/docs/02-intro.md
@@ -0,0 +1,206 @@
+# Introduction to R {#intro}
+
+## RStudio tour
+
+When you launch RStudio, you'll get a screen that looks like this:
+
+![RStudio launch screen](images/intro-start.png)
+
+## Updating preferences
+
+There is a preference in RStudio that I would like you to change. By default, the program wants to save the state of your work (all the variables and such) when you close a project, but that is not good practice. We'll change that.
+
+1. Go to the **RStudio** menu and choose **Preferences**
+1. Under the **General** tab, uncheck the first four boxes.
+1. On the option "Save Workspace to .Rdata on exit", change that to **Never**.
+1. Click *OK* to close the box.
+
+![RStudio preferences](images/rstudio-prefs.png)
+
+## Starting a new Project
+
+When we work in RStudio, we will create "Projects" to hold all the files related to one another. This sets the "working directory", which is a sort of home base for the project.
+
+1. Click on the second button that has a green `+R` sign.
+1. That brings up a box to create the project with several options. You want **New Directory** (unless you already have a Project directory, which you don't for this.)
+1. For **Project Type**, choose **New Project**.
+1. Next, for the **Directory name**, choose a new name for your project folder. For this project, use "firstname-first-project" but use YOUR firstname.
+1. For the subdirectory, you want to use the **Browse** button to find your new `rwd` folder we created earlier.
+
+I want you to be anal about naming your folders. It's a good programming habit.
+
+- Use lowercase characters.
+- Don't use spaces. Use dashes.
+- For this class, start with your first name.
+
+![Rstudio project name, directory](images/intro-newproject.png)
+
+When you hit **Create Project**, your RStudio window will refresh and you'll see the `firstname-first-project.Rproj` file (with your first name) in your Files list.
+
+## Using R Notebooks
+
+For this class, we will almost always use [RNotebooks](https://rmarkdown.rstudio.com/lesson-10.html). This format allows us to write text in between our blocks of code. The text is written in a language called [RMarkdown](https://rmarkdown.rstudio.com/lesson-1.html), a juiced-up version of the common documentation syntax used by programmers, Markdown. We'll learn that in a moment.
+
+### Create your first notebook
+
+1. Click on the button at the top-left of RStudio that has just the green `+` sign.
+1. Choose the item **R Notebook**.
+
+This will open a new file with some boilerplate R Markdown code.
+
+1. At the top between the `---` marks, is the **metadata**. This is written using YAML, and what is inside are commands for the R Notebook. Don't sweat the YAML syntax too much right now, as we won't be editing it often.
+1. Next, you'll see a couple of paragraphs of text that describe how to use an RNotebook. It is written in RMarkdown, and has some inline links and bold commands, which you will learn.
+1. Then you will see an R code chunk that looks like the figure below.
+
+![R code chunk](images/intro-rcodechunk.png)
+
+Let's take a closer look at this:
+
+- The three back tick characters (the key found at the top left on your keyboard) followed by the `{r}` indicate that this is a chunk of R code. The last three back ticks say the code chunk is over.
+- The `{r}` bit can have some parameters added to it. We'll get into that later.
+- The line `plot(cars)` is R programming code. We'll see what those commands do in a bit.
+- The green right-arrow to the far right is a play button to run the code that is inside the chunk.
+- The green down-arrow and bar to the left of that runs all the code in the Notebook up to that point. That is useful as you make changes in your code and want to rerun what is above the chunk in question.
+
+### Save the .Rmd file
+
+1. Do *Cmd+S* or hit the floppy disk icon to save the file.
+1. It will ask you what you want to name this file. Call it `01-first-file.Rmd`.
+
+When you do this, you may see another new file created in your Files directory. It's the pretty version of the notebook which we'll see in a minute.
+
+In the metadata portion of the file, give your notebook a better title.
+
+1. Replace "R Notebook" in the `title: "R Notebook"` code to be "Christian's first notebook", but use your name.
+
+### Run the notebook
+
+There is only one chunk to run in this notebook, so:
+
+1. Click on the green right-arrow to run the code. The keyboard command (from somewhere within the chunk) is *Cmd+Shift+Return*.
+
+You should get something like this:
+
+![Cars plot](images/intro-defaultplot.png)
+
+What you've done here is create a plot chart of a piece of sample data that is already inside R. (FWIW, it is the speed of cars and the distances taken to stop. Note that the data were recorded in the 1920s.)
+
+But that wasn't a whole lot of code to see there is a relationship with speed vs stopping distance, eh?
+
+This is a "base R" plot. We'll be using the tidyverse ggplot methods later in the semester.
+
+### A note about RMarkdown
+
+We always want to annotate our code to explain what we are doing. To do that, we use a syntax called [RMarkdown](https://rmarkdown.rstudio.com/authoring_basics.html), which is an R-specific version of Markdown. We use this syntax because it both makes sense as plain text and makes a very pretty version in HTML when we "knit" our project. You can see how to [write RMarkdown here](https://rmarkdown.rstudio.com/authoring_basics.html).
+
+This entire book is written in RMarkdown.
+
+Here is an example:
+
+```rmarkdown
+## My dating age
+
+The following section details the [socially-acceptable maximum age of anyone you should date](https://www.psychologytoday.com/us/blog/meet-catch-and-keep/201405/who-is-too-young-or-too-old-you-date).
+
+The math works like this:
+
+- Take your age
+- subtract 7
+- Double the result
+```
+
+- The `##` line is a headline. Add more `###` and you get a smaller headline, like subheads.
+- There is a full blank return between each element, including paragraphs of text.
+- In the first paragraph we have embedded a hyperlink. We put the words we want to show inside square brackets and the URL in parentheses DIRECTLY after the closing square bracket: `[words to link](https://the_url.org)`.
+- The `-` at the beginning of a line creates a bullet list. (You can also use `*`). Those lines need to be one after another without blank lines.
+
+1. Go ahead and copy the code above and add it as text in the notebook so you can see it works later.
+
+### Adding new code chunks
+
+The text after the chart describes how to insert a new code chunk. Let's do that.
+
+1. Add a couple of returns before the paragraph of text about code chunks.
+1. Use the keys *Cmd+Option+i* to add the chunk.
+1. Your cursor will be inserted into the middle of the chunk. Type in this code in the space provided:
+
+
+```r
+# update 54 to your age
+age <- 54
+(age - 7) * 2
+```
+
+```
+## [1] 94
+```
+
+1. Change "54" to your real age.
+1. With your cursor somewhere in the code block, use the key command *Cmd+Shift+Return*, which is the key command to RUN ALL LINES of the code chunk.
+
+> NOTE: To run an individual line, use *Cmd+Return* while on that line.
+
+Congratulations! The answer given at the bottom of that code chunk is the [socially-acceptable maximum age of anyone you should date](https://www.psychologytoday.com/us/blog/meet-catch-and-keep/201405/who-is-too-young-or-too-old-you-date).
+
+Throwing aside whether the formula is sound, let's break down the code.
+
+- `# update 54 to your age` is a comment. It's a way to explain what is happening in the code without being considered part of the code. We create comments by starting with `#`. You can also add a comment at the end of a line.
+- `age <- 54` is assigning a number (`54`) to an R object/variable called (`age`). A variable is a placeholder. It can hold numbers, text or even groups of numbers. Variables are key to programming because they allow you to change a value as you go along.
+- The next part is simple math: `(age - 7) * 2` takes the value of `age` and subtracts `7`, then multiplies by `2`.
+- When you run it, you get the result of the math equation, `[1] 94` in my case. The `[1]` marks the first value in the result (there is only one here), and that value is "94". For the record, my wife is _much_ younger than that.
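+
+A variable can also hold a group of numbers, as mentioned above. As a small sketch (the `ages` and `max_ages` names and the values are made up for illustration), the same formula can be applied to several ages at once:
+
+```r
+# hypothetical example: a vector of three made-up ages
+ages <- c(20, 35, 54)
+
+# the formula is applied to each value in the vector
+max_ages <- (ages - 7) * 2
+max_ages
+```
+
+R runs the math on each value in turn, so you get one result per age.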
+
+Now you can play with the number assigned to the age variable to test out different ages. Do that.
+
+### Practice adding code chunks
+
+Now, on your own, add a similar section that calculates the **minimum** age of someone you should date, but using the formula `(age / 2) + 7`.
+
+1. Add an RMarkdown headline and text describing what you are doing.
+1. Create a code chunk that calculates the formula based on your age.
+1. Include a comment within the code block.
+
+### Preview the report
+
+The rest of the boilerplate text here describes how you can *Preview* and *Knit* a notebook. Let's do that now.
+
+- Press *Cmd+Shift+K* to open a Preview.
+
+This will open a new window and show you the "pretty" notebook that we are building.
+
+Preview is a little different than *Knit*, which runs all the code, then creates the new knitted HTML document. It's **Knit to HTML** that you'll want to do before turning in your assignments. That is explained below.
+
+### The toolbar
+
+One last thing to point out before we turn this in: The toolbar that runs across the top of the R Notebook file window. The image below explains some of the more useful tools, but you _REALLY_ should learn and use keyboard commands when they are available.
+
+![R Notebook toolbar](images/intro-toolbar.png)
+
+### Knit the final workbook
+
+1. Save your File with *Cmd+S*.
+1. Click on the dropdown next to the **Run** menu item and choose _Restart R and Run All Chunks_. We do this to make sure everything still works.
+1. Use the **Knit** button in the toolbar to choose **Knit to HTML**.
+
+This will open your knitted file. Isn't it pretty?
+
+## Turning in our projects
+
+If you now look in your Files pane, you'll see you have four files in your project. (Note the only one you actually edited was the `.Rmd` file.)
+
+![Files list](images/intro-files.png)
+
+The best way to turn in all of those files into Canvas is to compress them into a single `.zip` file that you can upload to the assignment.
+
+1. In your computer's Finder, open the `Documents/rwd` folder.
+1. Follow the directions for your operating system linked below to create a compressed version of your `firstname-first-project` folder.
+1. [Compress files on a Mac](https://www.macinstruct.com/tutorials/how-to-compress-zip-files-and-folders-on-a-mac/).
+1. [Compress files on Windows](https://www.laptopmag.com/articles/how-to-zip-files-windows-10).
+1. Upload the resulting `.zip` file to the assignment for this week in Canvas.
+
+
+If you find you make changes to your R files after you've zipped your folder, you'll need to delete the `zip` file and compress it again.
+
+Because we are building "repeatable" code, I'll be able to download your `.zip` files, uncompress them, and then re-run them to get the same results.
+
+Well done! You've completed the first level and earned the _Beginner_ badge.
+
diff --git a/docs/03-counts-import.md b/docs/03-counts-import.md
new file mode 100644
index 00000000..007bf27b
--- /dev/null
+++ b/docs/03-counts-import.md
@@ -0,0 +1,732 @@
+# Summarize with count - import {#counts-import}
+
+> “If you’re doing data analysis every day, the time it takes to learn a programming language pays off pretty quickly because you can automate more and more of what you do.” --Hadley Wickham, chief scientist at RStudio
+
+## Learning goals of this lesson
+
+- Practice organized project setup.
+- Learn a little about data types available to R.
+- Learn about R packages, how to install and import them.
+- Learn how to download and import CSV files using the [readr](https://readr.tidyverse.org/) package.
+- Introduce the Data Frame/Tibble.
+- Introduce the tidyverse ` %>% `.
+- Learn how to modify data types (date) and `select()` columns.
+
+We'll be exploring the Billboard Hot 100 charts along the way. Eventually you will find the answers to a bunch of questions in this data and write about it.
+
+## Basic steps of this lesson
+
+Before we get into our storytelling, we have to get our data and make sure it is in good shape for analysis. This is pretty standard for any new project. Here are the major steps we'll cover in detail for this lesson (and many more to come):
+
+- Create your project structure
+- Find the data and get it
+- Import the data into your project
+- Clean up data types and columns
+- Export cleaned data for later analysis
+
+## Create a new project
+
+We did this once in Chapter 2, but here are the basic steps:
+
+1. Launch RStudio
+1. Make sure you don't have an existing project open. Use File > Close project if you do.
+1. Use the `+R` button to create a **New Project** in a **New Directory**
+1. Name the project `yourfirstname-billboard` and put it in your `~/Documents/rwd` folder.
+1. Use the `+` button and use **R Notebook** to start a new notebook.
+1. Change the title to "Billboard Hot 100 Import".
+1. Delete the other boilerplate text.
+1. Save the file as `01-import.Rmd`.
+
+### Describe the goals of the notebook
+
+
+We'll add our first bit of RMarkdown just after the metadata to explain what we are doing. Add this text to your notebook:
+
+```rmarkdown
+## Goals of this notebook
+
+Steps to prepare our data:
+
+- Download the data
+- Import into R
+- Clean up data types and columns
+- Export for next notebook
+```
+
+We want to start each notebook with a list like this so our future selves and others know what the heck we are trying to accomplish.
+
+We will also write text like this for each new "section" or goal in the notebook.
+
+### The R Package environment
+
+We have to back up from the step-by-step nature of this lesson and talk a little about the R programming language.
+
+R is an open-source language, which means that other programmers can contribute to how it works. It is what makes R beautiful.
+
+What happens is developers will find it difficult to do a certain task, so they will write an R "Package" of code that helps them with that task. They share that code with the community, and suddenly the R garage has an ["ultimate set of tools"](https://youtu.be/Y1En6FKd5Pk?t=24) that would make Spicoli's dad proud.
+
+One set of these tools is Hadley Wickham's [Tidyverse](https://www.tidyverse.org/), a set of packages for data science. These are the tools we will use most in this course. While not required reading, I highly recommend Wickham's book [R for data science](https://r4ds.had.co.nz/index.html), which is free.
+
+There are also a series of useful [cheatsheets](https://www.rstudio.com/resources/cheatsheets/) that can help you as you use the packages and functions from the tidyverse. We'll refer to these throughout the course.
+
+### Installing and using packages
+
+There are two steps to using an R package:
+
+- **Install the package** using `install.packages("package_name")`. You only have to do this once for each computer, so I usually do it using the R Console instead of in a notebook.
+- **Include the library** using `library(package_name)`. This has to be done for each Notebook or script that uses it, so it is usually one of the first things in the notebook.
+
+> Note that you have to use "quotes" around the package name when you are installing, but you DON'T use quotes when you load the library.
+
+We're going to install several packages we will use in this project. To do this, we are going to use the **Console**, which we haven't talked about much yet.
+
+![The Console and Terminal](images/import-console.png){width=600px}
+
+1. Use the image above to orient yourself to the R Console and Terminal.
+1. In the Console, type in:
+
+```r
+install.packages("tidyverse")
+```
+
+As you type into the Console, you'll see some type-assist hints on what you need. You can use the arrow keys to select one and hit the _tab_ key to complete that command, then enter the values you need. If it asks you to install "from source", type `Yes` and hit return.
+
+You'll see a bunch of responses in the Console.
+
+1. We need two other packages as well, so also do:
+
+```r
+install.packages("janitor")
+install.packages("lubridate")
+
+```
+
+We'll use janitor to clean up our data column names, among other things. A good reference to learn more is the [janitor vignette](https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html).
+
+We'll use [lubridate](https://lubridate.tidyverse.org/) to fix some dates, which are a special complicated thing in programming. Lubridate is part of the tidyverse universe, but we have to install and load it separately.
+
+You only have to install the packages once on your computer (though you have to load them into each new notebook, which is explained below).
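+
+If you aren't sure whether a package is already installed, you could check from the Console first. This is only a sketch: the `pkgs`, `installed` and `not_installed` names are made up for illustration, and it relies on base R's `requireNamespace()`, which reports whether a package is available without installing anything.
+
+```r
+# the three packages this lesson uses
+pkgs <- c("tidyverse", "janitor", "lubridate")
+
+# TRUE for each package that is already installed on this computer
+installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
+
+# names of any packages that are not installed yet
+not_installed <- pkgs[!installed]
+not_installed
+```
+
+Anything listed in `not_installed` still needs `install.packages()`.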
+
+### Load the libraries
+
+Next, we're going to tell our R Notebook to use these three libraries.
+
+1. After the metadata at the top of your notebook, use *Cmd+Option+i* to insert an R code chunk.
+1. In that chunk, type in the three libraries and run the code block with *Cmd+Shift+Return*.
+
+This is the code you need:
+
+
+```r
+library(tidyverse)
+library(janitor)
+library(lubridate)
+```
+
+Your output will look something like this:
+
+![Libraries imported](images/import-libraries.png){width=600px}
+
+### Create a directory for your data
+
+I want you to create a folder called `data-raw` in your project folder. We are creating this folder because we want to keep a pristine version of our original data that we never change or overwrite. This is a basic data journalism principle: _Thou shalt not change raw data_.
+
+In your Files pane at the bottom-right of RStudio, there is a **New Folder** icon.
+
+1. Click on the **New Folder** icon.
+1. Name your new folder `data-raw`. This is where we'll put raw data. We never write data to this folder.
+1. Also create another new folder called `data-processed`. This is where we write data. We separate them so we don't accidentally overwrite raw data.
+
+Once you've done that, they should show up in the file explorer in the Files pane. Click the refresh button if you don't see them. (The circlish thing at top right of the screenshot below. You might have to widen the pane to see it.)
+
+![Directory made](images/bb-new-folder.png){width=400px}
+
+Your `.Rproj` file name is likely different (and that's OK) and you can ignore the `.gitignore` I have there.
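+
+The same two folders could also be created with code instead of clicking. This is a minimal sketch using base R's `dir.create()`, and it assumes your project folder is the working directory (which it is when the Project is open):
+
+```r
+# create each folder only if it doesn't exist yet
+for (folder in c("data-raw", "data-processed")) {
+  if (!dir.exists(folder)) {
+    dir.create(folder)
+  }
+}
+
+# confirm both folders are there
+dir.exists(c("data-raw", "data-processed"))
+```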
+
+### Let's get our data
+
+Now that we have a folder for our data, we can download our data into it. The data was scraped and saved on [data.world](https://data.world/kcmillersean/billboard-hot-100-1958-2017) by Sean Miller, but you can just download my copy of the data using the `download.file` function in R.
+
+For the purposes of this assignment, we will "source" the data as being from Billboard Media, as that is who initially provided it. I've worked with data fairly extensively, and it is sound.
+
+1. Add a Markdown headline `## Downloading data` and on a new line text that indicates you are downloading data. You would typically include a link and explain what it is, etc, often linking to the original source.
+1. Create an R chunk and include the following (hint: use the copy icon at the top right):
+
+```r
+# hot 100 download
+download.file("https://github.com/utdata/rwd-billboard-data/blob/main/data-process/hot-100/hot100-orig.csv?raw=true", "data-raw/hot-stuff.csv")
+```
+
+This `download.file` function takes at least two arguments:
+
+- The URL of the file you are downloading
+- The path and name of where you want to save it.
+
+Note those two arguments are in quotes. The path includes the folder you are saving the file to (`data-raw`) and the name of the file, which we called `hot-stuff.csv`.
+
+When you run this, it should save the file and then give you output similar to this:
+
+```text
+trying URL 'https://github.com/utdata/rwd-billboard-data/blob/main/data-process/hot-100/hot100-orig.csv?raw=true'
+Content type 'text/plain; charset=utf-8' length 45795374 bytes (43.7 MB)
+==================================================
+downloaded 43.7 MB
+```
+
+That's a pretty big file.
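+
+Since the file is that large, you may not want to download it again every time you re-run the notebook. One possible approach, sketched here with a made-up helper called `download_once()` (it is not part of R or any package), is to check for the file with `file.exists()` before calling `download.file()`:
+
+```r
+# hypothetical helper: download a file only if we don't already have it
+download_once <- function(url, destfile) {
+  if (file.exists(destfile)) {
+    message("Already downloaded: ", destfile)
+    return(invisible(FALSE))
+  }
+  download.file(url, destfile)
+  invisible(TRUE)
+}
+```
+
+You would call it with the same two arguments as `download.file()`: the URL and the path `"data-raw/hot-stuff.csv"`.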
+
+## About data sources
+
+Depending on the data source, importing can be brilliantly easy or a major pain in the rear. It all depends on how well-formatted the data is.
+
+In this class, we will primarily use data from CSVs (Comma Separated Value), Excel files and APIs (Application Programming Interface).
+
+- **CSVs** are a kind of lowest-common-denominator for data. Most any database or program can import or export them. It's just text with a `,` between each value.
+- **Excel** files are good, but are often messy because humans get involved. They often have multiple header rows, columns used in multiple ways, notes added, etc. Just know you might have to clean them up before or after importing them.
+- **APIs** are systems designed to respond to programming. In the data world, we often use the APIs by writing a query to ask a system to return a selection of data. By definition, the data is well structured. You can often determine the file type of the output as part of the API call, including ...
+- **JSON** (or JavaScript Object Notation) is the data format preferred by JavaScript. R can read it, too. It is often the output format of APIs, and prevalent enough that you need to understand how it works. We'll get into that later in the semester.
+
+Don't get me wrong ... there are plenty of other data types and connections available through R, but those are the ones we'll deal with most in this book.
+
+## Our project data
+
+Now that we've downloaded the data and talked about what data is, let's talk about our Billboard data specifically.
+
+The data includes Billboard's Weekly Hot 100 singles charts from its inception on 8/2/1958 through 2020. It is a modified version of data compiled by Sean Miller and posted on data.world. We are using a copy I have saved.
+
+When you write about this data (and you will), you should source it as **the Billboard Hot 100 from Billboard Media**, since that is where the data comes from via an API.
+
+## Data dictionary
+
+This data contains the weekly Hot 100 singles charts from Billboard.com. **Each row of data represents a song and the corresponding position on that week's chart.** Included in each row are the following elements:
+
+- Billboard Chart URL: Website for the chart
+- WeekID: Basically the date
+- Song name
+- Performer name
+- SongID: Concatenation of song & performer
+- Current week on chart
+- Instance: This is used to separate breaks on the chart for a given song. For example, an instance of 6 tells you that this is the sixth time this song has fallen off and then appeared on the chart
+- Previous week position
+- Peak Position: As of the current week
+- Weeks on Chart: As of the current week
+
+Let's import it so we can _see_ the data.
+
+## Import the data
+
+Since we are doing a new thing, we should note that with a Markdown headline and text.
+
+1. Add a Markdown headline: `## Import data`
+1. Add some text to explain that we are importing the Billboard Hot 100 data.
+1. After your description, add a new code chunk (*Cmd+Option+i*).
+
+We'll be using the `read_csv()` function from the tidyverse [readr](https://readr.tidyverse.org/) package, which is different from `read.csv` that comes with base R. `read_csv()` is mo betta.
+
+Inside the function we put in the path to our data, inside quotes. If you start typing in that path and hit tab, it will complete the path. (Easier to show than explain).
+
+1. Add the following code into your chunk and run it.
+
+```r
+read_csv("data-raw/hot-stuff.csv")
+```
+
+> Note the path is in quotes.
+
+You get two results printed to your screen.
+
+The first result called **"R Console"** shows what columns were imported and the data types. It's important to review these to make sure things happened the way you expected. In this case it noted which columns came in as text (`chr`) or numbers (`dbl`).
+
+Note: **Red** colored text in this output is NOT an indication of a problem.
+
+![RConsole output](images/bb-import-show-cols.png){width=600}
+
+The second result **spec_tbl_df** prints out the data like a table. The data object is a [tibble](https://tibble.tidyverse.org/), which is a fancy tidyverse version of a "data frame".
+
+> I will use the terms tibble and data frame interchangeably. Think of tibbles and data frames like a well-structured spreadsheet. They are organized rows of data (called observations) with columns (called variables) where every column is a specific data type.
+
+![Data output](images/bb-import-show-data.png){width=600}
+
+When we look at the data output into RStudio, there are several things to note:
+
+- Below each column name is an indication of the data type. This is important.
+- You can use the arrow icon on the right to page through the additional columns.
+- You can use the paging numbers and controls at the bottom to page through the rows of data.
+- The number of rows and columns is displayed.
+
+Of special note here, we have only printed this data to the screen. We have not saved it in any way, but that is next.
+
+## Assign our import to a tibble
+
+As of right now, we've only printed the data to our screen. We haven't "saved" it at all. Next we need to assign it to an **R object** so it becomes a named thing in our project environment that we can reuse. We don't want to re-import the data every time we use the data.
+
+The syntax to create an object in R can seem weird at first, but the convention is to name the object first, then insert stuff into it. So, to create an object, the structure is this:
+
+```r
+# this is pseudo code. don't run it.
+new_object <- stuff_going_into_object
+```
+
+Let's make an object called `hot100` and fill it with our imported tibble.
+
+1. Edit your existing code chunk to look like this. You can add the `<-` by using _Option+-_ as in holding down the Option key and then pressing the hyphen:
+
+
+```r
+hot100 <- read_csv("data-raw/hot-stuff.csv")
+```
+
+Run that chunk and several things happen:
+
+- We no longer see the result of the data printed to the screen. That's because we created a tibble instead of printing it to the screen. You do get the RConsole output.
+- In the **Environment** tab at the top-right of RStudio, you'll see the `hot100` object listed.
+ + Click on the blue play button next to `hot100` and it will expand to show you a summary of the columns.
+ + Click on the name and it will open a "View" of the data in another window, so you can look at it in spreadsheet form. You can even sort and filter it.
+- Once you've looked at the data, close the data view with the little `x` next to the tab name.
+
+### Print a peek to the screen
+
+Since we can't see the data after we assign it, let's print the object to the screen so we can refer to it.
+
+1. Edit your import chunk to add the last two lines of this, including the one with the `#`:
+
+```r
+hot100 <- read_csv("data-raw/hot-stuff.csv")
+
+# peek at the data
+hot100
+```
+
+> You can use the green play button at the right of the chunk, or preferably have your cursor inside the chunk and do _Cmd+Shift+Return_ to run all lines. (_Cmd+Return_ runs only the current line.)
+
+This prints your saved tibble to the screen.
+
+The line with the `#` is a comment _within_ the code chunk. Commenting on what your code does is important to your future self, and sometimes we do that within the code chunk instead of in Markdown if it will be more clear.
+
+### Glimpse the data
+
+There is another way to peek at the data that I prefer because it is more compact and shows you all the columns and data examples without scrolling: `glimpse()`.
+
+1. In your existing chunk, edit the last line to add the `glimpse()` function as noted below.
+
+I'm showing the return here as well. Afterward I'll explain the pipe: ` %>% `.
+
+```r
+hot100 <- read_csv("data-raw/hot-stuff.csv")
+
+# peek at the data
+hot100 %>% glimpse()
+```
+
+
+```
+## Rows: 327,895
+## Columns: 10
+## $ url "http://www.billboard.com/charts/hot-100/1965-0…
+## $ WeekID "7/17/1965", "7/24/1965", "7/31/1965", "8/7/196…
+## $ Week.Position 34, 22, 14, 10, 8, 8, 14, 36, 97, 90, 97, 97, 9…
+## $ Song "Don't Just Stand There", "Don't Just Stand The…
+## $ Performer "Patty Duke", "Patty Duke", "Patty Duke", "Patt…
+## $ SongID "Don't Just Stand TherePatty Duke", "Don't Just…
+## $ Instance 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
+## $ Previous.Week.Position 45, 34, 22, 14, 10, 8, 8, 14, NA, 97, 90, 97, 9…
+## $ Peak.Position 34, 22, 14, 10, 8, 8, 8, 8, 97, 90, 90, 90, 90,…
+## $ Weeks.on.Chart 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 1, …
+```
+
+Here you get the RConsole printout (hidden here for clarity), plus the glimpse shows there are 300,000+ rows and 10 columns in our data. Each column is then listed out with its data type and the first several values in that column.
+
+### About the pipe %>%
+
+We need to break down this code a little: `hot100 %>% glimpse()`.
+
+We are starting with the tibble `hot100`, but then we follow it with ` %>% `, which is called a pipe. It is a tidyverse tool that allows us to take the **results** of an object or function and pass them into another function. Think of it like a sentence that says **"AND THEN" the next thing**.
+
+It might look like there are no arguments inside `glimpse()`, but what we are actually doing is passing the `hot100` tibble into it.
+
+You can't start a new line with a pipe. If you are breaking your code into multiple lines, put the ` %>% ` at the end of the line.
+
+> IMPORTANT: There is a keyboard command for the pipe ` %>% `: **Cmd+Shift+m**. Learn that one.
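+
+Here is a minimal sketch of the pipe at work. The `songs` tibble and its values are made up for illustration, so don't add this to your notebook. It shows that piping a tibble into a function is the same as putting the tibble inside the parentheses, and where the pipe goes when you break code across lines.
+
+```r
+library(tidyverse)
+
+# a tiny made-up tibble, just for illustration
+songs <- tibble(
+  song = c("Song A", "Song B", "Song C"),
+  peak = c(1, 15, 40)
+)
+
+# these two lines do the same thing
+glimpse(songs)
+songs %>% glimpse()
+
+# when breaking across lines, the pipe goes at the END of each line
+songs %>%
+  head(2) %>%
+  glimpse()
+```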
+
+### What is clean data
+
+The "Checking Your Data" section of this [DataCamp tutorial](https://www.datacamp.com/community/tutorials/r-data-import-tutorial) has a good outline of what makes good data, but in general it should:
+
+- Have a single header row with well-formed column names.
+ + One column name for each column. No merged cells.
+ + Short names are better than long ones.
+ + Spaces in names make them harder to work with. Use an `_` or `.` between words. I prefer `_` and lowercase text.
+- Remove notes or comments from the files.
+- Each column should have the same kind of data: numbers vs words, etc.
+- Each row should be a single thing called an "observation". The columns should describe attributes of that observation.
+
+Data rarely comes clean like that. There can be many challenges importing and cleaning data. We'll face some of those challenges here. In our case our column names could use help, and our field `WeekID` is not really a date, but text characters. We'll tackle those issues next.
+
+## Cleaning column names
+
+So, given those notes above, we should clean up our column names. This is why we have included the janitor package, which includes a neat function called `clean_names()`.
+
+1. Edit the first line of your chunk to add a pipe and the clean_names function: ` %>% clean_names()`
+
+
+```r
+hot100 <- read_csv("data-raw/hot-stuff.csv") %>% clean_names()
+
+# peek at the data
+hot100 %>% glimpse()
+```
+
+```
+## Rows: 327,895
+## Columns: 10
+## $ url "http://www.billboard.com/charts/hot-100/1965-0…
+## $ week_id "7/17/1965", "7/24/1965", "7/31/1965", "8/7/196…
+## $ week_position 34, 22, 14, 10, 8, 8, 14, 36, 97, 90, 97, 97, 9…
+## $ song "Don't Just Stand There", "Don't Just Stand The…
+## $ performer "Patty Duke", "Patty Duke", "Patty Duke", "Patt…
+## $ song_id "Don't Just Stand TherePatty Duke", "Don't Just…
+## $ instance 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
+## $ previous_week_position 45, 34, 22, 14, 10, 8, 8, 14, NA, 97, 90, 97, 9…
+## $ peak_position 34, 22, 14, 10, 8, 8, 8, 8, 97, 90, 90, 90, 90,…
+## $ weeks_on_chart 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 1, …
+```
+
+This function has cleaned up your names, making them all lowercase and using `_` instead of periods between words. Believe me when I say this is helpful later to auto-complete column names when you are writing code.
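+
+To see the renaming on its own, here is a small sketch using a made-up tibble called `messy` (an assumption for illustration, not part of our project). It mixes the kinds of names that cause trouble: periods, spaces and capital letters.
+
+```r
+library(tidyverse)
+library(janitor)
+
+# made-up data with hard-to-type column names
+messy <- tibble(
+  `Week.Position` = c(1, 2),
+  `Song Title` = c("Song A", "Song B"),
+  WeekID = c("7/17/1965", "7/24/1965")
+)
+
+# print just the cleaned column names
+messy %>% clean_names() %>% names()
+```
+
+The names come back as `week_position`, `song_title` and `week_id`.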
+
+## Fixing the date
+
+Dates are a tricky datatype because you can do math with them. To use them properly in R we need to convert them from the text we have here to a _real date_.
+
+Converting dates can be a pain, but the tidyverse universe has a package called [lubridate](https://lubridate.tidyverse.org/) that can help us with that.
+
+Since we are doing something new, we want to start a new section in our notebook and explain what we are doing.
+
+1. Add a headline: `## Fix our dates`.
+1. Add some text that you are using lubridate to create a new column with a real date.
+1. Add a new code chunk. Remember _Cmd+Option+i_ will do that.
+
+We are going to start by creating a new data frame that is the same as our current one, and then add a glimpse so we can see the results as we build upon it.
+
+1. Add the following inside your code chunk.
+
+
+```r
+# part we will build on
+hot100_date <- hot100
+
+# peek at the result
+hot100_date %>% glimpse()
+```
+
+```
+## Rows: 327,895
+## Columns: 10
+## $ url "http://www.billboard.com/charts/hot-100/1965-0…
+## $ week_id "7/17/1965", "7/24/1965", "7/31/1965", "8/7/196…
+## $ week_position 34, 22, 14, 10, 8, 8, 14, 36, 97, 90, 97, 97, 9…
+## $ song "Don't Just Stand There", "Don't Just Stand The…
+## $ performer "Patty Duke", "Patty Duke", "Patty Duke", "Patt…
+## $ song_id "Don't Just Stand TherePatty Duke", "Don't Just…
+## $ instance 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
+## $ previous_week_position 45, 34, 22, 14, 10, 8, 8, 14, NA, 97, 90, 97, 9…
+## $ peak_position 34, 22, 14, 10, 8, 8, 8, 8, 97, 90, 90, 90, 90,…
+## $ weeks_on_chart 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 1, …
+```
+
+A refresher to break this down:
+
+- I have a comment starting with `#` to explain the first part of the code
+- We are taking the `hot100` object and pushing it into a new object called `hot100_date`.
+- I have a blank line for clarity
+- Another comment
+- We glimpse the new `hot100_date` object so we can see changes as we work on it.
+
+> To be clear, we haven't changed any data yet. We just created a new tibble like the old tibble.
+
+### Working with mutate()
+
+We are going to replace our current date field `week_id` with a converted date. We use a dplyr function [`mutate()`](https://dplyr.tidyverse.org/reference/mutate.html) to do this, with some help from lubridate.
+
+> [dplyr](https://dplyr.tidyverse.org/) is the tidyverse package of functions to manipulate data. We'll use it a lot. It is loaded with the `library(tidyverse)`.
+
+Let's explain how mutate works first. Mutate changes every value in a column. We can either create a new column or overwrite an existing one.
+
+Within the mutate function, we name the new thing first (confusing, I know) and then set the value of that new thing.
+
+```r
+# this is pseudocode. Don't run it.
+data %>%
+ mutate(
+ newcol = oldcol
+ )
+```
+
+That new value could be arrived at through math or any combination of other functions. In our case, we want to convert our old text-based date to a _real date_, and then assign it back to the "new" column, **but really we are overwriting the existing one**.
+
+Some notes about the above:
+
+- It might seem weird to list the new thing first when we are changing it, but that is how R works in this case. You'll see that pattern elsewhere, like we have already with assigning data into tibbles.
+- We need to be careful when we overwrite data. In this case I feel comfortable doing so because we are creating a new tibble at the same time, so I still have my original data in my project.
+- I strategically used returns to make the code more readable. This code would work the same if it were all on the same line, but writing it this way helps me understand it. RStudio will help you indent this properly as you type. (Easier to show than explain.)
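+
+The difference between creating and overwriting can be sketched with a made-up tibble. The `scores` data and the `double_points` column are assumptions for illustration only.
+
+```r
+library(tidyverse)
+
+# made-up data, just for illustration
+scores <- tibble(
+  name = c("Ann", "Bo"),
+  points = c(10, 20)
+)
+
+# a NEW name creates a new column; points is kept as it was
+scores %>%
+  mutate(
+    double_points = points * 2
+  )
+
+# an EXISTING name overwrites that column
+scores %>%
+  mutate(
+    points = points * 2
+  )
+```
+
+Neither call changes `scores` itself, because we didn't assign the result back into anything.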
+
+1. Edit your chunk to add the changes below and run it. I **implore** you to _type_ the changes so you see how RStudio helps you write it. Use tab completion, etc.
+
+
+```r
+# part we will build on
+hot100_date <- hot100 %>%
+ mutate(
+ week_id = mdy(week_id)
+ )
+
+# peek at the result
+hot100_date %>% glimpse()
+```
+
+```
+## Rows: 327,895
+## Columns: 10
+## $ url "http://www.billboard.com/charts/hot-100/1965-0…
+## $ week_id 1965-07-17, 1965-07-24, 1965-07-31, 1965-08-07…
+## $ week_position 34, 22, 14, 10, 8, 8, 14, 36, 97, 90, 97, 97, 9…
+## $ song "Don't Just Stand There", "Don't Just Stand The…
+## $ performer "Patty Duke", "Patty Duke", "Patty Duke", "Patt…
+## $ song_id "Don't Just Stand TherePatty Duke", "Don't Just…
+## $ instance 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
+## $ previous_week_position 45, 34, 22, 14, 10, 8, 8, 14, NA, 97, 90, 97, 9…
+## $ peak_position 34, 22, 14, 10, 8, 8, 8, 8, 97, 90, 90, 90, 90,…
+## $ weeks_on_chart 4, 5, 6, 7, 8, 9, 10, 11, 1, 2, 3, 4, 5, 6, 1, …
+```
+
+What we did there with the `mutate()` function was name our **new** column, but because we used the name of an existing one, `week_id`, it really **replaces** the data. We replaced it with the result of a lubridate function, which I'll explain next.
+
+Lubridate allows us to parse text and then turn it into a date if we supply the order of the date values in the original data.
+
+- Our original date was something like "7/17/1965". That is month, followed by day, followed by year.
+- The lubridate function `mdy()` converts that text into a _real_ date, which properly shows as YYYY-MM-DD, or year then month then day. Lubridate is smart enough to figure out if you have `/` or `-` between your values in the original date.
+
+If your original text is in a different date order, then you look up what function you need. I typically use the **cheatsheet** that you'll find on the [lubridate page](https://lubridate.tidyverse.org/). You'll find them in the PARSE DATE-TIMES section.
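+
+As a quick sketch of how the function name follows the order of the date parts, here are a few made-up text dates run through different lubridate parsers. Each one should return the same real date.
+
+```r
+library(lubridate)
+
+mdy("7/17/1965")    # month, day, year with slashes
+mdy("07-17-1965")   # month, day, year with dashes
+dmy("17/7/1965")    # day, month, year
+ymd("1965-07-17")   # year, month, day
+```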
+
+## Arrange the data
+
+If you inspect our newish `week_id` in your glimpse return, you'll notice the first record starts in "1965-07-17" but our data goes back to 1958. We want to sort our data by the oldest records first using `arrange()`.
+
+We will use the `%>%` and then the arrange function, feeding it our data (implied with the pipe) and the columns we wish to sort by.
+
+1. Edit your chunk to the following to add the `arrange()` function:
+
+
+```r
+# part we will build on
+hot100_date <- hot100 %>%
+ mutate(
+ week_id = mdy(week_id)
+ ) %>%
+ arrange(week_id, week_position)
+
+# peek at the result
+hot100_date %>% glimpse()
+```
+
+```
+## Rows: 327,895
+## Columns: 10
+## $ url "http://www.billboard.com/charts/hot-100/1958-0…
+## $ week_id 1958-08-02, 1958-08-02, 1958-08-02, 1958-08-02…
+## $ week_position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
+## $ song "Poor Little Fool", "Patricia", "Splish Splash"…
+## $ performer "Ricky Nelson", "Perez Prado And His Orchestra"…
+## $ song_id "Poor Little FoolRicky Nelson", "PatriciaPerez …
+## $ instance 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
+## $ previous_week_position NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
+## $ peak_position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
+## $ weeks_on_chart 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
+```
+
+Now when you look at the glimpse, the first record is from "1958-08-02" and the first `week_position` is "1", which is the top of the chart.
+
+Just to see this all clearly in table form, we'll print the top of the table to our screen so we can see it.
+
+1. Add a line of text in your notebook explaining you are looking at the table.
+1. Add a new code chunk and add the following.
+
+> The result will look different in your notebook vs this book.
+
+
+```r
+hot100_date %>% head(10)
+```
+
+```
+## # A tibble: 10 × 10
+## url week_id week_position song performer song_id instance
+##
+## 1 http://ww… 1958-08-02 1 Poor L… Ricky Nelson Poor Littl… 1
+## 2 http://ww… 1958-08-02 2 Patric… Perez Prado… PatriciaPe… 1
+## 3 http://ww… 1958-08-02 3 Splish… Bobby Darin Splish Spl… 1
+## 4 http://ww… 1958-08-02 4 Hard H… Elvis Presl… Hard Heade… 1
+## 5 http://ww… 1958-08-02 5 When Kalin Twins WhenKalin … 1
+## 6 http://ww… 1958-08-02 6 Rebel-… Duane Eddy … Rebel-'rou… 1
+## 7 http://ww… 1958-08-02 7 Yakety… The Coasters Yakety Yak… 1
+## 8 http://ww… 1958-08-02 8 My Tru… Jack Scott My True Lo… 1
+## 9 http://ww… 1958-08-02 9 Willie… The Johnny … Willie And… 1
+## 10 http://ww… 1958-08-02 10 Fever Peggy Lee FeverPeggy… 1
+## # … with 3 more variables: previous_week_position <dbl>, peak_position <dbl>,
+## #   weeks_on_chart <dbl>
+```
+
+This just prints the first 10 lines of the data.
+
+1. Use the arrows to look at the other columns of the data (which you can't see in the book).
+
+## Selecting columns
+
+We don't need all of these columns for our analysis, so we are going to **select** only the ones we need. This will make our exported data file smaller. To understand the concept, you can review the [Select](https://vimeo.com/showcase/7320305) video in the Basic Data Journalism Functions series.
+
+
+
+
+It boils down to this: We are selecting only the columns we need. In doing so, we will drop `url`, `song_id` and `instance`.
+
+1. Add a Markdown headline: `## Selecting columns`.
+1. Explain in text we are tightening the data to only the columns we need.
+1. Add the code below and then I'll explain it.
+
+
+```r
+hot100_tight <- hot100_date %>%
+ select(
+ -url,
+ -song_id,
+ -instance
+ )
+
+hot100_tight %>% glimpse()
+```
+
+```
+## Rows: 327,895
+## Columns: 7
+## $ week_id 1958-08-02, 1958-08-02, 1958-08-02, 1958-08-02…
+## $ week_position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
+## $ song "Poor Little Fool", "Patricia", "Splish Splash"…
+## $ performer "Ricky Nelson", "Perez Prado And His Orchestra"…
+## $ previous_week_position NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
+## $ peak_position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
+## $ weeks_on_chart 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
+```
+
+In our code we:
+
+- Name our new tibble
+- Assign to it the result of the `hot100_date` tibble
+- In that tibble, we use the select statement to remove (using `-`) certain columns.
+
+Alternately, we could just name the columns we want to keep without the `-` sign. But there were fewer to remove than keep.
+
+> To be clear, there are other ways to use [`select()`](https://dplyr.tidyverse.org/reference/select.html) that use less code, but I want to be as straightforward as possible at this point. It's [pretty powerful](https://dplyr.tidyverse.org/reference/select.html).
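+
+Here is a sketch of the two approaches side by side. The `chart` tibble is made up for illustration; removing columns with `-` and naming the columns to keep give the same result.
+
+```r
+library(tidyverse)
+
+# made-up data, just for illustration
+chart <- tibble(
+  url = c("http://example.com/1", "http://example.com/2"),
+  song = c("Song A", "Song B"),
+  performer = c("Artist A", "Artist B"),
+  instance = c(1, 1)
+)
+
+# remove the columns we don't want
+removed <- chart %>% select(-url, -instance)
+
+# or name the columns we want to keep
+kept <- chart %>% select(song, performer)
+
+# both have only song and performer
+removed
+kept
+```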
+
+## Exporting data
+
+### Single-notebook philosophy
+
+I have a pretty strong opinion that you should be able to open any RNotebook in your project and run it from top to bottom without it breaking. In short, one notebook should not be dependent on the previous running of another open notebook.
+
+This is why I had you name this notebook `01-import.Rmd` with a number 1 at the beginning. We'll number our notebooks in the order they should be run. It's an indication that before we can use the notebook `02-analysis` (next lesson!) that the `01-import.Rmd` notebook has to be run first.
+
+> I use `01-` instead of just `1-` in case there are more than nine notebooks. I want them to appear in order in my files directory. I'm anal retentive.
+
+So we will create an exported file from our first notebook that can be used in the second one. Once we create that file, the second notebook can be opened and run at any time.
+
+Why make this so complicated?
+
+The answer is **consistency**. When you follow the same structure with each project, you quickly know how to dive into that project at a later date. If everyone on your team uses the same structure, you can dive into your teammates' code because you already know how it is organized. If we separate our importing/cleaning into its own file to be used by many other notebooks, we can fix future cleaning problems in ONE place instead of many places.
+
+One last example to belabor the point: It can save time. I've had import/cleaning notebooks that took 20 minutes to process. Imagine if I had to run that every time I wanted to rebuild my analysis notebook. Instead, the import notebook spits out a clean file that can be imported in a fraction of that time.
+
+This was all a long-winded way of saying we are going to export our data now.
+
+### Exporting as rds
+
+We are able to pass cleaned data between notebooks because of a native R data format called `.rds`. When we export in this format it saves not only rows and columns, but also the data types. (If we exported as CSV, we would potentially have to re-fix the date or other data types when we imported again.)
+
+We will use another readr function called `write_rds()` to create our file to pass along to the next notebook, saving the data into the `data-processed` folder we created earlier. We are separating it from our data-raw folder because "Thou shalt not change raw data" even by accident. By always writing data to this different folder, we help avoid accidentally overwriting our original data.
+
+1. Create a Markdown headline `## Exports` and write a description that you are exporting files to .rds.
+1. Add a new code chunk and add the following code:
+
+
+
+
+```r
+hot100_tight %>%
+ write_rds("data-processed/01-hot100.rds")
+```
+
+So, we are starting with the `hot100_tight` tibble that we saved earlier. We then pipe ` %>% ` the result of that into a new function `write_rds()`. In addition to the data, the function needs to know where to save the file, so in quotes we give the path to where and what we want to call the file: `"data-processed/01-hot100.rds"`.
+
+Remember, we are saving in data-processed because we never export into data-raw. We are naming the file starting with `01-` to indicate to our future selves that this output came from our first notebook. We then name it, and use the `.rds` extension.
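+
+To see what "preserves data types" means, here is a sketch that writes the same made-up tibble both ways and reads it back. The `genre` factor column is an assumption added for illustration (our Billboard data has no such column), and the files go to temporary paths so nothing lands in your project.
+
+```r
+library(tidyverse)
+
+# made-up data with a date column and a factor column
+chart <- tibble(
+  week_id = as.Date(c("1958-08-02", "1958-08-09")),
+  genre = factor(c("rock", "pop"))
+)
+
+# temporary files, so we don't touch the project folders
+rds_path <- tempfile(fileext = ".rds")
+csv_path <- tempfile(fileext = ".csv")
+
+chart %>% write_rds(rds_path)
+chart %>% write_csv(csv_path)
+
+# the rds file brings genre back as a factor
+read_rds(rds_path) %>% glimpse()
+
+# the csv file brings genre back as plain text
+read_csv(csv_path) %>% glimpse()
+```
+
+In this sketch the date happens to survive the CSV trip because readr can guess dates written as YYYY-MM-DD, but the factor does not. With `.rds` there is no guessing at all.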
+
+## Naming chunks
+
+I didn't want to break our flow of work to explain this earlier, but I want you to name all your chunks so you can use a nice feature in RStudio to jump up and down your notebook.
+
+Let me show you an example of why first. Look at the bottom of your window above the console and you'll see a dropdown window. Click on that.
+
+Here is mine, but yours will be different:
+
+![RStudio bookmarks](images/bb-rstudio-bookmarks.png){width=400}
+
+You'll notice that my chunks have names, but yours probably don't. It's pretty helpful to have these names so you know what the chunk does. You can use this menu to skip up and down the notebook.
+
+How to name a chunk? Well, I can't show you in code because it is not rendered in the book, but here is a picture:
+
+![Named chunks](images/bb-name-chunks.png){width=500}
+
+See where I have `{r download}`? I named it that because that is what the chunk does.
+
+- Chunk names can't have spaces. Use a single word or `-` or `_` between words.
+- There are other configurations we can do here, but that is for later.
+
+1. Go back through your notebook and name all your chunks.
+1. Under the **Run** menu, choose _Restart R and run all chunks_.
+
+Make sure that your Notebook ran all the way from top to bottom. The order of stuff in the notebook matters and you can make errors as you edit up and down the notebook. You **always** want to do this before you finish a notebook.
+
+## Knit your page
+
+Lastly, we want to Knit your notebook so you can see the pretty HTML version.
+
+1. Next to the **Preview** menu in the notebook tool bar, click the little dropdown to see the knitting options.
+1. Choose **Knit to HTML**.
+
+![Knit to HTML](images/bb-knit-to-html.png){width=300}
+
+After you do this, the menu will probably change to just **Knit** and you can just click on it to knit again.
+
+This will open a nice reader-friendly version of your notebook. You could send that file (called `01-import.html`) to your editor and they could open it in a web browser.
+
+> I use these knit files to publish my work on Github, but it is a bit more involved to do all that so we'll skip it at least for now.
+
+## Review of what we've learned so far
+
+Most of this lesson has been about importing and cleaning data, with some data mutating thrown in for fun. (OK, I have an odd sense of what fun is.) Importing data into R (or any data science program) can sometimes be quite challenging, depending on the circumstances. Here we were working with well-formed data, but we still used quite a few tools from the tidyverse universe like readr (read_csv, write_rds) and dplyr (select, mutate, arrange).
+
+Here are the functions we used and what they do. Most are linked to documentation sites:
+
+- `install.packages()` downloads an R package to your computer. Typically executed from within the Console and only once per computer. We installed the [tidyverse](https://www.tidyverse.org/packages/), [janitor](https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html) and [lubridate](https://lubridate.tidyverse.org/) packages.
+- `library()` loads a package. You need it for each package in each notebook, like `library(tidyverse)`.
+- [`read_csv()`](https://readr.tidyverse.org/reference/read_delim.html) imports a csv file. You want that one, not `read.csv`.
+- `clean_names()` is a function in the [janitor](https://cran.r-project.org/web/packages/janitor/vignettes/janitor.html) package that standardizes column names.
+- [`glimpse()`](https://www.rdocumentation.org/packages/dplyr/versions/0.3/topics/glimpse) is a view of your data where you can see all of the column names, their data type and a few examples of the data.
+- `head()` prints the first 6 rows of your data unless you specify a different integer within the function.
+- [`mutate()`](https://dplyr.tidyverse.org/reference/mutate.html) changes data. You can create new columns or overwrite existing ones.
+- `mdy()` is a [lubridate](https://lubridate.tidyverse.org/) function to convert text into a date. There are other functions for different date orders.
+- [`arrange()`](https://dplyr.tidyverse.org/reference/arrange.html) sorts the rows of your tibble by the values in one or more columns.
+- [`select()`](https://dplyr.tidyverse.org/reference/select.html) selects columns in your tibble. You can list all the columns to keep, or use `-` to remove columns. There are many variations.
+- [`write_rds()`](https://readr.tidyverse.org/reference/read_rds.html) writes data out to a file in a format that preserves data types.
+
+## What's next
+
+Importing data is just the first step of exploring a data set. We'll work through the next chapter before we turn in any work on this.
+
+Please reach out to me if you have questions on what you've done so far. These are important skills you'll use on future projects.
diff --git a/docs/04-counts-analysis.md b/docs/04-counts-analysis.md
new file mode 100644
index 00000000..d2fd44a7
--- /dev/null
+++ b/docs/04-counts-analysis.md
@@ -0,0 +1,964 @@
+# Summarize with count - analysis {#count-analysis}
+
+This chapter continues the Billboard Hot 100 project. In the previous chapter we downloaded, imported and cleaned the data. We'll be working in the same project.
+
+## Goals of this lesson
+
+- To use the group by/summarize/arrange combination to count rows.
+- To use filter to both focus data for summaries, and to logically end summary lists.
+- Introduce the shortcut `count()` function, along with complex filters.
+
+## The questions we'll answer
+
+Now that we have the Billboard Hot 100 charts data in our project it's time to find the answers to the following questions:
+
+- Which performers had the most appearances on the Hot 100 chart at any position?
+- Which performer/song combination has been on the charts the most number of weeks at any position?
+- Which performer/song combination was No. 1 for the most number of weeks?
+- Which performer had the most songs reach No. 1?
+- Which performer had the most songs reach No. 1 in the most recent five years?
+- Which performer had the most Top 10 hits overall?
+
+> What are your guesses for the questions above? No peeking!
+
+Before we can get into the analysis, we need to set up a new notebook.
+
+## Setting up an analysis notebook
+
+At the end of the last notebook we exported our clean data as an `.rds` file. We'll now create a new notebook and import that data. It will be much easier this time.
+
+1. If you don't already have it open, go ahead and open your Billboard project.
+1. If your import notebook is still open, go ahead and close it.
+1. Use the `+` menu to start a new **RNotebook**.
+1. Update the title as "Billboard analysis" and then remove all the boilerplate text below the YAML metadata.
+1. Save the file as `02-analysis.Rmd` in your project folder.
+1. Check your Environment tab (top right) and make sure the Data pane is empty. We don't want to have any leftover data. If there is, then go under the **Run** menu and choose **Restart R and Clear Output**.
+
+Since we are starting a new notebook, we need to set up a few things. First up we want to list our goals.
+
+1. Add a headline and text describing the goals of this notebook. You are exploring the Billboard Hot 100 charts data.
+1. Go ahead and copy all the questions outlined above into your notebook.
+1. Start each line with a `-` or `*` followed by a space.
+1. Now add a headline (two hashes) called Setup.
+1. Add a chunk, also name it "setup" and add the tidyverse library.
+1. Run the chunk to load the library.
+
+
+```r
+library(tidyverse)
+```
+
+### Import the data on your own
+
+In this next part I want you to think about how you did the import in the last notebook and I want you to:
+
+1. Write a section to import the data using `read_rds()` and put it into a tibble called `hot100`.
+
+Yes, it is true that we haven't talked about [`read_rds()`](https://readr.tidyverse.org/reference/read_rds.html) yet but it works exactly the same way as `read_csv()`, so you should try to figure it out.
+
+Here are some hints and guides:
+
+- Start a new section with a headline and text to say what you are doing
+- Don't forget to name your code chunk (this should all be getting familiar).
+- `read_rds()` works the same way as `read_csv()`. The path _should_ be `data-processed/01-hot100.rds` if you did the previous notebook properly.
+- Remember the tibble needs to be named first and read data pushed into it.
+- Add a glimpse to the chunk so you can refer to the data.
+
+
+ Try real hard first before clicking here for the answer
+
+
+```r
+hot100 <- read_rds("data-processed/01-hot100.rds")
+
+# peek at the data
+hot100 %>% glimpse()
+```
+
+```
+## Rows: 327,895
+## Columns: 7
+## $ week_id 1958-08-02, 1958-08-02, 1958-08-02, 1958-08-02…
+## $ week_position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
+## $ song "Poor Little Fool", "Patricia", "Splish Splash"…
+## $ performer "Ricky Nelson", "Perez Prado And His Orchestra"…
+## $ previous_week_position NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
+## $ peak_position 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
+## $ weeks_on_chart 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
+```
+
+
+
+## Introducing dplyr
+
+One of the packages within the tidyverse is [dplyr](https://dplyr.tidyverse.org/). Dplyr allows us to transform our data frames in ways that let us explore the data and prepare it for visualizing. It's the R equivalent of common Excel functions like sort, filter and pivoting.
+
+> There is a cheatsheet on the [dplyr](https://dplyr.tidyverse.org/) site that you might find useful.
+
+![dplyr. Images courtesy Hadley and Charlotte Wickham](images/transform-dplyractions.png){width=600px}
+
+We've used `select()`, `mutate()` and `arrange()` already, but we'll introduce more dplyr functions in this chapter.
+
+## Most appearances
+
+Our first question: Which performers had the most appearances on the Hot 100 chart at any position?
+
+### Group & Aggregate
+
+Before we dive into the code, let's review this video about "Group and Aggregate" to get a handle on the concept.
+
+
+
+
+Let's work through the logic of what we need to do to get our answer before I explain exactly how.
+
+- Each row in the data is one song on the chart.
+- Each of those rows has the `performer` which is the person(s) who performed it.
+- To figure out how many times a performer is in the data, we need to count the rows with the same performer.
+
+We'll use the tidyverse's version of Group and Aggregate to get this answer. It is actually two different functions within dplyr that often work together: `summarize()` and its companion `group_by()`.
+
+### Summarize
+
+> `summarize()` and `summarise()` are the same function, as R supports both the American and UK spelling of summarize. I don't care which you use.
+
+We'll start with [`summarize()`](https://dplyr.tidyverse.org/reference/summarise.html) first because it can stand alone.
+
+The `summarize()` function **computes tables _about_ your data**. Our logic above has us wanting a "summary" of how many times certain performers appear in data, hence we use this function.
+
+Here is an example in a different context:
+
+![Learn about your data with summarize()](images/transform-summarise.png){width=500px}
+
+Much like the `mutate()` function we used earlier, we list the name of the new column first, then assign to it the function we want to accomplish using `=`.
+
+The example above is giving us two summaries: It is applying a function `mean()` (or average) on all the values in the `lifeExp` column, and then again with `min()`, the lowest life expectancy in the data.
+
+Let me show you a similar example with our data.
+
+Let's find the average "peak_position" of all songs on the charts through history:
+
+
+```r
+hot100 %>%
+ summarize(mean_position = mean(peak_position))
+```
+
+```
+## # A tibble: 1 × 1
+## mean_position
+##
+## 1 41.4
+```
+
+Meaning the average song on the charts tops out at No. 41.
+
+> This is an admittedly simplistic view of the average `peak_position` since the same song will be listed multiple times with possibly new `peak_position`s, but hopefully you get the idea.
+
+But in our case we want to **count** the number of rows, and there is a function for that: `n()`. (Think "number of observations or rows".)
+
+Let's write the code and run it on our data, then I'll explain:
+
+1. Set up a new section with a headline, text and empty code chunk.
+1. Inside the code chunk, add the following:
+
+
+```r
+hot100 %>%
+ summarize(appearances = n())
+```
+
+```
+## # A tibble: 1 × 1
+## appearances
+##
+## 1 327895
+```
+
+- We start with the tibble first and then pipe into `summarize()`.
+- Within the function, we define our summary:
+ + We name the new column "appearances" because that is a descriptive column name for our result.
+ + We set that new column to count the **n**umber of rows.
+
+Basically we are summarizing the total number of rows in the data.
+
+AN ASIDE: I often break up the inside of a `summarize()` into new lines so they are easier to read.
+
+```r
+# you don't have to do this here, but know
+# it is helpful when you have more than one summary
+hot100 %>%
+ summarize(
+ appearances = n()
+ )
+
+```
+
+But you are asking: Professor, we want to count the performers, right?
+
+This is where `summarize()`'s close friend `group_by()` comes in.
+
+### Group by
+
+Here is a weird thing about [`group_by()`](https://dplyr.tidyverse.org/reference/group_by.html): It is always followed by another function. It really just pre-sorts data into groups so that whatever function is applied after happens within each individual group.
+
+If we add a `group_by()` on our performers before our summarize function, it will put all of the "Aerosmith" rows together, then all the "Bad Company" rows together, etc. and then we count the rows _within_ those groups.
+
+1. Modify your code block to add the group_by:
+
+
+```r
+hot100 %>%
+ group_by(performer) %>%
+ summarize(appearances = n())
+```
+
+```
+## # A tibble: 10,061 × 2
+## performer appearances
+##
+## 1 "? (Question Mark) & The Mysterians" 33
+## 2 "'N Sync" 172
+## 3 "'N Sync & Gloria Estefan" 20
+## 4 "'N Sync Featuring Nelly" 20
+## 5 "'Til Tuesday" 53
+## 6 "\"Groove\" Holmes" 14
+## 7 "\"Little\" Jimmy Dickens" 10
+## 8 "\"Pookie\" Hudson" 1
+## 9 "\"Weird Al\" Yankovic" 91
+## 10 "(+44)" 1
+## # … with 10,051 more rows
+```
+
+What we get in return is a **summarize**d table that shows all 10,000+ different performers that have been on the charts, and the **n**umber of rows in which they appear in the data.
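+
+If the grouping is hard to picture with 300,000+ rows, here is the same pattern sketched on a made-up tibble of five rows. The `plays` data is an assumption for illustration only.
+
+```r
+library(tidyverse)
+
+# made-up data: five chart rows from two performers
+plays <- tibble(
+  performer = c("Aerosmith", "Bad Company", "Aerosmith", "Aerosmith", "Bad Company"),
+  week_position = c(5, 12, 8, 20, 30)
+)
+
+# one row per performer, with the number of rows in each group
+plays %>%
+  group_by(performer) %>%
+  summarize(appearances = n())
+```
+
+Five rows go in and two rows come out: Aerosmith with 3 appearances and Bad Company with 2.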
+
+That's great, but who had the most?
+
+### Arrange the results
+
+Remember in our import notebook when we had to sort the songs by date? We'll use the same `arrange()` function here, but we'll change the result to **desc**ending order, because journalists almost always want to know the _most_ of something.
+
+1. Add the pipe and arrange function below and run it, then I'll explain.
+
+
+```r
+hot100 %>%
+ group_by(performer) %>%
+ summarize(appearances = n()) %>%
+ arrange(appearances %>% desc())
+```
+
+```
+## # A tibble: 10,061 × 2
+## performer appearances
+##
+## 1 Taylor Swift 1022
+## 2 Elton John 889
+## 3 Madonna 857
+## 4 Kenny Chesney 758
+## 5 Drake 746
+## 6 Tim McGraw 731
+## 7 Keith Urban 673
+## 8 Stevie Wonder 659
+## 9 Rod Stewart 657
+## 10 Mariah Carey 621
+## # … with 10,051 more rows
+```
+
+- We added the `arrange()` function and fed it the column of "appearances". If we left it with just that, then it would list the smallest values first.
+- _Within the arrange function_ we piped the "appearances" part into another function: `desc()` to change the order.
+
+So if you read that line in English it would be "arrange by (appearances AND THEN descending order)".
+
+You may also see this as `arrange(desc(appearances))`.
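+
+If you want to convince yourself the two forms do the same thing, here is a small sketch. The `counts` tibble is made-up data for illustration, not our real chart data.
+
+```r
+library(dplyr)
+
+# A tiny made-up summary table (not the real hot100 data)
+counts <- tibble(
+  performer = c("A", "B", "C"),
+  appearances = c(5, 12, 9)
+)
+
+# Piped form, as used in this chapter
+piped <- counts %>% arrange(appearances %>% desc())
+
+# Nested form, which you will often see elsewhere
+nested <- counts %>% arrange(desc(appearances))
+
+# Compare the two results
+identical(piped, nested)
+```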
+
+### Get the top of the list
+
+We've printed 10,000 rows of data into our notebook when we really only wanted the Top 10 or so. You might think it doesn't matter, but your knitted HTML file will store all that data and can make it a big file (like in megabytes), so I try to avoid that when I can.
+
+We can use the `head()` command again to get our Top 10.
+
+1. Pipe the result into the `head()` function, set to 10 rows.
+
+
+```r
+hot100 %>%
+ group_by(performer) %>%
+ summarize(appearances = n()) %>%
+ arrange(appearances %>% desc()) %>%
+ head(10)
+```
+
+```
+## # A tibble: 10 × 2
+## performer appearances
+##
+## 1 Taylor Swift 1022
+## 2 Elton John 889
+## 3 Madonna 857
+## 4 Kenny Chesney 758
+## 5 Drake 746
+## 6 Tim McGraw 731
+## 7 Keith Urban 673
+## 8 Stevie Wonder 659
+## 9 Rod Stewart 657
+## 10 Mariah Carey 621
+```
+
+If I were to explain the code above in English, I would describe it like this:
+
+- We start with the hot100 data AND THEN
+- we group the data by performer AND THEN
+- we summarize it by counting the number of rows in each group, calling the count "appearances" AND THEN
+- we arrange the result by appearances in descending order AND THEN
+- we keep just the first 10 rows
+
+Since we have our answer here and we're not using the result later, we don't need to create a new tibble or anything.
+
+> AN IMPORTANT NOTE: The list we've created here is based on unique `performer` names, and as such considers collaborations separately. For instance, Drake is near the top of the list but those are only songs he performed alone and not the many, many collaborations he has had with other performers. So, songs by "Drake" are counted separately from "Drake featuring Future" and "Future featuring Drake". You'll need to make this clear when you write your data drop in a later assignment.
+
+So, **Taylor Swift** ... is that who you guessed? A little history here: Swift passed Elton John in the summer of 2019. Elton John has been around a long time, but Swift's popularity at a young age, plus changes in how Billboard counts plays in the modern era (like streaming), have rocketed her to the top. (Sorry, Rocket Man.)
+
+## Performer/song with most appearances
+
+Our quest here is this: **Which performer/song combination has been on the charts the most number of weeks at any position?**
+
+This is very similar to our quest to find the artist with the most appearances, but we have to consider `performer` and `song` together because different artists can perform songs of the same name. For example, 17 different performers have a song called "Hold On" on the Hot 100 (at least through 2020).
+
+1. Start a new section (headline, text describing goal and a new code chunk.)
+1. Add the code below to the chunk and run it and then I'll outline it below.
+
+
+```r
+hot100 %>% # start with the data, and then ...
+ group_by(performer, song) %>% # group by performer and song, and then ..
+ summarize(appearances = n()) %>% # name the column, then fill it with the number of rows ...
+ arrange(appearances %>% desc()) # arrange by appearances in descending order
+```
+
+```
+## `summarise()` has grouped output by 'performer'. You can override using the `.groups` argument.
+```
+
+```
+## # A tibble: 29,389 × 3
+## # Groups: performer [10,061]
+## performer song appearances
+##
+## 1 Imagine Dragons Radioactive 87
+## 2 AWOLNATION Sail 79
+## 3 Jason Mraz I'm Yours 76
+## 4 The Weeknd Blinding Lights 76
+## 5 LeAnn Rimes How Do I Live 69
+## 6 LMFAO Featuring Lauren Bennett & GoonRock Party Rock Anthem 68
+## 7 OneRepublic Counting Stars 68
+## 8 Adele Rolling In The Deep 65
+## 9 Jewel Foolish Games/You Were… 65
+## 10 Carrie Underwood Before He Cheats 64
+## # … with 29,379 more rows
+```
+
+The logic is actually straightforward:
+
+- We want to count combinations over two columns: `performer` and `song`. When you group_by more than one column, it will group rows where the values are the same in all of those columns. For example, all rows with both "Rush" as a performer and _Tom Sawyer_ as a song form one group. Rows with "Rush" and _Red Barchetta_ will be considered a different group.
+- With `summarize()`, we name the new column first (we chose `appearances`), then describe what should fill it. In this case we filled the column using `n()`, which counts the number of rows in each group.
+- Once you have a summary table, we sort it by appearances and set it to **desc**ending order, which puts the highest value on the top.
+
+We will _often_ use `group_by()`, `summarize()` and `arrange()` together, which is why I'll refer to this as the GSA trio. They are like three close friends that always want to be together.
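+
+You may have noticed the message in the output above saying `summarise()` has grouped output by 'performer'. When you group by more than one column, the summary comes back still grouped by all but the last column. Here is a minimal sketch, using made-up rows rather than our real data, of how the `.groups` argument named in that message returns a plain ungrouped table instead.
+
+```r
+library(dplyr)
+
+# Made-up chart rows, not the real hot100 data
+chart <- tibble(
+  performer = c("Rush", "Rush", "Rush", "Heart"),
+  song = c("Tom Sawyer", "Tom Sawyer", "Red Barchetta", "Barracuda")
+)
+
+combos <- chart %>%
+  group_by(performer, song) %>%
+  summarize(appearances = n(), .groups = "drop") %>% # drop all grouping after the summary
+  arrange(desc(appearances))
+
+combos
+```
+
+The counts are the same either way. The argument only quiets the message and removes the leftover grouping.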
+
+### Introducing filter()
+
+I showed you `head()` in the previous quest and that is useful to quickly cut off a list, but it does so indiscriminately. In this case, if we use the default `head()` that retains six rows, it would cut right in the middle of a tie at 68 appearances (at least with data through 2020). A better strategy is to cut off the list at a logical place using `filter()`. Let's dive into this new function.
+
+Filtering is one of those Basic Data Journalism Functions.
+
+The dplyr function `filter()` reduces the number of rows in our data based on one or more criteria.
+
+The syntax works like this:
+
+```r
+# this is pseudocode. don't run it
+data %>%
+ filter(variable comparison value)
+
+# example
+hot100 %>%
+ filter(performer == "Judas Priest")
+```
+
+The `filter()` function typically works in this order:
+
+- What is the variable (or column) you are searching in?
+- What is the comparison you want to do. Equal to? Greater than?
+- What is the observation (or value in the data) you are looking for?
+
+Note the two equals signs `==` in our Judas Priest example above. It's important to use two of them when you are testing for "equals". A single `=` will not work because it means something else in R: it assigns a value or names an argument.
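+
+To see what `==` is actually doing, here is a small sketch with a made-up vector of names. The test runs on every value and hands back `TRUE` or `FALSE` for each one, and `filter()` keeps the rows where the answer is `TRUE`.
+
+```r
+library(dplyr)
+
+# Made-up values for illustration
+performers <- c("Judas Priest", "Rush", "Judas Priest")
+
+# == tests each value and returns one TRUE or FALSE per value
+matches <- performers == "Judas Priest"
+matches
+
+# filter() keeps only the rows where that test is TRUE
+priest_rows <- tibble(performer = performers) %>%
+  filter(performer == "Judas Priest")
+priest_rows
+```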
+
+#### Comparisons: Logical tests
+
+There are a number of these logical tests for the comparison:
+
+| Operator | Definition |
+|:------------------|:-------------------------|
+| x **<** y | Less than |
+| x **>** y | Greater than |
+| x **==** y | Equal to |
+| x **<=** y | Less than or equal to |
+| x **>=** y | Greater than or equal to |
+| x **!=** y        | Not equal to             |
+| x **%in%** c(y,z) | In a group |
+| **is.na(**x**)** | Is NA |
+| **!is.na(**x**)** | Is not NA |
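+
+Here is a quick sketch of a few of those operators at work. The `songs` tibble and its `peak` column are made up for this example. Note what happens to the row with a missing value.
+
+```r
+library(dplyr)
+
+# Made-up rows for illustration only
+songs <- tibble(
+  performer = c("Drake", "Taylor Swift", "Madonna", "Rush"),
+  peak = c(1, 1, 4, NA)
+)
+
+# Not equal to: keeps Madonna only, because a test against NA is never TRUE
+not_one <- songs %>% filter(peak != 1)
+
+# In a group: keeps Drake and Rush
+in_group <- songs %>% filter(performer %in% c("Drake", "Rush"))
+
+# Is NA: keeps Rush, the only row with a missing peak
+missing_peak <- songs %>% filter(is.na(peak))
+
+# Is not NA: keeps everyone except Rush
+has_peak <- songs %>% filter(!is.na(peak))
+```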
+
+Where you apply a filter matters. If we filter before group by/summarize/arrange (GSA) we are focusing the data before we summarize. If we filter after the GSA, we are affecting only the results of the summarize function, which is what we want to do here.
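+
+A small sketch of that difference, using a made-up `chart` tibble with one row per week on the chart. The same data gives different answers depending on where the filter sits.
+
+```r
+library(dplyr)
+
+# Made-up chart rows: one row per performer per week
+chart <- tibble(
+  performer = c("A", "A", "A", "B", "B"),
+  week_position = c(1, 1, 5, 1, 7)
+)
+
+# Filter BEFORE the summary: only weeks at No. 1 get counted
+before <- chart %>%
+  filter(week_position == 1) %>%
+  group_by(performer) %>%
+  summarize(appearances = n())
+
+# Filter AFTER the summary: every week is counted, then small totals are dropped
+after <- chart %>%
+  group_by(performer) %>%
+  summarize(appearances = n()) %>%
+  filter(appearances >= 3)
+```
+
+In `before`, performer A has 2 appearances and B has 1. In `after`, only A remains, with all 3 rows counted.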
+
+#### Filter to a logical cutoff
+
+In this case, I want you to use filter _after_ the GSA actions to include **only results with 65 or more appearances**.
+
+1. Edit your current chunk to add a filter as noted in the example below. I'll explain it after.
+
+
+```r
+hot100 %>%
+ group_by(performer, song) %>%
+ summarize(appearances = n()) %>%
+ arrange(appearances %>% desc()) %>%
+ filter(appearances >= 65) # this is the new line
+```
+
+```
+## `summarise()` has grouped output by 'performer'. You can override using the `.groups` argument.
+```
+
+```
+## # A tibble: 9 × 3
+## # Groups: performer [9]
+## performer song appearances
+##
+## 1 Imagine Dragons Radioactive 87
+## 2 AWOLNATION Sail 79
+## 3 Jason Mraz I'm Yours 76
+## 4 The Weeknd Blinding Lights 76
+## 5 LeAnn Rimes How Do I Live 69
+## 6 LMFAO Featuring Lauren Bennett & GoonRock Party Rock Anthem 68
+## 7 OneRepublic Counting Stars 68
+## 8 Adele Rolling In The Deep 65
+## 9 Jewel Foolish Games/You Were … 65
+```
+
+Let's break down that last line:
+
+- `filter()` is the function.
+- The first argument in the function is the column we are looking in, `appearances` in our case.
+- We then provide a comparison operator `>=` to get "greater than or equal to".
+- We then give the value to compare, `65` in our case.
+
+## Song/Performer with most weeks at No. 1
+
+We introduced `filter()` in the last quest to limit the summary. For this quest you'll need to filter the data _before_ the group by/summarize/arrange trio.
+
+Let's review the quest: **Which performer/song combination was No. 1 for the most number of weeks?**
+
+While this quest is very similar to the one above, it _really_ helps to think about the logic of what you need and then build the query one line at a time to make sure each line works.
+
+Let's talk through the logic:
+
+- We are starting with our `hot100` data.
+- Do we want to consider all the data? In this case, no: We only want songs that have a `week_position` of 1. This means we will **filter** before any summarizing.
+- Then we want to count the number of rows with the same **performer** and **song** combinations. This means we need to `group_by` both `performer` and `song`.
+- Since we are **counting rows**, we need to use `n()` as our summarize function, which counts the **number** of rows in each group.
+
+So let's step through this with code:
+
+1. Create a section with a headline, text and code chunk
+2. Start with the `hot100` data and then pipe into `filter()`.
+1. Within the filter, set the `week_position` to be `==` to `1`.
+1. Run the result and check it
+
+
+```r
+hot100 %>%
+ filter(week_position == 1)
+```
+
+```
+## # A tibble: 3,279 × 7
+## week_id week_position song performer previous_week_p… peak_position
+##
+## 1 1958-08-02 1 Poor Lit… Ricky Nels… NA 1
+## 2 1958-08-09 1 Poor Lit… Ricky Nels… 1 1
+## 3 1958-08-16 1 Nel Blu … Domenico M… 2 1
+## 4 1958-08-23 1 Little S… The Elegan… 2 1
+## 5 1958-08-30 1 Nel Blu … Domenico M… 2 1
+## 6 1958-09-06 1 Nel Blu … Domenico M… 1 1
+## 7 1958-09-13 1 Nel Blu … Domenico M… 1 1
+## 8 1958-09-20 1 Nel Blu … Domenico M… 1 1
+## 9 1958-09-27 1 It's All… Tommy Edwa… 3 1
+## 10 1958-10-04 1 It's All… Tommy Edwa… 1 1
+## # … with 3,269 more rows, and 1 more variable: weeks_on_chart
+```
+
+The result should show _only_ songs with a `1` for `week_position`.
+
+The rest of our logic is just like our last quest. We need to group by the `song` and `performer` and then `summarize` using `n()` to count the rows.
+
+1. Edit your existing chunk to add the `group_by` and `summarize` functions. Name your new column `appearances` and set it to count the rows with `n()`.
+
+> While I say to write and run your code one line at a time, `group_by()` won't actually show you any different results, so I usually write `group_by()` and `summarize()` together.
+
+
+ Try this on your own before you peek
+
+```r
+hot100 %>%
+ filter(week_position == 1) %>%
+ group_by(performer, song) %>%
+ summarize(appearances = n())
+```
+
+```
+## `summarise()` has grouped output by 'performer'. You can override using the `.groups` argument.
+```
+
+```
+## # A tibble: 1,124 × 3
+## # Groups: performer [744]
+## performer song appearances
+##
+## 1 ? (Question Mark) & The Mysterians 96 Tears 1
+## 2 'N Sync It's Gonna Be Me 2
+## 3 24kGoldn Featuring iann dior Mood 8
+## 4 2Pac Featuring K-Ci And JoJo How Do U Want It/California L… 2
+## 5 50 Cent In Da Club 9
+## 6 50 Cent Featuring Nate Dogg 21 Questions 4
+## 7 50 Cent Featuring Olivia Candy Shop 9
+## 8 6ix9ine & Nicki Minaj Trollz 1
+## 9 A Taste Of Honey Boogie Oogie Oogie 3
+## 10 a-ha Take On Me 1
+## # … with 1,114 more rows
+```
+
+
+Look at your results to make sure you have the three columns you expect: performer, song and appearances.
+
+This doesn't quite get us where we want because it is sorted alphabetically by performer. You need to **arrange** the data to show us the most appearances at the top.
+
+1. Edit your chunk to add the `arrange()` function to sort by `appearances` in `desc()` order. This is just like our last quest.
+
+
+ Maybe check your last chunk on how you did this
+
+
+```r
+hot100 %>%
+ filter(week_position == 1) %>%
+ group_by(performer, song) %>%
+ summarize(appearances = n()) %>%
+ arrange(appearances %>% desc())
+```
+
+```
+## `summarise()` has grouped output by 'performer'. You can override using the `.groups` argument.
+```
+
+```
+## # A tibble: 1,124 × 3
+## # Groups: performer [744]
+## performer song appearances
+##
+## 1 Lil Nas X Featuring Billy Ray Cyrus Old Town Road 19
+## 2 Luis Fonsi & Daddy Yankee Featuring Justin Bieber Despacito 16
+## 3 Mariah Carey & Boyz II Men One Sweet Day 16
+## 4 Boyz II Men I'll Make Love… 14
+## 5 Elton John Candle In The … 14
+## 6 Los Del Rio Macarena (Bays… 14
+## 7 Mariah Carey We Belong Toge… 14
+## 8 Mark Ronson Featuring Bruno Mars Uptown Funk! 14
+## 9 The Black Eyed Peas I Gotta Feeling 14
+## 10 Whitney Houston I Will Always … 14
+## # … with 1,114 more rows
+```
+
+
+You have your answer now (you go, Lil Nas) but we are listing more than 1,000 rows. Let's cut this off at a logical place like we did in our last quest.
+
+1. Use `filter()` to cut your summary off at `appearances` of 14 or greater.
+
+
+ You've done this before ... try it on your own!
+
+
+```r
+hot100 %>%
+ filter(week_position == 1) %>%
+ group_by(performer, song) %>%
+ summarize(appearances = n()) %>%
+ arrange(appearances %>% desc()) %>%
+ filter(appearances >= 14)
+```
+
+```
+## `summarise()` has grouped output by 'performer'. You can override using the `.groups` argument.
+```
+
+```
+## # A tibble: 10 × 3
+## # Groups: performer [10]
+## performer song appearances
+##
+## 1 Lil Nas X Featuring Billy Ray Cyrus Old Town Road 19
+## 2 Luis Fonsi & Daddy Yankee Featuring Justin Bieber Despacito 16
+## 3 Mariah Carey & Boyz II Men One Sweet Day 16
+## 4 Boyz II Men I'll Make Love… 14
+## 5 Elton John Candle In The … 14
+## 6 Los Del Rio Macarena (Bays… 14
+## 7 Mariah Carey We Belong Toge… 14
+## 8 Mark Ronson Featuring Bruno Mars Uptown Funk! 14
+## 9 The Black Eyed Peas I Gotta Feeling 14
+## 10 Whitney Houston I Will Always … 14
+```
+
+
+Now you have the answers to the performer/song with the most weeks at No. 1 with a logical cutoff. If you add to the data, that logic will still hold and not cut off arbitrarily at a certain number of records.
+
+## Performer with most songs to reach No. 1
+
+Our new quest is this: **Which performer had the most songs reach No. 1?** The answer might be easier to guess if you know music history, but perhaps not.
+
+This sounds similar to our last quest, but there is a **distinct** difference. (That's a bad joke that will reveal itself here in a bit.)
+
+Again, let's think through the logic of what we have to do to get our answer:
+
+- We need to consider only No. 1 songs. (filter!)
+- Because a song could be No. 1 for more than one week, we need to consider the same song/performer combination only once. (We'll introduce a new function for this.)
+- Once we have all the unique No. 1 songs in a list, then we can group by **performer** and count how many times they are on the list.
+
+Let's start by getting the No. 1 songs. You did this in the last quest.
+
+1. Create a new section with a headline, text and code chunk.
+1. Start with the `hot100` data and filter it so you only have `week_position` of 1.
+
+
+ Try on your own. You got this!
+
+```r
+hot100 %>%
+ filter(week_position == 1)
+```
+
+```
+## # A tibble: 3,279 × 7
+## week_id week_position song performer previous_week_p… peak_position
+##
+## 1 1958-08-02 1 Poor Lit… Ricky Nels… NA 1
+## 2 1958-08-09 1 Poor Lit… Ricky Nels… 1 1
+## 3 1958-08-16 1 Nel Blu … Domenico M… 2 1
+## 4 1958-08-23 1 Little S… The Elegan… 2 1
+## 5 1958-08-30 1 Nel Blu … Domenico M… 2 1
+## 6 1958-09-06 1 Nel Blu … Domenico M… 1 1
+## 7 1958-09-13 1 Nel Blu … Domenico M… 1 1
+## 8 1958-09-20 1 Nel Blu … Domenico M… 1 1
+## 9 1958-09-27 1 It's All… Tommy Edwa… 3 1
+## 10 1958-10-04 1 It's All… Tommy Edwa… 1 1
+## # … with 3,269 more rows, and 1 more variable: weeks_on_chart
+```
+
+
+Now look at the result. Note how "Poor Little Fool" shows up more than once? Other songs do as well. If we counted rows by `performer` now, we would count that song more than once. That's not what we want.
+
+### Using distinct()
+
+Our next challenge in our logic is to show only unique performer/song combinations. We do this with [`distinct()`](https://dplyr.tidyverse.org/reference/distinct.html).
+
+We feed the `distinct()` function with the variables we want to consider together, in our case the `performer` and `song`. All other columns are dropped since including them would mess up their distinctness.
+
+1. Add the distinct() function to your code chunk.
+
+
+```r
+hot100 %>%
+ filter(week_position == 1) %>%
+ distinct(song, performer)
+```
+
+```
+## # A tibble: 1,124 × 2
+## song performer
+##
+## 1 Poor Little Fool Ricky Nelson
+## 2 Nel Blu Dipinto Di Blu (Volaré) Domenico Modugno
+## 3 Little Star The Elegants
+## 4 It's All In The Game Tommy Edwards
+## 5 It's Only Make Believe Conway Twitty
+## 6 Tom Dooley The Kingston Trio
+## 7 To Know Him, Is To Love Him The Teddy Bears
+## 8 The Chipmunk Song The Chipmunks With David Seville
+## 9 Smoke Gets In Your Eyes The Platters
+## 10 Stagger Lee Lloyd Price
+## # … with 1,114 more rows
+```
+
+Now we have a list of just No. 1 songs!
+
+### Summarize the performers
+
+Now that we have our list of No. 1 songs, we can "count" the number of times a performer is in the list to know how many No. 1 songs they have.
+
+We'll again use the group_by/summarize combination for this, but we are only grouping by `performer` since that is what we are counting.
+
+1. Edit your chunk to add a group_by on `performer` and then a `summarize()` to count the rows. Name the new column `no1_hits`. Run it.
+1. After you are sure the group_by/summarize runs, add an `arrange()` to show the `no1_hits` in descending order.
+
+
+ You've done this. Give it a go!
+
+```r
+hot100 %>%
+ filter(week_position == 1) %>%
+ distinct(song, performer) %>%
+ group_by(performer) %>%
+ summarize(no1_hits = n()) %>%
+ arrange(no1_hits %>% desc())
+```
+
+```
+## # A tibble: 744 × 2
+## performer no1_hits
+##
+## 1 The Beatles 19
+## 2 Mariah Carey 16
+## 3 Madonna 12
+## 4 Michael Jackson 11
+## 5 Whitney Houston 11
+## 6 The Supremes 10
+## 7 Bee Gees 9
+## 8 The Rolling Stones 8
+## 9 Janet Jackson 7
+## 10 Stevie Wonder 7
+## # … with 734 more rows
+```
+
+
+### Filter for a good cutoff
+
+Like we did earlier, use a `filter()` after your arrange to cut the list off at a logical place.
+
+1. Edit your chunk to filter the summary to show performers with `8` or more No. 1 hits.
+
+
+ You can do this. Really
+
+```r
+hot100 %>%
+ filter(week_position == 1) %>%
+ distinct(song, performer) %>%
+ group_by(performer) %>%
+ summarize(no1_hits = n()) %>%
+ arrange(no1_hits %>% desc()) %>%
+ filter(no1_hits >= 8)
+```
+
+```
+## # A tibble: 8 × 2
+## performer no1_hits
+##
+## 1 The Beatles 19
+## 2 Mariah Carey 16
+## 3 Madonna 12
+## 4 Michael Jackson 11
+## 5 Whitney Houston 11
+## 6 The Supremes 10
+## 7 Bee Gees 9
+## 8 The Rolling Stones 8
+```
+
+
+
+## No. 1 hits in last five years
+
+Which performer had the most songs reach No. 1 in the most recent five years?
+
+Let's talk through the logic. This is very similar to the No. 1 hits above but with two differences:
+
+- In addition to filtering for No. 1 songs, we also want to filter for songs in 2016-2020.
+- We might need to adjust our last filter for a better "break point".
+
+We haven't talked about filtering dates, so let me tell you this: You can use filter comparisons on dates just like you do with numbers or text, as long as you write the date in quotes in YYYY-MM-DD format. This will give you rows _after_ 2015.
+
+```r
+filter(week_id > "2015-12-31")
+```
+
+But since we need this filter before our group, we can do this within the same filter function where we get the number one songs.
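+
+Here is a sketch of that combined filter on a few made-up rows. It assumes `week_id` is stored as a real date, as it is in our data. The `weeks` tibble is invented for this example.
+
+```r
+library(dplyr)
+
+# Made-up chart weeks stored as real dates
+weeks <- tibble(
+  week_id = as.Date(c("2015-12-26", "2016-01-02", "2020-06-13")),
+  week_position = c(1, 1, 3)
+)
+
+# Both conditions go in one filter(), separated by a comma
+recent_no1 <- weeks %>%
+  filter(
+    week_position == 1,
+    week_id > "2015-12-31"
+  )
+
+recent_no1
+```
+
+Only the 2016-01-02 row survives: the first row is too early and the last row isn't at No. 1.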
+
+1. Create a new section (headline, text, chunk).
+1. Build (from scratch) the same filter, group_by, summarize and arrange as above, but leave out the cut-off filter at the end. (We'll need to adjust that based on the results). Make sure it runs.
+1. EDIT your filter to put a comma after `week_position == 1` and then add this filter: `week_id > "2015-12-31"`. Run the code.
+1. Build a new cut-off filter at the end to keep only rows with more than 1 `top_hits`.
+
+
+ No, really. Try it on your own first.
+
+
+```r
+hot100 %>%
+ filter(
+ week_position == 1,
+ week_id > "2015-12-31"
+ ) %>%
+ distinct(song, performer) %>%
+ group_by(performer) %>%
+ summarize(top_hits = n()) %>%
+ arrange(top_hits %>% desc()) %>%
+ filter(top_hits > 1)
+```
+
+```
+## # A tibble: 10 × 2
+## performer top_hits
+##
+## 1 Drake 5
+## 2 Ariana Grande 3
+## 3 Taylor Swift 3
+## 4 BTS 2
+## 5 Cardi B 2
+## 6 Ed Sheeran 2
+## 7 Justin Bieber 2
+## 8 Olivia Rodrigo 2
+## 9 The Weeknd 2
+## 10 Travis Scott 2
+```
+
+
+
+## Top 10 hits overall
+
+Which performer had the most Top 10 hits overall?
+
+This one I want you to do on your own.
+
+The logic is very similar to the "Most No. 1 hits" quest you did before, but you need to adjust your filter to find songs within position 1 through 10. Don't overthink it, but do recognize that the "top" of the charts are smaller numbers, not larger ones.
+
+1. Make a new section
+1. Describe what you are doing
+1. Do it using the group_by/summarize method
+1. Filter to cut off at a logical number of rows. (i.e., don't stop at a tie)
+
+### A shortcut: count()
+
+You are going to think I'm a horrible person, but there is an easier way to do this ...
+
+We count stuff in data science (and journalism) all the time. Because of this, the tidyverse has a shortcut to group and count rows of data. I needed to show you the long way because a) we will use `group_by()` and `summarize()` with other math that isn't just counting rows, and b) you need to understand what is happening when you use `count()`, which is really just using group_by/summarize underneath.
+
+The [`count()`](https://dplyr.tidyverse.org/reference/count.html) function takes the columns you want to group and then does the summarize on `n()` for you:
+
+
+```r
+hot100 %>%
+ count(performer)
+```
+
+```
+## # A tibble: 10,061 × 2
+## performer n
+##
+## 1 "? (Question Mark) & The Mysterians" 33
+## 2 "'N Sync" 172
+## 3 "'N Sync & Gloria Estefan" 20
+## 4 "'N Sync Featuring Nelly" 20
+## 5 "'Til Tuesday" 53
+## 6 "\"Groove\" Holmes" 14
+## 7 "\"Little\" Jimmy Dickens" 10
+## 8 "\"Pookie\" Hudson" 1
+## 9 "\"Weird Al\" Yankovic" 91
+## 10 "(+44)" 1
+## # … with 10,051 more rows
+```
+
+To get the same pretty table you still have to rename the new column and reverse the sort. You just do it differently, as arguments within the `count()` function. You can view the [`count()` options here.](https://dplyr.tidyverse.org/reference/count.html)
+
+- Add this chunk to your notebook (with a note you are trying `count()`) so you have it to refer to.
+
+
+```r
+hot100 %>%
+ count(performer, name = "appearances", sort = TRUE) %>%
+ filter(appearances > 600)
+```
+
+```
+## # A tibble: 13 × 2
+## performer appearances
+##
+## 1 Taylor Swift 1022
+## 2 Elton John 889
+## 3 Madonna 857
+## 4 Kenny Chesney 758
+## 5 Drake 746
+## 6 Tim McGraw 731
+## 7 Keith Urban 673
+## 8 Stevie Wonder 659
+## 9 Rod Stewart 657
+## 10 Mariah Carey 621
+## 11 Michael Jackson 611
+## 12 Chicago 607
+## 13 Rascal Flatts 604
+```
+
+So you have to do the same things here as in our first quest, but when you just need a quick count to get an answer, then `count()` is brilliant.
+
+> IMPORTANT: We concentrate on using group_by/summarize/arrange because it can do so much more than `count()`. Count can ONLY count rows. It can't do any other kind of math in summarize.
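+
+As a taste of what that looks like, here is a minimal sketch that builds more than one summary at a time. The `chart` tibble and the column names `best_position` and `avg_position` are made up for this example.
+
+```r
+library(dplyr)
+
+# Made-up chart rows for illustration
+chart <- tibble(
+  performer = c("A", "A", "A", "B", "B"),
+  week_position = c(1, 3, 5, 10, 20)
+)
+
+summary_table <- chart %>%
+  group_by(performer) %>%
+  summarize(
+    appearances = n(),                  # the same number count() would give
+    best_position = min(week_position), # smaller numbers are higher on the chart
+    avg_position = mean(week_position)  # average spot across all weeks
+  )
+
+summary_table
+```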
+
+### Complex filters
+
+Don't do these, but you'll need them for reference later:
+
+If you want to filter data for multiple criteria, you can write two equations and combine with `&`. Only rows with both sides being true are returned.
+
+```r
+# gives you only Poor Little Fool rows where song is No. 1, but not any other position
+filter(song == "Poor Little Fool" & week_position == 1)
+```
+
+If you want an "or" filter, then you write two equations with a `|` between them.
+
+> `|` is the _Shift_ of the `\` key above Return on your keyboard. That `|` character is also sometimes called a "pipe", which gets confusing in R with ` %>% `.
+
+```r
+# gives you Taylor or Drake songs
+filter(performer == "Taylor Swift" | performer == "Drake")
+```
+
+If you have multiple criteria that must all be true, you can separate them with a comma `,`, which works like `&`. Note I've also added returns to make it more readable.
+
+```r
+# gives us rows with either Taylor Swift or Drake, but only those at No. 1
+filter(
+ performer == "Taylor Swift" | performer == "Drake",
+ week_position == 1
+)
+```
+
+## Review of what we've learned
+
+We introduced a number of new functions in this lesson, most of them from the [dplyr](https://dplyr.tidyverse.org/) package. Mostly we filtered and summarized our data. Here are the functions we introduced in this chapter, many with links to documentation:
+
+- [`filter()`](https://dplyr.tidyverse.org/reference/filter.html) returns only rows that meet logical criteria you specify.
+- [`summarize()`](https://dplyr.tidyverse.org/reference/summarise.html) builds a summary table _about_ your data. You can count rows [`n()`](https://dplyr.tidyverse.org/reference/n.html) or do math on numerical values, like `mean()`.
+- [`group_by()`](https://dplyr.tidyverse.org/reference/group_by.html) is often used with `summarize()` to put data into groups before building a summary table based on the groups.
+- [`distinct()`](https://dplyr.tidyverse.org/reference/distinct.html) returns rows based on unique values in columns you specify. i.e., it deduplicates data.
+- [`count()`](https://dplyr.tidyverse.org/reference/count.html) is a shorthand for the group_by/summarize operation to count rows based on groups. You can name your summary columns and sort the data within the same function.
+
+## Turn in your project
+
+1. Make sure everything runs properly (Restart R and Run All Chunks) and then Knit to HTML.
+1. Zip the folder.
+1. Upload to the Canvas assignment.
+
+## Soundtrack for this assignment
+
+This lesson was constructed with the vibes of [The Bright Light Social Hour](https://www.thebrightlightsocialhour.com/home). They've never had a song on the Hot 100 (at least not through 2020).
diff --git a/docs/05-sums-import.md b/docs/05-sums-import.md
new file mode 100644
index 00000000..372f931b
--- /dev/null
+++ b/docs/05-sums-import.md
@@ -0,0 +1,520 @@
+# Summarize with math - import {#sums-import}
+
+With our Billboard assignment, we went through some common data wrangling processes: importing data, cleaning it and querying it for answers. All of our answers involved counting numbers of rows and we did so with two methods: the summary trio of `group_by`, `summarize` and `arrange` (which I dub GSA), and the shortcut `count()` that allows us to do all of that in one line.
+
+For this data story we need to leave `count` behind and stick with the summary trio GSA because now we must do different kinds of math within our summarize functions, mainly `sum()`.
+
+## About the story: Military surplus transfers
+
+In June 2020, BuzzFeed News published the story [_Police Departments Have Received Hundreds Of Millions Of Dollars In Military Equipment Since Ferguson_](https://www.buzzfeednews.com/article/johntemplon/police-departments-military-gear-1033-program) about the amount of military equipment transferred to local law enforcement agencies since Michael Brown was killed in Ferguson, Missouri. After Brown's death there was a public outcry, and "what appeared to be a massively disproportionate show of force during protests brought scrutiny to a federal program that transfers unused military equipment to local law enforcement." Reporter John Templon used data from the [Law Enforcement Support Office](https://www.dla.mil/DispositionServices/Offers/Reutilization/LawEnforcement/PublicInformation/) for the update on the program and published his [data analysis](https://github.com/BuzzFeedNews/2020-06-leso-1033-transfers-since-ferguson), which he did in Python.
+
+You will analyze the same dataset focusing on some local police agencies and write a short data drop about transfers to those agencies.
+
+### The LESO program
+
+The Defense Logistics Agency transfers surplus military equipment to local law enforcement through its [Law Enforcement Support Office](https://www.dla.mil/DispositionServices/Offers/Reutilization/LawEnforcement/PublicInformation/). You can find more information [about the program here](https://www.dla.mil/DispositionServices/Offers/Reutilization/LawEnforcement/ProgramFAQs/).
+
+The agency updates the data quarterly and the data I've collected contains transfers through **June 30, 2021**. The original file is linked from the headline "ALASKA - WYOMING AND US TERRITORIES".
+
+The data there comes in an Excel spreadsheet that has a new sheet for each state. I used R to pull the data from each sheet and combine it into a single data set and I'll cover the process I used in class, but you won't have to do that part.
+
+**I will supply a link to the combined data below.**
+
+### About the data
+
+There is no data dictionary or record layout included with the data but I have corresponded with the Defense Logistics Agency to get a decent understanding of what is included. Columns in bold are those we care about the most.
+
+- sheet: Which sheet the data came from. This is an artifact from the data merging script.
+- **state**: A two-letter designation for the state of the agency.
+- **agency_name**: This is the agency that got the equipment.
+- nsn: A special number that identifies the item. It is not germane to this specific assignment.
+- **item_name**: The item transferred. Googling the names can sometimes yield more info on specific items.
+- **quantity**: The number of the "units" the agency received.
+- ui: Unit of measurement (item, kit, etc.)
+- **acquisition_value**: A cost *per unit* for the item.
+- demil_code: Another special code not germane to this assignment.
+- demil_ic: Another special code not germane to this assignment.
+- **ship_date**: The date the item(s) were sent to the agency.
+- station_type: What kind of law enforcement agency made the request.
+
+Here is a glimpse of our main columns of interest, except for the date:
+
+
+
+
+```
+## Rows: 10
+## Columns: 5
+## $ state "KY", "SC", "CA", "TX", "OH", "NC", "CA", "MI", "AZ"…
+## $ agency_name "MEADE COUNTY SHERIFF DEPT", "PROSPERITY POLICE DEPT…
+## $ item_name "GENERATOR SET,DIESEL ENGINE", "RIFLE,7.62 MILLIMETE…
+## $ quantity 5, 1, 32, 1, 1, 1, 1, 1, 1, 1
+## $ acquisition_value 4623.09, 138.00, 16.91, 749.00, 749.00, 138.00, 499.…
+```
+
+Each row of data is a transfer of a particular type of item from the U.S. Department of Defense to a local law enforcement agency. The row includes the name of the item, the quantity, and the value ($) of a single unit.
+
+What the data doesn't have is the **total value** of the items in the shipment. If there are 5 generators as noted in the first row above and the cost of each one is $4623.09, we have to multiply the `quantity` times the `acquisition_value` to get the total value of that equipment. We will do that as part of the assignment.
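+
+To make that arithmetic concrete, here is a sketch using one made-up row modeled on the generator example. The `mutate()` function adds a new column, and the column name `total_value` is my choice for this illustration. We'll build the real thing in the assignment.
+
+```r
+library(dplyr)
+
+# One made-up row modeled on the generator example above
+transfers <- tibble(
+  item_name = "GENERATOR SET,DIESEL ENGINE",
+  quantity = 5,
+  acquisition_value = 4623.09
+)
+
+# Multiply the number of units by the per-unit value
+transfers_total <- transfers %>%
+  mutate(total_value = quantity * acquisition_value)
+
+transfers_total$total_value # 5 * 4623.09 = 23115.45
+```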
+
+The local agencies really only pay the shipping costs for the item, _so you can't say they paid for the items_, so the **total value** you calculate is the "value" of the items, not their cost to the local agency.
+
+## The questions we will answer
+
+All answers will be based on data from **Jan. 1, 2010** to present. In addition, we'll only consider **Texas** agencies as you answer the following.
+
+- For each agency in Texas, find the summed **quantity** and summed **total value** of the equipment they received. (When I say "summed" that means we'll add together all the values in the column.)
+ - Once you have the list, we'll think about what stands out and why.
+- We'll take the list above, but filter that summary to show only the following local agencies:
+ - AUSTIN POLICE DEPT
+ - SAN MARCOS POLICE DEPT
+ - TRAVIS COUNTY SHERIFFS OFFICE
+ - UNIV OF TEXAS SYSTEM POLICE HI_ED
+ - WILLIAMSON COUNTY SHERIFF'S OFFICE
+- For each of the agencies above we'll use summarize to get the _summed_ **quantity** and _summed_ **total_value** of each **item** shipped to the agency. We'll create a summarized list for each agency so we can write about each one.
+- You'll research some of the more interesting items the agencies received (i.e. Google the names) so you can include them in your data drop.
+
+## Create your project
+
+We will build the same project structure that we did with the Billboard project. In fact, all our class projects will have this structure. Since we've done this before, some of the directions are less detailed.
+
+1. With RStudio open, make sure you don't have a project open. Go to File > Close project.
+1. Use the create project button (or File > New project) to create a new project in a "New Directory". Name the directory "yourname-military-surplus".
+1. Create two folders: `data-raw` and `data-processed`.
+
+## Import/cleaning notebook
+
+Again, like Billboard, we'll create a notebook specifically for downloading, cleaning and prepping our data.
+
+1. Create your RNotebook.
+1. Rename the title "Military Surplus import/clean".
+1. Remove the rest of the boilerplate template.
+1. Save the file and name it `01-import.Rmd`.
+
+### Add the goals of the notebook
+
+1. In Markdown, add a headline noting these are notebook goals.
+1. Add the goals below:
+
+```text
+- Download the data
+- Import the data
+- Clean datatypes
+- Remove unnecessary columns
+- Create a total_value column
+- Filter to Texas agencies
+- Filter the date range (since Jan. 1 2010)
+- Export the cleaned data
+```
+
+> NOTE: Most of these are pretty standard in an import/cleaning notebook. Filtering to Texas agencies is specific to this data set, but we would do all these other things in all projects.
+
+### Add a setup section
+
+This is the section where we add our libraries and such. Again, every notebook has this section, though the packages vary based on need.
+
+1. Add a headline and text about what we are doing: Our project setup.
+2. Add a code chunk to load the libraries. You should only need `tidyverse` for this notebook because the data already has clean names (no need for janitor) and the dates will import correctly (no need for lubridate).
+
+
+```r
+library(tidyverse)
+```
+
+
+### Download the data
+
+1. A new section means a new headline and description. Add it. It is good practice to describe and link to the data you will be using. You can use this:
+
+```text
+The Defense Logistics Agency transfers surplus military equipment to local law enforcement through its [Law Enforcement Support Office](https://www.dla.mil/DispositionServices/Offers/Reutilization/LawEnforcement/PublicInformation/). You can find more information [about the program here](https://www.dla.mil/DispositionServices/Offers/Reutilization/LawEnforcement/ProgramFAQs/).
+```
+
+1. Use the `download.file()` function to download the data into your `data-raw` folder. Remember you need two arguments:
+
+```r
+download.file("url_to_data", "path_to_folder/filename.csv")
+```
+
+- The data can be found at this url: `https://github.com/utdata/rwd-r-leso/blob/main/data-processed/leso.csv?raw=true`
+- It should be saved into your `data-raw` folder with a name for the file.
+
+Once you've built your code chunk and run it, you should make sure the file downloaded into the correct place: in your `data-raw` folder.
+
+
+ You should be able to do this on your own. Really.
+
+
+```r
+# You can comment the line below once you have the data
+download.file("https://github.com/utdata/rwd-r-leso/blob/main/data-processed/leso.csv?raw=true",
+ "data-raw/leso.csv")
+```
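+
+If you'd rather not comment the line out by hand, one option is to skip the download when the file is already there. This is a sketch, not part of the assignment: `download_if_missing()` is a made-up helper name, built on base R's `file.exists()` and `download.file()`.
+
+```r
+# Hypothetical helper: download a file only if it isn't already saved
+download_if_missing <- function(url, path) {
+  if (file.exists(path)) {
+    message("Already have ", path, ", skipping download")
+    return(invisible(FALSE))
+  }
+  download.file(url, path)
+  invisible(TRUE)
+}
+
+# Example call, using the same url and path as the chunk above:
+# download_if_missing(
+#   "https://github.com/utdata/rwd-r-leso/blob/main/data-processed/leso.csv?raw=true",
+#   "data-raw/leso.csv"
+# )
+```
+
+With this approach you can re-run the whole notebook without downloading the file again each time.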
+
+
+
+### Import the data
+
+We are again working with a CSV, or comma-separated-values text file.
+
+1. Add a new section: Headline, text if needed, code chunk.
+
+I suggest you build the code chunk a bit at a time in this order:
+
+1. Use `read_csv()` to read the file from our `data-raw` folder.
+1. Edit that line to put the result into a tibble object using `<-`. Name your new tibble `leso`.
+1. Print the tibble as a table to the screen again by putting the tibble object on a new line and running it. This allows you to see it in columnar form.
+
+
+ Try real hard first before clicking here for the answer. Note the book will also show the response.
+
+
+```r
+# assigning the tibble
+leso <- read_csv("data-raw/leso.csv")
+```
+
+```
+## Rows: 129348 Columns: 12
+```
+
+```
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (7): state, agency_name, nsn, item_name, ui, demil_code, station_type
+## dbl (4): sheet, quantity, acquisition_value, demil_ic
+## dttm (1): ship_date
+```
+
+```
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+```
+
+```r
+# printing the tibble
+leso
+```
+
+```
+## # A tibble: 129,348 × 12
+## sheet state agency_name nsn item_name quantity ui acquisition_val…
+##
+## 1 1 AL ABBEVILLE PO… 1005-… MOUNT,RIFLE 10 Each 1626
+## 2 1 AL ABBEVILLE PO… 1240-… SIGHT,REFLEX 9 Each 333
+## 3 1 AL ABBEVILLE PO… 1240-… OPTICAL SIG… 1 Each 246.
+## 4 1 AL ABBEVILLE PO… 1385-… UNMANNED VE… 1 Each 10000
+## 5 1 AL ABBEVILLE PO… 2320-… TRUCK,UTILI… 1 Each 62627
+## 6 1 AL ABBEVILLE PO… 2320-… TRUCK,UTILI… 1 Each 62627
+## 7 1 AL ABBEVILLE PO… 2355-… MINE RESIST… 1 Each 658000
+## 8 1 AL ABBEVILLE PO… 2540-… BALLISTIC B… 10 Kit 15872.
+## 9 1 AL ABBEVILLE PO… 5855-… ILLUMINATOR… 10 Each 926
+## 10 1 AL ABBEVILLE PO… 6760-… CAMERA ROBOT 1 Each 1500
+## # … with 129,338 more rows, and 4 more variables: demil_code <chr>,
+## # demil_ic <dbl>, ship_date <dttm>, station_type <chr>
+```
+
+
+
+### Glimpse the data
+
+1. In a new block, print the tibble but pipe it into `glimpse()` so you can see all the column names.
+
+
+```r
+leso %>% glimpse()
+```
+
+```
+## Rows: 129,348
+## Columns: 12
+## $ sheet 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
+## $ state "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
+## $ agency_name "ABBEVILLE POLICE DEPT", "ABBEVILLE POLICE DEPT", "A…
+## $ nsn "1005-01-587-7175", "1240-01-411-1265", "1240-DS-OPT…
+## $ item_name "MOUNT,RIFLE", "SIGHT,REFLEX", "OPTICAL SIGHTING AND…
+## $ quantity 10, 9, 1, 1, 1, 1, 1, 10, 10, 1, 5, 10, 11, 10, 1, 3…
+## $ ui "Each", "Each", "Each", "Each", "Each", "Each", "Eac…
+## $ acquisition_value 1626.00, 333.00, 245.88, 10000.00, 62627.00, 62627.0…
+## $ demil_code "D", "D", "D", "Q", "C", "C", "C", "D", "D", "D", "D…
+## $ demil_ic 1, 1, NA, 3, 1, 1, 1, 1, 1, 7, 1, 1, 1, 1, 1, NA, 1,…
+## $ ship_date 2016-09-19, 2016-09-14, 2016-06-02, 2017-03-28, 201…
+## $ station_type "State", "State", "State", "State", "State", "State"…
+```
+
+#### Checking datatypes
+
+Take a look at your glimpse returns. These are the things to watch for:
+
+- Are your variable names (column names) clean? All lowercase with `_` separating words?
+- Are dates saved in a date format? `ship_date` looks good at `<dttm>`, which means "datetime".
+- Are your numbers really numbers? `acquisition_value` is the column we are most concerned about here, and it looks good.
+
+This data set looks good (because I pre-prepared it for you), but you always want to check and make corrections, like we did to fix the date in the Billboard assignment.
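+
+If you'd like to check those types with code instead of by eye, here is a small sketch. It builds a two-row tibble by hand (the `sample_rows` name and its values are made up to mimic the first rows of the data) and asks R about each column. A datetime column is stored in R as a `POSIXct` object, which is what the first test looks for.
+
+```r
+library(tidyverse)
+
+# A hand-built stand-in for two rows of the real data
+sample_rows <- tibble(
+  ship_date = as.POSIXct(c("2016-09-19", "2016-09-14"), tz = "UTC"),
+  acquisition_value = c(1626.00, 333.00)
+)
+
+# Is ship_date stored as a datetime?
+inherits(sample_rows$ship_date, "POSIXct")
+
+# Is acquisition_value stored as a number?
+is.numeric(sample_rows$acquisition_value)
+```
+
+Both tests return `TRUE` for this sample. You could run the same two tests against your `leso` tibble.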
+
+### Remove unnecessary columns
+
+Sometimes at this point in a project, you might not know what columns you need to keep and which you could do without. The nice thing about doing this with code in a notebook is we can always go back, make corrections and run our notebook again. In this case, I'm going to tell you which columns you can remove so we have a tighter data set to work with. We'll also learn a cool trick with `select()`.
+
+1. Start a new section with a headline, text to explain you are removing unneeded columns.
+2. Add a code chunk and the following code. I'll explain it below.
+
+
+```r
+leso_tight <- leso %>%
+ select(
+ -sheet,
+ -nsn,
+ -starts_with("demil")
+ )
+
+leso_tight %>% glimpse()
+```
+
+```
+## Rows: 129,348
+## Columns: 8
+## $ state "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
+## $ agency_name "ABBEVILLE POLICE DEPT", "ABBEVILLE POLICE DEPT", "A…
+## $ item_name "MOUNT,RIFLE", "SIGHT,REFLEX", "OPTICAL SIGHTING AND…
+## $ quantity 10, 9, 1, 1, 1, 1, 1, 10, 10, 1, 5, 10, 11, 10, 1, 3…
+## $ ui "Each", "Each", "Each", "Each", "Each", "Each", "Eac…
+## $ acquisition_value 1626.00, 333.00, 245.88, 10000.00, 62627.00, 62627.0…
+## $ ship_date 2016-09-19, 2016-09-14, 2016-06-02, 2017-03-28, 201…
+## $ station_type "State", "State", "State", "State", "State", "State"…
+```
+
+We did a select like this with billboard, but note the third item within the `select()`:
+
+`-starts_with("demil")`.
+
+This removes both the `demil_code` and `demil_ic` columns in one move by finding all the columns that "start with 'demil'". The `-` before it negates (or removes) the columns.
+
+There are other special operators you can use with `select()`, like: `ends_with()`, `contains()` and many more. [Check out the docs on the select function](https://dplyr.tidyverse.org/reference/select.html).
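+
+Here is a short sketch of those helpers in action. The one-row `sample_cols` tibble is made up for this illustration; it just borrows a few column names from our data so you can see which ones each helper picks.
+
+```r
+library(tidyverse)
+
+# A one-row stand-in with some of the same column names as our data
+sample_cols <- tibble(
+  state = "AL",
+  demil_code = "D",
+  demil_ic = 1,
+  ship_date = "2016-09-19",
+  station_type = "State"
+)
+
+# Drop every column that starts with "demil"
+sample_cols %>% select(-starts_with("demil")) %>% names()
+
+# Keep only columns that end with "_type"
+sample_cols %>% select(ends_with("_type")) %>% names()
+
+# Keep only columns with "date" somewhere in the name
+sample_cols %>% select(contains("date")) %>% names()
+```
+
+Piping into `names()` prints just the column names that survive each `select()`, which makes it easy to see what each helper matched.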
+
+So now we have a tibble called `leso_tight` that we will work with in the next section.
+
+### Create a total_value column
+
+When we used `mutate()` to convert the date in the Billboard assignment, we were reassigning values in each row of a column back into the same column.
+
+In this assignment, we will use `mutate()` to create a **new** column with new values based on a calculation (`quantity` multiplied by the `acquisition_value`) for each row. Let's review the concept first.
+
+If you started with data like this:
+
+| item | item_count | item_value |
+|-------|-----------:|-----------:|
+| Bread | 2 | 1.5 |
+| Milk | 1 | 2.75 |
+| Beer | 3 | 9 |
+
+And wanted to create a total value of each item in the table, you would use `mutate()`:
+
+```r
+data %>%
+ mutate(total_value = item_count * item_value)
+```
+
+And you would get a return like this, with your new `total_value` column added at the end:
+
+| item | item_count | item_value | total_value |
+|-------|-----------:|-----------:|------------:|
+| Bread | 2 | 1.5 | 3 |
+| Milk | 1 | 2.75 | 2.75 |
+| Beer | 3 | 9 | 27 |
+
+Other math operators work as well: `+`, `-`, `*` and `/`.
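+
+To try the grocery example yourself, here is a runnable sketch. The `groceries` tibble is built by hand with `tribble()` just for this illustration; it is not part of the assignment data.
+
+```r
+library(tidyverse)
+
+# Build the small grocery table from the example above
+groceries <- tribble(
+  ~item,   ~item_count, ~item_value,
+  "Bread", 2,           1.5,
+  "Milk",  1,           2.75,
+  "Beer",  3,           9
+)
+
+# Add a new column calculated from two existing columns
+groceries %>%
+  mutate(total_value = item_count * item_value)
+```
+
+The original three columns are left alone. `mutate()` only adds the new one at the end.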
+
+So, now that we've talked about how it is done, I want you to:
+
+1. Create a new section with headline, text and code chunk.
+1. Use `mutate()` to create a new `total_value` column that multiplies `quantity` times `acquisition_value`.
+2. Assign those results into a new tibble called `leso_total` so we can all be on the same page.
+3. Glimpse the new tibble so you can check the results.
+
+
+ Try it on your own. You can figure it out!
+
+
+```r
+leso_total <- leso_tight %>%
+ mutate(
+ total_value = quantity * acquisition_value
+ )
+
+leso_total %>% glimpse()
+```
+
+```
+## Rows: 129,348
+## Columns: 9
+## $ state "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL", "AL"…
+## $ agency_name "ABBEVILLE POLICE DEPT", "ABBEVILLE POLICE DEPT", "A…
+## $ item_name "MOUNT,RIFLE", "SIGHT,REFLEX", "OPTICAL SIGHTING AND…
+## $ quantity 10, 9, 1, 1, 1, 1, 1, 10, 10, 1, 5, 10, 11, 10, 1, 3…
+## $ ui "Each", "Each", "Each", "Each", "Each", "Each", "Eac…
+## $ acquisition_value 1626.00, 333.00, 245.88, 10000.00, 62627.00, 62627.0…
+## $ ship_date 2016-09-19, 2016-09-14, 2016-06-02, 2017-03-28, 201…
+## $ station_type "State", "State", "State", "State", "State", "State"…
+## $ total_value 16260.00, 2997.00, 245.88, 10000.00, 62627.00, 62627…
+```
+
+
+
+
+**Check that it worked!** Use the glimpsed data to check the first item: For me, 10 * 1626.00 = 16260.00, which is correct!
+
+Note that new columns are added at the end of the tibble. That is why I suggested you glimpse the data instead of printing the tibble so you can easily see results on one screen.
+
+### Filtering our data
+
+You used `filter()` in the Billboard lesson to get No. 1 songs and to get a date range of data. We need to do something similar here to get only Texas data of a certain date range, but we'll build the filters one at a time so we can check the results.
+
+#### Apply the TX filter
+
+1. Create a new section with headlines and text noting that you are filtering the data to Texas agencies and to shipments since Jan. 1, 2010.
+2. Create the code chunk and start your filter process using the `leso_total` tibble.
+3. Use `filter()` on the `state` column to keep all rows with "TX".
+
+
+ Really, you got this.
+
+
+```r
+leso_total %>%
+ filter(
+ state == "TX"
+ )
+```
+
+```
+## # A tibble: 8,684 × 9
+## state agency_name item_name quantity ui acquisition_val…
+##
+## 1 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 2 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 3 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 4 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 5 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 6 TX ABERNATHY POLICE DEPT RIFLE,5.56 MILLI… 1 Each 749
+## 7 TX ABERNATHY POLICE DEPT RIFLE,5.56 MILLI… 1 Each 749
+## 8 TX ABERNATHY POLICE DEPT SIGHT,REFLEX 5 Each 333
+## 9 TX ABERNATHY POLICE DEPT TRUCK,UTILITY 1 Each 62627
+## 10 TX ABILENE POLICE DEPT RIFLE,5.56 MILLI… 1 Each 499
+## # … with 8,674 more rows, and 3 more variables: ship_date <dttm>,
+## # station_type <chr>, total_value <dbl>
+```
+
+
+
+How do you know if it worked? Well the first column in the data is the `state` column, so they should all start with "TX". Also note you started with nearly 130k observations (rows), and there are only 8,600+ in Texas.
+
+#### Add the date filter
+
+1. Now, **EDIT THAT SAME CHUNK** to add a new part to your filter to also get rows with a `ship_date` of 2010-01-01 or later.
+
+
+ If you do this on your own, treat yourself to a cookie
+
+
+```r
+leso_total %>%
+ filter(
+ state == "TX",
+ ship_date >= "2010-01-01"
+ )
+```
+
+```
+## # A tibble: 7,407 × 9
+## state agency_name item_name quantity ui acquisition_val…
+##
+## 1 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 2 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 3 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 4 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 5 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .… 1 Each 58.7
+## 6 TX ABERNATHY POLICE DEPT RIFLE,5.56 MILLI… 1 Each 749
+## 7 TX ABERNATHY POLICE DEPT RIFLE,5.56 MILLI… 1 Each 749
+## 8 TX ABERNATHY POLICE DEPT SIGHT,REFLEX 5 Each 333
+## 9 TX ABERNATHY POLICE DEPT TRUCK,UTILITY 1 Each 62627
+## 10 TX ALLEN POLICE DEPT SIGHT,REFLEX 1 Each 333
+## # … with 7,397 more rows, and 3 more variables: ship_date <dttm>,
+## # station_type <chr>, total_value <dbl>
+```
+
+
+
+#### Checking the results with summary()
+
+How do you know this date filter worked? Well, we went from 8600+ rows to 7400+ rows, so we did something. You might look at the results table and click over to the `ship_date` column so you can see some of the results, but you can't be sure the top row is the oldest. We could use an `arrange()` to test that, but I have another suggestion: `summary()`.
+
+Now, `summary()` is different than `summarize()`, which we'll do plenty of in a minute. The `summary()` function will show you some results about each column in your data, and when it is a number or date, it will give you some basic stats like min, max and median values.
+
+1. Use the image below to add a `summary()` function to your filtering data chunk.
+2. Once you've confirmed that the "Min." of `ship_date` is not older than 2010, then **REMOVE THE SUMMARY STATEMENT**.
+
+If you leave the summary statement there when we create our updated tibble, then you'll "save" the summary and not the data.
+
+
+![Summary function](images/military-date-summary.png)
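+
+To see what `summary()` returns without touching your real data, here is a sketch on three made-up dates (the `sample_dates` tibble is invented for this illustration). One date falls before 2010, so the filter should drop it.
+
+```r
+library(tidyverse)
+
+# Three invented ship dates, one of them before 2010
+sample_dates <- tibble(
+  ship_date = as.POSIXct(c("2009-06-01", "2011-11-03", "2016-09-19"), tz = "UTC")
+)
+
+# Filter the same way we do in the assignment, then summarize the column
+sample_dates %>%
+  filter(ship_date >= "2010-01-01") %>%
+  summary()
+```
+
+The "Min." value in the printout should be the 2011 date, which tells you the 2009 row was filtered out.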
+
+#### Add filtered data to new tibble
+
+Once you've checked and removed the summary, you can save your filtered data into a new tibble.
+
+1. Edit the filtering chunk to put the results into a new tibble called `leso_filtered`.
+
+
+ Seriously? You were going to look?
+
+
+```r
+leso_filtered <- leso_total %>%
+ filter(
+ state == "TX",
+ ship_date >= "2010-01-01"
+ )
+
+leso_filtered %>% glimpse()
+```
+
+```
+## Rows: 7,407
+## Columns: 9
+## $ state "TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX"…
+## $ agency_name "ABERNATHY POLICE DEPT", "ABERNATHY POLICE DEPT", "A…
+## $ item_name "PISTOL,CALIBER .45,AUTOMATIC", "PISTOL,CALIBER .45,…
+## $ quantity 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
+## $ ui "Each", "Each", "Each", "Each", "Each", "Each", "Eac…
+## $ acquisition_value 58.71, 58.71, 58.71, 58.71, 58.71, 749.00, 749.00, 3…
+## $ ship_date 2011-11-03, 2011-11-03, 2011-11-03, 2011-11-03, 201…
+## $ station_type "State", "State", "State", "State", "State", "State"…
+## $ total_value 58.71, 58.71, 58.71, 58.71, 58.71, 749.00, 749.00, 1…
+```
+
+
+
+### Export cleaned data
+
+Now that we have our data selected, mutated and filtered how we want it, we can export the `leso_filtered` tibble into an `.rds` file to use in our analysis notebook. If you recall, we use the `.rds` format because it will remember data types and such.
+
+1. Create a new section with headline and text explaining that you are exporting the data.
+1. Do it. The function you need is called `write_rds()` and you need to give it a path/name that saves the file in the `data-processed` folder. Name it `01-leso-tx.rds` so you know it a) came from the first notebook and b) is the Texas-only data. **Well-formatted, descriptive file names are important to your future self and other colleagues**.
+
+
+ Try it
+
+
+```r
+leso_filtered %>% write_rds("data-processed/01-leso-tx.rds")
+```
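+
+If you are curious what "remember data types" means in practice, here is a small round-trip sketch. The one-row `sample_rows` tibble is made up for illustration, and it is written to a temporary file (via base R's `tempfile()`) so nothing lands in your project folders.
+
+```r
+library(tidyverse)
+
+# A one-row stand-in with a datetime column
+sample_rows <- tibble(
+  agency_name = "ABERNATHY POLICE DEPT",
+  ship_date = as.POSIXct("2011-11-03", tz = "UTC"),
+  total_value = 58.71
+)
+
+# Write to a temporary file so nothing lands in your project
+rds_path <- tempfile(fileext = ".rds")
+sample_rows %>% write_rds(rds_path)
+
+# Read it back and check the datetime column is still a datetime
+round_trip <- read_rds(rds_path)
+class(round_trip$ship_date)
+```
+
+The class that comes back includes `POSIXct`, R's datetime type, so there is nothing to fix when the analysis notebook reads the file.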
+
+
+
+## Things we learned in this lesson
+
+This chapter was similar to when we imported data for Billboard, but we did introduce a couple of new concepts:
+
+- `starts_with()` can be used within a `select()` function to select columns with similar names. There are also `ends_with()` and `contains()` and others. [See the documentation on Select](https://dplyr.tidyverse.org/reference/select.html).
+- `summary()` gives you descriptive statistics about your tibble. We used it to check the "min" date, but you can also see averages (mean), max and medians.
diff --git a/docs/06-sums-analysis.md b/docs/06-sums-analysis.md
new file mode 100644
index 00000000..145b4052
--- /dev/null
+++ b/docs/06-sums-analysis.md
@@ -0,0 +1,459 @@
+# Summarize with math - analysis {#sums-analyze}
+
+In the last chapter, we covered the overall story about the LESO data ... that local law enforcement agencies can get surplus military equipment from the U.S. Department of Defense. We downloaded a pre-processed version of the data and filtered it to just Texas records over a specific time period (2010 to present), and used `mutate()` to create a new column calculated from other variables in the data.
+
+## Learning goals of this lesson
+
+In this chapter we will start querying the data using **summarize with math**, basically using summarize to add values in a column instead of counting rows, which we did with the Billboard assignment.
+
+Our learning goals are:
+
+- To use the combination of `group_by()`, `summarize()` and `arrange()` to add columns of data using `sum()`.
+- To use different `group_by()` groupings in specific ways to get desired results.
+- To practice using `filter()` on those summaries to better see certain results, including filtering with*in* a vector (or list of strings).
+- We'll research and write about some of the findings, practicing data-centric ledes and sentences describing data.
+
+## Questions to answer
+
+A reminder of what we are looking for: All answers are based on data from **Jan. 1, 2010** to present and consider only **Texas** agencies. We did this filtering already.
+
+- For each agency in Texas, find the summed **quantity** and summed **total value** of the equipment they received. (When I say "summed" that means we'll add together all the values in the column.)
+ - Once you have the list, we'll think about what stands out and why.
+- We'll take the list above, but filter that summary to show only the following local agencies:
+ - AUSTIN POLICE DEPT
+ - SAN MARCOS POLICE DEPT
+ - TRAVIS COUNTY SHERIFFS OFFICE
+ - UNIV OF TEXAS SYSTEM POLICE HI_ED
+ - WILLIAMSON COUNTY SHERIFF'S OFFICE
+- For each of the agencies above we'll use summarize to get the _summed_ **quantity** and _summed_ **total_value** of each **item** shipped to the agency. We'll create a summarized list for each agency so we can write about each one.
+- You'll research some of the more interesting items the agencies received (i.e. Google the names) so you can include them in your data drop.
+
+## Set up the analysis notebook
+
+Before we get into how to do this, let's set up our analysis notebook.
+
+1. Make sure you have your military surplus project open in RStudio. If you have your import notebook open, close it and use Run > Restart R and Clear Output.
+1. Create a new RNotebook and edit the title as "Military surplus analysis".
+1. Remove the boilerplate text.
+1. Create a setup section (headline, text and code chunk) that loads the tidyverse library.
+1. Save the notebook as `02-analysis.Rmd`.
+
+We've started each notebook like this, so you should be able to do this on your own now.
+
+
+
+### Load the data into a tibble
+
+1. Next create an import section (headline, text and chunk) that loads the data from the previous notebook and saves it into a tibble called `tx`.
+1. Add a `glimpse()` of the data for your reference.
+
+We did this in Billboard and you should be able to do it. You'll use `read_rds()` and find your data in your data-processed folder.
+
+
+ Remember your data is in data-processed
+
+```r
+tx <- read_rds("data-processed/01-leso-tx.rds")
+
+tx %>% glimpse()
+```
+
+```
+## Rows: 7,407
+## Columns: 9
+## $ state "TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX", "TX"…
+## $ agency_name "ABERNATHY POLICE DEPT", "ABERNATHY POLICE DEPT", "A…
+## $ item_name "PISTOL,CALIBER .45,AUTOMATIC", "PISTOL,CALIBER .45,…
+## $ quantity 1, 1, 1, 1, 1, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
+## $ ui "Each", "Each", "Each", "Each", "Each", "Each", "Eac…
+## $ acquisition_value 58.71, 58.71, 58.71, 58.71, 58.71, 749.00, 749.00, 3…
+## $ ship_date 2011-11-03, 2011-11-03, 2011-11-03, 2011-11-03, 201…
+## $ station_type "State", "State", "State", "State", "State", "State"…
+## $ total_value 58.71, 58.71, 58.71, 58.71, 58.71, 749.00, 749.00, 1…
+```
+
+
+
+You should see the `tx` object in your Environment.
+
+## How to tackle summaries
+
+As we get into the first quest, let's talk about "how" we do summaries.
+
+When I am querying my data, I start by envisioning what the result should look like.
+
+Let's take the first question: For each agency in Texas, find the summed **quantity** and summed **total value** of the equipment they received.
+
+Let's break this down:
+
+- "For each agency in Texas". For all the questions, we only want Texas agencies. We took care of this in the import notebook, so TX agencies should already be filtered.
+- But the "For each agency" part tells me I need to **group_by** the `agency_name` so I can summarize totals within each agency.
+- "find the summed **quantity** and summed **total_value**": Because I'm looking for a total (or `sum()` of columns) I need `summarize()`.
+
+So I envision my result looking like this:
+
+| agency_name | summed_quantity | summed_total_value |
+|--------------------|-----------:|------------:|
+| AFAKE POLICE DEPT | 6419 | 10825707.5 |
+| BFAKE SHERIFF'S OFFICE | 381 | 3776291.52 |
+| CFAKE SHERIFF'S OFFICE | 270 | 3464741.36 |
+| DFAKE POLICE DEPT | 1082 | 3100420.57 |
+
+The first columns in that summary will be our grouped values. This example is only grouping by one thing, `agency_name`. The other two columns are the summed values I'm looking to generate.
+
+### Summaries with math
+
+We'll start with the summed **quantity**.
+
+1. Add a new section (headline, text and chunk) that describes the first quest: For each agency in Texas, find the summed **quantity** and summed **total value** of the equipment they received.
+1. Add the code below into the chunk and run it.
+
+
+
+```r
+tx %>%
+ group_by(agency_name) %>%
+ summarize(
+ sum_quantity = sum(quantity)
+ )
+```
+
+```
+## # A tibble: 357 × 2
+## agency_name sum_quantity
+##
+## 1 ABERNATHY POLICE DEPT 13
+## 2 ALLEN POLICE DEPT 11
+## 3 ALVARADO ISD PD 4
+## 4 ALVIN POLICE DEPT 539
+## 5 ANDERSON COUNTY SHERIFFS OFFICE 8
+## 6 ANDREWS COUNTY SHERIFF OFFICE 12
+## 7 ANSON POLICE DEPT 9
+## 8 ANTHONY POLICE DEPT 10
+## 9 ARANSAS PASS POLICE DEPARTMENT 38
+## 10 ARP POLICE DEPARTMENT 18
+## # … with 347 more rows
+```
+
+Let's break this down a little.
+
+- We start with the `tx` data, and then ...
+- We group by `agency_name`. This organizes our data (behind the scenes) so our summarize actions will happen _within each agency_. Now I normally say run your code one line at a time, but you would not be able to _see_ the groupings, so I usually write `group_by()` and `summarize()` together.
+- In `summarize()` we first name our new column: `sum_quantity`. We could call this whatever we want, but good practice is to name it what it is. We use good naming techniques and split the words using `_`. I also use all lowercase characters.
+- We set that column to equal `=` the **sum of all values in the `quantity` column**. `sum()` is the function, and we feed it the column we want to add together: `quantity`.
+- I put the inside of the summarize function on its own line because we will add to it. It enhances readability. RStudio will help you with the indenting, etc.
+
+If you look at the first line of the return, it is taking all the rows for the "ABERNATHY POLICE DEPT" and then adding together all the values in the `quantity` field.
+
+If you wanted to test this (and it is a real good idea), you might look at the data from one of the values and check the math. Here are the Abernathy rows. I usually do these tests in a code chunk of their own, and sometimes I delete them after I'm sure it worked.
+
+
+```r
+tx %>%
+ filter(agency_name == "ABERNATHY POLICE DEPT")
+```
+
+```
+## # A tibble: 9 × 9
+## state agency_name item_name quantity ui acquisition_val…
+##
+## 1 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .4… 1 Each 58.7
+## 2 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .4… 1 Each 58.7
+## 3 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .4… 1 Each 58.7
+## 4 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .4… 1 Each 58.7
+## 5 TX ABERNATHY POLICE DEPT PISTOL,CALIBER .4… 1 Each 58.7
+## 6 TX ABERNATHY POLICE DEPT RIFLE,5.56 MILLIM… 1 Each 749
+## 7 TX ABERNATHY POLICE DEPT RIFLE,5.56 MILLIM… 1 Each 749
+## 8 TX ABERNATHY POLICE DEPT SIGHT,REFLEX 5 Each 333
+## 9 TX ABERNATHY POLICE DEPT TRUCK,UTILITY 1 Each 62627
+## # … with 3 more variables: ship_date <dttm>, station_type <chr>,
+## # total_value <dbl>
+```
+
+If we look at the `quantity` column there and eyeball all the rows, we see there are 8 rows with a value of "1", and one row with a value of "5". 8 + 5 = 13, which matches our `sum_quantity` answer in our summary table. We're good!
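+
+You can also let R do the eyeballing. This sketch rebuilds the nine Abernathy rows by hand (the `abernathy` tibble is typed in from the output above, not pulled from the file) and sums the `quantity` column.
+
+```r
+library(tidyverse)
+
+# Quantities and per-unit values copied from the nine Abernathy rows
+abernathy <- tibble(
+  agency_name = "ABERNATHY POLICE DEPT",
+  quantity = c(1, 1, 1, 1, 1, 1, 1, 5, 1),
+  acquisition_value = c(58.71, 58.71, 58.71, 58.71, 58.71, 749, 749, 333, 62627)
+)
+
+# Add up the quantity column, same as the summary does within each agency
+abernathy %>%
+  summarize(sum_quantity = sum(quantity))
+```
+
+The result is a one-row tibble with a `sum_quantity` of 13. With your real data, you could put the same `summarize()` after the `filter()` for Abernathy.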
+
+### Add the total_value
+
+We don't have to stop at one summary. We can perform multiple summarize actions on the same or different columns within the same expression.
+
+**Edit your summary chunk** to:
+
+1. Add a comma after the first summarize action.
+1. Add the new expression to give us the `sum_total_value` and run it.
+
+
+```r
+tx %>%
+ group_by(agency_name) %>%
+ summarize(
+ sum_quantity = sum(quantity),
+ sum_total_value = sum(total_value)
+ )
+```
+
+```
+## # A tibble: 357 × 3
+## agency_name sum_quantity sum_total_value
+##
+## 1 ABERNATHY POLICE DEPT 13 66084.
+## 2 ALLEN POLICE DEPT 11 1404024
+## 3 ALVARADO ISD PD 4 480
+## 4 ALVIN POLICE DEPT 539 2545240.
+## 5 ANDERSON COUNTY SHERIFFS OFFICE 8 827891
+## 6 ANDREWS COUNTY SHERIFF OFFICE 12 1476
+## 7 ANSON POLICE DEPT 9 5077
+## 8 ANTHONY POLICE DEPT 10 7490
+## 9 ARANSAS PASS POLICE DEPARTMENT 38 571738
+## 10 ARP POLICE DEPARTMENT 18 5789.
+## # … with 347 more rows
+```
+
+### Arrange the results
+
+OK, this gives us our answers, but in alphabetical order. We want to arrange the data so it gives us the most `sum_total_value` in **desc**ending order.
+
+1. EDIT your block to add an `arrange()` function at the end, as shown below.
+
+
+```r
+tx %>%
+ group_by(agency_name) %>%
+ summarize(
+ sum_quantity = sum(quantity),
+ sum_total_value = sum(total_value)
+ ) %>%
+ arrange(sum_total_value %>% desc())
+```
+
+```
+## # A tibble: 357 × 3
+## agency_name sum_quantity sum_total_value
+##
+## 1 HOUSTON POLICE DEPT 6419 10825708.
+## 2 HARRIS COUNTY SHERIFF'S OFFICE 381 3776292.
+## 3 DPS SWAT- TEXAS RANGERS 1730 3520630.
+## 4 JEFFERSON COUNTY SHERIFF'S OFFICE 270 3464741.
+## 5 SAN MARCOS POLICE DEPT 1082 3100421.
+## 6 AUSTIN POLICE DEPT 1458 2741021.
+## 7 MILAM COUNTY SHERIFF DEPT 125 2723192.
+## 8 ALVIN POLICE DEPT 539 2545240.
+## 9 HARRIS COUNTY CONSTABLE PCT 3 293 2376945.
+## 10 PARKS AND WILDLIFE DEPT 5608 2325655.
+## # … with 347 more rows
+```
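+
+The `sum_total_value %>% desc()` form may look odd. It is the piped way of writing `desc(sum_total_value)`, and both sort the same way. Here is a small sketch with made-up numbers (the `totals` tibble and its fake agencies are invented for illustration):
+
+```r
+library(tidyverse)
+
+# Three fake agencies with invented totals
+totals <- tibble(
+  agency_name = c("AFAKE POLICE DEPT", "BFAKE SHERIFF'S OFFICE", "CFAKE POLICE DEPT"),
+  sum_total_value = c(500, 9000, 1200)
+)
+
+# Piped style, as used in this chapter
+piped <- totals %>% arrange(sum_total_value %>% desc())
+
+# Wrapped style, which you will often see in documentation
+wrapped <- totals %>% arrange(desc(sum_total_value))
+
+# Both put the agencies in the same order
+identical(piped$agency_name, wrapped$agency_name)
+```
+
+Use whichever style reads better to you. The fake sheriff's office with the largest total lands on top either way.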
+
+### Consider the results
+
+Is there anything that sticks out in that list? It helps if you know a little bit about Texas cities and counties, but here are some thoughts to ponder:
+
+- Houston is the largest city in the state (4th largest in the country). It makes sense that it tops the list. Same for Harris County or even the state police force. Austin being up there is also not crazy, as it's almost a million people.
+- But what about San Marcos (63,220)? Or Milam County (24,770)? Those are way smaller cities and law enforcement agencies. They might be worth looking into.
+
+Perhaps we should look at some of the police agencies closest to us.
+
+## Looking at local agencies
+
+Our next goal is this:
+
+We'll take the summary above, but filter it to show only some local agencies of interest.
+
+Since we are essentially taking an existing summary and adding more filtering to it, it makes sense to go back into that chunk and save it into a new object so we can reuse it.
+
+1. EDIT your existing summary chunk to save it into a new tibble. Name it `tx_quants_totals` so we are all on the same page.
+1. Add a new line that prints the result to the screen so you can still see it.
+
+
+```r
+# adding the new tibble object in next line
+tx_quants_totals <- tx %>%
+ group_by(agency_name) %>%
+ summarize(
+ sum_quantity = sum(quantity),
+ sum_total_value = sum(total_value)
+ ) %>%
+ arrange(sum_total_value %>% desc())
+
+# peek at the result
+tx_quants_totals
+```
+
+```
+## # A tibble: 357 × 3
+## agency_name sum_quantity sum_total_value
+##
+## 1 HOUSTON POLICE DEPT 6419 10825708.
+## 2 HARRIS COUNTY SHERIFF'S OFFICE 381 3776292.
+## 3 DPS SWAT- TEXAS RANGERS 1730 3520630.
+## 4 JEFFERSON COUNTY SHERIFF'S OFFICE 270 3464741.
+## 5 SAN MARCOS POLICE DEPT 1082 3100421.
+## 6 AUSTIN POLICE DEPT 1458 2741021.
+## 7 MILAM COUNTY SHERIFF DEPT 125 2723192.
+## 8 ALVIN POLICE DEPT 539 2545240.
+## 9 HARRIS COUNTY CONSTABLE PCT 3 293 2376945.
+## 10 PARKS AND WILDLIFE DEPT 5608 2325655.
+## # … with 347 more rows
+```
+
+The result is the same, but we can reuse the `tx_quants_totals` tibble.
+
+### Filtering within a vector
+
+**Let's talk through the filter concepts before you try it with this data.**
+
+When we talked about filtering with the Billboard project, we discussed using the `|` operator as an "OR" function. If we were to apply that logic here, it would look like this:
+
+```r
+data %>%
+ filter(column_name == "Text to find" | column_name == "More text to find")
+```
+
+That can get pretty unwieldy if you have more than a couple of things to look for.
+
+There is another operator `%in%` where we can search for multiple items from a list. (This list of terms is officially called a vector, but whatever.) Think of it like this in plain English: *Filter* the *column* for things *in* this *list*.
+
+```r
+data %>%
+ filter(col_name %in% c("This string", "That string"))
+```
+
+We can take this a step further by saving the items in our list into an R object so we can reuse that list and not have to type out all the terms each time we use them.
+
+```r
+list_of_strings <- c(
+ "This string",
+ "That string"
+)
+
+data %>%
+ filter(col_name %in% list_of_strings)
+```
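+
+As a quick check of the idea, here is a minimal sketch using a small made-up tibble (the `pets` data and its columns are hypothetical, not part of our project). It suggests that the `|` version and the `%in%` version keep the same rows:
+
+```r
+library(dplyr)
+
+# made-up example data, just for illustration
+pets <- tibble(
+  name = c("Rex", "Tom", "Nemo", "Polly"),
+  animal = c("dog", "cat", "fish", "bird")
+)
+
+# the long way, with the OR operator
+with_or <- pets %>%
+  filter(animal == "dog" | animal == "cat")
+
+# the short way, with a saved vector and %in%
+animals_wanted <- c("dog", "cat")
+
+with_in <- pets %>%
+  filter(animal %in% animals_wanted)
+
+# compare the names kept by each filter
+with_or$name
+with_in$name
+```
+
+If you later need a third animal, you would only add it to `animals_wanted` instead of writing another `|` condition.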
+
+### Use the vector to build this filter
+
+1. Create a new section (headline, text and chunk) and describe that you are filtering the summed quantity/values for some select local agencies.
+1. Create a saved vector list (like the list_of_strings above) of the five agencies we want to focus on. Call it `local_agencies`.
+1. Start with the `tx_quants_totals` tibble you created for totals by agency and then use `filter()` and `%in%` to filter by your new `local_agencies` list.
+
+These are the agencies:
+
+```
+AUSTIN POLICE DEPT
+SAN MARCOS POLICE DEPT
+TRAVIS COUNTY SHERIFFS OFFICE
+UNIV OF TEXAS SYSTEM POLICE HI_ED
+WILLIAMSON COUNTY SHERIFF'S OFFICE
+```
+
+> To be clear, in the interest of time I've done considerable work beforehand to figure out the exact names of these agencies. It helps that I'm familiar with local cities and counties so I used some creative filtering to find their "official" names in the data. I just don't want to get into how right now.
+
+
+ Use the example above to build with your data
+
+
+```r
+local_agencies <- c(
+ "AUSTIN POLICE DEPT",
+ "SAN MARCOS POLICE DEPT",
+ "TRAVIS COUNTY SHERIFFS OFFICE",
+ "UNIV OF TEXAS SYSTEM POLICE HI_ED",
+ "WILLIAMSON COUNTY SHERIFF'S OFFICE"
+)
+
+tx_quants_totals %>%
+ filter(agency_name %in% local_agencies)
+```
+
+```
+## # A tibble: 5 × 3
+## agency_name sum_quantity sum_total_value
+##
+## 1 SAN MARCOS POLICE DEPT 1082 3100421.
+## 2 AUSTIN POLICE DEPT 1458 2741021.
+## 3 UNIV OF TEXAS SYSTEM POLICE HI_ED 3 1305000
+## 4 TRAVIS COUNTY SHERIFFS OFFICE 151 935354.
+## 5 WILLIAMSON COUNTY SHERIFF'S OFFICE 210 431449.
+```
+
+
+
+## Item quantities, totals for local agencies
+
+Now that we have an overall idea of what local agencies are doing, let's dive a little deeper. It's time to figure out the specific items that they received.
+
+Here is the quest: For each of the agencies above we'll use summarize to get the _summed_ **quantity** and _summed_ **total_value** of each **item** shipped to the agency. We'll create a summarized list for each agency so we can write about each one.
+
+In some cases an agency might get the same item shipped to them at different times. For instance, APD has multiple rows of a single "ILLUMINATOR,INTEGRATED,SMALL ARMS" shipped to them on the same date, and at other times the quantity is combined as 30 items into a single row. We'll group our summarize by **item_name** so we can get the totals for both **quantity** and **total_value** for like items.
+
+1. Create a new section (headline, text and first code chunk) and describe that you are finding the sums of each different item the agency has received since 2010.
+2. Our first code chunk will start with the `tx` data, and then filter the results to just "AUSTIN POLICE DEPT".
+3. Use `group_by` to group by `item_name`.
+4. Use summarize to build the `summed_quantity` and `summed_total_value` columns.
+5. Arrange the results so the most expensive items are at the top.
+
+
+```r
+tx %>%
+ filter(agency_name == "AUSTIN POLICE DEPT") %>%
+ group_by(item_name) %>%
+ summarize(
+ summed_quantity = sum(quantity),
+ summed_total_value = sum(total_value)
+ ) %>%
+ arrange(summed_total_value %>% desc())
+```
+
+```
+## # A tibble: 46 × 3
+## item_name summed_quantity summed_total_val…
+##
+## 1 HELICOPTER,FLIGHT TRAINER 1 833400
+## 2 IMAGE INTENSIFIER,NIGHT VISION 85 467847.
+## 3 SIGHT,THERMAL 29 442310
+## 4 PACKBOT 510 WITH FASTAC REMOTELY CONTROLLE… 4 308000
+## 5 SIGHT,REFLEX 420 144245.
+## 6 ILLUMINATOR,INTEGRATED,SMALL ARMS 135 122302
+## 7 RECON SCOUT XT 8 92451.
+## 8 RECON SCOUT XT,SPEC 6 81900
+## 9 TEST SET,NIGHT VISION VIEWER 2 56650
+## 10 PICKUP 1 26327
+## # … with 36 more rows
+```
+
+**Please realize** that this combines items that may have been shipped on any date in our time period. If you want to learn more about _when_ they got the items, you would have to build a new list of the data without grouping/summarizing.
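+
+A rough sketch of such an ungrouped list might look like the code below. To keep it self-contained it uses a tiny made-up tibble called `tx_example` in place of the real `tx` data, and it assumes a date column named `ship_date`; check the column names in your own data before adapting it.
+
+```r
+library(dplyr)
+
+# made-up stand-in for the tx data; ship_date is an assumed column name
+tx_example <- tibble(
+  agency_name = c("AUSTIN POLICE DEPT", "AUSTIN POLICE DEPT", "SAN MARCOS POLICE DEPT"),
+  item_name = c("SIGHT,REFLEX", "SIGHT,REFLEX", "PICKUP"),
+  quantity = c(10, 5, 1),
+  total_value = c(3300, 1650, 26327),
+  ship_date = as.Date(c("2015-03-01", "2012-06-15", "2014-09-30"))
+)
+
+# keep every shipment row for one agency, oldest shipment first
+apd_shipments <- tx_example %>%
+  filter(agency_name == "AUSTIN POLICE DEPT") %>%
+  select(item_name, ship_date, quantity, total_value) %>%
+  arrange(ship_date)
+
+# peek at the result
+apd_shipments
+```
+
+Because there is no `group_by()` or `summarize()` here, each shipment stays on its own row with its own date.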
+
+### Build the lists for other agencies
+
+On your own ...
+
+1. Build a similar list for all the other local agencies. Basically you are just changing the filtering. You should end up with five chunks, each summarizing a different agency.
+
+### Google some interesting items
+
+You'll want some more detail in your data drop about some of these specific items.
+
+1. Do some Googling on some of these items of interest to learn more about them. I realize (and you should, too) that for a "real" story we would need to reach out to sources for more information, but you can get a general idea from what you find online for the writing assignment below.
+
+## Write a data drop
+
+Once you've found answers to all the questions listed, you'll weave those into a writing assignment. Include this as a Microsoft Word document saved into your project folder along with your notebooks. (If you are a Google Docs fan, you can write there and then export as a Word doc.)
+
+You will **not** be writing a full story ... we are just practicing writing a lede and "data sentences" about what you've found. You _do_ need to source the data and briefly describe the program but this is not a fully-fleshed story. Just concentrate on how you would write the facts and attribution. You will want to refer back to [Chapter 5.1](https://utdata.github.io/rwdir/sums-import.html#about-the-story-military-surplus-transfers) and the subchapters there for more information.
+
+1. Write a data drop from the data of between four and six paragraphs. Be sure to include attribution about where the data came from.
+1. You can pick the lede angle from any of the questions outlined above. Each additional paragraph should describe what you found from the data.
+1. Use Microsoft Word and include it inside your stuffed project when you upload it to Canvas.
+
+Here is a **partial** example to give you an idea of what I'm looking for. (These numbers may be old and you can't use this angle as your lede ;-)).
+
+> The Jefferson County Sheriff's Office is flying high thanks to gifts of over $3.5 million worth of surplus U.S. Department of Defense equipment.
+
+> Among the items transferred over the past decade to the department was a $923,000 helicopter in October 2016 and related parts the following year, according to data from the Defense Logistics Agency, the agency that handles the transfers.
+
+> The sheriff's office has received the fourth highest value of equipment among any law enforcement agency in Texas since August 2014 despite being a county of only 250,000 people.
+
+## What we learned in this chapter
+
+- We used `sum()` within a `group_by()`/`summarize()` function to add values within a column.
+- We used `summary()` to get descriptive statistics about our data, like the minimum and maximum values, or an average (mean).
+- We learned how to use `c()` to **combine** a list of like values into a _vector_, and then used that vector to filter a column for values `%in%` that vector.
+
diff --git a/docs/07-plots.md b/docs/07-plots.md
new file mode 100644
index 00000000..3609bedc
--- /dev/null
+++ b/docs/07-plots.md
@@ -0,0 +1,393 @@
+# Intro to ggplot {#ggplot-intro}
+
+## Goals for this section
+
+- An introduction to the Grammar of Graphics
+- We'll make charts!
+
+## Introduction to ggplot
+
+[ggplot2](https://ggplot2.tidyverse.org/) is the data visualization library within Hadley Wickham's [tidyverse](https://www.tidyverse.org/). It uses a concept called the [Grammar of Graphics](https://byrneslab.net/classes/biol607/readings/wickham_layered-grammar.pdf), the idea that you can build every graph from the same components: a data set, a coordinate system, and geoms -- the visual marks that represent data points.
+
+Even though the package is called `ggplot2`, the function to make graphs is just `ggplot()`. I will often just call everything `ggplot`.
+
+### What I like/dislike about ggplot
+
+The ggplot system allows you to display data right in your notebook. It is really good at helping you find important things in your data that can inform reporting. It's an important tool in your R-based data journalism toolkit.
+
+What ggplot is less good at is creating publishable graphics. Don't get me wrong ... you can do it, but nuances of the ggplot system take time to master at that level. There are other tools (like [Datawrapper](https://www.datawrapper.de/) and [Flourish](https://flourish.studio/)) that can do that better, even without code. That said, there is a place for R in that workflow, too. We'll cover using Datawrapper in a later chapter.
+
+### The Grammar of Graphics
+
+ggplot uses this concept of the Grammar of Graphics ... i.e., that you can use code to describe how to build a chart layer-by-layer.
+
+With a hat tip to [Matt Waite](http://www.mattwaite.com/), we can describe the components of the Grammar of Graphics as:
+
+- **data**: which data you are pulling from for the chart.
+- **aesthetics**: describes how to apply specific data to the plot. What is on x axis, what is on y axis, for starters.
+- **geometries**: the shape the data is going to take on the graph. lines, columns, points.
+- **scales**: any transformations we might make on the data.
+- **layers**: how we might layer multiple geometries over top of each other to reveal new information.
+- **facets**: how we might graph many elements of the same data set in the same space.
+
+What to remember here is this: for every graphic we start with the data, and then we build a chart from it one "layer" at a time.
+
+The best way to learn this system is to do it and explain along the way.
+
+## Start a new project
+
+1. Get into RStudio and make sure you don't have any other files or projects open.
+1. Create a new project, name it `yourname-ggplot` and save it in your rwd folder.
+1. No need to create our folder structure ... we won't need it here.
+1. Start a new RMarkdown notebook and save it as `01-intro-ggplot.Rmd`.
+1. Remove the boilerplate and create a setup section that loads `library(tidyverse)`, like we do with every notebook.
+
+
+
+## The layers of ggplot
+
+> Much of this first plot explanation comes from Hadley Wickham's [R for Data Science](https://r4ds.had.co.nz/data-visualisation.html?q=aes#data-visualisation), with edits to fit the lesson here.
+
+We're going to use a data set that is part of the tidyverse to explore how ggplot works.
+
+1. Start a new section "First plot" and add a code chunk.
+1. Add the code below and run it to see what the mpg dataset looks like.
+
+
+```r
+mpg
+```
+
+```
+## # A tibble: 234 × 11
+## manufacturer model displ year cyl trans drv cty hwy fl class
+##
+## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
+## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
+## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
+## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
+## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
+## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
+## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
+## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
+## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
+## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
+## # … with 224 more rows
+```
+
+The `mpg` data contains observations collected by the US Environmental Protection Agency on 38 models of cars. It's a data set embedded into the tidyverse for lessons like this one.
+
+Among the variables in mpg are:
+
+- `displ`, a car’s engine size, in liters.
+- `hwy`, a car’s fuel efficiency on the highway, in miles per gallon (mpg).
+
+With these two variables we can test the theory that cars with smaller engines (`displ`) get better gas mileage (`hwy`).
+
+### Build the base layer
+
+With ggplot2, you begin a plot with the function `ggplot()`. `ggplot()` creates a coordinate system that you can add layers to. The first argument for `ggplot()` is the data set to use in the graph. So `ggplot(mpg)` creates an empty graph with no axes or anything.
+
+You complete your graph by adding one or more layers to `ggplot()`. The function `geom_point()` adds a layer of points to your plot, which creates a scatterplot. ggplot2 comes with many geom functions that each add a different type of layer to a plot.
+
+Each geom function in ggplot2 takes a `mapping` argument. This defines how variables in your dataset are mapped to visual properties. The `mapping` argument is always paired with `aes()`, and the `x` and `y` arguments of `aes()` specify which variables to map to the x and y axes. ggplot2 looks for the mapped variables in the `data` argument, in this case, `mpg`.
+
+We can apply `aes()` mappings to the graph as a whole and/or to the individual geom layers.
+
+Frankly, it is easier to show this than explain it. The code below sets up the grid and axes lines, but it hasn't placed any data on the plot.
+
+Do this:
+
+1. Add some text that you are building the mpg chart.
+2. Add the code chunk below and run it.
+
+
+```r
+ggplot(mpg, aes(x = displ, y = hwy))
+```
+
+
+
+Let's work through the code above:
+
+- `ggplot()` is our function to make a chart.
+- The first argument `ggplot()` needs is the data. It could be specified as `data = mpg` but we don't need the `data = ` part as it is always the first item specified inside of (or piped into) `ggplot()`
+- Next is the **aesthetics** or `aes()`. This is where we tell ggplot what data to plot on the `x` and `y` axes. You might see this as `mapping = aes()` but we can often get by without the `mapping =` part.
+
+In our case we are applying these `aes()` to the entire chart. You'll see later we can also apply different `aes()` to specific geoms.
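+
+To make the optional argument names concrete, here is a small sketch that builds the same base layer twice, once in the short form and once with `data =` and `mapping =` spelled out. The object names `plot_short` and `plot_long` are just for illustration.
+
+```r
+library(ggplot2)
+
+# the short form used in this chapter
+plot_short <- ggplot(mpg, aes(x = displ, y = hwy))
+
+# the same base layer with the argument names spelled out
+plot_long <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))
+
+# print one of them to see the empty grid
+plot_long
+```
+
+Both versions should draw the same empty grid. The named version is handy when you are reading other people's code, where you will see both styles.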
+
+### Layers we can add to our plots
+
+We'll now add onto this base layer a number of things:
+
+- **geometries** (or geoms as we call them) are the way we plot data on the base grid. There are [many geoms](https://github.com/rstudio/cheatsheets/blob/master/data-visualization.pdf), but here are a few common ones:
+ - `geom_point()` adds dots onto the grid based on the data. We will use these here to build a scatterplot graph.
+ - `geom_line()` adds lines between data points on the grid. Basically a line chart.
+ - `geom_col()` and `geom_bar()` add bars to the grid based on values in the data. A bar chart. We'll use `geom_col()` later in this lesson but you can read about the difference between the two in a later chapter.
+ - `geom_text()` adds labels based on values in the data.
+- **labels** (or labs, since we use the `labs()` function for them) are a series of text-based items we can layer onto our plots like titles, bylines and axis names.
+- **themes** change the visual styles of the grids and axes. There are several available within ggplot and [many others from the R community](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/).
+
+We add layers onto the chart using the `+` at the end of a line. Think of the `+` as the ` %>% ` of ggplot.
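+
+One way to see the `+` at work is to save each step into its own object and add to it. This is only a sketch of the idea (the object names are made up, and in the lesson we write it as one chained chunk instead):
+
+```r
+library(ggplot2)
+
+# start with the base layer and save it
+base_plot <- ggplot(mpg, aes(x = displ, y = hwy))
+
+# add a layer of points with +
+points_plot <- base_plot + geom_point()
+
+# add a title on top of that with another +
+titled_plot <- points_plot + labs(title = "Engine size vs. highway mileage")
+
+# print the finished plot
+titled_plot
+```
+
+Each `+` returns a new plot that includes everything before it, which is why the order of the lines reads like a recipe.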
+
+### Add geom_point
+
+1. EDIT your plot chunk to add the `+` and a new line for `geom_point()`
+
+
+```r
+ggplot(mpg, aes(x = displ, y = hwy)) + # don't forget the + at the end of this line
+ geom_point() # the geom_point
+```
+
+
+
+The `geom_point()` function above is inheriting the `aes()` values from the line above it.
+
+### Adding other mappings
+
+We can add aesthetics to the plot as a whole (which we did with the x and y values above), and those will apply to all the geoms unless overridden.
+
+But we can also add aesthetics to specific geoms. We'll demonstrate this below.
+
+1. Edit your `geom_point()` function to add a color mapping to the points with `aes(color = class)`. `color` is the type of aesthetic, and `class` is a column in the data.
+
+
+```r
+ggplot(mpg, aes(x = displ, y = hwy)) +
+ geom_point(aes(color = class)) # this is the line you are editing
+```
+
+
+
+As you can see, the dots were given colors based on the values in the `class` column, and ggplot also added a legend to the graphic.
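+
+To see why it matters where the aesthetic goes, here is a sketch that compares the two placements. It adds a `geom_line()` layer purely for demonstration (a line through this data is not a useful chart), and the object names are made up.
+
+```r
+library(ggplot2)
+
+# color set for the whole plot: both geoms inherit it,
+# so the points are colored and there is one line per class
+color_global <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
+  geom_point() +
+  geom_line()
+
+# color set only inside geom_point(): the points are colored,
+# but geom_line() draws a single line through all the data
+color_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
+  geom_point(aes(color = class)) +
+  geom_line()
+
+# print each one to compare
+color_global
+color_local
+```
+
+Aesthetics inside `ggplot()` are shared by every geom, while aesthetics inside a geom apply only to that layer.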
+
+There are other aesthetics you can use.
+
+1. Change the `color` aesthetic to one of these values and run it to see how it affects the chart: `alpha`, `size` and `shape`. (i.e., `alpha = class`.)
+2. Once you've tried them, change it back to `color`.
+
+OK, enough of the basics ... let's build a chart you _might_ care about.
+
+## Let's build a bar chart
+
+We'll build some charts from our first-day survey where you told me your favorite Disney Princess and favored flavor of ice cream.
+
+We aren't going to create different notebooks or download the data to your computer ... we're just going to save it directly into a tibble.
+
+1. Start a new section: Princess preference chart.
+1. Use text to note that we'll use class data to build a chart.
+1. Add the code below to get the data.
+
+
+```r
+# read the data and fill the tibble: class
+class <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vRnSAx9eBoOGdZ3pMLZ2XhrBlgl56BeycxJwdTofmgfTBTZ7A1-LMuBxAI094aAZnCmeThPNXaU-xro/pub?gid=1648328850&single=true&output=csv")
+```
+
+```
+## Rows: 34 Columns: 3
+```
+
+```
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (3): name, princess, ice_cream
+```
+
+```
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+```
+
+```r
+# peek at the data
+class
+```
+
+```
+## # A tibble: 34 × 3
+## name princess ice_cream
+##
+## 1 Addie Rapunzel (Tangled) Cookie Dough
+## 2 Aisling Pocahontas Mint Chocolate Chip
+## 3 Alexis Jasmine (Aladdin) Cookie Dough
+## 4 Ana Belle (Beauty and the Beast) Cookies and Cream
+## 5 Andreana Mulan Strawberry
+## 6 Angelica Tiana (Princess and the Frog) Strawberry
+## 7 Ariana Merida (Brave) Coffee/Jamoca
+## 8 Cecilia Cinderella Mint Chocolate Chip
+## 9 Chandle Jasmine (Aladdin) Cookie Dough
+## 10 Chris Cinderella Cookie Dough
+## # … with 24 more rows
+```
+
+### Prepare the data
+
+While there are ways for ggplot to calculate values from your data on the fly, I much prefer to first build a table of the values I want plotted on a chart.
+
+Our goal here is to make a bar (or column) chart showing the number of votes for each princess from the data. So, we need to count the number of rows for each value ... our typical group_by/summarize/arrange process. I'm going to use the `count()` shortcut for our GSA here since we haven't used it much lately. I'm saving the summarized data into a new dataframe called `princess_data`. Follow along in your notebook:
+
+1. Add a new section: Princess chart
+1. Add text that you are creating a data frame to plot.
+2. Add the code below to create that data.
+
+
+```r
+princess_data <- class %>%
+ count(princess, name = "votes", sort = TRUE)
+ # this above line counts the princess rows, sets the name and sorts
+
+# peek at the data
+princess_data
+```
+
+```
+## # A tibble: 10 × 2
+## princess votes
+##
+## 1 Mulan 8
+## 2 Jasmine (Aladdin) 4
+## 3 Pocahontas 4
+## 4 Aurora (Sleeping Beauty) 3
+## 5 Belle (Beauty and the Beast) 3
+## 6 Cinderella 3
+## 7 Rapunzel (Tangled) 3
+## 8 Tiana (Princess and the Frog) 3
+## 9 Merida (Brave) 2
+## 10 Ariel (Little Mermaid) 1
+```
+
+I hope you understand what we've done here. We're counting the number of rows for each princess.
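+
+If the `count()` shortcut feels like magic, this sketch compares it with the longer group_by/summarize/arrange version on a tiny made-up tibble (`votes_example` is hypothetical data, not our class survey):
+
+```r
+library(dplyr)
+
+# made-up votes, just for illustration
+votes_example <- tibble(
+  princess = c("Mulan", "Mulan", "Cinderella", "Mulan", "Cinderella", "Pocahontas")
+)
+
+# the count() shortcut
+with_count <- votes_example %>%
+  count(princess, name = "votes", sort = TRUE)
+
+# the longer group_by/summarize/arrange version
+with_gsa <- votes_example %>%
+  group_by(princess) %>%
+  summarize(votes = n()) %>%
+  arrange(desc(votes))
+
+# peek at both results
+with_count
+with_gsa
+```
+
+Both tibbles should list the same princesses with the same vote totals, most votes first.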
+
+### Build our plot with geom_col
+
+Our first goal is to build the first layer of the plot ... basically tell ggplot what data we are using so it will build the grid to hold our plots.
+
+1. Add some text noting that you'll now plot.
+1. Add the following code chunk that starts the plot and run it.
+
+
+```r
+ggplot(princess_data, aes(x = princess, y = votes)) # sets our x and y axes
+```
+
+
+
+You'll see the grid and x/y axes of the data, but no geometries are applied yet. Don't worry yet about how it looks ... we'll get there.
+
+### Add the geom_col layer
+
+Now it is time to add our columns.
+
+1. Edit the plot code to add the ggplot pipe `+` and on the next line add `geom_col()`.
+
+
+```r
+ggplot(princess_data, aes(x = princess, y = votes)) + # don't forget the + on this line
+ geom_col() # adds the bars
+```
+
+
+
+This added our data to the plot, though there are a couple of issues:
+
+- We can't read the value names. We can fix this.
+- The order of the bars is alphabetical instead of in vote order. Again, we can fix it.
+
+### Flip the axis
+
+We can "flip" the axis to turn it sideways to read the labels. This can be a bit confusing later because the "x" axis is now going up/down.
+
+1. Edit your plot chunk to add the ggplot pipe `+` and `coord_flip()` on the next line.
+
+
+```r
+ggplot(princess_data, aes(x = princess, y = votes)) +
+ geom_col() + # don't forget the +
+ coord_flip() # flips the axis
+```
+
+
+
+### Reorder the bars
+
+The bars on our chart are in alphabetical order of the x axis (and reversed thanks to our flip.) We want to order the values based on the `votes` in the data.
+
+> Complication alert: Categorical data can have [factors](https://r4ds.had.co.nz/factors.html), which are like an internal ordering system. Some categories, like months in a year, have an "order" that is not alphabetical.
+
+We can reorder our categorical values in a plot by editing the `x` values in our `aes()` using `reorder()`. (There is a tidyverse function called `fct_reorder()` that works the same way.)
+
+`reorder()` takes two arguments: The column to reorder, and the column to base that reorder on. It can happen in two different ways, and I'll be honest and say I don't know which is easier to comprehend.
+
+- `x = reorder(princess, votes)` says "set the x axis as `princess`, but order as `votes`." OR ...
+- `x = princess %>% reorder(votes)` says "set the x axis as `princess` _and then_ reorder by `votes`."
+
+They both work. Even though I'm a fan of the tidyverse ` %>% ` construct, I'm going with the first version.
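+
+If you are curious what `reorder()` is doing behind the scenes, this small sketch uses made-up values to show how it changes the internal order (the "levels") of a category:
+
+```r
+# made-up example values, just for illustration
+princess <- c("Ariel", "Belle", "Mulan")
+votes <- c(3, 1, 8)
+
+# without reorder(), the categories are ordered alphabetically
+levels(factor(princess))
+
+# with reorder(), the categories are ordered by votes, smallest first
+levels(reorder(princess, votes))
+```
+
+The second result puts Belle first and Mulan last, because Belle has the fewest votes in this example. ggplot draws categories in that internal order, which on a flipped chart puts the largest value at the top.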
+
+1. Edit your chunk to reorder the bars.
+
+
+```r
+ggplot(princess_data, aes(x = reorder(princess, votes), y = votes)) + # this is the line you edit
+ geom_col() +
+ coord_flip()
+```
+
+
+
+### Add some titles, labels
+
+Now we'll add a **layer** of labels to our chart using the `labs()` function. You'll see we can add and change a number of things with `labs()`.
+
+- To add the labels to the bars, we use a `geom_text()` because we are actually plotting them on the graph. The example below also changes the color of the text and moves the labels to inside the bar with `hjust` (or horizontal justification. `vjust` would move it up and down).
+- The `labs()` function allows for labels "around" the chart. These are some standard values used.
+
+
+```r
+ggplot(princess_data, aes(x = reorder(princess, votes), y = votes)) +
+ geom_col() +
+ coord_flip() + # don't forget +
+ geom_text(aes(label = votes), hjust = 2, color = "white") + # plots votes text values on chart
+ # labs below has several settings
+ labs(
+ title = "Favored princess", # adds a title
+ subtitle = "Disney Princess votes from Reporting with Data, Fall 2021.", # adds a subtitle
+ caption = "By Christian McDonald", # adds the byline
+ x = "Princess choices", # renames the x axis label (which is really y since it is flipped)
+ y = "Number of votes" # renames the y axis label (which is really x since it is flipped)
+ )
+```
+
+
+
+There you go! You've made a chart showing how our classes rated Disney Princesses.
+
+## On your own: Ice cream!
+
+Now it is time for you to put these skills to work:
+
+1. Build a chart about the favorite ice creams from RWD classes.
+
+Some things to consider:
+
+- You need a new section, etc.
+- You're starting with the same `class` data
+- You need to prepare the data based on `ice_cream`
+- You need to build the chart
+
+It's essentially the same process we used for the princess chart, but using the `ice_cream` variable.
+
+## What we've learned
+
+There is a ton, really.
+
+- ggplot2 (which is really the `ggplot()` function) is the charting library for the tidyverse. This whole lesson was about it.
+
+Here are some more references for ggplot:
+
+- [The ggplot2 documentation](http://ggplot2.tidyverse.org/reference/index.html) and [ggplot2 cheatsheets](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf).
+- [R for Data Science, Chap 3.](https://r4ds.had.co.nz/data-visualisation.html) Hadley Wickham dives right into plots in his book.
+- [R Graphics Cookbook](https://r-graphics.org/) has lots of example plots. Good to harvest code and see how to do things.
+- [The R Graph Gallery](https://www.r-graph-gallery.com/) another place to see examples.
diff --git a/docs/07-plots_files/figure-html/mpg-base-1.png b/docs/07-plots_files/figure-html/mpg-base-1.png
new file mode 100644
index 00000000..16894c8d
Binary files /dev/null and b/docs/07-plots_files/figure-html/mpg-base-1.png differ
diff --git a/docs/07-plots_files/figure-html/mpg-color-1.png b/docs/07-plots_files/figure-html/mpg-color-1.png
new file mode 100644
index 00000000..e3fc64af
Binary files /dev/null and b/docs/07-plots_files/figure-html/mpg-color-1.png differ
diff --git a/docs/07-plots_files/figure-html/mpg-points-1.png b/docs/07-plots_files/figure-html/mpg-points-1.png
new file mode 100644
index 00000000..b746c76f
Binary files /dev/null and b/docs/07-plots_files/figure-html/mpg-points-1.png differ
diff --git a/docs/07-plots_files/figure-html/princess-base-1.png b/docs/07-plots_files/figure-html/princess-base-1.png
new file mode 100644
index 00000000..3b144624
Binary files /dev/null and b/docs/07-plots_files/figure-html/princess-base-1.png differ
diff --git a/docs/07-plots_files/figure-html/princess-col-1.png b/docs/07-plots_files/figure-html/princess-col-1.png
new file mode 100644
index 00000000..4796d02d
Binary files /dev/null and b/docs/07-plots_files/figure-html/princess-col-1.png differ
diff --git a/docs/07-plots_files/figure-html/princess-flipped-1.png b/docs/07-plots_files/figure-html/princess-flipped-1.png
new file mode 100644
index 00000000..b5b2d737
Binary files /dev/null and b/docs/07-plots_files/figure-html/princess-flipped-1.png differ
diff --git a/docs/07-plots_files/figure-html/princess-labs-1.png b/docs/07-plots_files/figure-html/princess-labs-1.png
new file mode 100644
index 00000000..e193b726
Binary files /dev/null and b/docs/07-plots_files/figure-html/princess-labs-1.png differ
diff --git a/docs/07-plots_files/figure-html/princess-reorder-1.png b/docs/07-plots_files/figure-html/princess-reorder-1.png
new file mode 100644
index 00000000..70ee31b4
Binary files /dev/null and b/docs/07-plots_files/figure-html/princess-reorder-1.png differ
diff --git a/docs/08-plots-more.md b/docs/08-plots-more.md
new file mode 100644
index 00000000..8ff0d741
--- /dev/null
+++ b/docs/08-plots-more.md
@@ -0,0 +1,529 @@
+# Deeper into ggplot {#ggplot-more}
+
+In the last chapter you were introduced to [ggplot2](https://ggplot2.tidyverse.org/index.html), the graphic function that is part of the tidyverse. With this chapter we'll walk through more ways to use ggplot. This will also serve as a reference for you. A huge hat tip to [Jo Lukito](https://www.jlukito.com/). One of her lessons inspired most of this.
+
+## References
+
+ggplot2 has a LOT to it and we'll cover only the basics. Here are some references you might use:
+
+- [ggplot cheatsheet](https://github.com/rstudio/cheatsheets/blob/master/data-visualization-2.1.pdf)
+- [R for Data Science](https://r4ds.had.co.nz/index.html)
+- [R Graphics Cookbook](https://r-graphics.org/)
+- [The R Graph Gallery](https://www.r-graph-gallery.com/) another place to see examples.
+- [ggplot2: Elegant graphics for Data Analysis](https://ggplot2-book.org/index.html)
+
+
+## Learning goals for this chapter
+
+Some things we'll touch on concerning ggplot:
+
+- Prepare and build a line chart
+- Using themes to change the look of our charts
+- Adding/changing aesthetics in layers
+- Facets (multiple charts from same data)
+- Saving files
+- Interactivity with Plotly
+
+## Set up your notebook
+
+We'll use the same `yourname-ggplot` project we used in the last chapter, but start a new RNotebook.
+
+1. Open your plot project.
+2. Start a new RNotebook. Add the goals listed above.
+3. Load the tidyverse package.
+
+
+
+### Let's get the data
+
+> I hope to demonstrate in class the creation of this first plot. Otherwise you should be able to follow along in the screencast.
+
+Again, we won't download the data ... we'll just import it and save it to a tibble. We are using data from a weekly project called #tidytuesday that the community uses to practice R. Perhaps in the near future we'll have our own #tidytuesday sessions!
+
+1. Start a new section to indicate you are importing the data
+2. Note in text it is from #tidytuesday
+3. Add the code chunk below, which will download a saved copy of the data.
+
+
+```r
+kids_data <- read_rds("https://github.com/utdata/rwdir/blob/main/data-raw/kids-data.rds?raw=true")
+
+# peek at the table
+kids_data
+```
+
+```
+## # A tibble: 23,460 × 6
+## state variable year raw inf_adj inf_adj_perchild
+##
+## 1 Alabama PK12ed 1997 3271969 4665308. 3.93
+## 2 Alaska PK12ed 1997 1042311 1486170 7.55
+## 3 Arizona PK12ed 1997 3388165 4830986. 3.71
+## 4 Arkansas PK12ed 1997 1960613 2795523 3.89
+## 5 California PK12ed 1997 28708364 40933568 4.28
+## 6 Colorado PK12ed 1997 3332994 4752320. 4.38
+## 7 Connecticut PK12ed 1997 4014870 5724568. 6.70
+## 8 Delaware PK12ed 1997 776825 1107629. 5.63
+## 9 District of Columbia PK12ed 1997 544051 775730. 6.11
+## 10 Florida PK12ed 1997 11498394 16394885 4.45
+## # … with 23,450 more rows
+```
+
+```r
+# glimpse it
+kids_data %>% glimpse()
+```
+
+```
+## Rows: 23,460
+## Columns: 6
+## $ state "Alabama", "Alaska", "Arizona", "Arkansas", "Californ…
+## $ variable "PK12ed", "PK12ed", "PK12ed", "PK12ed", "PK12ed", "PK…
+## $ year 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997, 1997,…
+## $ raw 3271969, 1042311, 3388165, 1960613, 28708364, 3332994…
+## $ inf_adj 4665308.5, 1486170.0, 4830985.5, 2795523.0, 40933568.…
+## $ inf_adj_perchild 3.929449, 7.548493, 3.706679, 3.891275, 4.282325, 4.3…
+```
+
+If you want to learn more about this dataset you can find [information here](https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-09-15/readme.md). In short: "This dataset provides a comprehensive accounting of public spending on children from 1997 through 2016." Included is spending for higher education (the `highered` values in the column `variable`.)
+
+We'll filter this data to get to our data of interest: How much does Texas (and some neighboring states) spend on higher education?
+
+## Make a line chart of the Texas data
+
+Our first goal here is to plot the "inflation adjusted spending per child" for higher education in Texas.
+
+### Prepare the data
+
+We need to filter our data to include just the `highered` values for Texas. We're going to save that filtered data into a new tibble.
+
+1. Add a new section indicating we are building a chart to show higher education spending in Texas.
+2. Note that we are preparing the data.
+3. Add the chunk below and run it.
+
+
+```r
+tx_hied <- kids_data %>%
+ filter(
+ variable == "highered",
+ state == "Texas"
+ )
+
+# peek at the data
+tx_hied
+```
+
+```
+## # A tibble: 20 × 6
+## state variable year raw inf_adj inf_adj_perchild
+##
+## 1 Texas highered 1997 3940232 5618146. 0.944
+## 2 Texas highered 1998 4185619 5895340. 0.970
+## 3 Texas highered 1999 4578617 6368075 1.03
+## 4 Texas highered 2000 4810358 6554888. 1.05
+## 5 Texas highered 2001 5684406. 7564852 1.20
+## 6 Texas highered 2002 6558453 8589044 1.34
+## 7 Texas highered 2003 6584970. 8462055 1.31
+## 8 Texas highered 2004 6611486 8290757 1.27
+## 9 Texas highered 2005 7180804 8730524 1.32
+## 10 Texas highered 2006 7744386 9119121 1.34
+## 11 Texas highered 2007 7540724 8644579 1.25
+## 12 Texas highered 2008 8914255 10011380 1.42
+## 13 Texas highered 2009 10039289 11145217 1.55
+## 14 Texas highered 2010 13097474 14413453 1.99
+## 15 Texas highered 2011 13366868 14416946 1.97
+## 16 Texas highered 2012 13999386 14828316 2.01
+## 17 Texas highered 2013 14520493 15124855 2.04
+## 18 Texas highered 2014 16101982 16470816 2.19
+## 19 Texas highered 2015 16591235 16773450 2.21
+## 20 Texas highered 2016 15507047 15507047 2.02
+```
+
+### Plot the chart
+
+I want you to create the plot here one step at a time so you can review how the layers are added.
+
+1. Add and run the ggplot() line first (but without the `+`)
+2. Then add the `+` and the `geom_point()` and run it.
+3. Then add the `+` and `geom_line()` and run it.
+4. When you add the `labs()` add all the lines at once and run it.
+
+
+```r
+ggplot(tx_hied, aes(x = year, y = inf_adj_perchild)) + # we create our graph
+ geom_point() + # adding the points
+ geom_line() + # adding the lines between points
+ labs(
+ title = "School spending slips",
+ subtitle = "Texas spent less per child on higher education in 2016.",
+ x = "Year", y = "$ per child (Adjusted for Inflation)",
+ caption = "Source: tidykids"
+ )
+```
+
+
+
+```r
+ # labs above add text layer on top of graph
+```
+
+We have a pretty decent chart showing `year` on our x axis and `inf_adj_perchild` (or inflation-adjusted spending per child) on our y axis.
+
+### Saving plots as an object
+
+Sometimes it is helpful to push the results of a plot into an R object to "save" those configurations. You can continue to add layers after, but don't have to rebuild the basic chart each time. We'll do that here so we can explore themes next.
+
+1. Edit your Texas plot chunk you made earlier to save it into an R object, and then call `tx_plot` after it so you can see it.
+
+
+```r
+# the line below pushes the graph results into tx_plot
+tx_plot <- ggplot(tx_hied, aes(x = year, y = inf_adj_perchild)) +
+ geom_point() +
+ geom_line() +
+ labs(
+ title = "School spending slips",
+ subtitle = "Texas spent less per child on higher education in 2016.",
+ x = "Year", y = "$ per child (Adjusted for Inflation)",
+ caption = "Source: tidykids"
+ )
+
+# Since we saved the plot into an R object above, we have to call it again to see it.
+# We save graphs like this so we can reuse them.
+tx_plot
+```
+
+
+
+We can continue to build upon the `tx_plot` object like we do below with themes, but those changes won't be "saved" into the R environment unless you assign it to an R object.
+
+## Themes
+
+The _look_ of the graph is controlled by the theme. There are a number of preset themes you can use. Let's look at a couple.
+
+1. Create a new section saying we'll explore themes
+2. Add the chunk below and run it.
+
+
+```r
+tx_plot +
+ theme_minimal()
+```
+
+
+
+This takes our existing `tx_plot` and then applies the `theme_minimal()` look to it.
+
+There are a number of themes built into ggplot2, though most are pretty simple.
+
+1. Edit your existing chunk to try different themes. Some you might try are `theme_classic()`, `theme_dark()` and `theme_void()`.
+
+### More with ggthemes
+
+There are a number of other packages that build upon `ggplot2`, including [`ggthemes`](https://yutannihilation.github.io/allYourFigureAreBelongToUs/ggthemes/).
+
+1. In your R console, install the ggthemes package: `install.packages("ggthemes")`
+2. Add the `library(ggthemes)` at the top of your current chunk.
+3. Update the theme line to view some of the other options noted below.
+
+
+```r
+library(ggthemes)
+tx_plot +
+ theme_economist()
+```
+
+
+
+
+```r
+tx_plot +
+ theme_fivethirtyeight()
+```
+
+
+
+
+```r
+tx_plot +
+ theme_stata()
+```
+
+
+
+### There is more to themes
+
+There is also a `theme()` function that allows you to individually [adjust just about every visual element](https://ggplot2.tidyverse.org/reference/theme.html) on your plot.
+
+We do a wee bit of that later.
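+
+As a small taste, here is a rough sketch of `theme()` adjusting two elements. The tiny `demo_data` tibble is a stand-in I built from three Texas rows shown earlier so the chunk runs by itself, and `demo_plot` is just an illustrative name. In your notebook you could add the same `theme()` call to `tx_plot` instead.
+
+```r
+library(tidyverse)
+
+# stand-in data: three Texas rows copied from the tx_hied output above
+demo_data <- tibble(
+  year = c(2014, 2015, 2016),
+  inf_adj_perchild = c(2.19, 2.21, 2.02)
+)
+
+demo_plot <- ggplot(demo_data, aes(x = year, y = inf_adj_perchild)) +
+  geom_point() +
+  geom_line() +
+  labs(title = "Texas higher education spending") +
+  theme(
+    plot.title = element_text(face = "bold"), # make the title bold
+    panel.grid.minor = element_blank()        # hide the minor grid lines
+  )
+
+demo_plot
+```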
+
+## Adding more information
+
+OK, our Texas higher education spending is fine ... but how does that compare to neighboring states? Let's work through building a new chart that shows all those states.
+
+### Prepare the data
+
+We need to go back to our original `kids_data` to get the additional states.
+
+1. Start a new section that notes we are building a chart for five states.
+2. Note that we'll first prepare the data.
+
+
+```r
+five_hied <- kids_data %>%
+ filter(
+ variable == "highered",
+ state %in% c("Texas", "Oklahoma", "Arkansas", "New Mexico", "Louisiana")
+ )
+
+five_hied
+```
+
+```
+## # A tibble: 100 × 6
+## state variable year raw inf_adj inf_adj_perchild
+##
+## 1 Arkansas highered 1997 457171 651853. 0.907
+## 2 Louisiana highered 1997 672364 958684. 0.731
+## 3 New Mexico highered 1997 639409 911696. 1.68
+## 4 Oklahoma highered 1997 624053 889800. 0.942
+## 5 Texas highered 1997 3940232 5618146. 0.944
+## 6 Arkansas highered 1998 477757 672909. 0.930
+## 7 Louisiana highered 1998 747739 1053172. 0.805
+## 8 New Mexico highered 1998 667738 940492. 1.74
+## 9 Oklahoma highered 1998 690234 972177. 1.02
+## 10 Texas highered 1998 4185619 5895340. 0.970
+## # … with 90 more rows
+```
+
+Note we used our `%in%` filter to get any state listed in `c()`.
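+
+If `%in%` still feels fuzzy, it can help to try it on plain vectors outside of `filter()`. This is only a sketch, and the `states` and `neighbors` names are made up for the demonstration.
+
+```r
+# a few states to test, and the list we want to match against
+states <- c("Texas", "Kansas", "Oklahoma")
+neighbors <- c("Texas", "Oklahoma", "Arkansas", "New Mexico", "Louisiana")
+
+# %in% checks each value on the left against the whole list on the right
+states %in% neighbors
+```
+
+You get one `TRUE` or `FALSE` for each value in `states`. Inside `filter()`, the rows that come back `TRUE` are the ones that are kept.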
+
+### Plot multiple line chart
+
+Let's add a different line for each state. To do this you would use the color aesthetic `aes()` in the `geom_line()` geom. Recall that geoms can have their own `aes()` variable information. This is especially useful for working with a third variable (like when making a stacked bar chart or line plot with multiple lines). Notice that the color aesthetic (meaning that it is in aes) takes a variable, not a color. You can learn how to change these colors [here](http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually).
+
+1. Add a note that we'll now build the chart.
+2. Add the code chunk below and run it. Look through the comments so you understand it.
+
+
+```r
+ggplot(five_hied, aes(x = year, y = inf_adj_perchild)) +
+ geom_point() +
+ geom_line(aes(color = state)) + # The aes selects a color for each state
+ labs(
+ title = "Spending on Higher Education in Texas, Bordering States",
+ x = "Year",
+ y = "$ per child (Adjusted for Inflation)",
+ caption = "Source: tidykids"
+ )
+```
+
+
+
+Notice that R changes the color of the line, but not the point? This is because we only included the aesthetic in the `geom_line()` geom and not the `geom_point()` geom.
+
+1. Edit your `geom_point()` to add `aes(color = state)`.
+
+
+```r
+ggplot(five_hied, aes(x = year, y = inf_adj_perchild)) +
+ geom_point(aes(color = state)) + # add the aes here
+ geom_line(aes(color = state)) +
+ labs(title = "Spending on Higher Education in Texas, Bordering States",
+ x = "Year", y = "$ per child (Adjusted for Inflation)",
+ caption = "Source: tidykids")
+```
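+
+If you want to pick the colors yourself instead of taking the defaults, one option is ggplot2's `scale_color_manual()`. The sketch below uses a small stand-in tibble (`two_states`, four rows copied from the `five_hied` output above) so it runs by itself, and the color names are arbitrary choices of mine.
+
+```r
+library(tidyverse)
+
+# stand-in data: four rows copied from the five_hied output above
+two_states <- tibble(
+  state = c("Oklahoma", "Texas", "Oklahoma", "Texas"),
+  year = c(1997, 1997, 1998, 1998),
+  inf_adj_perchild = c(0.942, 0.944, 1.02, 0.970)
+)
+
+color_plot <- ggplot(two_states, aes(x = year, y = inf_adj_perchild)) +
+  geom_point(aes(color = state)) +
+  geom_line(aes(color = state)) +
+  # name each state and give it a color
+  scale_color_manual(values = c("Oklahoma" = "firebrick", "Texas" = "darkorange"))
+
+color_plot
+```
+
+The aesthetic still takes the variable (`state`). The scale is where you decide which color each value gets.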
+
+
+
+## On your own: Line chart
+
+I want you to make a line chart of preschool-to-high-school spending (the "PK12ed" value in the `variable` column) showing the inflation adjusted per-child spending (the `inf_adj_perchild` column) for the five states that border the Gulf of Mexico. This is very similar to the chart you just made, but with different values.
+
+Some things to do/consider:
+
+1. Do this in a new section and explain it.
+1. You'll need to prepare the data just like we did above to get the right data points and the right states.
+2. I really suggest you build both chunks (the data prep and the chart) one line at a time so you can see what each step adds.
+1. Save the resulting plot into a new R object because we'll use it later.
+
+
+
+## Tour of some other adjustments
+
+You don't have to add these examples below to your own notebook, but here are some examples of other things you can control.
+
+### Line width
+
+
+```r
+ggplot(five_hied, aes(x = year, y = inf_adj_perchild)) +
+ geom_point(aes(color = state)) +
+ geom_line(aes(color = state), size = 1.5) + # added size here
+ labs(title = "Spending on Higher Education in Texas, Bordering States",
+ x = "Year", y = "$ per child (Adjusted for Inflation)",
+ caption = "Source: tidykids")
+```
+
+
+
+### Line type
+
+This example removes the points and adds `linetype = state` to the `aes()` inside `geom_line()`, next to the color. This gives each state a different type of line as well as a different color.
+
+
+```r
+ggplot(five_hied, aes(x = year, y = inf_adj_perchild)) +
+ geom_line(aes(color = state, linetype = state), size = .75) +
+ labs(title = "Spending on Higher Education in Texas, Bordering States",
+ x = "Year", y = "$ per child (Adjusted for Inflation)",
+ caption = "Source: tidykids")
+```
+
+
+
+### Adjust axis
+
+`ggplot()` typically makes assumptions about the scale of each axis. Sometimes you may want to change those limits (e.g., make the range a little larger). There are a couple of different ways to do this. The most straightforward may be `xlim()` and `ylim()`.
+
+
+```r
+ggplot(five_hied, aes(x = year, y = inf_adj_perchild, linetype = state)) +
+ geom_line(aes(color = state), size = .75) +
+ xlim(1995, 2020) + # sets minimum and maximum values on axis
+ labs(title = "Spending on Higher Education in Texas, Bordering States",
+ x = "Year", y = "$ per child (Adjusted for Inflation)",
+ caption = "Source: tidykids")
+```
+
+
+
+The functions `xlim()` and `ylim()` are shortcuts for `scale_x_continuous()` and `scale_y_continuous()`, which [do more things](https://ggplot2.tidyverse.org/reference/scale_continuous.html#examples).
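+
+Here is a rough sketch of one of those "more things": setting the limits and choosing where the tick marks go in the same call. The `demo_data` tibble is a stand-in built from three Texas rows shown earlier, and `axis_plot` is just an illustrative name.
+
+```r
+library(tidyverse)
+
+# stand-in data: three Texas rows copied from the tx_hied output above
+demo_data <- tibble(
+  year = c(1997, 2006, 2016),
+  inf_adj_perchild = c(0.944, 1.34, 2.02)
+)
+
+axis_plot <- ggplot(demo_data, aes(x = year, y = inf_adj_perchild)) +
+  geom_line() +
+  scale_x_continuous(
+    limits = c(1995, 2020),           # same job as xlim(1995, 2020)
+    breaks = seq(1995, 2020, by = 5)  # put a tick mark every five years
+  )
+
+axis_plot
+```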
+
+## Facets
+
+Facets are a way to make multiple graphs based on a variable in the data. There are two types, the `facet_wrap()` and the `facet_grid()`. There is a good explanation of these in [R for Data Science](https://r4ds.had.co.nz/data-visualisation.html?q=facet#facets).
+
+We'll start by creating a base graph and then apply the facet.
+
+1. Start a new section about facets
+2. Add the code below to create your chart and view it.
+
+
+```r
+five_plot <- ggplot(five_hied, aes(x = year,
+ y = inf_adj_perchild)) +
+ geom_line(aes(color = state)) +
+ geom_point(aes(color = state)) +
+ labs(title = "Spending on Higher Education in Texas, Bordering States",
+ x = "Year", y = "$ per child (Adjusted for Inflation)",
+ caption = "Source: tidykids")
+
+five_plot
+```
+
+
+
+### Facet wrap
+
+The `facet_wrap()` function splits your chart based on a single variable. You define which variable to split upon with `~` followed by the variable name.
+
+1. Add a new chunk and create the facet wrap shown here.
+
+
+```r
+five_plot +
+ facet_wrap(~ state) +
+ theme(legend.position = "none") # removes the legend. Try it without it!
+```
+
+
+
+A couple of notes about the above code:
+
+- Note the comment in the code above where we used the `theme()` function to remove the legend.
+- You can specify the number of rows or columns of the grouping by adjusting the `facet_wrap()` function: `facet_wrap(~ state, nrow = 2)` or `facet_wrap(~ state, ncol = 2)`. Try them!
+
+### Facet grids
+
+A `facet_grid()` allows you to plot on a combination of variables. We don't really have two variables to split on in our higher education data, so we'll show this with the `mpg` data we've used before.
+
+1. Start a new section noting you'll try facet grid.
+2. Add the chunk below and run it.
+
+Explanations follow the chart.
+
+
+```r
+ggplot(mpg) +
+ geom_point(aes(x = displ, y = hwy)) + # add points to the chart
+ facet_grid(drv ~ cyl) # splits into charts by drive train and cylinder
+```
+
+
+
+This chart is kinda hard to read, but let's try:
+
+- Inside the mini charts, the best gas mileage is toward the top (from `hwy`) and the smaller engines are to the left (from `displ`.)
+- The rows of charts are divided by drive train `drv`: four-wheel drive, front-wheel drive and rear-wheel drive.
+- The columns of charts are divided by cylinders: like a 4-cylinder car vs 8-cylinder car.
+
+This chart tells us that 4-cylinder, front-wheel drive cars with smaller engines get the best gas mileage. The blank charts mean that combination of values didn't exist in the data.
+
+## On your own: Facet wrap
+
+1. Create a section about doing a facet wrap on your own.
+1. Take the "On your own" plot that you made earlier (The school spending for Gulf states) and apply a `facet_wrap()` here. You were instructed to save the plot into an R object, so you should be able to use that.
+1. Remove the legend since each mini chart is labeled.
+
+
+
+
+## Saving plots
+
+To save plots as images, you can right-click plots that you make in RNotebooks. Or, you can use the export button in the Plot pane. Or (and this is a preferred strategy), you can save them using `ggsave()`. ([Learn more here](https://ggplot2.tidyverse.org/reference/ggsave.html)).
+
+1. Use your Files pane to create a new folder called "images" so we can save our chart there.
+2. Start a section on saving plots and add the following chunk.
+
+
+```r
+ggsave("images/txplot.png", plot = tx_plot)
+```
+
+```
+## Saving 7 x 5 in image
+```
+
+Using `ggsave()` creates a higher-res image than other methods. It needs:
+
+- The path and name of the image, in quotes
+- The `plot =` argument to say which plot you are saving. (Your plot must already be saved into an R object for this method to work.)
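+
+You can also control the size and resolution of the saved image. This is only a sketch: `demo_plot` and `out_file` are illustrative names, the data is three Texas values from earlier, and I write to a temporary folder so the chunk runs anywhere. In your notebook you would use a path like `"images/txplot.png"` and `plot = tx_plot`.
+
+```r
+library(tidyverse)
+
+# a small stand-in plot so the chunk runs by itself
+demo_plot <- ggplot(
+  tibble(year = c(2014, 2015, 2016), spending = c(2.19, 2.21, 2.02)),
+  aes(x = year, y = spending)
+) +
+  geom_line()
+
+# write into a temporary folder for this demonstration
+out_file <- file.path(tempdir(), "demo-plot.png")
+
+ggsave(
+  out_file,
+  plot = demo_plot,
+  width = 8, height = 4.5, units = "in", # size of the image
+  dpi = 300                              # dots per inch, higher is sharper
+)
+```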
+
+## Interactive plots
+
+Want to make your plot interactive? You can use [plotly](https://plotly.com/r/)'s `ggplotly()` function to transform your graph into an interactive chart.
+
+To use plotly, you’ll want to install the plotly package, add the library, and then use the ggplotly() function:
+
+1. In your R Console, run `install.packages("plotly")`. (You only have to do this once on your computer.)
+1. Add a new section to note you are creating an interactive chart.
+2. Add the code below and run it. Then play with the chart!
+
+```r
+library(plotly)
+
+tx_plot %>%
+ ggplotly()
+```
+
+(We can't show the interactive version in this book.)
+
+Now you have tool tips on your points when you hover over them.
+
+The `ggplotly()` function is not perfect. Alternatively, you can use plotly's own syntax to build some quite interesting charts, but it's a whole [new syntax to master](https://plotly.com/r/).
+
+## What we learned
+
+There is so much more to ggplot2 than what we've shown here, but these are the basics that should get you through the class. At the top of this chapter is a list of other resources to learn more.
+
diff --git a/docs/08-plots-more_files/figure-html/adjust-axis-1.png b/docs/08-plots-more_files/figure-html/adjust-axis-1.png
new file mode 100644
index 00000000..b737fe11
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/adjust-axis-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/facet-data-1.png b/docs/08-plots-more_files/figure-html/facet-data-1.png
new file mode 100644
index 00000000..7d040e46
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/facet-data-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/facet-grid-1.png b/docs/08-plots-more_files/figure-html/facet-grid-1.png
new file mode 100644
index 00000000..4b2eb9b9
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/facet-grid-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/facet-wrap-1.png b/docs/08-plots-more_files/figure-html/facet-wrap-1.png
new file mode 100644
index 00000000..23d09272
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/facet-wrap-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/five-hied-lcolor-1.png b/docs/08-plots-more_files/figure-html/five-hied-lcolor-1.png
new file mode 100644
index 00000000..473da257
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/five-hied-lcolor-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/five-hied-pcolor-1.png b/docs/08-plots-more_files/figure-html/five-hied-pcolor-1.png
new file mode 100644
index 00000000..1bd6998f
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/five-hied-pcolor-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/line-type-1.png b/docs/08-plots-more_files/figure-html/line-type-1.png
new file mode 100644
index 00000000..0a31da65
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/line-type-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/line-width-1.png b/docs/08-plots-more_files/figure-html/line-width-1.png
new file mode 100644
index 00000000..06a84cf2
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/line-width-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/theme-538-1.png b/docs/08-plots-more_files/figure-html/theme-538-1.png
new file mode 100644
index 00000000..8f02a31a
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/theme-538-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/theme-economist-1.png b/docs/08-plots-more_files/figure-html/theme-economist-1.png
new file mode 100644
index 00000000..4cc5e500
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/theme-economist-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/theme-minimal-1.png b/docs/08-plots-more_files/figure-html/theme-minimal-1.png
new file mode 100644
index 00000000..9a84ac65
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/theme-minimal-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/theme-stata-1.png b/docs/08-plots-more_files/figure-html/theme-stata-1.png
new file mode 100644
index 00000000..6bb8d573
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/theme-stata-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/tx-hied-chart-1.png b/docs/08-plots-more_files/figure-html/tx-hied-chart-1.png
new file mode 100644
index 00000000..3a82302f
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/tx-hied-chart-1.png differ
diff --git a/docs/08-plots-more_files/figure-html/tx-plot-create-1.png b/docs/08-plots-more_files/figure-html/tx-plot-create-1.png
new file mode 100644
index 00000000..3a82302f
Binary files /dev/null and b/docs/08-plots-more_files/figure-html/tx-plot-create-1.png differ
diff --git a/docs/09-tidy-data.md b/docs/09-tidy-data.md
new file mode 100644
index 00000000..35c83fe7
--- /dev/null
+++ b/docs/09-tidy-data.md
@@ -0,0 +1,405 @@
+# Tidy data {#tidy-data}
+
+Data "shape" can be important when you are trying to work with and visualize data. In this chapter we'll discuss "tidy" data and how this style of organization helps us.
+
+> Slides by Hadley Wickham are used with permission from the author.
+
+## Goals for this section
+
+- Explore what it means to have "tidy" data.
+- Learn about and use `pivot_longer()`, `pivot_wider()` to make our data tidy.
+- Use Skittles to explore shaping data.
+
+## The questions we'll answer
+
+- Are candy colors evenly distributed within a package of Skittles? (The mean of candies by color over all packages)
+- Plot a column chart showing the average number of colored candies among all packages using ggplot
+- Plot the same data using Datawrapper.
+- Bonus 1: Who got the most candies in their bag?
+- Bonus 2: What is the average number of candies in a bag?
+
+## What is tidy data
+
+"Tidy" data is well formatted so each variable is in a column, each observation is in a row and each value is in a cell. Our first step in working with any data is to make sure we are "tidy".
+
+![Tidy data definition](images/tidy-example.png)
+
+
+
+
+
+It's easiest to see the difference through examples. The data frame below is of tuberculosis reports from the World Health Organization.
+
+- Each row is a set of observations (or case) from a single country for a single year.
+- Each column describes a unique variable: the year, the number of cases and the population of the country at that time.
+
+![A tidy table](images/tidy-table-tidy.png)
+
+
+Table2 below isn't tidy. The **count** column contains two different types of values.
+
+![An untidy table](images/tidy-table-nottidy.png)
+
+When our data is tidy, it is easy to manipulate. We can use functions like `mutate()` to calculate new values for each case.
+
+![Manipulate a tidy table](images/tidy-table-manipulate.png)
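+
+To try that idea in code, the tidyr package ships with a small tidy tuberculosis table called `table1`. The sketch below assumes that is close to the table in the slides. The `rate` column and the `tb_rates` name are my own choices for illustration.
+
+```r
+library(tidyverse)
+
+# table1 is a small example dataset that comes with tidyr
+tb_rates <- table1 %>%
+  mutate(rate = cases / population * 10000) # cases per 10,000 people
+
+tb_rates
+```
+
+Because each row is one country in one year, a single `mutate()` calculates the rate for every case at once.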
+
+## Tidyr package
+
+When our data is tidy, we can use the [tidyr](https://tidyr.tidyverse.org/) package to reshape the layout of our data to suit our needs. It gets loaded with `library(tidyverse)`.
+
+In the figure below, the table on the left is "wide". There are multiple year columns describing the same variable. It might be useful if we want to calculate the difference of the values for two different years. It's less useful if we want to plot it on a chart because we don't have columns to map as X and Y values.
+
+The table on the right is "long", in that each column describes a single variable. It is this shape we need when we want to plot values on a chart. We can then set our "Year" column as an X axis, our "n" column on our Y axis, and group by the "Country".
+
+![Wide vs long](images/tidy-wide-vs-long.png)
+
+## The tidyr verbs
+
+The two functions we'll use most to reshape our data are:
+
+- [pivot_longer()](https://tidyr.tidyverse.org/reference/pivot_longer.html) "lengthens" data, increasing the number of rows and decreasing the number of columns.
+- [pivot_wider()](https://tidyr.tidyverse.org/reference/pivot_wider.html) "widens" data, increasing the number of columns and decreasing the number of rows.
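+
+Here is a minimal sketch of the two verbs undoing each other. The `long_demo` tibble is made up for the demonstration (two bags, two colors), and all of the object names are just illustrative.
+
+```r
+library(tidyverse)
+
+# a made-up "long" table: one row per bag per color
+long_demo <- tibble(
+  bag = c("A", "A", "B", "B"),
+  color = c("red", "green", "red", "green"),
+  candies = c(12, 11, 13, 15)
+)
+
+# widen: one column per color, one row per bag
+wide_demo <- long_demo %>%
+  pivot_wider(names_from = color, values_from = candies)
+
+# lengthen: back to one row per bag per color
+long_again <- wide_demo %>%
+  pivot_longer(cols = red:green, names_to = "color", values_to = "candies")
+
+wide_demo
+long_again
+```
+
+The wide version has fewer rows and more columns, and the long version has more rows and fewer columns, but they hold the same values.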
+
+Again, the best way to learn this is to present the problem and solve it with explanation.
+
+## Prepare our Skittles project
+
+Start a new project to explore this subject.
+
+1. Create a new project and call it: `yourname-skittles`
+1. No need to create folders. We'll just load data directly into the notebook.
+2. Start a new RNotebook and edit the headline
+3. Create your setup block and load the libraries below.
+
+
+```r
+library(tidyverse)
+library(janitor)
+library(lubridate)
+```
+
+### Get the data
+
+We'll just load this data directly from Google Sheets into this notebook.
+
+1. Add a section that you are importing data.
+1. Add this import chunk.
+
+
+```r
+data <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vTxm9NxK67thlGjYOQBo_0JRvx2d137xt0nZGffqR6P1vl8QrlTUduiOsDJ2FKF6yLgQAQphVZve76z/pub?output=csv") %>% clean_names()
+```
+
+```
+## Rows: 124 Columns: 7
+```
+
+```
+## ── Column specification ────────────────────────────────────────────────────────
+## Delimiter: ","
+## chr (2): Timestamp, Name
+## dbl (5): Red, Green, Orange, Yellow, Purple
+```
+
+```
+##
+## ℹ Use `spec()` to retrieve the full column specification for this data.
+## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+```
+
+```r
+# peek at the data
+data %>% glimpse()
+```
+
+```
+## Rows: 124
+## Columns: 7
+## $ timestamp "7/27/2020 18:21:19", "8/1/2020 16:04:56", "7/30/2020 14:56:…
+## $ name "Alora Jones", "Alyssa Hiarker", "Annie Patton", "Christian …
+## $ red 12, 13, 12, 9, 7, 10, 12, 11, 7, 18, 13, 11, 10, 14, 8, 8, 1…
+## $ green 11, 15, 12, 10, 12, 14, 15, 5, 10, 9, 11, 13, 7, 12, 15, 10,…
+## $ orange 12, 10, 8, 17, 13, 9, 12, 17, 10, 13, 11, 7, 12, 10, 20, 13,…
+## $ yellow 9, 9, 10, 6, 11, 11, 10, 14, 21, 7, 11, 7, 15, 9, 4, 14, 10,…
+## $ purple 15, 15, 18, 18, 17, 16, 12, 13, 14, 13, 12, 21, 16, 14, 10, …
+```
+
+We cleaned the column names on import. The `timestamp` column is text, not a real date, so we need to fix that.
+
+### Fix the date
+
+We're going to convert the `timestamp` text into a real datetime and then pull just the date out of it.
+
+1. Create a section and note you are fixing dates.
+2. Add this chunk and run it. I'll explain it below.
+
+
+```r
+skittles <- data %>%
+ mutate(
+ date_entered = mdy_hms(timestamp) %>% date()
+ ) %>%
+ select(-timestamp)
+
+skittles %>% glimpse()
+```
+
+```
+## Rows: 124
+## Columns: 7
+## $ name "Alora Jones", "Alyssa Hiarker", "Annie Patton", "Christi…
+## $ red 12, 13, 12, 9, 7, 10, 12, 11, 7, 18, 13, 11, 10, 14, 8, 8…
+## $ green 11, 15, 12, 10, 12, 14, 15, 5, 10, 9, 11, 13, 7, 12, 15, …
+## $ orange 12, 10, 8, 17, 13, 9, 12, 17, 10, 13, 11, 7, 12, 10, 20, …
+## $ yellow 9, 9, 10, 6, 11, 11, 10, 14, 21, 7, 11, 7, 15, 9, 4, 14, …
+## $ purple 15, 15, 18, 18, 17, 16, 12, 13, 14, 13, 12, 21, 16, 14, 1…
+## $ date_entered 2020-07-27, 2020-08-01, 2020-07-30, 2020-06-23, 2020-07-…
+```
+
+Let's talk just a minute about what we've done here:
+
+- We name our new tibble.
+- We are filling that tibble starting with our imported data called `data`.
+- We use mutate to create a new column `date_entered`, then fill it by first converting the text to an official timestamp datatype (which requires the lubridate function `mdy_hms()`), and then we extract just the date of that with `date()`.
+- We then use `select()` to remove the old timestamp column.
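+
+It can help to see those two lubridate steps on a single value. This sketch uses the first timestamp from the data, and the `stamp` and `parsed` names are just for the demonstration.
+
+```r
+library(lubridate)
+
+# text, copied from the first row of the data
+stamp <- "7/27/2020 18:21:19"
+
+# mdy_hms() reads month/day/year hours:minutes:seconds into a real datetime
+parsed <- mdy_hms(stamp)
+
+# date() keeps just the date part
+date(parsed)
+```
+
+The function name spells out the order of the pieces in the text. If the text were year-month-day, you would reach for `ymd_hms()` instead.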
+
+### Peek at the wide table
+
+Let's look closer at this:
+
+
+```r
+skittles %>% head()
+```
+
+```
+## # A tibble: 6 × 7
+## name red green orange yellow purple date_entered
+##
+## 1 Alora Jones 12 11 12 9 15 2020-07-27
+## 2 Alyssa Hiarker 13 15 10 9 15 2020-08-01
+## 3 Annie Patton 12 12 8 10 18 2020-07-30
+## 4 Christian McDonald 9 10 17 6 18 2020-06-23
+## 5 Claudia Ng 7 12 13 11 17 2020-07-30
+## 6 Cristina Pop 10 14 9 11 16 2020-07-22
+```
+
+This is not the worst example of data. It could be useful to create a "total" column, but there are better ways to do this with **long** data.
+
+## Pivot longer
+
+What we want here is five rows for Alora Jones, with a column for "color" and a column for "candies".
+
+The `pivot_longer()` function needs several arguments:
+
+- Which columns do you want to pivot? For us, these are the color columns.
+- What do you want to name the new column to describe the column names? For us we want to name this "color" since that's what those columns described.
+- What do you want to name the new column to describe the values that were in the cells? For us we want to call this "candies" since these are the number of candies in each bag.
+
+There are a number of ways we can describe which columns to pivot ... anything in [tidy-select](https://tidyr.tidyverse.org/reference/tidyr_tidy_select.html) works. You can see a bunch of [examples here](https://tidyr.tidyverse.org/reference/pivot_longer.html#examples).
+
+We are using a range, naming the first column `red` and the last column `purple` with `:` in between. This only works because those columns are all together. We could also use `cols = !c(name, date_entered)` to say everything but those two columns.
+
+1. Add a note that you are pivoting the data
+1. Add the chunk below and run it
+
+
+```r
+skittles_long <- skittles %>%
+ pivot_longer(
+ cols = red:purple, # sets which columns to pivot based on their names
+ names_to = "color", # sets column name for color
+ values_to = "candies" # sets column name for candies
+ )
+
+skittles_long %>% head()
+```
+
+```
+## # A tibble: 6 × 4
+## name date_entered color candies
+##
+## 1 Alora Jones 2020-07-27 red 12
+## 2 Alora Jones 2020-07-27 green 11
+## 3 Alora Jones 2020-07-27 orange 12
+## 4 Alora Jones 2020-07-27 yellow 9
+## 5 Alora Jones 2020-07-27 purple 15
+## 6 Alyssa Hiarker 2020-08-01 red 13
+```
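+
+To check that the two ways of choosing columns land in the same place, here is a sketch that runs both on a small stand-in table. The `demo` tibble holds the first two rows of `skittles` shown above, and `by_range` and `by_exclusion` are illustrative names.
+
+```r
+library(tidyverse)
+
+# stand-in for skittles: the first two rows shown above
+demo <- tibble(
+  name = c("Alora Jones", "Alyssa Hiarker"),
+  red = c(12, 13), green = c(11, 15), orange = c(12, 10),
+  yellow = c(9, 9), purple = c(15, 15),
+  date_entered = as.Date(c("2020-07-27", "2020-08-01"))
+)
+
+# choose the columns as a range
+by_range <- demo %>%
+  pivot_longer(cols = red:purple, names_to = "color", values_to = "candies")
+
+# choose the columns by naming the ones to leave alone
+by_exclusion <- demo %>%
+  pivot_longer(cols = !c(name, date_entered), names_to = "color", values_to = "candies")
+
+# compare the two results
+all.equal(by_range, by_exclusion)
+```
+
+The exclusion style is handy when the columns you want to pivot are not sitting next to each other.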
+
+### Average candies per color
+
+To get the average number of candies for each color, we can use our `skittles_long` data and `group_by()` color (which will consider all the **red** rows together, etc.) and use `summarize()` to get the mean.
+
+This is something you should be able to do on your own, as it is very similar to the `sum()`s we did with military surplus, but you use `mean()` instead.
+
+Save the resulting summary table into a new tibble called `skittles_avg`.
+
+
+ Try it on your own
+
+```r
+skittles_avg <- skittles_long %>%
+ group_by(color) %>%
+ summarize(avg_candies = mean(candies))
+
+skittles_avg
+```
+
+```
+## # A tibble: 5 × 2
+## color avg_candies
+##
+## 1 green 11.2
+## 2 orange 12.1
+## 3 purple 12.0
+## 4 red 11.6
+## 5 yellow 11.5
+```
+
+
+### Round the averages
+
+Let's modify this summary to round the averages to tenths so they will plot nicely on our chart.
+
+The `round()` function needs the column to change, and then the number of digits past the decimal to include.
+
+1. Edit your summary to include the mutate below.
+
+
+```r
+skittles_avg <- skittles_long %>%
+ group_by(color) %>%
+ summarize(avg_candies = mean(candies)) %>%
+ mutate(
+ avg_candies = round(avg_candies, 1)
+ )
+
+skittles_avg
+```
+
+```
+## # A tibble: 5 × 2
+## color avg_candies
+##
+## 1 green 11.2
+## 2 orange 12.1
+## 3 purple 12
+## 4 red 11.6
+## 5 yellow 11.5
+```
+
+BONUS POINT OPPORTUNITY: Using a similar method to rounding above, you can also capitalize the names of the colors. You don't _have_ to do this, but I'll give you bonus points if you do:
+
+- In your mutate, add a rule that updates `color` column using `str_to_title(color)`.
+
+You can read more about [converting the case of a string here](https://stringr.tidyverse.org/reference/case.html). It's part of the [stringr](https://stringr.tidyverse.org/index.html) package, which is loaded with tidyverse.
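+
+If you want to see what `str_to_title()` does before putting it in a `mutate()`, you can try it on a plain vector. This is only a sketch, and the values are typed in by hand.
+
+```r
+library(stringr)
+
+# capitalize the first letter of each word
+str_to_title(c("red", "green", "purple"))
+```
+
+Each color comes back with its first letter capitalized, which is the form you would want on a chart.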
+
+### On your own: Plot the averages
+
+Now I want you to use ggplot to create a bar chart that shows the average number of candies of each color in a bag. This is very similar to your plots of Disney Princesses and ice cream in Chapter 6.
+
+1. Build a bar chart of the average for each color using ggplot.
+
+Some things to consider:
+
+- I want the bars to be ordered by the highest average on top.
+- I want a good title, subtitle and byline, along with good axis names.
+- Include the values on the bars
+- Change the theme to something other than the default
+
+Here is what it should look like, but with good text, etc. The numbers shown here may vary depending on future updates to the data:
+
+
+
+## Using Datawrapper
+
+There are some other great charting tools that journalists use. My favorite is [Datawrapper](https://www.datawrapper.de/), which is free at the level you need.
+
+Datawrapper is so easy I don't even have to teach you how to use it. They have [excellent tutorials](https://academy.datawrapper.de/).
+
+What you do need is the data to plot, but you've already "shaped" it the way you need it. Your `skittles_avg` tibble is what you need.
+
+Here are the steps I want you to follow:
+
+### Review how to make a bar chart
+
+1. In a web browser, go to the [Datawrapper Academy](https://academy.datawrapper.de/)
+1. Click on **Bar charts**
+1. Choose **[How to create a bar chart](https://academy.datawrapper.de/article/7-bar-chart)**
+
+The first thing to note there is they show you what they expect the data to look like. Your `skittles_avg` tibble is just like this, but with Color and Candies.
+
+You'll use these directions to create your charts so you might keep this open in its own tab.
+
+### Start a chart
+
+1. In a new browser tab, go to [datawrapper.de](https://www.datawrapper.de/) and click the big **Start creating** button.
+2. Use the **Login/Sign Up** button along the top to create an account or log in if you have one.
+1. The first screen you have is where you can **Upload data** or paste it into the window. We are going to paste the data, but we have to do some stuff in R to get it.
+
+### Get your candies data
+
+We need to install a package called clipr.
+
+1. In your R project in the R Console install clipr: `install.packages("clipr")`.
+1. Start a section that says you are going to get data for Datawrapper.
+3. Create a chunk with the following and run it.
+
+
+```r
+library(clipr)
+```
+
+```
+## Welcome to clipr. See ?write_clip for advisories on writing to the clipboard in R.
+```
+
+```r
+skittles_avg %>% write_clip(allow_non_interactive = TRUE)
+```
+
+You don't see anything happen, but all the data in `skittles_avg` has been added to your clipboard. You have to have the `allow_non_interactive = TRUE` part to allow your page to knit.
+
+### Build the datawrapper graphic
+
+1. Return to the browser where you are making the chart, put your cursor into the "Paste your copied data here ..." window and paste.
+1. Click **Proceed**.
+
+You can now follow the Datawrapper Academy directions to finish your chart.
+
+When you get to the Publish & Embed window, I want you to add that link to your R Notebook so I can find it for grading.
+
+## Bonus questions
+
+More opportunities for bonus points on this assignment. These aren't plots, just data wrangling.
+
+### Most/least candies
+
+Answer me this: Who got the most candies in their bag? Who got the least?
+
+I want a well-structured section (headline, text) with two chunks, one for most and one for least.
+
+### Average total candies in a bag
+
+Answer me this: What is the average number of candies in a bag?
+
+Again, well-structured section and include the code.
+
+Hint: You need a total number of candies per person before you can get an average.
+
+## Turn in your work
+
+1. Make sure your notebook runs start-to-finish.
+1. Knit the notebook.
+1. Stuff your project and turn it in to the Skittles assignment in Canvas.
+
+## What we learned
+
+- We learned what "tidy data" means and why it is important. It is the best shape for data wrangling and plotting.
+- We learned about [`pivot_longer()`](https://tidyr.tidyverse.org/reference/pivot_longer.html) and [`pivot_wider()`](https://tidyr.tidyverse.org/reference/pivot_wider.html) and we used `pivot_longer()` on our Skittles data.
+- Along the way we practiced a little [lubridate](https://lubridate.tidyverse.org/) conversion with `mdy_hms()` and extracted a date with `date()`.
+- We also used [`round()`](http://www.cookbook-r.com/Numbers/Rounding_numbers/) to round off some numbers, and you might have used `str_to_title()` to change the case of the color values.
+
+
diff --git a/docs/09-tidy-data_files/figure-html/avg-plot-1.png b/docs/09-tidy-data_files/figure-html/avg-plot-1.png
new file mode 100644
index 00000000..d438c610
Binary files /dev/null and b/docs/09-tidy-data_files/figure-html/avg-plot-1.png differ
diff --git a/docs/10-plots-answers.md b/docs/10-plots-answers.md
new file mode 100644
index 00000000..7c5c82b5
--- /dev/null
+++ b/docs/10-plots-answers.md
@@ -0,0 +1,906 @@
+# Plotting for answers {#plot-aac}
+
+For this chapter, we will use some data from the City of Austin data portal on animal intakes to the Austin Animal Center. You'll use that portal to download the data, prepare it for R, then answer some questions and make plots to show the answers.
+
+Along the way we'll learn some stuff.
+
+## Goals of this lesson
+
+- We'll use some string and date functions to clean and parse some dates.
+- We'll use `count()` to make some summaries. All of our "answers" are counting operations, so we'll practice using that shortcut instead of "GSA".
+- We'll learn how to add commas to the numbers on axis labels in ggplot with the [scales](https://scales.r-lib.org/) package.
+- We'll use `recode()` to update some values in our data.
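+
+As a rough sketch of what the `count()` shortcut means, the chunk below uses a small made-up tibble (the `animals` name and its values are assumptions for illustration, not the real data) to show that `count()` gives the same result as the longer group_by/summarize/arrange pattern.
+
+```r
+library(tidyverse)
+
+# hypothetical toy data, not the Austin Animal Center file
+animals <- tibble(animal_type = c("Dog", "Cat", "Dog", "Bird", "Dog", "Cat"))
+
+# the long way: group, summarize, arrange ("GSA")
+gsa <- animals %>%
+  group_by(animal_type) %>%
+  summarize(n = n()) %>%
+  arrange(desc(n))
+
+# the shortcut: count() groups and tallies in one step
+counted <- animals %>%
+  count(animal_type, sort = TRUE)
+
+# peek at the shortcut result
+counted
+```
+
+Both tibbles list each animal type with its number of rows, most common first.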
+
+## Questions we will answer
+
+We'll tackle these after we clean our data, but so you know where we are going:
+
+- Are animal intakes increasing or decreasing each year since 2016? Plot a column chart of intakes by year. Don't include 2021 since it is not a full year.
+- Are there seasonal monthly trends in overall animal intakes? Plot intakes by year/month from 2016 to current, including 2021. (One long line or column chart.)
+- Plot that same monthly trend, except with month as the x axis, with a new line for each year.
+- Do certain types of animals drive seasonal trends? Use five full years of data (not 2021) to summarize intakes by animal type and month. Plot with month on the x axis with a line for each animal type.
+
+## Create your project
+
+We'll be starting a new R project with our typical folder structure.
+
+1. Create a new project. Call it `yourname-aac`.
+2. Create your `data-raw` and `data-processed` folders.
+3. Create a new R Notebook, title it "AAC Import/clean" and name the file `01-import.Rmd`
+4. Add your setup section with the following libraries:
+
+
+```r
+library(tidyverse)
+library(janitor)
+library(lubridate)
+```
+
+As always, create good Markdown sections, descriptions, add notes and name your R chunks so you have good bookmarks to work with.
+
+## Download the data
+
+This time you have to go get your data online and then put the file into your `data-raw` folder yourself.
+
+1. Go to
+2. Search for "animal intakes". Find the link "Austin Animal Center Intakes" and click on it. (It may not be the first return, so be careful to get the right one.)
+
+This page tells you about the data at the top. Note that records go back to October 2013. Further down the page there is a list of the "Columns in this Dataset" that kinda describes them, and you can see an example of the data at the bottom. Review those.
+
+1. At the top-right of the page is an **Export** button. Click on that and choose **CSV**. This will save the file called `Austin_Animal_Center_Intakes.csv` to your computer's Downloads folder.
+1. Find your Downloads folder and the file on your computer, and then move the csv file into your `data-raw` folder in your project folder.
+
+Note that when you source this data in stories or charts, it comes from the Austin Animal Center, not the city's data portal. The portal is just the delivery method.
+
+## Import your data
+
+Go back into RStudio in your `01-import.Rmd` file and import the data. You've done this many times now and should be able to do it on your own.
+
+1. Don't forget to create a section with a headline, etc.
+1. Work one line at a time. Use `read_csv()` to find your data and load it onto the screen.
+1. Once that is working, add a ` %>% ` and use the `clean_names()` function to fix the column names.
+1. Once that is all good, edit the chunk to save the imported data into a new tibble called `raw_data`.
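+
+If you are curious what `clean_names()` does to column names, here is a minimal sketch on a made-up tibble. The `messy` tibble and its column names are assumptions for illustration; your real file has more columns.
+
+```r
+library(tidyverse)
+library(janitor)
+
+# hypothetical tibble with spaces and capital letters in the column names
+messy <- tibble(
+  `Animal ID` = c("A000001", "A000002"),
+  DateTime = c("01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM"),
+  MonthYear = c("01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM")
+)
+
+# clean_names() converts the names to lowercase with underscores
+cleaned <- messy %>% clean_names()
+
+names(cleaned)
+```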
+
+
+ You shouldn't need this.
+
+
+```r
+raw_data <- read_csv("data-raw/Austin_Animal_Center_Intakes.csv") %>% clean_names()
+
+# peek at the data
+raw_data
+```
+
+```
+## # A tibble: 132,214 × 12
+## animal_id name date_time month_year found_location intake_type
+##
+## 1 A786884 *Brock 01/03/2019 … 01/03/2019 0… 2501 Magin Meadow D… Stray
+## 2 A706918 Belle 07/05/2015 … 07/05/2015 1… 9409 Bluegrass Dr i… Stray
+## 3 A724273 Runster 04/14/2016 … 04/14/2016 0… 2818 Palomino Trail… Stray
+## 4 A665644 10/21/2013 … 10/21/2013 0… Austin (TX) Stray
+## 5 A682524 Rio 06/29/2014 … 06/29/2014 1… 800 Grove Blvd in A… Stray
+## 6 A743852 Odin 02/18/2017 … 02/18/2017 1… Austin (TX) Owner Surr…
+## 7 A635072 Beowulf 04/16/2019 … 04/16/2019 0… 415 East Mary Stree… Public Ass…
+## 8 A708452 Mumble 07/30/2015 … 07/30/2015 0… Austin (TX) Public Ass…
+## 9 A818975 06/18/2020 … 06/18/2020 0… Braker Lane And Met… Stray
+## 10 A774147 06/11/2018 … 06/11/2018 0… 6600 Elm Creek in A… Stray
+## # … with 132,204 more rows, and 6 more variables: intake_condition ,
+## # animal_type , sex_upon_intake , age_upon_intake ,
+## # breed , color
+```
+
+
+
+## Fix the dates
+
+Take a look at the `date_time` and `month_year` columns. They are both **timestamps** that include both the date and time. They imported as a **character** datatype (`<chr>`) and are in this format:
+
+```text
+01/03/2019 04:19:00 PM
+```
+
+We won't be using the time for this exercise, so all we really need is the date. Lubridate doesn't have a conversion for this exact format (at least that I could find.) That's ok, we'll use some [stringr](https://stringr.tidyverse.org/) functions to whip this into shape.
+
+We can use a function called `str_sub()` to pluck out the date from this string. We'll create a new column using `mutate()` to do this. You've used mutate before.
+
+`str_sub()` allows you to pluck any number of characters out of a string. We want the first 10 characters: "01/03/2019".
+
+It takes three arguments:
+
+- Which string (or column) are you looking at? For us, this is the `date_time` column.
+- What position do you want to start at? For us, we start at "1", the first character.
+- What position do you want to end at? For us, that is "10". Because we start at 1, ending at 10 gives us the first 10 characters.
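+
+To see `str_sub()` on its own before using it in a `mutate()`, here is a minimal sketch on a single string. The `stamp` and `date_only` names are just placeholders for this example.
+
+```r
+library(stringr)
+
+# one timestamp in the same format as the date_time column
+stamp <- "01/03/2019 04:19:00 PM"
+
+# start at character 1 and stop at character 10
+date_only <- str_sub(stamp, 1, 10)
+
+date_only
+# returns "01/03/2019"
+```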
+
+We use this inside a `mutate()` function to create a new column with the results of the `str_sub()` function.
+
+1. Create a new section and note we are fixing the date.
+2. Create a chunk, call your `raw_data` and pipe that into a `mutate()` function.
+3. Inside your mutate, name your new column `intake_date`.
+4. Set `intake_date` equal to (`=`) `str_sub(date_time, 1, 10)`.
+
+Try it, and then check the last column of the data that comes back to make sure you actually have a 10-character string like "01/03/2019".
+
+
+ Part of the answer
+
+```r
+raw_data %>%
+ mutate(
+ intake_date = str_sub(date_time, 1, 10)
+ )
+```
+
+```
+## # A tibble: 132,214 × 13
+## animal_id name date_time month_year found_location intake_type
+##
+## 1 A786884 *Brock 01/03/2019 … 01/03/2019 0… 2501 Magin Meadow D… Stray
+## 2 A706918 Belle 07/05/2015 … 07/05/2015 1… 9409 Bluegrass Dr i… Stray
+## 3 A724273 Runster 04/14/2016 … 04/14/2016 0… 2818 Palomino Trail… Stray
+## 4 A665644 10/21/2013 … 10/21/2013 0… Austin (TX) Stray
+## 5 A682524 Rio 06/29/2014 … 06/29/2014 1… 800 Grove Blvd in A… Stray
+## 6 A743852 Odin 02/18/2017 … 02/18/2017 1… Austin (TX) Owner Surr…
+## 7 A635072 Beowulf 04/16/2019 … 04/16/2019 0… 415 East Mary Stree… Public Ass…
+## 8 A708452 Mumble 07/30/2015 … 07/30/2015 0… Austin (TX) Public Ass…
+## 9 A818975 06/18/2020 … 06/18/2020 0… Braker Lane And Met… Stray
+## 10 A774147 06/11/2018 … 06/11/2018 0… 6600 Elm Creek in A… Stray
+## # … with 132,204 more rows, and 7 more variables: intake_condition ,
+## # animal_type , sex_upon_intake , age_upon_intake ,
+## # breed , color , intake_date
+```
+
+
+### Edit to convert to a real date
+
+If you did the above correctly, you should have a column called `intake_date` as the last column, but it isn't actually a **date** yet, it is just characters that look like a date. We'll fix that now.
+
+1. Edit your date-fix chunk to add another rule INSIDE your mutate.
+2. The new column will still be `intake_date =` but now you'll set that to `mdy(intake_date)`
+3. Run the chunk and make sure that your same last column `intake_date` says `<date>` right below the name. The value should now be in year-month-day order, like `2019-01-03`.
+4. Now that this all works, assign all this using `<- ` into a tibble called `date_fix`.
+5. Add a `glimpse()` of the date_fix tibble in the same chunk so you can eyeball the results.
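+
+If you want to see what `mdy()` does before you edit your chunk, here is a small sketch on a single value. The `date_text` and `real_date` names are made up for this example.
+
+```r
+library(lubridate)
+
+# a character string that only looks like a date
+date_text <- "01/03/2019"
+class(date_text)
+
+# mdy() reads it as month/day/year and returns a real Date
+real_date <- mdy(date_text)
+class(real_date)
+
+# prints in year-month-day order
+real_date
+```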
+
+
+ this was similar to converting the date in billboard
+
+```r
+date_fix <- raw_data %>%
+ mutate(
+ intake_date = str_sub(date_time, 1, 10),
+ intake_date = mdy(intake_date)
+ )
+
+date_fix %>% glimpse()
+```
+
+```
+## Rows: 132,214
+## Columns: 13
+## $ animal_id "A786884", "A706918", "A724273", "A665644", "A682524"…
+## $ name "*Brock", "Belle", "Runster", NA, "Rio", "Odin", "Beo…
+## $ date_time "01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM", "…
+## $ month_year "01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM", "…
+## $ found_location "2501 Magin Meadow Dr in Austin (TX)", "9409 Bluegras…
+## $ intake_type "Stray", "Stray", "Stray", "Stray", "Stray", "Owner S…
+## $ intake_condition "Normal", "Normal", "Normal", "Sick", "Normal", "Norm…
+## $ animal_type "Dog", "Dog", "Dog", "Cat", "Dog", "Dog", "Dog", "Dog…
+## $ sex_upon_intake "Neutered Male", "Spayed Female", "Intact Male", "Int…
+## $ age_upon_intake "2 years", "8 years", "11 months", "4 weeks", "4 year…
+## $ breed "Beagle Mix", "English Springer Spaniel", "Basenji Mi…
+## $ color "Tricolor", "White/Liver", "Sable/White", "Calico", "…
+## $ intake_date 2019-01-03, 2015-07-05, 2016-04-14, 2013-10-21, 2014…
+```
+
+
+Now that you can see the `date_time` and `intake_date` columns at once, check to make sure they converted correctly and you don't have any problems. Double-check the datatype for `intake_date`, which should be `<date>`.
+
+## Parse the date into helpful variables
+
+Now that we have a good date to work with, we can use other [lubridate](https://lubridate.tidyverse.org/) functions to create some versions of the date that will help us down the road when we do summaries and plots.
+
+> TBH, just diving into the data at this point you might not _know_ you need these date parts yet until you try to create summaries and plots. If you find later that you need helpful columns like this, you can always come back to your import notebook, create them and re-run it to get updated data. In the interest of time I'm front-loading the need based on experience.
+
+We are going to create three variations of the date to help us later:
+
+- A `yr` column with just the year, like `2019`.
+- A `mo` column with the month, but using the name, like `Jan`.
+- A `yrmo` column like `2019-01`
+
+We'll do this in the same `mutate()` function, but we'll use different methods to do each one, which is a useful learning experience. We'll also use `select()` to reorder our columns to put these all at the front of the tibble so we can see them.
+
+We'll work through this out in the open so I can explain as we go along.
+
+### Extract the year
+
+We can use `year()` from lubridate to pluck the YYYY value from `intake_date`. We'll use this to build our mutate.
+
+1. Create a new section and note we are creating helpful date parts.
+2. Add the following chunk so we can get started.
+
+
+```r
+date_parts <- date_fix %>%
+ mutate(
+ yr = year(intake_date) # creates yr and fills it with YYYY
+ )
+
+# peek
+date_parts %>% glimpse()
+```
+
+```
+## Rows: 132,214
+## Columns: 14
+## $ animal_id "A786884", "A706918", "A724273", "A665644", "A682524"…
+## $ name "*Brock", "Belle", "Runster", NA, "Rio", "Odin", "Beo…
+## $ date_time "01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM", "…
+## $ month_year "01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM", "…
+## $ found_location "2501 Magin Meadow Dr in Austin (TX)", "9409 Bluegras…
+## $ intake_type "Stray", "Stray", "Stray", "Stray", "Stray", "Owner S…
+## $ intake_condition "Normal", "Normal", "Normal", "Sick", "Normal", "Norm…
+## $ animal_type "Dog", "Dog", "Dog", "Cat", "Dog", "Dog", "Dog", "Dog…
+## $ sex_upon_intake "Neutered Male", "Spayed Female", "Intact Male", "Int…
+## $ age_upon_intake "2 years", "8 years", "11 months", "4 weeks", "4 year…
+## $ breed "Beagle Mix", "English Springer Spaniel", "Basenji Mi…
+## $ color "Tricolor", "White/Liver", "Sable/White", "Calico", "…
+## $ intake_date 2019-01-03, 2015-07-05, 2016-04-14, 2013-10-21, 2014…
+## $ yr 2019, 2015, 2016, 2013, 2014, 2017, 2019, 2015, 2020,…
+```
+
+Look how I set up this chunk to work with it. I know that I'm going to be adding columns and checking values, and it is a pain to click to the end of the tibble each time to see the results. So what I've done is set this up to go into a new tibble called `date_parts` and then I glimpse that at the end so I can peek at the results. This allows me to look at the first couple of values in the glimpse to make sure I've done the work right. **I'll still be working one line at a time** as I edit the chunk further, but at least I can _see_ what I'm doing.
+
+Now, note we have a new column `yr` at the end that starts with "2019", which matches what is in `intake_date` (and even `date_time`). This is good.
+
+Can you see how our mutate created the new `yr` column?
+
+- We name the new column `yr`
+- We fill that column with `year(intake_date)`, which plucks the year from that column.
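+
+The same idea works on a single date outside of a tibble, which can be a quick way to test a function. This is only a sketch, and `one_date` is a made-up value.
+
+```r
+library(lubridate)
+
+# hypothetical single date to test with
+one_date <- ymd("2019-01-03")
+
+# year() plucks the four-digit year as a number
+year(one_date)
+# returns 2019
+```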
+
+### Extract the month name
+
+We'll edit the **same chunk** to do a similar action to get the name of the month in a new column. You'll see in a minute how we can choose to get the _name_ of the month instead of the number.
+
+1. Edit your chunk to add a new line to the mutate function. Don't forget the comma after the existing rule.
+2. Add the new line as indicated below, then run it to see the results.
+
+
+```r
+date_parts <- date_fix %>%
+ mutate(
+ yr = year(intake_date), # don't forget the comma
+ mo = month(intake_date) # the new mutate rule to get month
+ )
+
+# peek
+date_parts %>% glimpse()
+```
+
+```
+## Rows: 132,214
+## Columns: 15
+## $ animal_id "A786884", "A706918", "A724273", "A665644", "A682524"…
+## $ name "*Brock", "Belle", "Runster", NA, "Rio", "Odin", "Beo…
+## $ date_time "01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM", "…
+## $ month_year "01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM", "…
+## $ found_location "2501 Magin Meadow Dr in Austin (TX)", "9409 Bluegras…
+## $ intake_type "Stray", "Stray", "Stray", "Stray", "Stray", "Owner S…
+## $ intake_condition "Normal", "Normal", "Normal", "Sick", "Normal", "Norm…
+## $ animal_type "Dog", "Dog", "Dog", "Cat", "Dog", "Dog", "Dog", "Dog…
+## $ sex_upon_intake "Neutered Male", "Spayed Female", "Intact Male", "Int…
+## $ age_upon_intake "2 years", "8 years", "11 months", "4 weeks", "4 year…
+## $ breed "Beagle Mix", "English Springer Spaniel", "Basenji Mi…
+## $ color "Tricolor", "White/Liver", "Sable/White", "Calico", "…
+## $ intake_date 2019-01-03, 2015-07-05, 2016-04-14, 2013-10-21, 2014…
+## $ yr 2019, 2015, 2016, 2013, 2014, 2017, 2019, 2015, 2020,…
+## $ mo 1, 7, 4, 10, 6, 2, 4, 7, 6, 6, 8, 10, 7, 2, 3, 2, 11,…
+```
+
+What we get in return here is the _number_ of the month: A "1" for January; a "7" for July, etc. What we really want is the _names_ of the month to help us with plotting later.
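+
+To compare the two results side by side, here is a short sketch on one made-up date (`one_date` and `mo_name` are placeholder names). It uses the `label` argument of `month()`.
+
+```r
+library(lubridate)
+
+# hypothetical single date to test with
+one_date <- ymd("2019-01-03")
+
+# default: the month as a number
+month(one_date)
+
+# with label = TRUE: an ordered factor of abbreviated month names
+mo_name <- month(one_date, label = TRUE)
+mo_name
+
+# the factor keeps calendar order, so the underlying position is still 1
+as.integer(mo_name)
+```
+
+Because the labeled month is an ordered factor, it should sort in calendar order instead of alphabetically when you plot it.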
+
+1. Edit the mutate to add `, label = TRUE` within the `month()`.
+
+
+```r
+date_parts <- date_fix %>%
+ mutate(
+ yr = year(intake_date),
+ mo = month(intake_date, label = TRUE) # add the label argument
+ )
+
+# peek
+date_parts %>% glimpse()
+```
+
+```
+## Rows: 132,214
+## Columns: 15
+## $ animal_id "A786884", "A706918", "A724273", "A665644", "A682524"…
+## $ name "*Brock", "Belle", "Runster", NA, "Rio", "Odin", "Beo…
+## $ date_time "01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM", "…
+## $ month_year "01/03/2019 04:19:00 PM", "07/05/2015 12:59:00 PM", "…
+## $ found_location "2501 Magin Meadow Dr in Austin (TX)", "9409 Bluegras…
+## $ intake_type "Stray", "Stray", "Stray", "Stray", "Stray", "Owner S…
+## $ intake_condition "Normal", "Normal", "Normal", "Sick", "Normal", "Norm…
+## $ animal_type "Dog", "Dog", "Dog", "Cat", "Dog", "Dog", "Dog", "Dog…
+## $ sex_upon_intake "Neutered Male", "Spayed Female", "Intact Male", "Int…
+## $ age_upon_intake "2 years", "8 years", "11 months", "4 weeks", "4 year…
+## $ breed