Skip to content

Commit

Permalink
submission of Project 1
Browse files Browse the repository at this point in the history
  • Loading branch information
downthesun01 committed Aug 4, 2014
1 parent dc20c7c commit d5739ec
Show file tree
Hide file tree
Showing 11 changed files with 753 additions and 6 deletions.
203 changes: 197 additions & 6 deletions PA1_template.Rmd
Original file line number Diff line number Diff line change
@@ -1,20 +1,211 @@
# Reproducible Research: Peer Assessment 1
---
output:
html_document:
keep_md: yes
---
Peer Assessment 1
========================

Necessary libraries for the analysis of the activity.csv data

## Loading and preprocessing the data
```{r}
library(lattice)
```

Read in the activity.csv file
```{r}
temp <- tempfile()
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2Factivity.zip",temp)
data <- read.csv(unz(temp, "activity.csv"))
unlink(temp)
head(data, n = 5)
## What is mean total number of steps taken per day?
```
### **What is the total number of steps taken per day?**

- **Make a histogram of the total number of steps taken each day**

```{r}
options(scipen=999)
## What is the average daily activity pattern?
#Create a data frame for the total number of steps per day for the dataset
totalSteps <- aggregate(data$steps, list(data$date), "sum")
#Rename variables for the dataset containing the step totals per day
names(totalSteps) <- c("date", "total")
#Create a histogram of the total number of steps per day for the dataset
hist(totalSteps$total,
main = "Histogram of The Total Number of Steps Per Day",
xlab = "Total Number of Steps Per Day",
col = "blue")
```


- **Calculate and report the *mean* and *median* total number of steps taken per day**

The mean total number of steps taken per day is `r mean(totalSteps$total, na.rm = TRUE)`

```{r}
mean(totalSteps$total, na.rm = TRUE)
```

The median total number of steps taken per day is `r median(totalSteps$total, na.rm = TRUE)`

```{r}
median(totalSteps$total, na.rm = TRUE)
```

### **What is the average daily activity pattern?**

- **Make a time series plot (i.e. `type = "l"`) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all days (y-axis)**

```{r}
#Create data frame for the mean number of steps per 5 minute interval
meanSteps <- aggregate(data$steps, list(data$interval), "mean", na.rm = TRUE)
#Rename the variables in meanSteps
names(meanSteps) <- c("interval", "mean")
xyplot(mean ~ interval,
data = meanSteps,
type = "l",
main = "Mean Number of Steps Per Interval",
xlab = "Interval",
ylab = "Mean Number of Steps")
```


- **Which 5-minute interval, on average across all the days in the dataset, contains the maximum number of steps?**

The 5-minute interval, on average across all the days in the dataset, that contains the maximum number of steps is `r meanSteps[meanSteps$mean == max(meanSteps$mean, na.rm = TRUE), "interval"]`

```{r}
meanSteps[meanSteps$mean == max(meanSteps$mean, na.rm = TRUE), "interval"]
```



### **Inputing missing values**

**Note that there are a number of days/intervals where there are missing values (coded as `NA`). The presence of missing days may introduce bias into some calculations or summaries of the data.**

- **Calculate and report the total number of missing values in the dataset (i.e. the total number of rows with `NA`s)**

There are `r sum(is.na(data$steps) == TRUE)` missing values in the dataset

```{r}
sum(is.na(data$steps) == TRUE)
```
- **Devise a strategy for filling in all of the missing values in the dataset. The strategy does not need to be sophisticated. For example, you could use the mean/median for that day, or the mean for that 5-minute interval, etc.**

The strategy used consists of replacing the all NA's in data$steps with the mean number of steps for the corresponding time interval. Means were rounded to the nearest integer.

- **Create a new dataset that is equal to the original dataset but with the missing data filled in.**

A new dataset named filledData is created.

```{r}
#Replace all NA's in data$steps with the mean number of steps for the corresponding time interval.
#Means were rounded to the nearest integer
filledData <- data
for(i in seq_len(length(data$steps)))
{
if(is.na(filledData$steps[i]) == TRUE)
{
filledData$steps[i] <- round(meanSteps[meanSteps$interval == filledData$interval[i], "mean"], digits = 0)
}
}
head(filledData, n = 5)
```

- **Make a histogram of the total number of steps taken each day and Calculate and report the *mean* and *median* total number of steps taken per day. Do these values differ from the estimates from the first part of the assignment? What is the impact of imputing missing data on the estimates of the total daily number of steps?**

```{r}
#Create a data frame for the total number of steps per day for the new data frame (filledData) that contains no NA's
totalStepsFilled <- aggregate(filledData$steps, list(filledData$date), "sum")
#Rename variables for the data frame containing the step totals with the NA's filled in
names(totalStepsFilled) <- c("date", "total")
#Create a histogram of the total number of steps per day for the new data frame that contains no NA's
hist(totalStepsFilled$total,
main = expression(atop("Histogram of The Total Number of Steps Per Day", "(NA's filled in with interval means)")),
xlab = "Total Number of Steps Per Day",
col = "red")
```


The mean total number of steps taken per day is `r mean(totalStepsFilled$total)`

```{r}
mean(totalStepsFilled$total)
```

The median total number of steps taken per day is `r median(totalStepsFilled$total)`

```{r}
median(totalStepsFilled$total)
```
Once the NA's were filled in with the mean intervals, the mean decreased by `r mean(totalSteps$total, na.rm = TRUE) - mean(totalStepsFilled$total)` and the median decreased by `r median(totalSteps$total, na.rm = TRUE) - median(totalStepsFilled$total)`

```{r}
mean(totalSteps$total, na.rm = TRUE) - mean(totalStepsFilled$total)
median(totalSteps$total, na.rm = TRUE) - median(totalStepsFilled$total)
```


### **Are there differences in activity patterns between weekdays and weekends?**

**For this part the `weekdays()` function may be of some help here. Use the dataset with the filled-in missing values for this part.**

- **Create a new factor variable in the dataset with two levels - "weekday" and "weekend" indicating whether a given date is a weekday or weekend day.**

```{r}
#Create a new column in filledData data frame with a weekday for the corresponding value in the date column for each row
#Used as.POSIXlt() to convert values in date column from interger type to date type
filledData$weekFactor <- weekdays(as.POSIXlt(filledData$date))
#Turned the column weekFactor into a two level factor by converting Saturday and Sunday values to a weekend value and
#converting Monday-Friday values to a weekday value. Set the class of filledData$weekFactor to Factor.
for(i in seq_len(length(filledData$weekFactor)))
{
if(filledData$weekFactor[i] == "Saturday" | filledData$weekFactor[i] == "Sunday")
{
filledData$weekFactor[i] <- "weekend"
}
else
{
filledData$weekFactor[i] <- "weekday"
}
}
filledData$weekFactor <- as.factor(filledData$weekFactor)
head(filledData, n = 5)
```

- **Make a panel plot containing a time series plot (i.e. `type = "l"`) of the 5-minute interval (x-axis) and the average number of steps taken, averaged across all weekday days or weekend days (y-axis).**

```{r}
#Create data frame for the mean number of steps per 5 minute interval for the filled data
meanStepsFilled <- aggregate(filledData$steps, by = list(filledData$interval, filledData$weekFactor), "mean", na.rm = TRUE)
#Rename the variables in meanSteps
names(meanStepsFilled) <- c("interval","weekFactor", "mean")
#Create a lattice line plot comparing the mean number of steps taken per interval for weekdays vs. weekends
xyplot(mean ~ interval | weekFactor,
data = meanStepsFilled,
type = "l",
layout = c(1, 2),
main = "Mean Number of Steps Per Interval",
xlab = "Interval",
ylab = "Mean Number of Steps")
```

## Imputing missing values



## Are there differences in activity patterns between weekdays and weekends?
266 changes: 266 additions & 0 deletions PA1_template.html

Large diffs are not rendered by default.

Loading

0 comments on commit d5739ec

Please sign in to comment.