mark79/GettingandCleaningData

## run_analysis.R

This R script performs some basic cleaning and simplification of the Human Activity Recognition (HAR) Using Smartphones Data Set from the UCI Machine Learning Repository. To run this script, you must download the HAR dataset and unzip it into R's current working directory. A link to the dataset and its description is below.

Abstract: Human Activity Recognition database built from the recordings of 30 subjects performing activities of daily living (ADL) while carrying a waist mounted smartphone with embedded inertial sensors.

More detail can be found here, and the dataset can be downloaded here.
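Before running the script, it can help to confirm that the input files are where R expects them. The helper below is a minimal sketch and is not part of run_analysis.R: the function name `missing_har_files`, the folder name `UCI HAR Dataset`, and the `test/` and `train/` subfolders are assumptions about how the archive unpacks.

```r
# Hypothetical helper (not part of run_analysis.R): report which input files
# are missing from a dataset directory.
# Assumption: the archive unpacks into "UCI HAR Dataset" with test/ and train/
# subfolders; adjust data_dir if your layout differs.
missing_har_files <- function(data_dir = "UCI HAR Dataset") {
  expected <- c(
    "features.txt",
    "activity_labels.txt",
    file.path("test",  c("X_test.txt",  "y_test.txt",  "subject_test.txt")),
    file.path("train", c("X_train.txt", "y_train.txt", "subject_train.txt"))
  )
  # Keep only the relative paths that do not exist under data_dir
  expected[!file.exists(file.path(data_dir, expected))]
}

missing <- missing_har_files()
if (length(missing) > 0) {
  message("Missing input files: ", paste(missing, collapse = ", "))
}
```

An empty result means all eight files were found under the assumed folder.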

This script does 5 things:

  1. Merges the training and the test sets to create one data set.
  • R first reads in the files X_test.txt, y_test.txt, X_train.txt, y_train.txt, features.txt, subject_test.txt and subject_train.txt, then merges them into one dataset (run_analysis.R, lines 1-28).
  2. Extracts only the measurements on the mean and standard deviation for each measurement.
  • Regular expressions are used to extract all columns with mean or std in the name, while retaining the Subject and Activity columns (line 32).
  • Regular expressions are further used to rename the columns in accordance with tidy data principles and, more importantly, to make sure the names are friendly to R, i.e. contain no () characters (lines 34-38).
  3. Uses descriptive activity names to name the activities in the data set.
  4. Appropriately labels the data set with descriptive activity names.
  • The file activity_labels.txt is read into R, and the numeric activity column is replaced by the appropriate descriptive activity label (lines 42-46).
  5. Creates a second, independent tidy data set with the average of each variable for each activity and each subject.
  • The data is reshaped to hold the mean of every feature by subject and activity. Each row now consists of the means of all features for one subject during one activity. An important note: all std measures are means of standard deviations. This is not reflected in the variable names, to prevent names of unwieldy length (lines 53-55).
  • Finally, the transformed dataset is written to disk in the current working directory as tidydataset.txt, ready for upload.
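The merge, column selection, renaming and labelling steps above can be sketched on toy data. This is an illustration under stated assumptions, not the code in run_analysis.R: the helper `make_set`, the made-up values, the exact regular expressions, and the choice to replace `-` with `.` are all assumptions.

```r
# Toy stand-ins for features.txt and the X/y/subject files; values are made up.
features <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X")

# Hypothetical helper: build one set with Subject, Activity and feature columns.
make_set <- function(values, subject, activity) {
  x <- as.data.frame(matrix(values, ncol = length(features)))
  names(x) <- features
  cbind(Subject = subject, Activity = activity, x)
}
train <- make_set(c(0.3, 0.4, 0.7, 0.8, 1.1, 1.2), subject = c(1, 1), activity = c(1, 2))
test  <- make_set(c(0.1, 0.2, 0.5, 0.6, 0.9, 1.0), subject = c(2, 2), activity = c(1, 2))

# Step 1: stack the training and test rows into one data set.
merged <- rbind(train, test)

# Step 2: keep Subject, Activity and any column with mean or std in its name.
keep <- grepl("Subject|Activity|mean|std", names(merged))
selected <- merged[, keep]

# Make the names R friendly: drop "()" and (an assumption) turn "-" into ".".
names(selected) <- gsub("\\(\\)", "", names(selected))
names(selected) <- gsub("-", ".", names(selected), fixed = TRUE)

# Steps 3 and 4: swap numeric activity codes for descriptive labels.
labels <- data.frame(id = 1:2, label = c("WALKING", "WALKING_UPSTAIRS"))
selected$Activity <- labels$label[match(selected$Activity, labels$id)]

print(selected)
```

Using `match()` for the lookup keeps the row order intact, which a `merge()` on the activity code would not guarantee.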

Note: The inertial signals files were left unprocessed, as their values were not required for the final data set.
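The final averaging step described above can be sketched as follows. Base R's `aggregate()` is used here as a stand-in; the script itself may reshape the data differently. The toy values, the temporary output path, and `row.names = FALSE` are assumptions.

```r
# Toy input: two subjects, two activities, two observations each (made-up values).
selected <- data.frame(
  Subject  = c(1, 1, 1, 1, 2, 2, 2, 2),
  Activity = rep(c("WALKING", "WALKING", "SITTING", "SITTING"), times = 2),
  tBodyAcc.mean.X = c(0.2, 0.4, 0.1, 0.3, 0.6, 0.8, 0.5, 0.7),
  tBodyAcc.std.X  = c(1, 3, 2, 4, 5, 7, 6, 8)
)

# One row per Subject/Activity pair; every feature column becomes its mean.
# Note that tBodyAcc.std.X now holds a mean of standard deviations.
tidy <- aggregate(. ~ Subject + Activity, data = selected, FUN = mean)

# Written to a temporary folder for this sketch; the script writes
# tidydataset.txt to the current working directory instead.
out_file <- file.path(tempdir(), "tidydataset.txt")
write.table(tidy, out_file, row.names = FALSE)

print(tidy)
```

The result has one row for each of the four subject and activity combinations, which is the shape described for the tidy data set.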

About

Coursera Getting and Cleaning Data Course Project
