tidydata-tutorial

TidyData - Get and clean wearables data from the UCI repository

About

This readme describes the transformations performed to convert the raw data into the tidy data.

Run the script run_analysis.R on the raw data to obtain the tidy data.
The script takes no parameters.
The steps it performs are described below.

Prerequisites:

The raw dataset must be downloaded and unzipped into the same directory as the script. If the files are missing, the script displays an alert message.
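
A minimal sketch of such a check, assuming the standard layout of the unzipped archive (a folder named "UCI HAR Dataset" with train/ and test/ subfolders). The helper name check_raw_files() and the use of warning() as the alert are illustrative assumptions, not necessarily what run_analysis.R does.

```r
# Hypothetical helper (not from run_analysis.R): report which raw files are missing.
# Assumes the standard "UCI HAR Dataset" folder layout of the unzipped archive.
check_raw_files <- function(base_dir = ".") {
  data_dir <- file.path(base_dir, "UCI HAR Dataset")
  needed <- c(
    "features.txt", "activity_labels.txt",
    file.path("train", c("X_train.txt", "y_train.txt", "subject_train.txt")),
    file.path("test",  c("X_test.txt",  "y_test.txt",  "subject_test.txt"))
  )
  paths <- file.path(data_dir, needed)
  missing <- paths[!file.exists(paths)]
  # Alert the user up front instead of failing later inside read.table()
  if (length(missing) > 0) {
    warning("Raw data files not found: ", paste(missing, collapse = ", "))
  }
  invisible(missing)
}
```

Calling check_raw_files(".") from the script's directory returns the paths of any missing files (an empty vector when everything is in place).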

Steps for tidying the data:

The steps below assume that all files have already been read into memory.
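
For reference, reading one set into memory can be sketched as follows, again assuming the standard folder layout (train/X_train.txt, train/y_train.txt, train/subject_train.txt and the test equivalents). The read_set() helper is hypothetical; the script may well read each file individually.

```r
# Hypothetical helper (not from run_analysis.R): read the X, y and subject
# files of one set ("train" or "test") from a UCI HAR-style folder.
read_set <- function(data_dir, set) {
  # Builds e.g. <data_dir>/test/X_test.txt
  set_file <- function(prefix) {
    file.path(data_dir, set, paste0(prefix, "_", set, ".txt"))
  }
  list(
    X       = read.table(set_file("X")),       # one column per feature
    y       = read.table(set_file("y")),       # activity code of each row
    subject = read.table(set_file("subject"))  # subject ID of each row
  )
}

# Usage (assumed paths):
# test  <- read_set("UCI HAR Dataset", "test")
# train <- read_set("UCI HAR Dataset", "train")
```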

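  1. Merge the training and the test sets to create one data set. This will be the main data frame.
    Since the test and train files have the same variable names (you can verify this with intersect(names(df1), names(df2))), the two sets can simply be stacked row-wise with rbind(), rather than joined with merge(), which would match rows on the common columns instead of appending them.
    Be sure to always stack the files in the same order (X, activity and subject files alike). In this script the order is first test, then train.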
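
A minimal sketch of the stacking step on toy data frames (the values and the names test_X, train_X, test_y and train_y are made up for illustration). Because the rows of the two sets are appended rather than matched, rbind() is used; the X and activity data are stacked in the same test-then-train order so their rows stay aligned.

```r
# Toy stand-ins for the X and y data of the two sets (made-up values)
test_X  <- data.frame(V1 = c(0.1, 0.2), V2 = c(1.1, 1.2))
train_X <- data.frame(V1 = c(0.3, 0.4, 0.5), V2 = c(1.3, 1.4, 1.5))
test_y  <- data.frame(V1 = c(5L, 5L))
train_y <- data.frame(V1 = c(1L, 2L, 3L))

# Both sets must share the same variables before they can be stacked
common <- intersect(names(test_X), names(train_X))
stopifnot(length(common) == ncol(test_X), length(common) == ncol(train_X))

# Append rows: first test, then train, in the same order for every file
allX <- rbind(test_X, train_X)
allY <- rbind(test_y, train_y)
```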

  2. Appropriately label the data set with descriptive variable names.
    The variables of the X data are the individual measurements; they are described in the file features.txt (read into the data frame called “features”), which gives a label for each feature.
    The script simply extracts the labels (variable “name” in features) and assigns them as the column names of the X data set.
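
A sketch of the labelling step, using a toy features table with the same two-column shape as features.txt (index and label). The column names id and name are assumed, matching the “name” variable mentioned above; the sample labels are illustrative.

```r
# Toy version of features.txt, as it might be read with
# read.table("features.txt", col.names = c("id", "name"))
features <- data.frame(
  id   = 1:3,
  name = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X"),
  stringsAsFactors = FALSE
)
# Toy X data: one unnamed column (V1, V2, ...) per feature
allX <- data.frame(V1 = c(0.1, 0.3), V2 = c(0.2, 0.4), V3 = c(0.9, 0.8))

# Use the feature labels as column names, in file order
stopifnot(nrow(features) == ncol(allX))
names(allX) <- as.character(features$name)
```

as.character() guards against name having been read as a factor (the read.table() default before R 4.0).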

  3. Extract only the measurements on the mean and standard deviation for each measurement.
    The original data set contains several variables derived from each measurement (such as mean, max or standard deviation); we want only the means and standard deviations.
    They can be identified from the column names of the X data set, for example with the grepl() function.
    Every feature whose name contains the string “mean”, “Mean” or “std” is kept.
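
A sketch of the column selection on a toy data frame whose column names follow the features.txt style. Note that, with the pattern described above, names such as fBodyAcc-meanFreq()-X and angle(X,gravityMean) are kept as well, since they contain “mean” or “Mean”.

```r
# Toy X data with feature-style column names (made-up values)
allX <- setNames(
  as.data.frame(matrix(1:10, nrow = 2)),
  c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-max()-X",
    "fBodyAcc-meanFreq()-X", "angle(X,gravityMean)")
)

# TRUE for every column whose name contains "mean", "Mean" or "std"
keep <- grepl("mean|Mean|std", names(allX))

# drop = FALSE keeps a data frame even if only one column matches
allX <- allX[, keep, drop = FALSE]
```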

  4. Use descriptive activity names to name the activities in the data set.
    The features already have the correct names (step 2); the script now:

  • adds the subject IDs to the main data frame (allX) and renames that column to “Subject”
  • adds the activity codes to the main data frame (allX) and maps each code to its descriptive activity label using the match() function.
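
A sketch of adding the two columns on toy data. The lookup table mirrors the six activities listed in the dataset's activity_labels.txt; the object names allSubject, allY and activities are assumptions for illustration. match() returns, for each activity code, its position in the lookup table, which is then used to pick the label.

```r
# Toy main data frame and the stacked subject / activity-code columns
allX <- data.frame(`tBodyAcc-mean()-X` = c(0.1, 0.2, 0.3), check.names = FALSE)
allSubject <- data.frame(V1 = c(1L, 1L, 2L))
allY       <- data.frame(V1 = c(5L, 4L, 5L))

# Lookup table: activity code -> descriptive label
activities <- data.frame(
  code  = 1:6,
  label = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
            "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Add the subject IDs as a column named "Subject"
allX$Subject <- allSubject$V1
# match() finds each code's row in the lookup table; index the labels with it
allX$Activity <- activities$label[match(allY$V1, activities$code)]
```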
  5. From the data set in step 4, create a second, independent tidy data set with the average of each variable for each activity and each subject.
  • creates a new data frame called tidy by grouping the main data set by Subject and Activity with the group_by() function from the dplyr package;
  • pipes the result into summarise_each(), passing it mean(), which calculates the mean of each variable for each group.
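
The grouped averaging can be sketched in base R with aggregate(), swapped in here to avoid the dplyr dependency; it computes the same per-Subject, per-Activity means. In current dplyr releases summarise_each() is deprecated, and the equivalent pipeline would be allX %>% group_by(Subject, Activity) %>% summarise(across(everything(), mean)). The toy data below is made up.

```r
# Toy main data frame after step 4 (made-up values)
allX <- data.frame(
  `tBodyAcc-mean()-X` = c(0.1, 0.3, 0.5, 0.7),
  `tBodyAcc-std()-X`  = c(1.0, 2.0, 3.0, 4.0),
  Subject  = c(1L, 1L, 2L, 2L),
  Activity = c("WALKING", "WALKING", "WALKING", "LAYING"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Every column except the two grouping columns is a measurement
measures <- setdiff(names(allX), c("Subject", "Activity"))

# One row per (Subject, Activity) pair, holding the mean of each measurement
tidy <- aggregate(allX[measures], by = allX[c("Subject", "Activity")], FUN = mean)
```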
  6. Finally, write the final tidy data set to be uploaded.
    The file, named “tiny_data.txt”, is created in the same directory as the script.
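
A sketch of writing the file and reading it back, using a toy tidy data frame and a temporary directory instead of the script's directory. row.names = FALSE keeps R from adding an unnamed row-number column. Note that read.table(..., header = TRUE) converts names such as tBodyAcc-mean()-X into syntactic ones (tBodyAcc.mean...X) unless check.names = FALSE is passed.

```r
# Toy result of step 5 (made-up values)
tidy <- data.frame(
  Subject  = c(1L, 2L),
  Activity = c("WALKING", "LAYING"),
  `tBodyAcc-mean()-X` = c(0.2, 0.7),
  check.names = FALSE, stringsAsFactors = FALSE
)

# The script writes "tiny_data.txt" next to itself; tempdir() is used here instead
out_file <- file.path(tempdir(), "tiny_data.txt")
write.table(tidy, out_file, row.names = FALSE)

# Read it back, keeping the original column names
data <- read.table(out_file, header = TRUE, check.names = FALSE)
```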

How to read the tidy data set

Use these R lines (run from the directory containing the file):

  data <- read.table("tiny_data.txt", header = TRUE)
  View(data)

Contributors

mashimo
