This repository is Case Study 2 for Doing Data Science 6306, Section 401, Tuesdays at 9:30 - 11:00 PM EST, Cohort 2017 Spring semester at SMU ("DDS-Case-Study-2" for short). Authors: Yao Yao and Robert Flamenbaum. This project was submitted to GitHub from RStudio version 1.0.136.
This repository has three directories (Data, Question 1 Code, and Other); the paper files are in the root directory.
This case study is a cumulative programming exercise covering what we learned in MSDS 6306.
Question 1 briefly covers SAS, Python, and R code for matrix creation.
Question 2 is a data exploration of a stock price time series, covering log returns and volatility.
Question 3 is a data exploration of orange trees by type, age, and circumference. Orange is a built-in data set for R.
Question 4 is a data exploration of temperatures in various cities and countries over months and years. City temperature is from CityTemp.csv while country temperature is from Temp.csv.
Question 5 is a plot of sine and cosine in polar coordinates.
Questions 2 to 4 are exercises in cleaning and analyzing data using tables and ggplot. R Markdown was used to source the data files and create the paper file.
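For readers unfamiliar with the quantities in Question 2, here is a minimal sketch in Python. It assumes the log return is ln(P_t / P_(t-1)) and that volatility is the square root of an exponentially down-weighted average of squared returns. The function names, the sample prices, and the weighting scheme are illustrative assumptions and are not taken from the project's R code.

```python
import math

def log_returns(prices):
    """Return ln(P_t / P_(t-1)) for each consecutive pair of prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

def decayed_volatility(returns, decay):
    """Running volatility estimate with older returns down-weighted.

    Assumed scheme: each step shrinks the accumulated weight by
    (1 - 1/decay), so a larger decay value gives a longer memory.
    """
    variance = 0.0
    weight = 0.0
    estimates = []
    for r in returns:
        weight = weight * (1 - 1 / decay) + 1
        variance = (1 - 1 / weight) * variance + (1 / weight) * r ** 2
        estimates.append(math.sqrt(variance))
    return estimates

# Hypothetical closing prices, not data from DailyClosingPrice.csv
prices = [100.0, 102.0, 101.0, 105.0]
rets = log_returns(prices)
for decay in (10, 30, 100):
    vol = decayed_volatility(rets, decay)
    print(decay, vol[-1])
```

With this scheme the first estimate equals the absolute value of the first return, and the estimates for different decay values diverge as more returns accumulate.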
"Yao Yao Robert Flamenbaum Case Study 2 MSDS 6306 401 Q1.docx" is the document file to capture the code and the screenshot outputs for question 1.
"Yao Yao Robert Flamenbaum Case Study 2 MSDS 6306 401 Q2-Q5.Rmd" is the R Markdown file that fully follows the rubric, annotating the analysis code with concluding statements for questions 2 to 5
"Yao_Yao_Robert_Flamenbaum_Case_Study_2_MSDS_6306_401_Q2-Q5.pdf" is the final knitted paper that combines text, code, and output into one file ready for submission for questions 2 to 5
"DDS-Case-Study-2.Rproj" is the R project file for RStudio
"CS2.sas" is the SAS code to create the matrix
"CS2.R" is the R code to create the matrix
"CS2.ipynb" is the Python code to create the matrix
"DailyClosingPrice.csv" is the raw data from the AGIO stock
"LogReturns.csv" is the log returns of the AGIO stock
"LogReturnsVolatility" is the volatility of the AGIO stock at decay values of 10, 30, and 100
"TEMP.csv" is the original data of average monthly temperatures of countries
"CityTemp.csv" is the original data of average monthly temperatures of cities
"Temperature2.csv" is the raw average monthly temperatures of countries imported
"CityTemp2.csv" is the raw average monthly temperatures of cities imported
"Temp1900.csv" is the cleaned average monthly temperatures of countries for 1900-2013
"CityTemp1900.csv" is the cleaned average monthly temperatures of cities for 1900-2013
"UStemp.csv" is the cleaned average monthly temperatures of the US for 1990-2013
"DescRangeStdevTemp1900.csv" lists the countries with the largest temperature ranges, in descending order
"DescCityRangeStdevTemp1900.csv" lists the cities with the largest temperature ranges, in descending order
"Orange.csv" is the original orange dataset
"Orange.xlsx" is the original orange dataset with prework
"Case study 2 -401.pdf" is the original assignment sheet
"Case study 2.docx" is the original assignment sheet converted into Word format
"SP500.txt" is the scaffolding code for Question 2
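
The two "DescRange" files listed above rank locations by temperature range in descending order. As a rough illustration of that kind of ranking, here is a minimal Python sketch. It assumes the range is the maximum minus the minimum monthly average per location; the record layout, the location names, and the function name are hypothetical and do not reflect the actual columns in the CSV files or the project's R code.

```python
from collections import defaultdict

# Hypothetical (location, monthly average temperature) records
records = [
    ("A", -5.0), ("A", 20.0), ("A", 8.0),
    ("B", 24.0), ("B", 28.0),
    ("C", -20.0), ("C", 15.0),
]

def temperature_ranges(rows):
    """Return (location, max - min) pairs sorted by range, largest first."""
    temps = defaultdict(list)
    for name, temp in rows:
        temps[name].append(temp)
    ranges = [(name, max(vals) - min(vals)) for name, vals in temps.items()]
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)

ranking = temperature_ranges(records)
for name, temp_range in ranking:
    print(name, temp_range)
```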