A Complete Data Science Project Using Multiple Regression - Introduction
Introduction
In this section, you'll get a chance to synthesize your skills and work through the entire Data Science workflow. To start, you'll extract appropriate data from a SQL database. From there, you'll continue exploring and cleaning your data, modeling the data, and conducting statistical analyses!
Data Science Processes
You'll take a look at three general frameworks for conducting Data Science processes using the skills you've learned thus far:
- CRoss-Industry Standard Process for Data Mining - CRISP-DM
- Knowledge Discovery in Databases - KDD
- Obtain Scrub Explore Model iNterpret - OSEMN
Note: OSEMN is pronounced "OH-sum" and rhymes with "possum".
From there, the lessons follow a similar structure:
Obtaining Data
You'll review SQL and practice importing data from a relational database using the ETL (Extract, Transform and Load) process.
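As a rough illustration of the extract, transform, and load steps, here is a minimal sketch using Python's built-in `sqlite3` module. The in-memory database, the `products` and `sales` tables, and all column names are hypothetical stand-ins, not the dataset used in the lessons.

```python
import sqlite3

# Hypothetical in-memory database standing in for a real SQL source.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us access columns by name
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price TEXT);
    CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER, units INTEGER);
    INSERT INTO products VALUES (1, 'widget', '2.50'), (2, 'gadget', '10.00');
    INSERT INTO sales VALUES (1, 1, 4), (2, 2, 1), (3, 1, 2);
""")

# Extract: pull the rows we need by joining the two tables.
rows = conn.execute("""
    SELECT s.sale_id, p.name, p.price, s.units
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    ORDER BY s.sale_id
""").fetchall()

# Transform: cast the text price to a float and derive a revenue value.
records = [
    {
        "sale_id": row["sale_id"],
        "name": row["name"],
        "revenue": float(row["price"]) * row["units"],
    }
    for row in rows
]

# Load: here the destination is just an in-memory list of dicts; in practice
# it could be a pandas DataFrame or another database table.
conn.close()
```

Each entry in `records` now holds one joined, type-corrected row that is ready for further cleaning.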
Scrubbing Data
From there, you'll practice cleaning data:
- Casting columns to the appropriate data types
- Identifying and dealing with null values appropriately
- Removing columns that aren't required for modeling
- Checking for and dealing with multicollinearity
- Normalizing the data
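The cleaning steps listed above might look something like the following pandas sketch. The DataFrame, its column names, the 0.9 correlation threshold, median imputation, and z-score scaling are all illustrative assumptions; the right choices depend on the dataset at hand.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data; names and values are for illustration only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "sqft": ["1000", "1500", "2000", "2500", "3000"],  # numbers stored as text
    "sqm": [92.9, 139.4, 185.8, 232.3, 278.7],         # near-duplicate of sqft
    "age": [10.0, np.nan, 30.0, 5.0, 20.0],            # contains a null
    "price": [200.0, 290.0, 410.0, 480.0, 610.0],      # target variable
})

# Cast columns to the appropriate data types.
df["sqft"] = df["sqft"].astype(float)

# Deal with null values; median imputation is one of several options.
df["age"] = df["age"].fillna(df["age"].median())

# Remove columns that aren't required for modeling.
df = df.drop(columns=["id"])

# Check for multicollinearity: flag predictor pairs with |correlation| > 0.9.
corr = df.drop(columns=["price"]).corr().abs()
high_pairs = [
    (a, b)
    for a in corr.columns
    for b in corr.columns
    if a < b and corr.loc[a, b] > 0.9
]

# Deal with it by dropping one column from each highly correlated pair.
df = df.drop(columns=[b for _, b in high_pairs])

# Normalize the remaining predictors (z-score: mean 0, standard deviation 1).
features = [col for col in df.columns if col != "price"]
df[features] = (df[features] - df[features].mean()) / df[features].std()
```

In this toy example `sqft` and `sqm` carry the same information, so only one of them is kept.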
Exploring Data
Once you've cleaned the data, you'll do some further EDA (Exploratory Data Analysis): check the distributions of the various columns, examine the descriptive statistics for the dataset, and create some initial visualizations to better understand the dataset.
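A first pass at EDA could be as small as the sketch below. The two-column DataFrame is made up for illustration; the pandas methods shown (`describe`, `skew`, `corr`) are standard, but which summaries matter will depend on the data.

```python
import pandas as pd

# Hypothetical cleaned data; column names and values are illustrative.
df = pd.DataFrame({
    "sqft": [1000.0, 1500.0, 2000.0, 2500.0, 3000.0],
    "price": [200.0, 290.0, 410.0, 480.0, 610.0],
})

# Descriptive statistics: count, mean, std, min, quartiles, and max per column.
summary = df.describe()

# Skewness gives a quick numeric hint about the shape of each distribution.
skewness = df.skew()

# Correlation of each predictor with the target column.
corr_with_target = df.corr()["price"].drop("price")

# For visual checks, df.hist() draws one histogram per column
# (it requires matplotlib, so it is left out of this sketch).
```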
Modeling Data
Finally, you'll create a definitive model. This will include fitting an initial regression model and then conducting statistical analyses of the results. You'll take a look at the p-values of the various features and perform some feature selection. You'll check the regression assumptions, including normality of the residuals, homoscedasticity (constant residual variance), and independence of the errors. From these checks, you'll then refine and improve the model, not just for performance, but for interpretability as well.
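To make the fit-then-diagnose loop concrete, here is a minimal sketch using only NumPy on synthetic data. The data, the seed, and the particular diagnostics are assumptions chosen for illustration; the checks shown are rough numeric indicators, not formal hypothesis tests. P-values for individual coefficients are not computed here; a statistics package such as statsmodels reports them as part of its regression output.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Synthetic data (an assumption): y is linear in two features plus noise.
n = 200
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n)

# Fit ordinary least squares with an explicit intercept column.
X_design = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Residuals are the basis for the assumption checks.
fitted = X_design @ coef
resid = y - fitted

# Goodness of fit: R-squared = 1 - SS_res / SS_tot.
ss_res = np.sum(resid ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# Independence: Durbin-Watson statistic; values near 2 suggest
# little first-order autocorrelation in the residuals.
durbin_watson = np.sum(np.diff(resid) ** 2) / ss_res

# Homoscedasticity (rough check): if the residual spread grows with the
# fitted values, |resid| and fitted will be noticeably correlated.
spread_corr = np.corrcoef(np.abs(resid), fitted)[0, 1]

# Normality (rough check): sample skewness of the residuals, near 0
# for a symmetric distribution.
resid_skew = np.mean((resid - resid.mean()) ** 3) / resid.std() ** 3
```

If any of these indicators looks off, that is a cue to revisit the features or transform the variables before refitting.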
Summary
In this section, you'll conduct an end-to-end review of the Data Science process!