This short lesson summarizes the topics we covered in section 10 and why they'll be important to you as a data scientist.
You will be able to:
- Understand and explain what was covered in this section
- Understand and explain why this section will help you become a data scientist
In this section, the nominal focus was on how to perform a linear regression, but the real value was learning how to think about the application of machine learning models to data sets. Key takeaways include:
- The Pearson correlation coefficient (range: -1 to 1) is a standard way to describe the strength and direction of the linear relationship between two variables
- Statistical learning theory deals with the problem of finding a predictive function based on data
- A loss function measures how far a given model's predictions are from the observed data values, so a lower loss indicates a better fit
- A linear regression is simply a (straight) line of best fit for predicting a continuous value (y = mx + c)
- The Coefficient of Determination (R Squared) can be used to determine how well a given line fits a given data set
- Certain assumptions must hold true for a least squares linear regression to be useful - linearity, normality of the residuals and homoscedasticity
- Q-Q plots can check for normality in residual errors
- The Jarque-Bera test can be used to test for normality - especially when the number of data points is large
- The Goldfeld-Quandt test can be used to check for homoscedasticity
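
As a quick refresher on how several of these takeaways fit together, here is a minimal sketch that computes the Pearson correlation, a least squares line of best fit (y = mx + c) and the Coefficient of Determination using only the Python standard library. The helper names (`pearson_r`, `fit_line`, `r_squared`) and the five data points are invented for illustration and do not come from the section's lessons.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y (between -1 and 1)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def fit_line(x, y):
    """Least squares slope m and intercept c for y = mx + c."""
    mx, my = mean(x), mean(y)
    m = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    c = my - m * mx
    return m, c


def r_squared(x, y, m, c):
    """Coefficient of Determination: 1 - SS_residual / SS_total."""
    my = mean(y)
    ss_res = sum((b - (m * a + c)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


# Hypothetical toy data, chosen only to demonstrate the calculations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
m, c = fit_line(x, y)
r2 = r_squared(x, y, m, c)

print(f"Pearson r = {r:.4f}")
print(f"Line of best fit: y = {m:.2f}x + {c:.2f}")
print(f"R squared = {r2:.4f}")
```

For a simple linear regression with a single predictor, R Squared equals the square of the Pearson correlation, which makes a handy sanity check on the two calculations. In practice you would more likely reach for a library such as statsmodels or scikit-learn, which also provide the diagnostic tests listed above.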