Hello! This is an issue with information about my PR: #57
Here are the changes that I made:
- Fixed package imports to work with the most recent version of the statsmodels package. The change here is subtle: statsmodels now uses its .api subpackage for the case where data and labels are passed in, and .formula.api for the case where a formula and dataframe are the parameters. I also corrected the capitalization errors
- I added intuitive explanations to the regularization module, covering what both types of regularization actually do and, in plain English, how they perform that regularization
- I added some additional details on why we hold out a test set rather than testing the model on the same data it was trained on. I also incorporated information from the theoretical slides for this module, briefly mentioning the concept of overfitting and how holding out a test set lets us check for it
- I added labels to every plot that was missing them. I believe this practice is critical when creating Notebooks that will be read by others, and the best way to build this habit is to show students in our own examples :)
- I added explanations for any technical jargon within the notebooks. As I understand the modules, these Notebooks apply the theoretical details in code, so students should have that theoretical information readily available while they are programming. To that end, I added explanations of the differences between univariate and multivariate models, as well as descriptions of the statistical assumptions that are made.
- Some minor spelling changes
- Finally, I added descriptions of why we are using regression rather than classification for this task in the first submodule of this module, to ease students into the coding
What I would also like to do in the future (building on this PR):
- Add intuition for why standardization works
- Add explanations for the statistical tests that are performed to validate our statistical assumptions when using linear regression
- Standardize usage of Seaborn or Matplotlib
I think the code for this module is truly fantastic: it is very comprehensive and visually appealing. I went in with the mindset of "learning linear regression for the first time", and the code flows extremely well, with visualizations that allow for an intuitive understanding. As such, I primarily focused on adding aids and guiding students with intuitive descriptions of what the code is doing at each step.
NOTE: Please change the y-axis label in the title plot for submodules 3_1 and 3_2! It currently says "Input Feature", whereas that axis really shows the value we are looking to predict. See below:
Please let me know if you have any questions or concerns about my changes.