Logistic Regression Model Comparisons - Lab
Introduction
In this lab, we will further investigate some comparisons between our personal logistic regression implementation, that of sci-kit learn and further tuning parameters that can be adjusted in the model.
Objectives
- Understand and implement logistic regression
- Compare logistic model outputs
In the previous lab, we were able to recreat a logistic regression model output from sci-kit learn that did not include an intercept of regularization. Here, you will continue to analyze the impact of several tuning parameters including the intercept, and regularization parameter which we have not discussed previously.
Importing the Data
As with the previous lab, import the dataset stored in heart.csv
#Your code here
Problem Formulation
Define X and y as with the previous lab. This time, follow best practices and also implementk a standard train-test split.
For consistency of results, use random_state=17.
#Your code here
Initial Model - Personal Implementation
Use your code from the previous lab to once again train a logistic regression algorithm on the training set.
# Your code here
Now use your algorithm to make [probability] predictions on the test set
#Your code here
Create an ROC curve for your predictions
#Your code here
Update your ROC curve to not only include a graph of the test set, but one of the train set
# Your code here
Create a confusion matrix for your predictions
Use a standard decision boundary of .5 to convert your probabilities output by logistic regression into binary classifications. (Again this should be for the test set.) Afterwards, feel free to use the built in sci-kit learn methods to compute the confusion matrix as we discussed in previous sections.
# Your code here
Initial Model - sci-kit learn
Do the same using the built in method from sci-kit learn. To start, create an identical model as you did in the last section; turn off the intercept and set the regularization parameter, C, to a ridiculously large number such as 1e16.
# Your code here
Create an ROC Curve for the sci-kit learn model
#Your code here
As before add an ROC curve to the graph for the train set as well
#Your code here
Adding an Intercept
Now add an intercept to the sci-kit learn model. Keep the regularization parameter C set to a very large number such as 1e16. Plot all three models ROC curves on the same graph.
# Your code here
Altering the Regularization Parameter
Now, experiment with altering the regularization parameter. At minimum, create 5 different subplots with varying regularization (C) parameters. For each, plot the ROC curve of the train and test set for that specific model.
Regularization parameters between 1 and 20 are recommended. Observe the difference in test and train auc as you go along.
# Your code here
Comment on how the Regularization Parameter Impacts the ROC curves plotted above
#Your response here
Summary
In this lesson, we reviewed many of the accuracy measures of classification algorithms and observed the impact of additional tuning parameters such as regularization.