Phase 3 Review

TOC

  1. Gradient Descent
  2. Logistic Regression
  3. Confusion Matrix
  4. Accuracy/Precision/Recall/F1
  5. auc_roc
  6. Algos

from src.student_caller import one_random_student
from src.student_list import quanggang

Gradient Descent

Question: What is a loss function? (Explain it in terms of the relationship between true and predicted values)

one_random_student(quanggang)

Question: What loss functions do we know and what types of data work best with each?

one_random_student(quanggang)
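
As a point of reference for the two questions above, here is a minimal sketch in plain Python of two common loss functions: mean squared error, usually paired with continuous targets, and log loss (binary cross-entropy), usually paired with binary targets. The function names and toy values are illustrative assumptions, not part of the review materials.

```python
import math

def mean_squared_error_loss(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def log_loss_binary(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy: confident wrong probabilities are penalized heavily."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

# Regression example: loss grows with the squared distance from the truth
mse = mean_squared_error_loss([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])

# Classification example: same labels, good vs. poor predicted probabilities
good = log_loss_binary([1, 0, 1], [0.9, 0.1, 0.8])
bad = log_loss_binary([1, 0, 1], [0.1, 0.9, 0.2])
print(mse, good, bad)
```

In both cases the loss shrinks as predictions move toward the true values, which is the quantity gradient descent tries to minimize.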

To solidify our knowledge of gradient descent, we will use Sklearn's stochastic gradient descent algorithm for regression, SGDRegressor. Sklearn estimators share many methods and parameters, such as fit/predict, but some have useful additions. SGDRegressor has an additional method called partial_fit, which allows us to inspect the calculated coefficients after each pass of gradient descent over the data.
We will use the diabetes dataset for this task.

from sklearn.datasets import load_diabetes
import numpy as np

data = load_diabetes(as_frame=True)
X = data['data']
y = data['target']
X.shape
X.head()
from sklearn.linear_model import SGDRegressor
# Instantiate a SGDRegressor object and run partial fit on X and y. For now, pass the argument `penalty=None`
one_random_student(quanggang)
# Inspect the coefficient array
one_random_student(quanggang)
# Import mean_squared_error from metrics, and pass in the true ys, an array of predictions,
# and the argument squared=False (in recent scikit-learn versions, use root_mean_squared_error instead)
one_random_student(quanggang)
# Repeat the partial fit. Inspect the RMSE and the coefficients.
one_random_student(quanggang)
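
One possible way to work through the steps above is sketched below. It assumes the diabetes data is loaded as in the earlier cell; `random_state=42` and the variable names are arbitrary choices for reproducibility, and the RMSE is computed with `np.sqrt` so the sketch does not depend on the `squared` argument.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True, as_frame=True)

sgd = SGDRegressor(penalty=None, random_state=42)

# One call to partial_fit performs a single pass (epoch) over the data
sgd.partial_fit(X, y)
coef_first = sgd.coef_.copy()
rmse_first = np.sqrt(mean_squared_error(y, sgd.predict(X)))

# A second call continues from the current coefficients instead of restarting
sgd.partial_fit(X, y)
coef_second = sgd.coef_.copy()
rmse_second = np.sqrt(mean_squared_error(y, sgd.predict(X)))

print(coef_first.round(2), round(float(rmse_first), 2))
print(coef_second.round(2), round(float(rmse_second), 2))
```

Comparing the two printed lines shows how the coefficients move and how the error changes between passes.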

Pick a coefficient, and explain the gradient descent update.
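
To make the update rule concrete, here is a simplified sketch in plain Python: one coefficient `w`, a mean squared error loss, and full-batch gradient steps. The toy data and learning rate are hypothetical. SGDRegressor itself updates on one sample at a time with a decaying learning rate, so this is a simplification of what it does, not a reproduction.

```python
# Toy data that follows y = 2x exactly (hypothetical values for illustration)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

def mse(w):
    """Mean squared error of the model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_gradient(w):
    """Derivative of the loss with respect to w: mean(2 * x * (w*x - y))."""
    return sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)

learning_rate = 0.05
w = 0.0
history = [(w, mse(w))]

for _ in range(5):
    # Move the coefficient a small step against the gradient of the loss
    w = w - learning_rate * mse_gradient(w)
    history.append((w, mse(w)))

for step, (coef, loss) in enumerate(history):
    print(step, round(coef, 4), round(loss, 4))
```

Each step moves `w` in the direction that lowers the loss, and the steps shrink as the gradient approaches zero near `w = 2`.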

Together, let's plot the trajectory of one coefficient against the loss.

# code
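
A possible starting point for the trajectory is sketched below: it repeats partial_fit and records one coefficient and the RMSE after each pass. The choice of the `bmi` coefficient, 30 passes, and `random_state=42` are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error

X, y = load_diabetes(return_X_y=True, as_frame=True)

feature = "bmi"  # arbitrary choice of coefficient to follow
feature_idx = list(X.columns).index(feature)

sgd = SGDRegressor(penalty=None, random_state=42)
coef_history, rmse_history = [], []

for epoch in range(30):
    sgd.partial_fit(X, y)  # one more pass over the data
    coef_history.append(sgd.coef_[feature_idx])
    rmse_history.append(np.sqrt(mean_squared_error(y, sgd.predict(X))))

print(coef_history[0], coef_history[-1])
print(rmse_history[0], rmse_history[-1])
```

The two lists can then be passed to a plotting call, for example matplotlib's `plt.plot(coef_history, rmse_history)`, to draw the coefficient against the loss.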

Compare that to a full fit of the SGDRegressor.

# code

Logistic Regression and Modeling

What type of target do we feed the logistic regression model?

one_random_student(quanggang)
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer(as_frame=True)
X = data['data']
y = data['target']
# Perform a train-test split
one_random_student(quanggang)

Question: What is the purpose of train/test split?

one_random_student(quanggang)

Question: Why should we never fit to the test portion of our dataset?

one_random_student(quanggang)
# Scale the training set using a standard scaler
ss = None
X_train_scaled = None
one_random_student(quanggang)
X_train_scaled.head()

Question: Why is scaling our data important? For part of your answer, relate to one of the advantages of logistic regression over another classifier.

# fit model with logistic regression to the appropriate portion of our dataset
one_random_student(quanggang)

Now that we have fit our classifier, the object lr has been filled up with information about the best fit parameters. Take a look at the coefficients held in the lr object. Interpret what their magnitudes mean.

# Inspect the .coef_ attribute of lr and interpret
one_random_student(quanggang)

Logistic regression has a predict method just like linear regression. Use the predict method to generate a set of predictions (y_hat_train) for the training set.

# use predict to generate a set of predictions
y_hat_train = None
one_random_student(quanggang)
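
The steps in this section could fit together roughly as in the sketch below. The `test_size`, `random_state`, and `stratify` settings are arbitrary assumptions, and wrapping the scaled arrays back into DataFrames is just one way to keep `.head()` and the column names available.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X, y = data["data"], data["target"]

# Hold out a test set that the model and the scaler never see during fitting
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit the scaler on the training data only, then reuse it on the test data
ss = StandardScaler()
X_train_scaled = pd.DataFrame(
    ss.fit_transform(X_train), columns=X_train.columns, index=X_train.index
)
X_test_scaled = pd.DataFrame(
    ss.transform(X_test), columns=X_test.columns, index=X_test.index
)

lr = LogisticRegression()
lr.fit(X_train_scaled, y_train)

# Pair each coefficient with its feature name, largest magnitude first
coefs = pd.Series(lr.coef_[0], index=X_train.columns)
coefs = coefs.reindex(coefs.abs().sort_values(ascending=False).index)
print(coefs.head())

y_hat_train = lr.predict(X_train_scaled)
```

Because the features are on a common scale, the coefficient magnitudes can be compared with one another.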

Confusion Matrix

Confusion matrices are a great way to visualize the performance of our classifiers.

Question: What does a good confusion matrix look like?

one_random_student(quanggang)
# create a confusion matrix for our logistic regression model fit on the scaled training data
one_random_student(quanggang)
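
To see what the four cells of a binary confusion matrix count, here is a small sketch in plain Python. The helper name and the toy labels are hypothetical; the output uses the same layout as `sklearn.metrics.confusion_matrix` for 0/1 labels (rows are true classes, columns are predicted classes).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tn, fp, fn, tp) for binary labels."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted positive, actually positive
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tn, fp, fn, tp

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
matrix = [[tn, fp],
          [fn, tp]]
print(matrix)
```

A good matrix has large counts on the main diagonal (tn and tp) and small counts off it.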

Accuracy/Precision/Recall/F_1 Score

We have a number of additional metrics, most of which we can calculate from the confusion matrix.

Question: Define accuracy. What is the accuracy score of our classifier?

# Confirm accuracy in code
one_random_student(quanggang)

Question: Why might accuracy fail to be a good representation of the quality of a classifier?

one_random_student(quanggang)

Question: Define recall. What is the recall score of our classifier?

# Confirm recall in code
one_random_student(quanggang)

Question: Define precision. What is the precision score of our classifier?

# Confirm precision in code
one_random_student(quanggang)

Question: Define F1 score. What is the F1 score of our classifier?

one_random_student(quanggang)
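
The four metrics in this section can all be written in terms of the confusion matrix counts. The sketch below uses a hypothetical imbalanced example (90 negatives, 10 positives) to show how accuracy can look strong while recall is poor; the function name and counts are made up for illustration.

```python
def classification_metrics(tn, fp, fn, tp):
    """Compute accuracy, precision, recall, and F1 from confusion matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical imbalanced case: the model finds only 2 of the 10 positives
metrics = classification_metrics(tn=88, fp=2, fn=8, tp=2)
print(metrics)
```

Here accuracy is 0.9 even though only 20% of the positive cases are recovered, which is why accuracy alone can misrepresent a classifier on imbalanced data.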

Auc_Roc

The AUC_ROC curve can't be deduced from a single confusion matrix, because each confusion matrix corresponds to only one classification threshold. Look here for some nice visualizations of AUC_ROC. Describe what the AUC_ROC curve shows. What does a good AUC_ROC curve look like? What is a good AUC_ROC score?

one_random_student(quanggang)

One of the advantages of logistic regression is that it generates a probability associated with each prediction. What is the default threshold? How would decreasing or increasing your threshold affect the true positive and false positive rates?

For our scaled X_train, generate an array of predicted probabilities for the positive class.

# your code here
one_random_student(quanggang)

Now, using those probabilities, create two arrays: one that converts the probabilities to label predictions using the default threshold, and one that uses a threshold of 0.4. How does the change affect our metrics?
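
The effect of moving the threshold can be sketched on a handful of made-up probabilities. With a fitted model the probabilities would typically come from something like `lr.predict_proba(X_train_scaled)[:, 1]`; the values and helper names below are hypothetical, and labeling a probability of exactly 0.5 as positive is a simplifying assumption.

```python
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.95, 0.80, 0.45, 0.30, 0.60, 0.42, 0.20, 0.05]

def predict_at(probs, threshold):
    """Label an observation positive when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probs]

def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)

preds_default = predict_at(probs, 0.5)
preds_lower = predict_at(probs, 0.4)

tpr_default, fpr_default = rates(y_true, preds_default)
tpr_lower, fpr_lower = rates(y_true, preds_lower)
print(tpr_default, fpr_default)
print(tpr_lower, fpr_lower)
```

Lowering the threshold labels more observations as positive, so both the true positive rate and the false positive rate can only stay the same or rise.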

# Plot the AUC_ROC curve for our classifier
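
Before plotting, it may help to see how the curve is built. The sketch below computes ROC points by sweeping the threshold across every distinct score and then estimates the area with the trapezoid rule. The toy scores and function names are assumptions; scikit-learn provides `roc_curve` and `roc_auc_score` for the same purpose on real data.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs, one per threshold, from strictest to loosest."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoid rule over consecutive (fpr, tpr) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.45, 0.30, 0.60, 0.42, 0.20, 0.05]

points = roc_points(y_true, scores)
auc = area_under_curve(points)
print(points)
print(auc)
```

A curve that hugs the top-left corner gives an area near 1, while a classifier that ranks at random gives an area near 0.5.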

More Algorithms

Much of the sklearn syntax is shared across classifiers and regressors. Methods such as fit, predict, and score are available on all sklearn classifiers, but they work differently under the hood. KNN's fit method simply stores the training set in memory, while logistic regression's .fit() does the hard work of calculating coefficients.
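
To illustrate what a "lazy" fit means, here is a toy k-nearest-neighbors classifier in plain Python. `TinyKNN` is a hypothetical class written for this sketch, not sklearn's implementation: its fit only memorizes the data, and all of the distance work happens in predict.

```python
import math
from collections import Counter

class TinyKNN:
    """Hypothetical minimal k-nearest-neighbors classifier for illustration."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # "Training" only stores the data; nothing is learned yet
        self.X_ = list(X)
        self.y_ = list(y)
        return self

    def predict(self, X):
        preds = []
        for row in X:
            # Distances to every stored point are computed at prediction time
            dists = sorted(
                (math.dist(row, xi), yi) for xi, yi in zip(self.X_, self.y_)
            )
            votes = Counter(label for _, label in dists[: self.n_neighbors])
            preds.append(votes.most_common(1)[0][0])
        return preds

X_train = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y_train = ["a", "a", "a", "b", "b", "b"]

knn = TinyKNN(n_neighbors=3).fit(X_train, y_train)
preds = knn.predict([(0.5, 0.5), (5.5, 5.5)])
print(preds)
```

The fit/predict interface matches the shared sklearn pattern even though the work is distributed very differently from logistic regression.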

However, each algorithm also has specific parameters and methods associated with it. For example, decision trees have feature importances and logistic regression has coefficients. KNN has n_neighbors and decision trees have max_depth.

Getting to know the algorithms and their associated properties is an important area of study.

That being said, you are now at the point where, no matter which algorithm you choose, you can run the code to create a model as long as your data is in the correct shape. Most importantly, the target must be in the appropriate form (continuous or categorical) and isolated from the predictors.
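
A quick way to see these model-specific pieces is sketched below. It fits on the full breast cancer dataset purely to expose the attributes, so it is not a modeling workflow; the hyperparameter values and `random_state` are arbitrary assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# max_depth is a decision tree hyperparameter; n_neighbors belongs to KNN
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X, y)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# Only the tree exposes feature_importances_ after fitting
importances = dict(zip(X.columns, tree.feature_importances_))
top_feature = max(importances, key=importances.get)
tree_depth = tree.get_depth()

print(top_feature, tree_depth)
print(hasattr(knn, "feature_importances_"))
```

Both models share fit and predict, but only the tree can report which features drove its splits.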

Here are the algos we know so far.

  • Linear Regression
  • Lasso/Ridge Regression
  • Logistic Regression
  • Naive-Bayes
  • KNN
  • Decision Trees

Note that KNN and decision trees also have regression classes in sklearn.

Here are two datasets from seaborn and sklearn. Let's work through the process of creating simple models for each.

import seaborn as sns
penguins = sns.load_dataset('penguins')
penguins.head()

Question: What algorithm would be appropriate based on the target?

# split target from predictors
one_random_student(quanggang)

For the first simple model, let's just use the numeric predictors.

one_random_student(quanggang)
# isolate numeric predictors
one_random_student(quanggang)
# Scale appropriately
one_random_student(quanggang)
# instantiate appropriate model and fit to appropriate part of data.
one_random_student(quanggang)
# Create a set of predictions

y_hat_train = None
y_hat_test = None
one_random_student(quanggang)
# Create and analyze appropriate metrics
one_random_student(quanggang)
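# Note: load_boston was removed in scikit-learn 1.2, so this cell requires an older version
import pandas as pd
from sklearn.datasets import load_boston
data = load_boston()
X = pd.DataFrame(data['data'], columns=data['feature_names'])
y = data['target']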

Question: What algorithm would be appropriate based on the target?

one_random_student(quanggang)
# split target from predictors
one_random_student(quanggang)

For the first simple model, let's just use the numeric predictors.

# isolate numeric predictors
one_random_student(quanggang)
# Scale appropriately
one_random_student(quanggang)
# instantiate appropriate model and fit to appropriate part of data.
one_random_student(quanggang)
# Create a set of predictions

y_hat_train = None
y_hat_test = None
one_random_student(quanggang)
# Create and analyze appropriate metrics
one_random_student(quanggang)
