
cv2-mod4-sec28-extensions-to-linear-models-lesson

Questions

Objectives

YWBAT (you will be able to)

  • explain bias/variance tradeoff
  • explain ridge regression
  • explain lasso regression
  • explain AIC and BIC

What are the assumptions of linear regression?

Features and Target

  • Linear Relationship between the features and the target
  • No multicollinearity - the features should not be strongly correlated with one another

Assumptions on your Residuals

  • Normality - the residuals should be approximately normally distributed
  • Homoscedasticity - the residuals should have roughly constant variance across the range of predictions
  • No autocorrelation - the residuals should not be correlated with one another (see the sketch below)
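
These residual checks can be read off a fitted statsmodels model. A minimal sketch on simulated data (purely illustrative, using diagnostics that ship with statsmodels):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson, jarque_bera

# simulated data, purely for illustration
rng = np.random.default_rng(0)
X = sm.add_constant(rng.normal(size=(200, 3)))
y = X @ np.array([1.0, 0.5, -0.3, 0.2]) + rng.normal(size=200)

resid = sm.OLS(y, X).fit().resid

# normality: Jarque-Bera test on the residuals (small p-value -> not normal)
jb_stat, jb_pvalue, skew, kurtosis = jarque_bera(resid)

# autocorrelation: Durbin-Watson statistic (values near 2 suggest none)
dw = durbin_watson(resid)

print(jb_pvalue, dw)
```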

Outline

import pandas as pd
import numpy as np

import statsmodels.api as sm

from sklearn.linear_model import Lasso, Ridge, LinearRegression
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split


import matplotlib.pyplot as plt
cal_housing = fetch_california_housing()
Downloading Cal. housing from https://ndownloader.figshare.com/files/5976036 to /Users/rcarrasco/scikit_learn_data
# assemble the features and target into a single DataFrame
y = cal_housing.target
X = cal_housing.data
features = cal_housing.feature_names
df = pd.DataFrame(X, columns=features)
df['target'] = y
df.head()
|   | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | target |
|---|--------|----------|----------|-----------|------------|----------|----------|-----------|--------|
| 0 | 8.3252 | 41.0 | 6.984127 | 1.023810 | 322.0 | 2.555556 | 37.88 | -122.23 | 4.526 |
| 1 | 8.3014 | 21.0 | 6.238137 | 0.971880 | 2401.0 | 2.109842 | 37.86 | -122.22 | 3.585 |
| 2 | 7.2574 | 52.0 | 8.288136 | 1.073446 | 496.0 | 2.802260 | 37.85 | -122.24 | 3.521 |
| 3 | 5.6431 | 52.0 | 5.817352 | 1.073059 | 558.0 | 2.547945 | 37.85 | -122.25 | 3.413 |
| 4 | 3.8462 | 52.0 | 6.281853 | 1.081081 | 565.0 | 2.181467 | 37.85 | -122.25 | 3.422 |

Why don't we want multicollinearity? What does it cause?

It undermines the interpretation of the linear equation.

If two features f1 and f2 are correlated, consider

yhat = b0 + b1*f1 + b2*f2

Giving these some numbers:

gallons_per_mile = 2.5 x car_weight + 3.8 x engine_size

Normally, increasing car_weight by 1 (holding engine_size fixed) would increase gallons_per_mile by 2.5. But because car_weight and engine_size are multicollinear, we can't hold one fixed while the other changes, so the estimated 2.5 and 3.8 become unstable and lose their individual meaning. A sketch of how to quantify this with variance inflation factors follows below.
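
A minimal sketch of quantifying multicollinearity with variance inflation factors from statsmodels, using the `df` built above (the threshold of ~10 is a common rule of thumb, not a hard rule):

```python
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# VIF for each feature: how much that coefficient's variance is inflated
# by the feature's correlation with the other features
X_vif = sm.add_constant(df.drop("target", axis=1))
vifs = {col: variance_inflation_factor(X_vif.values, i)
        for i, col in enumerate(X_vif.columns) if col != "const"}

for col, vif in sorted(vifs.items(), key=lambda kv: -kv[1]):
    print(f"{col:>12}: {vif:.1f}")   # VIFs well above ~10 flag multicollinearity
```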

# let's build an OLS model using statsmodels (baseline)
# note: no constant term is added here, which is why the summary reports uncentered R-squared
ols = sm.OLS(y, df.drop("target", axis=1))
results = ols.fit()
results.summary()
OLS Regression Results
Dep. Variable: y R-squared (uncentered): 0.892
Model: OLS Adj. R-squared (uncentered): 0.892
Method: Least Squares F-statistic: 2.137e+04
Date: Thu, 12 Sep 2019 Prob (F-statistic): 0.00
Time: 17:33:56 Log-Likelihood: -24087.
No. Observations: 20640 AIC: 4.819e+04
Df Residuals: 20632 BIC: 4.825e+04
Df Model: 8
Covariance Type: nonrobust
|            | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|------------|------|---------|---|---------|--------|--------|
| MedInc     | 0.5135 | 0.004 | 120.594 | 0.000 | 0.505 | 0.522 |
| HouseAge   | 0.0157 | 0.000 | 33.727 | 0.000 | 0.015 | 0.017 |
| AveRooms   | -0.1825 | 0.006 | -29.673 | 0.000 | -0.195 | -0.170 |
| AveBedrms  | 0.8651 | 0.030 | 28.927 | 0.000 | 0.806 | 0.924 |
| Population | 7.792e-06 | 5.09e-06 | 1.530 | 0.126 | -2.19e-06 | 1.78e-05 |
| AveOccup   | -0.0047 | 0.001 | -8.987 | 0.000 | -0.006 | -0.004 |
| Latitude   | -0.0639 | 0.004 | -17.826 | 0.000 | -0.071 | -0.057 |
| Longitude  | -0.0164 | 0.001 | -14.381 | 0.000 | -0.019 | -0.014 |
Omnibus: 4353.392 Durbin-Watson: 0.909
Prob(Omnibus): 0.000 Jarque-Bera (JB): 14087.489
Skew: 1.069 Prob(JB): 0.00
Kurtosis: 6.436 Cond. No. 1.03e+04


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.03e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

A skewness of 1.069 means the residuals are moderately positively skewed (a symmetric, normal distribution has skewness 0).

A kurtosis of 6.436 (versus 3 for a normal distribution) means the residuals have heavy tails, i.e. more extreme values/outliers than we'd expect.
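
A minimal sketch of checking these numbers directly from the residuals with scipy (assumes the `results` object from the fit above):

```python
from scipy import stats

resid = results.resid

# scipy reports "excess" kurtosis by default (normal = 0), while the statsmodels
# summary reports plain kurtosis (normal = 3), so add 3 back to compare
print("skewness:", stats.skew(resid))
print("kurtosis:", stats.kurtosis(resid) + 3)
```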

df.corr()
|            | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude | target |
|------------|--------|----------|----------|-----------|------------|----------|----------|-----------|--------|
| MedInc     | 1.000000 | -0.119034 | 0.326895 | -0.062040 | 0.004834 | 0.018766 | -0.079809 | -0.015176 | 0.688075 |
| HouseAge   | -0.119034 | 1.000000 | -0.153277 | -0.077747 | -0.296244 | 0.013191 | 0.011173 | -0.108197 | 0.105623 |
| AveRooms   | 0.326895 | -0.153277 | 1.000000 | 0.847621 | -0.072213 | -0.004852 | 0.106389 | -0.027540 | 0.151948 |
| AveBedrms  | -0.062040 | -0.077747 | 0.847621 | 1.000000 | -0.066197 | -0.006181 | 0.069721 | 0.013344 | -0.046701 |
| Population | 0.004834 | -0.296244 | -0.072213 | -0.066197 | 1.000000 | 0.069863 | -0.108785 | 0.099773 | -0.024650 |
| AveOccup   | 0.018766 | 0.013191 | -0.004852 | -0.006181 | 0.069863 | 1.000000 | 0.002366 | 0.002476 | -0.023737 |
| Latitude   | -0.079809 | 0.011173 | 0.106389 | 0.069721 | -0.108785 | 0.002366 | 1.000000 | -0.924664 | -0.144160 |
| Longitude  | -0.015176 | -0.108197 | -0.027540 | 0.013344 | 0.099773 | 0.002476 | -0.924664 | 1.000000 | -0.045967 |
| target     | 0.688075 | 0.105623 | 0.151948 | -0.046701 | -0.024650 | -0.023737 | -0.144160 | -0.045967 | 1.000000 |
# drop features that are highly correlated with another feature
# (AveRooms with AveBedrms: 0.85, Latitude with Longitude: -0.92)
X = df.drop(["target", "AveRooms", "Latitude", "Longitude"], axis=1)
y = df.target

ols = sm.OLS(y, X)
results = ols.fit()

results.summary()
OLS Regression Results
Dep. Variable: target R-squared (uncentered): 0.884
Model: OLS Adj. R-squared (uncentered): 0.884
Method: Least Squares F-statistic: 3.140e+04
Date: Thu, 12 Sep 2019 Prob (F-statistic): 0.00
Time: 17:38:19 Log-Likelihood: -24870.
No. Observations: 20640 AIC: 4.975e+04
Df Residuals: 20635 BIC: 4.979e+04
Df Model: 5
Covariance Type: nonrobust
|            | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|------------|------|---------|---|---------|--------|--------|
| MedInc     | 0.4210 | 0.003 | 165.642 | 0.000 | 0.416 | 0.426 |
| HouseAge   | 0.0160 | 0.000 | 45.980 | 0.000 | 0.015 | 0.017 |
| AveBedrms  | -0.0185 | 0.010 | -1.902 | 0.057 | -0.038 | 0.001 |
| Population | 1.665e-05 | 4.6e-06 | 3.618 | 0.000 | 7.63e-06 | 2.57e-05 |
| AveOccup   | -0.0047 | 0.001 | -8.713 | 0.000 | -0.006 | -0.004 |
Omnibus: 4262.669 Durbin-Watson: 0.758
Prob(Omnibus): 0.000 Jarque-Bera (JB): 9935.375
Skew: 1.167 Prob(JB): 0.00
Kurtosis: 5.471 Cond. No. 3.16e+03


Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.16e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
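
Both summaries report AIC and BIC, and statsmodels also exposes them on the results object. A minimal sketch of a side-by-side comparison (the `results_full` name is mine, re-fitting the first model so both fits are in scope; lower AIC/BIC is better):

```python
# re-fit the full-feature model so both fits can be compared
results_full = sm.OLS(df.target, df.drop("target", axis=1)).fit()
results_reduced = sm.OLS(y, X).fit()  # same model as `results` above

for name, res in [("full", results_full), ("reduced", results_reduced)]:
    # AIC/BIC trade off fit quality against the number of parameters
    print(f"{name:>8}: AIC={res.aic:.0f}  BIC={res.bic:.0f}")
```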

What is the goal of linear regression?

Predict a target based on features

What are we using to make these predictions?

  • Parameters, also known as coefficients or weights

How do we find the best parameters?

  • Something to do with the smallest error... yes, that is true
  • We minimize the mean squared error
  • A general, flexible way to find the parameters is gradient descent (OLS also has a closed-form solution)

What is Gradient Descent?

  • It is an iterative process
  • What are the ingredients for Gradient Descent?
    • an initial guess for our Parameters
    • a Loss Function -> a way of calculating Error
    • an update rule: adjust the weights against the gradient of the Error with respect to the Parameters
    • repeat until the Error stops improving; the weights with the lowest Error are kept (see the sketch below)
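
A minimal sketch of those ingredients for least-squares linear regression on simulated data (the learning rate and iteration count are arbitrary illustrative choices):

```python
import numpy as np

# simulated data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_w, true_b = np.array([2.0, -1.0, 0.5]), 3.0
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

w, b = np.zeros(3), 0.0   # ingredient 1: initial guess for the parameters
lr = 0.1                  # learning rate (step size)

for _ in range(500):
    y_hat = X @ w + b
    # ingredient 2: the loss is mean squared error; these are its gradients
    grad_w = 2 * X.T @ (y_hat - y) / len(y)
    grad_b = 2 * np.mean(y_hat - y)
    # ingredient 3: step the weights against the gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)   # should land close to true_w and true_b
```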

What happens in linear regression if I add hundreds of features?

  • You overfit: your r2 goes up and your training error goes down
    • and gradient descent is only trying to minimize that error
  • this is where ridge and lasso come in
    • they penalize large coefficients, which discourages leaning on lots of parameters
    • what else do these do?
      • keep the coefficient estimates stable
      • help prevent us from overfitting
xtrain, xtest, ytrain, ytest = train_test_split(df.drop('target', axis=1), df.target, test_size=0.20)

Out-of-the-box linear regression

linreg = LinearRegression()
linreg.fit(xtrain, ytrain)
linreg.score(xtest, ytest)
0.6200923803673022
plt.bar(features, linreg.coef_)
plt.xticks(range(len(linreg.coef_)), features, rotation=70)
plt.show()

*Figure: bar plot of the linear regression coefficients by feature.*

Ridge regression

ridge = Ridge(alpha=10.0)
ridge.fit(xtrain, ytrain)

ridge.score(xtest, ytest)
0.620053583047234
ridge.coef_.sum()
0.08746926231461982
plt.bar(features, ridge.coef_)
plt.xticks(range(len(ridge.coef_)), features, rotation=70)
plt.show()

*Figure: bar plot of the ridge regression coefficients by feature.*

Lasso

lasso = Lasso(alpha=0.5)
lasso.fit(xtrain, ytrain)

lasso.score(xtest, ytest)
0.4601413921538754
plt.bar(features, lasso.coef_)
plt.xticks(range(len(lasso.coef_)), features, rotation=70)
plt.show()

*Figure: bar plot of the lasso coefficients by feature.*
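
A minimal sketch of how the two penalties treat coefficients differently, using the `ridge` and `lasso` models fit above: the L1 penalty can zero coefficients out entirely, while the L2 penalty only shrinks them.

```python
import numpy as np

print("ridge coefficients exactly zero:", np.sum(ridge.coef_ == 0))
print("lasso coefficients exactly zero:", np.sum(lasso.coef_ == 0))

# size of each coefficient vector under its penalty's own norm
print("ridge L2 norm:", np.linalg.norm(ridge.coef_, 2))
print("lasso L1 norm:", np.linalg.norm(lasso.coef_, 1))
```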

Some Formulas

AIC/ BIC
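
The notes don't include the formulas here; as a reference, the standard definitions, where k is the number of estimated parameters, n the number of observations, and L̂ the maximized likelihood (lower AIC/BIC is better):

$$\mathrm{AIC} = 2k - 2\ln(\hat{L}), \qquad \mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})$$

Both reward fit (through the likelihood) and penalize model size; BIC's penalty grows with n, so it prefers smaller models on large datasets.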

Lasso Regression
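
A sketch of the lasso cost function: the ordinary least-squares error plus an L1 penalty on the coefficients (in sklearn's `Lasso`, λ corresponds to `alpha`, with the squared-error term scaled by 1/(2n)):

$$\min_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j}\beta_j x_{ij}\Big)^2 + \lambda \sum_{j}\lvert\beta_j\rvert$$

The L1 penalty can drive some coefficients exactly to zero, so lasso also acts as a form of feature selection.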

Ridge Regression
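
Ridge regression keeps the same squared-error term but uses an L2 penalty instead (sklearn's `Ridge`, with `alpha` playing the role of λ):

$$\min_{\beta}\; \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j}\beta_j x_{ij}\Big)^2 + \lambda \sum_{j}\beta_j^2$$

The L2 penalty shrinks coefficients toward zero but rarely makes them exactly zero.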

Still need to plan

Some deep thinking

Why would one want to use ridge, or lasso, over unpenalized regression? What do these penalties affect? Why are they important?

Assessment

What did we learn

  • The issues with linear regression
    • residuals need to be normal
    • features should not be multicollinear
  • Learned about Gradient Descent
    • It's an iterative process that finds the parameters giving the least error
  • Lasso and Ridge Regression
    • Help us find good parameters by penalizing large coefficients (lasso can zero some out entirely)
    • Help prevent overfitting
