
Bias-Variance Tradeoff

Agenda

  1. Revisit the goal of model building, and relate it to expected value, bias and variance
  2. Define error: prediction error and irreducible error
  3. Define prediction error as a combination of bias and variance
  4. Explore the bias-variance tradeoff
  5. Code a basic train-test split
  6. Code K-Folds

1. Revisit the goal of model building, and relate it to expected value, bias and variance

[image: which model is better?]

https://towardsdatascience.com/cultural-overfitting-and-underfitting-or-why-the-netflix-culture-wont-work-in-your-company-af2a62e41288

What makes a model good?

  • We don’t ultimately care about how well your model fits your training data.

  • What we really care about is how well your model describes the process that generated your data.

  • Why? Because the data set you have is but one sample from a universe of possible data sets, and you want a model that would work for any data set from that universe.

What is a “Model”?

  • A “model” is a general specification of relationships among variables.

    • E.g. Linear Regression: $Price = \beta_1 \cdot Time + \beta_0 + \epsilon$
  • A “trained model” is a particular model with parameters estimated using some training data.
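
To make the distinction concrete, here is a minimal sketch with made-up toy data (the Time and Price values below are hypothetical, not from the lesson):

import numpy as np
from sklearn.linear_model import LinearRegression

# The "model" is the general specification: Price = beta_1*Time + beta_0
model = LinearRegression()

# Hypothetical toy training data
time = np.array([[1], [2], [3], [4]])
price = np.array([110, 205, 290, 405])

# The "trained model" is the same specification with parameters
# estimated from the training data
trained = model.fit(time, price)
print(trained.coef_, trained.intercept_)  # the estimated beta_1 and beta_0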

Remember Expected Value? How is it connected to bias and variance?

  • The expected value of a quantity is the weighted average of that quantity across all possible samples

[image: 6-sided die]

  • For a 6-sided die, another way to think about the expected value is as the arithmetic mean of a very large number of independent rolls.

The expected value of a 6-sided die is:

probs = 1/6
rolls = range(1,7)

expected_value = sum([probs * roll for roll in rolls])
expected_value
3.5
  • Now let's imagine we create a model that always predicts a roll of 3.

    • The bias is the difference between the average prediction of our model and the average roll of the die as we roll more and more times.

      • What is the bias of a model that always predicts 3?
    • The variance is the average difference between each individual prediction and the average prediction of our model as we roll more and more times.

      • What is the variance of that model?
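
A quick simulation (a sketch, assuming a fair die) can check both answers:

import numpy as np

np.random.seed(42)

# Simulate many rolls of a fair die
rolls = np.random.randint(1, 7, 100_000)

# Our model always predicts 3
predictions = np.full_like(rolls, 3)

# Bias: average prediction minus average roll (approaches 3 - 3.5 = -0.5)
print(predictions.mean() - rolls.mean())

# Variance: average squared difference between each prediction and the
# average prediction (0, since the model always says the same thing)
print(np.var(predictions))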

2. Define error: prediction error and irreducible error

Regression fit statistics are often called “error”

  • Sum of Squared Errors (SSE)
    $\operatorname{SSE} = \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$
  • Mean Squared Error (MSE)
    $\operatorname{MSE} = \frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2$
  • Root Mean Squared Error (RMSE)
    $\operatorname{RMSE} = \sqrt{\operatorname{MSE}}$

All are calculated using residuals

[image: residuals]

Individual Code: Turn Off Screen

  • Fit a quick and dirty linear regression model
  • Store predictions in the y_hat variable using predict() from the fit model
  • Hand-code SSE
  • Divide by the length of the array to find the Mean Squared Error
  • Make sure your MSE equals sklearn's mean_squared_error function
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
np.random.seed(42)
df = pd.read_csv('data/king_county.csv', index_col='id')
df = df.iloc[:,:12]
X = df.drop('price', axis=1)
y = df.price

y_hat = None
sse = None
mse = None
rmse = None
#__SOLUTION__

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
np.random.seed(42)
df = pd.read_csv('data/king_county.csv', index_col='id')
df = df.iloc[:,:12]
X = df.drop('price', axis=1)
y = df.price
lr = LinearRegression()
lr.fit(X,y)

y_hat = lr.predict(X)

sse = sum((y_hat - y)**2)
mse = sse/len(y_hat)
rmse = np.sqrt(mse)
print(mean_squared_error(y, y_hat))
print(mse)
print(rmse)
53170511676.69001
53170511676.68982
230587.3189849993

This error can be broken up into parts:

[image: defining error]

There will always be some random, irreducible error inherent in the data. Real data always has noise.

The goal of modeling is to reduce the prediction error, which is the difference between our model and the real-world process that generated our data.

3. Define prediction error as a combination of bias and variance

$\Large Total\ Error = Prediction\ Error + Irreducible\ Error$

Our prediction error can be further broken down into error due to bias and error due to variance.

$\Large Total\ Error = Model\ Bias^2 + Model\ Variance + Irreducible\ Error$

Model Bias is the expected prediction error of the expected trained model

In other words, if you were to train multiple models on different samples, it is the average difference between their predictions and the real values.

Model Variance is the expected variation in predictions, relative to your expected trained model

In other words, it is the average difference between any one model's prediction and the average of all the predictions.

Thought Experiment

  1. Imagine you've collected 23 different training sets for the same problem.
  2. Now imagine using one algorithm to train 23 models, one for each of your training sets.
  3. Bias vs. variance refers to the accuracy vs. consistency of the models trained by your algorithm.

[image: bulls-eye diagram of bias and variance]

http://scott.fortmann-roe.com/docs/BiasVariance.html
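
A minimal simulation of this thought experiment, assuming a made-up generating process $f(x) = x^2$ with Gaussian noise (not the lesson's data): train one straight-line model per training set and measure the bias and variance of the predictions at a single point.

import numpy as np

np.random.seed(42)

f = lambda x: x**2     # the assumed real-world process
x0 = 0.5               # the point where we measure bias and variance

preds = []
for _ in range(100):
    # Each iteration draws a fresh training set from the same process...
    x = np.random.uniform(0, 1, 50)
    y = f(x) + np.random.normal(0, 0.1, 50)
    # ...and trains a rigid, straight-line model on it
    b1, b0 = np.polyfit(x, y, 1)
    preds.append(b1 * x0 + b0)

preds = np.array(preds)
print((preds.mean() - f(x0))**2)  # squared bias at x0
print(preds.var())                # variance of the predictions at x0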

4. Explore the bias-variance tradeoff

High bias algorithms tend to be less complex, with simple or rigid underlying structure.

  • They train models that are consistent, but inaccurate on average.
  • These include linear or parametric algorithms such as regression and naive Bayes.
  • For linear regression, some assumptions about our feature set could lead to high bias:
    • We did not include the correct predictors
    • We did not take interactions into account
    • We missed a non-linear (polynomial) relationship

High bias models are underfit

On the other hand, high variance algorithms tend to be more complex, with flexible underlying structure.

  • They train models that are accurate on average, but inconsistent.
  • These include non-linear or non-parametric algorithms such as decision trees and nearest neighbors.
  • For linear regression, perhaps we included an unreasonably large number of predictors:
    • We created new features by squaring and cubing each feature
  • High variance models are modeling the noise in our data

High variance models are overfit

While we build our models, we have to keep this relationship in mind. If we build complex models, we risk overfitting our models. Their predictions will vary greatly when introduced to new data. If our models are too simple, the predictions as a whole will be inaccurate.

The goal is to build a model with enough complexity to be accurate, but not too much complexity to be erratic.

[image: optimal model complexity]

http://scott.fortmann-roe.com/docs/BiasVariance.html

Let's take a look at our familiar King County housing data.

import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.read_csv('data/king_county.csv', index_col='id')
df = df.iloc[:,:12]
df.head()
price bedrooms bathrooms sqft_living sqft_lot floors waterfront view condition grade sqft_above sqft_basement
id
7129300520 221900.0 3 1.00 1180 5650 1.0 0 0 3 7 1180 0
6414100192 538000.0 3 2.25 2570 7242 2.0 0 0 3 7 2170 400
5631500400 180000.0 2 1.00 770 10000 1.0 0 0 3 6 770 0
2487200875 604000.0 4 3.00 1960 5000 1.0 0 0 5 7 1050 910
1954400510 510000.0 3 2.00 1680 8080 1.0 0 0 3 8 1680 0
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
np.random.seed(42)

# Let's generate random subsets of our data

# Date is not in the correct format, so it was dropped above.
sample_point = df.drop('price', axis=1).sample(1)
point_preds = []

r_2 = []
simple_rmse = []

for i in range(100):
    
    df_sample = df.sample(5000, replace=True)
    y = df_sample.price
    X = df_sample.drop('price', axis=1)
    
    lr = LinearRegression()
    lr.fit(X, y)
    
    y_hat = lr.predict(X)
    simple_rmse.append(np.sqrt(mean_squared_error(y, y_hat)))
    r_2.append(lr.score(X,y))
    
    y_hat_point = lr.predict(sample_point)
    
    point_preds.append(y_hat_point)
    
print(f'simple mean {np.mean(simple_rmse)}')
print(f'simple variance {np.var(point_preds)}')
simple mean 228460.56183597515
simple variance 77297954.16271643
from sklearn.preprocessing import PolynomialFeatures


df = pd.read_csv('data/king_county.csv', index_col='id')

pf = PolynomialFeatures(2)

df_poly = pd.DataFrame(pf.fit_transform(df.drop('price', axis=1)))
df_poly.index = df.index
df_poly['price'] = df['price']

cols = list(df_poly)
# move the column to head of list using index, pop and insert
cols.insert(0, cols.pop(cols.index('price')))

df_poly = df_poly.loc[:,cols]

df_poly.head(10)
<style scoped> .dataframe tbody tr th:only-of-type { vertical-align: middle; }
.dataframe tbody tr th {
    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}
</style>
price 0 1 2 3 4 5 6 7 8 ... 68 69 70 71 72 73 74 75 76 77
id
7129300520 221900.0 1.0 3.0 1.00 1180.0 5650.0 1.0 0.0 0.0 3.0 ... 9.0 21.0 3540.0 0.0 49.0 8260.0 0.0 1392400.0 0.0 0.0
6414100192 538000.0 1.0 3.0 2.25 2570.0 7242.0 2.0 0.0 0.0 3.0 ... 9.0 21.0 6510.0 1200.0 49.0 15190.0 2800.0 4708900.0 868000.0 160000.0
5631500400 180000.0 1.0 2.0 1.00 770.0 10000.0 1.0 0.0 0.0 3.0 ... 9.0 18.0 2310.0 0.0 36.0 4620.0 0.0 592900.0 0.0 0.0
2487200875 604000.0 1.0 4.0 3.00 1960.0 5000.0 1.0 0.0 0.0 5.0 ... 25.0 35.0 5250.0 4550.0 49.0 7350.0 6370.0 1102500.0 955500.0 828100.0
1954400510 510000.0 1.0 3.0 2.00 1680.0 8080.0 1.0 0.0 0.0 3.0 ... 9.0 24.0 5040.0 0.0 64.0 13440.0 0.0 2822400.0 0.0 0.0
7237550310 1225000.0 1.0 4.0 4.50 5420.0 101930.0 1.0 0.0 0.0 3.0 ... 9.0 33.0 11670.0 4590.0 121.0 42790.0 16830.0 15132100.0 5951700.0 2340900.0
1321400060 257500.0 1.0 3.0 2.25 1715.0 6819.0 2.0 0.0 0.0 3.0 ... 9.0 21.0 5145.0 0.0 49.0 12005.0 0.0 2941225.0 0.0 0.0
2008000270 291850.0 1.0 3.0 1.50 1060.0 9711.0 1.0 0.0 0.0 3.0 ... 9.0 21.0 3180.0 0.0 49.0 7420.0 0.0 1123600.0 0.0 0.0
2414600126 229500.0 1.0 3.0 1.00 1780.0 7470.0 1.0 0.0 0.0 3.0 ... 9.0 21.0 3150.0 2190.0 49.0 7350.0 5110.0 1102500.0 766500.0 532900.0
3793500160 323000.0 1.0 3.0 2.50 1890.0 6560.0 2.0 0.0 0.0 3.0 ... 9.0 21.0 5670.0 0.0 49.0 13230.0 0.0 3572100.0 0.0 0.0

10 rows × 79 columns

np.random.seed(42)

sample_point = df_poly.drop('price', axis=1).sample(1)


r_2 = []
point_preds_comp = []
complex_rmse = []
for i in range(100):
    
    df_sample = df_poly.sample(1000, replace=True)
    y = df_sample.price
    X = df_sample.drop('price', axis=1)
    
    lr = LinearRegression()
    lr.fit(X, y)
    y_hat = lr.predict(X)
    complex_rmse.append(np.sqrt(mean_squared_error(y, y_hat)))
    r_2.append(lr.score(X,y))
    
    y_hat_point = lr.predict(sample_point)
    
    point_preds_comp.append(y_hat_point)
    
print(f'simple mean {np.mean(simple_rmse)}')
print(f'complex mean {np.mean(complex_rmse)}')

print(f'simple variance {np.var(point_preds)}')
print(f'complex variance {np.var(point_preds_comp)}')
simple mean 228460.56183597515
complex mean 193508.74007561
simple variance 77297954.16271643
complex variance 1311682449.9857328

[image: which model is better?]

5. Code a basic train-test split

from sklearn.model_selection import train_test_split

It is hard to know if your model is too simple or complex by just using it on training data.

We can hold out part of our sample as a test set, and use it to monitor our prediction error.

This allows us to evaluate whether our model has the right balance of bias/variance.

  • training set: a subset used to train the model.
  • test set: a subset used to test the trained model.
import pandas as pd
df = pd.read_csv('data/king_county.csv', index_col='id')

y = df.price
X = df[['bedrooms', 'sqft_living']]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size = .25)

print(X_train.shape)
print(X_test.shape)

print(X_train.shape[0] == y_train.shape[0])
print(X_test.shape[0] == y_test.shape[0])
(16209, 2)
(5404, 2)
True
True

How do we know if our model is overfitting or underfitting?

If our model is not performing well on the training data, we are probably underfitting it.

To know if our model is overfitting the data, we need to test our model on unseen data. We then measure our performance on the unseen data.

If the model performs much worse on the unseen data, it is probably overfitting the data.

Word Play in groups

Fill in the variable to correctly finish the sentences.

b_or_v = 'add a letter'
over_under = 'add a number'

one = "The model has a high R^2 on both the training set, but low on the test " +  b_or_v + " " + over_under
two = "The model has a low RMSE on training and a low RMSE on test" + b_or_v + " " + over_under
three = "The model performs well on data it is fit on and well on data it has not seen" + b_or_v + " " + over_under
seven = "The model has high R^2 on the training set and low R^2 on the test"  + b_or_v + " " + over_under
four = "The model leaves out many of the meaningful predictors, but is consistent across samples" + b_or_v + " " + over_under
five = "The model is highly sensitive to random noise in the training set"  + b_or_v + " " + over_under
six = "The model has a low R^2 on training but high on the test set"  + b_or_v + " " + over_under


a = "The model has low bias and high variance."
b = "The model has high bias and low variance."
c = "The model has both low bias and variance"
d = "The model has high bias and high variance"

over = "In otherwords, it is overfit."
under = "In otherwords, it is underfit."
other = 'That is an abberation'
good = "In otherwords, we have a solid model"

#__SOLUTION__

a = " The model has low bias and high variance."
b = " The model has high bias and low variance."
c = " The model has both low bias and low variance."
d = " The model has both high bias and high variance."

over = "In other words, it is overfit."
under = "In other words, it is underfit."
other = "That is an aberration."
good = "In other words, we have a solid model."


one = "The model has a high R^2 on the training set, but a low R^2 on the test." + a + " " + over
two = "The model has a low RMSE on training and a low RMSE on test." + c + " " + good
three = "The model performs well on data it is fit on and well on data it has not seen." + c + " " + good
four = "The model leaves out many of the meaningful predictors, but is consistent across samples." + b + " " + under
five = "The model is highly sensitive to random noise in the training set." + a + " " + over
six = "The model has a low R^2 on training but a high R^2 on the test set." + d + " " + other
print(one)
print(two)
print(three)
print(four)
print(five)
print(six)
The model has a high R^2 on the training set, but a low R^2 on the test. The model has low bias and high variance. In other words, it is overfit.
The model has a low RMSE on training and a low RMSE on test. The model has both low bias and low variance. In other words, we have a solid model.
The model performs well on data it is fit on and well on data it has not seen. The model has both low bias and low variance. In other words, we have a solid model.
The model leaves out many of the meaningful predictors, but is consistent across samples. The model has high bias and low variance. In other words, it is underfit.
The model is highly sensitive to random noise in the training set. The model has low bias and high variance. In other words, it is overfit.
The model has a low R^2 on training but a high R^2 on the test set. The model has both high bias and high variance. That is an aberration.

Should you ever fit on your test set?

No.

Never fit on test data. If you are seeing surprisingly good results on your evaluation metrics, it might be a sign that you are accidentally training on the test set.
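
Here is a minimal sketch of the right pattern, using a made-up dataset from sklearn's make_regression rather than the lesson's data:

from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=3, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

lr = LinearRegression()

# WRONG: lr.fit(X_test, y_test) would leak the test set into training
# and produce a surprisingly good, meaningless test score.

# RIGHT: fit only on the training set, then score once on the test set
lr.fit(X_train, y_train)
print(lr.score(X_test, y_test))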

Let's go back to our KC housing data without the polynomial transformation.

df = pd.read_csv('data/king_county.csv', index_col='id')

# Date is not in the correct format, so we leave it out for now.
df.head()
price bedrooms bathrooms sqft_living sqft_lot floors waterfront view condition grade sqft_above sqft_basement
id
7129300520 221900.0 3 1.00 1180 5650 1.0 0 0 3 7 1180 0
6414100192 538000.0 3 2.25 2570 7242 2.0 0 0 3 7 2170 400
5631500400 180000.0 2 1.00 770 10000 1.0 0 0 3 6 770 0
2487200875 604000.0 4 3.00 1960 5000 1.0 0 0 5 7 1050 910
1954400510 510000.0 3 2.00 1680 8080 1.0 0 0 3 8 1680 0

Now, we create a train-test split via the sklearn model selection package.

from sklearn.model_selection import train_test_split
np.random.seed(42)

y = df.price
X = df[['bedrooms', 'sqft_living']]

# Here is the convention for a traditional train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=43, test_size=.25)
# Instanstiate your linear regression object
lr = LinearRegression()
# fit the model on the training set
lr.fit(X_train, y_train)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
# Check the R^2 of the training data
lr.score(X_train, y_train)
0.5132349854445817
lr.coef_
array([-54632.9149931 ,    311.65365556])

A .513 R-squared reflects a model that explains about half of the total variance in the data.

Knowledge check

How would you describe the bias of the model based on the above training R^2?

# Your answer here
#__SOLUTION__
"A model with a .513 R^2 has a fairly high bias."
'A model with a .513 R^2 has a relatively high bias.'

Next, we test how well the model performs on the unseen test data. Remember, we do not fit the model again. The model has already learned its optimal parameters from the training set.

lr.score(X_test, y_test)
0.48688154021233165

The difference between the train and test scores is small.

What does that indicate about variance?

#__SOLUTION__
'The model has low variance'
'The model has low variance'

Now, let's try the same thing with our complex, polynomial model.

df = pd.read_csv('data/king_county.csv', index_col='id')
df.head()
price bedrooms bathrooms sqft_living sqft_lot floors waterfront view condition grade sqft_above sqft_basement
id
7129300520 221900.0 3 1.00 1180 5650 1.0 0 0 3 7 1180 0
6414100192 538000.0 3 2.25 2570 7242 2.0 0 0 3 7 2170 400
5631500400 180000.0 2 1.00 770 10000 1.0 0 0 3 6 770 0
2487200875 604000.0 4 3.00 1960 5000 1.0 0 0 5 7 1050 910
1954400510 510000.0 3 2.00 1680 8080 1.0 0 0 3 8 1680 0
poly_3 = PolynomialFeatures(3)

X_poly = pd.DataFrame(
            poly_3.fit_transform(df.drop('price', axis=1))
                      )

y = df.price
X_poly.head()
0 1 2 3 4 5 6 7 8 9 ... 354 355 356 357 358 359 360 361 362 363
0 1.0 3.0 1.00 1180.0 5650.0 1.0 0.0 0.0 3.0 7.0 ... 343.0 57820.0 0.0 9746800.0 0.0 0.0 1.643032e+09 0.000000e+00 0.0 0.0
1 1.0 3.0 2.25 2570.0 7242.0 2.0 0.0 0.0 3.0 7.0 ... 343.0 106330.0 19600.0 32962300.0 6076000.0 1120000.0 1.021831e+10 1.883560e+09 347200000.0 64000000.0
2 1.0 2.0 1.00 770.0 10000.0 1.0 0.0 0.0 3.0 6.0 ... 216.0 27720.0 0.0 3557400.0 0.0 0.0 4.565330e+08 0.000000e+00 0.0 0.0
3 1.0 4.0 3.00 1960.0 5000.0 1.0 0.0 0.0 5.0 7.0 ... 343.0 51450.0 44590.0 7717500.0 6688500.0 5796700.0 1.157625e+09 1.003275e+09 869505000.0 753571000.0
4 1.0 3.0 2.00 1680.0 8080.0 1.0 0.0 0.0 3.0 8.0 ... 512.0 107520.0 0.0 22579200.0 0.0 0.0 4.741632e+09 0.000000e+00 0.0 0.0

5 rows × 364 columns

X_train, X_test, y_train, y_test = train_test_split(X_poly, y, random_state=20, test_size=.25)
lr_poly = LinearRegression()

# Always fit on the training set
lr_poly.fit(X_train, y_train)

lr_poly.score(X_train, y_train)
0.7133044254532317
# That indicates a lower bias
lr_poly.score(X_test, y_test)
0.6119026292718351
# There is a large difference between train and test, showing high variance.

Pair Exercise

Link about data leakage and scalers

The link above explains that if you are going to scale your data, you should fit your scaler on the training data only, to prevent data leakage.

Perform the same train-test split as shown above for the simple model, but now scale your data appropriately.

The R^2 for both train and test should match the unscaled model's scores above.

from sklearn.preprocessing import StandardScaler
np.random.seed(42)

y = df.price
X = df[['bedrooms', 'sqft_living']]

# Train test split with random_state=43 and test_size=.25


# Scale appropriately

# fit and score the model 
#__SOLUTION__
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=43, test_size=.25)

ss = StandardScaler()

X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

lr.fit(X_train, y_train)
print(lr.score(X_train, y_train))
print(lr.score(X_test, y_test))
0.5132349854445817
0.48688154021233154

6. Code K-Folds: Even More Rigorous Validation

For a more rigorous cross-validation, we turn to K-folds

[image: K-Folds cross-validation diagram]

image via sklearn

In this process, we split the dataset into train and test as usual, then we repeatedly re-split the training set into folds.

K-Folds holds out one fold, trains on the remaining folds, then calculates a test score on the held-out fold. It repeats this process until each fold has served as the test set.

We tune our parameters on the training set using K-Folds, then validate on the test data. This allows us to build our model and check whether it is overfit without touching the test set. This protects our final evaluation from bias.

Fill in the Blank

mccalister = ['Adam', 'Amanda','Chum', 'Dann', 
 'Jacob', 'Jason', 'Johnhoy', 'Karim', 
'Leana','Luluva', 'Matt', 'Maximilian','Syd' ]

choice = np.random.choice(mccalister)
print(choice)
Maximilian
X = df.drop('price', axis=1)
y = df.price
from sklearn.model_selection import KFold

# Instantiate the KFold object
kf = KFold(n_splits=5)

train_r2 = []
test_r2 = []

# kf.split() splits the data via index
for train_ind, test_ind in kf.split(X,y):
    
    X_train, y_train = fill_in, fill_in
    X_test, y_test = fill_in, fill_in
    
    # fill in fit
    
    
    train_r2.append(lr.score(X_train, y_train))
    test_r2.append(lr.score(X_test, y_test))
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-75-0cd4b40a7ee8> in <module>
     10 for train_ind, test_ind in kf.split(X,y):
     11 
---> 12     X_train, y_train = fill_in, fill_in
     13     X_test, y_test = fill_in, fill_in
     14 


NameError: name 'fill_in' is not defined
#__SOLUTION__

from sklearn.model_selection import KFold

kf = KFold(n_splits=5)

train_r2 = []
test_r2 = []
for train_ind, test_ind in kf.split(X,y):
    
    X_train, y_train = X.iloc[train_ind], y.iloc[train_ind]
    X_test, y_test = X.iloc[test_ind], y.iloc[test_ind]
    
    lr.fit(X_train, y_train)
    train_r2.append(lr.score(X_train, y_train))
    test_r2.append(lr.score(X_test, y_test))
# Mean train r_2
np.mean(train_r2)
0.5068353530791848
# Mean test r_2
np.mean(test_r2)
0.5043547355695521
# Test out our polynomial model
poly_2 = PolynomialFeatures(2)

df_poly = pd.DataFrame(
            poly_2.fit_transform(df.drop('price', axis=1))
                      )

X = df_poly
y = df.price
kf = KFold(n_splits=5)

train_r2 = []
test_r2 = []
for train_ind, test_ind in kf.split(X,y):
    
    X_train, y_train = X.iloc[train_ind], y.iloc[train_ind]
    X_test, y_test = X.iloc[test_ind], y.iloc[test_ind]
    
    lr.fit(X_train, y_train)
    train_r2.append(lr.score(X_train, y_train))
    test_r2.append(lr.score(X_test, y_test))
# Mean train r_2
np.mean(train_r2)
0.6954523534689443
# Mean test r_2
np.mean(test_r2)
0.6606213729380702

Once we have an acceptable model, we train our model on the entire training set, and score on the test to validate.
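
As an aside, sklearn's cross_val_score can run the same K-Folds loop in one call. A sketch, assuming df and the imports from the cells above are still in scope:

from sklearn.model_selection import cross_val_score

# Five-fold cross-validation of the simple model; returns the
# held-out R^2 for each fold (R^2 is the default scorer for regressors)
X = df[['bedrooms', 'sqft_living']]
y = df.price

scores = cross_val_score(LinearRegression(), X, y, cv=5)
print(scores.mean())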
