In this lab, we'll explore Maximum Likelihood Estimation and strategies for implementing it in Python, making use of industry-standard tools such as the `scipy` library!
In this lab, we will:

- Demonstrate a conceptual understanding of Maximum Likelihood Estimation and what it is used for
- Demonstrate an understanding of why we use Negative Log Likelihood instead of Likelihood for MLE in Python
- Write a general-purpose function for Maximum Likelihood Estimation using industry-standard packages such as `scipy`
Run the cell below to import everything we'll need for this lab.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
from scipy.optimize import minimize
Explain the difference between Probability and Likelihood below the line. Use the two graphs below as aids for your explanation.
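To make the distinction concrete, here's a small illustrative sketch (not part of the lab's starter code, so treat it as an optional aid): probability fixes the parameters and varies the data, while likelihood fixes the data and varies the parameters.

```python
from scipy.stats import norm

# Probability (density): parameters fixed, data varies.
# Density of observing various values under a N(0, 1) distribution.
print(norm.pdf([-1, 0, 1], loc=0, scale=1))

# Likelihood: data fixed, parameters vary.
# Likelihood of candidate means mu, given a single observed value x = 1.
for mu in [-1, 0, 1]:
    print(mu, norm.pdf(1, loc=mu, scale=1))
```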
We're going to generate two different datasets to test our MLE function: a simple Gaussian sample here, and a noisy linear dataset a bit further below. In the cell below:

- Create a sample Gaussian distribution using numpy with 10,000 values in it
- Use a distplot from seaborn to visualize the distribution (a sketch follows this list)
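A minimal sketch of what this cell might look like. Note that `distplot` has been deprecated in recent versions of seaborn; `histplot` with `kde=True` is the modern equivalent, so the sketch uses that instead.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Draw 10,000 samples from a standard normal distribution
sample = np.random.normal(loc=0, scale=1, size=10_000)

# distplot is deprecated in newer seaborn; histplot + KDE is equivalent
sns.histplot(sample, kde=True)
plt.show()
```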
We'll start by setting some true values and then using them to generate a distribution of samples. The goal of this lab is to see whether we can use MLE to successfully estimate these (hidden) true values just by looking at the data.
In the cell below:

- Set `true_sigma` to 3
- Set `intercept` to 5
- Set `slope` to 8
- Generate an array of 50 evenly spaced x values between 0 and 50 using `np.linspace()`
- Compute an array of y values using the values contained in x, along with `slope`, `intercept`, and `true_sigma`
- Plot the newly generated data with a scatterplot

HINT: Remember the formula y = mx + b. Also remember that the standard deviation accounts for random noise found in the dataset; if you don't add random noise, each y value will line up perfectly with the equation of the line, making it too easy to discover the parameters for slope and intercept.
true_sigma = None
intercept = None
slope = None
x = None
y = None
plt.scatter(x, y)
plt.show()
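If you get stuck, here's one way the cell above might be filled in. The noise term is an assumption based on the hint: we treat the noise as normally distributed with standard deviation `true_sigma`.

```python
true_sigma = 3
intercept = 5
slope = 8

# 50 evenly spaced x values between 0 and 50
x = np.linspace(0, 50, 50)

# y = mx + b, plus Gaussian noise with standard deviation true_sigma
y = slope * x + intercept + np.random.normal(loc=0, scale=true_sigma, size=50)

plt.scatter(x, y)
plt.show()
```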
In your own words, answer the following questions:
Why do we use the log of likelihood rather than just likelihood? In terms of optimization operations, what is the relationship between log likelihood and negative log likelihood?
Bonus question: Why do we typically use negative log likelihood in Python instead of likelihood or log likelihood? (This question may take a little research.)
Write your answer to these questions below this line:
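As a numerical aid (this snippet is illustrative, not part of the lab's starter code): the raw likelihood of a dataset is a product of many numbers well below 1, which underflows to zero in floating point, while the log likelihood is a sum of moderate negative numbers that stays representable. And since scipy's optimizers minimize rather than maximize, we flip the sign and minimize the negative log likelihood.

```python
import numpy as np
from scipy.stats import norm

data = np.random.normal(loc=0, scale=1, size=10_000)

# Raw likelihood: a product of 10,000 densities, each well below 1,
# underflows to exactly 0.0 in floating point
print(np.prod(norm.pdf(data)))    # 0.0

# Log likelihood: a sum of 10,000 moderate negative numbers, no underflow
print(np.sum(norm.logpdf(data)))  # roughly -14,000
```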
In the cell below, complete the following negative log likelihood function. This function should take in an array of theta parameters and return the negative log likelihood for those parameters. This can be a bit tricky; follow the steps in the pseudocode below to do this successfully:

- Generate a sample y value called `mu` using our data (`x`), the intercept (first element in `theta`), and the slope (second element in `theta`)
- Create a `norm` distribution with mean `mu` and standard deviation equal to the final element in `theta` (use the `norm` function we imported from `scipy.stats` above)
- For that norm, get the `sum` of the `logpdf` of `y`. This is the log likelihood.
- Multiply the log likelihood by negative 1 and return our negative log likelihood
def neg_log_likelihood(theta):
pass
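Here is one sketch of the completed function, following the pseudocode step by step. Note that it relies on the `x` and `y` arrays defined earlier in the notebook being in scope.

```python
def neg_log_likelihood(theta):
    # theta[0] is the intercept, theta[1] is the slope, theta[2] is sigma
    mu = theta[0] + theta[1] * x

    # Log likelihood: sum of the log densities of each observed y under a
    # normal distribution centered on the fitted line, with spread theta[2]
    log_likelihood = np.sum(norm(mu, theta[2]).logpdf(y))

    # Optimizers minimize, so return the negative log likelihood
    return -1 * log_likelihood
```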
We're almost done. Now that we have a function that gets us the negative log likelihood, we can use an optimizer from `scipy.optimize` to try different parameter values until it finds ones that `minimize` the output of our `neg_log_likelihood` function.
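If `minimize()` is unfamiliar, here's a tiny standalone example (not part of the lab) showing the same pattern on a function whose minimum we already know:

```python
from scipy.optimize import minimize

# A simple function whose minimum sits at x = 3
def f(x):
    return (x[0] - 3) ** 2

toy_result = minimize(f, [0], method='Nelder-Mead')
print(toy_result.x)  # approximately [3.]
```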
In the cell below:

- Create an array called `starting_guesses`, and set it equal to `[1, 1, 1]`. These are placeholder values that we will start with for our `theta` array.
- Set the `results` variable equal to a call to `minimize()`. The `minimize()` function should take in the `neg_log_likelihood` function we created above and our array of `starting_guesses`, and should also set the `method` parameter equal to `'Nelder-Mead'` (this specifies a type of optimization that is more likely to converge than the default, for our purposes in this lab).
- Inspect and interpret the `results` object.
starting_guesses = None
results = None
results
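A sketch of what this cell might look like once filled in. Given how `neg_log_likelihood` was defined, `results.x` will hold the fitted values in the order `[intercept, slope, sigma]`.

```python
starting_guesses = [1, 1, 1]
results = minimize(neg_log_likelihood, starting_guesses, method='Nelder-Mead')
results
```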
Examine and interpret the values in `results.x`. What parameter does each value correspond to? How well did our MLE algorithm perform?
Write your answers below this line:
In this lab, we:
- Demonstrated an understanding of the general purpose behind Maximum Likelihood Estimation
- Calculated Negative Log Likelihood, and explored why MLE generally makes use of Negative Log Likelihood instead of Likelihood or Log Likelihood
- Used an optimizer from `scipy` to compute our MLE, and interpreted the results