The Cumulative Distribution Function (CDF) - Lab
Introduction
In the previous lesson we saw how we can use a discrete random variable used for modeling fair die having a uniform probabilities for all possible values. In this lab, we shall try to calculate a cdf for this variable and visualize it for inspection.
Objectives
You will be able to:
- Calculate the cdf for a given discrete random variable
- Visualize the cdf using matplotlib
Calculating CDF in python
Recall the formula for calculate the cumulative probability from previous lesson:
So given a list of all possible values of x, We can easily calculate the cdf for a given possible value (X) by performing following steps:
- Build a function
calculate_cdf(lst,X)
, wherelst
is a list of all possible values in a discrete variable x (6 values for a die roll), and X is the value for which we want to calculate the cumulative distribution function. - Initialize a count variable
- for all values in
lst
, if a value is less than X, add one to count - do nothing otherwise. (this will tell us total number of values less than X) - Calculate the cumulative probability of X dividing the count with total possible values
- Round off by 3 decimals and return the cumulative probability of X.
def calculate_cdf(lst, X):
pass
# test data
test_lst = [1,2,3]
test_X = 2
calculate_cdf(test_lst, test_X)
# 0.667
Let's now use above function to calculate a cdf for each value in a die roll with an intention of plotting it.
Perform following steps in the cell below:
- Create a list
die_lst
with all possible values of a fair die - Initialize an empty list
die_cum
for storing cumulative probabilities for these values. - For each value in the
die_lst
calculate its cumulative probability using the function above and store indie_cum
list.
die_lst = None
die_cum = None
die_cum
# [0.167, 0.333, 0.5, 0.667, 0.833, 1.0]
cdfs are implemented with two sorted lists: xs, which contains the values, and ps, which contains the cumulative probabilities for xs.
Following this, we now have a list of possible values, and a second list containing cumulative probabilities for each value. Let's go ahead and plot these values in matplotlib using the stem plot.
- Use
die_lst
for x-axis anddie_cum
for y-axis
Text(0,0.5,'Cumulative Probabilities')
Level Up (optional)
cdfs (and pmfs) can be calculated using built in numpy and matplotlib methods. So we don't have create custom functions to calculate these. We can draw a histogram styled cdf as shown below using following methods.
You would need to perform these steps
-
Use
np.histogram()
to automatically calculate the histogram with probabilities. Here is numpy histogram documentation to help you dig deeper. -
Use
plt.step()
method withnp.cumsum()
to calculate and plot cumulative probabilities (just like we did above).
Summary
In this lesson we looked at developing a cdf a percentile function of a discrete random variable. We looked at how to calculate and visualize a cdf. This technique can also be applied to continuous random variables which we shall see later in this section.