Standard Deviation Lab

Problem Description

In this lab, we'll learn to calculate standard deviation and variance, and gain intuition for what they mean and how they can be useful.

Objective

Calculate Standard Deviation of a sample or population
Calculate the Variance of a sample or population
Explore the relationship between Standard Deviation and Variance

Measures of Dispersion

In previous labs, we learned about Measures of Center such as mean and median. These metrics give us a general understanding of where the values lie in the range of our data. However, they don't give us the whole picture, and can often be misleading. To truly understand our data, we also need Measures of Dispersion, namely Standard Deviation and Variance. These measures tell us how tightly or loosely our data is clustered around the center, and generally act as a measure of how "noisy" our dataset is.

In this lab, we'll manually calculate standard deviation and variance and explore the relationship between them, as well as their relationship with other summary statistics such as the mean.

Calculating Variance

In the cell below, write a function that takes an array of numbers as input and returns the Variance of the sample as output.

Recall that the formula for calculating variance is:

$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N}$

Where:

$\sigma^2 = Variance$

$x_i = Each\ Value\ in\ the\ Sample$

$N = Size\ of\ Sample$

$\bar{x} = Sample\ Mean$

import numpy as np

def variance(sample):
    pass
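
One possible way to fill in the stub above, offered as a minimal sketch rather than the expected solution. It assumes a non-empty input and divides by $N$, the sample size defined above (not $N - 1$). The name `variance_sketch` is hypothetical, chosen so it does not shadow your own `variance` function.

```python
def variance_sketch(sample):
    """Return the variance of `sample`, dividing by N (not N - 1)."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must contain at least one value")
    mean = sum(sample) / n
    # Squared distance of every value from the mean
    squared_deviations = [(x - mean) ** 2 for x in sample]
    return sum(squared_deviations) / n


print(variance_sketch([1, 2, 3, 4, 5]))  # 2.0
```

For the input above the mean is 3, the squared deviations are 4, 1, 0, 1, and 4, and their average is 2.0.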

Calculating Standard Deviation

In the cell below, write a function that takes an array of numbers as input and returns the standard deviation of that sample as output.

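Recall that the formula for Standard Deviation is:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \bar{x})^2}{N}}$

Where:

$\sigma = Standard\ Deviation$

$x_i = Each\ Value\ in\ the\ Sample$

$\bar{x} = Sample\ Mean$

$N = Size\ of\ Sample$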

Hint: How are these formulas related? Can knowing one help you calculate the other?

For a refresher on how to calculate the standard deviation, take a look at this tutorial. For the function below, use NumPy only to calculate square roots as needed. Avoid NumPy's std function at this step; calculate everything else using only basic Python.

def std_dev(sample):
    pass
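
Following the hint above, the standard deviation can be obtained as the square root of the variance. The sketch below is one possible approach, not the required solution. The names `variance_sketch` and `std_dev_sketch` are hypothetical, and the code assumes a non-empty input and division by $N$.

```python
import numpy as np


def variance_sketch(sample):
    """Variance of `sample`, dividing by N."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def std_dev_sketch(sample):
    """Standard deviation as the square root of the variance."""
    return np.sqrt(variance_sketch(sample))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(std_dev_sketch(data))  # 2.0

# Sanity check only: np.std also divides by N by default (ddof=0)
print(np.isclose(std_dev_sketch(data), np.std(data)))  # True
```

Here `np.std` appears only as an after-the-fact check; the calculation itself uses NumPy for the square root alone.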

Case Study: Life Expectancy

People often use the mean as a summary statistic to encapsulate all relevant information about a topic. However, the mean is just one statistic; it deserves no special status, and can be misleading in many cases. One example is historical life expectancy.

Up until the 18th century, the mean life expectancy in most countries was between 30 and 40. However, the number of people who actually died between the ages of 30 and 40 was quite low. The average person who survived past childhood could expect to live well into their 50s, 60s, or even 70s. Why, then, is the average life expectancy around 35?

In the cells below, read in the data stored in ages.csv. Calculate the mean and standard deviation. Then, use matplotlib to create a histogram of the data with 8 bins.

When examining the data, consider the following questions:

Why did so few people actually die at the mean life expectancy age? Is the mean life expectancy a good metric or not? Why?
What does a high standard deviation tell us about the mean?

(Author's Note: Although the ranges in this case study are generally true to historical record, the data in ages.csv was made up for this problem.)
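
To build intuition for the questions above before opening the data, here is a small sketch using a hypothetical list of ages at death. These twelve numbers are invented purely for illustration and are not taken from ages.csv. They show how a mean can land in a range where no observation actually falls.

```python
# Hypothetical ages at death, invented purely for illustration
# (not taken from ages.csv): six childhood deaths, six adult deaths.
ages = [1, 1, 2, 3, 4, 5, 55, 60, 62, 65, 68, 70]

mean_age = sum(ages) / len(ages)
variance = sum((a - mean_age) ** 2 for a in ages) / len(ages)
std_age = variance ** 0.5

# Collect the people who died within five years of the mean
near_mean = [a for a in ages if abs(a - mean_age) <= 5]

print(mean_age)        # 33.0
print(len(near_mean))  # 0
print(std_age > 25)    # True
```

In this made-up sample the mean sits between two clusters, and the large standard deviation signals that individual values lie far from it.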

import pandas as pd

# read the stored data 'ages.csv'
ages = None

# calculate the mean and the standard deviation and print
mean = None
std = None
print("Mean Life Expectancy: {}".format(mean))
print("Standard Deviation: {}".format(std))
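
One way the placeholders above might be filled in, as a hedged sketch. The layout of ages.csv is not described here, so the code assumes a header row with the ages in the first column. The function name `summarize_ages` is hypothetical.

```python
import pandas as pd


def summarize_ages(path):
    """Return (mean, standard deviation) of the first column of a CSV file.

    Assumption: the file has a header row and the ages are in the first column.
    """
    ages = pd.read_csv(path)
    values = ages.iloc[:, 0]
    mean = values.mean()
    # ddof=0 divides by N, matching the formula used earlier in this lab;
    # pandas defaults to ddof=1 (divide by N - 1).
    std = values.std(ddof=0)
    return mean, std


# Example call, assuming ages.csv sits next to the notebook:
# mean, std = summarize_ages("ages.csv")
```

The explicit `ddof=0` matters here: without it, pandas would report the slightly larger sample standard deviation.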

import matplotlib.pyplot as plt
%matplotlib inline
# Plot a histogram of the data in ages.csv with 8 bins.  Bonus points for labeling and styling your graph!
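
A minimal sketch of the histogram step, assuming the ages are available as a list or pandas Series. The function name `plot_age_histogram` and the title and axis labels are illustrative choices, not part of the lab.

```python
import matplotlib.pyplot as plt


def plot_age_histogram(values, bins=8):
    """Draw a labeled histogram of `values` and return the axes and bin counts."""
    fig, ax = plt.subplots()
    counts, edges, _ = ax.hist(values, bins=bins, edgecolor="black")
    ax.set_title("Distribution of Age at Death")
    ax.set_xlabel("Age at death (years)")
    ax.set_ylabel("Number of people")
    return ax, counts


# Example call, assuming `ages` holds the values read from ages.csv:
# ax, counts = plot_age_histogram(ages)
```

Returning the bin counts alongside the axes makes it easy to check how few observations fall in the bin containing the mean.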

Conclusion

In this lab, we learned:

How to calculate the variance of a sample
How to calculate the standard deviation of a sample
The relationship between standard deviation and variance
How we can use measures of dispersion to inform our understanding of measures of center
