ds-measures-of-dispersion-lab's Introduction

Standard Deviation Lab

Problem Description

In this lab, we'll learn to calculate standard deviation and variance, and gain intuition for what they mean and how they can be useful.

Objective

  • Calculate the Standard Deviation of a sample or population
  • Calculate the Variance of a sample or population
  • Explore the relationship between Standard Deviation and Variance

Measures of Dispersion

In previous labs, we learned about Measures of Center such as the mean and median. These metrics give us a general sense of where the values in our data lie. However, they don't give us the whole picture, and on their own they can be misleading. To truly understand our data, we also need Measures of Dispersion, namely Standard Deviation and Variance. These measures tell us how tightly or loosely our data is clustered around the center, and generally act as a measure of how "noisy" our dataset is.

In this lab, we'll manually calculate standard deviation and variance and explore the relationship between them, as well as their relationship with other summary statistics such as the mean.
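To make the point about center versus spread concrete, here is a minimal sketch using Python's standard `statistics` module. The two datasets (`tight` and `loose`) are hypothetical values invented for illustration; they share the same mean but differ greatly in dispersion.

```python
import statistics

# Two hypothetical datasets, invented purely for illustration.
tight = [48, 49, 50, 51, 52]
loose = [10, 30, 50, 70, 90]

for name, data in (("tight", tight), ("loose", loose)):
    mean = statistics.mean(data)
    # pstdev is the population standard deviation (it divides by N).
    spread = statistics.pstdev(data)
    print(f"{name}: mean={mean}, std={spread:.2f}")
```

Both datasets report a mean of 50, so the mean alone cannot tell them apart; only the standard deviation reveals that `loose` is far more spread out.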

Calculating Variance

In the cell below, write a function that takes an array of numbers as input and returns the Variance of the sample as output.

Recall that the formula for calculating variance is:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N}$

Where:

$\sigma^2 = Variance$

$N = Size\ of\ Sample$

$\bar{x} = Sample\ Mean$

import numpy as np

def variance(sample):
    pass
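One possible way to fill in the stub is sketched below. It assumes the divide-by-N form of the variance (the symbols $N$ and $\bar{x}$ listed above) and uses only basic Python; the empty-input check is an added assumption, not something the lab requires.

```python
def variance(sample):
    """Return the variance of `sample`, dividing by N (assumed form)."""
    n = len(sample)
    if n == 0:
        # Assumption: treat an empty input as an error rather than returning 0.
        raise ValueError("variance requires at least one value")
    mean = sum(sample) / n
    # Average of the squared deviations from the mean.
    return sum((x - mean) ** 2 for x in sample) / n


print(variance([2, 4, 4, 4, 5, 5, 7, 9]))
```

For the values shown, the mean is 5 and the squared deviations sum to 32, so the result is 32 / 8 = 4.0. Note that dividing by N - 1 instead (Bessel's correction) gives the unbiased sample estimate; `numpy.var` divides by N by default, while pandas' `.var()` divides by N - 1 by default.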

Calculating Standard Deviation

In the cell below, write a function that takes an array of numbers as input and returns the standard deviation of that sample as output.

Recall that the formula for Standard Deviation is:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Where:

$\sigma = Standard\ Deviation$

$\mu = Sample\ Mean$

$N = Size\ of\ Sample$

Hint: How are these formulas related? Can knowing one help you calculate the other?

For a refresher on how to calculate the standard deviation, take a look at this tutorial. For the function below, use numpy only to calculate square roots as needed. Avoid using the library's std function at this step; calculate everything else using only basic Python.

def std_dev(sample):
    pass
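Following the hint, the standard deviation is the square root of the variance, so one function can reuse the other. The sketch below is one possible solution under that assumption. It repeats a divide-by-N `variance` helper so that it runs on its own, and it uses `math.sqrt` from the standard library as a dependency-free stand-in; `np.sqrt`, as the lab suggests, would work the same way here.

```python
import math


def variance(sample):
    """Variance of `sample`, dividing by N (assumed form)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n


def std_dev(sample):
    """Standard deviation as the square root of the variance."""
    return math.sqrt(variance(sample))


print(std_dev([2, 4, 4, 4, 5, 5, 7, 9]))
```

With the same example values as before, the variance is 4.0 and so the standard deviation is 2.0. Unlike the variance, this result is in the same units as the original data, which is why the standard deviation is usually easier to interpret.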

Case Study: Life Expectancy

People often use the Mean as a summary statistic to encapsulate all relevant information about a topic. However, the mean is just one statistic. It deserves no special status, and it can be misleading in many cases. One example is life expectancy in the past.

Up until the 18th century, the mean life expectancy in most countries was between 30 and 40. However, the number of people who actually died between the ages of 30 and 40 was quite low. The average person who survived past childhood could expect to live well into their 50s, 60s, or even 70s. Why, then, is the average life expectancy around 35?

In the cells below, read in the data stored in ages.csv. Calculate the mean and standard deviation. Then, use matplotlib to create a histogram of the data with 8 bins.

When examining the data, consider the following questions:

  1. Why did so few people actually die at the mean life expectancy age? Is the mean life expectancy a good metric or not? Why?
  2. What does a high standard deviation tell us about the mean?

(Author's Note: Although the ranges in this case study are generally true to historical record, the data in ages.csv was made up for this problem.)

import pandas as pd

# read the stored data 'ages.csv'
ages = None

# calculate the mean and the standard deviation and print
mean = None
std = None
print("Mean Life Expectancy: {}".format(mean))
print("Standard Deviation: {}".format(std))

import matplotlib.pyplot as plt
%matplotlib inline
# Plot a histogram of the data in ages.csv with 8 bins.  Bonus points for labeling and styling your graph!
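The effect described in this case study can be sketched without the real file. The block below uses a small list of hypothetical ages at death, made up for this illustration (it is not the contents of ages.csv), and counts values in 8 equal-width bins by hand in place of a matplotlib histogram. All variable names are illustrative.

```python
import math

# Hypothetical ages at death, made up for this sketch (NOT the data in ages.csv).
ages = [1, 1, 2, 2, 3, 4, 5, 6, 52, 55, 58, 60, 63, 65, 68, 71]

n = len(ages)
mean_age = sum(ages) / n
# Standard deviation, dividing by N.
std_age = math.sqrt(sum((a - mean_age) ** 2 for a in ages) / n)

# Count values in 8 equal-width bins spanning min..max, like an 8-bin histogram.
num_bins = 8
low, high = min(ages), max(ages)
width = (high - low) / num_bins
counts = [0] * num_bins
for a in ages:
    # Clamp so the maximum value falls into the last bin.
    index = min(int((a - low) / width), num_bins - 1)
    counts[index] += 1

# Which bin does the mean itself fall into?
mean_bin = min(int((mean_age - low) / width), num_bins - 1)

print(f"mean={mean_age:.2f}, std={std_age:.2f}")
for i, c in enumerate(counts):
    left = low + i * width
    print(f"{left:6.2f} to {left + width:6.2f}: {'#' * c}")
print(f"deaths in the bin containing the mean: {counts[mean_bin]}")
```

In this made-up data the deaths cluster in early childhood and in later adulthood, so the mean lands in an empty bin between the two groups. The large standard deviation relative to the mean is the signal that the mean is not a typical value here.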

Conclusion

In this lab, we learned:

  • How to calculate the variance of a sample
  • How to calculate the standard deviation of a sample
  • The relationship between standard deviation and variance
  • How we can use measures of dispersion to inform our understanding of measures of center

ds-measures-of-dispersion-lab's People

Contributors

loredirick, mike-kane
