Coder Social home page Coder Social logo

cv2-mod3-sec19-clt-lesson's Introduction

Questions

Objectives

YWBAT

  • apply the clt to find the mean and std of a population
  • define the clt
  • use sampling to find sampling statistics

Outline

  • Watch Mr. Nystrom
  • Make some data
  • apply clt to find stats on our data
import pandas as pd
import numpy as np

import scipy.stats as scs

import matplotlib.pyplot as plt

np.random.seed(42)
def make_hist(arr):
    plt.figure(figsize=(5, 5))
    plt.grid(linestyle='dashed')
    plt.hist(arr, color='r', alpha=0.5)
    plt.show()

Create some data

# height of a population
# we don't know what our data is
loc = np.random.randint(48, 72)
scale = np.random.randint(5, 9)
popsize = np.random.randint(1000, 1500)
pop = np.random.normal(loc=loc, scale=scale, size=popsize)

Calculate the population mean (with some certainty) using sampling distributions

sample_means = []
for i in range(30):
    samp = np.random.choice(pop, size=100, replace=False)
    sample_means.append(samp.mean())
plt.figure(figsize=(5, 5))
plt.grid(linestyle='dashed')
plt.hist(sample_means, color='r', alpha=0.5)
plt.show()

png

# how do we test for normality?
scs.skew(sample_means), scs.kurtosis(sample_means, fisher=False)
(-0.7136669566700377, 3.8101325840693745)
print(np.mean(sample_means) - 3*np.std(sample_means), np.mean(sample_means) + 3*np.std(sample_means))
51.79585182468757 56.739289606174104
np.mean(sample_means)
54.267570715430836

How do we calculate the population standard deviation?

sample_stds = []
for i in range(30):
    samp = np.random.choice(pop, size=100, replace=False)
    sample_stds.append(samp.std())
plt.figure(figsize=(5, 5))
plt.grid(linestyle='dashed')
plt.hist(sample_stds, color='r', alpha=0.5)
plt.show()

png

scs.skew(sample_stds), scs.kurtosis(sample_stds, fisher=False)
(0.07491090271341218, 2.2486691017922027)
print(np.mean(sample_stds) - 3*np.std(sample_stds), np.mean(sample_stds) + 3*np.std(sample_stds))
6.627438005035517 10.250677641064202
np.mean(sample_stds)
8.43905782304986
# spread of our sampling means? 
np.mean(sample_stds) / 10
0.8439057823049859

Verifying if our sampling estimated our mean/std correctly...

It did

np.mean(sample_means), pop.mean()
(54.267570715430836, 54.254315264939486)
np.mean(sample_stds), pop.std()
(8.43905782304986, 8.439896288978073)

Does our population need to be normal?

# hours of sleep students get at flatiron
students = np.random.normal(5, 0.2, size=1000)

# hours of sleep employees get at flatiron
employees = np.random.normal(8, 1.0, size=1000)
all_flatiron = np.concatenate([students, employees])
all_flatiron.shape
(2000,)
make_hist(all_flatiron)

png

sample_means = []
sample_stds = []

for i in range(30):
    samp = np.random.choice(all_flatiron, size=100)
    sample_means.append(np.mean(samp))
    sample_stds.append(np.std(samp))
make_hist(sample_means)
make_hist(sample_stds)

png

png

np.mean(all_flatiron), np.mean(sample_means)
(6.506361834201236, 6.494152262884501)
np.std(all_flatiron), np.mean(sample_stds)
(1.6728808853851138, 1.6536313679786818)
# What if we took a sample of people and got a mean of 7.0
# What is the probability that the sample came from flatiron?
mu = 7.0
scs.ttest_1samp(sample_means, 7.0)
Ttest_1sampResult(statistic=-16.993119437706515, pvalue=1.2900103036426018e-16)

Assessment

cv2-mod3-sec19-clt-lesson's People

Contributors

erdos2n avatar

Watchers

James Cloos avatar

Forkers

magnawhale

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.