110920-hypothesis-testing's Introduction

Statistical Testing

You are working for a TexMex restaurant that recently introduced queso to its menu.

We have random samples of 1000 "no queso" order check totals and 1000 "queso" order check totals for orders made by different customers.

In the cell below, we load the sample data for you into the arrays no_queso and queso for the "no queso" and "queso" order check totals. Then, we create histograms of the distribution of the check amounts for the "no queso" and "queso" samples.

#run this cell without changes

# import the necessary libraries

#data manip
import numpy as np
import pandas as pd 
from scipy import stats

#viz
import matplotlib.pyplot as plt

#object import / export
import pickle

#jn command to run matplotlib inline
%matplotlib inline
#run this cell without changes

# load the sample data, closing each file once it has been read
with open("data/no_queso.pkl", "rb") as f:
    no_queso = pickle.load(f)
with open("data/queso.pkl", "rb") as f:
    queso = pickle.load(f)

# plot histograms

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

ax1.set_title('Sample of Non-Queso Check Totals')
ax1.set_xlabel('Amount')
ax1.set_ylabel('Frequency')
ax1.hist(no_queso, bins=20)

ax2.set_title('Sample of Queso Check Totals')
ax2.set_xlabel('Amount')
ax2.set_ylabel('Frequency')
ax2.hist(queso, bins=20)
plt.show()

[Figure: side-by-side histograms of the non-queso and queso check totals]

1. Hypotheses and Errors

The restaurant owners want to know if customers who order queso spend significantly more or significantly less than customers who do not order queso.

1a) Describe the null $H_{0}$ and alternative hypotheses $H_{A}$ for this test.

# your written answer here
# __SOLUTION__

"""
Null hypothesis: The mean check total for customers who order queso is the same as the mean check total for customers who do not order queso. 

Alternative hypothesis: The mean check total for customers who order queso is not the same as the mean check total for customers who do not order queso. 
"""

1b) What does it mean to make Type I and Type II errors in this specific context?

# your written answer here
# __SOLUTION__
"""
Type I (rejecting the null hypothesis when it is true): Concluding that queso customers' mean check totals 
differ from non-queso customers' mean check totals when they are actually the same.

Type II (failing to reject the null hypothesis when it is false): Concluding that queso customers' mean check totals 
do not differ from non-queso customers' mean check totals when they actually do.
"""

# Give partial credit to students who describe what type I and type II errors are. 
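
To make the two error types concrete, here is a minimal simulation sketch. It does not use the restaurant data: the group means, standard deviation, sample size, and number of simulations are all hypothetical values chosen for illustration. When both groups are drawn from the same population, every rejection is a Type I error; when the groups truly differ, every failure to reject is a Type II error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
n_sims = 2000  # hypothetical number of repeated experiments
n = 100        # hypothetical per-group sample size, kept small for speed

# Null is TRUE: both groups share the same mean, so each rejection
# is a Type I error (a false alarm).
group_a = rng.normal(loc=20, scale=5, size=(n_sims, n))
group_b = rng.normal(loc=20, scale=5, size=(n_sims, n))
p_null = stats.ttest_ind(group_a, group_b, axis=1).pvalue
type_1_rate = np.mean(p_null < alpha)

# Null is FALSE: the second group truly spends more on average, so each
# failure to reject is a Type II error (a missed difference).
group_c = rng.normal(loc=21, scale=5, size=(n_sims, n))
p_alt = stats.ttest_ind(group_a, group_c, axis=1).pvalue
type_2_rate = np.mean(p_alt >= alpha)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

Under these assumptions the Type I rate should land close to alpha, while the Type II rate depends on the size of the true difference and on the sample size.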
Type I: (Rejecting the null hypothesis given it's true): Saying queso customers' total check amounts are different 
than non-queso customers' total check amounts when they are the same.

Type II: (Failing to reject the null hypothesis given it's false): Saying queso customers' total check amounts are 
the same as non-queso customers' total check amounts when they are different.

2. Sample Testing

2a) Run a statistical test on the two samples. Use a significance level of $\alpha = 0.05$. You can assume the two samples have equal variance.

Hint: Use scipy.stats (imported as stats above).

# your code here 
# __SOLUTION__ 

# Run a two-tailed, two-sample t-test (equal_var=True pools the variances)
print(stats.ttest_ind(no_queso, queso, equal_var=True))
print()
# Students may compute the critical t-statistics for the rejection region.
# A pooled two-sample t-test has n1 + n2 - 2 degrees of freedom.
df = len(no_queso) + len(queso) - 2
critical_t = (stats.t.ppf(0.025, df=df), stats.t.ppf(0.975, df=df))
print(f"({critical_t[0]:.3f}, {critical_t[1]:.3f})")

Ttest_indResult(statistic=-45.16857748646329, pvalue=1.29670967092511e-307)

(-1.961, 1.961)
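
For readers who want to see what an equal-variance two-sample t-test computes, the sketch below rebuilds the statistic by hand and compares it with `stats.ttest_ind`. The arrays `sample_a` and `sample_b` are synthetic stand-ins invented for this example, not the restaurant data, so the printed numbers will differ from the result above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic stand-ins for the two samples (NOT the restaurant data)
sample_a = rng.normal(loc=15, scale=4, size=1000)
sample_b = rng.normal(loc=18, scale=4, size=1000)

n_a, n_b = len(sample_a), len(sample_b)
df = n_a + n_b - 2  # degrees of freedom for the pooled test

# Pooled variance: weighted average of the two unbiased sample variances
pooled_var = (
    (n_a - 1) * sample_a.var(ddof=1) + (n_b - 1) * sample_b.var(ddof=1)
) / df

# Standard error of the difference in sample means
std_err = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))
t_manual = (sample_a.mean() - sample_b.mean()) / std_err

t_scipy = stats.ttest_ind(sample_a, sample_b, equal_var=True).statistic
print(t_manual, t_scipy)
```

The two printed values should agree up to floating-point error, which shows that the test statistic is the difference in sample means divided by its pooled standard error.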

2b) Can you reject the null hypothesis at a significance level of $\alpha = 0.05$?

Why or why not?

# your written answer here
# __SOLUTION__
'''
We have enough evidence to reject the null hypothesis 
at a significance level of alpha = 0.05. The two-tailed p-value
reported by the test is far smaller than alpha = 0.05.

Alternatively: 
the absolute value of our t-statistic (about 45.17) is far larger
than the critical t-statistic (about 1.96), so it falls in the rejection region.

Both answers (p-value or critical t-statistic) are valid. 
'''
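
The p-value rule and the critical-value rule are two views of the same decision. As a rough sketch, the snippet below takes the test statistic reported above, assumes 1000 + 1000 - 2 degrees of freedom for the pooled test, and checks both rules. The variable names are illustrative only.

```python
from scipy import stats

alpha = 0.05
t_stat = -45.16857748646329  # test statistic reported by ttest_ind above
df = 1000 + 1000 - 2         # two samples of 1000, pooled variance

# Two-tailed p-value: probability mass in both tails beyond |t|.
# It is compared with alpha itself, not with alpha / 2.
p_two_sided = 2 * stats.t.sf(abs(t_stat), df)

# Critical value: alpha is split across the two tails.
t_crit = stats.t.ppf(1 - alpha / 2, df)

reject_by_p = p_two_sided < alpha
reject_by_t = abs(t_stat) > t_crit
print(reject_by_p, reject_by_t)
```

Splitting alpha in half applies only when computing the critical value; the two-tailed p-value already accounts for both tails.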
