The ds-stats-errors-data-science from learn-co-students

Understanding Statistical Errors (TYPE I and II)

SWBATS

Describe and differentiate between TYPE I and TYPE II errors.
Understand alpha and beta for representing false positive and false negative values.
Simulate TYPE I and TYPE II errors with alpha control to observe the output of an experiment.
Explain why alpha = 0.5 is chosen as a cut off point for rejecting NULL hypothesis in most scientific experiments.

The result of a statistical hypothesis test and the corresponding decision of whether to reject or accept the null hypothesis is not infallible. A test provides evidence for or against the null hypothesis and then you decide whether to accept or reject it based on that evidence, but the evidence may lack the strength to arrive at the correct conclusion. Incorrect conclusions made from hypothesis tests fall in one of two categories, i.e. Type 1 and Type 2 erros

Alpha and Beta

Alpha (α): is the probability of a type I error i.e. finding a difference when a difference does not exist.

Most medical literature uses an alpha cut-off of 5% (0.05), indicating a 5% chance that a significant difference is actually due to chance and is not a true difference.

Beta (β): is the probability of a type II error i.e. not detecting a difference when one actually exists.

Beta is directly related to study power (Power = 1 – β) which we shall see in the next lesson. Most medical literature uses a beta cut-off of 20% (0.2), indicating a 20% chance that a significant difference is missed.

Let's try to simulate and visualize this phenomenon using some Python code.

import numpy as np
import pandas as pd
import scipy.stats as stats
import matplotlib.pyplot as plt
import math
import random 

import seaborn as sns
sns.set(color_codes=True)

First, we create a population of 1000 elements with a mean of 100 and a standard deviation of 20.

# Create a population with mean=100 and sd=20 and size = 1000
pop = None

# Plot the visualisation using sns.distplot()
#sns.distplot(pop)

<matplotlib.axes._subplots.AxesSubplot at 0x1a1337fd30>

Lets take two sample from this population and comment of the difference between their and means and standard deviations. How would you ensure the independance between elements of these samples?

Lets take two samples from above population to compare them.

# Take a sample of size 100 with replacement from above population
sample1 = None

# Use stats.describe to describe the sample statistics

# Sample 1 Summary
# DescribeResult(nobs=100, minmax=(42.30492781555637, 153.13001398377526), mean=98.78710443895723, 
# variance=471.2198655338474, skewness=-0.23139125907111105, kurtosis=-0.015689230868860538)

sample2 = None

# Sample 1 Summary

# DescribeResult(nobs=100, minmax=(42.30492781555637, 148.41214418333334), mean=95.70404077387242, 
# variance=473.8268064217018, skewness=-0.12469782771105901, kurtosis=0.0399787353710388)

We can see can see that if we take two samples from this population, the difference between the mean of samples 1 and 2 is very small small (this can be tried repeatedly). We must sample with replacement in order to ensure the independance assumption between elements of the sample.

There is, however, still a probability of seeing very large difference between values, even though they’re estimates of the same population parameters. In a statistical setting we’d interpret these unusually large differences as evidence that the two samples are statistically different. It depends on how you define statistical significance. In statistical tests this is done by setting a significance threshold α (alpha). Alpha controls how often we’ll get a type 1 error. A type 1 error occurs when our statistical test erroneously indicates a significant result.

We can run two sample t-test with independance assumption on these sample and as expected, the null hypothesis will be proven true due to similarities between distributions. We can also visualize the distribution to confirm the similarity between means and SDs.

# test the sample means by running a t-test 

# Ttest_indResult(statistic=1.0028959198575151, pvalue=0.3171351379653653)

Ttest_indResult(statistic=1.0028959198575151, pvalue=0.3171351379653653)

# Visualise both samples as PDfs to view similarity/differences

Simulating Type I and II errors

Type I error

TYPE I error describes a situation where you reject the null hypothesis when it is actually true. This type of error is also known as a "false positive" or "false hit". The type 1 error rate is equal to the significance level α, so setting a higher confidence level (and therefore lower alpha) reduces the chances of getting a false positive.

How alpha affects the prevalence of TYPE I errors.

Next, we shall see how alpha affects the rate of type 1 errors.

Exercise: Write a routine in Python to encapsulate the code shown above in order to repeat hypothesis tests on two randomly drawn distribution. The t-test will mostly fail to reject the null hypothesis, except, when by random chance you get a set of extremely different samples thus reject the null hypothesis (TYPE I ERROR). The frequency of such bad results depends upon the value of alpha.

Step 1: Create a population distribution (as shown above)
Step 2: Specify a number of hypothesis tests in numTests = 1000
Step 3: Create a list of alpha values to explore (alpha_set) = [0.001, 0.01, 0.05, 0.1, 0.2, 0.5]
Step 4: Create a pandas dataframe (sig_tests) to store 1000x5 = 5000 test results.
Step 5: Repeatedly take two random samples from population and run independant t-tests.
Step 6: Store P_value, alpha and a boolean variable to show whether null hypothesis was rejected or not (i.e. if p-value is less than alpha), for each of 5000 tests.
Step 7: Summarize/aggregate the results for presentation in a meaningful manner.

# Solution 

import pandas as pd

numTests = None
alphaSet = None
columns = ['err', 'p_val', 'alpha']
sigTests = pd.DataFrame(columns=columns)

# Create a population with mean=100 and sd=20 and size = 1000
pop = None

# Run the t-test on samples from distribution numTests x slphaSet times

# Create a counter for dataframe index values
counter = 1

# run a for loop for range of tests in numTests

    # another for loop for all values of alpha in alphaSet
    

        # take two samples from the same population with replacement
            samp1 = None
            samp2 = None

            # test sample means through independant t test
            result = None

            # Evaluate whether Null hypothesis for TYPE I error and store success/failure towards rejecting null hypothesis, 
            # the p value and alpha in the dataframe defined above
            
  
            # Increment the counter

Now we have to summarize the results, this is done using pandas groupby() method which sums the “err” column for each level of alpha. The groupby method iterates over each value of alpha, selecting the type 1 error column for all rows with a specific level of alpha and then applies the sum function to the selection.

# group type 1 error by values of alpha
group_error = None

# plot Type 1 error with respect to value of alpha

<matplotlib.axes._subplots.AxesSubplot at 0x1a1c309a90>

Grouped data clearly shows that as value of alpha is increases from .001 to 0.5, the probability of TYPE I errors also increase.

### Type II error

This error describes a situation where you fail to reject the null hypothesis when it is actually false. Type II error is also known as a "false negative" or "miss". The higher your confidence level, the more likely you are to make a type II error.

How alpha affects the prevalence of TYPE II errors.

Exercise Write a code similar to above except samples should be taken from two different populations. introduce a new variable to represent the difference between two poulations. The hypothesis test should, in most cases, reject the Null hypothesis as samples belong to different populations, except, in extreme cases where there is no significant difference between samples i.e. a TYPE II error (False Negatives). Code should reflect how rate of false negatives is affected by alpha.

# Solution

numTests = None
diff = None
ahpha_set =  None
columns = ['err', 'p_val', 'alpha']
sigTests2 = pd.DataFrame(columns=columns)

counter = 1

# Change the logic defined above to calculate type two error

Count of number of TYPE II errors according to alpha

group_error2 = None

# Plot type 2 errors with respect to alpha

<matplotlib.axes._subplots.AxesSubplot at 0x1a1c36c518>

Grouped data clearly shows that as value of alpha is increases from .001 to 0.5, the probability of TYPE II errors decreases.

Why is an α level of 0.05 chosen as a cut-off for statistical significance?

The α level of 0.05 is considered the best balance to avoid excessive type I or type II errors.

If we decide to use a large value for alpha :

Increases the chance of rejecting the null hypothesis
The risk of a Type II error (false negative) is REDUCED
Risk of a Type I error (false positive) is INCREASED

similarly, if we decide to use a very small value of alpha, it'll change the outcome as:

Increases the chance of accepting the null hypothesis
The risk of a Type I error (false positive) is REDUCED
Risk of a Type II error (false negative) is INCREASED

From above, we can see that in statistical hypothesis testing, the more we try and avoid a Type I error (false positive), the more likely a Type II error (false negative) will occur.

Conclusion

The statstical key point here is that there is always a trade off between false positives and false negatives. By increasing alpha the number of false positives increases but the number of false negatives decreases as shown in bar graphs. The value of alpha=0.05 is considered a reasonable compromise between these two types of errors. Within the concept of “signifigance” there is embedded a trade-off between these two types of errors.

Think of “signifigance” as a compromise, between false positives and negatives, not as absolute determination.

learn-co-students / ds-stats-errors-data-science Goto Github PK

ds-stats-errors-data-science's Introduction

Understanding Statistical Errors (TYPE I and II)

SWBATS

Alpha and Beta

Simulating Type I and II errors

Type I error

How alpha affects the prevalence of TYPE I errors.

How alpha affects the prevalence of TYPE II errors.

Why is an α level of 0.05 chosen as a cut-off for statistical significance?

Conclusion

ds-stats-errors-data-science's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent