Statistical Distributions - Recap
Introduction
This short lesson summarizes the topics we covered in this section and why they'll be important to you as a data scientist.
Key Takeaways
In this section, we really dug into statistical distributions.
Key takeaways include:
- There are two types of distributions: continuous, where (subject to measurement and/or storage precision) there are effectively an infinite number of possible values, and discrete, where the possible values are distinct and countable. For example, a person's height is continuous (assuming a suitably precise tape measure), whereas the number of bedrooms in a house is discrete
- How to describe the distribution of data sets using Probability Mass Functions, Cumulative Distribution Functions, and Probability Density Functions
- One type of discrete distribution deals with a series of boolean events or trials - often called Bernoulli Trials
- A Normal distribution is the classic "bell curve" with 68% of the probability mass within 1 SD of the mean, 95% within 2 SDs and 99.7% within 3 SDs
- Differences between the normal and the standard normal distribution
- The uses of $z$-scores and p-values for describing a distribution
- How a one-sample $z$-test is a very simple form of hypothesis testing
- How skewness and kurtosis can be used to measure how different a given distribution is from a normal distribution
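As a quick refresher, several of the takeaways above can be sketched with nothing but Python's standard library. This is an illustrative sketch rather than the code used in the lessons: the helper names (`mass_within`, `z_score`, `one_sample_z_test`, `skewness`, `excess_kurtosis`) and the example numbers are made up for this recap, the test shown is one-sided (upper tail), and the skewness and kurtosis formulas are the simple population (biased) versions.

```python
from math import sqrt
from statistics import NormalDist, fmean

# The standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def mass_within(k):
    """Probability that a normal variable lies within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (or below) mu."""
    return (x - mu) / sigma


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, p) for a one-sided, upper-tail one-sample z-test.

    Assumes the population standard deviation is known.
    """
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    p = 1.0 - std_normal.cdf(z)
    return z, p


def _central_moment(data, order):
    """Average of (x - mean) ** order over the data."""
    mean = fmean(data)
    return fmean((x - mean) ** order for x in data)


def skewness(data):
    """Population skewness: 0 for perfectly symmetric data."""
    return _central_moment(data, 3) / _central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Population kurtosis minus 3, so a normal distribution scores 0."""
    return _central_moment(data, 4) / _central_moment(data, 2) ** 2 - 3.0


if __name__ == "__main__":
    # Empirical rule: roughly 68%, 95% and 99.7%.
    for k in (1, 2, 3):
        print(f"within {k} SD: {mass_within(k):.4f}")

    # Hypothetical example: a sample of 50 with mean 103, tested against
    # a population with mean 100 and standard deviation 15.
    z, p = one_sample_z_test(sample_mean=103.0, pop_mean=100.0, pop_sd=15.0, n=50)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

In practice, a library such as SciPy would typically be used for these calculations; the standard-library version is shown only to keep the arithmetic visible.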
In the Appendix to this Module, you'll have the opportunity to learn about:
- the uniform distribution, which represents processes where each outcome is equally likely, like rolling a fair die
- the Poisson distribution, which can be used to model the probability of a given number of events occurring in a fixed time period
- the exponential distribution, which can be used to describe the probability distribution of the amount of time it may take before a given event occurs
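For a preview of these three distributions, the textbook formulas can be written as a few short functions. This is a minimal sketch using only the standard library; the function names and the rate parameter `lam` (the average number of events per unit of time) are illustrative choices for this recap, not names taken from the Appendix.

```python
from math import exp, factorial


def uniform_pmf(n_outcomes):
    """Discrete uniform: each of n equally likely outcomes has probability 1/n."""
    return 1.0 / n_outcomes


def poisson_pmf(k, lam):
    """Probability of exactly k events in one time period, at average rate lam."""
    return lam ** k * exp(-lam) / factorial(k)


def exponential_cdf(t, lam):
    """Probability that the wait for the next event is at most t, at rate lam."""
    return 1.0 - exp(-lam * t)


if __name__ == "__main__":
    # A fair six-sided die: every face has probability 1/6.
    print(f"P(die shows 4) = {uniform_pmf(6):.4f}")

    # Hypothetical rate of 3 events per hour.
    print(f"P(exactly 2 events in an hour) = {poisson_pmf(2, 3.0):.4f}")
    print(f"P(next event within 30 minutes) = {exponential_cdf(0.5, 3.0):.4f}")
```

Note that the Poisson and exponential functions share the same rate parameter: the first counts events in a period, while the second describes the waiting time between them.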