Statistical Distributions - Recap

Introduction

This short lesson summarizes the topics we covered in this section and why they'll be important to you as a data scientist.

In this section, we really dug into statistical distributions.

Key takeaways include:

There are two types of distributions - continuous, where (subject to measurement and/or storage precision) there are effectively infinitely many possible values, and discrete, where the values come from a countable set of distinct options (often, but not always, a finite one); for example, a person's height is continuous - assuming a suitably precise tape measure - whereas the number of bedrooms in a house is discrete
How to describe the distribution of data sets using Probability Mass Functions, Cumulative Distribution Functions, and Probability Density Functions
One type of discrete distribution deals with a series of boolean events or trials - often called Bernoulli Trials
A normal distribution is the classic "bell curve", with approximately 68% of the probability within 1 SD of the mean, 95% within 2 SDs, and 99.7% within 3 SDs
Differences between the normal and the standard normal distribution
The uses of $z$-scores and p-values for describing where an observation falls within a distribution
How a one-sample $z$-test is a very simple form of hypothesis testing
How skewness and kurtosis can be used to measure how different a given distribution is from a normal distribution
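
As a quick refresher on the normal-distribution takeaways above, here is a minimal sketch using only Python's standard library. The helper names `mass_within` and `one_sample_z_test`, and the numbers passed to the test, are hypothetical and chosen purely for illustration; the test assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist

# The standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def mass_within(k):
    """Probability that a normal variable falls within k SDs of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (z, two-sided p-value) for a one-sample z-test.

    Assumes the population standard deviation `pop_sd` is known.
    """
    # z-score of the sample mean, measured in standard errors.
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Two-sided p-value: probability of a result at least this extreme.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Check the 68 / 95 / 99.7 rule numerically.
for k in (1, 2, 3):
    print(f"within {k} SD: {mass_within(k):.4f}")

# Hypothetical example: 50 observations with mean 103, tested against
# a population with mean 100 and SD 15.
z, p_value = one_sample_z_test(sample_mean=103.0, pop_mean=100.0, pop_sd=15.0, n=50)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

With these made-up inputs the p-value comes out well above the conventional 0.05 threshold, so the null hypothesis would not be rejected.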

In the Appendix to this Module, you'll have the opportunity to learn about:

the uniform distribution, which represents processes where each outcome is equally likely, like rolling a fair die
the Poisson distribution, which can be used to model the probability of a given number of events occurring within a fixed time period
the exponential distribution, which can be used to describe the amount of time that passes before a given event occurs
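
To make these three distributions a little more concrete, the following sketch computes one probability from each using only the standard library. The function names and the rate of 3 events per interval are assumptions made for illustration, not part of the lesson material.

```python
from math import exp, factorial


def uniform_pmf(n_outcomes):
    """Probability of any single outcome when all n outcomes are equally likely."""
    return 1.0 / n_outcomes


def poisson_pmf(k, rate):
    """P(exactly k events in one interval), given an average of `rate` events per interval."""
    return rate ** k * exp(-rate) / factorial(k)


def exponential_cdf(t, rate):
    """P(the waiting time until the next event is at most t), for the same `rate`."""
    return 1.0 - exp(-rate * t)


# Fair six-sided die: each face has probability 1/6.
print(f"P(roll a 4)               = {uniform_pmf(6):.4f}")

# Hypothetical process averaging 3 events per hour.
print(f"P(exactly 2 events in 1h) = {poisson_pmf(2, rate=3.0):.4f}")
print(f"P(next event within 0.5h) = {exponential_cdf(0.5, rate=3.0):.4f}")
```

The Poisson and exponential functions share the same `rate` parameter: one counts events in an interval, the other measures the wait between them.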
