Section Recap
Introduction
This short lesson summarizes the topics we covered in Section 21 and why they'll be important to you as a data scientist.
Objectives
You will be able to:
- Understand and explain what was covered in this section
- Understand and explain why this section will help you become a data scientist
Key Takeaways
Some of the key takeaways from this section include:
- A sample space is the set of all possible outcomes of a trial or experiment
- Independent events don't affect each other's probabilities - e.g. consecutive coin tosses
- Dependent events do affect each other's probabilities - e.g. drawing consecutive colored marbles from a bag without replacement
- The product rule is useful when the conditional probability is easy to compute but the probability of the intersection of events is not (demonstrated in the first sketch after this list)
- The chain rule (also called the general product rule) permits the calculation of any member of the joint distribution of a set of random variables using only conditional probabilities
- Bayes' theorem describes the probability of an event based on prior knowledge of conditions that might be related to the event (a worked diagnostic-test example follows this list)
- The law of total probability states that the probability of an event is the sum of its probabilities across the parts of a partition of the sample space
- Sensitivity is the true positive rate: the proportion of actual positives that are correctly identified
- Specificity is the true negative rate: the proportion of actual negatives that are correctly identified
- A perfect test would be 100 percent sensitive and 100 percent specific; in reality, tests have a minimum error called the Bayes error rate
- Maximum Likelihood Estimation (MLE) primarily deals with determining the parameters that maximize the likelihood of the observed data (see the coin-flip sketch below)
- Maximum Likelihood Estimation (MLE) and Maximum A Posteriori (MAP) estimation are both methods for estimating the parameters of a probability distribution; MAP additionally incorporates a prior distribution over those parameters
- In Bayesian probability theory, if the posterior distribution p(θ | X) is in the same probability distribution family as the prior distribution p(θ), the prior and posterior are called conjugate distributions, and the prior is called a conjugate prior for the likelihood function (see the Beta-Binomial sketch below)
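
To make the product rule concrete, here is a minimal sketch of the dependent-events marble example; the bag contents (5 red, 3 blue) are an assumption for illustration. The product rule computes P(A and B) as P(A) · P(B | A), and the chain rule extends this to any number of events: P(A1 and ... and An) = P(A1) · P(A2 | A1) · ... · P(An | A1, ..., An-1).

```python
from fractions import Fraction

# Hypothetical bag (an assumption for illustration): 5 red and 3 blue marbles,
# drawn one after another without replacement, so the draws are dependent.
red, blue = 5, 3
total = red + blue

# Product rule: P(R1 and R2) = P(R1) * P(R2 | R1).
# The conditional probability is the easy part: after one red is drawn,
# 4 reds remain out of 7 marbles.
p_r1 = Fraction(red, total)                    # P(first draw red)         = 5/8
p_r2_given_r1 = Fraction(red - 1, total - 1)   # P(second red | first red) = 4/7
p_both_red = p_r1 * p_r2_given_r1

print(f"P(both red) = {p_both_red}")           # 5/14
```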
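
The next sketch ties Bayes' theorem, the law of total probability, sensitivity, and specificity together in the classic diagnostic-test calculation. The prevalence, sensitivity, and specificity values are made-up numbers for illustration only.

```python
# Hypothetical diagnostic test (all three numbers are illustrative assumptions).
prevalence  = 0.01   # P(disease)
sensitivity = 0.95   # P(positive | disease)    = true positive rate
specificity = 0.90   # P(negative | no disease) = true negative rate

# Law of total probability: sum over the partition {disease, no disease}
# of the sample space.
p_positive = (sensitivity * prevalence
              + (1 - specificity) * (1 - prevalence))

# Bayes' theorem:
# P(disease | positive) = P(positive | disease) * P(disease) / P(positive)
p_disease_given_positive = sensitivity * prevalence / p_positive

print(f"P(positive)           = {p_positive:.4f}")                # 0.1085
print(f"P(disease | positive) = {p_disease_given_positive:.4f}")  # ~0.0876
```

Even with a fairly sensitive and specific test, a positive result for a rare condition still leaves a low posterior probability, which is why the prior (the prevalence) matters so much.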
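
Here is a minimal MLE sketch for a Bernoulli (coin-flip) model; the flip sequence is assumed data. Maximizing the likelihood over the heads probability p recovers the familiar closed form: the sample proportion of heads.

```python
import numpy as np

# Assumed data for illustration: 1 = heads, 0 = tails.
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

def neg_log_likelihood(p, data):
    """Negative Bernoulli log-likelihood for heads probability p."""
    heads = data.sum()
    tails = len(data) - heads
    return -(heads * np.log(p) + tails * np.log(1 - p))

# Grid search over candidate parameters; for this model a closed form exists,
# so the search should land on the sample mean.
grid = np.linspace(0.01, 0.99, 99)
p_mle = grid[np.argmin([neg_log_likelihood(p, flips) for p in grid])]

print(f"MLE of p:                  {p_mle:.2f}")         # 0.70
print(f"Closed form (sample mean): {flips.mean():.2f}")  # 0.70
```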
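
Finally, a sketch of conjugacy and MAP estimation under assumed numbers: a Beta prior on the heads probability of a binomial likelihood yields a Beta posterior (the Beta distribution is the conjugate prior for the binomial likelihood), and the mode of that posterior is the MAP estimate.

```python
# The Beta prior is conjugate to the binomial likelihood:
# Beta(a, b) prior + (heads, tails) data -> Beta(a + heads, b + tails) posterior.
a_prior, b_prior = 2, 2   # assumed hyperparameters: a mild pull toward p = 0.5
heads, tails = 7, 3       # assumed observed data

a_post = a_prior + heads  # 9
b_post = b_prior + tails  # 5

# MAP estimate = posterior mode of Beta(a, b) = (a - 1) / (a + b - 2) for a, b > 1.
p_map = (a_post - 1) / (a_post + b_post - 2)
# MLE ignores the prior entirely: heads / (heads + tails).
p_mle = heads / (heads + tails)

print(f"Posterior:    Beta({a_post}, {b_post})")
print(f"MAP estimate: {p_map:.3f}")   # 0.667, pulled toward the prior mean
print(f"MLE estimate: {p_mle:.3f}")   # 0.700
```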