Well done! In this section, you learned about various resampling techniques and when it would be advantageous to use certain ones.
You will be able to:
- Explain the Kolmogorov-Smirnov test and generate synthetic data sets with sklearn
- Describe common resampling techniques and Monte Carlo simulations
The key takeaways for this section include:
- The Kolmogorov-Smirnov Test can be used to test the "normality assumption" for a given data set
- More generally, the KS Test is a way to compare the similarity of two different distributions
- A one-sample KS Test (goodness of fit test) calculates the similarity between an observed data set and a theoretical distribution
- A two-sample KS Test compares the similarity of two separate empirical distributions
- The `make_blobs()` method in sklearn is one way to generate a synthetic data set
- The `make_moons()` method allows for the generation of data for binary classification problems
- There are a number of other useful methods, such as `make_circles()` and `make_regression()`, for generating various types of data sets for testing your algorithms
- Resampling methods allow for improved precision in estimating sample statistics and validating models by using random subsets
- Common resampling techniques include bootstrapping, jackknifing and permutation tests
- Monte Carlo Simulations are a powerful tool for running large numbers of simulations with various inputs to provide distributions of possible output values
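As a quick illustration of the one- and two-sample KS tests described above, here is a minimal sketch using `scipy.stats` (the sample sizes, seed, and distributions are arbitrary choices for the example):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=0, scale=1, size=500)

# One-sample KS test: compare the observed sample to a theoretical N(0, 1)
# A large p-value means no evidence against the normality assumption
stat, p = stats.kstest(normal_sample, 'norm')

# Two-sample KS test: compare two separate empirical distributions
# A small p-value means the two distributions differ
uniform_sample = rng.uniform(-1, 1, size=500)
stat2, p2 = stats.ks_2samp(normal_sample, uniform_sample)
```

Note that the one-sample version tests against a fully specified theoretical distribution, while the two-sample version needs only the raw observations from each group.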
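The synthetic-data generators mentioned above live in `sklearn.datasets`; a minimal sketch (the sample counts, cluster count, and noise level are arbitrary choices):

```python
from sklearn.datasets import make_blobs, make_moons

# make_blobs: isotropic Gaussian clusters, useful for testing clustering algorithms
X_blobs, y_blobs = make_blobs(n_samples=200, centers=3, n_features=2, random_state=0)

# make_moons: two interleaving half-circles, a classic binary classification toy set
X_moons, y_moons = make_moons(n_samples=200, noise=0.1, random_state=0)
```

Each generator returns a feature matrix `X` and a label vector `y`, so the output plugs directly into any sklearn estimator's `fit(X, y)`.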
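Bootstrapping, one of the resampling techniques listed above, can be sketched with plain NumPy; the exponential data, seed, and number of resamples are arbitrary choices for the example:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=2.0, size=100)

# Bootstrap: repeatedly resample the data with replacement and recompute
# the statistic of interest (here, the mean) to approximate its
# sampling distribution
boot_means = np.array([
    rng.choice(data, size=data.size, replace=True).mean()
    for _ in range(5000)
])

# A 95% percentile confidence interval for the mean
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
```

The same loop works for any statistic (median, standard deviation, a model coefficient), which is what makes the bootstrap such a general tool.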
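Finally, a toy Monte Carlo simulation: the classic example of estimating pi by drawing many random inputs and summarizing the outputs (the simulation count and seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n_sims = 100_000

# Draw random points in the unit square; the fraction that lands inside
# the quarter circle of radius 1 approximates pi / 4
x = rng.random(n_sims)
y = rng.random(n_sims)
pi_estimate = 4 * np.mean(x**2 + y**2 <= 1)
```

Increasing `n_sims` tightens the estimate, the usual trade-off in Monte Carlo methods between run time and precision.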