For this lab, you will use the CSV files provided in the files_for_lab
folder. These are cleaned versions of the learningSet data from the 'Healthcare for All' case study.
You may continue in the Jupyter Notebook you created yesterday. There is no need to fork and clone this repo.
- Apply the Random Forest algorithm AFTER upsampling the data to deal with the class imbalance.
- Use the feature selection techniques you have learned in class (Variance Threshold, RFE, PCA, etc.) to decide whether you want to use all of the features.
- Re-run the Random Forest algorithm to determine whether the feature selection has improved the results.
- Discuss the output and its impact on the business scenario. Is the cost of a false positive equal to the cost of a false negative? How would you change your algorithm or data in order to maximize the return for the business?
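
The steps above could be sketched roughly as follows. This is only one possible outline, not the expected solution. It assumes scikit-learn and pandas are installed, and it uses a synthetic imbalanced dataset from `make_classification` as a stand-in for the lab CSVs, so the column names (`f0`, `f1`, ..., `target`), the hyperparameters, the choice of RFE with 8 features, and the `COST_FP` / `COST_FN` values are all hypothetical placeholders to replace with your own.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the lab data (assumption): roughly 5% positives.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           weights=[0.95, 0.05], random_state=42)
X = pd.DataFrame(X, columns=[f"f{i}" for i in range(20)])
y = pd.Series(y, name="target")

# Split first, so the test set keeps the original class distribution.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Upsample the minority class in the training set only.
train = pd.concat([X_train, y_train], axis=1)
majority = train[train["target"] == 0]
minority = train[train["target"] == 1]
minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=42)
train_bal = pd.concat([majority, minority_up])
X_bal = train_bal.drop(columns="target")
y_bal = train_bal["target"]


def fit_and_score(X_tr, y_tr, X_te, y_te):
    """Fit a Random Forest and return the test-set confusion matrix."""
    model = RandomForestClassifier(n_estimators=100, max_depth=5,
                                   random_state=42)
    model.fit(X_tr, y_tr)
    return confusion_matrix(y_te, model.predict(X_te))


# Baseline: all features.
cm_all = fit_and_score(X_bal, y_bal, X_test, y_test)

# Feature selection with RFE (one option among those listed above).
selector = RFE(
    RandomForestClassifier(n_estimators=50, max_depth=5, random_state=42),
    n_features_to_select=8, step=4)
selector.fit(X_bal, y_bal)
selected = X_bal.columns[selector.support_]
cm_sel = fit_and_score(X_bal[selected], y_bal, X_test[selected], y_test)

# Hypothetical business costs: a missed positive is assumed 10x as costly.
COST_FP, COST_FN = 1.0, 10.0


def total_cost(cm):
    """Weight false positives and false negatives by their assumed costs."""
    tn, fp, fn, tp = cm.ravel()
    return fp * COST_FP + fn * COST_FN


print("All features:\n", cm_all, "\ncost:", total_cost(cm_all))
print("Selected features:\n", cm_sel, "\ncost:", total_cost(cm_sel))
```

Upsampling only the training split keeps duplicated minority rows out of the test set, which would otherwise make the scores look better than they are. Comparing the two confusion matrices under explicit costs is one way to frame the business question in the last bullet.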