Project for 'DS-GA 1001 Introduction to Data Science'
The opioid crisis is an unprecedented epidemic, and national surveys have been conducted to better understand drug abuse. Selecting the features most strongly related to drug abuse behavior gives insight into a user's behavioral patterns and supports a model that predicts the likelihood of opioid abuse, so that healthcare providers and families of at-risk users can take preventive measures.
The data come from the 2015 National Survey on Drug Use and Health (NSDUH-2015). Download the data from here.
We propose a novel iterative conditional mutual information feature selection algorithm (CMI-FS). At each iteration, the algorithm selects the feature that maximizes mutual information with the target variable, conditional on any feature already selected.
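As an illustration of the selection loop, here is a minimal sketch rather than the project's actual implementation. It assumes discrete features and reads "conditional on any feature already selected" as taking the minimum conditional mutual information over the selected set (a CMIM-style criterion). The function names `mutual_information` and `cmi_select` and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y, z=None):
    """Empirical I(X;Y|Z) in nats for discrete sequences; z=None gives I(X;Y)."""
    n = len(x)
    if z is None:
        z = [0] * n  # a constant Z reduces I(X;Y|Z) to I(X;Y)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    mi = 0.0
    for (xi, yi, zi), n_xyz in c_xyz.items():
        # p(x,y,z) * log[ p(z) p(x,y,z) / (p(x,z) p(y,z)) ], written with counts
        ratio = n_xyz * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        mi += (n_xyz / n) * log(ratio)
    return mi


def cmi_select(features, target, k):
    """Greedily pick k feature names from a dict of name -> discrete values."""
    remaining = list(features)
    selected = []

    def score(name):
        if not selected:
            return mutual_information(features[name], target)
        # Assumption: worst-case conditional MI over the features picked so far.
        return min(
            mutual_information(features[name], target, features[s])
            for s in selected
        )

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "a_copy" duplicates "a", "b" is weaker but complementary.
target = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a":      [0, 0, 0, 1, 1, 1, 1, 0],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 0],
    "b":      [0, 1, 1, 0, 1, 0, 1, 1],
}
selected = cmi_select(features, target, k=2)
print(selected)  # ['a', 'b']
```

In this toy case the duplicate feature scores zero once `a` is selected, so the weaker but non-redundant `b` is chosen instead. This redundancy-avoiding behavior is the point of conditioning on already selected features.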
A random forest with max_depth=10 and n_estimators=20 was selected as the best model by hyper-parameter tuning. We also use LIME to interpret the model's predictions on a per-user (per-instance) basis.
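The model configuration above could be set up along these lines with scikit-learn. This is a sketch under stated assumptions, not the project's training code: the synthetic data from `make_classification` stands in for the NSDUH features, and the feature names, class labels, and split are hypothetical. The LIME step runs only if the `lime` package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): the real inputs would be the selected survey features.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyper-parameters reported in the text: max_depth=10, n_estimators=20.
model = RandomForestClassifier(n_estimators=20, max_depth=10, random_state=0)
model.fit(X_train, y_train)
proba = model.predict_proba(X_test)  # per-instance class probabilities
print("held-out accuracy:", model.score(X_test, y_test))

# Optional per-instance explanation with LIME, skipped if the package is missing.
try:
    from lime.lime_tabular import LimeTabularExplainer
except ImportError:
    LimeTabularExplainer = None

if LimeTabularExplainer is not None:
    explainer = LimeTabularExplainer(
        X_train,
        feature_names=feature_names,
        class_names=["no abuse", "abuse"],  # hypothetical labels
        mode="classification",
        random_state=0,
    )
    explanation = explainer.explain_instance(X_test[0], model.predict_proba, num_features=5)
    for rule, weight in explanation.as_list():
        print(rule, weight)
```

Each LIME explanation is local: the printed weights describe how features pushed the prediction for that single instance, not the model's global behavior.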