Data analysis conducted for the CANDEV Data Challenge 2022, using the data provided by the challenge from the 2020 Public Service Employee Survey.
We identified reasons behind the mobility of minority groups within the Canadian government using machine learning and statistics. We used Python to generate our results and a static website to showcase them. You can access our website through this link here.
We approached the data from multiple perspectives. First, we plotted the answers to the survey by demographic category. This helped us identify which minority groups most wanted to leave, and why. Then, we ran many experiments on the data with different methods, such as k-means, PCA, Logistic Regression and Gradient Boosting. Only PCA and Logistic Regression gave us meaningful results. Finally, we extracted the discriminant features and identified the population groups that were more susceptible to mobility.
These scripts are Flask-ready and could easily be deployed on a dynamic server to process the data live. However, we lacked a server on which to deploy them.
- Python 3.7
- Jupyter Notebook
- Ubuntu or WSL
Our code was tested on Ubuntu 20.0.
- src/ : scripts to generate the graphs for the visualization website
- Data/ : subset 3 from the 2020 Public Service Employee Survey
- www/ : script for the website
- LogReg_for_Mobility_Pred : Notebook that contains the code for the Logistic Regression experiment.
- data_preproc : Notebook that contains the code for our data preprocessing scheme.
- Ponderation-Normalization-PCA : Notebook that contains the code for our PCA experiment.
@Learningchipmunk
@Alexis-BX
@AlyZei
@KatiaJDL