HR-Analytics: Predict whether someone will quit their job

I developed this Flask API to predict whether a Data Scientist will leave their current job, given information about their background.

The API can be deployed on Heroku. To run it locally, run the two scripts in order:

python SVM.py

python Employee_api.py

The dataset used for training the SVM model was taken from Kaggle.
Link: HR Analytics: Job Change of Data Scientists: https://www.kaggle.com/arashnic/hr-analytics-job-change-of-data-scientists
Features:
- enrollee_id: Unique ID of the candidate
- city: City code
- city_development_index: Development index of the city (scaled)
- gender: Gender of the candidate
- relevent_experience: Relevant experience of the candidate
- enrolled_university: Type of university course enrolled in (if any)
- education_level: Education level of the candidate
- major_discipline: Major discipline of the candidate's education
- experience: Candidate's total experience in years
- company_size: Number of employees at the current employer
- company_type: Type of the current employer
- last_new_job: Difference in years between the previous job and the current job
- training_hours: Training hours completed
Target:
- 0: Not looking for a job change
- 1: Looking for a job change
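Before these columns can reach an SVM, the categorical ones have to become numbers. The sketch below shows one way to do that with pandas, assuming the raw frame has the column names listed above plus a `target` label column. The helper name `prepare_features`, the "missing" fill value, and the choice of one-hot encoding are illustrative assumptions, not necessarily what SVM.py does; the tiny frame is made-up data.

```python
import pandas as pd


def prepare_features(df: pd.DataFrame):
    """Split a raw HR frame into a numeric feature matrix X and a label vector y.

    Hypothetical helper: assumes the Kaggle column names plus a `target` column.
    """
    y = df["target"].astype(int)
    # enrollee_id is a unique identifier, so it carries no predictive signal.
    X = df.drop(columns=["enrollee_id", "target"])
    # Treat every non-numeric column as categorical.
    cat_cols = list(X.select_dtypes(exclude="number").columns)
    # Give missing categorical values their own explicit level.
    X[cat_cols] = X[cat_cols].fillna("missing")
    # One-hot encode the categorical columns; numeric columns pass through unchanged.
    X = pd.get_dummies(X, columns=cat_cols)
    return X, y


# Made-up rows using a subset of the dataset's columns.
raw = pd.DataFrame({
    "enrollee_id": [1, 2, 3],
    "city": ["city_103", "city_40", None],
    "city_development_index": [0.92, 0.78, 0.62],
    "training_hours": [36, 47, 83],
    "target": [1.0, 0.0, 0.0],
})
X, y = prepare_features(raw)
print(X.columns.tolist())
```

One-hot encoding avoids imposing a false order on codes such as `city`, at the cost of many sparse columns for high-cardinality features.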

This is a binary classification task, so it can be solved by many classification algorithms such as Logistic Regression, Decision Trees and Random Forests.
I chose a Support Vector Machine (SVM) because of its flexibility during training: both the kernel and the regularization parameter C can be tuned to the data.
After running cross-validation with StratifiedKFold and a hyperparameter search with GridSearchCV on both an SVM and a Random Forest, I found that the SVM performed slightly better, with a higher weighted F1-score and a higher score on class 1.
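The cross-validation and grid search described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions: synthetic data from `make_classification` stands in for the prepared HR data, the grid values and the `f1_weighted` scoring choice are hypothetical, and the StandardScaler step is added because SVMs are sensitive to feature scale, not because the README mentions it.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the prepared HR data, imbalanced roughly 75% / 25% (assumption).
X, y = make_classification(n_samples=400, n_features=10, weights=[0.75], random_state=0)

# StratifiedKFold keeps the class ratio (nearly) identical in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Scale features before the SVM; make_pipeline names the SVC step "svc".
pipeline = make_pipeline(StandardScaler(), SVC())

# Illustrative grid only; the README reports C=411 with an RBF kernel as the final choice.
param_grid = {"svc__C": [1, 10, 100, 411], "svc__kernel": ["linear", "rbf"]}

# Weighted F1 is an assumed scoring choice; it weights each class by its support.
search = GridSearchCV(pipeline, param_grid, scoring="f1_weighted", cv=cv)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Stratification matters here because with plain KFold an unlucky split could leave a fold with very few class 1 samples, making its scores noisy.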
The tables below compare their scores.

Scoring parameter      Random Forest   SVM
Accuracy               0.77            0.77
Precision (weighted)   0.75            0.75
Recall (weighted)      0.77            0.77
F1-score (weighted)    0.75            0.76

Recall by class:

Class   Random Forest   SVM
0       0.92            0.90
1       0.32            0.38
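Weighted metrics average the per-class scores, weighting each class by its support (its number of samples). For recall this weighting always reproduces accuracy, because the sum over classes of (n_c / N) * (TP_c / n_c) is just the total number of correct predictions divided by N. As a rough check, if the per-class values in the table are recalls and about 75% of candidates are in class 0 (an assumed split, not a figure reported here), both models' per-class values give the reported weighted recall of 0.77. The helper `weighted_average` is hypothetical:

```python
def weighted_average(scores, supports):
    """Average per-class scores, weighting each class by its support (sample count)."""
    total = sum(supports)
    return sum(score * n for score, n in zip(scores, supports)) / total


# Hypothetical supports: 75 samples of class 0 for every 25 of class 1.
supports = [75, 25]

rf_recall = weighted_average([0.92, 0.32], supports)
svm_recall = weighted_average([0.90, 0.38], supports)
print(f"{rf_recall:.2f} {svm_recall:.2f}")  # 0.77 0.77
```

This also shows why weighted scores can hide weak minority-class performance: class 1 contributes only a quarter of the weight.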

Classes	Random Forest	SVM
0	0.92	0.90
1	0.32	0.38

Hyperparameters chosen:
- SVM: {'C': 411, 'kernel': 'rbf'}
- Random Forest: {'criterion': 'entropy', 'max_depth': 9, 'max_features': 'sqrt', 'n_estimators': 425}
Because the dataset is imbalanced (most candidates are not looking for a job change), performance on the minority class matters, and the SVM achieved a better recall on class 1 (looking for a job change) than the Random Forest: 0.38 vs. 0.32.
Hence, I chose the SVM as the model for the API.
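A minimal sketch of fitting the final model with the reported hyperparameters (C=411, RBF kernel) and saving it with pickle so the API can load it later. Synthetic data stands in for the prepared training set, the StandardScaler step is an assumption, and the file name `svm_model.pkl` is hypothetical.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the prepared training data (assumption).
X, y = make_classification(n_samples=300, n_features=10, weights=[0.75], random_state=1)

# Hyperparameters reported in this README; the scaling step is an assumption.
model = make_pipeline(StandardScaler(), SVC(C=411, kernel="rbf"))
model.fit(X, y)

# Serialize the fitted pipeline to disk (hypothetical file name).
with open("svm_model.pkl", "wb") as f:
    pickle.dump(model, f)

# Later, e.g. when the Flask app starts, load it back and predict.
with open("svm_model.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded.predict(X[:5]))
```

Pickling the whole pipeline, rather than the SVC alone, keeps the preprocessing and the model in sync when the API reloads them.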

I wrapped the SVM model in an API so that users can interact with it easily.
The API is built with Flask, a lightweight Python web framework commonly used for this kind of task.
An HTML form connected to the Flask app collects user input, and a CSS file styles it.
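The form-to-model flow can be sketched as a small Flask app. This is not the code in Employee_api.py: the `/predict` route, the form field names, and the `create_app` factory are hypothetical, the fields are assumed to arrive already numeric, and the app returns JSON instead of rendering an HTML template. A stand-in model replaces the pickled SVM so the example runs on its own.

```python
from flask import Flask, request


def create_app(model, feature_names):
    """Build a Flask app whose /predict route scores one HTML form submission.

    `model` is any object with a scikit-learn style `predict` method, and
    `feature_names` lists the (hypothetical) form field names in model order.
    """
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        # Read each expected form field and convert it to a number.
        row = [float(request.form[name]) for name in feature_names]
        label = int(model.predict([row])[0])
        message = ("Likely to look for a job change" if label == 1
                   else "Not likely to look for a job change")
        # Returning a dict makes Flask send a JSON response.
        return {"prediction": label, "message": message}

    return app


class ThresholdModel:
    """Stand-in model for the demo: predicts 1 when the first feature is below 0.5."""

    def predict(self, rows):
        return [1 if row[0] < 0.5 else 0 for row in rows]


app = create_app(ThresholdModel(), ["city_development_index", "training_hours"])
client = app.test_client()
resp = client.post("/predict", data={"city_development_index": "0.3", "training_hours": "40"})
print(resp.get_json())
```

Passing the model into a factory keeps the route testable with Flask's test client, without loading the real pickle.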

The Flask API is deployed on the Heroku cloud platform, so anyone with the link can use it online.
I connected this GitHub repository to Heroku so that the app runs on a Heroku dyno.
I used Gunicorn, a WSGI HTTP server for Python web applications, to serve the Flask app in production. Before deployment, the Procfile (which tells Heroku how to start the app) and requirements.txt (which lists the Python dependencies) must be in place.

This project covers:
- Data wrangling using Pandas
- Feature engineering to fit the data to the model
- Selecting the right model using cross-validation
- Hyperparameter tuning
- Working with imbalanced datasets and finding the right scoring method
- Saving the model and loading it again with Pickle
- Building a Flask app
- A little front-end web development
- Making the app live by deploying it on a cloud platform

anityagan9urde / hr-analytics-will-someone-quit