This project aims to analyze and predict employee retention using various machine learning techniques. The project includes exploratory data analysis (EDA), model selection and building, and the deployment of a REST API with a user-friendly UI.
- Project Structure
- Data Description
- Exploratory Data Analysis (EDA)
- Model Selection and Building
- Clustering and Classification
- Deployment
- How to Run
- Contributors
- License
Employee Retention/
├── .idea/
├── Analysis.ipynb
├── EndtoEndML_v11/
├── ModelSelectionandbuilfing.ipynb
├── hr_employee_churn_data.csv
├── readme.md
├── v11/
- .idea/: IDE-specific files
- Analysis.ipynb: notebook for EDA and initial analysis
- EndtoEndML_v11/: directory containing end-to-end ML pipeline scripts
- ModelSelectionandbuilfing.ipynb: notebook for model selection and building
- hr_employee_churn_data.csv: dataset
- readme.md: project README file
- v11/: directory containing version 11 of the project scripts
The dataset hr_employee_churn_data.csv contains employee data with various features that help in analyzing and predicting employee retention. The key features include:
- Employee ID
- Age
- Department
- Job Role
- Monthly Income
- Attrition (Yes/No)
- ...and many more.
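Attrition is the prediction target, so it helps to check its class balance before modeling. If most employees stay, a model can reach high accuracy while missing most of the people who leave. The sketch below uses only the standard library. It assumes the label column is literally named `Attrition` and holds `Yes`/`No` values, as the feature list suggests; the real CSV header may differ. `load_rows` and `attrition_balance` are hypothetical helpers, not functions from this repository.

```python
import csv
from collections import Counter


def load_rows(path="hr_employee_churn_data.csv"):
    """Read the CSV into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def attrition_balance(rows, target="Attrition"):
    """Return the share of each label in the target column.

    The column name "Attrition" is an assumption; check the CSV header.
    """
    counts = Counter(row[target].strip() for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Usage with the real file (not executed here):
# print(attrition_balance(load_rows()))
```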
Initial data exploration and visualization are performed to understand the underlying patterns and relationships in the data. Key steps include:
- Data cleaning and preprocessing
- Statistical summary of features
- Visualizations (e.g., histograms, bar plots, box plots)
- Correlation analysis
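The statistical-summary and correlation steps can be illustrated without third-party libraries. This is a minimal sketch, not the notebook's code. The toy rows and the `Age`/`MonthlyIncome` names are made up for illustration. `Attrition` is encoded as 1 for "Yes" and 0 for "No" so that it can be correlated with numeric features. For a whole table, pandas' `DataFrame.describe()` and `DataFrame.corr()` report the same kind of information.

```python
import math
import statistics


def summarize(values):
    """Count, mean, median, sample standard deviation, min and max of a numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # a constant column has no defined correlation
    return sxy / math.sqrt(sxx * syy)


# Toy rows with hypothetical values: (Age, MonthlyIncome, Attrition)
rows = [(25, 2500, "Yes"), (31, 4200, "No"), (45, 8000, "No"), (28, 3000, "Yes"), (39, 6100, "No")]
ages = [r[0] for r in rows]
incomes = [r[1] for r in rows]
attrition = [1 if r[2] == "Yes" else 0 for r in rows]  # encode Yes/No as 1/0

print(summarize(incomes))
print("corr(Age, Attrition):", pearson(ages, attrition))
print("corr(MonthlyIncome, Attrition):", pearson(incomes, attrition))
```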
Several machine learning models are evaluated for predicting employee retention. The models include:
- Logistic Regression
- Decision Trees
- Random Forest
- XGBoost
The performance of these models is compared using metrics such as accuracy, precision, recall, and F1-score.
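For a binary target such as attrition, all four metrics come from the confusion-matrix counts:

- accuracy = (TP + TN) / total
- precision = TP / (TP + FP)
- recall = TP / (TP + FN)
- F1 = the harmonic mean of precision and recall

The standard-library sketch below assumes attrition is encoded as 1 for employees who left, and treats 1 as the positive class. The function name is illustrative. In practice, `accuracy_score`, `precision_score`, `recall_score` and `f1_score` in `sklearn.metrics` compute the same quantities.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for binary labels, from confusion-matrix counts."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # predicted leave, actually left
        elif p == positive:
            fp += 1  # predicted leave, actually stayed
        elif t == positive:
            fn += 1  # predicted stay, actually left
        else:
            tn += 1  # predicted stay, actually stayed
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(binary_metrics(y_true, y_pred))  # TP=2, FP=1, FN=1, TN=2, so every metric is 2/3
```

Recall on the positive class is usually the number to watch for retention, since it measures how many actual leavers the model catches.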
K-means clustering is employed for employee segmentation. This helps in identifying distinct groups of employees based on their attributes, which can be useful for targeted retention strategies.
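K-means (Lloyd's algorithm) alternates two steps until the assignments stop changing:

- assign each employee to the nearest centroid
- move each centroid to the mean of its members

The sketch below is a from-scratch version of that loop for illustration only. In practice a library implementation such as scikit-learn's `sklearn.cluster.KMeans` is the usual choice. The function and parameter names here are hypothetical. Because the distances are Euclidean, features should be standardized first; otherwise Monthly Income, measured in thousands, would dominate Age.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda c: _sq_dist(point, centroids[c]))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm on a list of numeric tuples; returns (labels, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [_nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Two obvious groups of toy points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```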
Random Forest and XGBoost classifiers are used for predicting employee retention. These tree-based ensembles were chosen for their robustness and their ability to capture non-linear relationships and feature interactions.
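Below is a minimal scikit-learn sketch of the classification step. It assumes the features have already been encoded as numbers, and it works as follows:

- trains on synthetic data from `make_classification`, a stand-in for the churn table with roughly 20% positives to mimic an imbalanced attrition label
- uses a stratified train/test split
- reports precision, recall and F1 per class

The hyperparameters (`n_estimators=200`, `class_weight="balanced"`) are illustrative, not the project's tuned values. XGBoost's `XGBClassifier` follows the same scikit-learn-style `fit`/`predict` interface, though its hyperparameters differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded churn data: 1000 rows, 8 features, ~20% positives.
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=5, weights=[0.8, 0.2], random_state=42
)

# Stratify so the test set keeps the same class ratio as the full data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# class_weight="balanced" up-weights the minority (leaver) class during training.
model = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred, output_dict=True)
print(classification_report(y_test, y_pred))
```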
The final model is deployed using a REST API built with Flask. A user-friendly UI is developed using Streamlit, providing an interactive way to predict employee retention.
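Below is a minimal Flask sketch of such a prediction endpoint. It assumes a JSON `POST /predict` route. The route, the `age`/`monthly_income` field names and the `predict_leave_probability` placeholder rule are all hypothetical stand-ins for the real `app.py` and the trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical input fields; the real app.py may expect different ones.
FEATURES = ["age", "monthly_income"]


def predict_leave_probability(features):
    """Placeholder for the trained model (hypothetical toy rule, not a real model).

    In the real app this would call something like model.predict_proba(...).
    """
    _age, monthly_income = features
    return 0.8 if monthly_income < 3000 else 0.2


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)  # None if the body is not valid JSON
    if payload is None:
        return jsonify(error="expected a JSON body"), 400
    missing = [name for name in FEATURES if name not in payload]
    if missing:
        return jsonify(error=f"missing fields: {missing}"), 400
    try:
        features = [float(payload[name]) for name in FEATURES]
    except (TypeError, ValueError):
        return jsonify(error="all fields must be numeric"), 400
    prob = predict_leave_probability(features)
    return jsonify(leave_probability=prob, prediction=int(prob >= 0.5))
```

Flask's `test_client()` can exercise the endpoint without starting a server. To serve it the way the `python app.py` step implies, the module would end with `app.run()` under an `if __name__ == "__main__":` guard.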
The application is tested and deployed on AWS EC2, with Nginx as a reverse proxy in front of the app. The deployment steps include:
- Setting up an EC2 instance
- Configuring Nginx as a reverse proxy
- Deploying the Flask app and Streamlit UI
Prerequisites:
- Python 3.7+
- Flask
- Streamlit
- scikit-learn
- XGBoost
- AWS account (for deployment)
Setup steps:
- Clone the repository:
  git clone https://github.com/saurin16/Employee_retention_Analysis_and_prediction.git
  cd Employee_retention_Analysis_and_prediction
- Install the required packages:
  pip install -r requirements.txt
- Run the Flask API:
  python app.py
- In a second terminal, run the Streamlit UI:
  streamlit run ui.py
- Open your browser and go to http://localhost:8501 to interact with the UI.
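Besides the Streamlit UI, the API can also be called from a script. Below is a standard-library sketch. It assumes the Flask app listens on Flask's default port (`http://localhost:5000`) and exposes a `POST /predict` endpoint that accepts JSON. The endpoint path, field names and helper functions are hypothetical, so check `app.py` for the real ones.

```python
import json
import urllib.request

# Hypothetical endpoint: Flask's default port plus an assumed /predict route.
API_URL = "http://localhost:5000/predict"


def build_predict_request(record, url=API_URL):
    """Build a JSON POST request for one employee record."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(record, url=API_URL, timeout=5.0):
    """Send the request and decode the JSON response (the API must be running)."""
    req = build_predict_request(record, url)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example, only while `python app.py` is serving:
# print(predict({"age": 30, "monthly_income": 2500}))
```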
To deploy on AWS EC2:
- Launch an EC2 instance and SSH into it.
- Install necessary packages and clone the repository on the instance.
- Configure Nginx as a reverse proxy for the Flask app.
- Run the Flask app and Streamlit UI on the instance.
This project is licensed under the MIT License - see the LICENSE file for details.