This repository contains my solution for the Employee Attrition Prediction challenge presented at the Bluechip Summit hackathon. The objective of this challenge was to predict the probability of employee attrition within a given company using machine learning techniques.
This leaderboard represents the standings calculated with approximately 20% of the test data at the time of the hackathon. The final results, based on the remaining 80% of the test data, led to different standings.
- Position(out of 158): 85
- Accuracy: 0.90164
- Percentage Accuracy: 90.164%
The private leaderboard reflects the final standings calculated with approximately 80% of the test data at the time of the hackathon. As the competition concluded, this leaderboard showcases the ultimate rankings in the Employee Attrition Prediction challenge.
- Position(out of 158): 65
- Accuracy: 0.87396
- Percentage Accuracy: 87.396%
Employee attrition poses a significant challenge for organizations, impacting productivity and incurring high recruitment costs. The task was to develop a predictive model that anticipates the likelihood of an employee leaving the company based on provided data.
Employee-Attrition-Prediction-Bluechip-Hackathon-Solution/
├── assets/
│ └── images/
│ ├── private_leaderboard.png
│ └── public_leaderboard.png
├── data/
│ ├── train.csv
│ ├── test.csv
│ ├── sample_submission.csv
│ └── data_with_attrition2.csv
├── modeling/
│ └── models.ipynb
├── README.md
└── requirements.txt
The dataset provided for this challenge includes two main files:
- train.csv: Contains the training data with various features related to employees.
- test.csv: Holds the test data for which predictions need to be made.
- sample_submission.csv: A sample submission file format to follow for submitting predictions.
The solution involved several key steps:
- Exploratory Data Analysis (EDA): Understanding the data distribution, identifying patterns, and gaining insights into factors influencing attrition.
- Data Preprocessing: Handling missing values, encoding categorical variables, and feature scaling.
- Feature Engineering: Creating new features or transforming existing ones to improve model performance.
- Model Building: Utilizing machine learning algorithms to predict the probability of employee attrition.
- Evaluation: Assessing model performance using appropriate metrics like accuracy, precision, recall.
- Python 3
- Jupyter Notebook (optional for running the provided notebooks)
- Required Python libraries (listed in requirements.txt)
git clone https://github.com/francisojeah/Employee-Attrition-Prediction-Bluechip-Hackathon-Solution.git
pip install -r requirements.txt
- Place the provided train.csv and test.csv files in the project directory.
- Run the provided notebook, models.ipynb, to reproduce the analysis, feature engineering, model building, and predictions.
- Ensure that the submission file follows the format specified in sample_submission.csv.
The best-performing model achieved a probability prediction accuracy of 90.87% on the test set, demonstrating its effectiveness in predicting employee attrition.
This project is licensed under the MIT License.