This project will train you how to use SQL to analyze a real-world database, how to extract the most useful information from the dataset, how to pre-process the data using Python for improved performance, and how to use a structured query language to retrieve useful information from the database.
You will be using a real-world dataset of the Employee's records to complete this project. This project intends to evaluate the provided dataset, solve business problems on this dataset and mine information insights.
I have used the data available at HiCounselor website HR analytics data challenge, 2023. The dataset fulfils the requirements for the project and is in the CSV format.
- Data cleaning with Python:
- removing duplicates and
- handling null values,
- deletion or transformation of irrelevant values,
- data type transformation,
- data validations.
- SQL Queries:
- aggregating the data,
- grouping the data,
- ordering the data,
- using case
- use having
- sub queries etc
- Step 1: Removing duplicate rows.
- Step 2: Removing rows for which numeric columns are having irrelevant data type values
- Step 3: Remove irrelevant values from each column if any. Validation all values for a column, Check for any inconsistencies or discrepancies in data types, units, or formats.Feel free to add more validation checks which you might feel necessary for the dataset’s integrity
- Step 4: Export the cleaned dataset as a .csv file: prefer UTF-8 encoding.
We get cleaned dataset in csv.Then perform following steps.
- Step 5: Convert the pre-processed dataset into an SQL file.
- Step 6: Manually generate a table by utilizing the database information provided in the "Database Info" tab.
In this module, you will be working on performing data analysis on the pre-processed data from the previous module and conducting Data Analysis using SQL. You will generate queries for given problem statements.
- ( Round average age up to two decimal places if needed).
- ( Round average scores up to two decimal places if needed)
- (Round percentages up to two decimal places if needed)
Task 4:Show the number of employees who have met more than 80% of KPIs for each recruitment channel and education level.
Task 5:Find the average length of service for employees in each department, considering only employees with previous year ratings greater than or equal to 4.
- (Round percentages up to two decimal places if needed)
- ( Round average ratings up to two decimal places if needed)
Task 7:List the departments with more than 100 employees having a length of service greater than 5 years.
Task 8:Show the average length of service for employees who have attended more than 3 training, grouped by department and gender.
- ( Round average length up to two decimal places if needed)
Task 9:Find the percentage of female employees who have won awards, per department. Also show the number of female employees who won awards and total female employees.
- ( Round percentage up to two decimal places if needed)
Task 10:Calculate the percentage of employees per department who have a length of service between 5 and 10 years.
- ( Round percentage up to two decimal places if needed)
Task 11:Find the top 3 regions with the highest number of employees who have met more than 80% of their KPIs and received at least one award, grouped by department and region.
Task 12:Calculate the average length of service for employees per education level and gender, considering only those employees who have completed more than 2 trainings and have an average training score greater than 75.
- ( Round average length up to two decimal places if needed)
Task 13:For each department and recruitment channel, find the total number of employees who have met more than 80% of their KPIs, have a previous_year_rating of 5, and have a length of service greater than 10 years.
Task 14:Calculate the percentage of employees in each department who have received awards, have a previous_year_rating of 4 or 5, and an average training score above 70, grouped by department and gender .
- ( Round percentage up to two decimal places if needed).
Task 15:List the top 5 recruitment channels with the highest average length of service for employees who have met more than 80% of their KPIs, have a previous_year_rating of 5, and an age between 25 and 45 years, grouped by department and recruitment channel.
- ( Round average length up to two decimal places if needed).
To run this project, you will need to registered for this project with HiCounselor. They alloted a team and database with user and password. then you can access the sandbox.
- Mysql Server: phpMyAdmin
- Python
- SQL
- Excel
If you have any feedback, please reach out to us linkedin