- Installation
- Project Motivation
- File Descriptions
- Data Cleaning
- Results
- Licensing, Authors, and Acknowledgements
Required packages: Beautiful Soup, pandas, numpy, sklearn (scikit-learn), matplotlib, seaborn, and wordcloud. The code should run without issues on Python 3.*.
In this project, I collected Environmental Engineer job postings from Indeed. The business questions I am trying to answer are:
- Which companies have the most job offerings?
- Which cities and states have the most job opportunities?
- Which sector has the most job postings?
- How much can an Environmental Engineer expect to make in the US?
- What are the keywords in the job summary?
- Is there a correlation between location and salary?
Indeed_scraper.ipyn is the notebook for the web scraper.
I used the web scraper to collect 1000 job postings from indeed.com. For each job, I recorded the following:
- Job title
- Company
- Location
- Salary
- Summary
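As a rough illustration of the record described above, the sketch below models one posting with the five listed fields and writes it to CSV using only the standard library. The class name `JobPosting`, the helper `write_postings`, the lowercase column names, and the example row are all assumptions made for illustration; they are not taken from the notebook, and the scraping step itself is not shown.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class JobPosting:
    """One scraped posting. Field names are illustrative, not the notebook's."""
    title: str
    company: str
    location: str
    salary: Optional[str]  # assumed optional: a posting may not list a salary
    summary: str


def write_postings(postings, handle):
    """Write postings to an open text handle as CSV rows with a header line."""
    writer = csv.DictWriter(handle, fieldnames=[f.name for f in fields(JobPosting)])
    writer.writeheader()
    for posting in postings:
        writer.writerow(asdict(posting))


# Made-up example row, not real scraped data.
example = JobPosting(
    title="Environmental Engineer",
    company="Example Co",
    location="Denver, CO",
    salary=None,
    summary="Prepare permits and compliance reports.",
)

buffer = io.StringIO()
write_postings([example], buffer)
csv_text = buffer.getvalue()
```

A missing salary is written as an empty cell, which keeps the row shape consistent for later cleaning.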
EnvironmentalEngineerJobAnalysis.ipyn is the main notebook for this analysis.
environmentalengineer.csv is the dataset that contains the raw data from Indeed.com on March 9th, 2021.
I cleaned the data that was scraped from Indeed:
- Cleaned the location information and divided it into city and state.
- Classified each job into a category based on industry type: Water, Remediation, Air, Compliance, Civil.
- For the salary analysis, kept only the postings that list an annual salary.
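The cleaning steps above could be sketched roughly as follows. This is a minimal standard-library sketch under stated assumptions, not the notebook's actual code: the function names, the keyword lists, the "City, ST" location format, and the "a year" marker for annual salaries are all hypothetical. Only the five category names come from this README.

```python
import re

# Category names come from the README; the keywords are assumptions.
CATEGORY_KEYWORDS = {
    "Water": ["water"],
    "Remediation": ["remediation", "cleanup"],
    "Air": ["air quality", "emission"],
    "Compliance": ["compliance", "permit"],
    "Civil": ["civil"],
}


def split_location(location):
    """Split 'City, ST 12345' into (city, state); (None, None) if it does not match."""
    match = re.match(r"^\s*([^,]+),\s*([A-Z]{2})\b", location or "")
    if not match:
        return None, None
    return match.group(1).strip(), match.group(2)


def classify_job(text):
    """Return the first category whose keyword appears in the text, else 'Other'."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Other"


def annual_salary(salary):
    """Return the midpoint of an annual salary string, or None otherwise.

    Assumes annual salaries are marked with the phrase 'a year'.
    """
    if not salary or "a year" not in salary.lower():
        return None
    amounts = [
        float(amount.replace(",", ""))
        for amount in re.findall(r"\$([\d,]+(?:\.\d+)?)", salary)
    ]
    if not amounts:
        return None
    return sum(amounts) / len(amounts)
```

Each function takes a single value, so with pandas it could be applied to a column, for example via `Series.apply`. Taking the midpoint of a salary range is one possible choice; keeping the minimum and maximum as separate columns would work as well.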
I looked at the distributions of the data and the value counts for the various categorical variables. Below are a few highlights from the analysis.
For more information, check out my Medium post here.
This analysis uses data collected from Indeed.com on March 9th, 2021.