data-science-salary-prediction-project's Introduction

Data-Science-Salary-Prediction-Project: Overview

  • Collected 1000 data science job descriptions from Glassdoor, scraped using Selenium.
  • Engineered features from the text of each job description to quantify the value companies put on a bachelor's degree, Python, Excel, AWS, statistics, and Spark.
  • Optimized Linear, Lasso, and Random Forest Regressors using GridSearchCV to reach the best model.
  • Created a tool that predicts data science salaries worldwide.
  • Created new columns minimum, maximum and average salary for a specific job.
  • Created new columns for employer-provided salary and hourly wages.
  • Removed rows without salary information.
  • Simplified company names.
  • Parsed each job's state and created a new column holding the state abbreviation.
  • Derived the age of each company from its founding date.
  • Created columns for different skills, if required in the job description:
    • Python
    • R
    • Excel
    • AWS
    • Spark
    • Bachelor's Degree
    • Statistics
    • SQL
  • Created a new column for description length.
  • Removed unnecessary columns.
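
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code. The salary string format (for example `"$53K-$91K (Glassdoor est.)"`), the use of `-1` for a missing founding year, the keyword patterns, and the helper names `parse_salary`, `job_state`, `company_age`, and `skill_flags` are all assumptions made for the example.

```python
import re

# Hypothetical keyword patterns; the project's exact matching rules are not shown here.
SKILL_PATTERNS = {
    "python": r"\bpython\b",
    "r": r"\br\b",  # crude: matches any standalone letter "r" in lowercased text
    "excel": r"\bexcel\b",
    "aws": r"\baws\b",
    "spark": r"\bspark\b",
    "bachelors": r"\bbachelor",
    "statistics": r"\bstatistic",
    "sql": r"\bsql\b",
}


def parse_salary(estimate):
    """Return min/max/average salary and two flags, or None if no range is found."""
    text = estimate.lower()
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) < 2:
        return None  # such a row would be removed: no usable salary information
    low, high = numbers[0], numbers[1]
    return {
        "min_salary": low,
        "max_salary": high,
        "avg_salary": (low + high) / 2,
        "hourly": int("per hour" in text),
        "employer_provided": int("employer provided salary" in text),
    }


def job_state(location):
    """Take the state abbreviation from a 'City, ST' style location (assumed format)."""
    return location.split(",")[-1].strip()


def company_age(founded, current_year):
    """Age in years; a founding year below 1 is assumed to mean 'unknown'."""
    return current_year - founded if founded > 0 else -1


def skill_flags(description):
    """One 0/1 flag per skill keyword, plus the description length."""
    text = description.lower()
    flags = {name: int(bool(re.search(p, text))) for name, p in SKILL_PATTERNS.items()}
    flags["desc_len"] = len(description)
    return flags


print(parse_salary("$53K-$91K (Glassdoor est.)"))
print(skill_flags("Strong Python and SQL skills; Bachelor's degree required."))
```

With pandas, helpers like these would typically be applied column-wise, for example through `Series.apply`, to build the new columns.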

Analyzed the distributions of the qualitative and quantitative variables, and examined the correlation between salaries and the other variables.
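
As a rough sketch of this kind of analysis, the snippet below counts the values of a qualitative column and computes correlations between salary and the numeric columns. The column names and the toy values are invented for illustration and are not taken from the project's dataset.

```python
import pandas as pd

# Toy data standing in for the cleaned dataset; all values are made up.
df = pd.DataFrame({
    "job_state": ["CA", "NY", "CA", "TX", "NY"],
    "avg_salary": [72.0, 87.5, 110.0, 56.0, 95.0],
    "company_age": [47, 12, 8, 60, 21],
    "desc_len": [2536, 4783, 3461, 1800, 3900],
    "python": [1, 1, 1, 0, 1],
})

# Distribution of a qualitative variable: number of postings per state.
state_counts = df["job_state"].value_counts()

# Pearson correlation between every pair of numeric columns.
corr = df.select_dtypes("number").corr()

# Correlation of each numeric variable with salary, strongest first.
salary_corr = corr["avg_salary"].drop("avg_salary").sort_values(ascending=False)

print(state_counts)
print(salary_corr)
```

Plotting `corr` as a heatmap with seaborn is one common way to read these relationships at a glance.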


First, the qualitative variables were transformed into dummy variables. Then three different models were applied:

  • Multiple Linear Regression.
  • Lasso Regression.
  • Random Forest.
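
The modelling steps could look roughly like the sketch below, which runs on synthetic data. The column names, parameter grids, train/test split, and number of cross-validation folds are assumptions chosen to keep the example small; they are not the project's actual settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Synthetic stand-in for the cleaned dataset (values are generated, not real).
rng = np.random.default_rng(0)
n = 120
df = pd.DataFrame({
    "job_state": rng.choice(["CA", "NY", "TX"], size=n),
    "python": rng.integers(0, 2, size=n),
    "company_age": rng.integers(1, 100, size=n),
})
df["avg_salary"] = (
    80 + 20 * df["python"] + 10 * (df["job_state"] == "CA") + rng.normal(0, 5, size=n)
)

# Turn qualitative columns into dummy (one-hot) columns.
X = pd.get_dummies(df.drop(columns="avg_salary"))
y = df["avg_salary"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# scikit-learn maximizes scores, so MAE is exposed as a negative value.
scoring = "neg_mean_absolute_error"

# Multiple linear regression: no hyperparameters to tune, so just cross-validate.
lin_mae = -cross_val_score(LinearRegression(), X_train, y_train, scoring=scoring, cv=3).mean()

# Lasso: search over the regularization strength.
lasso_search = GridSearchCV(Lasso(), {"alpha": [0.01, 0.1, 1.0]}, scoring=scoring, cv=3)
lasso_search.fit(X_train, y_train)

# Random forest: search over a small grid of tree settings.
rf_search = GridSearchCV(
    RandomForestRegressor(random_state=0),
    {"n_estimators": [20, 50], "max_features": ["sqrt", 1.0]},
    scoring=scoring,
    cv=3,
)
rf_search.fit(X_train, y_train)

print("Linear MAE:", lin_mae)
print("Lasso MAE:", -lasso_search.best_score_, lasso_search.best_params_)
print("Random forest MAE:", -rf_search.best_score_, rf_search.best_params_)
```

Using the same scoring metric and folds for all three models keeps the comparison between them fair.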

Model Performance

  • Random Forest: MAE = 18.69
  • Lasso Regression: MAE = 24.20
  • Multiple Linear Regression: MAE = 122.02
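
For reference, MAE (mean absolute error) is the average absolute difference between predicted and actual values, expressed in the same units as the salary column, so lower is better. The small example below uses made-up numbers purely to show the arithmetic.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |actual - predicted| over all observations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only, not results from the project.
actual = [70.0, 95.0, 120.0]
predicted = [80.0, 90.0, 105.0]

mae = mean_absolute_error(actual, predicted)  # (10 + 5 + 15) / 3
print(mae)
```

scikit-learn provides the same metric as `sklearn.metrics.mean_absolute_error`.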

Resources

Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, pickle
Scraper Github: https://github.com/arapfaik/scraping-glassdoor-selenium
Article:

Guide: https://www.youtube.com/playlist?list=PL2zq7klxX5ASFejJj80ob9ZAnBHdz5O1t

Contributors

shuchita-rahman
