kshitiz14 / predicting-glassdoor-salary

Creating a model to predict the salary of a data scientist based on the job description found on the Glassdoor website.

Jupyter Notebook 100.00%
python machine-learning regression data-science data-analysis

predicting-glassdoor-salary's Introduction

Predicting-Data Scientist-Salary

Python version: 3.7.0

Python Packages: pandas, numpy, matplotlib, seaborn, sklearn

Data Collection

Data can be collected from various sources. CSV is one of the most popular formats, but most of the time data does not arrive as CSV. Besides CSV files, data can be extracted from text files, PDF files, APIs, or web scraping.

Data Cleaning

Data cleaning is one of the most important steps in the data analysis process. Most of the time, data arrives in raw form. If it is not cleaned appropriately, our predictive model will not produce accurate output. In data analysis, garbage in means garbage out, so it was important to clean our Glassdoor data.

In our data, we changed the elements of the dataframe into a more readable format, such as changing the salary estimate from $81K - $100K to 81 - 100. We also derived new columns from the existing columns that we believe will help us make better predictions. For example, we created a column with the average salary, and new columns recording how many times a job description requires knowledge of Python, R, Spark, etc.
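The two cleaning steps described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function names `parse_salary_range` and `count_keyword` are hypothetical, and the sketch assumes every salary string contains a low and a high figure in thousands. Hourly rates or missing values would need extra handling.

```python
import re


def parse_salary_range(text):
    """Turn a string like '$81K - $100K' into (low, high, average)."""
    # Keep only the runs of digits; '$', 'K' and any trailing label are dropped.
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    if len(numbers) < 2:
        raise ValueError(f"expected a salary range, got {text!r}")
    low, high = numbers[0], numbers[1]
    return low, high, (low + high) / 2


def count_keyword(description, keyword):
    """Count case-insensitive mentions of keyword as a standalone token."""
    # The lookarounds stop 'r' from matching inside words such as 'required'.
    pattern = r"(?<![A-Za-z0-9])" + re.escape(keyword) + r"(?![A-Za-z0-9])"
    return len(re.findall(pattern, description, flags=re.IGNORECASE))
```

With pandas, functions like these could be applied to a whole column through `Series.apply`, for example to fill the average salary column or one count column per skill.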

Exploratory Data Analysis

The next step is to explore the data through visualization. We should always perform exploratory data analysis (EDA) before moving on to model building. EDA not only helps us summarize the main characteristics of the data but also provides intuition for our statistical model. It is a simple way to find discrepancies in our data through visualization.

Some of the visualizations from our EDA are a histogram and a word cloud.

Model Building

Now we are finally building our prediction model. There are various kinds of algorithms, such as regression, classification, and clustering. We can choose one based on the nature and characteristics of the data and the outcome we need. Since we are predicting a quantitative amount, regression is the best choice. Within regression, we tried three models: multiple linear regression, lasso regression, and random forest. Among the three, we chose the random forest since it performed best.
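A comparison of the three models could look something like the sketch below. It rests on several assumptions that do not come from the project: the data is a synthetic stand-in generated with `make_regression`, the metric is cross-validated mean absolute error, and the hyperparameters (`alpha`, `n_estimators`, the number of folds) are arbitrary. The helper `mean_cv_mae` is a hypothetical name.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the cleaned Glassdoor features (assumption).
X, y = make_regression(n_samples=200, n_features=10, noise=15.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.1),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
}


def mean_cv_mae(model, X, y, folds=3):
    """Average mean absolute error across cross-validation folds."""
    scores = cross_val_score(
        model, X, y, scoring="neg_mean_absolute_error", cv=folds
    )
    # scikit-learn reports negated errors so that higher is better; flip the sign.
    return -scores.mean()


results = {name: mean_cv_mae(model, X, y) for name, model in models.items()}
best = min(results, key=results.get)  # lowest error wins
```

Because the synthetic data here is linear by construction, the ranking will not necessarily match the project's finding that random forest performed best on the real Glassdoor data. The point is only the shape of the comparison.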
