shaktipanda1235 / kpmg_virtual_internship

Sprocket Central is a medium-sized bike company that needs analytical insights into its marketing strategy and into which customers to target, among both current and prospective customers. A final visualisation had to be delivered to obtain sign-off for further work.

Jupyter Notebook 100.00%
data-quality-assessment customer-segmentation data-visualization

kpmg_virtual_internship's Introduction

ScreenRecorderProject1.mp4

Inspiration

After working with more than 70 toy datasets, this internship provided a platform to get a detailed overview of how industry works. As most of the datasets I had worked on were already cleaned, this internship gave me the chance to work with data in its raw form. Going from basic collected data to model building and collecting insights from it was the kind of full-fledged work I was interested in.

Data Quality Assessment

  • As the data was in its raw form, it had to be cleaned and checked for discrepancies before deriving insights, since insights drawn from unreliable values may lead to misleading conclusions.
  • The dataset was evaluated against the Data Quality Framework Table, and a draft was generated and sent to the client for a double check. Strategies to mitigate the discrepancies were also provided.
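
A few of the checks such an assessment typically involves can be sketched as follows. This is a minimal illustration assuming pandas; the sample rows, the column values, and the allowed gender codes are hypothetical and are not taken from the client data or from the framework table itself.

```python
import pandas as pd

# Hypothetical raw customer records with typical problems.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "gender": ["F", "Male", "Male", "U"],
    "job_industry_category": ["Health", None, None, "Retail"],
})

# Completeness: percentage of missing values per column.
missing_pct = raw.isna().mean() * 100

# Uniqueness: number of rows whose customer_id repeats an earlier row.
duplicate_ids = int(raw["customer_id"].duplicated().sum())

# Validity: values outside an assumed set of allowed codes.
allowed_gender = {"F", "M", "U"}
invalid_gender = raw.loc[~raw["gender"].isin(allowed_gender), "gender"].tolist()

print(missing_pct)
print(duplicate_ids, invalid_gender)
```

Each check yields a small, reportable number or list, which is convenient when the findings have to be summarised for a client.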

Feature Engineering

  • A dataset was downloaded from the Australian Bureau of Statistics in which each pincode (postcode) is mapped to its state. The most important outcome of this engineered column was separating the Australian Capital Territory from New South Wales.
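
The lookup described above amounts to a left join on the postcode. The sketch below assumes pandas; the column names (`postcode`, `state`, `abs_state`, `state_clean`) and the three sample rows are hypothetical stand-ins, not the actual ABS file or customer table.

```python
import pandas as pd

# Hypothetical customer addresses; the raw data labels the ACT postcode as NSW.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "postcode": [2000, 2600, 3000],
    "state": ["NSW", "NSW", "VIC"],
})

# Hypothetical extract of a postcode-to-state lookup table.
lookup = pd.DataFrame({
    "postcode": [2000, 2600, 3000],
    "abs_state": ["NSW", "ACT", "VIC"],
})

# Left join keeps every customer, even if a postcode is absent from the lookup.
# validate="many_to_one" raises if the lookup lists a postcode more than once.
merged = customers.merge(lookup, on="postcode", how="left", validate="many_to_one")

# Fall back to the original state where the lookup has no match.
merged["state_clean"] = merged["abs_state"].fillna(merged["state"])

print(merged[["customer_id", "postcode", "state", "state_clean"]])
```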

Insights on current customers

  • The Pareto Principle was checked for current customers, and it did not hold.

  • An RFM (Recency, Frequency, Monetary) analysis was carried out on all customers, and the following conclusion was derived:

The company was losing 39 important customers.
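
Both checks in this section can be sketched from a single transactions table. The example assumes pandas, hypothetical column names (`customer_id`, `transaction_date`, `profit`), made-up transactions, and the common 80/20 reading of the Pareto Principle; the segmentation rules behind the "39 important customers" are not stated here, so scoring and segment labels are left out.

```python
import pandas as pd

# Hypothetical transactions; values are illustrative only.
tx = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 4, 5],
    "transaction_date": pd.to_datetime([
        "2017-01-05", "2017-12-20", "2017-03-10", "2017-11-01",
        "2017-12-28", "2017-06-15", "2017-09-01",
    ]),
    "profit": [100.0, 150.0, 40.0, 300.0, 200.0, 120.0, 90.0],
})

# Reference date: the day after the last recorded transaction.
snapshot = tx["transaction_date"].max() + pd.Timedelta(days=1)

# One row per customer: last purchase date, purchase count, total profit.
rfm = tx.groupby("customer_id").agg(
    last_purchase=("transaction_date", "max"),
    frequency=("transaction_date", "count"),
    monetary=("profit", "sum"),
)
rfm["recency"] = (snapshot - rfm["last_purchase"]).dt.days

# Pareto check: share of total profit contributed by the top 20% of customers.
ordered = rfm["monetary"].sort_values(ascending=False)
n_top = max(1, round(len(ordered) * 0.2))
top_share = ordered.iloc[:n_top].sum() / ordered.sum()
pareto_holds = top_share >= 0.8

print(rfm[["recency", "frequency", "monetary"]])
print(f"Top 20% share of profit: {top_share:.2f}, Pareto holds: {pareto_holds}")
```

In this toy data the top customer contributes half of the profit, so the 80/20 rule does not hold, which is the same kind of outcome reported above for the real customers.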

Data Preparation before modelling

  • The dataset had no target variable, so a target column was created from the RFM analysis using monetary habits, i.e. customers were divided into two categories based on spending.
  • job_industry_category had 12.6% missing values but was found to be an important indicator of customer importance. Instead of dropping the column, a target-based encoding, WOE (Weight of Evidence) encoding, was applied, which also took care of the missing values.
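
The two preparation steps can be illustrated with a small standard-library sketch. The median split, the additive smoothing constant, the `"Missing"` label, and the sample values are all assumptions made for the example; the actual threshold and encoder used in the notebook are not specified here. The WOE of a category is computed as the log of its share among high spenders divided by its share among low spenders (some references use the inverse ratio, which only flips the sign).

```python
import math
from collections import Counter
from statistics import median


def woe_table(categories, targets, missing_label="Missing", smoothing=0.5):
    """Map each category to ln(share of positives / share of negatives).

    None is treated as its own category, so missing values receive a WOE too.
    `smoothing` is added to every count to avoid division by zero.
    """
    cats = [missing_label if c is None else c for c in categories]
    pos = Counter(c for c, t in zip(cats, targets) if t == 1)
    neg = Counter(c for c, t in zip(cats, targets) if t == 0)
    levels = sorted(set(cats))
    total_pos = sum(pos.values()) + smoothing * len(levels)
    total_neg = sum(neg.values()) + smoothing * len(levels)
    return {
        c: math.log(((pos[c] + smoothing) / total_pos)
                    / ((neg[c] + smoothing) / total_neg))
        for c in levels
    }


# Hypothetical monetary value and industry per customer.
monetary = [500, 120, 800, 90, 650, 70, 400, 60]
industry = ["Manufacturing", "Retail", "Manufacturing", None,
            "Manufacturing", "Retail", None, "Retail"]

# Binary target from spending: 1 at or above the median, else 0 (assumed rule).
threshold = median(monetary)
target = [1 if m >= threshold else 0 for m in monetary]

# Encode the column, including the missing entries.
woe = woe_table(industry, target)
encoded = [woe["Missing" if c is None else c] for c in industry]

print(woe)
```

Because missing entries become a category of their own, no rows have to be dropped or imputed before encoding.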

Model Building

  • The best model was found to be a Stacking Classifier with two KNN classifiers as base models and GaussianNB as the final_estimator.
  • A final model was fitted on the full dataset, because the earlier train-test split had been compromising the patterns present in the training data.
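
A model of this shape can be sketched with scikit-learn as follows. The synthetic data, the neighbour counts, and the fold counts are assumptions chosen for illustration; the hyperparameters of the original notebook are not given here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the prepared customer features and binary target.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Two KNN base models (neighbour counts are assumed) and a GaussianNB meta-model.
model = StackingClassifier(
    estimators=[
        ("knn_small", KNeighborsClassifier(n_neighbors=5)),
        ("knn_large", KNeighborsClassifier(n_neighbors=15)),
    ],
    final_estimator=GaussianNB(),
    cv=5,
)

# Cross-validation gives a performance estimate without a fixed hold-out set.
scores = cross_val_score(model, X, y, cv=5)

# Final fit on the full dataset.
model.fit(X, y)
predictions = model.predict(X)

print(scores.mean())
```

Fitting on all rows leaves no held-out data for evaluation, so a cross-validated score like the one above is one way to keep an estimate of performance while still using every row for the final model.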

Prediction

  • The new customer dataset was converted to the same structure as the training dataset, and predictions were made.
  • Within the deciding factor job_industry_category, the 'Manufacturing' category shows significant potential to tap into.
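
Bringing new data into the training structure mostly means presenting the same features in the same order. A minimal sketch, assuming pandas and hypothetical feature names (`age`, `tenure`, `job_industry_woe`), could look like this:

```python
import pandas as pd

# Hypothetical feature order used when the model was trained.
train_columns = ["age", "tenure", "job_industry_woe"]

# Hypothetical new-customer data with a different column order and an extra column.
new_customers = pd.DataFrame({
    "tenure": [3, 10],
    "extra_note": ["a", "b"],
    "age": [41, 29],
    "job_industry_woe": [0.8, -0.3],
})

# Fail early if a training feature is absent, then select columns in training order.
missing = [c for c in train_columns if c not in new_customers.columns]
if missing:
    raise ValueError(f"New data is missing features: {missing}")
aligned = new_customers[train_columns]

print(aligned)
```

The aligned frame can then be passed to the fitted model's `predict` method.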

Visualisation

  • A dashboard was created to help the client understand what has been happening, and what is likely to happen, among their current and new customers.
