Voldemort-Project

Team : Tony - Jerrit - Andrea

April 2021

Project exploring data cleaning, visualisation and analysis of credit card data for mid-term planning of marketing activities.

Content

Project Outline

This repository contains data and additional information by the Lily Potter team for the Ironhack Mid-Bootcamp project. Our objective was to understand the features and characteristics of the bank's customers and to predict which of them will accept a credit card offer.

df.head

The Data

The dataset was checked for duplicates, and the identifier field (Customer Number) was dropped. The columns were checked for null values. Null values, which were present in the numerical features, were dropped since they represent less than one percent of the total. We then checked the data type of each column, the unique values and distribution of each categorical feature, and the distribution of each numerical variable.

The fields credit cards held, homes owned, household size and bank accounts open are stored as numeric types but are in fact categorical, so we treat them as categories in our analysis. Numerical variables were checked for correlation using a heatmap, and some were dropped because they were highly redundant. Column names were converted to snake_case. The unique values of the target variable show that the data is strongly imbalanced. We deal with this during the preprocessing iterations.
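The cleaning steps above could be sketched in pandas roughly as follows. This is a minimal illustration, not the team's notebook code: the tiny DataFrame, the exact column names and the helper `clean` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the real column names may differ.
raw = pd.DataFrame({
    "Customer Number": [1, 2, 3, 4, 5],
    "Offer Accepted": ["No", "No", "Yes", "No", "No"],
    "Household Size": [3, 4, 2, 5, 3],
    "Average Balance": [1160.75, None, 276.5, 1219.0, 1211.0],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above and return a new DataFrame."""
    out = df.drop_duplicates()
    # Drop the identifier: it carries no predictive information.
    out = out.drop(columns=["Customer Number"])
    # Drop the few rows that contain missing values.
    out = out.dropna()
    # Rename columns to snake_case.
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    # Treat a count-like numeric field as a category.
    out["household_size"] = out["household_size"].astype("category")
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned.dtypes)
# Share of each class in the target, to quantify the imbalance.
print(cleaned["offer_accepted"].value_counts(normalize=True))
```

The same `astype("category")` call would apply to the other count-like fields (credit cards held, homes owned, bank accounts open).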

Heatmap
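The redundancy check behind the heatmap can also be done numerically. The sketch below lists column pairs whose absolute Pearson correlation exceeds a threshold. The data, the column names, the 0.8 threshold and the helper `highly_correlated_pairs` are all assumptions for illustration.

```python
import pandas as pd

# Made-up values; only the pattern (two balances moving together) matters.
df = pd.DataFrame({
    "q1_balance": [100, 200, 300, 400, 500, 600],
    "q2_balance": [200, 400, 600, 800, 1000, 1300],
    "household_size": [3, 1, 4, 1, 5, 2],
})

def highly_correlated_pairs(frame, threshold=0.8):
    """Return (col_a, col_b, r) for column pairs with |Pearson r| >= threshold."""
    corr = frame.corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

print(highly_correlated_pairs(df))
```

Columns that appear in such a pair are candidates for dropping, since they carry largely the same information.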

The Database

The marketingcreditcard.csv file was imported into SQL Workbench to answer several questions related to the group work.

SQL - In this folder you can find the annotated .sql query file. Every query is marked with its question number and its results, which makes the file easier to navigate.
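To give a feel for the kind of query involved, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in for the SQL server. The table name, column names, rows and the question itself are invented for the example. The project's actual queries are in the .sql file.

```python
import sqlite3

# In-memory SQLite stands in for the team's database; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE credit_card_data ("
    "customer_number INTEGER PRIMARY KEY, "
    "offer_accepted TEXT, "
    "average_balance REAL)"
)
rows = [
    (1, "No", 1000.0),
    (2, "No", 500.0),
    (3, "Yes", 900.0),
    (4, "No", 600.0),
]
conn.executemany("INSERT INTO credit_card_data VALUES (?, ?, ?)", rows)

# Example question: customer count and average balance per response.
query = """
    SELECT offer_accepted,
           COUNT(*) AS customers,
           AVG(average_balance) AS avg_balance
    FROM credit_card_data
    GROUP BY offer_accepted
    ORDER BY offer_accepted
"""
result = conn.execute(query).fetchall()
print(result)
conn.close()
```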

Visualisation

Using Tableau we were able to visualise different aspects of the data easily, from basic distributions to more complex views. The imbalance of the data is an obstacle to visualisation. To overcome it we used average values more often than counts. Later, the heatmap showed high correlation among the Q1, Q2, Q3 and Q4 balance variables.

Question 2 - Visualisation of the imbalance in the dataset

Question 3 - Imbalance in the dataset as a percentage of the total number of people in the dataset

Question 4 - Analyzing certain customer characteristics in one dashboard

Question 5 - Average balance - trend over 4 quarters

Question 6 - Average balances per quarter and household - explanation for household size 8

Question 7 - Cross tabs for features and characteristics in one dashboard

Question 8 - Average balance grouped

Statistical Analysis

We decided to use a logistic regression model to solve our classification problem. The target variable y, represented by "offer accepted", was isolated and then predicted with the model. We transformed the data and split it into train and test sets with a test size of 30%. In total, 5 iterations were run, with accuracy ranging from 0.69 to 0.94, applying different over-, under- and hybrid sampling techniques.
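A minimal scikit-learn sketch of this setup is shown below. It runs on synthetic data, so the printed accuracy says nothing about the project's results. The feature matrix, the use of `StandardScaler` as the transformation, the `stratify` argument and the random seeds are assumptions. Only the 30% test size comes from the description above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 1,000 rows, 4 numeric features, few positives.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.06).astype(int)

# Hold out 30% for testing; stratify keeps the class ratio in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Fit the scaler on the training set only to avoid leaking test information.
scaler = StandardScaler().fit(X_train)
model = LogisticRegression(max_iter=1000)
model.fit(scaler.transform(X_train), y_train)

y_pred = model.predict(scaler.transform(X_test))
print("accuracy:", accuracy_score(y_test, y_pred))
```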

In detail:

Logistic Regression Model

  • Original sampling: accuracy of 0.94, but no ability to recognise cases where offer accepted equals yes.
  • SMOTE: accuracy of 0.73; yes cases recognised, but less accurate overall
  • Tomek links: accuracy of 0.69; yes cases recognised, but less accurate overall
  • SMOTETomek (hybrid): accuracy of 0.73; yes cases recognised, but less accurate overall
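Why a drop in accuracy can still be an improvement is easiest to see with a small worked example. The counts below are constructed for illustration (they are not the project's confusion matrices), and `binary_metrics` is a hypothetical helper.

```python
def binary_metrics(y_true, y_pred, positive="Yes"):
    """Return accuracy, recall and precision for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }

# Illustrative numbers only: 94 "No" and 6 "Yes" customers.
y_true = ["No"] * 94 + ["Yes"] * 6

# A model that always answers "No" looks accurate but finds no acceptors.
always_no = ["No"] * 100
print(binary_metrics(y_true, always_no))

# A model that flags 4 of the 6 acceptors at the cost of 25 false alarms.
flagging = ["Yes"] * 25 + ["No"] * 69 + ["Yes"] * 4 + ["No"] * 2
print(binary_metrics(y_true, flagging))
```

In this constructed case the first model reaches an accuracy of 0.94 with a recall of 0, while the second reaches only 0.73 but recovers two thirds of the acceptors. On imbalanced data, recall and precision for the minority class are therefore more informative than accuracy alone.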

Random Forest

  • SMOTETomek (hybrid): accuracy of 0.86 (selected model)

Confusion Matrix
ROC
Metrics

Delivering Insight

Using our dataset we were able to analyse the characteristics of bank customers who respond positively or negatively to credit card offers. Furthermore, with the predictive model, we can now better target customers who are more likely to accept these offers, improving marketing results while reducing the company's costs.

Next Steps

Transfer the developed method into business processes for marketing:

  • Setting up data streams from the database into the CRM system
  • Programming data processing
  • Creating dashboards for different marketing user groups
  • Training staff on how to run reports and amend them if necessary
  • Developing personas

Jupyter notebook results and backups.

Presentation slides containing: problem description, machine learning process, results, next steps.

Contributors

profgeller, tognolia, tonyhathuc
