
laurentveyssier / udacity-predict-customer-churn-with-clean-code


Project #1 of Udacity's Machine Learning DevOps Engineer Nanodegree

License: GNU General Public License v3.0

Jupyter Notebook 97.92% Python 2.08%
autopep8 clean-code pylint shapley-value sklearn-classify churn-prediction logistic-regression-classifier random-forest-classifier

Predict Customer Churn

  • Project: Predict Customer Churn, part of Udacity's ML DevOps Engineer Nanodegree

Project Description

This is the first project of Udacity's Machine Learning DevOps Engineer Nanodegree. The project objective is to produce production-ready clean code using best practices. The project itself aims at predicting customer churn for banking customers. This is a classification problem. The project proposes the following approach:

  • Load and explore the dataset of over 10k samples (EDA)
  • Prepare the data for training (feature engineering resulting in 19 features)
  • Train two classification models (sklearn random forest and logistic regression)
  • Identify the most important features influencing the predictions and visualize their impact using the SHAP library
  • Save the best models with their performance metrics
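The training and comparison steps above can be sketched roughly as follows. This is an illustration only, not the project's code: it uses synthetic data from sklearn's make_classification in place of the bank dataset, and the sample count, class weights, split ratio, and hyperparameters are all arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared churn data (assumption: 19 numeric
# features and an imbalanced binary target where 1 means "churned").
X, y = make_classification(
    n_samples=2000,
    n_features=19,
    n_informative=8,
    weights=[0.84, 0.16],
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

# The two model families named in the README; hyperparameters are placeholders.
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Fit each model and score it on the held-out test set with ROC AUC.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    churn_proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, churn_proba)

for name, auc in scores.items():
    print(f"{name}: test ROC AUC = {auc:.3f}")
```

Comparing both models on the same held-out split keeps the comparison fair. The scores printed here describe the synthetic data only and say nothing about the project's actual results.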

The Python script was formatted to the PEP 8 standard using the autopep8 module. In addition, it scores above 8.0 with pylint.

Files and data description

Overview of the files and data present in the root directory

The project is organized with the following directory architecture:

  • Folders

    • Data
      • eda --> contains output of the data exploration
      • results --> contains the dataset in csv format
    • images --> contains model scores, confusion matrix, ROC curve
    • models --> contains saved models in .pkl format
    • logs --> log generated during testing of the churn_library.py file
  • project files

    • churn_library.py
    • churn_notebook.ipynb
    • requirements.txt
  • pytest files (unit test file and configuration files)

    • test_churn_script_logging_and_tests.py
    • pytest.ini
    • conftest.py

Running Files

  • The project should be run with Python 3.8 and the appropriate Python packages
  • The required packages are listed in the requirements.txt file
  • To run the project, execute python churn_library.py from the project folder
  • Alternatively, the project can be run from the Jupyter notebook for a step-by-step approach
  • The project script churn_library.py was tested with the pytest package
    • To run the unit tests, type pytest on the command line from the main project folder
    • The project functions are tested automatically, and a log file is generated in the logs folder
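A test that logs its outcome, as described above, might look roughly like the sketch below. The loader import_data, the check function, the logger name, and the sample CSV columns are all hypothetical stand-ins, and the standard csv module is used here only to keep the example self-contained. Under pytest the check would carry a test_ prefix and would typically receive its path from a fixture, for instance one defined in conftest.py.

```python
import csv
import logging
import os
import tempfile

# Assumption: log to a temporary file here; the project writes to its logs folder.
LOG_PATH = os.path.join(tempfile.mkdtemp(), "churn_library.log")
logger = logging.getLogger("churn_tests_sketch")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(LOG_PATH)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
logger.addHandler(handler)


def import_data(path):
    """Hypothetical loader: return the CSV rows as a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def check_import_data(path):
    """Check that the file loads and is non-empty, logging each outcome."""
    try:
        rows = import_data(path)
        logger.info("Testing import_data: SUCCESS")
    except FileNotFoundError:
        logger.error("Testing import_data: the file was not found")
        raise
    try:
        assert len(rows) > 0
        assert len(rows[0]) > 0
    except AssertionError:
        logger.error("Testing import_data: the file has no rows or no columns")
        raise
    return rows


# Demo with a tiny made-up CSV so the sketch runs on its own.
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", newline="") as handle:
    handle.write("customer_id,churn\n1,0\n2,1\n")

rows = check_import_data(sample_path)
```

Logging both the success and the failure paths means the log file records what was checked even when every test passes.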

Classification performance

Random Forest achieves the best performance on the test set:

  • a superior ROC curve

  • a strong confusion matrix, although the model still produces false negatives, which could be an issue given the objective of detecting churn likelihood
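To make the false-negative concern concrete, the sketch below counts the four confusion-matrix cells and derives recall, the share of actual churners the model catches. The labels are made up for illustration and are not the project's results.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 means churn."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1  # a churner the model missed
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


# Made-up labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)      # share of actual churners that were caught
precision = tp / (tp + fp)   # share of churn predictions that were correct
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"recall={recall:.2f} precision={precision:.2f}")
```

Each false negative is a customer who churns without being flagged, so recall is a natural metric to track alongside the ROC curve when the goal is to detect churn.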

The influence of each feature on the churn prediction can be visualized using the SHAP module (features pushing the prediction towards churn appear to the right of the y-axis).
