vikram-raju / permutation-importance-and-shap-on-fraud-classification

A take on highly imbalanced fraud classification using permutation importance to select top features and explaining the model using SHAP.

Introduction

This notebook tackles highly imbalanced fraud classification: permutation importance selects the top features, and SHAP explains the resulting model. Here's the kernel on Kaggle.

This is an interesting problem in flagging credit risk when all the variables have been dimensionally reduced (likely with PCA, as hinted in the problem). It becomes a pure data science exercise in seeing what can be done with the data and how to handle highly imbalanced classes.

We'll explore the following:

  • how to handle imbalanced classes
  • how well permutation importance performs at selecting important features
  • and finally, model explanation using SHAP and eli5

SMOTE

When dealing with highly imbalanced classes, it is important to balance them, either by undersampling, by oversampling, or by oversampling with synthetic data generation. In this notebook we use SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes and then look at the relationships among the features in the data.
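To make the idea of synthetic oversampling concrete, here is a minimal sketch of the core SMOTE step: pick a minority sample, pick one of its nearest minority neighbours, and place a new point somewhere on the line between them. It is not the notebook's code, which presumably relies on a library implementation; the function name `smote_oversample` and the toy data are made up for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a random minority sample as the starting point.
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by distance and keep the k nearest.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (made up for illustration): 8 majority rows, 3 minority rows.
majority = [[float(i), float(i % 3)] for i in range(8)]
minority = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

new_points = smote_oversample(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # 8 8
```

Because every synthetic point is an interpolation, it always lies between existing minority samples. In practice the resampling should be applied to the training split only, so that no synthetic rows leak into the evaluation data.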

Imbalanced data

Using SMOTE - Balanced data

Correlation

Permutation Importance

We'll use a random forest to see which features are important and compare its ranking against permutation importance. A random forest's built-in importance is typically derived from how much each feature reduces impurity across the trees. Permutation importance instead shuffles the values in one column at a time and measures the resulting loss in prediction quality to identify the important features.
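The shuffling procedure can be sketched in a few lines of plain Python. This is an illustrative sketch, not the notebook's implementation: `toy_predict` is a hypothetical stand-in for a fitted classifier, accuracy is assumed as the metric, and the data is randomly generated.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle one column, leaving every other column untouched.
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            score = accuracy(y, [predict(row) for row in X_perm])
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted classifier: it only looks at feature 0.
def toy_predict(row):
    return 1 if row[0] > 0.5 else 0


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_predict(row) for row in X]

pi_scores = permutation_importance(toy_predict, X, y)
print(pi_scores)  # feature 0 shows a large drop, feature 1 shows none
```

Since the toy model ignores feature 1, shuffling that column cannot change any prediction and its importance is exactly zero. With a real model, the scores are best computed on held-out data rather than on the rows used for fitting.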

Random forest feature importance

RF

Permutation importance

PI

SHAP

Partial dependence plot

Let's see how changing the value of a single feature affects the predictions. The y-axis shows the change in the prediction as the feature's value moves across its range.
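The curve behind such a plot can be computed by forcing one feature to each value on a grid, for every row, and averaging the predictions. The sketch below assumes a hypothetical scoring function `toy_score` in place of the fitted model, and the rows are made up for illustration.

```python
def partial_dependence(predict, X, col, grid):
    """Average prediction when column `col` is forced to each grid value."""
    curve = []
    for value in grid:
        # Overwrite the chosen column in every row, keep the rest as observed.
        forced = [row[:col] + [value] + row[col + 1:] for row in X]
        preds = [predict(row) for row in forced]
        curve.append(sum(preds) / len(preds))
    return curve


# Hypothetical scoring function standing in for a fitted model.
def toy_score(row):
    return 2.0 * row[0] + 1.0 * row[1]


X = [[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]]
grid = [0.0, 1.0, 2.0]
pd_curve = partial_dependence(toy_score, X, col=0, grid=grid)
print(pd_curve)  # [3.0, 5.0, 7.0]
```

For this linear toy model the curve rises by 2.0 per unit of feature 0, which is that feature's coefficient; the offset of 3.0 is the average contribution of the other feature.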

PDP

SHAP

SHAP lets us look at a single row of data and see which features contributed to its prediction, and by how much.
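SHAP values are based on Shapley values from cooperative game theory. As a rough illustration of what a per-row attribution means, the sketch below computes exact Shapley values by brute force over all feature subsets, treating a feature as "absent" by swapping in a baseline value. This is not how the SHAP library computes its values for real models (it uses much faster model-specific algorithms); `toy_score`, the row, and the all-zero baseline are assumptions made for the example.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, row, baseline):
    """Exact Shapley values for one row; absent features take baseline values."""
    n = len(row)

    def value(coalition):
        # Features in the coalition use the row's values, the rest use the baseline.
        mixed = [row[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


# Hypothetical scoring function standing in for a fitted model.
def toy_score(row):
    return 3.0 * row[0] - 2.0 * row[1] + 0.5 * row[2]


row = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_score, row, baseline)
print(phi)  # approximately [3.0, -4.0, 2.0]
```

The attributions add up to the difference between the row's prediction and the baseline prediction, which is the property that makes per-row explanations of this kind easy to read. The brute-force loop grows exponentially with the number of features, so it is only practical for toy examples.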

SHAP

SHAP spread on important features

We can also see how a feature's SHAP values vary across the data, the magnitude of that spread, and its dependence on other variables.
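One common way to turn per-row SHAP values into a global ranking is to average their absolute values per feature. The sketch below assumes the SHAP values are already available as one list per row; the numbers and the feature names `f0`, `f1`, `f2` are made up for illustration and are not results from the notebook.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by the mean absolute SHAP value across rows."""
    n_rows = len(shap_rows)
    scores = {
        name: sum(abs(r[j]) for r in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    # Largest average magnitude first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up SHAP values (one row per sample, one column per feature).
feature_names = ["f0", "f1", "f2"]
shap_rows = [
    [0.5, -2.0, 0.0],
    [-0.5, 1.0, 0.25],
    [1.5, -3.0, -0.25],
    [-1.5, 2.0, 0.5],
]

ranking = mean_abs_shap(shap_rows, feature_names)
print(ranking)  # [('f1', 2.0), ('f0', 1.0), ('f2', 0.25)]
```

Taking absolute values matters: `f0` has positive and negative contributions that would cancel to zero in a plain average, even though the feature clearly moves individual predictions.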

SHAP importance

SHAP dep
