sjwedlund / credit_risk_analysis

Apply machine learning to solve the challenge of credit risk

Jupyter Notebook 100.00%
machine-learning imbalanced-learning scikit-learn randomoversampler smote clustercentroids smoteen balancedrandomforestclassifier easyensembleclassifier

Credit Risk Analysis

Overview of the Analysis

The purpose of this analysis is to apply machine learning to the problem of credit risk. Credit risk is an inherently unbalanced classification problem because good loans far outnumber risky loans, so models must be trained and evaluated with techniques designed for unbalanced classes. In this challenge, I used the imbalanced-learn and scikit-learn libraries to build and evaluate models using resampling. Using the credit card dataset from LendingClub, I oversampled the data with the RandomOverSampler and SMOTE algorithms and undersampled it with the ClusterCentroids algorithm. I then applied a combined over- and undersampling approach using the SMOTEENN algorithm. Finally, I compared two further machine learning models that reduce bias, BalancedRandomForestClassifier and EasyEnsembleClassifier, to predict credit risk.
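The results in this README are reported as balanced accuracy, precision, and recall for the high-risk class. As a minimal sketch of how those three numbers relate to a confusion matrix, the helper below computes them from raw counts. The function name and the counts are hypothetical and chosen only to illustrate how a heavily unbalanced test set can combine reasonable recall with very low precision; they are not taken from the notebook.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (balanced_accuracy, precision, recall) for the positive class."""
    recall = tp / (tp + fn)        # share of actual positives that were caught
    specificity = tn / (tn + fp)   # share of actual negatives correctly cleared
    precision = tp / (tp + fp)     # share of positive predictions that were right
    balanced_accuracy = (recall + specificity) / 2
    return balanced_accuracy, precision, recall


# Hypothetical counts: 100 high-risk loans among 17,100 loans in total.
bal_acc, precision, recall = classification_metrics(tp=65, fp=5000, fn=35, tn=12000)
print(f"balanced accuracy={bal_acc:.3f} precision={precision:.3f} recall={recall:.3f}")
```

Because the low-risk class is so much larger, even a modest false positive rate produces far more false alarms than true detections, which is why precision stays low while recall and balanced accuracy look acceptable.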

Results

Naive Random Oversampling

- The balanced accuracy score is 66.3%. The precision of high-risk loans is 1%, and the recall is 64%.
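The idea behind naive random oversampling can be sketched with the standard library alone: rows from the smaller class are duplicated at random until the classes are the same size. This is an illustration of the technique, not the imbalanced-learn RandomOverSampler implementation used in the notebook; the function name and the toy data are assumptions.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Sample with replacement to make up the difference.
        for _ in range(target - count):
            X_res.append(rng.choice(rows))
            y_res.append(label)
    return X_res, y_res


# Hypothetical toy data: 8 low-risk rows and 2 high-risk rows.
X = [[i, i * 2] for i in range(10)]
y = ["low_risk"] * 8 + ["high_risk"] * 2
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))
```

Only the training split should be resampled this way; the test split keeps its original class balance so that the reported metrics reflect real-world proportions.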

SMOTE Oversampling

- The balanced accuracy score is 64.6%. The precision of high-risk loans is again 1%, while the recall this time is 63%.
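Unlike random oversampling, SMOTE creates new minority-class points by interpolating between an existing minority point and one of its minority-class neighbors. The sketch below shows only that interpolation step, simplified to use the single nearest neighbor; the real algorithm picks among several nearest neighbors. The function name and the toy points are hypothetical, and this is not the imbalanced-learn implementation.

```python
import math
import random


def smote_like_samples(minority, n_new, seed=0):
    """Create n_new synthetic points between a minority point and its nearest minority neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest other minority point by Euclidean distance (k = 1 for simplicity).
        neighbor = min(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Hypothetical minority-class points with two features each.
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0]]
new_points = smote_like_samples(minority, n_new=4)
print(new_points)
```

Each synthetic point lies on the line segment between two real minority points, so the new samples stay inside the region the minority class already occupies.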

ClusterCentroids

- Using the ClusterCentroids method of undersampling, the balanced accuracy score is 51%. The precision of high-risk loans is still 1%, and the recall is 59%.

SMOTEENN

- For this combined method of over- and undersampling, the balanced accuracy score is 62.4%. The precision of high-risk loans is 1%, while the recall is 70%.

BalancedRandomForestClassifier

- Using the BalancedRandomForestClassifier ensemble method of resampling, the balanced accuracy score is 78.7%. The precision of high-risk loans is a slight improvement at 4%, and the recall of high-risk loans is 67%.

EasyEnsemble AdaBoost Classifier

- Using the EasyEnsemble AdaBoost Classifier model, the balanced accuracy score is 92.5%. The precision of high-risk loans is 7%, and the recall of high-risk loans is 91%.

Summary

None of these models predicted high-risk loans accurately: precision for the high-risk class never exceeded 7%. The EasyEnsemble AdaBoost Classifier model was better than the others at detecting high-risk credit, but it also incorrectly flagged 979 loans as high-risk. Therefore, I do not recommend any of these models for predicting credit risk.
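As a rough consistency check on the EasyEnsemble figures, precision = TP / (TP + FP) can be solved for the number of true positives implied by the 979 false high-risk predictions and the reported 7% precision. This is back-of-the-envelope arithmetic on rounded figures, not a number taken from the notebook's confusion matrix, and the function name is hypothetical.

```python
def implied_true_positives(false_positives, precision):
    """Solve precision = TP / (TP + FP) for TP."""
    return precision * false_positives / (1 - precision)


# 979 false positives and roughly 7% precision are the figures reported above.
tp_estimate = implied_true_positives(false_positives=979, precision=0.07)
false_alarm_share = 979 / (979 + tp_estimate)
print(f"implied true positives: about {tp_estimate:.0f}")
print(f"share of high-risk flags that are false alarms: {false_alarm_share:.0%}")
```

Under these assumptions, roughly 93 of every 100 loans flagged as high-risk would in fact be low-risk, which is the practical cost behind the recommendation above.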
