
jharvey09 / risky_business_peer_to_peer_lending


In this project, I use machine learning models to assess credit risk in peer-to-peer lending, applying resampling algorithms such as SMOTE and Naive Random Oversampling.

time-series-analysis linear-regression peer-to-peer credit-risk credit-fraud smote-sampling smoteenn-combination naive-random-oversampler


Risky Business [Peer To Peer Lending]


Table Of Contents:

Background

Files

Steps

Conclusion

Background:

Mortgages, student and auto loans, and debt consolidation are just a few examples of credit and loans that people seek online. Peer-to-peer lending services such as Loans Canada and Mogo let investors loan people money without using a bank. For this project, I will build and evaluate several machine learning models to predict credit risk using data typically seen from peer-to-peer lending services. Credit risk is an inherently imbalanced classification problem (the number of good loans is much larger than the number of at-risk loans), so I will need to employ different techniques for training and evaluating models with imbalanced classes.

Files:

  1. Resampling Notebook
  2. Ensemble Notebook
  3. Lending Club Loans Data

Steps:

  1. Read the data into a DataFrame

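This step can be sketched with pandas; the tiny in-memory CSV below is an illustrative stand-in for the repo's Lending Club loans file (the column names here are assumptions, not the real schema):

```python
import pandas as pd
from io import StringIO

# Illustrative stand-in for the Lending Club CSV; in the notebook this
# would be pd.read_csv() on the actual loans data file.
csv_data = StringIO(
    "loan_amnt,int_rate,loan_status\n"
    "10000,0.12,low_risk\n"
    "5000,0.27,high_risk\n"
)
df = pd.read_csv(csv_data)
print(df.shape)  # (2, 3)
```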

  2. Split the data into training and testing sets

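A minimal sketch of the split with scikit-learn's train_test_split; the synthetic, imbalanced dataset below stands in for the loan features and the low/high-risk target (the 90/10 class ratio is illustrative, not the real data's):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: ~90% "good loan", ~10% "at-risk"
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=1)

# Default split holds out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
print(X_train.shape, X_test.shape)  # (750, 20) (250, 20)
```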

  3. Scale the training and testing data using the StandardScaler
  4. Generate a classification report using classification_report_imbalanced


  5. For the balanced random forest classifier only, print the feature importances sorted in descending order

Conclusion:

Credit Risk Resampling:

For this analysis, I needed to use many different algorithms to form a cohesive conclusion on the best model for credit risk resampling, i.e. the model that would fit best in the current peer-to-peer lending market. To figure this out, I asked myself three questions: Which model had the best balanced accuracy score? Which model had the best recall score? And which model had the best geometric mean score?

After the initial imports, I split the data and scaled it with a StandardScaler, then loaded it into a simple logistic regression model, calculated the balanced accuracy score, displayed a confusion matrix, and printed an imbalanced classification report. Next, I trained the data with two oversampling algorithms separately: first I resampled with the Naive Random Oversampling algorithm, loaded the oversampled data into a logistic regression model, calculated the balanced accuracy score, ran the data through a confusion matrix, and printed the imbalanced classification report, then I repeated the same steps for the SMOTE oversampling algorithm. To undersample, I resampled the data using the ClusterCentroids algorithm, trained and fit a logistic regression model, calculated the balanced accuracy score, and again displayed a confusion matrix and printed the imbalanced classification report. To conclude my findings, I used the combination over- and under-sampling algorithm SMOTEENN to determine whether it would result in the best performance compared to the other sampling algorithms above.

The resampling approach with the best balanced accuracy score was logistic regression with SMOTEENN. The model with the best recall was Naive Random Oversampling, and the model with the best geometric mean score was again logistic regression with SMOTEENN.

Credit Risk Sampling:

For this analysis, I needed to use a few different algorithms to conclude my findings. I asked myself the same three questions plus one more: this time I was curious which top three features helped me reach my conclusion. After the initial imports and basic data cleaning, I split and trained the data before scaling with a StandardScaler. The first algorithm used was the Balanced Random Forest Classifier: I calculated the balanced accuracy score, ran the data through a confusion matrix, printed an imbalanced classification report, and listed the feature importances in descending order. I completed the same steps for the next algorithm, the Easy Ensemble Classifier. The model with the best balanced accuracy score was the Easy Ensemble Classifier, and it also had the best recall and the best geometric mean score. As for the final question, after further review I feel that the top three features are:

Regression | Classification | Predict

