Capstone_Project

Capstone Project for Flatiron School DS 021720

Home Loan Default Predictions

The Project: For my Capstone project I chose to analyze the data from the Home Credit Default Risk competition on Kaggle.

The Goal: Build a model that predicts whether an individual will default on their home loan. Once the predictions are made, we list the features that affect the result the most and present them to the customer as potential risk factors.

The Problem:

  • There are many factors that affect the results
  • There are many underlying circumstances that aren't apparent at first glance
  • There are factors that the data won't be able to capture

The Solution:

  • The model combs through the data and finds the factors that change the result the most
  • In-depth analysis filters the data to find the best results

The Process

  1. Explore & Clean data
  2. Model the data
  3. Compute the feature importances to see which factors were the biggest contributors
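The three steps above can be sketched end to end on a small synthetic table. This is a minimal illustration, not the notebook's actual code: the column names, the median imputation, and the hyperparameters are all assumptions chosen for the example, and the data is randomly generated.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic stand-in for the application table (hypothetical column names).
X = pd.DataFrame({
    "years_employed": rng.exponential(5.0, n),
    "loan_amount": rng.normal(500_000, 150_000, n),
    "has_phone": rng.integers(0, 2, n).astype(float),
})
# Synthetic target: shorter employment means a higher chance of default.
p_default = 1.0 / (1.0 + np.exp(X["years_employed"] - 2.0))
y = (rng.random(n) < p_default).astype(int)

# Inject some missing values so there is something to clean.
X.loc[rng.random(n) < 0.1, "loan_amount"] = np.nan

# 1. Explore & clean: split first, then impute with the training median.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
imputer = SimpleImputer(strategy="median")
X_train_clean = imputer.fit_transform(X_train)
X_test_clean = imputer.transform(X_test)

# 2. Model: fit a random forest and score it with ROC AUC.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train_clean, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test_clean)[:, 1])

# 3. Feature importances, sorted from most to least important.
importances = pd.Series(
    model.feature_importances_, index=X.columns
).sort_values(ascending=False)

print(f"ROC AUC: {auc:.3f}")
print(importances)
```

Fitting the imputer on the training split only keeps information from the test rows out of the cleaning step.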

The Data:

The data was provided by the Home Credit Default Risk competition on Kaggle.

The Metrics:

The main metric used was ROC AUC (the area under the receiver operating characteristic curve).
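As a small illustration of what this metric measures, the sketch below scores four made-up predictions with scikit-learn's `roc_auc_score`. The labels and probabilities are invented for the example and are not project data.

```python
from sklearn.metrics import roc_auc_score

# Toy labels: 1 = defaulted, 0 = repaid (made-up values).
y_true = [0, 0, 1, 1]
# Predicted probability of default for each applicant (made-up values).
y_score = [0.1, 0.4, 0.35, 0.8]

# ROC AUC is the fraction of (defaulter, non-defaulter) pairs in which
# the defaulter received the higher score: here 3 of 4 pairs.
auc = roc_auc_score(y_true, y_score)
print(auc)  # 0.75
```

Because the score depends only on how the predictions are ranked, it remains informative when defaults are much rarer than repayments.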

The Models Chosen:

  • The baseline model - Random Forest
  • Other models used - Linear Regression

Conclusion

The largest share of the individuals who defaulted were between the ages of 20 and 30. The 5 most important features for the 20-30 age range are as follows:

  1. How many days before the application the person started their current employment
    • The majority of the individuals had only been employed for a short time
  2. The population density of the individual's home address
    • Most of the individuals lived in low-density areas, such as outside of cities or in other uncrowded residential areas
  3. The amount of the loan
    • This amount fluctuated a lot, but most of the time it was 3 times higher than the individual's income
  4. Whether the household had a child
    • Most of the individuals who defaulted had 0 or only 1 child
  5. Occupation type
    • The data shows that most individuals who defaulted were in lower-paying jobs designated "laborer"
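The age filter and the loan-to-income comparison in the list above can be sketched with pandas. The rows below are made up, and the column names (`DAYS_BIRTH`, `AMT_CREDIT`, `AMT_INCOME_TOTAL`, `TARGET`) are assumed to match the Kaggle application table; check them against the actual CSV before reusing this.

```python
import pandas as pd

# Tiny made-up sample; column names are assumed from the Kaggle data.
apps = pd.DataFrame({
    "DAYS_BIRTH": [-9000, -10500, -16000, -8000],  # days relative to application (negative)
    "AMT_CREDIT": [450_000.0, 300_000.0, 600_000.0, 270_000.0],
    "AMT_INCOME_TOTAL": [150_000.0, 120_000.0, 300_000.0, 90_000.0],
    "TARGET": [1, 0, 1, 1],  # 1 = defaulted
})

# Convert the negative day count into an age in years.
apps["age_years"] = -apps["DAYS_BIRTH"] / 365.25
# Loan amount as a multiple of income.
apps["credit_to_income"] = apps["AMT_CREDIT"] / apps["AMT_INCOME_TOTAL"]

# Keep only defaulters aged 20 to 30.
young_defaulters = apps[
    apps["age_years"].between(20, 30) & (apps["TARGET"] == 1)
]
median_ratio = young_defaulters["credit_to_income"].median()
print(median_ratio)  # 3.0
```

A median is used here rather than a mean because the loan amount varies widely between applicants.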

The 5 least important features are:

  1. The number of payments left on the loan
    • The rate at which individuals defaulted on their loans wasn't dependent on how much longer they had left to pay off the loan
  2. The size of the family
    • The number of family members living in the same household didn't affect the risk much
  3. The region rating
    • This is a value from the original dataset that rates the quality of the living establishment
  4. Whether an individual had multiple addresses listed under different categories
    • Many of them had different addresses listed separately under work, personal, and contact
  5. Whether an individual had a phone

Future Recommendations

  1. Gather more data with fewer null values
  2. Apply feature engineering to improve model performance
  3. Offer new products to accommodate high-risk clientele (e.g. loan forgiveness programs)

Repository Guide

Notebooks

CSV Files

Presentation Canva Link

Resources

The Data:

The raw datasets were too big for GitHub, so they have not been added to this repository.

The data is based on this Kaggle competition.

Models:

Below you will find model documentation

Human Resources

  • My Last Data Bender Cohort 02/17/20 classmates
  • Lindsey Berlin DS 02/17/20 Instructor
  • Bryan Arnold DS 02/17/20 Instructor

Contributors

tyasuoka
