loan-prediction's Introduction

Loan Prediction

Predict whether loans acquired by Fannie Mae will go into foreclosure. Fannie Mae acquires loans from other lenders as a way of inducing them to lend more, and it publishes data on the loans it has acquired and their subsequent performance.

Installation

Download the data

  • Clone this repo to your computer.
  • Get into the folder using cd loan-prediction.
  • Run mkdir data.
  • Switch into the data directory using cd data.
  • Download the data files from Fannie Mae into the data directory.
    • You can find the data here.
    • You'll need to register with Fannie Mae to download the data.
    • It's recommended to download all the data from 2012 Q1 to present.
  • Extract all of the .zip files you downloaded.
    • On OSX, you can run find ./ -name \*.zip -exec unzip {} \;.
    • At the end, you should have a set of text files named Acquisition_YQX.txt and Performance_YQX.txt, where Y is a year and X is a quarter number from 1 to 4.
  • Remove all the zip files by running rm *.zip.
  • Switch back into the loan-prediction directory using cd .. (the parent directory).
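
If the find/unzip command above is not available on your system, the extraction step can also be done from Python. This is a minimal sketch using only the standard library; the function name extract_all is hypothetical and not part of this repo, and unlike the find command it only looks at archives directly inside the given directory, not in subdirectories.

```python
import zipfile
from pathlib import Path


def extract_all(data_dir):
    """Extract every .zip archive found directly in data_dir, in place.

    Hypothetical helper, not part of the repo. Returns the names of all
    archive members that were extracted.
    """
    extracted = []
    for archive in sorted(Path(data_dir).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
            extracted.extend(zf.namelist())
    return extracted
```

Calling extract_all("data") from the loan-prediction directory would unpack the downloaded archives into data; the zip files still need to be removed afterwards.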

Install the requirements

  • Install the requirements using pip install -r requirements.txt.
    • Make sure you use Python 3.
    • You may want to use a virtual environment for this.

Usage

  • Run mkdir processed to create a directory for our processed datasets.
  • Run python assemble.py to combine the Acquisition and Performance datasets.
    • This will create Acquisition.txt and Performance.txt in the processed folder.
  • Run python annotate.py.
    • This will create training data from Acquisition.txt and Performance.txt.
    • It will add a file called train.csv to the processed folder.
  • Run python predict.py.
    • This will run cross-validation on the training set and print the accuracy score.
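
To make the assemble step more concrete, here is a simplified sketch of what combining the quarterly files amounts to. It is not the code in assemble.py, which reads the files with pandas; it assumes the raw files have no header row, so lines can be copied unchanged, and the default directory names are assumptions based on the steps above.

```python
from pathlib import Path


def concatenate(prefix, data_dir="data", processed_dir="processed"):
    """Append every file named <prefix>_*.txt in data_dir into one file.

    Simplified stand-in for the assemble step: assumes the raw files have
    no header row. Returns the number of lines written.
    """
    out_path = Path(processed_dir) / f"{prefix}.txt"
    lines_written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        # Sort so the quarters are combined in a stable order.
        for src in sorted(Path(data_dir).glob(f"{prefix}_*.txt")):
            with open(src, encoding="utf-8") as f:
                for line in f:
                    # Guard against a final line with no trailing newline.
                    out.write(line if line.endswith("\n") else line + "\n")
                    lines_written += 1
    return lines_written
```

Under these assumptions, concatenate("Acquisition") and concatenate("Performance") would produce processed/Acquisition.txt and processed/Performance.txt.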

Extending this

If you want to extend this work, here are a few places to start:

  • Generate more features in annotate.py.
  • Switch algorithms in predict.py.
  • Add in a way to make predictions on future data.
  • See whether you can predict if a bank should have issued the loan.
    • Remove any columns from train that the bank wouldn't have known at the time of issuing the loan.
      • Some columns are known when Fannie Mae bought the loan, but not before.
    • Make predictions.
  • Explore whether you can predict columns other than foreclosure_status.
    • Can you predict how much the property will be worth at sale time?
  • Explore the nuances between performance updates.
    • Can you predict how many times the borrower will be late on payments?
    • Can you map out the typical loan lifecycle?
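
As a starting point for the idea of removing columns the bank wouldn't have known, the sketch below copies a CSV file while leaving out a chosen set of columns. It assumes train.csv has a header row, which is not confirmed here; the function name drop_columns and any column names you pass in are hypothetical, and which columns to drop is something you would need to decide from the Fannie Mae data documentation.

```python
import csv


def drop_columns(src_path, dst_path, columns_to_drop):
    """Copy a CSV file with a header row, leaving out the named columns.

    Hypothetical helper, not part of the repo. Returns the kept column names.
    """
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        kept = [name for name in reader.fieldnames if name not in columns_to_drop]
        with open(dst_path, "w", newline="", encoding="utf-8") as dst:
            # extrasaction="ignore" silently skips the dropped columns.
            writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
            writer.writeheader()
            for row in reader:
                writer.writerow(row)
    return kept
```

The reduced file could then be used for training in place of train.csv, so the model only sees information available when the loan was issued.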

loan-prediction's People

Contributors

vikparuchuri

loan-prediction's Issues

assemble.py crashes due to pandas update

assemble.py was crashing on execution. I fixed the issue by changing index_col=False to index_col=None in the pd.read_csv call.

Like this:
data = pd.read_csv(os.path.join(settings.DATA_DIR, f), sep="|", header=None, names=HEADERS[prefix], index_col=None)

IndexError: list index out of range

I am following your machine learning project but get the following error when I execute assemble.py. I know it's an index out of bounds error, but I can't figure out where it comes from.

python assemble.py
Traceback (most recent call last):
  File "assemble.py", line 87, in <module>
    concatenate("Performance")
  File "assemble.py", line 77, in concatenate
    data = pd.read_csv(os.path.join(settings.DATA_DIR, f), sep="|", header=None, names=HEADERS[prefix], index_col=False)
  File "/home/khan/anaconda3/envs/mllearn/lib/python3.6/site-packages/pandas/io/parsers.py", line 646, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/khan/anaconda3/envs/mllearn/lib/python3.6/site-packages/pandas/io/parsers.py", line 401, in _read
    data = parser.read()
  File "/home/khan/anaconda3/envs/mllearn/lib/python3.6/site-packages/pandas/io/parsers.py", line 939, in read
    ret = self._engine.read(nrows)
  File "/home/khan/anaconda3/envs/mllearn/lib/python3.6/site-packages/pandas/io/parsers.py", line 1508, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 848, in pandas.parser.TextReader.read (pandas/parser.c:10415)
  File "pandas/parser.pyx", line 870, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:10691)
  File "pandas/parser.pyx", line 947, in pandas.parser.TextReader._read_rows (pandas/parser.c:11728)
  File "pandas/parser.pyx", line 1023, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:12805)
  File "pandas/parser.pyx", line 1289, in pandas.parser.TextReader._get_column_name (pandas/parser.c:17512)
IndexError: list index out of range
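
One possible cause of this error (an assumption, not confirmed in this thread) is that some rows in the data file contain more fields than there are names in HEADERS[prefix]. A quick way to check is to count the number of pipe-separated fields per row and compare it with the length of the header list. The sketch below uses only the standard library; the function name field_count_histogram is hypothetical.

```python
from collections import Counter


def field_count_histogram(path, sep="|", max_lines=10000):
    """Count how many rows have each number of sep-delimited fields.

    Hypothetical diagnostic helper. Only the first max_lines rows are
    inspected so that very large files stay quick to check.
    """
    counts = Counter()
    with open(path, encoding="utf-8", errors="replace") as f:
        for i, line in enumerate(f):
            if i >= max_lines:
                break
            counts[len(line.rstrip("\n").split(sep))] += 1
    return counts
```

If the histogram for a Performance file shows a field count larger than the number of names passed to pd.read_csv, the header list and the data are out of sync, which would be worth ruling out before looking elsewhere.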
