Coder Social home page Coder Social logo

fdic-call-reports's Introduction

Domestically-chartered banks regulated by the FDIC are required to file quarterly financial reports to regulators, known colloquially as "call reports". This data is made publically available by the FFIEC but is difficult to download efficiently and hard to analyze in aggregate.

This project aims to simplify data access and open up new methods for analysis.

It currently consists of 3 major pieces:

  1. A downloader to retrieve all bulk call report data,
  2. A MapReduce job that processes raw data to produce aggregate data per bank, and
  3. A Django-powered web application that allows for exploration and review of aggregated data

Downloading raw call report data

While the current FFIEC download site does offer minimal bulk downloading capabilities, it's not easy to retrieve a detailed set of all call report data over time. Without an API, the only way to do this is to use a browser to retrieve each dataset.

The bin/download_all_reports.py script automates this process by using Selenium to iteratively download all available single quarterly call reports. See the comments in that script for instructions on how to setup Selenium locally.

Files downloaded from this script are stored in their original archive format (.zip).

Preparing call report data for MapReduce

The bin/extract.py script extracts all raw text files from the downloaded call report archives and prepares them for processing by MapReduce. Each file is converted to line-JSON format, and optionally stored in a bucket on S3. Files are organized first by report date and then by report type.

Note that all code that leverages AWS resources like S3 follows standard convention and assumes that you have your AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables set.

Running MapReduce

A MapReduce job is written to handle the numerous input files, group data by report type and bank, and output in a normalized way.The excellent mrjob library is used so that the job can be written in Python.

See bin/mrjob-transform.py for the code; it currently handles two report types: "Bulk POR" (general bank metadata) and "Schedule RC" (balance sheet information).

To run locally, you can do something like:

python jobs/banks.py input/*/*Bulk*.txt input/*/*RC\ *.txt > output.txt

Optionally, you can also run on Elastic MapReduce. See mrjob documentation for details.

Loading data into MySQL

MySQL is used as a datastore for the Django web application. A loader script is provided to read in the file generated in the previous MapReduce step.

To import data, simply run:

./manage.py load -f output.txt

This requires that you have the DJANGO_RDS_HOST, DJANGO_RDS_USER, and DJANGO_RDS_PASSWORD environment variables set to appropriate database credentials.

The Django web application

The Django web application allows for browsing of all data generated in previous steps.

The home page shows a list of all banks over all time, alphabetized by name.

index

Each bank's ID links to a page displaying the details of its balance sheet over time.

bank

fdic-call-reports's People

Contributors

chosak avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.