This project forked from dchannah/fraudhacker

Anomaly detection system for medical insurance claims data

FraudHacker

FraudHacker is an anomaly detection system for Medicare insurance claims data. I built FraudHacker using Python 3 along with various scientific computing and machine learning packages (NumPy, scikit-learn, and many others). For more background about why I built FraudHacker, please see my blog post on the subject. I will focus on the technical details here.

Structure

  • data/: Contains a CSV file with the outlier count data generated by the anomaly labeling engine.
  • notebooks/: Jupyter notebooks demonstrating various aspects of FraudHacker's workflow, including the outlier detection, physician ranking, and hyperparameter sweeping.
  • src/: The actual source code for FraudHacker and the Flask app that displays its results to users.

Each directory has its own README file with more information.

Overview

FraudHacker uses clustering to perform outlier detection on Medicare claims data from the Centers for Medicare & Medicaid Services (CMS). Each record contains aggregated information about one type of procedure (for example, a blood draw) performed by one physician. This data was downloaded in CSV format and loaded directly into a PostgreSQL database, which is the starting point of FraudHacker's interaction with the data. FraudHacker extracts numerical values from this database and uses them to cluster the data for all physicians of a particular specialty (e.g. Neurology) in a particular state.

The number of outlier procedures (i.e., potentially fraudulent ones) associated with each physician is tallied and written to a second database. The tallying could, in principle, be done on the fly by operating directly on the PostgreSQL database containing the CMS data, but it is much faster to pre-run the model and store the results. The outlier counts for each physician are then displayed to the user through the FraudHacker dashboard, a JavaScript-driven Flask app. I currently have a copy of FraudHacker running on an AWS EC2 instance; it can be found at http://www.fraudhacker.site.
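Because clustering is run separately for each specialty/state combination, the records first have to be split into cohorts. The sketch below shows that step with the standard library only. The field names (`npi`, `specialty`, `state`, `procedure`, `services`) and the sample values are hypothetical and are not necessarily the CMS column names; the real project works on a Pandas DataFrame instead.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: one row per (physician, procedure) pair,
# loosely modeled on the aggregated CMS records described above.
Record = Dict[str, object]


def partition_by_cohort(records: Iterable[Record]) -> Dict[Tuple[str, str], List[Record]]:
    """Group records so each (specialty, state) cohort can be clustered on its own."""
    cohorts: Dict[Tuple[str, str], List[Record]] = defaultdict(list)
    for rec in records:
        key = (str(rec["specialty"]), str(rec["state"]))
        cohorts[key].append(rec)
    return dict(cohorts)


if __name__ == "__main__":
    # Made-up example rows; procedure codes are placeholders.
    rows = [
        {"npi": "A", "specialty": "Neurology", "state": "CA", "procedure": "X1", "services": 120},
        {"npi": "B", "specialty": "Neurology", "state": "CA", "procedure": "X1", "services": 95},
        {"npi": "C", "specialty": "Neurology", "state": "NY", "procedure": "X1", "services": 300},
        {"npi": "D", "specialty": "Cardiology", "state": "CA", "procedure": "X2", "services": 40},
    ]
    for (specialty, state), group in sorted(partition_by_cohort(rows).items()):
        print(specialty, state, [r["npi"] for r in group])
```

Keeping cohorts separate means a physician is only compared against peers with the same specialty in the same state, which is the comparison the overview describes.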

Workflow

A reader class, PandasDBReader (implemented in database_tools.py), reads the data from the PostgreSQL database (whose connection information is specified in an external YAML file) and loads it into a Pandas DataFrame. This DataFrame is then ingested by an AnomalyDetector subclass (which subclass depends on the desired algorithm; these are implemented in anomaly_tools.py). The AnomalyDetector performs the actual clustering and outlier labeling and produces an outlier score for each record. A threshold on the outlier scores is used to formally label certain records as outliers. The AnomalyDetector class also tallies the outlier counts for each physician.
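The score, threshold, and tally pattern described above can be illustrated with a small, dependency-free sketch. The class names `AnomalyDetectorSketch` and `KNNDistanceDetector` are hypothetical and do not reproduce the interface in anomaly_tools.py. The scoring rule is also a swapped-in technique: each record is scored by its distance to its k-th nearest neighbour within the cohort (a simple density-based outlier score), not by whatever clustering algorithm FraudHacker itself uses. Plain tuples stand in for the Pandas DataFrame.

```python
import math
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


class AnomalyDetectorSketch(ABC):
    """Hypothetical base class: subclasses supply one outlier score per record."""

    def __init__(self, physician_ids: Sequence[str], features: Sequence[Sequence[float]]):
        if len(physician_ids) != len(features):
            raise ValueError("need exactly one physician id per feature vector")
        self.physician_ids = list(physician_ids)
        self.features: List[Point] = [tuple(float(x) for x in f) for f in features]

    @abstractmethod
    def scores(self) -> List[float]:
        """Return one outlier score per record; higher means more anomalous."""

    def label_outliers(self, threshold: float) -> List[bool]:
        # A record is labelled an outlier when its score exceeds the threshold.
        return [s > threshold for s in self.scores()]

    def outlier_counts(self, threshold: float) -> Dict[str, int]:
        # Tally labelled outliers per physician; physicians with none get 0.
        counts = {pid: 0 for pid in self.physician_ids}
        for pid, is_outlier in zip(self.physician_ids, self.label_outliers(threshold)):
            counts[pid] += int(is_outlier)
        return counts


class KNNDistanceDetector(AnomalyDetectorSketch):
    """Scores each record by the distance to its k-th nearest neighbour."""

    def __init__(self, physician_ids, features, k: int = 3):
        super().__init__(physician_ids, features)
        if not 1 <= k < len(self.features):
            raise ValueError("k must be at least 1 and less than the number of records")
        self.k = k

    def scores(self) -> List[float]:
        result = []
        for i, p in enumerate(self.features):
            # Brute-force O(n^2) distances: fine for a sketch, slow for large cohorts.
            dists = sorted(math.dist(p, q) for j, q in enumerate(self.features) if j != i)
            result.append(dists[self.k - 1])
        return result


if __name__ == "__main__":
    # Made-up 2-D features for one cohort; the last record is far from the rest.
    ids = ["P1", "P1", "P2", "P2", "P3", "P3", "P3", "P3", "P1"]
    feats = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10), (10, 11), (11, 10), (11, 11), (30, 1)]
    detector = KNNDistanceDetector(ids, feats, k=3)
    print(detector.outlier_counts(threshold=5.0))  # {'P1': 1, 'P2': 0, 'P3': 0}
```

A neighbour-distance score is used here partly because naive k-means distance-to-centroid scoring can let an isolated point capture a centroid of its own and end up with a score of zero. Swapping in a different subclass only requires a new `scores` method; thresholding and tallying stay in the base class.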

The next step is currently done semi-manually (this could be improved in the future). I export the outlier counts for each physician to a CSV file (an example of this data can be found in the data folder). The outlier count data is in turn imported into another PostgreSQL database, which is read directly by the Flask app. This is accomplished via another class, OutlierCountDBReader (also implemented in database_tools.py), which produces the values that are ultimately displayed in the Flask app.
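The CSV hand-off between the two databases can be sketched with the standard library alone. In this sketch SQLite stands in for PostgreSQL, the `outlier_counts` table and its `physician_id`/`outlier_count` columns are assumptions, and `OutlierCountReaderSketch` is a hypothetical stand-in rather than the project's OutlierCountDBReader.

```python
import csv
import io
import sqlite3
from typing import Dict, List, Tuple


def export_counts_csv(counts: Dict[str, int]) -> str:
    """Serialise per-physician outlier counts to CSV text (hypothetical columns)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["physician_id", "outlier_count"])
    for physician_id, n in sorted(counts.items()):
        writer.writerow([physician_id, n])
    return buf.getvalue()


def import_counts_csv(conn: sqlite3.Connection, csv_text: str) -> None:
    """Load the CSV into an outlier_counts table (SQLite stands in for PostgreSQL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outlier_counts "
        "(physician_id TEXT PRIMARY KEY, outlier_count INTEGER)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["physician_id"], int(r["outlier_count"])) for r in reader]
    conn.executemany("INSERT OR REPLACE INTO outlier_counts VALUES (?, ?)", rows)
    conn.commit()


class OutlierCountReaderSketch:
    """Hypothetical read-only view a dashboard could query for display values."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn

    def top_physicians(self, limit: int = 10) -> List[Tuple[str, int]]:
        # Highest outlier counts first; ties broken by physician id.
        cur = self.conn.execute(
            "SELECT physician_id, outlier_count FROM outlier_counts "
            "ORDER BY outlier_count DESC, physician_id LIMIT ?",
            (limit,),
        )
        return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    import_counts_csv(conn, export_counts_csv({"P1": 3, "P2": 0, "P3": 5}))
    print(OutlierCountReaderSketch(conn).top_physicians(limit=2))  # [('P3', 5), ('P1', 3)]
```

Pre-computing the counts this way keeps the dashboard's queries to simple lookups, which is the speed advantage the overview mentions over tallying on the fly.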
