
cschank / yhack2017


Our entry for the 2017 YHack Hackathon at Yale University, December 1st-3rd, 2017, submitted to the challenge issued by Vitech Systems Group, Inc. Built from scratch in 36 hours by Chris Schankula, Sophia Tao, Daniel Cefaratti and Matthew D'Cruz.

Home Page: https://devpost.com/software/vitech-insurance-quote-predictor-and-plan-recommender



Vitech Insurance Quote Predictor and Plan Recommender: McMaster Team #1 (The Beard)

See full description and pretty pictures on Devpost

Vitech challenge description (from YHack 2017 Devpost)

Insurance company Intellisurance provides its life insurance product to its 1.4M customers. They offer 4 plans - Bronze, Silver, Gold and Platinum. The base price for all these 4 plans is fixed, but the monthly premium each individual pays for the plan varies based on factors like age, tobacco usage, pre-conditions, city, state etc. The dataset contains the details of 1.4M customers, their demographic information and other details that affect their monthly premiums. The dataset also gives the information regarding which plans were purchased. Note: The customer can purchase only one plan.

Challenge: Create data visualizations, use machine learning to predict the purchased plans or even the pricing of premiums, make an interface for customers to give their details and compare the plans. More details: https://v3v10.vitechinc.com/yhack

Description of our solution

Quoting the Price (Machine Learning):

A wide neural network was used to model the quoted price for each plan. Unfortunately, there was little time or processing power to experiment with the network's hyperparameters; however, for price prediction we were able to achieve a low mean squared error (between 10 and 12 for each plan), which corresponds to a root-mean-square error of roughly $3.20 to $3.50 in the monthly plan price. Four separate regression networks, one for each plan, were trained on 5,500 randomly selected entries from the customer database. As there were too many ICD codes to process individually, we condensed the codes into their hierarchical buckets, counted the number of "high", "medium", and "low" risk conditions per bucket, and fed those counts into the network. The parameters and hyperparameters of each network were saved to two files and loaded as needed. The networks could be improved with more training time and data.
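A minimal sketch of the ICD condensation step follows. It is an illustration under assumptions, not the team's code: each condition is assumed to arrive as an (ICD code, risk level) pair, and the "hierarchical bucket" is approximated by the leading characters of an ICD-10 code (by default just the first letter). The names `bucket_of` and `condition_features` are hypothetical.

```python
from collections import Counter

RISK_LEVELS = ("high", "medium", "low")

def bucket_of(icd_code, prefix_len=1):
    """Map an ICD-10 code such as 'E11.9' to a coarser bucket: its first `prefix_len` characters."""
    return icd_code.strip().upper()[:prefix_len]

def condition_features(conditions, buckets, prefix_len=1):
    """Build a fixed-length vector with one count per (bucket, risk level) pair.

    conditions: iterable of (icd_code, risk) tuples with risk in RISK_LEVELS.
    buckets: ordered list of bucket names that fixes the feature layout;
             conditions whose bucket is not listed are ignored.
    """
    counts = Counter((bucket_of(code, prefix_len), risk) for code, risk in conditions)
    return [counts[(b, r)] for b in buckets for r in RISK_LEVELS]

# Usage: three buckets x three risk levels -> nine input features per customer.
buckets = ["E", "I", "J"]
customer = [("E11.9", "high"), ("I10", "medium"), ("E78.5", "medium")]
print(condition_features(customer, buckets))  # [1, 1, 0, 0, 1, 0, 0, 0, 0]
```

Because the bucket list fixes the layout, every customer yields a vector of the same length regardless of how many conditions they have, which is what a fixed-width network input requires.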

Code:

  • bronze_quote_model.py: model for the network used to quote prices for the bronze plan (Mean Square Error ~ 10)
  • silver_quote_model.py: model for the network used to quote prices for the silver plan (Mean Square Error ~ 11)
  • gold_quote_model.py: model for the network used to quote prices for the gold plan (Mean Square Error ~ 12)
  • plat_quote_model.py: model for the network used to quote prices for the platinum plan (Mean Square Error ~ 12)
  • fitData.py: used by the server to compile the saved network data and make predictions
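The two-files-per-network scheme could look roughly like the sketch below. The file names, the JSON/`.npz` split, and the one-hidden-layer ReLU forward pass are assumptions for illustration (the project itself used Keras, whose own save/load calls differ); `save_network`, `load_network`, and `predict_price` are hypothetical helpers, not functions from fitData.py.

```python
import json
import os
import tempfile
from pathlib import Path

import numpy as np

def save_network(prefix, hyperparams, weights):
    """Write <prefix>_hparams.json (hyperparameters) and <prefix>_weights.npz (arrays)."""
    Path(f"{prefix}_hparams.json").write_text(json.dumps(hyperparams))
    np.savez(f"{prefix}_weights.npz", **weights)

def load_network(prefix):
    """Read both files back; arrays are copied into memory before the .npz is closed."""
    hyperparams = json.loads(Path(f"{prefix}_hparams.json").read_text())
    with np.load(f"{prefix}_weights.npz") as data:
        weights = {name: data[name] for name in data.files}
    return hyperparams, weights

def predict_price(weights, x):
    """Forward pass of a one-hidden-layer (wide) ReLU regression network."""
    hidden = np.maximum(0.0, x @ weights["W1"] + weights["b1"])
    return hidden @ weights["W2"] + weights["b2"]

# Usage: round-trip a randomly initialized "bronze" network through a temp directory.
rng = np.random.default_rng(0)
weights = {
    "W1": rng.normal(size=(8, 128)), "b1": np.zeros(128),
    "W2": rng.normal(size=(128, 1)), "b2": np.zeros(1),
}
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "bronze")
    save_network(prefix, {"hidden_units": 128, "loss": "mse"}, weights)
    hparams, loaded = load_network(prefix)

x = rng.normal(size=(5, 8))
print(hparams, predict_price(loaded, x).ravel())
```

Keeping hyperparameters in human-readable JSON and weights in a binary archive lets the server rebuild each network on demand without retraining.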

Guessing the Purchased Plan (Machine Learning):

Guessing the plan the customer picks was one of our more ambitious goals. We were able to construct a basic wide neural network for classification (there was no particular reason for choosing width over depth); however, there was not enough time or processing power to tune the hyperparameters. Prediction accuracy on our test data was low, suggesting either that not enough training data was used or that the input parameters have little relation to the plan chosen. Initially the quoted plan prices were included as input parameters; however, we found that they did not noticeably affect the accuracy of the network with the limited training data we had. In the end, this classification network used the same 5,500 randomly selected entries as the regression networks.
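To put the classifier's accuracy in context, the sketch below shows a single wide hidden layer feeding a four-way softmax over the plans, plus two reference points: plain accuracy and a majority-class baseline. With four plans, uniform random guessing would be right about 25% of the time, so a useful classifier should beat both that figure and the majority baseline for the test set. The weight shapes and function names are hypothetical; this is not picker_model.py.

```python
import numpy as np

PLANS = ["Bronze", "Silver", "Gold", "Platinum"]

def softmax(z):
    # Subtract each row's maximum before exponentiating for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def predict_plan_probs(x, W1, b1, W2, b2):
    """One wide ReLU hidden layer followed by a softmax over the four plans."""
    hidden = np.maximum(0.0, x @ W1 + b1)
    return softmax(hidden @ W2 + b2)

def accuracy(probs, labels):
    """Fraction of rows whose most probable plan matches the true plan index."""
    return float(np.mean(probs.argmax(axis=1) == labels))

def majority_baseline(labels, n_classes=len(PLANS)):
    """Accuracy obtained by always guessing the most common plan in `labels`."""
    return float(np.bincount(labels, minlength=n_classes).max() / len(labels))

# Usage with random (untrained) weights: 10 features, 256 hidden units.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(10, 256)), np.zeros(256)
W2, b2 = rng.normal(size=(256, len(PLANS))), np.zeros(len(PLANS))
x_test = rng.normal(size=(100, 10))
y_test = rng.integers(0, len(PLANS), size=100)
probs = predict_plan_probs(x_test, W1, b1, W2, b2)
print(accuracy(probs, y_test), majority_baseline(y_test))
```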

Code:

  • picker_model.py: model for the network used to predict which plan a customer purchased (Accuracy ~ 35%)
  • fitData.py: used by the server to compile the saved network data and make predictions

Dependencies:

  • Main environment: the Anaconda 5.0.1 Python distribution.
  • simplejson for encoding and decoding JSON requests.
  • Python's built-in http.server module handles responses to requests.
  • The server is hosted on DigitalOcean's Cloud Ubuntu servers.
  • The neural networks use scipy, numpy, pandas, scikit-learn, and the Keras package with TensorFlow as its backend.
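A minimal sketch of how a JSON quote endpoint can be served with Python's built-in `http.server` is shown below. It uses the standard-library `json` module (simplejson exposes the same `loads`/`dumps` calls, so `import simplejson as json` is a drop-in swap), and `predict_quotes` is a hypothetical placeholder for the code that would run the four loaded networks; the real server's routes and field names are not documented here.

```python
import json  # simplejson can be swapped in: `import simplejson as json`
from http.server import BaseHTTPRequestHandler, HTTPServer

PLANS = ("bronze", "silver", "gold", "platinum")

def predict_quotes(features):
    """Hypothetical placeholder: a real server would feed `features` to the four trained networks."""
    return {plan: 0.0 for plan in PLANS}

class QuoteHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly Content-Length bytes and decode them as JSON.
        length = int(self.headers.get("Content-Length", 0))
        try:
            features = json.loads(self.rfile.read(length) or b"{}")
        except ValueError:  # JSONDecodeError subclasses ValueError in both json and simplejson
            self.send_error(400, "Request body must be valid JSON")
            return
        body = json.dumps(predict_quotes(features)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet; remove to restore per-request logging

def serve(port=8000):
    """Block forever, answering POST requests on all interfaces."""
    HTTPServer(("", port), QuoteHandler).serve_forever()
```

Calling `serve()` on the host starts the listener; a client then POSTs its details as a JSON object and receives one quote per plan in the response body.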

Problems Encountered and Overcome:

Unfortunately, there were many factors limiting our ability to train the model. The primary issue was a lack of internet connectivity, which restricted our work on this problem given its big-data nature. Our initial intention was to host all the parameters we needed in a Google Cloud bucket and use Google Cloud Platform's machine learning services to take advantage of the platform's power and robustness.

Future Work and Improvements:

  • More statistical analysis should be performed to analyze the networks and the correlations among variables.
  • Given additional time, we would like to tune the neural network parameters/hyperparameters as well as explore different classifiers/clustering algorithms to find more emergent features.
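As a starting point for the correlation analysis mentioned above, pairwise Pearson correlations can be computed with NumPy and filtered by magnitude. This is a generic sketch, not an analysis the team ran; the threshold, the function name `strongly_correlated_pairs`, and the synthetic columns in the usage lines are illustrative assumptions.

```python
import numpy as np

def strongly_correlated_pairs(data, names, threshold=0.5):
    """Return (name_a, name_b, r) for each column pair with |Pearson r| >= threshold, strongest first."""
    corr = np.corrcoef(data, rowvar=False)  # rowvar=False: each column is one variable
    pairs = [
        (names[i], names[j], float(corr[i, j]))
        for i in range(len(names))
        for j in range(i + 1, len(names))
        if abs(corr[i, j]) >= threshold
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)

# Synthetic example: "b" is an exact linear function of "a"; "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
data = np.column_stack([a, 2 * a + 1, rng.normal(size=200)])
print(strongly_correlated_pairs(data, ["a", "b", "c"], threshold=0.9))
```

Pearson correlation only captures linear relationships, so a low value between a feature and the chosen plan does not rule out a nonlinear dependence.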

Contributors: cschank, danielcefaratti, mattdcr


