Coder Social home page Coder Social logo

allstate's Introduction

allstate

Predict a purchased policy based on transaction history - a data mining competition that was hosted on Kaggle.com

http://www.kaggle.com/c/allstate-purchase-prediction-challenge

Introduction

Data can be downloaded from http://www.kaggle.com/c/allstate-purchase-prediction-challenge/data

Run prepareData.R with training=T (this will produce the training set), then update currentTrainPath at the beginning of the file and run it again with training=F (to produce the test set). When you have both data sets processed load them together and add auxiliary variables (code is at the end of prepareData.R). Then run doSubmission.R to train models and generate a submission CSV.

Preprocessing

Conversion of time and day into periods. Proper representation of missing values in the car_value variable. Fixed observations in which there was a married couple and yet the group size was 1.

Extracted the purchase qoutes from the data set. The rest was processed as follows:

New features added to each observation:

  • age difference
  • mean age
  • T/F if a person is a senior
  • T/F if clients are suspected to be teen + parents
  • T/F if a person is suspected to be a teen
  • T/F if a person is suspected to be a student
  • T/F if clients are suspected to be a young marriage
  • T/F if the car is relatively new
  • T/F if the car is relatively old
  • T/F if the applicant retained the same C option as she had before
  • average cost
  • popularity of the selected option
  • location popularity
  • a few auxiliary variables created as linear combinations of car age, car value and mean age that were selected on the basis on their correlation with coverage options (it wasn't much, but models definitely found these variables helpful)

Time-varying variables:

  • gradient of the cost between quotes
  • cumulative gradient of the cost
  • absolute cumulative gradient of the cost
  • number of coverage option changes
  • T/F did the client had the coverage option A/B/E/F in any of her previous quotes?
  • coverage option A/B/C/D/E/F/G in previous quote
  • T/F if an observation is the last known quote of the client (this variable was not used in predictions, it is only for convenience)

Models

Data was divided customer-wise into 3 folds. Classificators (GBMs) were trained for each coverage option for each fold (21 models in total). Their task was to assess whether a certain coverage option will differ in the final purchase from the current one. On their predictions on the training data another 21 GBMs were trained on the basis of multinomial distribution - their task was to identify which particular coverage option will be the final purchase.

Therefore, for each coverage option A-G an ensemble of 3 classifiers was assesing whether a customer will change her mind regarding that particular option. If it is decided that she will change that option, then another ensemble of 3 GBMs is deciding which option will she change to.

allstate's People

Contributors

smartgamer avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.