
DrivenData-PumpItUp

This repository contains R code for the Pump it Up: Data Mining the Water Table competition on Driven Data.

The data is provided by Taarifa and the Tanzanian Ministry of Water. The goal is to predict whether a water pump is functional, functional but in need of repair, or non-functional.
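
DrivenData scores this competition by classification rate: the share of test rows whose predicted status matches the true status. A minimal Python sketch of that metric follows (the repository itself is written in R). The three label strings are assumed to match the `status_group` values in the training labels file.

```python
# Assumed status_group labels for the three classes.
STATUS_GROUPS = ("functional", "functional needs repair", "non functional")

def classification_rate(y_true, y_pred):
    """Return the fraction of positions where y_pred equals y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one prediction")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

if __name__ == "__main__":
    truth = ["functional", "non functional", "functional needs repair", "functional"]
    preds = ["functional", "non functional", "functional", "functional"]
    print(classification_rate(truth, preds))  # 0.75
```

Because the metric only checks exact matches, a model gets no partial credit for confusing "functional needs repair" with "functional".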

I use H2O's random forest and get a score of 0.821. The repository includes my best (current) submission but not the data. Sign up at Driven Data to download the following files:

  • SubmissionFormat.csv
  • Test set values.csv
  • Training set labels.csv
  • Training set values.csv
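
The training features and labels come in separate files, so they must be joined before modelling. The Python sketch below (the repository does this in R) assumes both files share an `id` column and that the label column is named `status_group`; the inline CSV text is toy data for illustration only.

```python
import csv
import io

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def join_labels(values, labels, key="id"):
    """Attach each row's status_group from `labels`, matching on `key`."""
    status_by_id = {row[key]: row["status_group"] for row in labels}
    joined = []
    for row in values:
        merged = dict(row)
        # Raises KeyError if a training row has no matching label.
        merged["status_group"] = status_by_id[row[key]]
        joined.append(merged)
    return joined

if __name__ == "__main__":
    # Toy data for illustration only; the real files are much larger.
    values_csv = "id,construction_year\n1,1999\n2,0\n"
    labels_csv = "id,status_group\n2,non functional\n1,functional\n"
    for row in join_labels(read_rows(values_csv), read_rows(labels_csv)):
        print(row)
```

With the real files, pass `open("Training set values.csv", newline="")` to `csv.DictReader` instead of the in-memory text. Matching on `id` rather than row order keeps the join correct even if the two files are sorted differently.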

Read the data and do some preprocessing

The first step is to read the data and set some values to missing (NA in R); see read-data.md.
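
Some columns encode "unknown" with placeholder values such as 0 rather than leaving the field empty. The Python sketch below shows the general idea of mapping such sentinels to a missing value (`None`, playing the role of R's NA). The column names and sentinel values in `MISSING_SENTINELS` are hypothetical examples, not the rules used in read-data.md.

```python
# Hypothetical sentinel rules: column name -> raw strings that mean "missing".
# These are illustrative assumptions, not the rules from read-data.md.
MISSING_SENTINELS = {
    "construction_year": {"0"},
    "longitude": {"0", "0.0"},
    "population": {"0"},
    "funder": {"", "0"},
}

def set_missing(row, sentinels=MISSING_SENTINELS):
    """Return a copy of `row` with sentinel values replaced by None."""
    cleaned = dict(row)
    for column, bad_values in sentinels.items():
        value = cleaned.get(column)
        # Leave absent or already-missing fields alone.
        if value is not None and value.strip() in bad_values:
            cleaned[column] = None
    return cleaned
```

Returning a copy leaves the raw row untouched, which makes it easy to compare the data before and after cleaning.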

Engineer features

The next step is to clean up the features (transform some, remove others) and possibly engineer new ones; see transform-data.md.
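
As an illustration of this kind of step, the Python sketch below derives a pump age from the recording date and the construction year, and drops a few columns. It assumes columns named `date_recorded` (ISO dates) and `construction_year`, with missing years already set to `None`; the `age` feature and the `DROP_COLUMNS` list are hypothetical choices, not necessarily what transform-data.md does.

```python
from datetime import date

# Hypothetical columns to drop (e.g. constant or near-duplicate ones).
DROP_COLUMNS = ("recorded_by", "quantity_group")

def engineer(row):
    """Return a new row with an `age` feature and some columns removed."""
    out = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
    recorded = date.fromisoformat(row["date_recorded"])
    built = row.get("construction_year")
    # Age is only defined when the construction year is known.
    out["age"] = recorded.year - int(built) if built is not None else None
    return out
```

Leaving `age` missing when the construction year is unknown, rather than filling in a default, avoids inventing pumps that appear about 2,000 years old.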

Predict status with a random forest

Use a random forest to predict the functional status of the pumps in the test set; see predict-data.md.
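
Whatever model produces the predictions, they must be submitted in the layout of SubmissionFormat.csv. The Python sketch below assumes that layout is two columns, `id` and `status_group`, and checks each label against the three expected classes. It does not train a random forest (the repository uses H2O from R for that); it only shows how predictions could be written out.

```python
import csv
import io

# Assumed set of valid status_group labels.
VALID_STATUS = {"functional", "functional needs repair", "non functional"}

def submission_csv(ids, predictions):
    """Render predictions as CSV text with the columns id,status_group."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "status_group"])
    for pump_id, status in zip(ids, predictions):
        # Catch typos or unexpected classes before uploading.
        if status not in VALID_STATUS:
            raise ValueError(f"unexpected label: {status!r}")
        writer.writerow([pump_id, status])
    return buf.getvalue()
```

Validating labels before writing catches an easy mistake, such as a model that emits numeric class codes instead of the label strings.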

Update

Added a Makefile that spins the R scripts to produce the md files. See "Build a report based on an R script" for how spinning works.


Contributors

dipetkov
