imloo / kdd2013authorpaperidentification

This project forked from benhamner/kdd2013authorpaperidentification

Benchmark and sample code for the Author Paper Identification Challenge on Kaggle, a part of the 2013 KDD Cup

License: Other

KDD Cup 2013 - Author Paper Identification Challenge

This repo contains a benchmark and sample code in Python for the Author Paper Identification Challenge, a machine learning challenge hosted by Kaggle and organized by Microsoft Research in conjunction with the 2013 KDD Cup Committee and Kaggle.

The Transform directory contains the transformation code used to create the competition data files from the raw data. This code is provided for your information only; competition participants do not need to read or run it.

This version of the repo contains the Basic Coauthor Benchmark. It adds a coauthor-based feature to the Basic Python Benchmark. Future benchmarks may be included here as well and will be marked with git tags.
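This README does not define the coauthor-based feature, so the following is only a minimal sketch of one plausible variant, not the benchmark's actual implementation. It assumes the data can be reduced to (paper id, author id) pairs and counts, for a candidate (author, paper) pair, how many of the author's other papers share at least one coauthor with the candidate paper. The function names `build_indexes` and `coauthor_overlap` are hypothetical.

```python
from collections import defaultdict


def build_indexes(paper_author_rows):
    """Index (paper_id, author_id) rows by paper and by author."""
    authors_by_paper = defaultdict(set)
    papers_by_author = defaultdict(set)
    for paper_id, author_id in paper_author_rows:
        authors_by_paper[paper_id].add(author_id)
        papers_by_author[author_id].add(paper_id)
    return authors_by_paper, papers_by_author


def coauthor_overlap(author_id, paper_id, authors_by_paper, papers_by_author):
    """Count the author's other papers that share a coauthor with paper_id.

    Hypothetical feature: the real benchmark may compute something different.
    """
    coauthors = authors_by_paper.get(paper_id, set()) - {author_id}
    other_papers = papers_by_author.get(author_id, set()) - {paper_id}
    return sum(
        1 for other in other_papers
        if authors_by_paper[other] & coauthors
    )


# Toy data: (paper_id, author_id) pairs, invented for illustration only.
rows = [(1, 10), (1, 11), (2, 10), (2, 11), (3, 10), (3, 12), (4, 10)]
by_paper, by_author = build_indexes(rows)
overlap_paper_1 = coauthor_overlap(10, 1, by_paper, by_author)
overlap_paper_4 = coauthor_overlap(10, 4, by_paper, by_author)
```

In the toy data, author 10 wrote paper 1 with author 11 and also wrote paper 2 with author 11, so paper 1 gets an overlap of 1. Paper 4 has no coauthors, so its overlap is 0. A feature like this could be appended to the feature vector for each (author, paper) pair before training.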

This benchmark is intended as a simple example of reading the data and creating the submission file, not as a state-of-the-art benchmark on this problem.
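As a rough illustration of the "creating the submission file" part, the sketch below ranks each author's papers by a predicted score and writes one row per author. The column names `AuthorId` and `PaperIds`, the space-separated ranking, and the function name `write_submission` are assumptions rather than details taken from this README, so check them against the competition page.

```python
import csv
import io
from collections import defaultdict


def write_submission(scored_pairs, out_file):
    """Write one row per author with paper ids ranked by descending score."""
    papers_by_author = defaultdict(list)
    for author_id, paper_id, score in scored_pairs:
        papers_by_author[author_id].append((score, paper_id))

    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["AuthorId", "PaperIds"])  # assumed header names
    for author_id in sorted(papers_by_author):
        # Highest score first; ties keep their input order (sorted is stable).
        ranked = sorted(papers_by_author[author_id], key=lambda sp: -sp[0])
        writer.writerow([author_id, " ".join(str(p) for _, p in ranked)])


# Toy predictions: (author_id, paper_id, score), invented for illustration.
buffer = io.StringIO()
write_submission([(1, 100, 0.2), (1, 101, 0.9), (2, 200, 0.5)], buffer)
submission_text = buffer.getvalue()
```

Writing to a file-like object keeps the function easy to test. In practice the same function could be passed a file opened at the submission path configured for the benchmark.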

Running this benchmark requires Python 2.7, PostgreSQL 9.2, and the Python packages scikit-learn (sklearn) version 0.13 and psycopg2 version 2.4.6. Other versions may work but have not been tested.

To run the benchmark:

  1. Download data.postgres from the competition page. This file contains the dataset as a PostgreSQL backup. The data are also provided as CSV files, but this benchmark does not use them.

  2. Restore the backup to your local Postgres database. To do this, create a new database named Kdd2013AuthorPaperIdentification and then run the following command (if your downloaded file has a different name than dataRev2.postgres, use that name instead):

    pg_restore -Fc -U postgres -d Kdd2013AuthorPaperIdentification dataRev2.postgres

  3. Switch to the "PythonBenchmark" directory

  4. Modify SETTINGS.json to include the login information to the PostgreSQL database, as well as a place to save the trained model and a place to save the submission

  5. Train the model by running python train.py

  6. Make predictions on the validation set by running python predict.py

  7. Make a submission with the output file
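Steps 2 and 4 can be sketched in Python as follows. This is an illustrative sketch only: the README does not list the keys in SETTINGS.json, so the keys `dbname`, `user`, and `password` and the helper names are hypothetical. The restore command is built as an argument list but not executed here.

```python
import json


def load_settings(path="SETTINGS.json"):
    """Read the benchmark settings file into a dict."""
    with open(path) as f:
        return json.load(f)


def restore_command(backup_path,
                    database="Kdd2013AuthorPaperIdentification",
                    user="postgres"):
    """Build the pg_restore argument list from step 2 (not run here)."""
    return ["pg_restore", "-Fc", "-U", user, "-d", database, backup_path]


def connection_string(settings):
    """Assemble a libpq-style connection string.

    The keys used here are hypothetical; adapt them to the keys that
    actually appear in your SETTINGS.json.
    """
    return "dbname={dbname} user={user} password={password}".format(**settings)


# Example values, invented for illustration.
example_settings = {
    "dbname": "Kdd2013AuthorPaperIdentification",
    "user": "postgres",
    "password": "secret",
}
dsn = connection_string(example_settings)
command = restore_command("dataRev2.postgres")
```

The argument list could be handed to `subprocess.check_call` to run the restore, and the connection string could be passed to `psycopg2.connect`, which accepts a string of this form.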

This benchmark took less than 10 minutes to execute on a Windows 8 laptop with 8GB of RAM and 4 cores at 2.7GHz.

Contributors

benhamner, bolo1729
