This is a simple solution to KDD Cup 2012 Track 1. It implements a Latent Factor Model trained with Stochastic Gradient Descent; most of the ideas come from Sections 2.2 and 3.1 of the paper Context-aware Ensemble of Multifaceted Factorization Models for Recommendation Prediction in Social Networks.
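For readers new to the technique, here is a minimal sketch of a latent factor model trained with SGD. It is not the code from this repository: the function names, hyperparameters, and the absence of bias terms are all assumptions made for illustration, and the model in the paper has more components.

```python
import math
import random


def train_lfm(samples, n_factors=8, lr=0.05, reg=0.01, epochs=20, seed=0):
    """Fit a plain latent factor model with SGD (illustrative sketch).

    samples: list of (user, item, rating) tuples, e.g. rating in {-1, +1}.
    Returns (P, Q): dicts mapping user/item ids to factor vectors.
    All parameter names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_factors)
    P, Q = {}, {}
    # Small random initialisation for every user and item seen in training.
    for u, i, _ in samples:
        if u not in P:
            P[u] = [rng.uniform(-scale, scale) for _ in range(n_factors)]
        if i not in Q:
            Q[i] = [rng.uniform(-scale, scale) for _ in range(n_factors)]

    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)
        for u, i, r in samples:
            pu, qi = P[u], Q[i]
            # Prediction error e[u][i] for this sample.
            err = r - sum(a * b for a, b in zip(pu, qi))
            # Gradient step on the L2-regularised squared error.
            for f in range(n_factors):
                puf, qif = pu[f], qi[f]
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item factors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def mean_abs_error(P, Q, samples):
    """Mean of |e[u][i]| over the given samples."""
    return sum(abs(r - predict(P, Q, u, i)) for u, i, r in samples) / len(samples)
```

Each update moves the user and item vectors along the error gradient while the `reg` term shrinks them, which is the standard regularised form of this update.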

Run

To save time, I strongly recommend installing PyPy, which was roughly three times faster than CPython in my tests.

Edit config.py to tell the program where to find the datasets.
./run.sh
Press Ctrl-C in the terminal whenever you want to end the training loop.
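The running log suggests that Ctrl-C lets the current training round finish before the program moves on to prediction. One way to get that behaviour is a SIGINT handler that only sets a flag. This is a sketch under that assumption, not the repository's code; `StopFlag` and `train_until_stopped` are hypothetical names.

```python
import signal


class StopFlag(object):
    """Signal handler that records a stop request instead of raising."""

    def __init__(self):
        self.requested = False

    def __call__(self, signum, frame):
        self.requested = True
        print("Exit program after finish current work!")


def train_until_stopped(train_one_round, stop):
    """Call train_one_round(n) repeatedly until a stop has been requested.

    The flag is only checked between rounds, so a round that is already
    running always completes. Returns the number of rounds run.
    """
    rounds = 0
    while not stop.requested:
        train_one_round(rounds)
        rounds += 1
    return rounds


# Typical wiring (run from the main thread):
#   stop = StopFlag()
#   signal.signal(signal.SIGINT, stop)
#   train_until_stopped(my_round_function, stop)
```

Checking the flag between rounds, rather than catching KeyboardInterrupt inside the loop, avoids leaving the model half-updated.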

Dataset

There are four datasets needed for running:

I have made a few small changes to the original datasets:

remove the header from each file
replace the separator \t (tab) with , (comma)
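If you start from files that still have a header and tab separators, the two changes above can be reproduced with a short script. This is only a sketch: the `convert` helper and the file names in the usage comment are assumptions, not part of this repository.

```python
import csv


def convert(src, dst):
    """Copy rows from a tab-separated stream to a comma-separated one,
    skipping the first (header) line.

    src and dst are open text file objects (hypothetical helper).
    """
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(dst, lineterminator="\n")
    next(reader, None)  # drop the header row, if the file is not empty
    for row in reader:
        writer.writerow(row)


# Example usage (file names are assumptions):
#   with open("rec_log_train.txt") as src, open("rec_log_train.csv", "w") as dst:
#       convert(src, dst)
```

Note that `csv.writer` quotes any field that itself contains a comma, so check the output if your data has such fields.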

If you download the datasets from the links above, you will find some .lrz files, which you need to decompress with lrzip.

# install `lrzip`
apt-get install lrzip
# on OS X, install `lrzip` with Homebrew instead:
# brew install lrzip

lrzip -d *.lrz

Running log

Getting summary of training dataset...
======================== Summary of 'rec_log_train.csv' ========================
Users: 1392873  Items: 4710     Users/Items: 295.73
+1: 5253828     -1: 67955449    +1/-1: 0.08
Begin time: 1318348785  End time:1321027199     Interval: 2678414s = 744.00 h = 31.00 d
================== Distribution of user active time (in hour) ==================
 00: |
 01: |
 07: |
 08: ||
 09: |||
 10: |||
 11: |||
 12: |||
 13: |||
 14: |||
 15: |||
 16: |||
 17: |||
 18: |||
 19: |||
 20: |||
 21: |||
 22: |||
 23: ||
Getting summary of user profile...
============================= Distribution of age ==============================
  0: |
  1: |
  2: |
  3: |
 12: |
 13: |
 14: ||
 15: ||
 16: ||
 17: ||
 18: ||
 19: ||
 20: |||
 21: |||
 22: ||||
 23: |||
 24: |||
 25: |||
 26: ||
 27: ||
 28: |
 29: |
 30: |
 31: |
 32: |
 33: |
============================ Distribution of gender ============================
  0: |
  1: |||||||||||||||||||||||||
  2: ||||||||||||||||||||||||
============================ Distribution of tweet =============================
  0: ||||
  1: |
  2: |
  3: |
  4: |
  5: |
  6: |
  7: |
  8: |
  9: |
 10: |
 11: |
 12: |
 13: |
 14: |
============================= Distribution of tags =============================
  1: ||||||||||||||||||||||||||||||||||||
  2: |
  3: |
  4: |
  5: |
  6: |
  7: |
  8: |
  9: ||
 10: ||||
Preprocessing...
Training...
init LFM...             26.158s
408th trainning used 21.1ss     |e[u][i]| = 0.251114^C
Exit program after finish current work!
409th trainning used 22.6s      |e[u][i]| = 0.251115
predict and write result...             500.564s
Converting predicted result to submission format...
convert predict result to dict...               155.102s
convert to submission format...                 50.497s
Computing mAP@3...
 Public rank: 412       mAP@3: 0.31774
Private rank: 422       mAP@3: 0.30857
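The last step of the log reports mAP@3. As a rough guide to what that number measures, here is a sketch of mean average precision at k. The normalisation by `min(len(actual), k)` is a common convention and an assumption here; the competition's official evaluation may normalise differently, and `apk` and `mapk` are illustrative names rather than functions from this repository.

```python
def apk(actual, predicted, k=3):
    """Average precision at k for one user.

    actual: items the user really accepted.
    predicted: ranked list of recommended items, best first.
    """
    if not actual:
        return 0.0
    actual = set(actual)
    seen = set()
    hits = 0
    score = 0.0
    for rank, item in enumerate(predicted[:k], start=1):
        # Count each relevant item once, at the rank where it first appears.
        if item in actual and item not in seen:
            hits += 1
            score += hits / float(rank)
        seen.add(item)
    return score / min(len(actual), k)


def mapk(actual_by_user, predicted_by_user, k=3):
    """Mean of apk over all users in actual_by_user."""
    users = list(actual_by_user)
    if not users:
        return 0.0
    total = sum(apk(actual_by_user[u], predicted_by_user.get(u, []), k)
                for u in users)
    return total / len(users)
```

For example, a user who accepted items `a` and `c` and was recommended `a, b, c` in that order scores (1/1 + 2/3) / 2 under this definition.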

timor1988 / kdd_2012_track1
