CAFM

Code for the Context Aware Factorization Machine (CAFM), as described in the ACM MM'17 paper "How Personality Affects our Likes: Towards a Better Understanding of Actionable Images".

CAFM can run with any of:

  • rating information and context features (as in this paper) - sparse
  • user features (e.g. demographics, personality traits) - dense
  • item features (e.g. sentiment, concepts) - dense

Because of size limits, the input data is not included on this page. See the instructions below for how to download the data and run the code.

REQUIREMENTS

Required Python packages: tensorflow, sklearn, numpy, h5py

Note: computation on CPU can be slow.

DOWNLOAD DATA

Download the input data here. Unpack the data/ folder into the project folder (at the same level as CAFM.py). The data/ folder can be used either to replicate the experiments in the paper or as a reference for the input format when preparing new inputs.

RUNNING THE CODE

After installing the required Python packages and downloading the input data, run the code with:

python CAFM.py (ratings&context only)

or

python CAFM.py --user_ft 1 (ratings&context and user personality traits)

Additional parameters (batch size, learning rate, etc.) can be listed by running:

python CAFM.py --help

INCLUDING ITEM FEATURES

Due to file size, the dense image features (distributions over visual sentiment concepts) are not included here. To run the code with image sentiment features, create the following two files and place them in the project folder:

data/training/item_dense.h5
data/testing/item_dense.h5

Each of these needs to be an HDF5 file with a single dataset named "output". This dataset is a two-dimensional floating-point array of size (num_instances, feature_dimensionality). For sentiment features, feature_dimensionality is 4342 if the English Visual Sentiment Ontology is used. The order of the instances must match the order in image_index.txt.
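The file layout described above can be sketched as follows. This is only an illustration, not code from the repository: the helper names, the float32 dtype, and the assumption that image_index.txt holds one image identifier per line are all hypothetical, so check them against the files in data/ before relying on it.

```python
def read_image_index(path):
    """Read image identifiers, ASSUMING one identifier per line."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def rows_in_index_order(image_ids, features_by_id):
    """Return feature rows ordered exactly like image_ids.

    features_by_id is a hypothetical dict: image id -> list of floats.
    """
    missing = [i for i in image_ids if i not in features_by_id]
    if missing:
        raise KeyError("no features for %d image(s), e.g. %s"
                       % (len(missing), missing[0]))
    return [features_by_id[i] for i in image_ids]


def write_item_dense(path, rows):
    """Write rows to an HDF5 file with a single dataset named "output"."""
    # Imported here so the ordering helpers above work without these packages.
    import numpy as np
    import h5py

    data = np.asarray(rows, dtype="float32")  # dtype is an assumption
    if data.ndim != 2:
        raise ValueError("expected shape (num_instances, feature_dimensionality)")
    with h5py.File(path, "w") as f:
        f.create_dataset("output", data=data)
```

A call such as write_item_dense("data/training/item_dense.h5", rows_in_index_order(read_image_index("data/training/image_index.txt"), features_by_id)) would then produce the training file, with features_by_id filled by your own feature extractor.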

To replicate the experiments in the paper, crawl the raw image files with the Twitter API (the lists are provided in data/training/image_index.txt and data/testing/image_index.txt), extract the sentiment features using the English concept detector (the Caffe model can be downloaded here: mvso.cs.columbia.edu), and create an HDF5 file as explained above.

Some image tweets may no longer be available for crawling (e.g. the user was deleted or the images were removed). In that case, the missing tweets must also be removed from the ratings&context files. To do so, follow the instructions below.

HOW TO UPDATE CONTEXT DATASET

If some image tweets are missing, you need to update the ratings&context dataset. The dataset is stored as a sparse representation in libFM format. If image tweet TWEET_ID is missing, search for every line containing TWEET_ID in data/context/index_tr.csv and data/context/index_ts.csv. Each of these lines corresponds to a rating (either a positive or a negative sample) that a user gave to the missing image tweet. Such lines must be removed from both the .csv file and the corresponding .libfm file in the same folder. If they are not removed, CAFM.py --item_ft 1 will look up item features for the missing images and crash because it cannot find them.
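The clean-up described above could be scripted roughly as below. This is a sketch under two assumptions that are not confirmed by this page: that line i of the index .csv describes the same rating as line i of the .libfm file, and that TWEET_ID appears as a whole comma-separated field in the .csv. The function name and the separate output paths are hypothetical; writing to new files keeps the originals intact so the result can be inspected first.

```python
def drop_missing_tweets(index_csv, libfm_file, missing_ids, out_csv, out_libfm):
    """Drop ratings on missing tweets from an index .csv and its .libfm file.

    ASSUMES both files are aligned line by line. Returns the number of
    dropped ratings.
    """
    missing = {str(t) for t in missing_ids}

    with open(index_csv) as f:
        index_lines = f.read().splitlines()
    with open(libfm_file) as f:
        libfm_lines = f.read().splitlines()

    if len(index_lines) != len(libfm_lines):
        raise ValueError("files are not aligned: %d vs %d lines"
                         % (len(index_lines), len(libfm_lines)))

    kept = []
    for index_line, libfm_line in zip(index_lines, libfm_lines):
        # Match whole fields so that one ID is never matched inside another.
        fields = {field.strip() for field in index_line.split(",")}
        if missing.isdisjoint(fields):
            kept.append((index_line, libfm_line))

    with open(out_csv, "w") as f:
        f.writelines(index_line + "\n" for index_line, _ in kept)
    with open(out_libfm, "w") as f:
        f.writelines(libfm_line + "\n" for _, libfm_line in kept)

    return len(index_lines) - len(kept)
```

Run it once for the training pair and once for the testing pair, then replace the original files with the filtered ones after checking that the line counts still agree.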

Finally, run the code with:

python CAFM.py --item_ft 1
