ICE: Item Concept Embedding via Textual Information

1. Introduction

The ICE toolkit embeds the concepts of items into a shared embedding space so that the resulting embeddings can be compared in terms of overall conceptual similarity regardless of item type (ICE: Item Concept Embedding via Textual Information, SIGIR 2017). For example, a song can be used to retrieve conceptually similar songs (homogeneous retrieval) as well as conceptually similar concepts (heterogeneous retrieval).

Specifically, ICE links items and their representative concepts (words extracted from each item's textual information) in a heterogeneous network and then learns embeddings for both items and concepts based on the concept words they share. Since items are defined in terms of concepts, adding expanded concepts to the network allows the learned embeddings to retrieve results that are conceptually more diverse yet still relevant.
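
To make the retrieval idea concrete, the following is a minimal sketch of ranking items and concept words together in one embedding space. It assumes cosine similarity as the comparison measure, which this README does not specify, and the vectors are hypothetical toy values rather than output of the toolkit.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(query, embeddings, topk=3):
    """Rank every other entry (item or word) by similarity to the query."""
    scores = [
        (name, cosine(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topk]


# Hypothetical toy vectors; items and concept words live in the same space.
embeddings = {
    "Toy_Story": [0.9, 0.1],
    "toys": [0.8, 0.2],
    "Star_Wars": [0.1, 0.9],
    "jedi": [0.2, 0.8],
}

if __name__ == "__main__":
    for name, score in most_similar("Toy_Story", embeddings):
        print(name, round(score, 3))
```

Because items and words share one space, the same function serves both homogeneous queries (item to item) and heterogeneous queries (item to word, or word to item).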

1.1. System Requirements

  • gcc 6.4
  • python3
  • cython

1.2. Getting Started

Download:

$ git clone https://github.com/cnclabs/ICE
$ cd ./ICE/ICE

Compile command line interface:

$ make ice

Compile python3 API:

Alternatively, the toolkit can be used through its Python API. See Section 2.2.2 for usage.

$ make python

[Note: The API is only tested with Python 3.]

2. Usage

2.1. ICE Network Construction

Users need to provide an entity-text network and a text-text network to construct an ICE network. For more details, please refer to our paper.

Entity-text network format: "item word weight"

Toy_Story toys 1
Toy_Story stuffed_animals 1
Star_Wars jedi 1
Star_Wars rebel 1

Text-text network format: "word word weight"

toys toys 1
toys stuffed_animals 1
stuffed_animals toys 1
stuffed_animals stuffed_animals 1
jedi jedi 1
rebel rebel 1

Run:

$ python3 construct_graph.py -et ../data/movie_et.edge -tt ../data/movie_tt.edge -ice movie_ice.edge

Parameters:

    -et <string>, --et_network <string>
        Input Entity-text Network
    -tt <string>, --tt_network <string>
        Input Text-text Network
    -ice <string>, --ice_network <string>
        Output ICE Network

For sample files, please see data/movie_et.edge and data/movie_tt.edge.
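
If your data is not yet in edge-list form, the two input files can be generated with a few lines of Python. The sketch below is an illustration only: the helper names (`edge_lines`, `write_edges`) and the output file names are hypothetical, and it assumes space-separated fields with one edge per line, as in the samples above.

```python
def edge_lines(adjacency, weight=1):
    """Turn {source: [target, ...]} into 'source target weight' lines."""
    return [
        f"{source} {target} {weight}"
        for source, targets in adjacency.items()
        for target in targets
    ]


def write_edges(path, lines):
    """Write one edge per line to a plain-text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")


# Entity-text edges: each item points to its representative words.
item_words = {
    "Toy_Story": ["toys", "stuffed_animals"],
    "Star_Wars": ["jedi", "rebel"],
}

# Text-text edges: each word points to itself and to its expanded words.
word_words = {
    "toys": ["toys", "stuffed_animals"],
    "stuffed_animals": ["toys", "stuffed_animals"],
    "jedi": ["jedi"],
    "rebel": ["rebel"],
}

if __name__ == "__main__":
    # Hypothetical output paths; pass them to construct_graph.py via -et and -tt.
    write_edges("toy_et.edge", edge_lines(item_words))
    write_edges("toy_tt.edge", edge_lines(word_words))
```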

2.2. ICE Embedding Learning

2.2.1 Command line interface usage

Run:

$ ./ice -train movie_ice.edge -save movie.embd -dim 4 -sample 10 -neg 5 -thread 1 -alpha 0.025

Parameters:

    -train <string>
        Path to the network used for embedding learning
    -save <string>
        Path to save the embedding file
    -dim <int>
        Dimension of embedding; default is 64
    -neg <int>
        Number of negative examples; default is 5
    -sample <int>
        Number of training samples, in millions; default is 10
    -thread <int>
        Number of training threads; default is 1
    -alpha <float>
        Initial learning rate; default is 0.025
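
When training several embeddings under the same setting, it can be convenient to assemble the command from a script. The following sketch only builds the argument list from the flags documented above. The wrapper functions `build_ice_command` and `run_ice` are hypothetical helpers and not part of the toolkit, and the sketch assumes the compiled `./ice` binary is in the current directory.

```python
import subprocess


def build_ice_command(train, save, dim=64, neg=5, sample=10, thread=1,
                      alpha=0.025, binary="./ice"):
    """Assemble the ice command line using the documented flags and defaults."""
    return [
        binary,
        "-train", str(train),
        "-save", str(save),
        "-dim", str(dim),
        "-neg", str(neg),
        "-sample", str(sample),
        "-thread", str(thread),
        "-alpha", str(alpha),
    ]


def run_ice(**kwargs):
    """Run the binary and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_ice_command(**kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without the binary.
    print(" ".join(build_ice_command("movie_ice.edge", "movie.embd", dim=4)))
```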

2.2.2 python3 API usage

After compiling, run python3 example.py to execute the following code.

from pyICE import pyICE

ice = pyICE()
network = {
    'MAYDAY': {'Taiwanese': 1, 'rock': 1, 'band': 1},
    'MAYDAY@': {'Taiwanese': 1, 'rock': 1, 'band': 1},
    'Sodagreen': {'Taiwanese': 1, 'indie': 1, 'pop_rock': 1, 'band': 1},
    'SEKAI_NO_OWARI': {'Japanese': 1, 'indie': 1, 'pop_rock': 1, 'band': 1},
    'The_Beatles': {'England': 1, 'rock': 1, 'pop': 1}
}
ice.load_dict(network)
ice.init(dimension=4)
ice.train(sample=11, neg=5, alpha=0.025, workers=1)
ice.save_weights(model_name='example.embd')
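
The `network` argument above is a nested dictionary that maps each item to its words and weights. If your data is already stored in the "item word weight" edge format from Section 2.1, a conversion along the following lines may help. This is a sketch: the helper `edges_to_network` is hypothetical, and whether `load_dict` accepts non-integer weights is not confirmed here.

```python
from collections import defaultdict


def edges_to_network(lines):
    """Convert 'item word weight' lines into {item: {word: weight}}."""
    network = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        item, word, weight = parts
        value = float(weight)
        # Keep whole numbers as int so the result matches the example above.
        network[item][word] = int(value) if value.is_integer() else value
    return dict(network)


if __name__ == "__main__":
    sample = [
        "Toy_Story toys 1",
        "Toy_Story stuffed_animals 1",
        "Star_Wars jedi 1",
        "Star_Wars rebel 1",
    ]
    print(edges_to_network(sample))
```

To read from a file, pass the open file object directly, for example `edges_to_network(open(path, encoding="utf-8"))`.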

3. Experimental Results

Here, we report the average performance based on 10 embeddings trained under the same setting. For more details, please refer to our paper.

  • IMDB word-to-movie retrieval task:
    • Graph construction: 20 representative words per item and 5 expanded words per representative word.
    • Embedding learning: dim=256, sample=200, neg=2
Genre          Horror  Thriller  Western  Action  Short  Sci-Fi  Average
Precision@50   0.322   0.206     0.318    0.449   0.100  0.386   0.297
Precision@100  0.316   0.203     0.281    0.423   0.080  0.382   0.281
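
For reference, Precision@k is the fraction of the top k retrieved items that are relevant to the query. The sketch below uses this standard definition. The exact evaluation protocol behind the numbers above is described in the paper and is not reproduced here, and the movie names are placeholders.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


if __name__ == "__main__":
    # Placeholder ranking for a single query word such as "Horror".
    ranked = ["movie_a", "movie_b", "movie_c", "movie_d"]
    relevant = {"movie_a", "movie_c"}
    print(precision_at_k(ranked, relevant, 2))
```

Note that this version divides by k even when fewer than k items are returned, so missing results count as misses.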

4. Citation

@inproceedings{Wang:2017:IIC:3077136.3080807,
    author = {Wang, Chuan-Ju and Wang, Ting-Hsiang and Yang, Hsiu-Wei and Chang, Bo-Sin and Tsai, Ming-Feng},
    title = {ICE: Item Concept Embedding via Textual Information},
    booktitle = {Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval},
    series = {SIGIR '17},
    year = {2017},
    isbn = {978-1-4503-5022-8},
    location = {Shinjuku, Tokyo, Japan},
    pages = {85--94},
    numpages = {10},
    url = {http://doi.acm.org/10.1145/3077136.3080807},
    doi = {10.1145/3077136.3080807},
    acmid = {3080807},
    publisher = {ACM},
    address = {New York, NY, USA},
    keywords = {concept embedding, conceptual retrieval, information network, textual information},
} 
