LEAM

This repository contains the source code needed to reproduce the results presented in the paper Joint Embedding of Words and Labels for Text Classification (ACL 2018):

@inproceedings{wang_id_2018_ACL,
  title={Joint Embedding of Words and Labels for Text Classification},
  author={Guoyin Wang and Chunyuan Li and Wenlin Wang and Yizhe Zhang and Dinghan Shen and Xinyuan Zhang and Ricardo Henao and Lawrence Carin},
  booktitle={ACL},
  year={2018}
}

Comparison of the proposed LEAM with traditional methods for text sequence representation

Traditional method: directly aggregates the word embeddings V into the text sequence representation z.
LEAM (Label Embedding Attentive Model): leverages the “compatibility” G between embedded words V and labels C to derive the attention score β for an improved z.
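
The difference between the two approaches can be illustrated with a small NumPy sketch. It is a simplified illustration, not the code in this repository: it assumes cosine similarity as the compatibility measure, takes the maximum over labels for each word, and skips any further processing of G that the full model applies. The function names are hypothetical.

```python
import numpy as np

def mean_pool(V):
    """Traditional baseline: average the word embeddings V (L x P) into z (P,)."""
    return V.mean(axis=0)

def leam_attention_sketch(V, C, eps=1e-8):
    """Simplified label-attentive pooling (an assumption-laden sketch).

    V: (L, P) array, one embedding per word in the sequence.
    C: (K, P) array, one embedding per label.
    Returns (beta, z): attention scores (L,) and the text representation (P,).
    """
    # Compatibility G between every label and every word, here cosine similarity: (K, L).
    V_unit = V / (np.linalg.norm(V, axis=1, keepdims=True) + eps)
    C_unit = C / (np.linalg.norm(C, axis=1, keepdims=True) + eps)
    G = C_unit.dot(V_unit.T)
    # For each word, keep its strongest compatibility with any label: (L,).
    m = G.max(axis=0)
    # A softmax over word positions turns these scores into attention weights beta.
    e = np.exp(m - m.max())
    beta = e / e.sum()
    # z is the attention-weighted average of the word embeddings.
    z = beta.dot(V)
    return beta, z
```

With a single label embedding C = [[1, 0]], a word embedded at [1, 0] receives a larger weight than one embedded at [0, 1], so z is pulled toward the words most compatible with the label rather than treating all words equally.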

Contents

There are four steps to use this codebase to reproduce the results in the paper.

  1. Dependencies
  2. Prepare datasets
  3. Training
    1. Training on standard dataset
    2. Training on your own dataset
  4. Reproduce paper figure results

Dependencies

This code is based on Python 2.7, with the main dependencies being TensorFlow==1.7.0 and Keras==2.1.5. Additional dependencies for running experiments are numpy, scipy, and gensim, plus the standard-library modules cPickle and math.
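
Before training, it can help to confirm which of these packages are importable and which versions are installed. The helper below is a hypothetical convenience, not part of this repository; it uses only the standard-library importlib module.

```python
import importlib

def installed_versions(module_names):
    """Map each module name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Not every module exposes __version__, so fall back to a placeholder.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions
```

Calling installed_versions(["tensorflow", "keras", "numpy", "scipy", "gensim"]) and comparing the reported versions against the ones listed above is one way to spot a mismatched environment early.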

Prepare datasets

We consider the following datasets: Yahoo, AGnews, DBPedia, yelp, yelp binary. For convenience, we provide pre-processed versions of all datasets. Data are prepared in pickle format. Each .p file has the same fields in the same order: train text, val text, test text, train label, val label, test label, dictionary, and reverse dictionary.
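
A minimal sketch of reading one of these files is shown below. It assumes each .p file holds a single pickled sequence of eight objects in the order listed above. The helper name and the field keys are hypothetical and chosen only for readability.

```python
try:
    import cPickle as pickle  # Python 2.7, as used by this codebase
except ImportError:
    import pickle  # fallback so the sketch also runs on Python 3

# Field order as described in this README; the key names themselves are illustrative.
FIELD_NAMES = [
    "train_text", "val_text", "test_text",
    "train_label", "val_label", "test_label",
    "dictionary", "reverse_dictionary",
]

def load_dataset(path):
    """Load a pre-processed .p file and return its fields as a dict."""
    with open(path, "rb") as f:
        fields = pickle.load(f)
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, found %d" % (len(FIELD_NAMES), len(fields)))
    return dict(zip(FIELD_NAMES, fields))
```

Returning a dict rather than a bare list makes it harder to mix up, for example, the validation and test splits when passing the data on to training code.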

Datasets can be downloaded here. Put the downloaded data in the data directory. Each dataset has two files: the tokenized data and the corresponding pretrained GloVe embeddings.

To use your own dataset, follow the code in preprocess_yahoo.py to tokenize the text and split it into train/dev/test sets. To build pretrained word embeddings, first download the GloVe word embeddings and then follow glove_generate.py.
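
As a rough illustration of the embedding step, the sketch below builds an embedding matrix for a vocabulary from GloVe's plain-text format, where each line holds a word followed by its space-separated vector values. It is not the contents of glove_generate.py. The function name, the random initialisation for words missing from GloVe, and the assumption that the dictionary maps words to indices 0..N-1 are all illustrative choices.

```python
import numpy as np

def build_embedding_matrix(glove_lines, word_to_index, dim, seed=0):
    """Return (matrix, found): a (vocab_size, dim) float32 array and the hit count.

    glove_lines: iterable of text lines in GloVe format ("word v1 v2 ... v_dim").
    word_to_index: dict mapping each vocabulary word to a row index in 0..N-1.
    """
    rng = np.random.RandomState(seed)
    # Words that never appear in GloVe keep a small random vector.
    matrix = rng.uniform(-0.01, 0.01, size=(len(word_to_index), dim)).astype("float32")
    found = 0
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        index = word_to_index.get(word)
        # Skip words outside the vocabulary and lines with an unexpected width.
        if index is None or len(values) != dim:
            continue
        matrix[index] = [float(v) for v in values]
        found += 1
    return matrix, found
```

Because the function accepts any iterable of lines, an open file handle for the downloaded GloVe text file can be passed directly, so the full file never has to be held in memory.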

Training

1. Training on standard dataset

To run the test, use the command python -u main.py. The default test is on the Yahoo dataset. To run one of the other provided datasets, change the dataset attribute of the Option class to the corresponding dataset name. Most of the parameters are defined in the Option class.

Reproduce paper figure results

The Jupyter notebooks in the plots folder are used to reproduce the figures in the paper.

Note that we have copied our extracted results into the notebooks, so without modification they will output the figures from the paper. If you've run your own training and wish to plot the results, you'll have to organize them in the same format.
