cat🐈: the repo for the paper "Embarrassingly Simple Unsupervised Aspect extraction"

License: GNU General Public License v3.0

cat🐈

This is the repository for the ACL 2020 paper Embarrassingly Simple Unsupervised Aspect Extraction. In this work, we extract aspects from restaurant reviews with attention that uses RBF kernels.

Authors

  • Stéphan Tulkens
  • Andreas van Cranenburgh

Requirements

  • numpy
  • gensim (for training embeddings)
  • scikit-learn (imported as sklearn)
  • reach (for reading embeddings and vectorizing sentences)
  • pyconll (for reading conll files)
  • tqdm
  • pandas
  • matplotlib

Install these with pip install -r requirements.txt

Using

If you want to apply cat🐈 to your own data, you need four things.

  1. An aspect set, i.e., the set of labels you would like to predict.
  2. A set of in-domain word embeddings. This is really important, as we show in the paper.
  3. A set of aspect terms which you think correspond to the aspects you want to extract. These do not need to be grouped by their aspect.
  4. A set of instances for which you want to predict the labels you define in step 1. We expect these to be tokenized, one sentence per line.

If you have all of these, open example_pipeline/run.py and replace the paths in that file with the paths to your own files and instances. cat🐈 has two hyperparameters: the gamma of the RBF kernel, and the set of aspect words over which the attention is computed.
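To make the role of the two hyperparameters concrete, here is a minimal sketch of attention computed with an RBF kernel. It is an illustration under stated assumptions, not the code in cat/simple.py: the function name rbf_attention, the default gamma of 1.0, and the toy 2-dimensional vectors are all hypothetical.

```python
import numpy as np


def rbf_attention(word_vectors, aspect_vectors, gamma=1.0):
    """Return one attention weight per word in a sentence.

    word_vectors:   array of shape (n_words, dim)
    aspect_vectors: array of shape (n_aspect_terms, dim)
    gamma:          kernel width (illustrative default, not a recommended value)
    """
    # Squared Euclidean distance between every word and every aspect term.
    diff = word_vectors[:, None, :] - aspect_vectors[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # RBF kernel exp(-gamma * ||w - a||^2), shape (n_words, n_aspect_terms).
    kernel = np.exp(-gamma * sq_dist)
    # Sum each word's kernel values over all aspect terms.
    scores = kernel.sum(axis=1)
    total = scores.sum()
    if total == 0:
        # Every kernel value underflowed to zero: fall back to uniform weights.
        return np.full(len(word_vectors), 1.0 / len(word_vectors))
    # Normalize over the sentence so the weights sum to 1.
    return scores / total


# Toy example: three words, one aspect term (hypothetical vectors).
words = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
aspects = np.array([[1.0, 0.0]])
attention = rbf_attention(words, aspects, gamma=1.0)
```

In this sketch, a larger gamma concentrates the attention on words that lie very close to an aspect term, while a smaller gamma spreads it more evenly over the sentence.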

If you do not have access to pre-trained embeddings or aspect words, but you do have access to in-domain text, you will need a parser to extract either nouns or tree fragments. For maximum portability, we adopt CoNLL-U, a format that many parsers can output. If you use spaCy, you can use the spacyconllu script to convert text to CoNLL-U format.

To obtain the nouns and embeddings for a set of texts in CoNLL-U format, replace the paths in example_pipeline/preprocessing.py with the paths to your CoNLL-U parsed file, then run the script. This trains your embeddings and extracts the aspect words, which you can then use in example_pipeline/run.py.
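As a rough illustration of the noun-extraction step, the sketch below counts nouns in CoNLL-U text using only the standard library. It does not reproduce example_pipeline/preprocessing.py, which relies on pyconll and may select aspect words differently. The function name count_nouns and the sample sentences are hypothetical.

```python
from collections import Counter


def count_nouns(conllu_text, upos_tags=("NOUN",)):
    """Count lowercased word forms whose UPOS tag is in upos_tags."""
    counts = Counter()
    for line in conllu_text.splitlines():
        # Skip blank lines (sentence boundaries) and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        # A CoNLL-U token line has 10 tab-separated columns:
        # ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
        columns = line.split("\t")
        if len(columns) != 10:
            continue
        token_id, form, upos = columns[0], columns[1], columns[3]
        # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
        if not token_id.isdigit():
            continue
        if upos in upos_tags:
            counts[form.lower()] += 1
    return counts


# Two hand-written sentences in CoNLL-U format (hypothetical data).
SAMPLE = (
    "# text = The pizza was great\n"
    "1\tThe\tthe\tDET\tDT\t_\t2\tdet\t_\t_\n"
    "2\tpizza\tpizza\tNOUN\tNN\t_\t4\tnsubj\t_\t_\n"
    "3\twas\tbe\tAUX\tVBD\t_\t4\tcop\t_\t_\n"
    "4\tgreat\tgreat\tADJ\tJJ\t_\t0\troot\t_\t_\n"
    "\n"
    "# text = Pizza and service\n"
    "1\tPizza\tpizza\tNOUN\tNN\t_\t0\troot\t_\t_\n"
    "2\tand\tand\tCCONJ\tCC\t_\t3\tcc\t_\t_\n"
    "3\tservice\tservice\tNOUN\tNN\t_\t1\tconj\t_\t_\n"
)
noun_counts = count_nouns(SAMPLE)
```

Calling noun_counts.most_common(n) then gives the n most frequent nouns, which could serve as a starting list of candidate aspect terms.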

Adapting

If you just want to use or adapt cat🐈 in your own project, check out cat/simple.py. This contains all the relevant code for computing the attention distribution.
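If you adapt the attention code, you still need a step that turns an attention distribution into a label. The sketch below shows one plausible way to do this: take the attention-weighted sum of the word vectors and pick the label whose embedding has the highest cosine similarity to it. This is an assumption for illustration and is not taken from cat/simple.py. The function name predict_label and the toy vectors are hypothetical.

```python
import numpy as np


def predict_label(attention, word_vectors, label_vectors, labels):
    """Pick the label whose vector is most similar to the weighted sentence vector.

    attention:     array of shape (n_words,), weights summing to 1
    word_vectors:  array of shape (n_words, dim)
    label_vectors: array of shape (n_labels, dim), one embedding per label
    labels:        list of n_labels label names
    """
    # Attention-weighted sum of the word vectors: one vector per sentence.
    sentence_vector = attention @ word_vectors
    # Cosine similarity between the sentence vector and each label vector.
    # Assumes no vector has zero norm.
    norms = np.linalg.norm(label_vectors, axis=1) * np.linalg.norm(sentence_vector)
    similarities = (label_vectors @ sentence_vector) / norms
    return labels[int(np.argmax(similarities))]


# Toy example with hypothetical 2-dimensional vectors.
attention = np.array([0.7, 0.3])
word_vectors = np.array([[1.0, 0.0], [0.0, 1.0]])
label_vectors = np.array([[1.0, 0.0], [0.0, 1.0]])
predicted = predict_label(attention, word_vectors, label_vectors, ["food", "service"])
```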

Reproducing

You can reproduce the experiments by obtaining the data, putting it in the data/ folder, and running the experiments from experiments/. In the paper, we use the SemEval 2014, SemEval 2015 and Citysearch datasets, which you need to download first.

If you extract the text from these XML files and put the tokenized training data in data/, you can rerun our experiments.

Citing

If you use the code or the techniques therein, please cite the paper:

@inproceedings{tulkens2020embarrassingly,
    title = "Embarrassingly Simple Unsupervised Aspect Extraction",
    author = "Tulkens, St{\'e}phan  and  van Cranenburgh, Andreas",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.290",
    doi = "10.18653/v1/2020.acl-main.290",
    pages = "3182--3187",
}

License

GPL-V3
