speciteller's Introduction

SPECITELLER is a tool to predict sentence specificity.

The models in this package are obtained using co-training as described in Li and Nenkova, Fast and Accurate Prediction of Sentence Specificity, AAAI 2015.

Dependencies

Speciteller is implemented using Python 2.7. It depends on the following packages:

  • numpy
  • liblinear (in particular, liblinearutil.py; be sure you have a liblinear.so.<x> file in its python/ directory. If not, type make in python/)

Data and resources

Word lexicons for the models are available for download here. Please note that these resources come with license(s). Decompress the tarball in this (i.e., the speciteller) directory.

Running Speciteller

Call:

$ python speciteller.py --inputfile inputfile --outputfile predfile
  • <inputfile> should consist of word-tokenized sentences, one sentence per line;
  • <predfile> is the destination file to which Speciteller writes the specificity scores, one score per line, in the same order as the sentences in <inputfile>.
  • An optional argument is --write_all_preds. When set, it generates two additional files: <predfile>.s (predictions from the shallow model) and <predfile>.w (predictions from the word representation model).
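The input format and the command line above can be prepared from Python. The following is a minimal sketch, not part of Speciteller: `tokenize`, `write_input_file` and `build_command` are hypothetical helper names, and the regex tokenizer is only a crude stand-in for a proper word tokenizer (it splits contractions such as "don't" into three tokens). Only the flags `--inputfile`, `--outputfile` and `--write_all_preds` come from this README.

```python
import re

# Assumption: a crude regex tokenizer standing in for a real word tokenizer.
# It separates runs of word characters from individual punctuation marks.
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(sentence):
    """Split a sentence into word and punctuation tokens."""
    return _TOKEN_RE.findall(sentence)


def write_input_file(sentences, path):
    """Write one space-joined, tokenized sentence per line."""
    with open(path, "w") as handle:
        for sentence in sentences:
            handle.write(" ".join(tokenize(sentence)) + "\n")


def build_command(inputfile, outputfile, write_all_preds=False):
    """Assemble the speciteller.py call as an argument list."""
    command = [
        "python", "speciteller.py",
        "--inputfile", inputfile,
        "--outputfile", outputfile,
    ]
    if write_all_preds:
        command.append("--write_all_preds")
    return command
```

The list returned by `build_command` can be passed to `subprocess.call` from inside the speciteller directory, under a Python 2.7 interpreter that has numpy and liblinear available.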

For example:

$ python speciteller.py --inputfile sents_test --outputfile test.probs

This will give you specificity scores for the two sentences in sents_test in test.probs.

The scores range from 0 to 1, with 0 being most general and 1 being most specific.
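To work with the predictions, the score file can be read back and paired with the input sentences. The sketch below assumes each line of the output file holds one plain decimal number. The helper names and the label strings "general" and "specific" are illustrative, and treating a score of exactly 0.5 as specific is an arbitrary choice.

```python
def parse_scores(lines):
    """Parse one float score per non-empty line."""
    return [float(line) for line in lines if line.strip()]


def to_labels(scores, cutoff=0.5):
    """Map scores to labels: 0 is most general, 1 is most specific."""
    # Assumption: a score exactly at the cutoff counts as specific.
    return ["specific" if score >= cutoff else "general" for score in scores]


def load_predictions(sentence_path, score_path, cutoff=0.5):
    """Return (sentence, score, label) triples in input order."""
    with open(sentence_path) as handle:
        sentences = [line.rstrip("\n") for line in handle]
    with open(score_path) as handle:
        scores = parse_scores(handle)
    if len(sentences) != len(scores):
        raise ValueError("sentence and score counts differ")
    return list(zip(sentences, scores, to_labels(scores, cutoff)))
```

For the example above, `load_predictions("sents_test", "test.probs")` would return one triple per sentence in sents_test.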

Practical notes

  • It is best to word-tokenize your sentences. Untokenized input still yields a score, but predictions are less accurate (~4% lower accuracy when scores are converted into labels with a cutoff at 0.5).

  • Note that the word embedding file is a compressed ~190 MB .gz file. Each run of speciteller.py loads this file to generate features. It is therefore best to avoid loading it multiple times, or to modify predict.py and tailor it to your own data loading needs.
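One way to avoid repeated loads without touching predict.py is to score several documents in a single run: concatenate their sentences into one input file, then split the returned scores back per document. This is a sketch under that assumption; `merge_documents` and `split_scores` are hypothetical helpers, not part of Speciteller.

```python
def merge_documents(documents):
    """Flatten a list of documents (each a list of sentences).

    Returns the combined sentence list and the sentence count of each
    document, which is needed to split the scores again afterwards.
    """
    sentences = []
    lengths = []
    for document in documents:
        sentences.extend(document)
        lengths.append(len(document))
    return sentences, lengths


def split_scores(scores, lengths):
    """Split a flat score list back into one list per document."""
    if sum(lengths) != len(scores):
        raise ValueError("score count does not match sentence count")
    per_document = []
    start = 0
    for length in lengths:
        per_document.append(scores[start:start + length])
        start += length
    return per_document
```

This relies on the scores being written in the same order as the input sentences, as described under Running Speciteller.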

Citation and contact

Please cite the following paper:

Junyi Jessy Li and Ani Nenkova. 2015. Fast and Accurate Prediction of Sentence Specificity. Twenty-Ninth Conference on Artificial Intelligence (AAAI). [bibtex]

Please send comments and feedback to Jessy Li.
