
Ground Truth for Grammatical Error Correction Metrics

This repository contains a Python implementation of the GLEU metric (General Language Evaluation Understanding), which can be used for any monolingual "translation" task. It also contains human rankings of the CoNLL-14 Shared Task system outputs, along with scripts for evaluating those rankings and extracting an absolute system ranking.
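
For concreteness, an evaluation instance in this monolingual setting pairs an ungrammatical source sentence with a system hypothesis and one or more human reference corrections. A toy instance (all sentences illustrative) might look like:

    # The "translation" is from ungrammatical English into grammatical English.
    source     = "He go to school yesterday .".split()
    hypothesis = "He went to school yesterday .".split()
    references = ["He went to school yesterday .".split(),
                  "Yesterday he went to school .".split()]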

These results were described in the ACL 2015 paper:

Ground Truth for Grammatical Error Correction Metrics by Courtney Napoles, Keisuke Sakaguchi, Joel Tetreault, and Matt Post

Please cite this work when using this data or the GLEU metric.

@InProceedings{napoles-EtAl:2015:ACL-IJCNLP,
  author    = {Napoles, Courtney  and  Sakaguchi, Keisuke  and  Post, Matt  and  Tetreault, Joel},
  title     = {Ground Truth for Grammatical Error Correction Metrics},
  booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
  month     = {July},
  year      = {2015},
  address   = {Beijing, China},
  publisher = {Association for Computational Linguistics},
  pages     = {588--593},
  url       = {http://www.aclweb.org/anthology/P15-2097}
}

GLEU Update

As of May 2, 2016, we have identified a problem with the GLEU metric that arises as the number of references increases. To resolve this issue, we made a minor adjustment to the metric so that it no longer has a tunable weight and is reliable with any number of reference sets. This update is reflected in scripts/compute_gleu and scripts/gleu.py; the original GLEU scripts can be found in scripts/original_gleu/. We do not recommend using the original GLEU code; use the new GLEU instead.
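
For intuition, here is a minimal, self-contained sketch of the kind of computation the updated metric performs: n-gram precisions that reward overlap with a reference and penalize source n-grams the reference corrected away, combined BLEU-style with a brevity penalty. This is an illustration only, not the code in scripts/gleu.py, which additionally handles multiple reference sets and corpus-level scoring:

    import math
    from collections import Counter

    def ngrams(tokens, n):
        """All n-grams of a token list, with counts."""
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    def gleu_precision(source, hypothesis, reference, n):
        """Clipped n-gram matches with the reference, minus a penalty for
        n-grams the hypothesis kept from the source but the reference removed."""
        s, h, r = ngrams(source, n), ngrams(hypothesis, n), ngrams(reference, n)
        hits = sum((h & r).values())
        penalty = sum(((h & s) - r).values())
        return max(hits - penalty, 0) / max(sum(h.values()), 1)

    def sentence_gleu(source, hypothesis, reference, max_n=4):
        """BLEU-style combination: geometric mean of the modified
        precisions for n = 1..max_n, times a brevity penalty."""
        ps = [gleu_precision(source, hypothesis, reference, n)
              for n in range(1, max_n + 1)]
        if min(ps) == 0:
            return 0.0
        bp = min(1.0, math.exp(1 - len(reference) / max(len(hypothesis), 1)))
        return bp * math.exp(sum(math.log(p) for p in ps) / max_n)

    # A hypothesis identical to the reference scores 1.0:
    src = "He go to school yesterday .".split()
    hyp = "He went to school yesterday .".split()
    print(sentence_gleu(src, hyp, hyp))  # 1.0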

The changes to GLEU and the updated results for our ACL 2015 paper are described in the eprint GLEU Without Tuning. The citation for the updated metric is:

@Article{napoles2016gleu,
  author    = {Napoles, Courtney  and  Sakaguchi, Keisuke  and  Post, Matt  and  Tetreault, Joel},
  title     = {{GLEU} Without Tuning},
  journal   = {eprint arXiv:1605.02592 [cs.CL]},
  year      = {2016},
  url       = {http://arxiv.org/abs/1605.02592}
}

Instructions

1. Obtain the raw system output

The rankings found in gec-ranking-data correspond to the outputs of the 12 systems from the CoNLL-14 Shared Task on Grammatical Error Correction, which can be downloaded from http://www.comp.nus.edu.sg/~nlp/conll14st.html.

Human judgments are located in gec-ranking/data.

2. Run TrueSkill

To get the human rankings, run TrueSkill (which can be downloaded from https://github.com/keisks/wmt-trueskill) on all_judgments.csv, following the instructions in the TrueSkill readme.
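
wmt-trueskill has its own interface, documented in its readme. Purely to illustrate the idea behind this step, here is a sketch of a single TrueSkill-style update from one pairwise human judgment ("system A's correction was better than system B's"). All names here are hypothetical, not part of the wmt-trueskill API:

    import math

    def pdf(x):  # standard normal density
        return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

    def cdf(x):  # standard normal cumulative distribution
        return (1 + math.erf(x / math.sqrt(2))) / 2

    def update(winner, loser, beta=25 / 6):
        """One TrueSkill update (no draws): each system is a (mu, sigma)
        skill estimate; the winner's mean rises, the loser's falls, and
        both uncertainties shrink."""
        (mu_w, s_w), (mu_l, s_l) = winner, loser
        c = math.sqrt(2 * beta ** 2 + s_w ** 2 + s_l ** 2)
        t = (mu_w - mu_l) / c
        v = pdf(t) / cdf(t)  # how surprising the observed outcome was
        w = v * (v + t)      # how much to shrink the variances
        winner = (mu_w + s_w ** 2 / c * v,
                  s_w * math.sqrt(1 - s_w ** 2 / c ** 2 * w))
        loser = (mu_l - s_l ** 2 / c * v,
                 s_l * math.sqrt(1 - s_l ** 2 / c ** 2 * w))
        return winner, loser

    # Two systems at the standard prior; A is judged better than B once.
    a, b = (25.0, 25 / 3), (25.0, 25 / 3)
    a, b = update(a, b)

Iterating such updates over all judgments in all_judgments.csv yields a skill estimate per system, from which an absolute ranking can be read off.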

3. Calculate metric scores

GLEU is included in gec-ranking/scripts. To obtain the GLEU scores for system output, run the following command:

./compute_gleu -s source_sentences -r reference [reference ...] \
        -o system_output [system_output ...] -n 4 -l 0.0

where each file contains one sentence per line. GLEU can be run with one or more references; to score multiple system outputs at once, include the path to each output file after -o. Note that GLEU was developed using Python 2.7.
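
For example, with two reference files and two system outputs (file names illustrative):

./compute_gleu -s source.txt -r ref0.txt ref1.txt \
        -o system1.txt system2.txt -n 4 -l 0.0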

I-measure scores were taken from Felice and Briscoe's 2015 NAACL paper, Towards a standard evaluation method for grammatical error detection and correction. The I-measure scorer can be downloaded from https://github.com/mfelice/imeasure.

M2 scores were calculated using the official scorer (version 3.2) of the CoNLL-2014 Shared Task (http://www.comp.nus.edu.sg/~nlp/sw/).
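
Per the m2scorer's own readme (treat the exact form as an assumption; file names here are illustrative), the scorer takes a system output file and the gold-standard M2 annotation file, roughly:

./m2scorer system_output gold_annotations.m2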


Errata

There was an error in the calculation of the GLEU denominator, which was corrected in the 10 March 2016 commit.


Please contact Courtney Napoles (courtneyn[at]jhu[dot]edu) or Keisuke Sakaguchi (keisuke[at]cs[dot]jhu[dot]edu) with any questions.

Last updated 10 May 2016

