imagoodman-aa / explainable_gec

This project forked from lorafei/explainable_gec

The official code of the 2023 ACL paper "Enhancing Grammatical Error Correction Systems with Explanations"

Home Page: https://arxiv.org/pdf/2305.15676.pdf


Explainable Grammar Error Correction

This repository provides the dataset and code for the explainable grammar error correction task, as reported in this paper:

Enhancing Grammatical Error Correction Systems with Explanations

Yuejiao Fei∗, Leyang Cui†, Sen Yang, Wai Lam, Zhenzhong Lan, Shuming Shi

2023, The 61st Annual Meeting of the Association for Computational Linguistics (ACL) (Oral)

🚀Introduction

To help language learners understand why a Grammar Error Correction system makes a certain correction, we present EXPECT, a large dataset labeled with evidence words and grammatical error types.

We also put forward several robust benchmarks for this task.

📄Examples

Examples of each error type and corresponding evidence words in EXPECT.

📉Dataset Statistics

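| Statistic | Train | Dev | Test | Outputs |
|---|---|---|---|---|
| Number of sentences | 15,187 | 2,413 | 2,416 | 1,001 |
| Number of words | 435,503 | 70,111 | 70,619 | 27,262 |
| Avg. words per sentence | 28.68 | 29.06 | 29.23 | 27.23 |
| With evidence rate (%) | 74.15 | 59.10 | 59.77 | 72.73 |
| Total evidence words | 29,187 | 4,280 | 4,340 | 1,736 |
| Avg. evidence words per sentence | 2.59 | 3.00 | 3.01 | 2.38 |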
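
The "Avg." rows can be reproduced from the raw counts. The sketch below assumes that the first average is words divided by sentences, and that the evidence average is taken only over sentences that contain evidence. The second assumption is inferred from the numbers, not stated in this README.

```python
# Raw counts copied from the statistics table above.
STATS = {
    "Train":   {"sentences": 15187, "words": 435503, "evidence_rate": 74.15, "evidence_words": 29187},
    "Dev":     {"sentences": 2413,  "words": 70111,  "evidence_rate": 59.10, "evidence_words": 4280},
    "Test":    {"sentences": 2416,  "words": 70619,  "evidence_rate": 59.77, "evidence_words": 4340},
    "Outputs": {"sentences": 1001,  "words": 27262,  "evidence_rate": 72.73, "evidence_words": 1736},
}


def derived_rows(split):
    """Return (avg. words per sentence, avg. evidence words per sentence)."""
    s = STATS[split]
    avg_wps = s["words"] / s["sentences"]
    # Assumption: only sentences that have evidence enter the denominator.
    sentences_with_evidence = s["sentences"] * s["evidence_rate"] / 100
    avg_evidence_wps = s["evidence_words"] / sentences_with_evidence
    return round(avg_wps, 2), round(avg_evidence_wps, 2)


for split in STATS:
    print(split, derived_rows(split))
```

Under these assumptions the computed values agree with the table to two decimal places for all four columns.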

📝 Dataset Description

data/json/train.json is the training set. data/json/dev.json and data/json/test.json are the dev set and test set, both split from the dev set of W&I+LOCNESS. The format of the samples is shown below:

{
"target": ["It", "has", "a", "high", "-", "density", "population", "because", "of", "its", "small", "territory", "."], 
"source": ["It", "has", "a", "high", "-", "density", "population", "because", "[NONE]", "its", "small", "territory", "."], 
"correction_index": [8, 22], 
"evidence_index": [7, 9, 10, 11, 21, 23, 24, 25], 
"error_type": "Preposition", 
"predicted_parsing_order": {"1": 3, "5": 2, "7": 2, "8": 1, "9": 3, "10": 2, "15": 3, "19": 2, "21": 2, "22": 1, "23": 3, "24": 2}, 
"origin": "A"
}
  1. "target" is the corrected sentence; its corrected words are located by "correction_index".
  2. "source" is the erroneous sentence; its erroneous words are also located by "correction_index".
  3. "correction_index" lists the positions of the corrected and erroneous words in the sequence formed by concatenating the target and source sentences with a separator token in between.
  4. "evidence_index" lists the positions of the evidence words for both the target and source sentences, in the same concatenated sequence.
  5. "error_type" is the class of the error.
  6. "predicted_parsing_order" marks the positions of the corrections (value 1), first-order dependent words (value 2), and second-order dependent words (value 3) in the dependency parse tree.
  7. "origin" is the learner's CEFR proficiency level, as given in W&I+LOCNESS.
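
The index convention can be checked against the sample above. This is a minimal sketch: the separator string "[SEP]" and the helper name `resolve_indices` are assumptions made for illustration, and only the position of the separator matters for the lookup.

```python
SEP = "[SEP]"  # hypothetical separator token; the actual token is not given here


def resolve_indices(sample, sep=SEP):
    """Map correction and evidence indices back to tokens.

    Indices refer to the sequence: target tokens + separator + source tokens.
    """
    tokens = sample["target"] + [sep] + sample["source"]
    return {
        "corrections": [tokens[i] for i in sample["correction_index"]],
        "evidence": [tokens[i] for i in sample["evidence_index"]],
    }


sample = {
    "target": ["It", "has", "a", "high", "-", "density", "population",
               "because", "of", "its", "small", "territory", "."],
    "source": ["It", "has", "a", "high", "-", "density", "population",
               "because", "[NONE]", "its", "small", "territory", "."],
    "correction_index": [8, 22],
    "evidence_index": [7, 9, 10, 11, 21, 23, 24, 25],
}

resolved = resolve_indices(sample)
print(resolved["corrections"])  # ['of', '[NONE]']
print(resolved["evidence"])
```

Index 8 falls in the target ("of"), while index 22 falls in the source at offset 22 - 14 = 8 ("[NONE]"), since the target has 13 tokens and the separator occupies position 13.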

📥 How to Use the Data

We provide a script for processing the raw data into NER input format. Specify data_file and save_file when running read_jsonl.py.

python read_jsonl.py --data_file data/json/train.json --save_file data/ner/train.pkl

Then use the processed data as the input to the model.
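
Before converting, it can help to inspect the raw file. The following is a minimal sketch, assuming each line of the file holds one JSON object in the sample format shown earlier (the script name read_jsonl.py suggests this, but it is not confirmed here). `load_samples` and `error_type_counts` are hypothetical helpers, not part of the repository.

```python
import json
from collections import Counter


def load_samples(path):
    """Read one JSON object per non-empty line (assumed JSON Lines layout)."""
    samples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                samples.append(json.loads(line))
    return samples


def error_type_counts(samples):
    """Count how often each "error_type" value occurs."""
    return Counter(s["error_type"] for s in samples)
```

For example, `error_type_counts(load_samples("data/json/train.json")).most_common(5)` would list the five most frequent error types, provided the file follows the assumed layout.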

📈 Train and Evaluate Models

  1. To train and evaluate the Labeling-based Error+Correction model:
python run.py cfgs/train_error_correction.py
python run.py cfgs/eval_error_correction.py
  2. To train and evaluate the Labeling-based Error+Correction+CE model:
python run.py cfgs/train_error_correction_ce.py
python run.py cfgs/eval_error_correction_ce.py
  3. To train and evaluate the Labeling-based Error+Correction+CE+Syntax model:
python run.py cfgs/train_error_correction_ce_syntax.py
python run.py cfgs/eval_error_correction_ce_syntax.py
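
To run all three variants in sequence, the commands above could be wrapped in a small driver. This is only a sketch: `build_commands` and `run_all` are hypothetical helpers, and the script defaults to a dry run that prints the commands without executing them.

```python
import subprocess
import sys

# Config name stems taken from the commands listed above.
VARIANTS = ["error_correction", "error_correction_ce", "error_correction_ce_syntax"]


def build_commands(variants=VARIANTS, python=sys.executable):
    """Return the train and eval command for each variant, in order."""
    commands = []
    for variant in variants:
        for stage in ("train", "eval"):
            commands.append([python, "run.py", f"cfgs/{stage}_{variant}.py"])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(cmd, check=True)


run_all(build_commands())  # dry run: prints the six commands
```

Calling `run_all(build_commands(), dry_run=False)` from the repository root would execute the runs one after another.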

📜 License

This dataset is released under the MIT License.

📚 Citation

If you use this dataset in your research, please cite it as follows:

@inproceedings{fei-cui-2023-enhancing,
title = "Enhancing Grammatical Error Correction Systems with Explanations",
author = "Fei, Yuejiao and
   Cui, Leyang and
   Yang, Sen and
   Lam, Wai and
   Lan, Zhenzhong and
   Shi, Shuming",
month = jul,
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
}
