Evaluating Factual Consistency of Texts with Semantic Role Labeling

Jing Fan^*, Dennis Aumiller^*, and Michael Gertz
Institute of Computer Science, Heidelberg University
^* These authors contributed equally to this work.

You can reach us via the GitHub issues, or send us an email at [email protected]!

2023-05-23: A pre-print of our work is now available on arXiv.
2023-05-15: Our work has been accepted at *SEM 2023! We will update the citation once the proceedings become available.

Installation

We provide an exhaustive list of required packages through the requirements.txt file. However, given the finicky dependency issues surrounding the (nowadays deprecated) AllenNLP release, as well as the spaCy versions required, we strongly suggest creating a new environment in which to install this package.

You can install the required core dependencies with

python3 -m pip install -r requirements.txt

This is guaranteed to work for Python versions 3.8 and 3.9; we do not guarantee full compatibility with 3.10. Furthermore, we encountered some (temporary?) issues regarding the dependency on typing-extensions==4.6.0 and pydantic. More information can be found in this GitHub issue. Should you encounter a similar problem, consider manually downgrading your typing-extensions version to typing-extensions==4.5.0.
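As a quick sanity check before installing, the following minimal sketch reports whether the current interpreter is one of the tested Python versions and whether the problematic typing-extensions release is present. The helper names (`is_supported_python`, `installed_version`) are illustrative and not part of this repository; only the version numbers come from the notes above.

```python
import sys
from importlib import metadata

# Python versions listed as guaranteed to work in the notes above.
SUPPORTED_PYTHON = {(3, 8), (3, 9)}


def is_supported_python(version=None):
    """Return True if the (major, minor) version is one of the tested ones."""
    major_minor = tuple((version or sys.version_info)[:2])
    return major_minor in SUPPORTED_PYTHON


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("Python version tested:", is_supported_python())

te_version = installed_version("typing-extensions")
if te_version == "4.6.0":
    # This is the release for which dependency issues were observed.
    print("typing-extensions 4.6.0 found; consider downgrading to 4.5.0")
else:
    print("typing-extensions version:", te_version)
```

If the second message appears, running `python3 -m pip install typing-extensions==4.5.0` inside the environment applies the downgrade suggested above.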

Usage

The general usage of our metric SRLScore is as follows:

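from SRLScore import SRLScore

# Illustrative placeholder texts; replace them with your own data
input_text = "The cat sat on the mat while the dog slept."
summary_text = "The cat sat on the mat."

# Default values are reasonable for most cases
scorer = SRLScore()

# Compare the summary against its source text
scorer.score(input_text, summary_text)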

See also the example_usage.py file. Note that SRLScore relies heavily on annotations generated by a (neural) SRL tagger, so processing should be significantly faster if a GPU is available.

Experimental Results & Data from the Paper

To repeat the experiments that we performed, you may run the eval.sh script in this folder. We further experimented with leave-one-argument-out variants of our weights; these experiments are documented in eval_leave_out-exp.sh.

Scripts to reproduce the baseline scores (particularly for BARTScore and CoCo, the two most competitive methods with implementations available) can be found in baselines/. For CoCo, you may further need to clone the respective paper's code repository, copy our coco_commands.sh script into its main folder, and run it from there.

significance_testing.py will re-compute the significance of differences between various methods. Note that we apply Bonferroni correction, which makes the significance threshold fairly small!
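To illustrate why the threshold becomes small, here is a minimal sketch of Bonferroni correction in general: with m comparisons, each p-value is compared against alpha / m instead of alpha. This is not the code in significance_testing.py; the function names, the alpha of 0.05, and the p-values are all made up for illustration.

```python
from typing import Dict


def bonferroni_threshold(alpha: float, num_comparisons: int) -> float:
    """Per-comparison significance threshold under Bonferroni correction."""
    if num_comparisons < 1:
        raise ValueError("need at least one comparison")
    return alpha / num_comparisons


def significant_after_correction(
    p_values: Dict[str, float], alpha: float = 0.05
) -> Dict[str, bool]:
    """Mark each comparison as significant if its p-value is below alpha / m."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p-values for three pairwise comparisons (not real results).
example = {"pair_1": 0.001, "pair_2": 0.02, "pair_3": 0.04}
print(bonferroni_threshold(0.05, len(example)))
print(significant_after_correction(example))
```

In this made-up example, only pair_1 stays significant: the corrected threshold is 0.05 / 3, roughly 0.0167, so p-values of 0.02 and 0.04 no longer pass even though they are below 0.05.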

Citation

If you found this repository helpful, please consider citing our work:

@article{fan-etal-2023-evaluating,
  title={{Evaluating Factual Consistency of Texts with Semantic Role Labeling}}, 
  author={Jing Fan and Dennis Aumiller and Michael Gertz},
  journal={CoRR},
  volume={abs/2305.13309},
  year={2023},
  eprint={2305.13309},
  eprinttype={arXiv},
  primaryClass={cs.CL}
}
