srlscore's Introduction

Evaluating Factual Consistency of Texts with Semantic Role Labeling

Jing Fan*, Dennis Aumiller*, and Michael Gertz
Institute of Computer Science, Heidelberg University
* These authors contributed equally to this work.

You can reach us via GitHub issues, or send us an email at [email protected]!

2023-05-23: A pre-print of our work is now available on arXiv.
2023-05-15: Our work has been accepted at *SEM 2023! We will update the citation once the proceedings become available.

Installation

We provide an exhaustive list of required packages in the requirements.txt file. However, given the finicky dependency issues surrounding the (now deprecated) AllenNLP release, as well as the required spaCy versions, we strongly suggest installing this package in a fresh environment.

You can install the required core dependencies with

python3 -m pip install -r requirements.txt

This setup is tested and works with Python 3.8 and 3.9; we do not guarantee full compatibility with Python 3.10. Furthermore, we encountered some (possibly temporary) issues with the dependency on typing-extensions==4.6.0 and, by extension, pydantic. More information can be found in this GitHub issue. Should you encounter a similar problem, consider manually downgrading typing-extensions to typing-extensions==4.5.0.
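Before installing, it can help to check the interpreter and the installed typing-extensions version programmatically. The following is a minimal sketch, not part of the repository: the helper names `environment_warnings` and `installed_version` are hypothetical, and the version values are simply the ones stated above.

```python
import sys
from importlib import metadata

# Values taken from the installation notes: 3.8/3.9 are tested,
# typing-extensions 4.6.0 caused problems (4.5.0 is the suggested downgrade).
TESTED_PYTHON = {(3, 8), (3, 9)}
PROBLEMATIC_TYPING_EXTENSIONS = "4.6.0"


def environment_warnings(python_version, typing_extensions_version):
    """Return human-readable warnings for a (major, minor) Python version
    and an installed typing-extensions version (None if not installed)."""
    warnings = []
    if tuple(python_version) not in TESTED_PYTHON:
        warnings.append(
            f"Python {python_version[0]}.{python_version[1]} is untested; use 3.8 or 3.9."
        )
    if typing_extensions_version == PROBLEMATIC_TYPING_EXTENSIONS:
        warnings.append(
            "typing-extensions==4.6.0 detected; consider "
            "'python3 -m pip install typing-extensions==4.5.0'."
        )
    return warnings


def installed_version(package):
    """Installed version of `package`, or None if it is not installed."""
    # Distribution names may be recorded with '-' or '_', so try both spellings.
    for name in (package, package.replace("-", "_")):
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


if __name__ == "__main__":
    for message in environment_warnings(
        sys.version_info[:2], installed_version("typing-extensions")
    ):
        print("WARNING:", message)
```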

Usage

The general usage of our metric SRLScore is as follows:

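from SRLScore import SRLScore

# Default values are reasonable for most cases
scorer = SRLScore()

# The source text and the summary whose factual consistency should be checked
input_text = "The committee approved the budget on Monday."
summary_text = "The budget was approved on Monday."

score = scorer.score(input_text, summary_text)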

See also the example_usage.py file. Note that SRLScore relies heavily on annotations generated by a (neural) SRL tagger; processing is therefore significantly faster if a GPU is available.
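To get a feel for processing time on your hardware (for example, CPU-only versus GPU), you can time individual scoring calls. This is a minimal sketch under the assumption that you pass the bound method `scorer.score` from the usage example; `score_pairs` is a hypothetical helper, and the float return type is an assumption.

```python
import time
from typing import Callable, List, Sequence, Tuple


def score_pairs(
    score_fn: Callable[[str, str], float],
    pairs: Sequence[Tuple[str, str]],
) -> List[Tuple[float, float]]:
    """Score (input_text, summary_text) pairs, recording wall-clock seconds per pair.

    score_fn is any callable taking (input_text, summary_text), e.g. the bound
    method scorer.score of an SRLScore instance (assumed to return a float).
    """
    results = []
    for input_text, summary_text in pairs:
        start = time.perf_counter()
        value = score_fn(input_text, summary_text)
        results.append((value, time.perf_counter() - start))
    return results
```

For instance, `score_pairs(SRLScore().score, pairs)` run on the same pairs on two machines gives a rough comparison; note that the first call may also include one-off model loading costs.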

Experimental Results & Data from the Paper

To repeat our experiments, you may run the eval.sh script in this folder. We further experimented with leave-one-argument-out variants of our weights, which are documented in eval_leave_out-exp.sh.
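The idea behind leave-one-argument-out variants can be illustrated by zeroing one attribute weight at a time. This is a generic sketch, not the code in eval_leave_out-exp.sh: the function name is hypothetical, and whether the remaining weights are renormalized in the actual experiments is not stated here, so renormalization is an assumption that can be switched off.

```python
from typing import Dict


def leave_one_out_variants(
    weights: Dict[str, float], renormalize: bool = True
) -> Dict[str, Dict[str, float]]:
    """Map each attribute name to a copy of `weights` with that attribute zeroed.

    If renormalize is True (an assumption of this sketch), the remaining weights
    are rescaled to sum to 1 so that scores stay on a comparable scale.
    """
    variants = {}
    for left_out in weights:
        variant = dict(weights)
        variant[left_out] = 0.0
        total = sum(variant.values())
        if renormalize and total > 0:
            variant = {name: w / total for name, w in variant.items()}
        variants[left_out] = variant
    return variants
```

With five hypothetical, equally weighted attributes (0.2 each), the variant that leaves one out assigns it 0.0 and each of the other four roughly 0.25.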

Scripts to reproduce the baseline scores (particularly for BARTScore and CoCo, the two most competitive methods with available implementations) can be found in baselines/. For CoCo, you may additionally need to clone the respective paper's code repository, copy our coco_commands.sh script into their main folder, and run it from there.

significance_testing.py re-computes the significance of differences between the various methods. Note that we apply Bonferroni correction, which divides the significance level by the number of comparisons and therefore makes the per-comparison threshold fairly small!
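The effect of the Bonferroni correction on the threshold is easy to see in isolation. This is a generic sketch of the standard correction, not the code in significance_testing.py; the function name is hypothetical and alpha = 0.05 is an illustrative default.

```python
from typing import List, Sequence, Tuple


def bonferroni_significant(
    p_values: Sequence[float], alpha: float = 0.05
) -> Tuple[float, List[bool]]:
    """Apply a Bonferroni correction to m p-values.

    Each p-value is compared against alpha / m, which bounds the family-wise
    error rate (the chance of at least one false positive) by alpha.
    Returns the corrected threshold and one significance flag per p-value.
    """
    m = len(p_values)
    if m == 0:
        return alpha, []
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]
```

With 10 comparisons at alpha = 0.05, each individual p-value must fall at or below 0.005 to count as significant.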

Citation

If you found this repository helpful, please consider citing our work:

@article{fan-etal-2023-evaluating,
  title={{Evaluating Factual Consistency of Texts with Semantic Role Labeling}}, 
  author={Jing Fan and Dennis Aumiller and Michael Gertz},
  journal={CoRR},
  volume={abs/2305.13309},
  year={2023},
  eprint={2305.13309},
  eprinttype={arXiv},
  primaryClass={cs.CL}
}

srlscore's Issues

Provide Repository description

Since I cannot modify the description myself, it would be great if you could adjust it (it appears in the top right of the repository page, as well as in search results).
I had something like the following in mind:

"An SRL-based approach to factuality estimation [for text summarization]", where the part in [] is optional.

Thanks in advance!

Optimize performance for main use case

As discussed in our future work section, a key limitation is currently the wall-clock time for a single instance.
We (likely) do not want to rely heavily on the co-reference component, which yields marginal gains in most scenarios at a heavily increased computational cost.

Instead, we could provide the following:

  • A stable release of the proposed system, replicating the results both with and without co-reference resolution.
  • From there on, we can prune the current implementation to only support the (probably preferable) SRL standalone system.
    • Remove the tuple explosion algorithm.
    • Move the processor to the SRLScore __init__ function (primarily because of model loading times; even with LRU caching, I am skeptical about the performance).
    • Remove Goodrich method and other baselines from main class.
  • Include a test suite with differing samples (particularly for varying input/output length combinations).
  • Analyze performance profile to identify any other potential bottlenecks.
  • Provide proper packaging and setup.py.
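The proposal to move the processor into `__init__` amounts to paying the model-loading cost once per scorer object. A minimal sketch of that pattern, assuming the SRL tagger can be wrapped in a zero-argument factory; `EagerSRLProcessor` and `load_model` are hypothetical names, not part of the repository.

```python
from typing import Callable, List


class EagerSRLProcessor:
    """Sketch of the proposal: load the (expensive) SRL model once in __init__.

    `load_model` is a hypothetical zero-argument factory returning a callable
    that annotates a text; in practice it would wrap the neural SRL tagger.
    """

    def __init__(self, load_model: Callable[[], Callable[[str], List[str]]]):
        # The expensive step happens exactly once, at construction time.
        self._annotate = load_model()

    def annotate(self, text: str) -> List[str]:
        # Every call reuses the already-loaded model; nothing is reloaded.
        return self._annotate(text)
```

Compared with a loader decorated with `functools.lru_cache`, this ties the load cost explicitly to the object's lifetime. Both approaches avoid reloading, so any actual speed-up should be confirmed by profiling.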

Dependency Issue

python3 -m pip install -r requirements.txt

ERROR: Cannot install -r requirements.txt (line 2), -r requirements.txt (line 8) and spacy<=3.5 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested spacy<=3.5
    allennlp 2.10.1 depends on spacy<3.4 and >=2.1.0
    en-core-web-sm 3.4.1 depends on spacy<3.5.0 and >=3.4.0
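The error arises because the two pins exclude each other: allennlp 2.10.1 requires spaCy below 3.4, while en-core-web-sm 3.4.1 requires spaCy 3.4.0 or later. The sketch below checks candidate versions against the constraints copied from the message; `satisfies` and `parse_version` are hypothetical helpers with a deliberately simplified, purely numeric version parser (real tools follow PEP 440, e.g. via the `packaging` library).

```python
import operator

OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge, "==": operator.eq,
}


def parse_version(text):
    """'3.4.1' -> (3, 4, 1); pads to three parts so '3.5' equals '3.5.0'.
    Simplified: numeric components only, no pre-release tags."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version, constraints):
    """True if `version` meets every (operator, bound) constraint."""
    v = parse_version(version)
    return all(OPS[op](v, parse_version(bound)) for op, bound in constraints)


# spaCy constraints copied from the pip error message above
SPACY_CONSTRAINTS = [
    ("<=", "3.5"),                    # spacy<=3.5 (requested)
    ("<", "3.4"), (">=", "2.1.0"),    # allennlp 2.10.1
    ("<", "3.5.0"), (">=", "3.4.0"),  # en-core-web-sm 3.4.1
]
```

No candidate can satisfy all five constraints, since "below 3.4" and "at least 3.4.0" do not overlap; one of the two pins has to change.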

Fix version dependencies

Given how complicated it is to install all necessary dependencies, we should provide a script with pinned dependency versions that ensures a working setup is installed.

This could also be done via Docker.
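One way to obtain such pins is to record the exact versions from an environment known to work. A minimal sketch, assuming you list the relevant packages yourself; `collect_versions` and `format_pins` are hypothetical helpers (`pip freeze` does something similar for every installed package).

```python
from importlib import metadata
from typing import Dict, Iterable, List, Optional


def collect_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Look up installed versions; None marks a package that is not installed.
    Note: names are passed to importlib.metadata as given (no normalization)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def format_pins(versions: Dict[str, Optional[str]]) -> List[str]:
    """Turn {name: version} into sorted 'name==version' requirement lines."""
    missing = sorted(name for name, v in versions.items() if v is None)
    if missing:
        raise ValueError(f"not installed, cannot pin: {', '.join(missing)}")
    return [f"{name}=={versions[name]}" for name in sorted(versions)]
```

In a working environment, `"\n".join(format_pins(collect_versions(["allennlp", "spacy"])))` could then be written to a pinned requirements file, which a Dockerfile could also install from.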
