

QANom - Annotating Nominal Predicates with QA-SRL

QANom is a research project aiming at a natural representation of the predicate-argument relations of nominalizations. It extends the Question-Answer driven Semantic Role Labeling (QASRL) framework (see website), which tackled verbal predicates, to the more challenging space of deverbal nominalizations.

This repository is the reference point for the data and software described in the paper QANom: Question-Answer driven SRL for Nominalizations (COLING 2020). For information on replicating the work described in the QANom paper (crowdsourcing a QANom dataset, identifying nominalization candidates, training and evaluating the baseline models), please refer to paper_reference_readme.md.

The repo also contains software for using QANom downstream. This mainly includes pipelines for easy usage of the nominalization detection model and of the QANom parsers. This README will guide you through using this software.

Prerequisites

  • Python 3.7

Installation

From PyPI: pip install qanom

If you want to install from source, clone this repository and then install requirements:

git clone https://github.com/kleinay/QANom.git
cd QANom
pip install -r requirements.txt

End-to-End Pipeline

If you wish to parse sentences with QANom, the best place to start is the QANomEndToEndPipeline class from the qanom.qanom_end_to_end_pipeline module.

This pipeline first runs the Nominalization Detector to identify the nominal predicates in the sentence (see demo). It then sends each nominal predicate to the QANom-Seq2Seq model (see demo), which parses it with Question-Answer driven Semantic Role Labeling (QASRL).

Usage Example

from qanom.qanom_end_to_end_pipeline import QANomEndToEndPipeline
pipe = QANomEndToEndPipeline(detection_threshold=0.75)
sentence = "The construction of the officer 's building finished right after the beginning of the destruction of the previous construction ."
print(pipe([sentence]))

Output:

[[{'QAs': [{'question': 'what was constructed ?',
     'answers': ["the officer 's"]}],
   'predicate_idx': 1,
   'predicate': 'construction',
   'predicate_detector_probability': 0.7623529434204102,
   'verb_form': 'construct'},
  {'QAs': [{'question': 'what began ?',
     'answers': ['the destruction of the']}],
   'predicate_idx': 11,
   'predicate': 'beginning',
   'predicate_detector_probability': 0.8923847675323486,
   'verb_form': 'begin'},
  {'QAs': [{'question': 'what was destructed ?', 
     'answers': ['the previous']}],
   'predicate_idx': 14,
   'predicate': 'destruction',
   'predicate_detector_probability': 0.849774956703186,
   'verb_form': 'destruct'}]]
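The output is a list with one entry per input sentence, each entry being a list of dicts, one per detected nominal predicate. As a hedged sketch, the hypothetical helper `flatten_qas` below (not part of the qanom package) turns that nested structure into flat rows, for example for writing to a CSV. It runs on the example output above pasted in as a literal; in practice you would pass `pipe([sentence])` instead.

```python
from typing import Iterator, List, Tuple

# Copied from the example output above: one list per sentence,
# one dict per detected nominal predicate.
parsed: List[List[dict]] = [[
    {"QAs": [{"question": "what was constructed ?", "answers": ["the officer 's"]}],
     "predicate_idx": 1, "predicate": "construction",
     "predicate_detector_probability": 0.7623529434204102, "verb_form": "construct"},
    {"QAs": [{"question": "what began ?", "answers": ["the destruction of the"]}],
     "predicate_idx": 11, "predicate": "beginning",
     "predicate_detector_probability": 0.8923847675323486, "verb_form": "begin"},
    {"QAs": [{"question": "what was destructed ?", "answers": ["the previous"]}],
     "predicate_idx": 14, "predicate": "destruction",
     "predicate_detector_probability": 0.849774956703186, "verb_form": "destruct"},
]]


def flatten_qas(pipeline_output: List[List[dict]]) -> Iterator[Tuple[int, int, str, str, str]]:
    """Yield one (sentence_index, predicate_idx, predicate, question, answer) row per answer."""
    for sent_i, predicates in enumerate(pipeline_output):
        for pred in predicates:
            for qa in pred["QAs"]:
                for answer in qa["answers"]:
                    yield (sent_i, pred["predicate_idx"], pred["predicate"],
                           qa["question"], answer)


for row in flatten_qas(parsed):
    print(row)
```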

Nominalization Detection Model

This model identifies "predicative nominalizations", that is, nominalizations that carry an eventive (or "verbal") meaning in context. It is a bert-base-cased pretrained model, fine-tuned for token classification on the "nominalization detection" task as defined and annotated by the QANom project.

The model is trained as a binary classifier over candidate nominalizations. Candidates are extracted with a POS tagger (keeping only common nouns) and then filtered with lexical resources (e.g. WordNet and CatVar), keeping only nouns that have at least one derivationally related verb. In the QANom annotation project, annotators judge each candidate for whether it carries a "verbal" meaning in the context of the sentence; the current model reproduces this binary classification.
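The candidate extraction step can be illustrated with a small, self-contained sketch. It is not the qanom implementation: the POS tags below are hand-assigned for illustration, and the toy `TOY_NOUN_TO_VERB` dictionary is a hypothetical stand-in for lookups in WordNet and CatVar. It only shows the two filters described above: keep common nouns, then keep those with a derivationally related verb.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for WordNet/CatVar: noun -> derivationally related verb.
TOY_NOUN_TO_VERB: Dict[str, str] = {
    "construction": "construct",
    "building": "build",
    "beginning": "begin",
    "destruction": "destruct",
}

SENTENCE = ("The construction of the officer 's building finished right after "
            "the beginning of the destruction of the previous construction .")
# Hand-assigned Penn Treebank tags, one per whitespace token (illustrative only).
TAGS = "DT NN IN DT NN POS NN VBD RB IN DT NN IN DT NN IN DT JJ NN .".split()
TOKENS = SENTENCE.split()
assert len(TOKENS) == len(TAGS)
TAGGED: List[Tuple[str, str]] = list(zip(TOKENS, TAGS))


def extract_candidates(tagged: List[Tuple[str, str]],
                       noun_to_verb: Dict[str, str]) -> List[dict]:
    """Return common nouns that have a derivationally related verb in the lexicon."""
    candidates = []
    for idx, (token, pos) in enumerate(tagged):
        # Filter 1: keep common nouns only (NN/NNS); proper nouns are skipped.
        if pos not in ("NN", "NNS"):
            continue
        # Filter 2: keep nouns with at least one related verb in the lexicon.
        verb = noun_to_verb.get(token.lower())
        if verb is not None:
            candidates.append({"predicate_idx": idx, "predicate": token, "verb_form": verb})
    return candidates


for cand in extract_candidates(TAGGED, TOY_NOUN_TO_VERB):
    print(cand)
```

With this toy lexicon, "officer" is not a candidate, whereas the real resources do list a related verb for it (it appears among the candidates in the NominalizationDetector example output). The classifier, not the candidate extractor, is what rejects such nouns.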

Under the hood, the NominalizationDetector class encapsulates the full nominalization detection pipeline (i.e. candidate extraction + predicate classification). It relies on the qanom.candidate_extraction.candidate_extraction module for candidate extraction, and additionally downloads and wraps the nominalization-candidate-classifier model, hosted on the Hugging Face Model Hub.

Usage Example

from qanom.nominalization_detector import NominalizationDetector
detector = NominalizationDetector()

raw_sentences = ["The construction of the officer 's building finished right after the beginning of the destruction of the previous construction ."]

print(detector(raw_sentences, return_all_candidates=True))
print(detector(raw_sentences, threshold=0.75, return_probability=False))

Outputs:

[[{'predicate_idx': 1,
   'predicate': 'construction',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.7626778483390808,
   'verb_form': 'construct'},
  {'predicate_idx': 4,
   'predicate': 'officer',
   'predicate_detector_prediction': False,
   'predicate_detector_probability': 0.19832570850849152,
   'verb_form': 'officer'},
  {'predicate_idx': 6,
   'predicate': 'building',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.5794129371643066,
   'verb_form': 'build'},
  {'predicate_idx': 11,
   'predicate': 'beginning',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.8937646150588989,
   'verb_form': 'begin'},
  {'predicate_idx': 14,
   'predicate': 'destruction',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.8501205444335938,
   'verb_form': 'destruct'},
  {'predicate_idx': 18,
   'predicate': 'construction',
   'predicate_detector_prediction': True,
   'predicate_detector_probability': 0.7022264003753662,
   'verb_form': 'construct'}]]
[[{'predicate_idx': 1, 'predicate': 'construction', 'verb_form': 'construct'},
  {'predicate_idx': 11, 'predicate': 'beginning', 'verb_form': 'begin'},
  {'predicate_idx': 14, 'predicate': 'destruction', 'verb_form': 'destruct'}]]
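The two calls above are related: with threshold=0.75, the detector keeps only the candidates whose probability clears the threshold, and with return_probability=False it drops the classifier fields. A minimal sketch of that relationship, assuming a `>=` comparison (the package may use a strict `>`; the example scores do not distinguish the two). `filter_by_threshold` is a hypothetical helper, handy if you want to score all candidates once and then try several thresholds offline.

```python
from typing import List

# Candidate scores copied from the return_all_candidates=True output above.
all_candidates: List[dict] = [
    {"predicate_idx": 1, "predicate": "construction",
     "predicate_detector_probability": 0.7626778483390808, "verb_form": "construct"},
    {"predicate_idx": 4, "predicate": "officer",
     "predicate_detector_probability": 0.19832570850849152, "verb_form": "officer"},
    {"predicate_idx": 6, "predicate": "building",
     "predicate_detector_probability": 0.5794129371643066, "verb_form": "build"},
    {"predicate_idx": 11, "predicate": "beginning",
     "predicate_detector_probability": 0.8937646150588989, "verb_form": "begin"},
    {"predicate_idx": 14, "predicate": "destruction",
     "predicate_detector_probability": 0.8501205444335938, "verb_form": "destruct"},
    {"predicate_idx": 18, "predicate": "construction",
     "predicate_detector_probability": 0.7022264003753662, "verb_form": "construct"},
]


def filter_by_threshold(candidates: List[dict], threshold: float) -> List[dict]:
    """Keep candidates scoring at least `threshold`, without the probability fields."""
    return [
        {k: c[k] for k in ("predicate_idx", "predicate", "verb_form")}
        for c in candidates
        if c["predicate_detector_probability"] >= threshold
    ]


# Prints the same three predicates (indices 1, 11, 14) as the threshold=0.75 call above.
print(filter_by_threshold(all_candidates, 0.75))
```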

spaCy Custom Component 'nominalization_detector'

If you are using spaCy, you can easily plug our nominalization detection algorithm into the spaCy pipeline as a custom component. Import the qanom.spacy_component_nominalization_detector module to have our "nominalization_detector" component registered with spaCy.

For example:

import spacy
import qanom.spacy_component_nominalization_detector  # noqa: F401 -- importing registers the "nominalization_detector" component
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("nominalization_detector", after="tagger",
             config={"threshold": 0.7, "device": -1})  # you may specify config settings or stay with these defaults
# Now your `nlp` pipeline also identifies verbal nominalizations:
doc = nlp("The medical student asked about the progress in Luke's treatment.")
print(doc._.nominalizations)  # a Doc extension attribute with the list of tokens identified as verbal nominalizations
print([(nn.text, nn._.verb_form, nn._.is_nominalization_confidence) for nn in doc._.nominalizations])  # Token extension attributes

Output:

[progress, treatment]
[('progress', 'progress', 0.8063599467277527),
 ('treatment', 'treat', 0.8211929798126221)]

QANom Sequence-to-Sequence Models

We have finetuned T5, a pretrained Seq-to-Seq language model, on the task of parsing QANom QAs. Given a sentence and a highlighted nominal predicate, the models produce an output sequence consisting of the QANom-formatted question-answer pairs for this predicate.

We currently have two models:

  • qanom-seq2seq-model-baseline (HF repo) - trained only on the QANom dataset. Performance: 57.6 Unlabeled Arg F1, 34.9 Labeled Arg F1.
  • qanom-seq2seq-model-joint (HF repo) - trained jointly on the QANom and verbal QASRL datasets. Performance: 60.1 Unlabeled Arg F1, 40.6 Labeled Arg F1.

We provide the QASRL_Pipeline class (at qanom.qasrl_seq2seq_pipeline), a Hugging Face Pipeline for applying the models out-of-the-box on new texts:

from qanom.qasrl_seq2seq_pipeline import QASRL_Pipeline
pipe = QASRL_Pipeline("kleinay/qanom-seq2seq-model-baseline")
pipe("The student was interested in Luke 's <predicate> research about see animals .", verb_form="research", predicate_type="nominal")

This outputs:

[{'generated_text': 'who _ _ researched something _ _ ?<extra_id_7> Luke', 
  'QAs': [{'question': 'who researched something ?', 'answers': ['Luke']}]}]
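The generated_text field is the raw model output, and QAs is its parsed form. To show how the two relate in this one example, here is a hedged sketch that handles a single QA pair only. It assumes that `<extra_id_7>` separates the question from the answer (as seen above) and that `_` marks an empty slot of the QASRL question template. How multiple QA pairs or multiple answers are delimited is not shown here, so the sketch does not attempt it; in practice, use the QAs field the pipeline already returns. `parse_single_qa` is a hypothetical helper.

```python
def parse_single_qa(generated_text: str, qa_answer_sep: str = "<extra_id_7>") -> dict:
    """Split one slot-formatted question from its answer and drop empty '_' slots."""
    question_part, _, answer_part = generated_text.partition(qa_answer_sep)
    # The question follows the QASRL slot template; '_' marks an unfilled slot.
    question = " ".join(tok for tok in question_part.split() if tok != "_")
    return {"question": question, "answers": [answer_part.strip()]}


print(parse_single_qa("who _ _ researched something _ _ ?<extra_id_7> Luke"))
# {'question': 'who researched something ?', 'answers': ['Luke']}
```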

You can learn more about using transformers.pipelines in the official docs.

Note that you need to specify which word in the sentence is the predicate that the questions should target. By default, precede the predicate with the <predicate> marker, but you can also specify your own predicate marker:

pipe("The student was interested in Luke 's <PRED> research about see animals .", verb_form="research", predicate_type="nominal", predicate_marker="<PRED>")
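If you already have a token index, for example the predicate_idx returned by the NominalizationDetector, the marked input string can be built mechanically. A minimal sketch, assuming whitespace-tokenized input as in the examples here; `mark_predicate` is a hypothetical helper, not part of the qanom API. Whatever marker you insert must match the predicate_marker argument you pass to the pipeline.

```python
from typing import List


def mark_predicate(tokens: List[str], predicate_idx: int, marker: str = "<predicate>") -> str:
    """Return the sentence with `marker` inserted right before the predicate token."""
    if not 0 <= predicate_idx < len(tokens):
        raise IndexError(f"predicate_idx {predicate_idx} out of range for {len(tokens)} tokens")
    return " ".join(tokens[:predicate_idx] + [marker] + tokens[predicate_idx:])


tokens = "The student was interested in Luke 's research about see animals .".split()
print(mark_predicate(tokens, 7))
# The student was interested in Luke 's <predicate> research about see animals .
print(mark_predicate(tokens, 7, marker="<PRED>"))
# The student was interested in Luke 's <PRED> research about see animals .
```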

In addition, you can pass extra keyword arguments to control the model's decoding algorithm, such as the number of beams for beam search:

pipe("The student was interested in Luke 's <predicate> research about see animals .", verb_form="research", predicate_type="nominal", num_beams=3)

Cite

@inproceedings{klein2020qanom,
 title={QANom: Question-Answer driven SRL for Nominalizations},
 author={Klein, Ayal and Mamou, Jonathan and Pyatkin, Valentina and Stepanov, Daniela and He, Hangfeng and Roth, Dan and Zettlemoyer, Luke and Dagan, Ido},
 booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
 pages={3069--3083},
 year={2020}
}
