re1st's Introduction

Relation Extraction First - Using Relation Extraction to Identify Entities

Code for AIFB-WebScience at SemEval-2022 Task 12: Relation Extraction First - Using Relation Extraction to Identify Entities

Table of Contents

  1. Data
  2. Dependencies
  3. Reproducing Results (+ link to trained models)
  4. Training
  5. Relevant Code per Section in Paper
  6. Bibtex for Citing

Data

The task data can be found here. To use it with the code in this repo, place the training, dev, and test JSON files into the corresponding folders in the data directory.
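Before training or evaluating, it can help to confirm that the JSON files ended up where expected. The following is a minimal sketch, not part of the repo: the helper name `count_json_files` is hypothetical, and it makes no assumption about the subfolder names beyond what it finds under the data directory.

```python
from pathlib import Path


def count_json_files(data_dir: str) -> dict:
    """Map each immediate subfolder of data_dir to its number of .json files."""
    root = Path(data_dir)
    return {
        sub.name: len(list(sub.glob("*.json")))
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }


if __name__ == "__main__":
    # Only report if a data directory exists in the current working directory.
    if Path("data").is_dir():
        for split, n_files in count_json_files("data").items():
            print(f"{split}: {n_files} JSON file(s)")
```

A subfolder reported with 0 files most likely means the corresponding split was not copied over yet.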

Dependencies

The code depends on the following packages (versions in parentheses):

  • torch (1.8.0)
  • transformers (4.18.0)
  • tqdm (4.64.0)
  • wandb (0.12.16)
  • pylatexenc (2.10)

To install them in a fresh virtual environment:

python3 -m venv venv
source venv/bin/activate
python -m pip install --upgrade pip
python -m pip install wheel
python -m pip install -U setuptools
python -m pip install torch==1.8.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html
python -m pip install transformers==4.18.0 tqdm==4.64.0 wandb==0.12.16 pylatexenc==2.10
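
After installation, the environment can be compared against the versions listed above. This is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `check_versions` is hypothetical and not part of the repo. Local build tags such as `+cu111` are ignored in the comparison.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions as listed in the Dependencies section.
EXPECTED = {
    "torch": "1.8.0",
    "transformers": "4.18.0",
    "tqdm": "4.64.0",
    "wandb": "0.12.16",
    "pylatexenc": "2.10",
}


def check_versions(expected: dict) -> dict:
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        # Strip a local build tag like "+cu111" before comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in check_versions(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means every listed package is installed at the listed version.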

Reproducing Results

Checkpoints for four trained models can be found here. The following commands evaluate each model with k=400 (--k_mentions 400):

python eval.py --checkpoint checkpoints/max_no_preprocessing.pt --pooling max --k_mentions 400
python eval.py --checkpoint checkpoints/mean_no_preprocessing.pt --pooling mean --k_mentions 400
python eval.py --checkpoint checkpoints/max_latex2text.pt --pooling max --preprocessing latex2text --k_mentions 400
python eval.py --checkpoint checkpoints/mean_latex2text.pt --pooling mean --preprocessing latex2text --k_mentions 400
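
The four commands above differ only in checkpoint, pooling mode, and preprocessing, so they can also be generated programmatically. The sketch below is an illustration, not a script shipped with the repo: `build_eval_command` and `RUNS` are hypothetical names, and only the flags shown in the commands above are used. It prints the commands as a dry run; each printed list can be passed to `subprocess.run(cmd, check=True)` to execute it.

```python
import shlex

# (checkpoint path, pooling mode, preprocessing or None), taken from the commands above.
RUNS = [
    ("checkpoints/max_no_preprocessing.pt", "max", None),
    ("checkpoints/mean_no_preprocessing.pt", "mean", None),
    ("checkpoints/max_latex2text.pt", "max", "latex2text"),
    ("checkpoints/mean_latex2text.pt", "mean", "latex2text"),
]


def build_eval_command(checkpoint, pooling, preprocessing=None, k_mentions=400):
    """Build the argument list for one eval.py invocation."""
    cmd = ["python", "eval.py", "--checkpoint", checkpoint, "--pooling", pooling]
    if preprocessing is not None:
        cmd += ["--preprocessing", preprocessing]
    cmd += ["--k_mentions", str(k_mentions)]
    return cmd


if __name__ == "__main__":
    # Dry run: print each command instead of executing it.
    for checkpoint, pooling, preprocessing in RUNS:
        print(shlex.join(build_eval_command(checkpoint, pooling, preprocessing)))
```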

Training

Example of training a model with max pooling and no preprocessing:

python train.py --learning_rate 7e-5 --seed_model 3 --num_epochs 60 --k_mentions 50 --pooling max --candidate_downsampling 1000

Example of training a model with max pooling and latex2text preprocessing:

python train.py --learning_rate 5e-5 --seed_model 1 --num_epochs 60 --k_mentions 50 --pooling max --candidate_downsampling 1000 --preprocessing latex2text
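
To get a feel for what LaTeX-to-text preprocessing does to an input string, the sketch below uses `LatexNodes2Text` from pylatexenc, which is listed under Dependencies. This is an assumption about the kind of conversion involved: the README does not show how the repo calls pylatexenc, and the helper name `latex_to_plain_text` is hypothetical. If pylatexenc is not installed, the helper returns the input unchanged.

```python
def latex_to_plain_text(text: str) -> str:
    """Convert LaTeX markup to plain text if pylatexenc is available."""
    try:
        from pylatexenc.latex2text import LatexNodes2Text
    except ImportError:
        # Fallback (assumption for this sketch): leave the text untouched.
        return text
    return LatexNodes2Text().latex_to_text(text)


if __name__ == "__main__":
    sample = r"Let $\alpha$ denote the learning rate."
    print(latex_to_plain_text(sample))
```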

Relevant Code per Section in Paper

Section 3.1 Input Encoding

Covered in models/data.py (lines 126-147)

Section 3.2 Soft Mention Detection

Covered in models/base_model.py (lines 68-127)

Section 3.3 Relation Extraction

Covered in models/base_model.py (lines 129-151)

Section 3.4 Entity Type Classification

Covered in models/base_model.py (lines 232-244)

Cite

If you use the code in this repo, please cite this paper:

@inproceedings{popovic_semeval_2022, 
 title = "AIFB-WebScience at SemEval-2022 Task 12: Relation Extraction First - Using Relation Extraction to Identify Entities", 
 author = "Popovic, Nicholas and Laurito, Walter and Färber, Michael",
 booktitle = "Proceedings of the 16th International Workshop on Semantic Evaluation ({S}em{E}val-2022)", 
 year = "2022", 
 publisher = "Association for Computational Linguistics"
}
