esmailza / deer

This project forked from jeffhj/deer

The implementation for "DEER: Descriptive Knowledge Graph for Explaining Entity Relationships" (EMNLP '22)

License: Apache License 2.0

Shell 0.23% Python 99.77%

deer's Introduction

DEER🦌

The code and data for "DEER: Descriptive Knowledge Graph for Explaining Entity Relationships" (EMNLP '22)

Introduction

We propose DEER🦌 (Descriptive Knowledge Graph for Explaining Entity Relationships), an open and informative form of modeling entity relationships. In DEER, relationships between entities are represented by free-text relation descriptions. For instance, the relationship between the entities "machine learning" and "algorithm" is represented as "Machine learning explores the study and construction of algorithms that can learn from and make predictions on data."
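
To make the idea concrete, here is a minimal sketch of a graph whose directed edges carry free-text descriptions. The `DescriptiveGraph` class and its method names are hypothetical and chosen only for illustration; they are not the storage format used by this repo's `digraph.pickle`.

```python
from collections import defaultdict


class DescriptiveGraph:
    """Hypothetical directed graph whose edges carry free-text relation descriptions."""

    def __init__(self):
        # edges[head][tail] -> list of sentences describing the pair (head, tail)
        self._edges = defaultdict(lambda: defaultdict(list))

    def add_description(self, head, tail, sentence):
        """Attach one relation description to the directed edge head -> tail."""
        self._edges[head][tail].append(sentence)

    def describe(self, head, tail):
        """Return all descriptions for the directed pair, or [] if there are none."""
        return list(self._edges.get(head, {}).get(tail, []))


graph = DescriptiveGraph()
graph.add_description(
    "machine learning",
    "algorithm",
    "Machine learning explores the study and construction of algorithms "
    "that can learn from and make predictions on data.",
)
```

Because the edge is directed, `graph.describe("algorithm", "machine learning")` returns an empty list in this sketch unless a description is added in that direction as well.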

Requirements

See requirements.txt

Data

Data are available on this link

Relation Description Extraction

  1. Assuming you have already downloaded the Wikipedia dump and preprocessed it with WikiExtractor, extract the candidate relation descriptions by running

    python extract_wiki.py preprocess_wikipedia [Wikipedia/folder]
    

    This step may take 26 hours to finish. When it is done, you will get a digraph.pickle file under the extract_wiki folder.

    You may directly download the digraph.pickle file from this link

  2. [Optional] DEER is a directed graph. You may find it useful to have an undirected version for later steps. To convert the directed graph into an undirected one, first make sure that the file extract_wiki/digraph.pickle exists, then run

    python extract_wiki.py convert_dir_to_undir
    

    and you will get a graph.pickle file under the extract_wiki folder.

  3. To generate the dataset, run

    python extract_wiki.py collect_sample [source graph directed(true/false)] [target graph directed(true/false)] [context_threshold]
    

    To get the dataset used to train the model in the paper, use the default values by running

    python extract_wiki.py collect_sample false true 0.75
    

    and you will get a dataset file named dataset_0.50_undir2dir_0.75.json under the extract_wiki folder.

  4. To get the train/dev/test split, run

    python extract_wiki.py split_dataset [dataset_file] [prefix]
    

    where [dataset_file] is the dataset file generated in the previous step and [prefix] is an optional string prepended to the names of the generated files. You will get the [prefix]_train/dev/test.json files when this command finishes.
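
The four steps above can be chained from a small driver script. The following is a minimal sketch, assuming it is run from the repository root; the helper names `build_pipeline_commands` and `run_pipeline` are hypothetical, and the default dataset path is simply the file name mentioned in step 3. Only the sub-commands and arguments documented above are used.

```python
import subprocess
import sys

# File name taken from step 3 above; adjust it if your settings differ.
DEFAULT_DATASET = "extract_wiki/dataset_0.50_undir2dir_0.75.json"


def build_pipeline_commands(wiki_dir, dataset_file=DEFAULT_DATASET,
                            prefix=None, python=sys.executable):
    """Return the documented extract_wiki.py commands, in order, as argv lists."""
    base = [python, "extract_wiki.py"]
    commands = [
        base + ["preprocess_wikipedia", wiki_dir],           # step 1
        base + ["convert_dir_to_undir"],                     # step 2 (optional)
        base + ["collect_sample", "false", "true", "0.75"],  # step 3 (paper defaults)
    ]
    split = base + ["split_dataset", dataset_file]           # step 4
    if prefix:
        split.append(prefix)  # [prefix] is optional
    commands.append(split)
    return commands


def run_pipeline(wiki_dir, **kwargs):
    """Run each step in sequence and stop at the first failing command."""
    for argv in build_pipeline_commands(wiki_dir, **kwargs):
        subprocess.run(argv, check=True)
```

Calling `run_pipeline("path/to/wikipedia", prefix="deer")` would execute all four steps. If you downloaded digraph.pickle instead of building it, drop the first command from the list before running.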

Relation Description Generation

Train

Train model

python model/train_reader.py --config config/train3.yaml

Test

Generate relation descriptions for entity pairs in the test set

python model/test_reader.py --config config/test3.yaml

Evaluation

Use automatic metrics to evaluate the generated sentences in the test set

python split_eval.py [model_path/in/test/config]/final_output.tsv

An output.txt file and a target.txt file will be generated after running the above command. Then run

bash RM-scorer.sh output.txt target.txt

to compare the generation outputs with the targets.
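
For readers who want to see what the splitting step amounts to, here is a minimal sketch that separates a tab-separated results file into two line-aligned text files. It is not the repo's split_eval.py: the function name `split_tsv` is hypothetical, and the column layout (generated sentence in column 0, target in column 1) is an assumption, since the format of final_output.tsv is not documented here.

```python
import csv
from pathlib import Path


def split_tsv(tsv_path, out_dir, output_col=0, target_col=1):
    """Write generated sentences to output.txt and references to target.txt.

    The column indices are assumptions; check your final_output.tsv first.
    Returns the number of sentence pairs written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    with open(tsv_path, newline="", encoding="utf-8") as src, \
         open(out_dir / "output.txt", "w", encoding="utf-8") as out_f, \
         open(out_dir / "target.txt", "w", encoding="utf-8") as tgt_f:
        for row in csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE):
            if len(row) <= max(output_col, target_col):
                continue  # skip empty or malformed rows so both files stay aligned
            out_f.write(row[output_col].strip() + "\n")
            tgt_f.write(row[target_col].strip() + "\n")
            written += 1
    return written
```

Keeping line i of output.txt paired with line i of target.txt matters because line-based scorers compare the two files position by position.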

Citation

The details of this repo are described in the following paper. If you find this repo useful, please kindly cite it:

@inproceedings{huang2022deer,
  title={DEER: Descriptive Knowledge Graph for Explaining Entity Relationships},
  author={Huang, Jie and Zhu, Kerui and Chang, Kevin Chen-Chuan and Xiong, Jinjun and Hwu, Wen-mei},
  booktitle={Proceedings of EMNLP},
  year={2022}
}

deer's People

Contributors

jeffhj, zhukerui
