gear's Introduction

GEAR

Source code and dataset for the ACL 2019 paper "GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification".

Requirements:

Please make sure your environment includes:

python (tested on 3.6.7)
pytorch (tested on 1.0.0)

Then, run the command:

pip install -r requirements.txt

Evidence Extraction

We use the code from Athene UKP TU Darmstadt for the document retrieval and sentence selection steps.

Our evidence extraction results can be found in Tsinghua Cloud or Google Cloud.

Download these files and put them in the data/retrieved/ folder. The folder should then look like this:

data/retrieved/
    train.ensembles.s10.jsonl
    dev.ensembles.s10.jsonl
    test.ensembles.s10.jsonl
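
Before moving on, it can help to confirm that all three files are in place and non-empty, since the later scripts read them directly. The following is a minimal sketch, not part of the repository; the helper name `missing_files` is hypothetical, and the file names are taken from the listing above.

```python
from pathlib import Path

# File names taken from the folder listing above.
EXPECTED_RETRIEVED = [
    "train.ensembles.s10.jsonl",
    "dev.ensembles.s10.jsonl",
    "test.ensembles.s10.jsonl",
]


def missing_files(folder, names=EXPECTED_RETRIEVED):
    """Return the expected file names that are absent or empty in `folder`.

    Hypothetical helper for illustration; it is not part of the GEAR code.
    """
    folder = Path(folder)
    return [
        name
        for name in names
        if not (folder / name).is_file() or (folder / name).stat().st_size == 0
    ]


if __name__ == "__main__":
    missing = missing_files("data/retrieved")
    print("missing or empty:", missing if missing else "none")
```

Run it from the repository root. An empty result only means the files exist; it does not check that their contents are complete.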

Data Preparation

# Download the fever database
wget -O data/fever/fever.db https://s3-eu-west-1.amazonaws.com/fever.public/wiki_index/fever.db

# Extract the evidence from database
cd scripts/
python retrieval_to_bert_input.py

# Build the datasets for gear
python build_gear_input_set.py

cd ..

Feature Extraction

First download our pretrained BERT-Pair model (Tsinghua Cloud or Google Cloud) and put the files into the pretrained_models/BERT-Pair/ folder.

Then the folder will look like this:

pretrained_models/BERT-Pair/
    pytorch_model.bin
    vocab.txt
    bert_config.json

Then run the feature extraction scripts.

cd feature_extractor/
chmod +x *.sh
./train_extractor.sh
./dev_extractor.sh
./test_extractor.sh
cd ..

GEAR Training

cd gear
CUDA_VISIBLE_DEVICES=0 python train.py
cd ..

GEAR Testing

cd gear
CUDA_VISIBLE_DEVICES=0 python test.py
cd ..

Results Gathering

cd gear
python results_scorer.py
cd ..

Cite

If you use the code, please cite our paper:

@inproceedings{zhou2019gear,
  title={GEAR: Graph-based Evidence Aggregating and Reasoning for Fact Verification},
  author={Zhou, Jie and Han, Xu and Yang, Cheng and Liu, Zhiyuan and Wang, Lifeng and Li, Changcheng and Sun, Maosong},
  booktitle={Proceedings of ACL 2019},
  year={2019}
}

gear's People

Contributors

cronopioelectronico, dependabot[bot], jayzzhou-thu

gear's Issues

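How is the evidence graph constructed?

How is the evidence graph constructed? Does each sentence count as one piece of evidence? Since the edge weights appear to be learned directly through attention, what does the graph actually look like? The paper mentions this briefly but does not go into detail.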

Question about Figure 2

Hello, could you share the code and data behind Figure 2 in the paper? I would like to study it. Thank you.

Pretrained model

Hello, where can I download the pretrained GEAR model?

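The number of MLPs in ERNet

The paper says that ERNet uses an MLP to compute attention. My reading of the paper is that each ERNet layer has two parameters, $W_{0}^{t}$ and $W_{1}^{t}$, for the MLP.
In the code, however, it looks as if two parameters $W_{0}$ and $W_{1}$ are initialized for every node in every ERNet layer: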

# each SelfAttentionLayer contains two Linear
self.attentions = [SelfAttentionLayer(nhid=nhid * 2, nins=nins) for _ in range(nins)]

So are the MLP parameters not shared within a layer?
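
One thing that can be said from Python semantics alone: a list comprehension of the form quoted above calls the constructor once per iteration, so it yields `nins` independent objects rather than one shared object. The sketch below illustrates that with a hypothetical stand-in class (`ToyLayer` is not the repository's `SelfAttentionLayer`, and `nins = 5` is an arbitrary value). Whether separate per-node parameters are what the paper intends is a question only the authors can answer.

```python
class ToyLayer:
    """Hypothetical stand-in for SelfAttentionLayer; it only holds two weights."""

    def __init__(self):
        self.w0 = [0.0]
        self.w1 = [0.0]


nins = 5  # illustrative number of nodes, not a value from the repository

# Same construction pattern as the quoted line: one new object per iteration.
per_node = [ToyLayer() for _ in range(nins)]

# A shared alternative for comparison: one object referenced nins times.
shared = [ToyLayer()] * nins

distinct_per_node = len({id(layer) for layer in per_node})
distinct_shared = len({id(layer) for layer in shared})
print(distinct_per_node, distinct_shared)  # prints "5 1"
```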

Session aborted when running feature extraction script

Hi, this error occurs when I run any of the feature extractor scripts.

Here are the last few lines of output when I run bash test_extractor.sh:

...
7f8aa4cc4000-7f8aa4cc8000 rw-p 00000000 00:00 0
7f8aa4ccc000-7f8aa5782000 r-xp 00000000 07:0b 31286264                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so
7f8aa5782000-7f8aa5981000 ---p 00ab6000 07:0b 31286264                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so
7f8aa5981000-7f8aa5995000 r--p 00ab5000 07:0b 31286264                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so
7f8aa5995000-7f8aa59b1000 rw-p 00ac9000 07:0b 31286264                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so
7f8aa59b1000-7f8aa5a05000 rw-p 00000000 00:00 0
7f8aa5a05000-7f8aa5a71000 rw-p 00d72000 07:0b 31286264                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_python.so
7f8aa5a71000-7f8aa5a79000 r-xp 00000000 07:0b 31286262                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libshm.so
7f8aa5a79000-7f8aa5c78000 ---p 00008000 07:0b 31286262                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libshm.so
7f8aa5c78000-7f8aa5c79000 r--p 00007000 07:0b 31286262                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libshm.so
7f8aa5c79000-7f8aa5c7a000 rw-p 00008000 07:0b 31286262                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libshm.so
7f8aa5c7a000-7f8aa5c80000 rw-p 0000c000 07:0b 31286262                   /root/anaconda3/lib/python3.7/site-packages/torch/lib/libshm.so
7f8aa5c80000-7f8aa5c95000 r-xp 00000000 07:0b 3255664141                 /root/anaconda3/lib/python3.7/site-packages/torch/_C.cpython-37m-x86_64-linux-gnu.so
7f8aa5c95000-7f8aa5e94000 ---p 00015000 07:0b 3255664141                 /root/anaconda3/lib/python3.7/site-packages/torch/_C.cpython-37m-x86_64-linux-gnu.so
7f8aa5e94000-7f8aa5e95000 r--p 00014000 07:0b 3255664141                 /root/anaconda3/lib/python3.7/site-packages/torch/_C.cpython-37m-x86_64-linux-gnu.so
7f8aa5e95000-7f8aa5e96000 rw-p 00015000 07:0b 3255664141                 /root/anaconda3/lib/python3.7/site-packages/torch/_C.cpython-37m-x86_64-linux-gnu.so
7f8aa5e96000-7f8aa5e9d000 rw-p 00026000 07:0b 3255664141                 /root/anaconda3/lib/python3.7/site-packages/torch/_C.cpython-37m-x86_64-linux-gnu.so
7f8aa5e9d000-7f8aa5e9f000 r--p 00000000 07:0b 2192615613                 /root/anaconda3/lib/python3.7/lib-dynload/_queue.cpython-37m-x86_64-linux-gnu.so
7f8aa5e9f000-7f8aa5ea0000 r-xp 00002000 07:0b 2192615613                 /root/anaconda3/lib/python3.7/lib-dynload/_queue.cpython-37m-x86_64-linux-gnu.so
7f8aa5ea0000-7f8aa5ea1000 r--p 00003000 07:0b 2192615613                 /root/anaconda3/lib/python3.7/lib-dynload/_queue.cpython-37m-x86_64-linux-gnu.so
test_extractor.sh: line 8:  4114 Aborted                 (core dumped) CUDA_VISIBLE_DEVICES=0 python extractor.py --input_file ../data/gear/gear-test-set-0_001.tsv --output_file ../data/gear/gear-test-set-0_001-features.tsv --bert_model ../pretrained_models/BERT-Pair/ --do_lower_case --max_seq_length 128 --batch_size 512

Does anyone know what could cause this error?

Specs:
3 core cpu, 8 GB memory, Ubuntu 16.04, 1080Ti

Document Retrieval and Knowledge Graph

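Hello, the paper mentions that the model's errors come mainly from the document retrieval and sentence selection steps. If a large number of Wikipedia documents were linked into a knowledge graph, would that help improve the model's accuracy? (I currently have about one million Wikipedia entries on hand.) Thank you!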

download fever.db problem

wget -O data/fever/fever.db https://s3-eu-west-1.amazonaws.com/fever.public/wiki_index/fever.db
--2023-04-22 23:19:50-- https://s3-eu-west-1.amazonaws.com/fever.public/wiki_index/fever.db
Resolving s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)... 52.218.28.147, 52.218.46.24, 52.218.31.35, ...
Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|52.218.28.147|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-04-22 23:19:52 ERROR 403: Forbidden.

train.ensembles.s10.jsonl broken

Hi,

Thanks for sharing the code. While running retrieval_to_bert_input.py, I found that the file train.ensembles.s10.jsonl is broken. The last line is

{"id": 105959, "verifiable": "VERIFIABLE", "label": "REFUTES", "claim": "Julie Christie was nominated for an Oscar for the 2004 film Afterglow.", "evidence": [[[124380, 138737, "Julie_Christie", 12]]], "noun_phrases": ["an Oscar", "Julie Christie", "Julie Christie was nominated for

which can't be loaded by json. Please check.

Thanks
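
A truncated download like the one described here can be detected before running the pipeline by parsing every line of the file. The following is a minimal sketch using only the standard library; the function name `find_bad_jsonl_lines` and the `limit` parameter are hypothetical and not part of the repository.

```python
import json


def find_bad_jsonl_lines(path, limit=10):
    """Return up to `limit` (line_number, error_message) pairs for unparsable lines.

    Hypothetical helper for illustration; it is not part of the GEAR code.
    """
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                json.loads(line)
            except json.JSONDecodeError as exc:
                bad.append((lineno, str(exc)))
                if len(bad) >= limit:
                    break
    return bad


if __name__ == "__main__":
    for lineno, message in find_bad_jsonl_lines("data/retrieved/train.ensembles.s10.jsonl"):
        print(f"line {lineno}: {message}")
```

If only the final line is reported, the file was most likely cut off during download, and downloading it again is the first thing to try.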

Feature Extraction problem

Hi, I tried to follow the steps and had a question at this step.
When I ran ./train_extractor.sh, I got an error: there was no file named gear-train-set-0_001.tsv in the specified folder (/data/gear/ is empty).
Where can I get these files?
Thank you!

403 Forbidden

Hello!

# Download the fever database
wget -O data/fever/fever.db https://s3-eu-west-1.amazonaws.com/fever.public/wiki_index/fever.db

The download fails with this error:
Connecting to s3-eu-west-1.amazonaws.com (s3-eu-west-1.amazonaws.com)|52.218.25.219|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2023-03-23 23:04:59 ERROR 403: Forbidden.
I have tried many things and cannot resolve it.

core dumped

When I run ./train_extractor.sh, a "core dumped" error always occurs.

GEAR model

Is there a trained model we could use just for predictions?

Bug in requirements installation

The current requirements include git+git://github.com/sheffieldnlp/fever-scorer@master, which triggers the bug pypa/setuptools#2741 if setuptools is not at a compatible version.

As explained in the linked issue, this can be fixed by using setuptools==56.1.0.

PR #16 fixes it by adding that pin to the GEAR requirements.

Same problem

database disk image is malformed
145449
0%| | 0/145449 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "retrieval_to_bert_input.py", line 123, in <module>
    process('../data/retrieved/train.ensembles.s10.jsonl', '../data/bert/bert-nli-train-retrieve-set.tsv')
  File "retrieval_to_bert_input.py", line 40, in process
    (utils.normalize(article),)
sqlite3.DatabaseError: database disk image is malformed
I am hitting a problem with that same file too. Is this an issue with the file itself?
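
The "database disk image is malformed" message comes from SQLite, so the database file can be examined on its own, independently of the GEAR scripts. The following is a minimal sketch using Python's standard sqlite3 module and SQLite's `PRAGMA integrity_check`; the function name `sqlite_integrity` is hypothetical, and the path data/fever/fever.db is the one used in the README's download step.

```python
import os
import sqlite3


def sqlite_integrity(path):
    """Return SQLite's integrity report for the database at `path`.

    A healthy database yields ["ok"]. Hypothetical helper for illustration;
    it is not part of the GEAR code.
    """
    if not os.path.isfile(path):
        # sqlite3.connect would silently create an empty database otherwise.
        raise FileNotFoundError(path)
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        return [row[0] for row in rows]
    except sqlite3.DatabaseError as exc:
        # Raised when the file is damaged too badly to be checked at all.
        return [str(exc)]
    finally:
        conn.close()


if __name__ == "__main__":
    print(sqlite_integrity("data/fever/fever.db"))
```

Anything other than `["ok"]` suggests that the file is damaged or incomplete, for example from an interrupted download.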

hi

145000
../data/retrieved/train.ensembles.s10.jsonl
145449
7%|███▌ | 10000/145449 [00:08<02:06, 1073.78it/s]Error: cant find empty 0 for 10130
7%|███▊ | 10507/145449 [00:08<01:37, 1381.50it/s]Error: cant find empty 0 for 10633
13%|██████▋ | 18805/145449 [00:15<01:53, 1120.60it/s]Error: cant find empty 0 for 18963
14%|███████▍ | 20926/145449 [00:16<01:23, 1494.50it/s]
I downloaded the files again from Google Drive, but now I get this problem. Do you know why?

hi

git+git://github.com/j6mes/drqa@parallel
git+git://github.com/sheffieldnlp/fever-scorer@master
Are these two no longer installable via git?
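
GitHub stopped serving the unauthenticated git:// protocol in 2022, so requirement lines of this form fail regardless of the state of the repositories. One possible workaround is to rewrite them to use HTTPS. The following is a minimal sketch; the function name `use_https` is hypothetical, and it is an assumption that the two repositories and their branches are still available at the same locations.

```python
OLD_PREFIX = "git+git://github.com/"
NEW_PREFIX = "git+https://github.com/"


def use_https(requirement_line):
    """Rewrite a pip VCS requirement from the git:// protocol to https://.

    Lines that do not use git:// on github.com are returned unchanged.
    Hypothetical helper for illustration; it is not part of the GEAR code.
    """
    return requirement_line.replace(OLD_PREFIX, NEW_PREFIX)


if __name__ == "__main__":
    for line in [
        "git+git://github.com/j6mes/drqa@parallel",
        "git+git://github.com/sheffieldnlp/fever-scorer@master",
    ]:
        print(use_https(line))
```

The rewritten lines can then replace the originals in requirements.txt before running pip install -r requirements.txt.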

BERT-Pair

Hello! Are the code and data for pretraining BERT-Pair available?
