ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks



About ComQA

Compositional Question Answering (ComQA), where the answer is composed of several discontinuous parts of a document, is ubiquitous in current QA systems. An example is shown below: the answer consists of two subtitles from the page and a sentence from the first paragraph, which makes current span-extraction-based machine reading comprehension systems hard to apply:



We propose a large-scale QA dataset containing more than 200,000 compositional question-answer pairs. The questions in ComQA come either from user queries issued to Sogou Search or from the titles of web pages. We collected nearly 300,000 web pages and employed crowdworkers to annotate them. We developed a web-based annotation interface; a snapshot is shown below. The answers are discontinuous nodes from the page's HTML.
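
As a rough illustration of what "discontinuous nodes" means, the sketch below assembles an answer from the nodes marked as relevant. The flat list of node texts, the 0/1 labels, the sample page, and the `compose_answer` helper are all assumptions made for this example; they are not the actual ComQA file format or API.

```python
from typing import List


def compose_answer(nodes: List[str], labels: List[int]) -> str:
    """Join the text of every node labelled 1, keeping document order."""
    if len(nodes) != len(labels):
        raise ValueError("nodes and labels must have the same length")
    return "\n".join(text for text, keep in zip(nodes, labels) if keep == 1)


# Hypothetical page: two subtitles and one sentence form the answer,
# and they are not adjacent in the document.
page_nodes = [
    "How to reset a router",        # page title
    "Step 1: Unplug the device",    # subtitle (part of the answer)
    "Wait for thirty seconds.",     # sentence (part of the answer)
    "Related articles",             # navigation text
    "Step 2: Plug it back in",      # subtitle (part of the answer)
]
node_labels = [0, 1, 1, 0, 1]

answer = compose_answer(page_nodes, node_labels)
print(answer)
```

Because the selected nodes are not contiguous, a single start/end span cannot cover them without also including the unrelated node in between.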



Finally, we get nearly 230,000 question-answer pairs:

Training data    Development data    Test data
117,343          5,000               2,054

The questions in ComQA span a wide range of categories, such as medicine, education, and entertainment.



You can obtain the ComQA data in the data directory (distributed under the CC BY-SA 4.0 license).

Run the models

In this repository we provide four types of models for ComQA: BERT, our re-implementation of BERT that handles long sequences, QANet, and our proposed hierarchical graph neural networks.

Requirements: install NVIDIA Apex before running the code:

$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_multihead_attn" ./
$ pip install -r requirements.txt

Training and evaluation:

sh train_xxx.sh

where xxx stands for the model name, such as bert. The bash script first preprocesses the data and then runs the training code. Finally, evaluation is conducted on the test set. The BERT training script is shown below as an example.

#!/bin/bash
# Train and evaluate BERT on ComQA: preprocess, train, then evaluate on the test set.
base_dir=$(pwd)
jobname='bert for ComQA'
# Feature file and model paths
train_features_path="${base_dir}/data/train.bert.obj"
dev_features_path="${base_dir}/data/dev.bert.obj"
test_features_path="${base_dir}/data/test.bert.obj"
model_save_path="${base_dir}/model/bert.comqa.base.th"
echo "$jobname"
echo "start processing data"
echo "$train_features_path"
echo "$dev_features_path"
echo "$test_features_path"

# Step 1: preprocess the data
cd process
python3 process_bert.py --train="${train_features_path}" --dev="${dev_features_path}" --test="${test_features_path}"
cd ../train

# Step 2: distributed training with 4 processes on this node
echo "start training"
python3.6 -m torch.distributed.launch --nproc_per_node=4 bert.py \
--train_file_path="${train_features_path}" \
--dev_file_path="${dev_features_path}" \
--model_save_path="${model_save_path}" \
--epoch=10 \
--pretrain_model="${base_dir}/model/bert.base.th"

# Step 3: evaluate the saved model on the test set
cd ../evaluation
echo "start evaluation"
python3 evaluate_bert.py "${test_features_path}" "${model_save_path}"
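
The three stages of the script above (preprocess, train, evaluate) can also be written down as data, which is handy for inspecting the commands before running anything. The sketch below only builds the command lists and does not execute them. The `build_stages` helper and the `/path/to/ComQA` directory are hypothetical, and the idea that other models follow the same `process_<model>.py`, `<model>.py`, `evaluate_<model>.py` naming and path pattern is an assumption; only the bert values are taken from the script.

```python
from pathlib import PurePosixPath
from typing import List, Tuple

Stage = Tuple[str, List[str]]  # (working directory, argv)


def build_stages(base_dir: str, model: str = "bert") -> List[Stage]:
    """Mirror the preprocess / train / evaluate stages of the bash script."""
    base = PurePosixPath(base_dir)
    train = str(base / "data" / f"train.{model}.obj")
    dev = str(base / "data" / f"dev.{model}.obj")
    test = str(base / "data" / f"test.{model}.obj")
    saved = str(base / "model" / f"{model}.comqa.base.th")
    pretrained = str(base / "model" / f"{model}.base.th")
    return [
        # Stage 1: preprocessing, run from the process directory
        (str(base / "process"),
         ["python3", f"process_{model}.py",
          f"--train={train}", f"--dev={dev}", f"--test={test}"]),
        # Stage 2: distributed training, run from the train directory
        (str(base / "train"),
         ["python3.6", "-m", "torch.distributed.launch", "--nproc_per_node=4",
          f"{model}.py",
          f"--train_file_path={train}", f"--dev_file_path={dev}",
          f"--model_save_path={saved}", "--epoch=10",
          f"--pretrain_model={pretrained}"]),
        # Stage 3: evaluation on the test set, run from the evaluation directory
        (str(base / "evaluation"),
         ["python3", f"evaluate_{model}.py", test, saved]),
    ]


stages = build_stages("/path/to/ComQA")  # placeholder directory
for cwd, argv in stages:
    print(cwd, " ".join(argv))
```

Each pair could then be passed to `subprocess.run(argv, cwd=cwd, check=True)` to run the stages in order and stop at the first failure.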

Results

  • Test set
Model            Precision   Recall   F1     Accuracy   BLEU-4
LSTM             83.4        46.1     59.4   28.2       41.4
QANet            71.6        62.4     66.3   35.7       48.8
BERT_official    81.1        66.2     72.9   38.4       54.9
BERT_base        79.6        74.4     77.5   45.2       58.3
BERT_large       80.4        75.5     78.3   47.3       61.2
HGNN_base        80.2        75.4     78.0   46.2       60.7
HGNN_large       81.3        76.9     79.6   48.3       61.9
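
This README does not spell out how the metrics in the table are computed. As a minimal sketch, assuming precision, recall, and F1 are measured over the sets of selected answer nodes and accuracy means an exact match of the whole node set, the computation for one question could look like the following. The `node_prf` and `exact_match` helpers and the sample node ids are hypothetical, and the repository's evaluation scripts may aggregate scores differently.

```python
from typing import Set, Tuple


def node_prf(pred: Set[int], gold: Set[int]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over predicted vs. gold node ids."""
    tp = len(pred & gold)  # nodes that are both predicted and correct
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def exact_match(pred: Set[int], gold: Set[int]) -> bool:
    """Assumed accuracy criterion: the predicted node set equals the gold set."""
    return pred == gold


# Hypothetical question: the model finds three of the four gold nodes.
pred_nodes = {1, 2, 4}
gold_nodes = {1, 2, 3, 4}
p, r, f1 = node_prf(pred_nodes, gold_nodes)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} exact={exact_match(pred_nodes, gold_nodes)}")
```

Under this reading, a prediction can score high precision and recall while still counting as wrong for accuracy, which would be consistent with the accuracy column being much lower than F1.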

Citation

If you use ComQA in your research, please cite our work with the following BibTeX entry:

@inproceedings{comqabingningwang,
  author    = {Bingning Wang and
               Ting Yao and
               Weipeng Chen and
               Jingfang Xu and
               Xiaochuan Wang},
  title     = {ComQA: Compositional Question Answering via Hierarchical Graph Neural Networks},
  booktitle = {The Web Conference 2021 (WWW2021)},
  year      = {2021}
}

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
