
piqa's Introduction

Phrase-Indexed Question Answering (PIQA)

@inproceedings{seo2018phrase,
 title={Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension},
 author={Seo, Minjoon and Kwiatkowski, Tom and Parikh, Ankur P and Farhadi, Ali and Hajishirzi, Hannaneh},
 booktitle={EMNLP},
 year={2018}
}

Introduction

We will assume that you have read the paper, though we will try to recap it here. The PIQA challenge is about approaching (existing) extractive question answering tasks via a phrase retrieval mechanism (we plan to hold the challenge for several extractive QA datasets in the near future, though we currently only support PIQA for SQuAD 1.1). This means we need the following three components (a minimal interface sketch follows the list):

  1. document encoder: enumerates a list of (phrase, vector) pairs from the document,
  2. question encoder: maps each question to the same vector space, and
  3. retrieval: retrieves the (phrasal) answer to the question by performing nearest neighbor search on the list.
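To make the shape of the task concrete, here is a minimal interface sketch in Python. The function names are hypothetical, not the repo's actual API; the point is only that the two encoders never see each other's input.

from typing import List, Tuple

import numpy as np

def document_encoder(document: str) -> List[Tuple[str, np.ndarray]]:
    """Enumerate (phrase, vector) pairs from the document alone."""
    raise NotImplementedError

def question_encoder(question: str) -> np.ndarray:
    """Map the question into the same vector space as the phrases."""
    raise NotImplementedError

def retrieve(question: str, pairs: List[Tuple[str, np.ndarray]]) -> str:
    """Answer by nearest-neighbor search over the enumerated phrases."""
    q = question_encoder(question)
    scores = [float(np.dot(vec, q)) for _, vec in pairs]
    return pairs[int(np.argmax(scores))][0]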

While the challenge shares some similarities with document retrieval, a classic problem in the information retrieval literature, a key difference is that the phrase representations need to be context-dependent, which is more challenging than obtaining an embedding from the phrase's content alone.

An important aspect of the challenge is the constraint of independence between the document encoder and the question encoder. As we noted in our paper, most existing models rely heavily on question-dependent representations of the context document. Phrase representations in PIQA, by contrast, must be completely independent of the input question. Not only does this make the challenge quite difficult, but it also means that state-of-the-art models cannot be used directly for the task. Hence we have proposed a few reasonable baseline models as a starting point, which can be found in this repository.

Note that it is also not straightforward to strictly enforce this constraint on an evaluation platform such as CodaLab. For instance, the current SQuAD 1.1 evaluator simply provides the test dataset (both context and question) without answers and asks the model to output predictions, which are then compared against the answers. This setup is not suitable for PIQA because we cannot know whether the submitted model abides by the independence constraint. To resolve this issue, a submission should consist of the two encoders with explicit independence, and the retrieval is performed on the evaluator side. While this is not as convenient as a vanilla SQuAD submission, it strictly enforces the independence constraint.
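To illustrate this two-sided flow, here is a hedged sketch (paths, shapes, and file names are placeholders, not the official evaluator protocol): each encoder dumps its vectors to disk independently, and retrieval over the dumps happens on the evaluator side.

import os

import numpy as np

os.makedirs('/tmp/piqa/context_emb', exist_ok=True)
os.makedirs('/tmp/piqa/question_emb', exist_ok=True)

# Document side: phrase vectors only, with no access to the questions.
phrase_vectors = np.random.randn(1008, 1024)            # placeholder output
np.savez('/tmp/piqa/context_emb/doc1.npz', phrase_vectors)

# Question side: a question vector only, with no access to the documents.
question_vector = np.random.randn(1024)                 # placeholder output
np.savez('/tmp/piqa/question_emb/q1.npz', question_vector)

# Evaluator side: nearest-neighbor retrieval over the two dumps.
c_emb = np.load('/tmp/piqa/context_emb/doc1.npz')['arr_0']
q_emb = np.load('/tmp/piqa/question_emb/q1.npz')['arr_0']
pred_idx = int(np.matmul(c_emb, q_emb.T).argmax(0))     # index of predicted phrase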

Tasks

piqa's People

Contributors

jhyuklee, seominjoon


piqa's Issues

TypeError: argument of type 'method' is not iterable

Training using

$ python main.py baseline --cuda

and testing with

$ python main.py baseline --cuda --mode test --iteration 501

raises the following error:

...
Model loaded from /tmp/piqa/squad/save/501/model.pt
Traceback (most recent call last):
  File "main.py", line 257, in <module>
    main()
  File "main.py", line 236, in main
    test(args)
  File "main.py", line 162, in test
    test_dataset = tuple(processor.preprocess(example) for example in test_examples)
  File "main.py", line 162, in <genexpr>
    test_dataset = tuple(processor.preprocess(example) for example in test_examples)
  File "/home/jinhyuk/github/piqa/squad/baseline/processor.py", line 124, in preprocess
    context_word_idxs = tuple(map(self._word2idx, context_words))
  File "/home/jinhyuk/github/piqa/squad/baseline/processor.py", line 276, in _word2idx
    return self._word2idx_dict[word] if word in self._word2idx_dict else 1
TypeError: argument of type 'method' is not iterable

Code version: fa113ab
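For reference, a minimal illustration of this failure mode, assuming _word2idx_dict ends up bound to a method instead of a dict when the processor is restored:

class Processor:
    def _word2idx_dict(self):   # a method where a dict attribute is expected
        return {}

p = Processor()
'the' in p._word2idx_dict       # TypeError: argument of type 'method' is not iterable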

DataLoader with args.cache does not load elmo idx

https://github.com/uwnlp/piqa/blob/3a3404d82bf61a07241035eaf64be10233e266dd/squad/baseline/processor.py#L215-L237

The collate function uses self._elmo to check whether ELMo is enabled, but the cached Processor's _elmo is set to False.
(The cache was saved when processing SA+ELMo.)

The error log is as follows:

$ python main.py baseline --cuda --mode embed_question --iteration 501 --test_path $SQUAD_DEV_QUESTION_PATH --elmo --num_heads 2 --batch_size 32 --cache
...
 'train_path': '/home/jinhyuk/data/squad/train-v1.1.json',
 'train_steps': 0,
 'word_vocab_size': 10000}
Model loaded from /tmp/piqa/squad/save/501/model.pt
Saving embeddings
Traceback (most recent call last):
  File "main.py", line 277, in <module>
    main()
  File "main.py", line 258, in main
    embed(args)
  File "main.py", line 240, in embed
    question_output = model.get_question(**test_batch)
  File "/home/jinhyuk/github/piqa/squad/baseline/model.py", line 285, in get_question
    q = self.question_embedding(question_char_idxs, question_glove_idxs, question_word_idxs, ex=question_elmo_idxs)
  File "/home/jinhyuk/anaconda3/envs/p3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jinhyuk/github/piqa/squad/baseline/model.py", line 98, in forward
    elmo, = self.elmo(ex)['elmo_representations']
  File "/home/jinhyuk/anaconda3/envs/p3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jinhyuk/anaconda3/envs/p3.6/lib/python3.6/site-packages/allennlp/modules/elmo.py", line 133, in forward
    original_shape = inputs.size()
AttributeError: 'NoneType' object has no attribute 'size'
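A possible workaround, sketched as a helper (hypothetical, not code from the repo): re-synchronize the cached Processor's stale flag with the current run before batching.

def fix_cached_processor(processor, args):
    # The Processor was cached with _elmo == False, so the collate function
    # never builds question_elmo_idxs; overwrite the flag from --elmo.
    processor._elmo = args.elmo
    return processor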

Performance difference between evaluate.py vs piqa_evaluate.py

The performance numbers from the two evaluation scripts differ as follows:

$ python evaluate.py $SQUAD_DEV_PATH /tmp/piqa/pred.json 
{"exact_match": 52.81929990539262, "f1": 63.28879733489547}
$ python piqa_evaluate.py $SQUAD_DEV_PATH /tmp/piqa/context_emb/ /tmp/piqa/question_emb/
{"exact_match": 52.28949858088931, "f1": 62.72236634535493}

The difference is about 0.5~0.6 points, and the tested model is LSTM+SA+ELMo.

leaderboard?

Hey,
The arXiv paper links to this repo for the leaderboard, yet none is available. Have no valid submissions been received yet?

Thanks

Error while running piqa_evaluate.py

$ python piqa_evaluate.py $SQUAD_DEV_PATH /tmp/piqa/context_emb/ /tmp/piqa/question_emb/
Traceback (most recent call last):
  File "piqa_evaluate.py", line 151, in <module>
    progress=args.progress)
  File "piqa_evaluate.py", line 123, in get_predictions
    m = sim.max(1)
  File "/home/jinhyuk/anaconda3/envs/p3.6/lib/python3.6/site-packages/numpy/core/_methods.py", line 28, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial)
numpy.core._internal.AxisError: axis 1 is out of bounds for array of dimension 1

Changing the code from

https://github.com/uwnlp/piqa/blob/c6e871919a6c664d53db58fd2d7c047843ce2e75/piqa_evaluate.py#L119-L125
to

119         else:                                                                                                
120             q_emb = q_emb['arr_0']                                                                           
121             c_emb = c_emb['arr_0']                                                                           
122             m = np.matmul(c_emb, q_emb.T)                                                                  
123             # m = sim.max(1)                                                                                 
124                                                                                                              
125         argmax = m.argmax(0)

simply solves the problem, but I don't know why the original code was written that way.
With q_emb.shape = (1024,) and c_emb.shape = (1008, 1024), the matmul gives sim.shape = (1008,),
so there is no axis 1 to take a max over, and just taking argmax(0) seems fine. A minimal reproduction follows.
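import numpy as np

# Minimal reproduction of the axis error, using the shapes reported above:
c_emb = np.zeros((1008, 1024))
q_emb = np.zeros(1024)
sim = np.matmul(c_emb, q_emb.T)  # q_emb is 1-D, so sim has shape (1008,)
sim.max(1)                       # AxisError: axis 1 is out of bounds for array of dimension 1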

About computation of Elmo

Hi,

Thanks for uploading the code!

It seems that in model.py you are maintaining the scale and mixing parameters for computing the averaged ELMo representation. However, from my understanding of allennlp.modules.elmo.Elmo, this should already be taken care of by the Elmo class.

Is there any reason for re-mixing the already mixed Elmo representations returned? Or am I misunderstanding something?
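For reference, my understanding of the API (the file paths below are placeholders): Elmo applies its own learned scalar mix over the biLM layers, so the returned representations are already mixed.

from allennlp.modules.elmo import Elmo, batch_to_ids

elmo = Elmo('options.json', 'weights.hdf5',
            num_output_representations=1, dropout=0)
character_ids = batch_to_ids([['Phrase', 'indexed', 'question', 'answering']])
mixed, = elmo(character_ids)['elmo_representations']  # (batch, seq_len, 1024)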

Thanks!
Bhuwan

model.init(metadata) for test/embed

An error occurs because model.elmo is not initialized when loading models trained with ELMo.
model.init(metadata) should be called in the test() and embed() functions, as it is in the train() function.
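A sketch of the suggested fix (build_model_and_metadata is a hypothetical stand-in for the existing setup code in main.py, not the repo's actual function):

def build_model_and_metadata(args):
    """Placeholder for the model/metadata setup that train() already performs."""
    raise NotImplementedError

def test(args):
    model, metadata = build_model_and_metadata(args)
    model.init(metadata)  # constructs model.elmo before the checkpoint is loaded
    # ... then load the saved model and run evaluation as before ...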
