
piqa's Introduction

Phrase-Indexed Question Answering (PIQA)

@inproceedings{seo2018phrase,
 title={Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension},
 author={Seo, Minjoon and Kwiatkowski, Tom and Parikh, Ankur P and Farhadi, Ali and Hajishirzi, Hannaneh},
 booktitle={EMNLP},
 year={2018}
}

Introduction

We will assume that you have read the paper, though we will try to recap it here. The PIQA challenge is about approaching (existing) extractive question answering tasks via a phrase retrieval mechanism (we plan to hold the challenge for several extractive QA datasets in the near future, though we currently only support PIQA for SQuAD 1.1). This means we need the following three components (a minimal interface sketch follows the list):

  1. document encoder: enumerates a list of (phrase, vector) pairs from the document,
  2. question encoder: maps each question to the same vector space, and
  3. retrieval: retrieves the (phrasal) answer to the question by performing nearest neighbor search on the list.
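To make the shape of the task concrete, here is a minimal interface sketch in Python. The function names are hypothetical, not the repo's actual API; the point is only that the two encoders never see each other's input.

from typing import List, Tuple

import numpy as np

def document_encoder(document: str) -> List[Tuple[str, np.ndarray]]:
    """Enumerate (phrase, vector) pairs from the document alone."""
    raise NotImplementedError

def question_encoder(question: str) -> np.ndarray:
    """Map the question into the same vector space as the phrases."""
    raise NotImplementedError

def retrieve(question: str, pairs: List[Tuple[str, np.ndarray]]) -> str:
    """Answer by nearest-neighbor search over the enumerated phrases."""
    q = question_encoder(question)
    scores = [float(np.dot(vec, q)) for _, vec in pairs]
    return pairs[int(np.argmax(scores))][0]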

While the challenge shares some similarities with document retrieval, a classic problem in the information retrieval literature, a key difference is that the phrase representations need to be context-dependent, which is more challenging than obtaining an embedding from the phrase's content alone.

An important aspect of the challenge is the constraint of independence between the document encoder and the question encoder. As we noted in our paper, most existing models rely heavily on question-dependent representations of the context document. Phrase representations in PIQA, by contrast, must be completely independent of the input question. Not only does this make the challenge quite difficult, but it also means that state-of-the-art models cannot be used directly for the task. Hence we have proposed a few reasonable baseline models as a starting point, which can be found in this repository.

Note that it is also not straightforward to strictly enforce this constraint on an evaluation platform such as CodaLab. For instance, the current SQuAD 1.1 evaluator simply provides the test dataset (both context and question) without answers and asks the model to output predictions, which are then compared against the answers. This setup is not suitable for PIQA because we cannot know whether the submitted model abides by the independence constraint. To resolve this issue, a submission should consist of the two encoders with explicit independence, and the retrieval is performed on the evaluator side. While this is not as convenient as a vanilla SQuAD submission, it strictly enforces the independence constraint.
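To illustrate this two-sided flow, here is a hedged sketch (paths, shapes, and file names are placeholders, not the official evaluator protocol): each encoder dumps its vectors to disk independently, and retrieval over the dumps happens on the evaluator side.

import os

import numpy as np

os.makedirs('/tmp/piqa/context_emb', exist_ok=True)
os.makedirs('/tmp/piqa/question_emb', exist_ok=True)

# Document side: phrase vectors only, with no access to the questions.
phrase_vectors = np.random.randn(1008, 1024)            # placeholder output
np.savez('/tmp/piqa/context_emb/doc1.npz', phrase_vectors)

# Question side: a question vector only, with no access to the documents.
question_vector = np.random.randn(1024)                 # placeholder output
np.savez('/tmp/piqa/question_emb/q1.npz', question_vector)

# Evaluator side: nearest-neighbor retrieval over the two dumps.
c_emb = np.load('/tmp/piqa/context_emb/doc1.npz')['arr_0']
q_emb = np.load('/tmp/piqa/question_emb/q1.npz')['arr_0']
pred_idx = int(np.matmul(c_emb, q_emb.T).argmax(0))     # index of predicted phrase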

Tasks

piqa's People

Contributors

jhyuklee, seominjoon


piqa's Issues

TypeError: argument of type 'method' is not iterable

Training using

$ python main.py baseline --cuda

and testing with

$ python main.py baseline --cuda --mode test --iteration 501

raises the following error:

...
Model loaded from /tmp/piqa/squad/save/501/model.pt
Traceback (most recent call last):
  File "main.py", line 257, in <module>
    main()
  File "main.py", line 236, in main
    test(args)
  File "main.py", line 162, in test
    test_dataset = tuple(processor.preprocess(example) for example in test_examples)
  File "main.py", line 162, in <genexpr>
    test_dataset = tuple(processor.preprocess(example) for example in test_examples)
  File "/home/jinhyuk/github/piqa/squad/baseline/processor.py", line 124, in preprocess
    context_word_idxs = tuple(map(self._word2idx, context_words))
  File "/home/jinhyuk/github/piqa/squad/baseline/processor.py", line 276, in _word2idx
    return self._word2idx_dict[word] if word in self._word2idx_dict else 1
TypeError: argument of type 'method' is not iterable

Code version: fa113ab
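For reference, a minimal illustration of this failure mode, assuming _word2idx_dict ends up bound to a method instead of a dict when the processor is restored:

class Processor:
    def _word2idx_dict(self):   # a method where a dict attribute is expected
        return {}

p = Processor()
'the' in p._word2idx_dict       # TypeError: argument of type 'method' is not iterable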

DataLoader with args.cache does not load elmo idx

https://github.com/uwnlp/piqa/blob/3a3404d82bf61a07241035eaf64be10233e266dd/squad/baseline/processor.py#L215-L237

The collate function uses self._elmo to check whether ELMo is enabled, but the cached Processor's _elmo is set to False.
(The cache was saved when processing SA+ELMo.)

The error log is as follows:

$ python main.py baseline --cuda --mode embed_question --iteration 501 --test_path $SQUAD_DEV_QUESTION_PATH --elmo --num_heads 2 --batch_size 32 --cache
...
 'train_path': '/home/jinhyuk/data/squad/train-v1.1.json',
 'train_steps': 0,
 'word_vocab_size': 10000}
Model loaded from /tmp/piqa/squad/save/501/model.pt
Saving embeddings
Traceback (most recent call last):
  File "main.py", line 277, in <module>
    main()
  File "main.py", line 258, in main
    embed(args)
  File "main.py", line 240, in embed
    question_output = model.get_question(**test_batch)
  File "/home/jinhyuk/github/piqa/squad/baseline/model.py", line 285, in get_question
    q = self.question_embedding(question_char_idxs, question_glove_idxs, question_word_idxs, ex=question_elmo_idxs)
  File "/home/jinhyuk/anaconda3/envs/p3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jinhyuk/github/piqa/squad/baseline/model.py", line 98, in forward
    elmo, = self.elmo(ex)['elmo_representations']
  File "/home/jinhyuk/anaconda3/envs/p3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/jinhyuk/anaconda3/envs/p3.6/lib/python3.6/site-packages/allennlp/modules/elmo.py", line 133, in forward
    original_shape = inputs.size()
AttributeError: 'NoneType' object has no attribute 'size'
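A possible workaround, sketched as a helper (hypothetical, not code from the repo): re-synchronize the cached Processor's stale flag with the current run before batching.

def fix_cached_processor(processor, args):
    # The Processor was cached with _elmo == False, so the collate function
    # never builds question_elmo_idxs; overwrite the flag from --elmo.
    processor._elmo = args.elmo
    return processor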

Performance difference between evaluate.py vs piqa_evaluate.py

The performance numbers from the two evaluation scripts differ as follows:

$ python evaluate.py $SQUAD_DEV_PATH /tmp/piqa/pred.json 
{"exact_match": 52.81929990539262, "f1": 63.28879733489547}
$ python piqa_evaluate.py $SQUAD_DEV_PATH /tmp/piqa/context_emb/ /tmp/piqa/question_emb/
{"exact_match": 52.28949858088931, "f1": 62.72236634535493}

The difference is about 0.5~0.6 points, and the tested model is LSTM+SA+ELMo.

leaderboard?

Hey,
The arXiv paper links to this repo for the leaderboard, yet none is available. Have no valid submissions been received yet?

Thanks

Error while running piqa_evaluate.py

$ python piqa_evaluate.py $SQUAD_DEV_PATH /tmp/piqa/context_emb/ /tmp/piqa/question_emb/
Traceback (most recent call last):
  File "piqa_evaluate.py", line 151, in <module>
    progress=args.progress)
  File "piqa_evaluate.py", line 123, in get_predictions
    m = sim.max(1)
  File "/home/jinhyuk/anaconda3/envs/p3.6/lib/python3.6/site-packages/numpy/core/_methods.py", line 28, in _amax
    return umr_maximum(a, axis, None, out, keepdims, initial)
numpy.core._internal.AxisError: axis 1 is out of bounds for array of dimension 1

Changing the code from

https://github.com/uwnlp/piqa/blob/c6e871919a6c664d53db58fd2d7c047843ce2e75/piqa_evaluate.py#L119-L125
to

119         else:                                                                                                
120             q_emb = q_emb['arr_0']                                                                           
121             c_emb = c_emb['arr_0']                                                                           
122             m = np.matmul(c_emb, q_emb.T)                                                                  
123             # m = sim.max(1)                                                                                 
124                                                                                                              
125         argmax = m.argmax(0)

simply solves the problem, but I don't know why the original code was written that way.
With q_emb.shape = (1024,) and c_emb.shape = (1008, 1024), the matmul gives sim.shape = (1008,),
so there is no axis 1 to take a max over, and just taking argmax(0) seems fine. A minimal reproduction follows.
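import numpy as np

# Minimal reproduction of the axis error, using the shapes reported above:
c_emb = np.zeros((1008, 1024))
q_emb = np.zeros(1024)
sim = np.matmul(c_emb, q_emb.T)  # q_emb is 1-D, so sim has shape (1008,)
sim.max(1)                       # AxisError: axis 1 is out of bounds for array of dimension 1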

About computation of Elmo

Hi,

Thanks for uploading the code!

It seems that in model.py you are maintaining the scale and mixing parameters for computing the averaged ELMo representation. However, from my understanding of allennlp.modules.elmo.Elmo, this should already be taken care of by the Elmo class.

Is there any reason for re-mixing the already mixed Elmo representations returned? Or am I misunderstanding something?
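For reference, my understanding of the API (the file paths below are placeholders): Elmo applies its own learned scalar mix over the biLM layers, so the returned representations are already mixed.

from allennlp.modules.elmo import Elmo, batch_to_ids

elmo = Elmo('options.json', 'weights.hdf5',
            num_output_representations=1, dropout=0)
character_ids = batch_to_ids([['Phrase', 'indexed', 'question', 'answering']])
mixed, = elmo(character_ids)['elmo_representations']  # (batch, seq_len, 1024)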

Thanks!
Bhuwan

model.init(metadata) for test/embed

An error occurs because model.elmo is not initialized when loading models trained with ELMo.
model.init(metadata) should be called in the test() and embed() functions, as it is in the train() function.
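A sketch of the suggested fix (build_model_and_metadata is a hypothetical stand-in for the existing setup code in main.py, not the repo's actual function):

def build_model_and_metadata(args):
    """Placeholder for the model/metadata setup that train() already performs."""
    raise NotImplementedError

def test(args):
    model, metadata = build_model_and_metadata(args)
    model.init(metadata)  # constructs model.elmo before the checkpoint is loaded
    # ... then load the saved model and run evaluation as before ...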
