
document-qa's Introduction

Document QA

This repo contains code for our paper Simple and Effective Multi-Paragraph Reading Comprehension. It can be used to train neural question answering models in tensorflow, and in particular for the case when we want to run the model over multiple paragraphs for each question. Code is included to train on the TriviaQA and SQuAD datasets.

A demo of this work can be found at documentqa.allenai.org

Small forewarning: this is still much more of a research codebase than a library. We anticipate porting this work into AllenNLP, where it will enjoy a cleaner implementation and more stable support.

Setup

Dependencies

We require Python >= 3.5, TensorFlow 1.3, and a handful of other supporting libraries. TensorFlow should be installed separately following the docs. To install the other dependencies, use:

pip install -r requirements.txt

The stopword corpus and punkt sentence tokenizer for nltk are needed and can be fetched with:

python -m nltk.downloader punkt stopwords

The easiest way to run this code is to use:

export PYTHONPATH=${PYTHONPATH}:`pwd`
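
After that, a quick sanity check can confirm the environment is in order. This is an editor's sketch using only standard TensorFlow and NLTK calls, not a script from the repo:

import tensorflow as tf
import nltk

print(tf.__version__)                   # should report a 1.3.x build
nltk.data.find("tokenizers/punkt")      # raises LookupError if punkt is missing
nltk.data.find("corpora/stopwords")     # raises LookupError if stopwords are missing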

Data

By default, we expect source data to be stored in "~/data" and preprocessed data to be stored in "./data". The expected file locations can be changed by altering config.py.

Word Vectors

The models we train use the common crawl 840 billion token GloVe word vectors from here. They are expected to exist in "~/data/glove/glove.840B.300d.txt" or "~/data/glove/glove.840B.300d.txt.gz".

For example:

mkdir -p ~/data
mkdir -p ~/data/glove
cd ~/data/glove
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip
rm glove.840B.300d.zip
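
Since either the ".txt" or ".txt.gz" form is accepted, the extracted file can optionally be re-compressed to save disk space:

gzip glove.840B.300d.txt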

SQuAD Data

Training or testing on SQuAD requires downloading the SQuAD train/dev files into ~/data/squad. This can be done as follows:

mkdir -p ~/data/squad
cd ~/data/squad
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v1.1.json
wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v1.1.json

then running:

python docqa/squad/build_squad_dataset.py

This builds pkl files of the tokenized data in "./data/squad".
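
If you want to spot-check the preprocessed data, something like the following should work. This is an editor's sketch: the class and method names (SquadCorpus in docqa/squad/squad_data.py, get_dev()) are assumptions based on the repo layout and may differ.

# Editor's sketch; verify the actual class/method names in docqa/squad/squad_data.py
from docqa.squad.squad_data import SquadCorpus

corpus = SquadCorpus()        # reads the pkl files from ./data/squad
dev_docs = corpus.get_dev()   # hypothetical accessor for the tokenized dev documents
print(len(dev_docs))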

TriviaQA Data

The raw TriviaQA data is expected to be unzipped in "~/data/triviaqa". Training or testing in the unfiltered setting requires the unfiltered data to be downloaded to "~/data/triviaqa-unfiltered".

mkdir -p ~/data/triviaqa
cd ~/data/triviaqa
wget http://nlp.cs.washington.edu/triviaqa/data/triviaqa-rc.tar.gz
tar xf triviaqa-rc.tar.gz
rm triviaqa-rc.tar.gz

cd ~/data
wget http://nlp.cs.washington.edu/triviaqa/data/triviaqa-unfiltered.tar.gz
tar xf triviaqa-unfiltered.tar.gz
rm triviaqa-unfiltered.tar.gz

To use TriviaQA, we need to tokenize the evidence documents, which can be done by running:

python docqa/triviaqa/evidence_corpus.py

This can be slow; multiprocessing is supported:

python docqa/triviaqa/evidence_corpus.py --n_processes 8

This builds evidence files in "./data/triviaqa/evidence" that are split into paragraphs, sentences, and tokens. Then we need to tokenize the questions and locate the relevant answer spans in each document. Run

python docqa/triviaqa/build_span_corpus.py {web|wiki|open} --n_processes 8

to build the desired set. This builds pkl files in "./data/triviaqa/{web|wiki|open}".

Training

Once the data is in place, our models can be trained with:

python docqa/scripts/ablate_{triviaqa|squad|triviaqa_wiki|triviaqa_unfiltered}.py

See the help menu for these scripts for more details. Note that since we use the cuDNN RNN implementations, these models can only be trained on a GPU. We do provide a script for converting trained models to CPU versions:

python docqa/scripts/convert_to_cpu.py

Modifying the hyper-parameters beyond the ablations requires building your own train script.

Testing

SQuAD

Use "docqa/eval/squad_eval.py" to evaluate on paragraph-level (i.e., standard) SQuAD. For example:

python docqa/eval/squad_eval.py -o output.json -c dev /path/to/model/directory

"output.json" can be used with the official evaluation script, for example:

python docqa/squad/squad_official_evaluation.py ~/data/squad/dev-v1.1.json output.json

Use "docqa/eval/squad_full_document_eval.py" to evaluate on the document-level. For example

python docqa/eval/squad_full_document_eval.py -c dev /path/to/model/directory output.csv

This will store the per-paragraph results in output.csv; we can then run:

python docqa/eval/ranked_scores.py output.csv

to get ranked scores as more paragraphs are used.
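
Conceptually, the ranked scores take, for each question and each k, the answer from the highest-confidence paragraph among the first k ranked paragraphs, and average the resulting score. A rough pandas sketch of that idea (an editor's illustration; the column names quid, rank, predicted_score, and text_f1 are hypothetical and may not match the actual CSV):

import pandas as pd

df = pd.read_csv("output.csv").sort_values(["quid", "rank"])   # hypothetical columns
rows = []
for k in range(1, df.groupby("quid").size().max() + 1):
    top_k = df.groupby("quid").head(k)
    # per question, keep the answer from the highest-confidence paragraph seen so far
    best = top_k.loc[top_k.groupby("quid")["predicted_score"].idxmax()]
    rows.append((k, best["text_f1"].mean()))
print(pd.DataFrame(rows, columns=["n_paragraphs", "F1"]))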

TriviaQA

Use "docqa/eval/triviaqa_full_document_eval.py" to evaluate on TriviaQA datasets, like:

python docqa/eval/triviaqa_full_document_eval.py --n_processes 8 -c web-dev --tokens 800 -o question-output.json -p paragraph-output.csv /path/to/model/directory

Then the "question-output.json" can be used with the standard triviaqa evaluation script, the "paragraph-output.csv" contains per-paragraph output, we can run

python docqa/eval/ranked_scores.py paragraph-output.csv

to get ranked scores as more paragraphs are used for each question, or

python docqa/eval/ranked_scores.py --per_doc paragraph-output.csv

to get ranked scores as more paragraphs are used for each (question, document) pair, as should be done for TriviaQA web.

User Input

"docqa/scripts/run_on_user_documents.py" serves as a heavily commented example of how to run our models and pre-processing pipeline on other kinds of text. For example:

python docqa/scripts/run_on_user_documents.py /path/to/model/directory "Who wrote the satirical essay 'A Modest Proposal'?" ~/data/triviaqa/evidence/wikipedia/A_Modest_Proposal.txt ~/data/triviaqa/evidence/wikipedia/Jonathan_Swift.txt

Pre-Trained Models

We have four pre-trained models:

  1. "squad" Our model trained on the standard SQuAD dataset, this model is listed on the SQuAD leaderboard as BiDAF + Self Attention

  2. "squad-shared-norm" Our model trained on document-level SQuAD using the shared-norm approach.

  3. "triviaqa-web-shared-norm" Our model trained on TriviaQA web with the shared-norm approach. This is the model we used to submit scores to the TriviaQA leader board.

  4. "triviaqa-unfiltered-shared-norm" Our model trained on TriviaQA unfiltered with the shared-norm approach. This is the model that powers our demo.

The models can be downloaded here

The models use the cuDNN implementation of GRUs by default, which means they can only be run on the GPU. We also have slower, but CPU compatible, versions here.


document-qa's Issues

Regards Readme file

As a beginner, I was unable to understand the instructions you have provided; they are only a brief explanation.
Could you please send me a complete explanation, or any documentation with a full description of this project?
We have been trying to execute this project for a month but have been unable to;
the error occurs when we try to execute this command

Results seem to be much worse than those output by a bidaf graph not trained on multi paragraph

I have been messing around with this project and I have found that the results don't seem to be anywhere near as good as the BiDAF model's.

For example, if we compare answers for question and document pairs at
http://demo.allennlp.org/machine-comprehension and https://documentqa.allenai.org/, we see that this project performs a lot worse and seems to much prefer shorter answers.

Examples

Example 1

paragraph:
Robotics is an interdisciplinary branch of engineering and science that includes mechanical engineering, electrical engineering, computer science, and others. Robotics deals with the design, construction, operation, and use of robots, as well as computer systems for their control, sensory feedback, and information processing. These technologies are used to develop machines that can substitute for humans. Robots can be used in any situation and for any purpose, but today many are used in dangerous environments (including bomb detection and de-activation), manufacturing processes, or where humans cannot survive. Robots can take on any form but some are made to resemble humans in appearance. This is said to help in the acceptance of a robot in certain replicative behaviors usually performed by people. Such robots attempt to replicate walking, lifting, speech, cognition, and basically anything a human can do.
question:
What do robots that resemble humans attempt to do?
answer from http://demo.allennlp.org/machine-comprehension:
replicate walking, lifting, speech, cognition
answer from https://documentqa.allenai.org/:
walking

Example 2

paragraph:
The Matrix is a 1999 science fiction action film written and directed by The Wachowskis, starring Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano. It depicts a dystopian future in which reality as perceived by most humans is actually a simulated reality called "the Matrix", created by sentient machines to subdue the human population, while their bodies' heat and electrical activity are used as an energy source. Computer programmer "Neo" learns this truth and is drawn into a rebellion against the machines, which involves other people who have been freed from the "dream world."
question:
Who stars in The Matrix?
answer from http://demo.allennlp.org/machine-comprehension:
Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss, Hugo Weaving, and Joe Pantoliano
answer from https://documentqa.allenai.org/:
Keanu Reeves

Do you guys know why this might be?

What is a good choice for vocab.txt?

The script run_on_user_documents.py takes the input paragraphs and the user question as the vocabulary set to build the TensorFlow graph.

But in a real application, we want the graph to be built in advance.
What is a good choice for vocab.txt in general when we set the following:
model.set_input_spec(ParagraphAndQuestionSpec(batch_size=len(context)), voc)?

I tried the whole vocabulary set for glove.840B.300d.txt as input.
But it seems to be so large that it causes the following problem:
ValueError: Cannot create a tensor proto whose content is larger than 2GB.
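
A standard TensorFlow workaround for that 2GB limit (independent of this repo) is to avoid baking the embedding matrix into the graph as a constant and instead feed it through a placeholder into a variable at initialization time. A minimal sketch, with hypothetical sizes:

import numpy as np
import tensorflow as tf

vocab_size, dim = 2200000, 300                          # roughly the full GloVe 840B vocab (hypothetical)
glove = np.zeros((vocab_size, dim), dtype=np.float32)   # load the real vectors here

embedding_init = tf.placeholder(tf.float32, shape=[vocab_size, dim])
embeddings = tf.get_variable("embeddings", [vocab_size, dim],
                             initializer=tf.zeros_initializer(), trainable=False)
set_embeddings = embeddings.assign(embedding_init)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(set_embeddings, feed_dict={embedding_init: glove})

That said, restricting the vocabulary to the words that actually appear in your documents and questions is usually the simpler fix.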

In the class SingleSpanAnswerEncoder a tf placeholder is being assigned to a Python variable which will never be used.

I might be missing something.
But in the class SingleSpanAnswerEncoder you assign a tf.placeholder to a field; however, in the encode section you seem to override this field with np.zeros.
What was the point of assigning the tf.placeholder in the first place?
code below:

class SingleSpanAnswerEncoder(AnswerEncoder):
    """ Encode the answer as integer coordinates, pick a random answer span if multiple spans exists """

    def __init__(self):
        self.answer_spans = None

    def get_placeholders(self) -> List:
        return [self.answer_spans]

    def init(self, batch_size, context_word_dim):
        self.answer_spans = tf.placeholder('int32', [batch_size, 2], name='answer_spans')

    def encode(self, batch_size, context_len, context_word_dim, batch) -> Dict:
        answer_spans = np.zeros([batch_size, 2], dtype='int32')
        ....

    def __getstate__(self):
        return {}

    def __setstate__(self, state):
        return self.__init__()
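
For context, the usual TF1 pattern such encoder classes follow is that the placeholder is built once on the graph side (in init), while encode() builds per-batch numpy values that are fed into that placeholder; the numpy array is a feed value, not a replacement for the field. A minimal editor's illustration of the pattern, not the repo's actual class:

import numpy as np
import tensorflow as tf

class SpanFeedSketch:
    def __init__(self):
        self.answer_spans = None                      # declared here, built in init()

    def init(self, batch_size):
        # graph-side placeholder, created once
        self.answer_spans = tf.placeholder('int32', [batch_size, 2], name='answer_spans')

    def encode(self, batch_size, batch_spans):
        # per-batch numpy values; this local array is the feed value for the
        # placeholder above, it does not replace the placeholder field
        answer_spans = np.zeros([batch_size, 2], dtype='int32')
        answer_spans[:len(batch_spans)] = batch_spans
        return {self.answer_spans: answer_spans}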

Documentation for ShallowOpenWebRanker

Would you be able to explain how ShallowOpenWebRanker works?
I can't understand why it puts focus on the first words of each paragraph.
I would be happy to add documentation for it when I understand it.

No such file or directory: .... data/triviaqa/wiki/file_map.json

I'm trying to debug training the model on the TriviaQA data; however, I don't seem to be able to run it without getting an error.
At the moment I am trying to run document-qa/docqa/scripts/ablate_triviaqa_wiki.py.
However, I get the following error: No such file or directory: .... data/triviaqa/wiki/file_map.json.
What is supposed to be in the file_map.json file?

No-answer training

Why isn't there a "no-answer" option, as described in the paper, in choices=["paragraph", "confidence", "shared-norm", "merge", "sigmoid"]?

How was the pre-trained ELMo model trained?

Hi Chris,

Could you explain a little bit how the pre-trained ELMo model was trained?
Also, could you provide a pointer to the paper that describes this model?
Thanks in advance.

Best,
Felix

shape of dist_matrix in MultiSelfAttention

In MultiSelfAttention class:

line 235: dist_matrix = tf.einsum("bwhd,bkhd->bwkh", queries, keys) # dots of (batch, word, key, head)

line 255: select_probs = tf.nn.softmax(dist_matrix) # for each (batch, word, head) probability over keys

The shapes of dist_matrix in the two lines do not seem consistent. Should line 235 be changed to:
dist_matrix = tf.einsum("bwhd,bkhd->bwhk", queries, keys) # dots of (batch, word, head, key) ?
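
For reference, a small NumPy check of the two einsum orderings (illustrative shapes only):

import numpy as np

batch, words, keys, heads, dim = 2, 5, 7, 3, 4
queries = np.random.rand(batch, words, heads, dim)
key_vecs = np.random.rand(batch, keys, heads, dim)

a = np.einsum("bwhd,bkhd->bwkh", queries, key_vecs)   # (batch, word, key, head)
b = np.einsum("bwhd,bkhd->bwhk", queries, key_vecs)   # (batch, word, head, key)
print(a.shape, b.shape)   # (2, 5, 7, 3) (2, 5, 3, 7)

# A softmax over the last axis normalizes over heads for `a` but over keys for `b`,
# which is what the comment on line 255 describes.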

Getting an error while training the model on TriviaQA Web

Hi,
Thank you for sharing this amazing work.

I followed the instructions to pre-process the data and trained the model on the TriviaQA Web.
However, I got an error when it tried to save the model after evaluation.

Here are the preprocessing scripts I ran:

python docqa/triviaqa/evidence_corpus.py --n_processes 8
python docqa/triviaqa/build_span_corpus.py web --n_processes 8

Then I ran:

python docqa/scripts/ablate_triviaqa.py -n 8 shared-norm-600 save/web-shared-600

and got the following error:

on epoch=0 batch=1710 step=1710, time=22.648
on epoch=0 batch=1740 step=1740, time=21.728
on epoch=0 batch=1770 step=1770, time=21.884
on epoch=0 batch=1800 step=1800, time=22.147
Checkpointing
Running evaluation...
dev: 100%|████████████████████████████████████| 125/125 [01:40<00:00,  1.70it/s]
Traceback (most recent call last):
  File "docqa/scripts/ablate_triviaqa.py", line 180, in <module>
    main()
  File "docqa/scripts/ablate_triviaqa.py", line 176, in main
    trainer.start_training(data, model, params, eval, model_dir.ModelDir(out), notes)
  File "/home/fw245/clones/document-qa/docqa/trainer.py", line 274, in start_training
    True, train_params, evaluators, out, notes, dry_run)
  File "/home/fw245/clones/document-qa/docqa/trainer.py", line 331, in _train
    evaluators, out, notes, dry_run, start_eval)
  File "/home/fw245/clones/document-qa/docqa/trainer.py", line 624, in _train_async
    val = evaluation.scalars[train_params.best_weights[1]]
KeyError: 'b8/text-f1'

I wonder if I did anything wrong when I ran the scripts.

BTW, it takes a long time to load the data. Is there any way to load a tiny part of the dataset for fast debugging?

Thanks in advance.

Support to direct PDF doc

Thanks @chrisc36 and your team for making document QA. Currently it does not support PDF as direct input; I assume the PDF needs to be processed into <title> and <para> form using your Science Parse repo. Is there a way to take a PDF document directly as input, with a built-in PDF processing script in your document QA? It would be great if you could do us this favour. Also, can you please tell us whether document QA can be used as a search engine, apart from traditional Lucene-based Elasticsearch and Solr search engines?

Thanks
Ajay sahu - Mumbai / BHARAT

About the EM/F1 score in pre-trained model

Hi Christopher,

Thanks for open-sourcing your code!

When I use your provided pre-trained "triviaqa-web-shared-norm" model to test on the dev set, I get the following results:

N Paragraphs EM F1
1 0.5807 0.6339
2 0.6307 0.6818
3 0.6446 0.6945
4 0.6498 0.6993
5 0.6532 0.7024
6 0.6543 0.7035
7 0.6554 0.7045
8 0.6562 0.7053
9 0.6568 0.7058
10 0.6570 0.7059
11 0.6573 0.7062
12 0.6575 0.7064
13 0.6576 0.7064
14 0.6577 0.7065
15 0.6577 0.7064

How can I combine the answers from the different paragraphs to get a single answer? In other words, how can I get the EM/F1 of 66.37/71.32 reported in the paper?

Thanks a lot for your time!
YeDeming

Demo link not working

The link given in the README.md is not working - "A demo of this work can be found at documentqa.allenai.org".

Converting models in Tensorflow 1.6

I'm using TensorFlow 1.6 due to a newer version of CUDA. After changing a few imports the code works fine, except for transferring trained models to CPU. There is no RNNParamsSaveable class in version 1.6, so I changed it to CudnnGRUSaveable with different parameters:
params_saveable = cudnn_rnn_ops.CudnnGRUSaveable(fw_params, 1, dim, 400, direction="bidirectional")
But I'm getting an error:
2018-05-22 13:45:03.151296: F tensorflow/contrib/cudnn_rnn/kernels/cudnn_rnn_ops.cc:203] Check failed: offset + size <= device_memory.size() The slice is not within the region of DeviceMemory.
I have a GeForce GTX 1080 Ti (12 Gb), so I probably have enough memory, but I am still doing something wrong.
Just wondering maybe somebody knows how to resolve this problem.

I successfully wrote convert_to_cpu for elmo but inference takes time on cpu

With the help of the issues and comments reported here on convert_to_cpu.py for ELMo, I wrote the script.
I am able to run the ELMo model on CPU, but the converted model takes a long time for inference. Also, the results of the GPU and CPU models differ slightly in score for the same trained model.

Output of convert to cpu model

Rebuilding
These should be close:
[False]
[array([[0.8571025]], dtype=float32)]
[array([[0.84660774]], dtype=float32)]

Demo code

Thank you for sharing this awesome work! Do you have plans to share the code to replicate the demo? I am especially interested in the web part of it. Are there any pointers I can follow to create a similar demo myself?

What should the vocab file be set to when running on new documents and questions

I want to boot up a demo of this using the server.py file in this repo.
However I'm not sure what I should set the vocab file to be.
In the run_on_user_documents.py file I see you set it to be all the words in the questions the client asks. However, if I don't know the words from the questions and documents up front, how should I handle this?

In the file train_bidaf.py some imports can't be located

In the file train_bidaf.py some of the imports can't be located. I have searched the codebase and they do not seem to be there.
The problematic imports are:
from docqa.data_processing.paragraph_qa import ContextLenKey, ContextLenBucketedKey, DocumentQaTrainingData
from docqa.squad.squad_eval import SentenceSpanEvaluator, SquadSpanEvaluator, BoundedSquadSpanEvaluator

raw data train on

Hi,

Quite simply: what should be the format of new raw data to train on?
Not SQuAD, not TriviaQA.

I see that the 'ablate_xxx' scripts only reference and work with these two formats;
what if I want to train on my own raw data? A classic corpus format, or only the SQuAD or TriviaQA formats?

thanks

About the learning rate

Hi Christopher,

I find the high learning rate in ablate_triviaqa.py strange. The learning rate is still 1 and it does not decay during training.

I am confused about why you use such a high learning rate, and why you use so many epochs (71) for the sigmoid confidence method with that high learning rate.

Does a lower learning rate, like 0.01, work?

Thanks a lot for your time!
YeDeming

I got key error in running ablate_squad.py

Hi, I have a problem training on the SQuAD data.
When I use mode paragraph, confidence, shared-norm, merge, or sigmoid,
I get a key error 'b17/text-f1' in trainer.py line 624.
How can I fix it?

About multiple sentences in a passage

In SQuAD, each passage contains multiple sentences. When using ELMo, is it better to treat each sentence as a separate sequence of words (so that the passage is a batch of sentences)? Or treat the whole passage as a single sequence of words?

Demo idea: Allow users to upload documents

The demo is a delight to use, and I want to congratulate the entire team on this incredible accomplishment of document question answering. What would be even more impressive to the end user would be the ability to upload their own documents and then perform question answering on that.

Error while running the eval scripts on squad dataset using pre-trained models

I tried running the squad_full_document_eval.py script using the pre-trained models, but I get the following error:

Command:

python docqa/eval/squad_full_document_eval.py -c dev docqa/models/squad output.csv

Error Log:

Loading data
Split 48 docs into 536115 paragraphs
Using best weights
Setting up model
Loading word vec glove.840B.300d for SquadCorpus from cache
Had pre-trained word embeddings for 26244 of 27612 words

2017-11-14 16:27:31.392041: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-14 16:27:31.392130: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-11-14 16:27:31.392162: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
tf session...
Traceback (most recent call last):
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1297, in _run_fn
    self._extend_graph()
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1358, in _extend_graph
    self._session, graph_def.SerializeToString(), status)
  File "/home/administrator/anaconda3/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNParamsSize' with these attrs.  Registered devices: [CPU], Registered kernels:
  <no registered kernels>

         [[Node: map_embed/layer_1/forward/CudnnRNNParamsSize = CudnnRNNParamsSize[S=DT_INT32, T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", rnn_mode="gru", seed=87654321, seed2=0](map_embed/layer_1/forward/CudnnRNNParamsSize/num_layers, map_embed/layer_1/forward/CudnnRNNParamsSize/num_units, map_embed/layer_1/forward/CudnnRNNParamsSize/input_size)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "docqa/eval/squad_full_document_eval.py", line 213, in <module>
    main()
  File "docqa/eval/squad_full_document_eval.py", line 195, in main
    not args.no_ema, args.async)[args.corpus]
  File "/home/administrator/document-qa/docqa/trainer.py", line 658, in test
    pred = model.get_predictions_for(input_dict)
  File "/home/administrator/document-qa/docqa/doc_qa_models.py", line 110, in get_predictions_for
    return self._get_predictions_for(is_train, q_embed, q_mask, c_embed, c_mask, answer)
  File "/home/administrator/document-qa/docqa/doc_qa_models.py", line 190, in _get_predictions_for
    context_rep = self.embed_mapper.apply(is_train, context_rep, context_mask)
  File "/home/administrator/document-qa/docqa/nn/layers.py", line 403, in apply
    x = layer.apply(is_train, x, mask)
  File "/home/administrator/document-qa/docqa/nn/recurrent_layers.py", line 158, in apply
    return super().map(is_train, x, mask)
  File "/home/administrator/document-qa/docqa/nn/recurrent_layers.py", line 132, in map
    fw = self._apply_transposed(is_train, x)[0]
  File "/home/administrator/document-qa/docqa/nn/recurrent_layers.py", line 62, in _apply_transposed
    n_params = cell.params_size().eval()
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 541, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4085, in _eval_using_default_session
    return session.run(tensors, feed_dict)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: No OpKernel was registered to support Op 'CudnnRNNParamsSize' with these attrs.  Registered devices: [CPU], Registered kernels:
  <no registered kernels>

         [[Node: map_embed/layer_1/forward/CudnnRNNParamsSize = CudnnRNNParamsSize[S=DT_INT32, T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", rnn_mode="gru", seed=87654321, seed2=0](map_embed/layer_1/forward/CudnnRNNParamsSize/num_layers, map_embed/layer_1/forward/CudnnRNNParamsSize/num_units, map_embed/layer_1/forward/CudnnRNNParamsSize/input_size)]]

Caused by op 'map_embed/layer_1/forward/CudnnRNNParamsSize', defined at:
  File "docqa/eval/squad_full_document_eval.py", line 213, in <module>
    main()
  File "docqa/eval/squad_full_document_eval.py", line 195, in main
    not args.no_ema, args.async)[args.corpus]
  File "/home/administrator/document-qa/docqa/trainer.py", line 658, in test
    pred = model.get_predictions_for(input_dict)
  File "/home/administrator/document-qa/docqa/doc_qa_models.py", line 110, in get_predictions_for
    return self._get_predictions_for(is_train, q_embed, q_mask, c_embed, c_mask, answer)
  File "/home/administrator/document-qa/docqa/doc_qa_models.py", line 190, in _get_predictions_for
    context_rep = self.embed_mapper.apply(is_train, context_rep, context_mask)
  File "/home/administrator/document-qa/docqa/nn/layers.py", line 403, in apply
    x = layer.apply(is_train, x, mask)
  File "/home/administrator/document-qa/docqa/nn/recurrent_layers.py", line 158, in apply
    return super().map(is_train, x, mask)
  File "/home/administrator/document-qa/docqa/nn/recurrent_layers.py", line 132, in map
    fw = self._apply_transposed(is_train, x)[0]
  File "/home/administrator/document-qa/docqa/nn/recurrent_layers.py", line 62, in _apply_transposed
    n_params = cell.params_size().eval()
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/python/ops/cudnn_rnn_ops.py", line 597, in params_size
    direction=self._direction)[0]
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/contrib/cudnn_rnn/ops/gen_cudnn_rnn_ops.py", line 286, in cudnn_rnn_params_size
    name=name)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/administrator/anaconda3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): No OpKernel was registered to support Op 'CudnnRNNParamsSize' with these attrs.  Registered devices: [CPU], Registered kernels:
  <no registered kernels>

         [[Node: map_embed/layer_1/forward/CudnnRNNParamsSize = CudnnRNNParamsSize[S=DT_INT32, T=DT_FLOAT, direction="unidirectional", dropout=0, input_mode="linear_input", rnn_mode="gru", seed=87654321, seed2=0](map_embed/layer_1/forward/CudnnRNNParamsSize/num_layers, map_embed/layer_1/forward/CudnnRNNParamsSize/num_units, map_embed/layer_1/forward/CudnnRNNParamsSize/input_size)]]

Discrepancy between EM/F1 scores logged during training and those output from squad_eval.py

Hi,

Thanks for open-sourcing your code!

I noticed a discrepancy between the EM and F1 scores logged during training and those computed when evaluating the model separately using docqa/eval/squad_eval.py. The difference is significant at the beginning of training, but becomes small by the end of training. It'll be super helpful if you could explain where the difference comes from, and more importantly, which are the "correct" scores.

A couple of disclaimers before I describe in more detail:

  1. I am running python3.4 and not >=3.5.
  2. I am using json package and not ujson package in read_data.py.

Unfortunately, my compute environment does not allow me to change the above to see if the problem persists with python3.5 and ujson. However, the code runs fine, and I believe the problem to be somewhere else. Please correct me if I am wrong though.

So I am running the paragraph setting on Squad as follows:

python3.4 docqa/scripts/ablate_squad.py paragraph output/squad

And I am evaluating the output checkpoints as follows:

python3.4 docqa/eval/squad_eval.py -o output/squad-1205-140719/dev-output.json -c dev output/squad-1205-140719/ -s <checkpoint_number>

The output scores I see on Tensorboard and from the evaluation script are as follows:

Update | Tensorboard Acc | Tensorboard F1 | Tensorboard text-EM | Tensorboard text-F1 | squad_eval.py Acc | squad_eval.py F1 | squad_eval.py text-EM | squad_eval.py text-F1
1200 | 0.4685 | 0.5886 | 0.4896 | 0.6080 | 0.1755 | 0.2726 | 0.1853 | 0.3044
2400 | 0.5479 | 0.6610 | 0.5700 | 0.6798 | 0.4572 | 0.5786 | 0.4781 | 0.5992
3600 | 0.5647 | 0.6790 | 0.5886 | 0.6966 | 0.5506 | 0.6688 | 0.5746 | 0.6861
4800 | 0.5951 | 0.7077 | 0.6192 | 0.7239 | 0.5842 | 0.6980 | 0.6094 | 0.7154
10800 | 0.6377 | 0.7501 | 0.6675 | 0.7667 | 0.6437 | 0.7508 | 0.6707 | 0.7662

As you can see, the squad_eval.py numbers are much lower than the Tensorboard numbers initially, but catch up around update 5000. Later they even become slightly better.

I guess my main questions are --

  1. Does this happen in your setup? If not, then it is probably something to do with python3.4 / 3.5.
  2. If this does happen in your setup, can you point to why? Also which performance is the correct one?

The reason why I am interested in the initial performance is because I am running some experiments with only 10% of squad training set. In this case there is a big difference in the performances logged during training and from the evaluation script, similar to the top rows of the table above.

Thanks a lot for your time!
Bhuwan

Unable to install ujson

I am getting the following error:
Collecting ujson
Using cached https://files.pythonhosted.org/packages/16/c4/79f3409bc710559015464e5f49b9879430d8f87498ecdc335899732e5377/ujson-1.35.tar.gz
Building wheels for collected packages: ujson
Running setup.py bdist_wheel for ujson ... error
Complete output from command c:\users\chchauha\appdata\local\continuum\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\chchauha\AppData\Local\Temp\pip-install-eldp188c\ujson\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d C:\Users\chchauha\AppData\Local\Temp\pip-wheel-c7rmlig3 --python-tag cp35:
running bdist_wheel
running build
running build_ext
building 'ujson' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools


Failed building wheel for ujson
Running setup.py clean for ujson
Failed to build ujson
Installing collected packages: ujson
Running setup.py install for ujson ... error
Complete output from command c:\users\chchauha\appdata\local\continuum\anaconda3\python.exe -u -c "import setuptools, tokenize;__file__='C:\Users\chchauha\AppData\Local\Temp\pip-install-eldp188c\ujson\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\Users\chchauha\AppData\Local\Temp\pip-record-10n1ka9p\install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
building 'ujson' extension
error: Microsoft Visual C++ 14.0 is required. Get it with "Microsoft Visual C++ Build Tools": http://landinghub.visualstudio.com/visual-cpp-build-tools

----------------------------------------

Command "c:\users\chchauha\appdata\local\continuum\anaconda3\python.exe -u -c "import setuptools, tokenize;file='C:\Users\chchauha\AppData\Local\Temp\pip-install-eldp188c\ujson\setup.py';f=getattr(tokenize, 'open', open)(file);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, file, 'exec'))" install --record C:\Users\chchauha\AppData\Local\Temp\pip-record-10n1ka9p\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\Users\chchauha\AppData\Local\Temp\pip-install-eldp188c\ujson\

I have Python 3.5.2.

Pre-trained model

Hello, I would like to run this project; however, I lack the GPU resources to train the model. Could a pre-trained model be shared so that it can be downloaded and the project can quickly be replicated locally?

Error while running Build_span_corpus.py

There is a name mismatch between What_Ever_Happened_to_Baby_Jane?_(1962_film).txt in web_dev.json and the file What_Ever_Happened_to_Baby_Jane?__(1962_film).txt in \data\triviaqa\evidence\wikipedia. I fixed that, but after 101 iterations I am still getting an error. Please help; I have tried everything I can think of.

A small question about shared-normalization model

Hi Christopher,

I tried to read your code, and I am confused about the batch size in the shared-normalization model.

In the class GroupedSpanAnswerEncoder's encode function, I see that each paragraph, together with its question and answer, is an item. When calculating the shared normalization in IndependentBoundsGrouped's predict function, we need all the paragraphs for an answer.

So how can you put all the paragraphs (up to 15) for an answer into the same batch while the batch_size is fixed?

Thanks a lot for your time!
YeDeming

Converting GPUs models to CPU

Hello,

First of all, thank you for publishing very complete code, and for the help you still provide here. I trained your models on other datasets successfully on GPU with the 'ablate_squad.py' script, but when I try to convert them with 'convert.py' I get the following error:

2018-04-18 09:33:51.087843: W tensorflow/core/framework/op_kernel.cc:1202] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key global_step not found in checkpoint

It happens somewhere here:

File "docqa/scripts/convert_to_cpu.py", line 164, in
main()
File "docqa/scripts/convert_to_cpu.py", line 161, in main
convert(args.target_model, args.output_dir, args.best_weights)
File "docqa/scripts/convert_to_cpu.py", line 48, in convert
md.restore_checkpoint(sess)
File "/home/rachel/BIDAF_Allen/document-qa/docqa/model_dir.py", line 88, in restore_checkpoint
saver = tf.train.Saver(var_list)

I'm a bit at a loss about what to do, especially since I can load this model fine for other actions (evaluation, resuming training, etc.). Do you have any insight into this?

Thanks for your help,

Rachel

About the answer prediction

Hi Christopher,

I still have a small question:

In the class BoundaryPrediction in docqa/nn/span_prediction.py, in the function get_best_span(), I see that you pass start_logits and end_logits to the function best_span_from_bounds().

Shouldn't we pass tf.nn.log_softmax(start_logits) and tf.nn.log_softmax(end_logits) rather than start_logits and end_logits?
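
For what it's worth, log_softmax only subtracts a per-vector constant from the logits, so the arg-max span is unchanged; only the absolute score differs. A small NumPy illustration (an editor's sketch, ignoring the end >= start constraint for brevity):

import numpy as np

def log_softmax(x):
    x = x - x.max()
    return x - np.log(np.exp(x).sum())

start_logits = np.array([1.2, 3.4, 0.5, 2.0])
end_logits = np.array([0.3, 1.1, 4.2, 2.5])

raw = start_logits[:, None] + end_logits[None, :]
normed = log_softmax(start_logits)[:, None] + log_softmax(end_logits)[None, :]
print(np.unravel_index(raw.argmax(), raw.shape))        # same (start, end) indices
print(np.unravel_index(normed.argmax(), normed.shape))  # as the unnormalized version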

Thanks a lot for your time!
YeDeming

Why is cosine distance used in document_splitter.py

In the method score_paragraphs (in the class ShallowOpenWebRanker), the cosine distance metric is used.
Should it not use the dot product instead, since the cosine metric does not take magnitude into account, while tf-idf encodes both direction and magnitude?
From what I can see, this would be similar to asking whether the paragraph contains the word at all, without taking into account how many times the word occurs.
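
To make the distinction concrete, here is a tiny NumPy comparison of the two similarity measures on made-up tf-idf vectors (an editor's illustration, not code from the repo):

import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

q = np.array([1.0, 1.0, 0.0])    # question tf-idf
p1 = np.array([1.0, 1.0, 0.0])   # paragraph mentioning each query term once
p2 = np.array([5.0, 5.0, 0.0])   # paragraph mentioning each query term five times

print(cosine(q, p1), cosine(q, p2))   # identical: cosine ignores magnitude
print(q @ p1, q @ p2)                 # dot product rewards repeated mentions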

Restoring parameters for training DocumentQA

I'm a PhD candidate of Jonathan Berant's, and we are trying to continue
training from a saved checkpoint using your model DocumentQA.

Is this option supported in the code, and what is the best way to do it?

To be more specific: we use ablate_triviaqa_unfiltered.py as our training script,
and it seems "checkpoint" and "parameter_checkpoint" should support this function.
However, it is unclear why there are two different variables for that, and why they are used in two places:

in _train_async() in trainer.py:

Line 501 (notice that checkpoint is restored and not parameter_checkpoint; is this a bug?):

if parameter_checkpoint is not None:
    print("Restoring parameters from %s" % parameter_checkpoint)
    saver = tf.train.Saver()
    saver.restore(sess, checkpoint)
    saver = None

Line 351:

if checkpoint is not None:
    print("Restoring from checkpoint...")
    saver.restore(sess, checkpoint)
    print("Loaded checkpoint: " + str(sess.run(global_step)))
else:
    print("Initializing parameters...")
    sess.run(tf.global_variables_initializer())

Thanks!

Latest weights vs best weights

Hi,

Thanks for making the code available.

I found that for TriviaQA (web), the best weights are not dumped [1]. In docqa/evaluator.py, if it doesn't find the best weights, it continues with the latest checkpoint. However, that might be suboptimal, especially if the model overfits the data.

[1] In docqa/scripts/ablate_triviaqa.py, the function get_triviaqa_train_params doesn't include best_weights as a parameter (contrary to ablate_squad.py, which does), and hence it ends up not saving the best weights.

Please correct me if I am wrong.

Now, the issue I am facing is that while I have model dumps for different sizes of training data (something I need for my comparison), I am not able to truly evaluate as the latest weights might be misrepresentative. Given this, I can pick values from the tensorboard, but I read on another thread that those numbers are optimistic since those numbers are computed over a biased set of paragraphs for a given question. What do you recommend I do in such a case?

train in gpu error

When training this code, why does it keep running on GPU 0 even though I specify the GPU with CUDA_VISIBLE_DEVICES=2 python train.py?

EM scores are higher than F1 scores

After training the model on TriviaQA Web, I observed that the exact match scores are higher than the F1 scores on both training and test sets.
To my knowledge, the F1 score is always greater than or equal to the exact match score.
I wonder if the labels are swapped.
Here is the command I used:

python docqa/scripts/ablate_triviaqa.py -n 8 shared-norm-600 save/web-shared-norm-600


BTW, could you explain the difference between paragraph-text-em, question-text-em, and text-em-k-tau?
Thanks!

How to use IndependentBoundsNoAnswerOption ?

We would like to know how to use the IndependentBoundsNoAnswerOption class. The aim is to retrain BiDAF to perform the task of recognizing whether or not an answer exists for a question in a paragraph.

Thank you!

Documentation for exp_mask

I understand this is not really a bug, but I have no idea how to figure out what the method is doing.
None of the tests hit this method, so I can't really debug it.
Sorry for the stupid question, but I'm out of ways to understand what it is supposed to do.

Documentation for AttentionAndEncode and ContextOnly

It's not clear why there are 3 implementations of ParagraphQuestionModel.
Would you be able to explain why the AttentionAndEncode and ContextOnly implementations exist? I will document them and issue a PR :)

What happened to UNK?

Hi,

I am using the ELMo model provided in this repo, and the UNK token seems to be missing from the squad_train_dev_all_unique_tokens.txt file. I think it is just the last index in the embedding, because the embedding has length 106300 while there are 106299 tokens in the file. So I guess that works for embedding_lookup anyway, since the UNK id is left as -1. Is that right?
