davidgolub / questiongeneration

License: Other

Python 80.37% Shell 8.31% HTML 1.09% Jupyter Notebook 10.23%

questiongeneration's Issues

labels file

How do I obtain the labels.txt file in datasets/<dataset_name>/train so that I can run predictions?

Cannot read dev,train,test files

The column names in the CSV files produced by the Maluuba NewsQA scripts do not match what this code expects.

The other, already closed issue did not resolve this problem:

python -m newsqa.prepro
Preprocessing data type dev
Reading data from source path newsqa/dev.csv
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2566, in get_value
    return libts.get_value_box(s, key)
  File "pandas/_libs/tslib.pyx", line 1017, in pandas._libs.tslib.get_value_box
  File "pandas/_libs/tslib.pyx", line 1025, in pandas._libs.tslib.get_value_box
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 300, in <module>
    main()
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 18, in main
    prepro(args)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 62, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 156, in prepro_each
    answer_char_ranges = question_info['answer_char_ranges']
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/core/series.py", line 623, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2574, in get_value
    raise e1
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2560, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/_libs/index.pyx", line 83, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 91, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'answer_char_ranges'

Even after manually renaming the column from 'answer_token_ranges' to 'answer_char_ranges', the file still fails to parse:

(py3) MYMBP:bidaf MY$ python -m newsqa.prepro
Preprocessing data type dev
Reading data from source path newsqa/dev.csv
Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1175, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas/_libs/parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas/_libs/parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
  File "pandas/_libs/parsers.pyx", line 1539, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 2353: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 300, in <module>
    main()
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 18, in main
    prepro(args)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 62, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 133, in prepro_each
    keep_default_na=False)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/io/parsers.py", line 455, in _read
    data = parser.read(nrows)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1069, in read
    ret = self._engine.read(nrows)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1839, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 902, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 1001, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1130, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1182, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas/_libs/parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas/_libs/parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
  File "pandas/_libs/parsers.pyx", line 1539, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 2353: invalid continuation byte
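
The UnicodeDecodeError above means that the bytes in dev.csv are not valid UTF-8 at the reported position. One way to narrow this down is to decode the raw bytes yourself and fall back to another encoding when UTF-8 fails. The sketch below uses only the standard library; the helper name read_csv_rows and the fallback choice of latin-1 are assumptions for illustration, not part of this repository. Note that latin-1 accepts any byte sequence, so a successful fallback does not prove that latin-1 is the file's real encoding.

```python
import csv
import io


def read_csv_rows(raw, encodings=("utf-8", "latin-1")):
    """Decode raw CSV bytes with the first encoding that works, then parse rows.

    Hypothetical helper: the candidate encodings are an assumption.
    """
    for enc in encodings:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
        return enc, list(csv.reader(io.StringIO(text, newline="")))
    raise ValueError("none of the candidate encodings could decode the data")


# 0xcc starts a two-byte UTF-8 sequence, but "!" is not a continuation byte,
# so this sample reproduces the "invalid continuation byte" failure.
sample = b"story_id,question\nabc,caf\xcc!\n"
encoding, rows = read_csv_rows(sample)
print(encoding, rows[0])
```

If the fallback is what makes the file readable, the same encoding name can be passed to pandas.read_csv through its encoding parameter. Regenerating the CSV files may be the cleaner fix if the file was re-saved by an editor in a different encoding.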

error with prepro.py

I ran into a problem when running python3 -m newsqa.prepro. Any suggestions?

python3 -m newsqa.prepro
Traceback (most recent call last):
  File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 174, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 144, in _get_module_details
    code = loader.get_code(mod_name)
  File "<frozen importlib._bootstrap_external>", line 767, in get_code
  File "<frozen importlib._bootstrap_external>", line 727, in source_to_code
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 1
    version https://git-lfs.github.com/spec/v1
                ^
SyntaxError: invalid syntax

TypeError: predict() missing 1 required positional argument: 'answer_features'

Running tests.language_model_predict_test gives this error.

How is the newsqa/data_test.json file generated?

./scripts.sh

FileNotFoundError: [Errno 2] No such file or directory: 'out/basic/14/eval/test-040000.pklz'
Traceback (most recent call last):
  File "newsqa/evaluate.py", line 93, in <module>
    with open(args.dataset_file) as dataset_file:
FileNotFoundError: [Errno 2] No such file or directory: 'newsqa/data_test.json'

A problem with the dev, train, test CSV columns

Hi,

After following the instructions to set up the NewsQA dataset (https://github.com/Maluuba/newsqa), the columns in the dev, test and train files do not match what the source code requires.

The first issue is that the CSV column "answer_token_ranges" should be named "answer_char_ranges", which is how the source code (bidaf/newsqa/prepro.py) refers to it.
The second issue is the column order in the header.
The CSV files look like this:
story_id,story_text,question,answer_token_ranges
294:297|None|None,"41,55,82,100,126,138,165,181,204,219,237",60:61,./cnn/stories/42d01e187213e86f5fe617fe32e716ff7fa3afc4.story

Shouldn't the header be the following?
answer_char_ranges,story_text,question,story_id.

Thanks
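
The KeyError tracebacks in these reports show prepro.py looking the column up by name (question_info['answer_char_ranges']), so a quick check of the CSV header against the names the script expects can confirm the mismatch before running the full preprocessing. The following is a minimal sketch using only the standard library. The EXPECTED list is copied from the header suggested in the post above and is an assumption about what the script needs; check_header is a hypothetical helper, not part of this repository.

```python
import csv
import io

# Assumed set of required columns, taken from the header proposed in the issue.
EXPECTED = ["answer_char_ranges", "story_text", "question", "story_id"]


def check_header(csv_text, expected=EXPECTED):
    """Return (missing, unexpected) column names for the first row of csv_text."""
    header = next(csv.reader(io.StringIO(csv_text, newline="")))
    missing = [name for name in expected if name not in header]
    unexpected = [name for name in header if name not in expected]
    return missing, unexpected


# Header line as quoted in the issue.
sample = "story_id,story_text,question,answer_token_ranges\n"
missing, unexpected = check_header(sample)
print("missing:", missing)
print("unexpected:", unexpected)
```

For a real file, pass the text of its first line, for example by opening newsqa/dev.csv and reading one line. This check only compares names. It does not tell you whether the values under each name are in the right column, which is the second problem the post describes.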

What does the content of "indices.txt" mean?

Generate multiple questions on paragraphs

@davidgolub Can you please suggest the code changes needed to generate multiple questions on a single paragraph? Thanks in advance.

Can't download the IOB dataset

Is there any other link from which I can download the IOB data?

Question about base BIDAF model and paper results.

Hello, I am trying to reproduce the results from your paper. I did not get the same results as you with a BIDAF model (GitHub.com/allenai/bi-att-flow) trained on SQuAD v1 only and tested on NewsQA. I used a pretrained BIDAF model. With 24 EM and 39 F1, your results are about 5 points higher than mine.

Which exact model did you use? In your repository, you train a model on SQuAD and "an old dataset". Which dataset is that?

Thank you for any insight you might have on this.

Exceptions raised in bidaf/squad/utils.py get_2d_spans

python3 -m newsqa.prepro

  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 300, in <module>
    main()
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 18, in main
    prepro(args)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 62, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 239, in prepro_each
    yi0, yi1 = get_word_span(context, xi, answer_start, answer_stop)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/squad/utils.py", line 22, in get_word_span
    spanss = get_2d_spans(context, wordss)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/squad/utils.py", line 13, in get_2d_spans
    raise Exception()
Exception

“answer_char_ranges” error in prepro.py

python3 -m newsqa.prepro

Preprocessing data type dev
Reading data from source path newsqa/dev.csv
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/anaconda/envs/py35/lib/python3.5/site-packages/pandas/indexes/base.py", line 2175, in get_value
    return tslib.get_value_box(s, key)
  File "pandas/tslib.pyx", line 881, in pandas.tslib.get_value_box (pandas/tslib.c:18246)
  File "pandas/tslib.pyx", line 890, in pandas.tslib.get_value_box (pandas/tslib.c:17880)
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 300, in <module>
    main()
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 18, in main
    prepro(args)
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 62, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 156, in prepro_each
    answer_char_ranges = question_info['answer_char_ranges']
  File "/anaconda/envs/py35/lib/python3.5/site-packages/pandas/core/series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/pandas/indexes/base.py", line 2183, in get_value
    raise e1
  File "/anaconda/envs/py35/lib/python3.5/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3567)
  File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3250)
  File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4289)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13733)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13687)
KeyError: 'answer_char_ranges'

SyntaxError: invalid syntax

Generate NewsQA ans + paragraph data

cd bidaf
python3 -m tests.create_generation_dataset_unsupervised

When execution reaches "from squad.utils import get_2d_spans", this error is raised:

SyntaxError: invalid syntax

In addition, in the directory "QuestionGeneration/bidaf/squad", the contents of every *.py file look like this:

version https://git-lfs.github.com/spec/v1
oid sha256:f5a673dbbd173e29e9ea38f1b2091d883583b77b3a4c17144b223fb0f2f9bd09
size 3419

I don't understand what this means.
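
Files whose entire content is a "version https://git-lfs.github.com/spec/v1" line followed by oid and size lines are Git LFS pointer files, not the Python source itself, which is why Python reports a SyntaxError on line 1. This usually indicates that the repository was cloned without Git LFS fetching the real files; installing Git LFS and running git lfs pull in the clone typically replaces the pointers with the actual content. The sketch below lists which .py files under a directory are still pointers. The function names are hypothetical, and the check is an assumption that a pointer can be recognised by its first line alone.

```python
from pathlib import Path

# First line of a Git LFS pointer file, as quoted in the issue.
LFS_PREFIX = "version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(text):
    """Return True if text starts like a Git LFS pointer file."""
    return text.startswith(LFS_PREFIX)


def find_lfs_pointers(root):
    """Return the .py files under root that still look like LFS pointers."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        with open(path, encoding="utf-8", errors="replace") as handle:
            # Only the beginning of the file is needed for the check.
            if is_lfs_pointer(handle.read(len(LFS_PREFIX))):
                hits.append(path)
    return hits


if __name__ == "__main__":
    # "bidaf" is the directory named in the issue; adjust to your checkout.
    for pointer in find_lfs_pointers("bidaf"):
        print(pointer)
```

An empty result after fetching suggests the source files are real Python again and the import should no longer fail for this reason.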

Error executing "python3 -m newsqa.prepro"

I have downloaded the datasets from NewsQA and placed the train, test and dev files in the folders given in the instructions.
I am unable to run the command "python3 -m newsqa.prepro" successfully; it fails with a KeyError.

This is the error that I am getting:

Preprocessing data type dev
in ptb
Reading data from source path /Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/dev.csv
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/prepro.py", line 307, in <module>
    main()
  File "/Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/prepro.py", line 22, in main
    prepro(args)
  File "/Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/prepro.py", line 66, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/prepro.py", line 163, in prepro_each
    answer_char_ranges = question_info['answer_char_ranges']
  File "/Users/Prathusha/Library/Python/2.7/lib/python/site-packages/pandas/core/series.py", line 767, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/Prathusha/Library/Python/2.7/lib/python/site-packages/pandas/core/indexes/base.py", line 3132, in get_value
    raise e1
KeyError: 'answer_char_ranges'
