
questiongeneration's People

Contributors

davidgolub, helloworld12342


questiongeneration's Issues

labels file

How do I obtain the labels.txt file in datasets/<dataset_name>/train so that I can run predictions?

Cannot read dev, train, test files

The column names in the CSV files produced from Maluuba's NewsQA data do not match what prepro.py expects.

The other, closed issue did not resolve this problem.

python -m newsqa.prepro
Preprocessing data type dev
Reading data from source path newsqa/dev.csv
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2566, in get_value
    return libts.get_value_box(s, key)
  File "pandas/_libs/tslib.pyx", line 1017, in pandas._libs.tslib.get_value_box
  File "pandas/_libs/tslib.pyx", line 1025, in pandas._libs.tslib.get_value_box
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 300, in <module>
    main()
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 18, in main
    prepro(args)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 62, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 156, in prepro_each
    answer_char_ranges = question_info['answer_char_ranges']
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/core/series.py", line 623, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2574, in get_value
    raise e1
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2560, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/_libs/index.pyx", line 83, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 91, in pandas._libs.index.IndexEngine.get_value
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'answer_char_ranges'

Even after manually renaming the column from 'answer_token_ranges' to 'answer_char_ranges', the file still cannot be parsed:

(py3) MYMBP:bidaf MY$ python -m newsqa.prepro
Preprocessing data type dev
Reading data from source path newsqa/dev.csv
Traceback (most recent call last):
  File "pandas/_libs/parsers.pyx", line 1175, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas/_libs/parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas/_libs/parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
  File "pandas/_libs/parsers.pyx", line 1539, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 2353: invalid continuation byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 300, in <module>
    main()
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 18, in main
    prepro(args)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 62, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 133, in prepro_each
    keep_default_na=False)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/io/parsers.py", line 709, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/io/parsers.py", line 455, in _read
    data = parser.read(nrows)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1069, in read
    ret = self._engine.read(nrows)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/site-packages/pandas/io/parsers.py", line 1839, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 902, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 924, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 1001, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 1130, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas/_libs/parsers.pyx", line 1182, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas/_libs/parsers.pyx", line 1281, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas/_libs/parsers.pyx", line 1297, in pandas._libs.parsers.TextReader._string_convert
  File "pandas/_libs/parsers.pyx", line 1539, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 2353: invalid continuation byte
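
One possible way to diagnose the UnicodeDecodeError above, offered as a sketch rather than the repository's own fix: the error means the CSV contains bytes that are not valid UTF-8, which can happen when a file is re-saved in another encoding while its header is being edited. The helper below tries UTF-8 first and then falls back to Latin-1, which accepts every byte value. The name `decode_with_fallback` and the choice of Latin-1 as the fallback are assumptions for illustration; they are not part of prepro.py.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Decode raw bytes with the first candidate encoding that works.

    Returns a (text, encoding_name) tuple. The candidate list is an
    assumption for illustration; adjust it to match how the file was saved.
    """
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# Hypothetical usage: a lone 0xcc byte is not valid UTF-8,
# so the second candidate is used instead.
text, used = decode_with_fallback(b"a\xcc b")
print(used)
```

If the fallback encoding turns out to be the right one, pandas' `read_csv` accepts an `encoding` argument, so the same name could be passed there. Note that Latin-1 never fails to decode, so a successful fallback does not prove the text is correct; it is worth inspecting the affected rows.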

error with prepro.py

I ran into a problem when running python3 -m newsqa.prepro. Any suggestions?

python3 -m newsqa.prepro
Traceback (most recent call last):
  File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 174, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 144, in _get_module_details
    code = loader.get_code(mod_name)
  File "", line 767, in get_code
  File "", line 727, in source_to_code
  File "", line 222, in _call_with_frames_removed
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 1
    version https://git-lfs.github.com/spec/v1
    ^
SyntaxError: invalid syntax

How is the newsqa/data_test.json file generated?

./scripts.sh

FileNotFoundError: [Errno 2] No such file or directory: 'out/basic/14/eval/test-040000.pklz'
Traceback (most recent call last):
  File "newsqa/evaluate.py", line 93, in <module>
    with open(args.dataset_file) as dataset_file:
FileNotFoundError: [Errno 2] No such file or directory: 'newsqa/data_test.json'

A problem with dev, train, test CSV columns

Hi,

After following the instructions to set up the NewsQA dataset (https://github.com/Maluuba/newsqa), the columns in the dev, test and train files do not match what the source code requires.

  1. The first issue is that the column named "answer_token_ranges" in the CSV files should be "answer_char_ranges", which is the name the source code refers to (bidaf/newsqa/prepro.py).
  2. The other issue is the column order in the header.
    The CSV files look like this:
    story_id,story_text,question,answer_token_ranges
    294:297|None|None,"41,55,82,100,126,138,165,181,204,219,237",60:61,./cnn/stories/42d01e187213e86f5fe617fe32e716ff7fa3afc4.story

Shouldn't the header be the following?
answer_char_ranges,story_text,question,story_id
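
A quick way to confirm the header mismatch described above before running the preprocessing step is to compare the CSV header against the column names the script looks up. This is a minimal sketch under stated assumptions: `REQUIRED_COLUMNS` is taken from the header proposed in this issue, not from a confirmed reading of prepro.py, and `missing_columns` is a hypothetical helper name.

```python
import csv
import io

# Assumed from the header proposed in this issue; only
# 'answer_char_ranges' is confirmed by the tracebacks.
REQUIRED_COLUMNS = ("answer_char_ranges", "story_text", "question", "story_id")


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header."""
    # The first row of the file is treated as the header;
    # an empty file yields an empty header.
    header = next(csv.reader(csv_file), [])
    return sorted(set(required) - set(header))


# Hypothetical usage with the header quoted in this issue.
sample = io.StringIO("story_id,story_text,question,answer_token_ranges\n")
print(missing_columns(sample))  # ['answer_char_ranges']
```

The check only reports which names are absent. The two column names suggest different units (token offsets versus character offsets), so renaming the column by hand may not by itself give prepro.py the values it expects.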

Thanks

Question about base BIDAF model and paper results.

Hello, 

I am trying to reproduce the results from your paper. I do not get the same results with a BIDAF model (GitHub.com/allenai/bi-att-flow) trained on SQuAD v1 only and tested on NewsQA; I used a pretrained BIDAF model. At 24 EM and 39 F1, your scores are about 5 points higher than mine.

Which exact model did you use? In your repository, you train a model on SQuAD and "an old dataset". Which dataset is that?

Thank you for any insight you might have on this.

Exceptions raised in bidaf/squad/utils.py get_2d_spans

python3 -m newsqa.prepro

  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/MY/anaconda/envs/py3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 300, in <module>
    main()
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 18, in main
    prepro(args)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 62, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/newsqa/prepro.py", line 239, in prepro_each
    yi0, yi1 = get_word_span(context, xi, answer_start, answer_stop)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/squad/utils.py", line 22, in get_word_span
    spanss = get_2d_spans(context, wordss)
  File "/Users/MY/Downloads/QuestionGeneration/bidaf/squad/utils.py", line 13, in get_2d_spans
    raise Exception()
Exception

“answer_char_ranges” error in prepro.py

python3 -m newsqa.prepro

Preprocessing data type dev
Reading data from source path newsqa/dev.csv
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/anaconda/envs/py35/lib/python3.5/site-packages/pandas/indexes/base.py", line 2175, in get_value
    return tslib.get_value_box(s, key)
  File "pandas/tslib.pyx", line 881, in pandas.tslib.get_value_box (pandas/tslib.c:18246)
  File "pandas/tslib.pyx", line 890, in pandas.tslib.get_value_box (pandas/tslib.c:17880)
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/anaconda/envs/py35/lib/python3.5/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 300, in <module>
    main()
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 18, in main
    prepro(args)
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 62, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/home/adminye/QuestionGeneration/bidaf/newsqa/prepro.py", line 156, in prepro_each
    answer_char_ranges = question_info['answer_char_ranges']
  File "/anaconda/envs/py35/lib/python3.5/site-packages/pandas/core/series.py", line 601, in __getitem__
    result = self.index.get_value(self, key)
  File "/anaconda/envs/py35/lib/python3.5/site-packages/pandas/indexes/base.py", line 2183, in get_value
    raise e1
  File "/anaconda/envs/py35/lib/python3.5/site-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 105, in pandas.index.IndexEngine.get_value (pandas/index.c:3567)
  File "pandas/index.pyx", line 113, in pandas.index.IndexEngine.get_value (pandas/index.c:3250)
  File "pandas/index.pyx", line 161, in pandas.index.IndexEngine.get_loc (pandas/index.c:4289)
  File "pandas/src/hashtable_class_helper.pxi", line 732, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13733)
  File "pandas/src/hashtable_class_helper.pxi", line 740, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:13687)
KeyError: 'answer_char_ranges'

SyntaxError: invalid syntax

Generate NewsQA ans + paragraph data

cd bidaf
python3 -m tests.create_generation_dataset_unsupervised

When execution reaches "from squad.utils import get_2d_spans", this error is raised:
"""
SyntaxError: invalid syntax
"""

Also, in the directory "QuestionGeneration/bidaf/squad", the contents of every *.py file look like this:
"""
version https://git-lfs.github.com/spec/v1
oid sha256:f5a673dbbd173e29e9ea38f1b2091d883583b77b3a4c17144b223fb0f2f9bd09
size 3419
"""
I don't understand what this means.
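
The three-line content quoted above has the shape of a Git LFS pointer file: a small text stub (spec version, object hash, size in bytes) that stands in for the real file when a repository is cloned without Git LFS fetching the actual content. Python then tries to execute the stub as source code, which would explain the SyntaxError on line 1. Installing Git LFS and running `git lfs pull` inside the clone normally replaces the stubs with the real files. The sketch below shows one way to recognise such stubs; `parse_lfs_pointer` is a hypothetical helper name, not something shipped with the repository.

```python
LFS_SPEC_PREFIX = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text):
    """Return the pointer fields as a dict, or None if text is not a pointer.

    Each pointer line has the form "<key> <value>".
    """
    if not text.startswith(LFS_SPEC_PREFIX):
        return None
    fields = {}
    for line in text.splitlines():
        # Split each line at the first space into key and value.
        key, _, value = line.partition(" ")
        if key:
            fields[key] = value
    return fields


# Hypothetical usage with the content quoted in this issue.
stub = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:f5a673dbbd173e29e9ea38f1b2091d883583b77b3a4c17144b223fb0f2f9bd09\n"
    "size 3419\n"
)
print(parse_lfs_pointer(stub)["size"])  # 3419
```

Running a check like this over the *.py files in the clone would show which ones are still stubs rather than real source.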

Error executing "python3 -m newsqa.prepro"

I have downloaded the datasets from NewsQA and placed the train, test and dev files in the folders given in the instructions.
I am unable to run the command "python3 -m newsqa.prepro" successfully; it fails with a KeyError.

This is the error I get:

Preprocessing data type dev
in ptb
Reading data from source path /Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/dev.csv
0it [00:00, ?it/s]
Traceback (most recent call last):
  File "/Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/prepro.py", line 307, in <module>
    main()
  File "/Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/prepro.py", line 22, in main
    prepro(args)
  File "/Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/prepro.py", line 66, in prepro
    prepro_each(args, 'dev', out_name='dev')
  File "/Users/Prathusha/research/QuestionGeneration/bidaf/newsqa/prepro.py", line 163, in prepro_each
    answer_char_ranges = question_info['answer_char_ranges']
  File "/Users/Prathusha/Library/Python/2.7/lib/python/site-packages/pandas/core/series.py", line 767, in __getitem__
    result = self.index.get_value(self, key)
  File "/Users/Prathusha/Library/Python/2.7/lib/python/site-packages/pandas/core/indexes/base.py", line 3132, in get_value
    raise e1
KeyError: 'answer_char_ranges'
