
Time-Sensitive-QA

This repo contains the dataset and code for the NeurIPS 2021 (Datasets and Benchmarks track) paper "A Dataset for Answering Time-Sensitive Questions". The dataset was collected by the UCSB NLP group and is released under the BSD 3-Clause "New" or "Revised" License.

This dataset is aimed at studying existing reading comprehension models' capability to perform temporal reasoning, and at testing whether they are sensitive to the temporal qualifiers in a given question. An example of annotated question-answer pairs is shown in the repo's overview figure.
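For intuition, here is a sketch of what one annotated item might look like; the field names below are illustrative assumptions, not the repo's exact schema:

    # Illustrative only -- field names are assumptions, not the exact schema.
    example = {
        "passage": "John Smith served as CEO of Acme from 1999 to 2004. "
                   "He was succeeded by Jane Doe, who led the company until 2010.",
        "facts": [  # one time-evolving relation, as (answer, start, end)
            {"answer": "John Smith", "start": 1999, "end": 2004},
            {"answer": "Jane Doe", "start": 2004, "end": 2010},
        ],
        "questions": [
            # The temporal qualifier selects which fact answers the question.
            {"question": "Who was the CEO of Acme in 2001?", "answer": "John Smith"},
            {"question": "Who was the CEO of Acme in 2007?", "answer": "Jane Doe"},
        ],
    }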

Repo Structure

  • dataset/: this folder contains all the dataset files (a loading sketch follows this list)
  • dataset/annotated*: the (passage, time-evolving facts) pairs annotated by crowd workers
  • dataset/train-dev-test: the splits synthesized from templates, in both easy and hard versions
  • BigBird/: all the code for running the BigBird models
  • FiD/: all the code for running the Fusion-in-Decoder (FiD) models
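A minimal loading sketch, assuming the files are plain JSON (the hypothetical file name below should be adjusted to the actual files under dataset/):

    import json

    # Hypothetical file name -- check the actual names and format in dataset/.
    with open("dataset/annotated_train.json") as f:
        data = json.load(f)
    print(len(data), "annotated items")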

Requirements

  1. BigBird-Specific Requirements
  2. FiD-Specific Requirements

BigBird

This is the extractive QA baseline model. First, switch to the BigBird conda environment.

Initialize from NQ checkpoint
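Under the hood, initializing an extractive BigBird QA model amounts to something like the sketch below using HuggingFace transformers; the repo's model_id=nq presumably maps to an NQ-finetuned checkpoint, so the public TriviaQA checkpoint here is only a stand-in to show the API:

    from transformers import AutoTokenizer, BigBirdForQuestionAnswering

    # Stand-in checkpoint; the repo selects its NQ checkpoint via model_id=nq.
    ckpt = "google/bigbird-base-trivia-itc"
    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = BigBirdForQuestionAnswering.from_pretrained(ckpt)

    inputs = tokenizer("Who was the CEO of Acme in 2001?",
                       "John Smith served as CEO of Acme from 1999 to 2004.",
                       return_tensors="pt")
    outputs = model(**inputs)  # start/end logits over the passage tokens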

Running Training (Hard)

    python -m BigBird.main model_id=nq dataset=hard cuda=[DEVICE] mode=train per_gpu_train_batch_size=8

Running Evaluation (Hard)

    python -m BigBird.main model_id=nq dataset=hard cuda=[DEVICE] mode=eval model_path=[YOUR_MODEL]

Initialize from TriviaQA checkpoint

Running Training (Hard)

    python -m BigBird.main model_id=triviaqa dataset=hard cuda=[DEVICE] mode=train per_gpu_train_batch_size=2

Running Evaluation (Hard)

    python -m BigBird.main model_id=triviaqa dataset=hard mode=eval cuda=[DEVICE] model_path=[YOUR_MODEL]
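Evaluation reports the usual extractive QA metrics. A minimal SQuAD-style exact-match/F1 sketch, assuming the repo follows the standard answer normalization:

    import re
    import string
    from collections import Counter

    def normalize(s):
        """Lowercase, drop punctuation and articles, collapse whitespace."""
        s = "".join(ch for ch in s.lower() if ch not in string.punctuation)
        s = re.sub(r"\b(a|an|the)\b", " ", s)
        return " ".join(s.split())

    def exact_match(pred, gold):
        return float(normalize(pred) == normalize(gold))

    def f1(pred, gold):
        p, g = normalize(pred).split(), normalize(gold).split()
        overlap = sum((Counter(p) & Counter(g)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(p), overlap / len(g)
        return 2 * precision * recall / (precision + recall)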

Fusion-in-Decoder

This is the generative QA baseline model. First, switch to the FiD conda environment and download the checkpoints from Google Drive.

Initialize from NQ checkpoint

Running Training (Hard)

    python -m FiD.main mode=train dataset=hard model_path=/data2/wenhu/Time-Sensitive-QA/FiD/pretrained_models/nq_reader_base/

Running Evaluation (Hard)

    python -m FiD.main mode=eval cuda=3 dataset=hard model_path=[YOUR_MODEL] 

Running Evaluation on Human-Test (Hard)

    python -m FiD.main mode=eval cuda=3 dataset=human_hard model_path=[YOUR_MODEL] 

Initialize from TriviaQA checkpoint

Running Training (Hard)

    python -m FiD.main mode=train dataset=hard model_path=/data2/wenhu/Time-Sensitive-QA/FiD/pretrained_models/tqa_reader_base/

Running Evaluation (Hard)

    python -m FiD.main mode=eval cuda=3 dataset=hard model_path=[YOUR_MODEL] 

Running Evaluation on Human-Test (Hard)

    python -m FiD.main mode=eval cuda=3 dataset=human_hard model_path=[YOUR_MODEL] 
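Conceptually, Fusion-in-Decoder encodes each (question, passage) pair independently and then lets the decoder attend over all passage encodings at once. A simplified sketch on top of plain T5 is below; the repo's FiD code wraps T5 in its own way, so treat the details here as assumptions:

    from transformers import T5Tokenizer, T5ForConditionalGeneration
    from transformers.modeling_outputs import BaseModelOutput

    tok = T5Tokenizer.from_pretrained("t5-base")
    t5 = T5ForConditionalGeneration.from_pretrained("t5-base")

    question = "Who was the CEO of Acme in 2001?"
    passages = ["John Smith served as CEO of Acme from 1999 to 2004.",
                "Acme was founded in 1987 in Springfield."]

    # Encode each (question, passage) pair independently...
    enc = tok([f"question: {question} context: {p}" for p in passages],
              return_tensors="pt", padding=True)
    hidden = t5.encoder(input_ids=enc.input_ids,
                        attention_mask=enc.attention_mask).last_hidden_state

    # ...then "fuse": concatenate all encodings along the sequence axis
    # so the decoder cross-attends over every passage jointly.
    fused = hidden.view(1, -1, t5.config.d_model)
    mask = enc.attention_mask.view(1, -1)
    out = t5.generate(encoder_outputs=BaseModelOutput(last_hidden_state=fused),
                      attention_mask=mask, max_length=20)
    print(tok.decode(out[0], skip_special_tokens=True))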

Open-Domain Experiments

To build the retriever, please refer to https://github.com/wenhuchen/OTT-QA/tree/master/retriever, which is based on DrQA's TF-IDF/BM25 retriever implementation.
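As a rough stand-in for that DrQA-style retriever, here is a minimal BM25 sketch using the rank_bm25 package; the whitespace tokenization is a simplification:

    from rank_bm25 import BM25Okapi

    docs = ["John Smith served as CEO of Acme from 1999 to 2004.",
            "Acme was founded in 1987 in Springfield."]
    bm25 = BM25Okapi([d.lower().split() for d in docs])

    query = "who was the ceo of acme in 2001".split()
    scores = bm25.get_scores(query)
    best = max(range(len(docs)), key=scores.__getitem__)
    print(docs[best])  # retrieves the CEO passage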

License

The data and code are released under BSD 3-Clause "New" or "Revised" License.

Report

Please create an issue or send an email to [email protected] for any questions/bugs/etc.


time-sensitive-qa's Issues

Only the First 100 Paragraphs

Hello, it looks like the paragraphs field of the examples includes only the first 100 paragraphs. I wonder if I could get the dataset with the full paragraphs. Thank you!

Unable to reproduce the Easy baseline (FiD)

Hello author, I was unable to reproduce the results in the paper.
I used the hyperparameters provided in the GitHub repository, as well as those given in the original article, for several runs, and could not reproduce the results on the Easy split; on the Hard split, I obtained results similar to the paper's.

Easy                  dev EM   dev F1   test EM   test F1
Result in paper       59.5     66.9     60.5      67.9
Reproduced (3 runs)   55.1     63.9     54.6      64.3

In issue #5, Xinsu also reports the same problem.
I was wondering if you could release the detailed hyperparameters for training on the Easy split.
Thank you.

Trained Models

Hello, sorry, I have one more question. I was wondering whether you could release trained checkpoints, especially the FiD models trained on the easy version of the dataset.

Dataset description

Could you provide a description of the differences between the data in the dataset folder? From the README, it is not clear what the difference is between (train/test/dev), annotated (train/test/dev), and human_annotated (train/test/dev). Which one did you use for evaluation in the paper?

Regarding the dataset

Hi team! Could you please provide a cleaned version of the dataset? The data you have provided seems very complex to understand; a cleaned version would save us a lot of time in exploration and training.

Human-paraphrased easy/hard split?

It seems that for the human-paraphrased sets, the easy and hard splits contain the same data. Is this just how they are constructed, or is there a mistake in the data release?

Random seed for generating deterministic human_annotated data

The repository does not provide processed human_annotated train/test splits.
Since there is no fixed random seed in Process.ipynb, the generated human_annotated data will contain randomness.
Could the authors provide a deterministic, already-processed human_annotated train/test split, or provide the seeds used in the experiments to generate the data, so that subsequent experiments can make fair comparisons?
Thank you.
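For reference, one way to make such notebook-based sampling deterministic is to pin all relevant seeds up front. A minimal sketch follows; the seed value is arbitrary, and this is a suggestion rather than the authors' actual fix:

    import random

    import numpy as np

    SEED = 42  # arbitrary; any fixed value makes the sampling repeatable
    random.seed(SEED)
    np.random.seed(SEED)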
