A repository of the entailment models used for evaluation in the SciTail: A Textual Entailment Dataset from Science Question Answering paper, accepted at AAAI'18. It contains three models built using the PyTorch-based deep-learning NLP library, AllenNLP.
- **Decomposable Attention** (baseline): A simple model that decomposes the problem into parallelizable attention computations (Parikh et al., 2016). We directly use the AllenNLP implementation (Gardner et al., 2017) of the decomposable attention model here.
- **Ngram Overlap** (baseline): A simple word-overlap baseline that uses, as three features, the proportion of unigrams, 1-skip bigrams, and 1-skip trigrams in the hypothesis that are also present in the premise. We feed these features into a two-layer perceptron.
- **Decomposable Graph Entailment Model** (proposed): Our proposed model uses the graph structure of the hypothesis to calculate entailment probabilities for each node and edge, and aggregates them for the final entailment prediction. Please refer to our paper for more details.
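As an illustration of the n-gram overlap features described above, here is a minimal sketch. This is not the repository's actual implementation; the `overlap_features` function and its gap-based definition of skip-grams (at most `max_skip` tokens skipped between consecutive words) are our own assumptions:

```python
import itertools

def overlap_features(premise, hypothesis):
    """Fraction of hypothesis unigrams, 1-skip bigrams, and 1-skip
    trigrams that also occur in the premise (three features)."""
    def skip_ngrams(tokens, n, max_skip):
        grams = set()
        for start in range(len(tokens)):
            # Choose a gap of 0..max_skip tokens between consecutive words.
            for gaps in itertools.product(range(max_skip + 1), repeat=n - 1):
                idxs = [start]
                for g in gaps:
                    idxs.append(idxs[-1] + 1 + g)
                if idxs[-1] < len(tokens):
                    grams.add(tuple(tokens[i] for i in idxs))
        return grams

    feats = []
    for n in (1, 2, 3):
        skip = 0 if n == 1 else 1  # plain unigrams; 1-skip bi/trigrams
        hyp = skip_ngrams(hypothesis, n, skip)
        prem = skip_ngrams(premise, n, skip)
        feats.append(len(hyp & prem) / len(hyp) if hyp else 0.0)
    return feats
```

In the baseline, these three features are then fed into a two-layer perceptron for the final entailment decision.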
We use the SciTail dataset and pre-trained models by default (downloaded automatically by the `scripts/download_data.sh` script). The models can also be trained and evaluated on new datasets, as described below.
- Create the `scitail` environment using Anaconda:

  ```
  conda create -n scitail python=3.6
  ```

- Activate the environment:

  ```
  source activate scitail
  ```

- Install the requirements in the environment:

  ```
  sh scripts/install_requirements.sh
  ```
- Install PyTorch as per the instructions on http://pytorch.org/. Commands as of Nov. 22, 2017:

  ```
  # Linux/Mac (no CUDA)
  conda install pytorch torchvision -c soumith
  # Linux (with CUDA)
  conda install pytorch torchvision cuda80 -c soumith
  ```
- Download the GloVe embeddings into a `Glove/` folder in the root directory as `glove.<tokens>B.<dim>d.txt.gz` files.

- Test the installation:

  ```
  pytest -v
  ```
Run the `download_data.sh` script to download the dataset and models used in the SciTail paper:

```
sh scripts/download_data.sh
```

This will download and unzip the data to the `SciTailV1.1` folder (from here) and the models to the `SciTailModelsV1` folder (from here).
To run the trained models on the test sets, run:

```
sh scripts/evaluate_models.sh
```

Note that the models include the vocabulary used to train them, so these pre-trained models will perform poorly on new test sets with a different vocabulary.
To view the model predictions, run:

```
sh scripts/predict_model.sh
```

The predictions will be added to the `predictions/` folder for each model. Each file contains the original examples along with the model's probabilities, logits, and entailment score under the keys `label_probs`, `label_logits`, and `score`, respectively.
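A minimal sketch for loading such a predictions file. The JSON-lines layout (one JSON object per line) is an assumption on our part; the key names match the description above:

```python
import json

def read_predictions(path):
    """Yield (label_probs, label_logits, score) tuples from a
    predictions file, assuming one JSON object per line."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            ex = json.loads(line)
            yield ex["label_probs"], ex["label_logits"], ex["score"]
```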
To train the models on new datasets, run:

```
sh scripts/train_models.sh
```

with the appropriate train/validation sets specified in the training configuration files:

- Decomposable Graph Entailment Model: `training_config/dgem.json`
- Decomposable Attention Model: `training_config/decompatt.json`
- NGram Overlap Model: `training_config/simple_overlap.json`
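The configuration schema comes from AllenNLP, which sets dataset paths via top-level keys in each config file. A sketch of where the new paths go (the paths shown are placeholders, and the reader/model/trainer sections are elided):

```json
{
  "dataset_reader": { ... },
  "train_data_path": "path/to/train.tsv",
  "validation_data_path": "path/to/dev.tsv",
  "model": { ... },
  "trainer": { ... }
}
```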
If you find these models helpful in your work, please cite:

```
@inproceedings{scitail,
    Author = {Tushar Khot and Ashish Sabharwal and Peter Clark},
    Booktitle = {AAAI},
    Title = {{SciTail}: A Textual Entailment Dataset from Science Question Answering},
    Year = {2018}
}
```