This is a reproduction study of the paper *Towards Transparent and Explainable Attention Models* (ACL 2020).
This code is built on the original authors' repository.
- The `job_scripts` directory contains some of the scripts we used to train the models on the Lisa cluster.
- The `Notes` directory contains our notes from working on this project.
- The `Transparency` directory on the `master` branch of this repository contains a duplication of the original code with some minor additions:
  - We added a seeding mechanism.
  - We added the flexibility to separate the training phase from the experiment phase, which makes it easier to handle large datasets with long compute times.
In the other branches we tested some hypotheses and extensions of the models. Below is an overview of the different branches; details for running scripts on each branch are given in the sections that follow:

- `biLSTM` contains our extension that uses a biLSTM encoder instead of a uni-directional LSTM.
- `const_attention` enables forcing the attention weights of the model to be (1) equal over all hidden representations, (2) all zeros except for the first hidden representation, or (3) all zeros except for the last hidden representation.
- `embedding_params` does not fine-tune the pre-trained embeddings when they are used.
- `lime` contains our experiments comparing attention weights with LIME scores.
- `Q_route_fix` contains our investigation of whether orthogonalisation should also be applied to the Q-route when training models.
- `dataset_analysis` was used to run additional experiments.
- Clone this repository: `git clone [email protected]:MotherOfUnicorns/FACT_AI_project.git`
- Move to the project directory: `cd FACT_AI_project`
- Add the parent directory of the `Transparency` directory (which should be your current directory) to your Python path: `export PYTHONPATH=$PYTHONPATH:$(pwd)`
- Python 3.6 or 3.7
- Install all required packages: `pip install -r requirements.txt`
- To run the LIME experiments, additionally run: `pip install lime`
- Download the spacy English model: `python -m spacy download en`
- Download the nltk taggers needed for the experiments: `python -c "import nltk; nltk.download('averaged_perceptron_tagger'); nltk.download('universal_tagset')"`
Each dataset has a separate Jupyter notebook in the `Transparency/preprocess` folder.
Follow the instructions in the notebooks to download and preprocess the datasets.
Alternatively, our pre-processed datasets are available on the Lisa cluster at `/home/lgpu0136/project/Transparency/preprocess`.
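If you prefer not to open the notebooks interactively, they can also be executed from the command line with `jupyter nbconvert`; this is only a sketch, and the notebook path below is illustrative, so substitute the notebook for your dataset:

```sh
# Execute a preprocessing notebook in place (the notebook path is illustrative;
# pick the notebook for your dataset from Transparency/preprocess).
jupyter nbconvert --to notebook --execute --inplace Transparency/preprocess/IMDB/IMDB.ipynb
```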
How to train and experiment with models using different encoders (`vanilla_lstm`, `ortho_lstm`, `diversity_lstm`):

Available datasets are `sst`, `imdb`, `yelp`, `amazon`, `20News_sports`, and `tweet`.

An example of training and testing the orthogonal LSTM model on the `imdb` dataset:

```sh
dataset_name=imdb
model_name=ortho_lstm
output_path=./experiments
python Transparency/train_and_run_experiments_bc.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder ${model_name}
```
To use the `diversity_lstm` model, an additional `--diversity` flag is needed to specify the diversity weight:

```sh
python Transparency/train_and_run_experiments_bc.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder diversity_lstm --diversity 0.5
```
Available datasets are `snli`, `qqp`, `babi_1`, `babi_2`, `babi_3`, and `cnn`.

An example of training and testing the orthogonal LSTM model on the `babi_1` dataset:

```sh
dataset_name=babi_1
model_name=ortho_lstm
output_path=./experiments
python Transparency/train_and_run_experiments_qa.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder ${model_name}
```
Similarly, to use the `diversity_lstm` model, an additional `--diversity` flag is needed to specify the diversity weight:

```sh
python Transparency/train_and_run_experiments_qa.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder diversity_lstm --diversity 0.5
```
Using the `--job_type` flag, you can specify whether to only train the model (`--job_type train`) or to only run the experiments on an already-trained model (`--job_type experiment`).
If this flag is not specified, the default behaviour is to both train and experiment, i.e. equivalent to `--job_type both`.
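For example, to train a model first and run the experiments on it later (reusing the `imdb` setup from above):

```sh
# Train only: writes the trained model to ${output_path}
python Transparency/train_and_run_experiments_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder ortho_lstm --job_type train

# Later, run only the experiments on the already-trained model
python Transparency/train_and_run_experiments_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder ortho_lstm --job_type experiment
```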
Using the `--seed` flag, you can manually set a random seed.
If unspecified, the default seed is zero, i.e. equivalent to `--seed 0`.
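For example, to train the same `imdb` model with a different seed:

```sh
# Same command as above, but with an explicit random seed
python Transparency/train_and_run_experiments_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder ortho_lstm --seed 42
```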
The following options are only available on the specified branches. To use them, first switch to the corresponding branch: `git checkout [branchname]`
The `--encoder` flag now accepts any of the following arguments (see the example after this list):

- `vanilla_lstm`
- `ortho_lstm`
- `diversity_lstm`
- `bi_lstm`
- `ortho_bi_lstm`
- `diversity_bi_lstm`
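As a minimal sketch, assuming the biLSTM branch keeps the same training CLI as `master`, a bidirectional orthogonal model could be trained like this:

```sh
# Hypothetical invocation on the biLSTM branch: same script and flags as master,
# only the --encoder value changes.
git checkout biLSTM
python Transparency/train_and_run_experiments_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder ortho_bi_lstm
```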
Using the `--attention` flag, you can switch between different attention weight distributions (see the example after this list):

- `tanh`: normal unconstrained attention weights with tanh activation
- `equal`: equal attention weights over all words in the sentence
- `first_only`: all attention weight is concentrated on the first word
- `last_only`: all attention weight is concentrated on the last word (equivalent to an LSTM without attention)
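As a sketch, assuming the const_attention branch uses the same training CLI as `master` plus the extra `--attention` flag:

```sh
# Hypothetical invocation on the const_attention branch: force equal attention weights.
git checkout const_attention
python Transparency/train_and_run_experiments_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder vanilla_lstm --attention equal
```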
The utilities in the `const_attention` branch are also available here.
First make sure you have a trained model, then use the `Transparency/lime_explanation_bc.py` or `Transparency/lime_explanation_qa.py` script (depending on the dataset) to compare attention weights with LIME scores.
Available datasets are `sst`, `imdb`, `yelp`, `amazon`, `20News_sports`, and `tweet`.

An example of running the LIME experiment on the `imdb` dataset with the `ortho_lstm` encoder:

```sh
dataset_name=imdb
model_name=ortho_lstm
output_path=./experiments  # make sure this is the same directory you used when training the model
python Transparency/lime_explanation_bc.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder ${model_name}
```

The LIME outputs will also be written to `${output_path}`.
When using `diversity_lstm` as the encoder, an additional `--diversity` flag is needed to specify the diversity weight, as shown below.
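For example, following the same pattern as the training commands above:

```sh
# LIME comparison for a diversity_lstm model; --diversity must match the value used at training time
python Transparency/lime_explanation_bc.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder diversity_lstm --diversity 0.5
```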
Available datasets are `snli`, `qqp`, `babi_1`, `babi_2`, `babi_3`, and `cnn`.

An example of running the LIME experiment on the `babi_1` dataset with the `ortho_lstm` encoder:

```sh
dataset_name=babi_1
model_name=ortho_lstm
output_path=./experiments  # make sure this is the same directory you used when training the model
python Transparency/lime_explanation_qa.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder ${model_name}
```

The LIME outputs will also be written to `${output_path}`.
When using `diversity_lstm` as the encoder, an additional `--diversity` flag is needed to specify the diversity weight.
The same arguments as on the `master` branch are accepted here, but when the `ortho_lstm` encoder is used for a dual-input sequence task, only the P-path is orthogonalised, not the Q-path.
For as long as they persist on the Lisa cluster, our results (trained models and various experiment outputs) are available at:

- `/home/lgpu0136/experiments`
- `/home/lgpu0136/experiments_attentions`
- `/home/lgpu0136/experiments_bilstm`
- `/home/lgpu0136/experiments_fixed_embeddings`
- `/home/lgpu0136/experiments_q_route`