FACT AI project - University of Amsterdam 2021

This is a reproduction study of the paper Towards Transparent and Explainable Attention Models (ACL 2020).

This code is built on the original authors' repository.

Structure and overview

  • The job_scripts directory contains some of the scripts we used to train the models on the Lisa cluster.
  • The Notes directory contains our notes while working on this project.
  • The Transparency directory on the master branch of this repository contains a copy of the original code with some minor additions:
    • We added a seeding mechanism (a sketch of what it does follows this list)
    • We added the flexibility to separate the training phase from the experiment phase, which makes it easier to handle large datasets with long compute times
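
The helper below is a minimal sketch of what such a seeding mechanism typically looks like; the function name and call sites are illustrative, not the repository's actual code:

import random
import numpy as np
import torch

def set_seed(seed: int = 0) -> None:
    """Seed all relevant RNGs so results are reproducible across runs."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when CUDA is unavailable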

Other branches

In the other branches we tested some hypotheses and extensions of the models. Here we provide an overview of the different branches; details for running scripts in each branch are given in the sections below:

  • biLSTM contains our extension that uses a biLSTM encoder instead of a uni-directional LSTM
  • const_attention enables forcing the attention weights of the model to be 1) equal on all hidden representations, 2) all zeros except for the first hidden representation, or 3) all zeros except for the last hidden representation
  • embedding_params keeps the pre-trained embeddings fixed (no fine-tuning) when they are used
  • lime contains our experiments that compare attention weights with LIME scores
  • Q_route_fix contains our investigation into whether orthogonalisation should also be applied to the Q-route when training models
  • dataset_analysis was used to run additional experiments

Installation

  • Clone this repository: git clone git@github.com:MotherOfUnicorns/FACT_AI_project.git
  • Move to the project directory: cd FACT_AI_project
  • Add the parent directory of the Transparency directory (which should be your current directory) to your Python path: export PYTHONPATH=$PYTHONPATH:$(pwd)
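
To check that the path is set up correctly, the following one-liner should exit without an ImportError (assuming the Transparency directory is importable as a package):

python -c "import Transparency"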

Requirements

  • Python 3.6 or 3.7
  • Install all required packages as specified:
    pip install -r requirements.txt
    
  • To run the LIME experiments, also install LIME:
    pip install lime
    
  • Download the spaCy English model:
    python -m spacy download en
    
  • Download the NLTK taggers needed for the experiments:
    python -c "import nltk; nltk.download('averaged_perceptron_tagger'); nltk.download('universal_tagset')"
    

Preparing the Datasets

Each dataset has a separate Jupyter notebook in the Transparency/preprocess folder. Follow the instructions in the notebooks to download and preprocess the datasets.

Alternatively, we have made our pre-processed datasets available on the Lisa cluster at /home/lgpu0136/project/Transparency/preprocess
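
If you prefer to run the notebooks non-interactively, something along these lines should work (the notebook path is a placeholder; substitute the actual filename for your dataset):

jupyter nbconvert --to notebook --execute Transparency/preprocess/<your_dataset_notebook>.ipynb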

Training & Running Experiments (on the master branch)

This section shows how to train and run experiments with models using different encoders: vanilla_lstm, ortho_lstm, and diversity_lstm.

single input sequence tasks

Datasets available are sst, imdb, yelp, amazon, 20News_sports, tweet.

An example to train and test the orthogonal LSTM model on the imdb dataset:

dataset_name=imdb
model_name=ortho_lstm
output_path=./experiments
python Transparency/train_and_run_experiments_bc.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder ${model_name}

To use the diversity_lstm model, an additional --diversity flag is needed to specify the diversity weight:

python Transparency/train_and_run_experiments_bc.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder diversity_lstm --diversity 0.5

dual input sequence tasks

Datasets available are snli, qqp, babi_1, babi_2, babi_3, cnn.

An example to train and test the orthogonal LSTM model on the babi_1 dataset:

dataset_name=babi_1
model_name=ortho_lstm
output_path=./experiments
python Transparency/train_and_run_experiments_qa.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder ${model_name}

Similarly, to use the diversity_lstm model, an additional --diversity flag is needed to specify the diversity weight:

python Transparency/train_and_run_experiments_qa.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder ${model_name} --diversity 0.5

other arguments

Using the --job_type flag, you can specify whether you want to only train the model:

--job_type train

or to only run the experiments when you already have a trained model:

--job_type experiment

If you don't specify this flag, the default behaviour is to both train and run the experiments, i.e. equivalent to

--job_type both

Using the --seed flag, you can manually set a random seed. If unspecified, the default seed is zero, i.e. equivalent to

--seed 0
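
For example, to train the orthogonal LSTM on imdb with seed 1 and run the experiments later as two separate jobs:

python Transparency/train_and_run_experiments_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder ortho_lstm --seed 1 --job_type train
python Transparency/train_and_run_experiments_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder ortho_lstm --seed 1 --job_type experiment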

branch-specific tasks

The following are only available in the specified branches. To use them, first move to the correct branch:

git checkout [branchname]

biLSTM branch

On this branch, the --encoder flag can accept any one of the following arguments (an example command follows the list):

  • vanilla_lstm
  • ortho_lstm
  • diversity_lstm
  • bi_lstm
  • ortho_bi_lstm
  • diversity_bi_lstm
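
For example, to train and test the bidirectional orthogonal LSTM on imdb (assuming the same scripts and flags as on the master branch):

python Transparency/train_and_run_experiments_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder ortho_bi_lstm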

const_attention branch

Using the --attention flag, you can switch between different attention weight distributions (see the example after this list):

  • tanh: normal unconstrained attention weights with tanh activation
  • equal: equal attention weights on all words in the sentence
  • first_only: all attention weight is concentrated on the first word
  • last_only: all attention weight is concentrated on the last word (equivalent to an LSTM without attention)
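
For example, to train on imdb with uniform attention weights (assuming the master-branch training script and flags also apply on this branch):

python Transparency/train_and_run_experiments_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder vanilla_lstm --attention equal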

embedding_params branch

The utilities in the const_attention branch are also available here.

lime branch

First make sure you have a trained model, then use the Transparency/lime_explanation_bc.py or Transparency/lime_explanation_qa.py script (depending on the dataset) to compare attention weights with LIME scores.

single input sequence tasks

Datasets available are sst, imdb, yelp, amazon, 20News_sports, tweet.

An example to run the LIME experiment on the imdb dataset with ortho_lstm encoder:

dataset_name=imdb
model_name=ortho_lstm
output_path=./experiments # make sure this is the same directory you used when training your model
python Transparency/lime_explanation_bc.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder ${model_name}

The LIME outputs will also be found in ${output_path}.

When using diversity_lstm as the encoder, an additional --diversity flag is needed to specify the diversity weight, for example:
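
python Transparency/lime_explanation_bc.py --dataset imdb --data_dir Transparency --output_dir ./experiments --encoder diversity_lstm --diversity 0.5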

dual input sequence tasks

Datasets available are snli, qqp, babi_1, babi_2, babi_3, cnn.

An example to run the LIME experiment on the babi_1 dataset with ortho_lstm encoder:

dataset_name=babi_1
model_name=ortho_lstm
output_path=./experiments # make sure this is the same directory you used when training your model
python Transparency/lime_explanation_qa.py --dataset ${dataset_name} --data_dir Transparency --output_dir ${output_path} --encoder ${model_name}

The LIME outputs will also be found in ${output_path}.

When using diversity_lstm as the encoder, an additional --diversity flag is needed to specify the diversity weight.

Q_route_fix branch

The same arguments as on the master branch are accepted here, but when the ortho_lstm encoder is used for a dual input sequence task, only the P-path is orthogonalised, not the Q-path.
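
For intuition, orthogonalising one vector against another means removing its projection onto that vector. The NumPy sketch below is purely conceptual and is not the repository's implementation:

import numpy as np

def orthogonalise(h, m):
    """Return the component of h that is orthogonal to m."""
    # subtract from h its projection onto m
    return h - (np.dot(h, m) / np.dot(m, m)) * m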

Our Results

For as long as they persist on the Lisa cluster, our results (trained models + various experiments) are available at:

  • /home/lgpu0136/experiments
  • /home/lgpu0136/experiments_attentions
  • /home/lgpu0136/experiments_bilstm
  • /home/lgpu0136/experiments_fixed_embeddings
  • /home/lgpu0136/experiments_q_route
