
identifiable-transformers's Introduction

More Identifiable yet Equally Performant Transformers for Text Classification

This repository helps:

  • Someone who is looking for a quick transformer-based classifier with a low computation budget.
  • Simple data format
  • Simple environment setup
  • Quick identifiability
  • Someone who wants to tweak the sizes of the key vector and value vector independently.
  • Someone who wants to make their analysis of attention weights more reliable. How? See below...

How to make your attention weights more reliable?

As shown in our work (experimentally and theoretically): for a given input X, a set of attention weights A, and transformer output prediction probabilities Y, if we can find another, architecture-generatable set of attention weights A* that satisfies the same X-Y pair, then any analysis performed over A is prone to be inaccurate.
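
To see why such an A* can exist, here is a minimal NumPy sketch (our own illustration, not code from this repository; it additionally assumes SciPy for the null-space basis). When the sequence length T exceeds the value dimension d_v, the value matrix V has a non-trivial left null space, so perturbed weights A* = A + D with D V = 0 produce exactly the same output:

import numpy as np
from scipy.linalg import null_space

T, d_v = 8, 4                             # sequence length > value dimension
rng = np.random.default_rng(0)

A = rng.random((T, T))
A = A / A.sum(axis=1, keepdims=True)      # row-stochastic attention weights
V = rng.standard_normal((T, d_v))         # value vectors

# Directions d with d @ V = 0 and d.sum() = 0, so rows of A* still sum to 1.
N = null_space(np.hstack([V, np.ones((T, 1))]).T)
D = rng.standard_normal((T, N.shape[1])) @ N.T
A_star = A + 1e-4 * D                     # pick the scale small enough that entries stay nonnegative

print(np.allclose(A @ V, A_star @ V))     # True: different weights, same output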

Idea:

  • decrease the size of the key vector,
  • increase the size of the value vector and add the head outputs instead of concatenating them (see the sketch below).
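
The repository exposes this through the kdim and add_heads options; the following is a hedged PyTorch sketch of the idea (our own illustrative module, not the repo's implementation): queries and keys get a small per-head dimension kdim, each head's value vector gets the full embedding dimension, and head outputs are summed rather than concatenated.

import torch
import torch.nn as nn

class AddHeadsSelfAttention(nn.Module):
    """Illustrative self-attention: small keys/queries, full-size summed heads."""
    def __init__(self, embed_dim, num_heads, kdim):
        super().__init__()
        self.num_heads, self.kdim = num_heads, kdim
        self.q_proj = nn.Linear(embed_dim, num_heads * kdim)
        self.k_proj = nn.Linear(embed_dim, num_heads * kdim)
        self.v_proj = nn.Linear(embed_dim, num_heads * embed_dim)

    def forward(self, x):                                     # x: (batch, seq, embed_dim)
        b, t, e = x.shape
        h, dk = self.num_heads, self.kdim
        q = self.q_proj(x).view(b, t, h, dk).transpose(1, 2)  # (b, h, t, dk)
        k = self.k_proj(x).view(b, t, h, dk).transpose(1, 2)  # (b, h, t, dk)
        v = self.v_proj(x).view(b, t, h, e).transpose(1, 2)   # (b, h, t, e)
        attn = torch.softmax(q @ k.transpose(-2, -1) / dk ** 0.5, dim=-1)
        return (attn @ v).sum(dim=1)                          # add heads, not concat

x = torch.randn(2, 10, 256)
print(AddHeadsSelfAttention(256, num_heads=4, kdim=16)(x).shape)  # torch.Size([2, 10, 256])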

Our paper: R. Bhardwaj, N. Majumder, S. Poria, E. Hovy. More Identifiable yet Equally Performant Transformers for Text Classification. ACL 2021. (The latest version is available here.)

Simple Python setup

  • Tested on Python 3.9.2 (dependencies are kept as low as possible, so it should be easy to run/adapt on other Python versions).
  • PyTorch version 1.8.1
  • Torchtext version 0.9.1
  • Pandas version 0.9.1
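
If you are setting up from scratch, a minimal install matching the pins above might look like the following (assuming the standard PyPI package names; pin Pandas to whatever recent version your Python supports):

declare@lab:~$ pip install torch==1.8.1 torchtext==0.9.1 pandas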

How to run the classifier?

declare@lab:~$ python text_classifier.py -dataset data.csv

Note: feel free to replace data.csv with a dataset for your text classification problem of choice, be it sentiment, news topics, reviews, etc.

data.csv (format)

The file should have two columns: the column with labels has the header "label" and the column with text has the header "text". For example:

text,label
we love NLP,5
"I ate too much, feeling sick",1
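
For reference, a minimal pandas sketch (our illustration, not part of the repository) that writes a data.csv in this format; pandas quotes the comma-containing text field automatically:

import pandas as pd

# Two columns with headers "text" and "label", as expected by the classifier.
df = pd.DataFrame({
    "text": ["we love NLP", "I ate too much, feeling sick"],
    "label": [5, 1],
})
df.to_csv("data.csv", index=False)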

In-house datasets

You can also run on the text classification datasets provided by Torchtext. For the AG_NEWS dataset:

declare@lab:~$ python text_classifier.py -kdim 64 -dataset ag_news

For quick experiments on a variety of text classification datasets, replace ag_news with:

  • imdb for IMDb
  • sogou for SogouNews
  • yelp_p for YelpReviewPolarity
  • yelp_f for YelpReviewFull
  • amazon_p for AmazonReviewPolarity
  • amazon_f for AmazonReviewFull
  • yahoo for YahooAnswers
  • dbpedia for DBpedia
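
For example, to train on IMDb with the same settings:

declare@lab:~$ python text_classifier.py -kdim 64 -dataset imdb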

Want to customize it for more identifiability?

Keep kdim low and/or switch to head addition using the add_heads flag. Feel free to analyze attention weights for inputs with lengths up to the embedding dimension, which is specified by the embedim argument when running the command below.

declare@lab:~$ python text_classifier.py -kdim 16 -add_heads -dataset ag_news -embedim 256

Note:

  • A lower kdim may or may not impact classification accuracy; keep this possible trade-off in mind during experiments.
  • It is recommended to keep embedim close to the maximum text length (see the max_text_len parameter below). However, make sure you do not over-parametrize the model just to make attention weights identifiable for large text lengths.

Tweak classifier parameters

  • batch: training batch size (default = 64).
  • nhead: number of attention heads (default = 4).
  • epochs: number of training epochs (default = 10).
  • lr: learning rate (default = 0.001).
  • dropout: dropout regularization parameter (default = 0.1).
  • vocab_size: threshold on vocabulary size (default = 100000).
  • max_text_len: trim text longer than this value (default = 512).
  • test_frac: only for user-specified datasets, fraction of the specified dataset held out as the test set (default = 0.3).
  • valid_frac: fraction of training samples kept aside for model development (default = 0.3).
  • kdim: dimension of the key (and query) vectors (default = 16).
  • add_heads: set this flag to replace concatenation of multi-head outputs with addition.
  • pos_emb: set this flag if positional embeddings are needed.
  • return_attn: set this flag if attention tensors are to be returned from the model.
  • embedim: decides the dimension of the token vectors and the value vector, i.e.,

    add_heads    vdim
    False        embedim / nhead
    True         embedim
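
Assuming the parameters above are passed with the same single-dash convention as -kdim and -dataset (our assumption; check text_classifier.py for the exact flag names), a fuller invocation might look like:

declare@lab:~$ python text_classifier.py -dataset yelp_p -kdim 16 -add_heads -batch 32 -epochs 20 -lr 0.0005 -max_text_len 256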

Citation

R. Bhardwaj, N. Majumder, S. Poria, E. Hovy. More Identifiable yet Equally Performant Transformers for Text Classification. ACL 2021.

Note: Please cite our paper if you find this repository useful. The latest version is available here.


identifiable-transformers's Issues

about the hyperparameters

Dear author, thanks for publishing the code. I tried to reproduce the experimental results in the paper, but failed to get the accuracy reported there. Could you tell me the hyperparameters used to train the transformer? There seems to be a configuration for only one dataset, ag_news.
