Hierarchical Attention Networks

This repository contains two implementations of Hierarchical Attention Networks for Document Classification: one in Keras and one in TensorFlow.

Hierarchical Attention Networks consist of the following parts:

  1. Embedding layer
  2. Word Encoder: word-level bidirectional GRU that produces a context-aware representation of each word
  3. Word Attention: word-level attention that picks out the important words in a sentence
  4. Sentence Encoder: sentence-level bidirectional GRU that produces a context-aware representation of each sentence
  5. Sentence Attention: sentence-level attention that picks out the important sentences in a document
  6. Fully Connected layer + Softmax

These models have two levels of attention, one at the word level and one at the sentence level. This lets the model pay more or less attention to individual words and sentences when constructing the representation of a document.
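
As a rough illustration of how the two attention levels compose, the NumPy sketch below pools words into sentence vectors and then sentence vectors into a document vector. It is not the repository's code: the GRU encoders are left out, random arrays stand in for encoder outputs, and the name `attention_pool` and all sizes are hypothetical.

```python
import numpy as np

def attention_pool(h, W, b, u):
    """Attention-weighted sum over the first axis of h, shaped (steps, dim)."""
    scores = np.tanh(h @ W + b) @ u                  # (steps,)
    scores = scores - scores.max()                   # numerical stability
    alphas = np.exp(scores) / np.exp(scores).sum()   # softmax over steps
    return alphas @ h, alphas                        # (dim,), (steps,)

rng = np.random.default_rng(0)
n_sentences, n_words, dim, attn = 3, 5, 8, 4

# Stand-in for the word encoder output: one (n_words, dim) matrix per sentence.
doc = rng.normal(size=(n_sentences, n_words, dim))

# Each level has its own parameters (random here, learned in a real model).
Ww, bw, uw = rng.normal(size=(dim, attn)), np.zeros(attn), rng.normal(size=attn)
Ws, bs, us = rng.normal(size=(dim, attn)), np.zeros(attn), rng.normal(size=attn)

# Word level: pool the words of each sentence into one sentence vector.
sentence_vecs = np.stack([attention_pool(s, Ww, bw, uw)[0] for s in doc])

# Sentence level: pool the sentence vectors into one document vector.
# (In the full network the sentence encoder GRU runs between these two steps.)
doc_vec, sentence_alphas = attention_pool(sentence_vecs, Ws, bs, us)
```

Here `sentence_alphas` holds one weight per sentence and sums to 1, which is what makes the weights usable for visualizing which sentences the model attended to.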

[Figure: Hierarchical Attention Network]

Dataset:

I used the IMDB movie review dataset from Kaggle (labeledTrainData.tsv), which contains 25,000 labeled reviews.
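
A minimal sketch of reading such a file with the standard library follows. The column names `id`, `sentiment` and `review`, the choice to disable quote handling, and the sample rows are assumptions made for illustration; `load_reviews` is a hypothetical helper, not a function from this repository.

```python
import csv
import io

# Tiny in-memory stand-in for labeledTrainData.tsv (made-up rows).
sample = (
    'id\tsentiment\treview\n'
    '"1_8"\t1\t"Great film.<br />Loved it."\n'
    '"2_3"\t0\t"Dull and slow."\n'
)

def load_reviews(handle):
    """Return (texts, labels) from a tab-separated file with a header row."""
    # QUOTE_NONE keeps quote characters as plain text, so stray quotes inside
    # a review cannot merge rows; surrounding quotes are stripped by hand.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    texts, labels = [], []
    for row in reader:
        texts.append(row["review"].strip('"'))
        labels.append(int(row["sentiment"].strip('"')))
    return texts, labels

texts, labels = load_reviews(io.StringIO(sample))
```

For the real file, the same function could be called on `open("labeledTrainData.tsv", encoding="utf-8", newline="")`.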

Preprocessing on the Data:

I applied minimal preprocessing to the input reviews, in three steps:

  1. Remove HTML tags

  2. Replace non-ASCII characters with a single space

  3. Split each review into sentences
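
The three steps above can be sketched with regular expressions as follows. This is one possible reading, not the repository's code: replacing tags with a space (so that text on either side of `<br />` stays separated), collapsing repeated whitespace, and splitting on `.`, `!` or `?` followed by whitespace are all assumptions, and `preprocess` is a hypothetical name.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")               # step 1: HTML tags
NON_ASCII_RE = re.compile(r"[^\x00-\x7F]+")   # step 2: runs of non-ASCII chars
SENT_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")  # step 3: naive sentence boundary

def preprocess(review):
    """Return the list of sentences in a raw review string."""
    text = TAG_RE.sub(" ", review)
    text = NON_ASCII_RE.sub(" ", text)
    # Extra tidy-up (an assumption): collapse whitespace left by the two steps.
    text = re.sub(r"\s+", " ", text).strip()
    return [s for s in SENT_SPLIT_RE.split(text) if s]

sentences = preprocess("Great film.<br />Loved it! Caf\u00e9 scenes were fine.")
```

A rule-based splitter like this will mis-split abbreviations such as "Mr. Smith"; a dedicated sentence tokenizer may be preferable for real data.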

I then build the character set, capping each sentence at 512 characters and each review at 15 sentences. The input X is indexed as (document, sentence, char), and the target y holds the corresponding sentiment labels.
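
A sketch of how such an input tensor might be built is shown below. The two limits come from the text; reserving index 0 for padding, truncating overlong input, padding on the right, and the helper names `build_char_index` and `encode` are assumptions.

```python
import numpy as np

MAX_SENTENCES = 15    # upper bound on sentences per review
MAX_SENT_LEN = 512    # upper bound on characters per sentence

def build_char_index(docs):
    """Map every character seen in docs to an integer, starting at 1."""
    chars = sorted({ch for doc in docs for sent in doc for ch in sent})
    return {ch: i + 1 for i, ch in enumerate(chars)}   # 0 is kept for padding

def encode(docs, char_to_idx):
    """Return X with shape (document, sentence, char), zero-padded."""
    X = np.zeros((len(docs), MAX_SENTENCES, MAX_SENT_LEN), dtype=np.int64)
    for d, doc in enumerate(docs):
        for s, sent in enumerate(doc[:MAX_SENTENCES]):
            for c, ch in enumerate(sent[:MAX_SENT_LEN]):
                X[d, s, c] = char_to_idx.get(ch, 0)
    return X

# Each document is a list of sentences, as produced by the preprocessing step.
docs = [["Great film.", "Loved it!"], ["Dull."]]
char_to_idx = build_char_index(docs)
X = encode(docs, char_to_idx)
y = np.array([1, 0])   # made-up sentiment labels, one per document
```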

Attention Layer Implementation

Attention mechanism layer that reduces Bi-RNN outputs with an attention vector (adapted from the paper).
Args:
    inputs: The Attention inputs.             
            In case of Bidirectional RNN, this must be a tuple (outputs_fw, outputs_bw) containing 
            the forward and the backward RNN outputs `Tensor`.
                If time_major == False (default),
                    outputs_fw is a `Tensor` shaped:
                    `[batch_size, max_time, cell_fw.output_size]`
                    and outputs_bw is a `Tensor` shaped:
                    `[batch_size, max_time, cell_bw.output_size]`.
                If time_major == True,
                    outputs_fw is a `Tensor` shaped:
                    `[max_time, batch_size, cell_fw.output_size]`
                    and outputs_bw is a `Tensor` shaped:
                    `[max_time, batch_size, cell_bw.output_size]`.
    attention_size: Linear size of the Attention weights.
    time_major: The shape format of the `inputs` Tensors.
        If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`.
        If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`.
        Using `time_major = True` is a bit more efficient because it avoids
        transposes at the beginning and end of the RNN calculation.  However,
        most TensorFlow data is batch-major, so by default this function
        accepts input and emits output in batch-major form.
    return_alphas: Whether to return the attention coefficients along with the layer's output.
        Used for visualization.
Returns:
    The Attention output `Tensor`.
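
To make the documented behaviour concrete, here is a NumPy sketch of an attention reduction with the same `inputs`, `time_major` and `return_alphas` arguments. It is an approximation under stated assumptions rather than the repository's TensorFlow code: the weights `W`, `b` and `u` are passed in explicitly instead of being created from `attention_size`, and the scoring function (tanh projection followed by a softmax over time) is the one commonly used for this kind of layer.

```python
import numpy as np

def attention(inputs, W, b, u, time_major=False, return_alphas=False):
    """Reduce RNN outputs over time with learned attention weights."""
    if isinstance(inputs, tuple):
        # Bi-RNN case: concatenate forward and backward outputs on depth.
        inputs = np.concatenate(inputs, axis=2)
    if time_major:
        # (max_time, batch, depth) -> (batch, max_time, depth)
        inputs = np.transpose(inputs, (1, 0, 2))

    v = np.tanh(inputs @ W + b)                   # (batch, max_time, attention_size)
    vu = v @ u                                    # (batch, max_time)
    vu = vu - vu.max(axis=1, keepdims=True)       # numerical stability
    alphas = np.exp(vu) / np.exp(vu).sum(axis=1, keepdims=True)

    output = (inputs * alphas[:, :, None]).sum(axis=1)   # (batch, depth)
    return (output, alphas) if return_alphas else output

rng = np.random.default_rng(0)
batch, max_time, hidden, attention_size = 2, 6, 4, 3
outputs_fw = rng.normal(size=(batch, max_time, hidden))
outputs_bw = rng.normal(size=(batch, max_time, hidden))
W = rng.normal(size=(2 * hidden, attention_size))
b = np.zeros(attention_size)
u = rng.normal(size=attention_size)

output, alphas = attention((outputs_fw, outputs_bw), W, b, u, return_alphas=True)
```

Each row of `alphas` sums to 1 over the time axis, so it can be plotted directly against the input tokens.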

Requirements:

  1. pandas 0.20.3
  2. tensorflow 1.4.0
  3. keras 2.0.8
  4. numpy 1.14.0

Implementation in Keras

Execution:

python HierarchicalAttn.py

Results & Accuracy:

[Figure: Accuracy]

Implementation in Tensorflow

Execution:

python HierarchicalAttn_tf.py

Results & Accuracy:

[Figure: Accuracy]
