batzner / indrnn

TensorFlow implementation of Independently Recurrent Neural Networks

Home Page: https://arxiv.org/abs/1803.04831

License: Apache License 2.0

Topics: tensorflow, indrnn, rnn, paper-implementations

indrnn's Introduction

Independently Recurrent Neural Networks

Simple TensorFlow implementation of Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN by Shuai Li et al. The authors' original Theano and Lasagne implementation can be found in Sunnydreamrain/IndRNN_Theano_Lasagne.


Summary

In IndRNNs, the neurons within a recurrent layer are independent of each other. A basic RNN computes the hidden state h as h = act(W * input + U * state + b), where U is a full recurrent weight matrix. An IndRNN replaces U * state with an element-wise product u * state, so each neuron has a single recurrent weight connecting it to its own previous hidden state.
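A minimal NumPy sketch of the two update rules, with hypothetical shapes chosen only for illustration:

import numpy as np

batch, input_dim, num_units = 4, 16, 8
x = np.random.randn(batch, input_dim)      # current input
h = np.random.randn(batch, num_units)      # previous hidden state

W = np.random.randn(input_dim, num_units)  # input weights (both variants)
U = np.random.randn(num_units, num_units)  # full recurrent matrix (basic RNN)
u = np.random.randn(num_units)             # one recurrent weight per neuron (IndRNN)
b = np.zeros(num_units)

h_rnn = np.tanh(x @ W + h @ U + b)         # basic RNN: neurons coupled through U
h_ind = np.maximum(0, x @ W + u * h + b)   # IndRNN: element-wise recurrence with ReLU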

The IndRNN

  • can be used efficiently with ReLU activation functions, making it easier to stack multiple recurrent layers without saturating gradients
  • allows for better interpretability, as neurons in the same layer are independent of each other
  • prevents vanishing and exploding gradients by regulating each neuron's recurrent weight

Usage

Copy ind_rnn_cell.py into your project.

import tensorflow as tf

from ind_rnn_cell import IndRNNCell

# TIME_STEPS (the sequence length) and input_data are defined by your model.
# Regulate each neuron's recurrent weight as recommended in the paper
recurrent_max = pow(2, 1 / TIME_STEPS)

cell = tf.nn.rnn_cell.MultiRNNCell(
    [IndRNNCell(128, recurrent_max_abs=recurrent_max),
     IndRNNCell(128, recurrent_max_abs=recurrent_max)])
output, state = tf.nn.dynamic_rnn(cell, input_data, dtype=tf.float32)
...
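With recurrent_max = pow(2, 1 / TIME_STEPS), each recurrent weight satisfies |u| <= 2**(1 / TIME_STEPS), so |u|**TIME_STEPS <= 2: a neuron's recurrent contribution can grow by at most a factor of two over the whole sequence. This is the constraint the paper recommends to keep ReLU-based IndRNNs from exploding on long sequences.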

Experiments in the paper

Addition Problem

See examples/addition_rnn.py for a script reproducing the "Adding Problem" from the paper. Below are the results reproduced with the addition_rnn.py code.

Results plot: https://github.com/batzner/indrnn/raw/master/img/addition/TAll.png

Sequential MNIST

See examples/sequential_mnist.py for a script reproducing the Sequential MNIST experiment. I let it run for two days and stopped it after 60,000 training steps with a

  • Training error rate of 0.7%
  • Validation error rate of 1.1%
  • Test error rate of 1.1%

Error rate plot: https://github.com/batzner/indrnn/raw/master/img/sequential_mnist/errors.png

Requirements

  • Python 3.4+
  • TensorFlow 1.5+

indrnn's People

Contributors

batzner, edemeijer


indrnn's Issues

Result of Sequential MNIST

Hi,

First, thanks a lot for this example.
I just noticed that you wrote "I let it run for two days and stopped it after 60,000 training steps". In your example, LEARNING_RATE_DECAY_STEPS = 600000, which means the learning rate only starts to drop after 600,000 training steps. Does this mean that the result shown on your page was obtained before the learning rate was ever decayed?

If that is the case, decaying the learning rate further might improve the performance.
Also, although the validation error keeps dropping in your results, it drops relatively slowly. Setting LEARNING_RATE_DECAY_STEPS = 20000 might therefore give you a better result than the one you presented (although probably not the best possible one).

By the way, "two days for 60,000 training steps" seems much slower than my implementation. I am not very familiar with TensorFlow, but does your code compute input * W for all time steps together? If not, I would suggest removing this computation from the IndRNN cell and adding an extra layer that computes it for the whole sequence at once. I think this could improve efficiency a lot.
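A rough sketch of that suggestion (shapes and names are hypothetical, and this is not how the repo's cell is actually structured): project the whole [batch, time, features] tensor with a single dense layer before the recurrent loop.

import tensorflow as tf

batch, time_steps, features, num_units = 32, 784, 1, 128   # hypothetical shapes
inputs = tf.placeholder(tf.float32, [batch, time_steps, features])

# Compute input * W for all time steps at once; tf.layers.dense applies the
# same weights to the last dimension of the 3-D tensor.
projected = tf.layers.dense(inputs, num_units, use_bias=False)

# A modified cell would then only apply the cheap element-wise update
# h_t = act(projected_t + u * h_{t-1} + b) inside the recurrent while loop.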

Thanks.

Initialization of recurrent weight for the adding problem or similar problems

Hi,

Thanks for the implementation.
I have one comment on "why IndRNN on 5,000 time steps does not work in your implementation": I think it is related to the initialization of the recurrent weights.
As shown in the paper, to keep long-term memory the recurrent weight needs to be around 1. For the adding problem, only the last output of the last IndRNN layer is used, so that layer does not need to keep short-term memory. Accordingly, the recurrent weights of the last IndRNN layer can be initialized to all 1, or uniformly in the range (1 - epsilon, 1 + epsilon) for a small epsilon. For ReLU, the recurrent weights of the other layers can be initialized uniformly in (0, recurrent_max), without the negative part, so that the layer keeps all kinds of memory.

In your implementation, only 128 units are used. With a uniform initialization over the whole range for the last IndRNN layer, the number of units that can keep long-term memory is very small, which makes tasks with long sequences very hard to solve.

This also applies to other tasks that only use the final output, such as MNIST classification and action recognition.
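For illustration, a sketch of that initialization scheme with this repo's cell (assuming IndRNNCell accepts a recurrent_kernel_initializer argument; check ind_rnn_cell.py for the exact parameter name):

import tensorflow as tf
from ind_rnn_cell import IndRNNCell

TIME_STEPS = 5000
recurrent_max = pow(2, 1 / TIME_STEPS)

# Lower layer: uniform in (0, recurrent_max) keeps a mix of short- and
# long-term memory (ReLU case, no negative weights).
lower_init = tf.random_uniform_initializer(0.0, recurrent_max)

# Last layer: initialize at (or very close to) 1 so that most units keep
# long-term memory, since only the final output is used.
last_init = tf.constant_initializer(1.0)

cell = tf.nn.rnn_cell.MultiRNNCell(
    [IndRNNCell(128, recurrent_max_abs=recurrent_max,
                recurrent_kernel_initializer=lower_init),
     IndRNNCell(128, recurrent_max_abs=recurrent_max,
                recurrent_kernel_initializer=last_init)])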

Could you please give it a try? It works on my end.

Thanks.

Errors when used in bidirectional_dynamic_rnn

When I use IndRNNCell in a bidirectional RNN, it raises an error:
ValueError: cannot use '/bidirectional_rnn/fw/fw/while/ind_rnn/cell/Mul_1' as input to 'bidirectional_rnn/fw/fw/while/fw/ind_rnn_cell/clip_by_value' because they are in different while loops.

A snippet of my code:

recurrent_max = pow(2, 1.0 / time_steps)
fw_rnn_cell = IndRNNCell(hidden_size, recurrent_max_abs=recurrent_max)
bw_rnn_cell = IndRNNCell(hidden_size, recurrent_max_abs=recurrent_max)
bi_states, _ = tf.nn.bidirectional_dynamic_rnn(
                    fw_rnn_cell,
                    bw_rnn_cell,
                    inputs,
                    sequence_length=lengths,
                    dtype=tf.float32
                )

Has anybody encountered this problem? How can it be solved?

Performance issues in the program

Hello, I found a performance issue in the definition of get_training_set in examples/sequential_mnist.py: dataset.map is called without num_parallel_calls. I think adding it would increase the efficiency of your program.

The same issue also exists at the other dataset = dataset.map(preprocess_data) call sites.
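For illustration, the suggested change would look roughly like this (the parallelism value is a placeholder, not taken from the repo):

NUM_PARALLEL_CALLS = 4  # placeholder; tune to the number of available CPU cores

# before: dataset = dataset.map(preprocess_data)
dataset = dataset.map(preprocess_data, num_parallel_calls=NUM_PARALLEL_CALLS)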

The TensorFlow documentation for tf.data.Dataset.map supports this.

Looking forward to your reply. By the way, I would be glad to create a PR to fix it if you are too busy.

Cell structure

Hello!
I noticed that your implementation of IndRNN is the basic version (not the residual version).
Also, in the original paper there should be two batch normalization operations: after the cell's input and before the activation layer. It is not strictly required by the IndRNN structure, but it is 'recommended' in the paper :)
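For what it's worth, a rough sketch of one way to add batch normalization between stacked IndRNN layers (this normalizes each layer's output sequence rather than the pre-activations inside the cell, which would require modifying ind_rnn_cell.py; placement and parameters are assumptions, not the paper's exact recipe):

import tensorflow as tf
from ind_rnn_cell import IndRNNCell

def stacked_indrnn_with_bn(inputs, num_layers, num_units, recurrent_max, is_training):
    """Stacks IndRNN layers and batch-normalizes each layer's output sequence."""
    outputs = inputs
    for i in range(num_layers):
        cell = IndRNNCell(num_units, recurrent_max_abs=recurrent_max)
        outputs, _ = tf.nn.dynamic_rnn(cell, outputs, dtype=tf.float32,
                                       scope="indrnn_%d" % i)
        # Normalizes over the feature axis of the [batch, time, units] tensor.
        outputs = tf.layers.batch_normalization(outputs, training=is_training)
    return outputs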

Dimension Mismatch ?

In line 149 of ind_rnn_cell.py:

gate_inputs = math_ops.matmul(inputs, self._input_kernel)

The docstring describes inputs as a 2-D tensor of shape [batch, num_units], while

self._input_kernel = self.add_variable(
    "input_kernel",
    shape=[input_depth, self._num_units],
    initializer=self._input_initializer)

gives self._input_kernel the shape [input_depth, self._num_units]. Is this a dimension mismatch?

Constrain (0, max) instead of (-max, max)?

(Referring to the clipping line in ind_rnn_cell.py containing -self._recurrent_max,)

Hey, I checked your implementation of the paper and noticed that instead of constraining the recurrent weights between 0 and max, where max is pow(2, 1/T), you constrain them between -max and max. Since these weights are applied element-wise at every time step, wouldn't negative weights potentially result in outputs oscillating between positive and negative signs? This might explain why your version did not converge as fast as in the paper.
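A tiny numerical illustration of the oscillation concern (pure Python, values chosen arbitrarily):

# With a negative recurrent weight, a neuron's contribution from an old state
# flips sign every step: u, u**2, u**3, ... alternate between - and +.
u = -0.9
contributions = [round(u ** t, 3) for t in range(1, 6)]
print(contributions)  # [-0.9, 0.81, -0.729, 0.656, -0.59]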

If this is the case, then I think the standard weight initialization might not be optimal, since it is centered around 0, so half of the weights would be immediately truncated to 0. Maybe a uniform distribution between 0 and max would help initial convergence. Just a thought.

Let me know what you think. I feel like this architecture might be very promising because of its simplicity, and I'd love to see more results.

ValueError Issue

Hi,
Thanks a lot for your work! However, when I run the example code I get a ValueError:
'Variable rnn/multi_rnn_cell/cell_0/ind_rnn_cell/input_kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:...'
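(For context: this error typically appears when the graph-building code runs twice in the same default graph, for example when re-running a notebook cell. A minimal workaround sketch, which may or may not be the root cause here:)

import tensorflow as tf

# Start from a clean graph before rebuilding the cells, so previously created
# variables such as rnn/multi_rnn_cell/cell_0/ind_rnn_cell/input_kernel
# do not collide with the new ones.
tf.reset_default_graph()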
Could you help me figure out what went wrong?
Thanks again!

Probably wrong indexing

in ind_rnn_cell.py:

def build(self, inputs_shape):
    if inputs_shape[1].value is None:
      raise ValueError("Expected inputs.shape[-1] to be known, saw shape: %s"
                       % inputs_shape)

The check if inputs_shape[1].value is None: should be if inputs_shape[-1].value is None:, because we need to check whether the input depth (the last dimension) is defined.

ReLU activation with IndyLSTMCell

Sorry to open an issue here, but I think you are an expert on this topic, and the IndRNNCell-related additions in TensorFlow 1.10.0 might also have been created by you.

I am trying to apply ReLU activation to IndyLSTMCell in the new TensorFlow (1.10). However, the loss becomes NaN after I make that change. The default tanh activation works for this cell, and IndyGRUCell has the same problem. For IndRNNCell, both tanh and ReLU work. However, when I stack it into multiple layers, I do not see any increase in the model's capacity (in terms of how quickly the loss decreases over training epochs).
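(Not the repo author's answer, just a commonly tried mitigation for NaN losses with ReLU recurrences: lower the learning rate and clip the global gradient norm. A sketch with placeholder values, where loss stands for your model's loss tensor:)

import tensorflow as tf

learning_rate = 1e-4   # placeholder; often reduced when switching to ReLU
max_grad_norm = 5.0    # placeholder clipping threshold

optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(grads, max_grad_norm)
train_op = optimizer.apply_gradients(zip(clipped_grads, variables))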

Can you please give me a hint on how to address this? Any suggestions would be much appreciated. Thanks!
