batzner / indrnn

TensorFlow implementation of Independently Recurrent Neural Networks

Home Page: https://arxiv.org/abs/1803.04831

License: Apache License 2.0

Topics: tensorflow, indrnn, rnn, paper-implementations

indrnn's Introduction

Independently Recurrent Neural Networks

Simple TensorFlow implementation of Independently Recurrent Neural Network (IndRNN): Building A Longer and Deeper RNN by Shuai Li et al. The authors' original Theano and Lasagne implementation can be found in Sunnydreamrain/IndRNN_Theano_Lasagne.


Summary

In IndRNNs, the neurons within a recurrent layer are independent of each other. A basic RNN computes the hidden state h as h = act(W * input + U * state + b), where U is a full recurrent weight matrix. An IndRNN replaces U * state with an element-wise product u * state, so each neuron has a single recurrent weight connecting it to its own previous hidden state.
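A minimal NumPy sketch of the two update rules, with hypothetical shapes chosen only for illustration:

import numpy as np

batch, input_dim, num_units = 4, 16, 8
x = np.random.randn(batch, input_dim)      # current input
h = np.random.randn(batch, num_units)      # previous hidden state

W = np.random.randn(input_dim, num_units)  # input weights (both variants)
U = np.random.randn(num_units, num_units)  # full recurrent matrix (basic RNN)
u = np.random.randn(num_units)             # one recurrent weight per neuron (IndRNN)
b = np.zeros(num_units)

h_rnn = np.tanh(x @ W + h @ U + b)         # basic RNN: neurons coupled through U
h_ind = np.maximum(0, x @ W + u * h + b)   # IndRNN: element-wise recurrence with ReLU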

The IndRNN

  • can be used efficiently with ReLU activation functions, making it easier to stack multiple recurrent layers without saturating gradients
  • allows for better interpretability, as neurons in the same layer are independent of each other
  • prevents vanishing and exploding gradients by regulating each neuron's recurrent weight

Usage

Copy ind_rnn_cell.py into your project.

import tensorflow as tf

from ind_rnn_cell import IndRNNCell

# TIME_STEPS (the sequence length) and input_data are defined by your model.
# Regulate each neuron's recurrent weight as recommended in the paper
recurrent_max = pow(2, 1 / TIME_STEPS)

cell = tf.nn.rnn_cell.MultiRNNCell(
    [IndRNNCell(128, recurrent_max_abs=recurrent_max),
     IndRNNCell(128, recurrent_max_abs=recurrent_max)])
output, state = tf.nn.dynamic_rnn(cell, input_data, dtype=tf.float32)
...
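With recurrent_max = pow(2, 1 / TIME_STEPS), each recurrent weight satisfies |u| <= 2**(1 / TIME_STEPS), so |u|**TIME_STEPS <= 2: a neuron's recurrent contribution can grow by at most a factor of two over the whole sequence. This is the constraint the paper recommends to keep ReLU-based IndRNNs from exploding on long sequences.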

Experiments in the paper

Addition Problem

See examples/addition_rnn.py for a script reproducing the "Adding Problem" from the paper. Below are the results reproduced with the addition_rnn.py code.

Results plot: https://github.com/batzner/indrnn/raw/master/img/addition/TAll.png

Sequential MNIST

See examples/sequential_mnist.py for a script reproducing the Sequential MNIST experiment. I let it run for two days and stopped it after 60,000 training steps with a

  • Training error rate of 0.7%
  • Validation error rate of 1.1%
  • Test error rate of 1.1%

Error rate plot: https://github.com/batzner/indrnn/raw/master/img/sequential_mnist/errors.png

Requirements

  • Python 3.4+
  • TensorFlow 1.5+

indrnn's People

Contributors

batzner, edemeijer


indrnn's Issues

Result of Sequential MNIST

Hi,

First, thanks a lot for this example.
I just noticed that you wrote "I let it run for two days and stopped it after 60,000 training steps". In your example, LEARNING_RATE_DECAY_STEPS = 600000, which means the learning rate only starts to drop after 600,000 training steps. Does this mean that the result shown on your page was obtained before the learning rate was ever decayed?

If that is the case, decaying the learning rate further might improve the performance.
Also, although the validation error keeps dropping in your results, it drops relatively slowly. Setting LEARNING_RATE_DECAY_STEPS = 20000 might therefore give you a better result than the one you presented (although probably not the best possible one).

By the way, "two days for 60,000 training steps" seems much slower than my implementation. I am not very familiar with TensorFlow, but does your code compute input * W for all time steps together? If not, I would suggest removing this computation from the IndRNN cell and adding an extra layer that computes it for the whole sequence at once. I think this could improve efficiency a lot.
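A rough sketch of that suggestion (shapes and names are hypothetical, and this is not how the repo's cell is actually structured): project the whole [batch, time, features] tensor with a single dense layer before the recurrent loop.

import tensorflow as tf

batch, time_steps, features, num_units = 32, 784, 1, 128   # hypothetical shapes
inputs = tf.placeholder(tf.float32, [batch, time_steps, features])

# Compute input * W for all time steps at once; tf.layers.dense applies the
# same weights to the last dimension of the 3-D tensor.
projected = tf.layers.dense(inputs, num_units, use_bias=False)

# A modified cell would then only apply the cheap element-wise update
# h_t = act(projected_t + u * h_{t-1} + b) inside the recurrent while loop.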

Thanks.

Initialization of recurrent weight for the adding problem or similar problems

Hi,

Thanks for the implementation.
I have one comment on "why IndRNN on 5,000 time steps does not work in your implementation": I think it is related to the initialization of the recurrent weights.
As shown in the paper, to keep long-term memory the recurrent weight needs to be around 1. For the adding problem, only the last output of the last IndRNN layer is used, so that layer does not need to keep short-term memory. Accordingly, the recurrent weights of the last IndRNN layer can be initialized to all 1, or uniformly in the range (1 - epsilon, 1 + epsilon) for a small epsilon. For ReLU, the recurrent weights of the other layers can be initialized uniformly in (0, recurrent_max), without the negative part, so that the layer keeps all kinds of memory.

In your implementation, only 128 units are used. With a uniform initialization over the whole range for the last IndRNN layer, the number of units that can keep long-term memory is very small, which makes tasks with long sequences very hard to solve.

This also applies to other tasks that only use the final output, such as MNIST classification and action recognition.
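For illustration, a sketch of that initialization scheme with this repo's cell (assuming IndRNNCell accepts a recurrent_kernel_initializer argument; check ind_rnn_cell.py for the exact parameter name):

import tensorflow as tf
from ind_rnn_cell import IndRNNCell

TIME_STEPS = 5000
recurrent_max = pow(2, 1 / TIME_STEPS)

# Lower layer: uniform in (0, recurrent_max) keeps a mix of short- and
# long-term memory (ReLU case, no negative weights).
lower_init = tf.random_uniform_initializer(0.0, recurrent_max)

# Last layer: initialize at (or very close to) 1 so that most units keep
# long-term memory, since only the final output is used.
last_init = tf.constant_initializer(1.0)

cell = tf.nn.rnn_cell.MultiRNNCell(
    [IndRNNCell(128, recurrent_max_abs=recurrent_max,
                recurrent_kernel_initializer=lower_init),
     IndRNNCell(128, recurrent_max_abs=recurrent_max,
                recurrent_kernel_initializer=last_init)])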

Could you please give it a try? It works on my end.

Thanks.

Errors when used in bidirectional_dynamic_rnn

When I use IndRNNCell in a bidirectional RNN, it raises an error:
ValueError: cannot use '/bidirectional_rnn/fw/fw/while/ind_rnn/cell/Mul_1' as input to 'bidirectional_rnn/fw/fw/while/fw/ind_rnn_cell/clip_by_value' because they are in different while loops.

A snippet of my code:

recurrent_max = pow(2, 1.0 / time_steps)
fw_rnn_cell = IndRNNCell(hidden_size, recurrent_max_abs=recurrent_max)
bw_rnn_cell = IndRNNCell(hidden_size, recurrent_max_abs=recurrent_max)
bi_states, _ = tf.nn.bidirectional_dynamic_rnn(
                    fw_rnn_cell,
                    bw_rnn_cell,
                    inputs,
                    sequence_length=lengths,
                    dtype=tf.float32
                )

Has anybody encountered this problem? How can it be solved?

Performance issues in the program

Hello, I found a performance issue in the definition of get_training_set in examples/sequential_mnist.py: dataset.map is called without num_parallel_calls. I think adding it would increase the efficiency of your program.

The same issue also exists at the other dataset = dataset.map(preprocess_data) call sites.
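For illustration, the suggested change would look roughly like this (the parallelism value is a placeholder, not taken from the repo):

NUM_PARALLEL_CALLS = 4  # placeholder; tune to the number of available CPU cores

# before: dataset = dataset.map(preprocess_data)
dataset = dataset.map(preprocess_data, num_parallel_calls=NUM_PARALLEL_CALLS)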

The TensorFlow documentation for tf.data.Dataset.map supports this.

Looking forward to your reply. By the way, I would be glad to create a PR to fix it if you are too busy.

Cell structure

Hello!
I noticed that your implementation of IndRNN is the basic version (not the residual version).
Also, in the original paper there should be two batch normalization operations: after the cell's input and before the activation layer. It is not strictly required by the IndRNN structure, but it is 'recommended' in the paper :)
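For what it's worth, a rough sketch of one way to add batch normalization between stacked IndRNN layers (this normalizes each layer's output sequence rather than the pre-activations inside the cell, which would require modifying ind_rnn_cell.py; placement and parameters are assumptions, not the paper's exact recipe):

import tensorflow as tf
from ind_rnn_cell import IndRNNCell

def stacked_indrnn_with_bn(inputs, num_layers, num_units, recurrent_max, is_training):
    """Stacks IndRNN layers and batch-normalizes each layer's output sequence."""
    outputs = inputs
    for i in range(num_layers):
        cell = IndRNNCell(num_units, recurrent_max_abs=recurrent_max)
        outputs, _ = tf.nn.dynamic_rnn(cell, outputs, dtype=tf.float32,
                                       scope="indrnn_%d" % i)
        # Normalizes over the feature axis of the [batch, time, units] tensor.
        outputs = tf.layers.batch_normalization(outputs, training=is_training)
    return outputs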

Dimension Mismatch ?

In line 149 of ind_rnn_cell.py:

gate_inputs = math_ops.matmul(inputs, self._input_kernel)

The docstring describes inputs as a 2-D tensor of shape [batch, num_units], while

self._input_kernel = self.add_variable(
    "input_kernel",
    shape=[input_depth, self._num_units],
    initializer=self._input_initializer)

gives self._input_kernel the shape [input_depth, self._num_units]. Is this a dimension mismatch?

Constrain (0, max) instead of (-max, max)?

(Referring to the clipping line in ind_rnn_cell.py containing -self._recurrent_max,)

Hey, I checked your implementation of the paper and noticed that instead of constraining the recurrent weights between 0 and max, where max is pow(2, 1/T), you constrain them between -max and max. Since these weights are applied element-wise at every time step, wouldn't negative weights potentially result in outputs oscillating between positive and negative signs? This might explain why your version did not converge as fast as in the paper.
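A tiny numerical illustration of the oscillation concern (pure Python, values chosen arbitrarily):

# With a negative recurrent weight, a neuron's contribution from an old state
# flips sign every step: u, u**2, u**3, ... alternate between - and +.
u = -0.9
contributions = [round(u ** t, 3) for t in range(1, 6)]
print(contributions)  # [-0.9, 0.81, -0.729, 0.656, -0.59]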

If this is the case, then I think the standard weight initialization might not be optimal, since it is centered around 0, so half of the weights would be immediately truncated to 0. Maybe a uniform distribution between 0 and max would help initial convergence. Just a thought.

Let me know what you think. I feel like this architecture might be very promising because of its simplicity, and I'd love to see more results.

ValueError Issue

Hi,
Thanks a lot for your work! However, when I run the example code I get a ValueError:
'Variable rnn/multi_rnn_cell/cell_0/ind_rnn_cell/input_kernel already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:...'
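(For context: this error typically appears when the graph-building code runs twice in the same default graph, for example when re-running a notebook cell. A minimal workaround sketch, which may or may not be the root cause here:)

import tensorflow as tf

# Start from a clean graph before rebuilding the cells, so previously created
# variables such as rnn/multi_rnn_cell/cell_0/ind_rnn_cell/input_kernel
# do not collide with the new ones.
tf.reset_default_graph()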
Could you help me figure out what went wrong?
Thanks again!

Probably wrong indexing

in ind_rnn_cell.py:

def build(self, inputs_shape):
    if inputs_shape[1].value is None:
      raise ValueError("Expected inputs.shape[-1] to be known, saw shape: %s"
                       % inputs_shape)

The check if inputs_shape[1].value is None: should be if inputs_shape[-1].value is None:, because we need to check whether the input depth (the last dimension) is defined.

ReLU activation with IndyLSTMCell

Sorry to open an issue here, but I think you are an expert on this topic, and the IndRNNCell-related additions in TensorFlow 1.10.0 might also have been created by you.

I am trying to apply ReLU activation to IndyLSTMCell in the new TensorFlow (1.10). However, the loss becomes NaN after I make that change. The default tanh activation works for this cell, and IndyGRUCell has the same problem. For IndRNNCell, both tanh and ReLU work. However, when I stack it into multiple layers, I do not see any increase in the model's capacity (in terms of how quickly the loss decreases over training epochs).
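(Not the repo author's answer, just a commonly tried mitigation for NaN losses with ReLU recurrences: lower the learning rate and clip the global gradient norm. A sketch with placeholder values, where loss stands for your model's loss tensor:)

import tensorflow as tf

learning_rate = 1e-4   # placeholder; often reduced when switching to ReLU
max_grad_norm = 5.0    # placeholder clipping threshold

optimizer = tf.train.AdamOptimizer(learning_rate)
grads_and_vars = optimizer.compute_gradients(loss)
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(grads, max_grad_norm)
train_op = optimizer.apply_gradients(zip(clipped_grads, variables))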

Can you please give me a hint on how to address this? Any suggestions would be much appreciated. Thanks!
