
Cell structure · indrnn · CLOSED (3 comments)

batzner commented on July 17, 2024
Cell structure


Comments (3)

batzner commented on July 17, 2024

Hi @zeka0, you're right - the implementation is the most basic version. As stated in the paper, several extensions are possible:

  • Frame-wise batch normalization before or after Recurrent + ReLU
  • Sequence-wise batch normalization before or after Recurrent + ReLU
  • Residual connections between layers

Not all of these can be incorporated into the cell code. For example, sequence-wise batch normalization needs access to the activations of all time steps, whereas TensorFlow cells (LSTMCell, BasicRNNCell etc.) compute the activations for only one time step at a time. See examples/sequential_mnist.py for an example of how to implement sequence-wise batch normalization.
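For illustration, here is a rough sketch (not taken from examples/sequential_mnist.py) of sequence-wise BN applied to the full [batch, time, units] output of tf.nn.dynamic_rnn. The function name is made up, and the simple moments-based normalization omits the moving averages you would need for inference:

```python
import tensorflow as tf

def sequence_wise_batch_norm(outputs, epsilon=1e-5):
    """Sequence-wise BN over the [batch, time, units] output of dynamic_rnn.

    Statistics are computed over the batch AND time dimensions, which is
    why this cannot live inside a cell that only sees one time step.
    Note: a full implementation would also track moving averages for inference.
    """
    mean, variance = tf.nn.moments(outputs, axes=[0, 1], keep_dims=True)
    num_units = outputs.get_shape()[-1]
    gamma = tf.get_variable("bn_gamma", [num_units], initializer=tf.ones_initializer())
    beta = tf.get_variable("bn_beta", [num_units], initializer=tf.zeros_initializer())
    return tf.nn.batch_normalization(outputs, mean, variance, beta, gamma, epsilon)
```

You would call this on the outputs of one RNN layer before feeding them into the next one.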

Implementing residual connections should work with the current implementation, since they don't depend on the actual cell type (it could also be an LSTM cell, for example). tf.nn.rnn_cell.ResidualWrapper provides a good solution.
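A minimal sketch of that, assuming the IndRNNCell from this repo's ind_rnn_cell module (the layer count and sizes are just examples). ResidualWrapper adds the cell's input to its output, so input and output sizes must match; the first layer is therefore left unwrapped:

```python
import tensorflow as tf
from ind_rnn_cell import IndRNNCell  # the cell implemented in this repo

NUM_UNITS = 128  # arbitrary example size
inputs = tf.placeholder(tf.float32, [None, None, NUM_UNITS])  # [batch, time, features]

# Residual connections require matching input/output sizes, so only the
# layers after the first one are wrapped.
cells = [IndRNNCell(NUM_UNITS)]
for _ in range(2):
    cells.append(tf.nn.rnn_cell.ResidualWrapper(IndRNNCell(NUM_UNITS)))

stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
outputs, final_state = tf.nn.dynamic_rnn(stacked_cell, inputs, dtype=tf.float32)
```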

What might make sense to incorporate into the cell is frame-wise batch normalization before the Recurrent + ReLU part (doing BN after it is already possible right now). I will look into that - the downside is that it bloats the code a bit for a specific use case.
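To make the placement concrete, here is a rough, standalone sketch of a single IndRNN step with frame-wise BN on the input term, i.e. h_t = relu(BN(W x_t) + u * h_{t-1} + b). The function and argument names are mine, not the cell's actual internals:

```python
import tensorflow as tf

def indrnn_step_with_bn(x_t, h_prev, input_kernel, recurrent_kernel, bias, is_training):
    """One IndRNN step with BN applied before the Recurrent + ReLU part.

    x_t:              [batch, input_size] input at the current frame
    h_prev:           [batch, num_units]  previous hidden state
    input_kernel:     [input_size, num_units] weight matrix W
    recurrent_kernel: [num_units]         element-wise recurrent weights u
    bias:             [num_units]
    """
    wx = tf.matmul(x_t, input_kernel)
    # Frame-wise BN: statistics over the batch dimension of this frame only.
    wx = tf.layers.batch_normalization(wx, training=is_training)
    return tf.nn.relu(wx + h_prev * recurrent_kernel + bias)
```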

I hope this helps!

By the way, did you fix the NaN problem you mentioned earlier?


zeka0 commented on July 17, 2024

Thanks @batzner! Your comment really taught me something new!

The NaN problem is fixed if I set recurrent_max_abs to 1. When it is set to pow(2, 1.0 / seq_length), the loss becomes NaN (in my code, seq_length is 100). I have also tried a smaller learning rate, but that didn't solve the problem.
Actually, this is the reason I posted this issue: I think batch normalization might fix the NaN problem as well.
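For context, a minimal sketch of the setting being discussed (the recurrent_max_abs keyword is the one from this thread and the repo's cell; the unit count is just an example):

```python
from ind_rnn_cell import IndRNNCell  # the cell from this repo

TIME_STEPS = 100  # seq_length in my code
# The paper's constraint on the recurrent weights: |u| <= 2^(1/T).
RECURRENT_MAX = pow(2, 1.0 / TIME_STEPS)

cell_nan = IndRNNCell(128, recurrent_max_abs=RECURRENT_MAX)  # diverges to NaN in my runs
cell_ok = IndRNNCell(128, recurrent_max_abs=1.0)             # stable in my runs
```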


Sunnydreamrain commented on July 17, 2024

@zeka0 Hi, in the paper it says, "batch normalization, denoted as “BN”, can also be employed in the IndRNN network before or after the activation function". I did not try using two BNs, but I think there is no reason to do both. There is a comment on BN in the IndRNN at this repo. Generally, if you are using sequence-wise BN, BN after the activation probably gives you higher performance; if you are using frame-wise BN, BN before the activation is probably better.

