
Cell structure · indrnn · CLOSED (3 comments)

batzner commented on July 17, 2024
Cell structure


Comments (3)

batzner commented on July 17, 2024

Hi @zeka0, you're right - the implementation is the most basic version. As stated in the paper, several extensions are possible:

  • Frame-wise batch normalization before or after Recurrent + ReLU
  • Sequence-wise batch normalization before or after Recurrent + ReLU
  • Residual connections between layers

Not all of these can be incorporated into the cell code. For example, sequence-wise batch normalization needs access to the activations of all time steps, whereas TensorFlow cells (LSTMCell, BasicRNNCell etc.) compute the activations for only one time step at a time. See examples/sequential_mnist.py for an example of how to implement sequence-wise batch normalization.
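For illustration, here is a rough sketch (not taken from examples/sequential_mnist.py) of sequence-wise BN applied to the full [batch, time, units] output of tf.nn.dynamic_rnn. The function name is made up, and the simple moments-based normalization omits the moving averages you would need for inference:

```python
import tensorflow as tf

def sequence_wise_batch_norm(outputs, epsilon=1e-5):
    """Sequence-wise BN over the [batch, time, units] output of dynamic_rnn.

    Statistics are computed over the batch AND time dimensions, which is
    why this cannot live inside a cell that only sees one time step.
    Note: a full implementation would also track moving averages for inference.
    """
    mean, variance = tf.nn.moments(outputs, axes=[0, 1], keep_dims=True)
    num_units = outputs.get_shape()[-1]
    gamma = tf.get_variable("bn_gamma", [num_units], initializer=tf.ones_initializer())
    beta = tf.get_variable("bn_beta", [num_units], initializer=tf.zeros_initializer())
    return tf.nn.batch_normalization(outputs, mean, variance, beta, gamma, epsilon)
```

You would call this on the outputs of one RNN layer before feeding them into the next one.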

Implementing residual connections should work with the current implementation, since they don't depend on the actual cell type (it could also be an LSTM cell, for example). tf.nn.rnn_cell.ResidualWrapper provides a good solution.
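A minimal sketch of that, assuming the IndRNNCell from this repo's ind_rnn_cell module (the layer count and sizes are just examples). ResidualWrapper adds the cell's input to its output, so input and output sizes must match; the first layer is therefore left unwrapped:

```python
import tensorflow as tf
from ind_rnn_cell import IndRNNCell  # the cell implemented in this repo

NUM_UNITS = 128  # arbitrary example size
inputs = tf.placeholder(tf.float32, [None, None, NUM_UNITS])  # [batch, time, features]

# Residual connections require matching input/output sizes, so only the
# layers after the first one are wrapped.
cells = [IndRNNCell(NUM_UNITS)]
for _ in range(2):
    cells.append(tf.nn.rnn_cell.ResidualWrapper(IndRNNCell(NUM_UNITS)))

stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
outputs, final_state = tf.nn.dynamic_rnn(stacked_cell, inputs, dtype=tf.float32)
```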

What might make sense to incorporate into the cell is frame-wise batch normalization before the Recurrent + ReLU part (doing BN after it is already possible right now). I will look into that - the downside is that it bloats the code a bit for a specific use case.
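To make the placement concrete, here is a rough, standalone sketch of a single IndRNN step with frame-wise BN on the input term, i.e. h_t = relu(BN(W x_t) + u * h_{t-1} + b). The function and argument names are mine, not the cell's actual internals:

```python
import tensorflow as tf

def indrnn_step_with_bn(x_t, h_prev, input_kernel, recurrent_kernel, bias, is_training):
    """One IndRNN step with BN applied before the Recurrent + ReLU part.

    x_t:              [batch, input_size] input at the current frame
    h_prev:           [batch, num_units]  previous hidden state
    input_kernel:     [input_size, num_units] weight matrix W
    recurrent_kernel: [num_units]         element-wise recurrent weights u
    bias:             [num_units]
    """
    wx = tf.matmul(x_t, input_kernel)
    # Frame-wise BN: statistics over the batch dimension of this frame only.
    wx = tf.layers.batch_normalization(wx, training=is_training)
    return tf.nn.relu(wx + h_prev * recurrent_kernel + bias)
```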

I hope this helps!

By the way, did you fix the NaN problem you mentioned earlier?


zeka0 commented on July 17, 2024

Thanks @batzner! Your comment really taught me something new!

The NaN problem is fixed if I set recurrent_max_abs to 1. When it is set to pow(2, 1.0 / seq_length), the loss becomes NaN (in my code, seq_length is 100). I have also tried a smaller learning rate, but that didn't solve the problem.
Actually, this is the reason I posted this issue: I think batch normalization might fix the NaN problem as well.
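For context, a minimal sketch of the setting being discussed (the recurrent_max_abs keyword is the one from this thread and the repo's cell; the unit count is just an example):

```python
from ind_rnn_cell import IndRNNCell  # the cell from this repo

TIME_STEPS = 100  # seq_length in my code
# The paper's constraint on the recurrent weights: |u| <= 2^(1/T).
RECURRENT_MAX = pow(2, 1.0 / TIME_STEPS)

cell_nan = IndRNNCell(128, recurrent_max_abs=RECURRENT_MAX)  # diverges to NaN in my runs
cell_ok = IndRNNCell(128, recurrent_max_abs=1.0)             # stable in my runs
```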


Sunnydreamrain commented on July 17, 2024

@zeka0 Hi, in the paper it says, "batch normalization, denoted as “BN”, can also be employed in the IndRNN network before or after the activation function". I did not try using two BNs, but I think there is no reason to do both. There is a comment on BN in the IndRNN at this repo. Generally, if you are using sequence-wise BN, BN after the activation probably gives you higher performance; if you are using frame-wise BN, BN before the activation is probably better.

