
Implementation of RNN (mocha.jl issue #89, closed)


Comments (14)

jskDr commented:

It is great news to hear about an RNN implementation for Mocha.
I expect it will be useful for modeling highly nonlinear chemical
molecule properties.

On Sun, Aug 23, 2015 at 2:09 PM, Ratan [email protected] wrote:

So I want to work on adding RNN functionality mainly to help myself
understand them better and to do something of a larger scale in Julia! I
did want to open this issue though so that there would be a forum for
discussion about implementation.

Here are my current thoughts; I don't know whether they're consistent with
Mocha's architecture, or even with the principles of RNNs, as I only spent
a little time getting acquainted, but here goes. Please point out any of my
misunderstandings!
RNN-Specific Stuff

  • Computation is not strictly a single forward and backward pass
  • Backprop is instead unrolled through time, which essentially yields an
    "equivalent" feed-forward net whose depth depends on the number of time
    steps to backprop through (see the sketch after this list)
  • LSTM to prevent exploding/vanishing gradients
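
A minimal sketch in plain Julia (not Mocha's API; all names here are
illustrative) of what unrolling means: the same parameters W, U, b are
reused at every time step, so the unrolled computation is equivalent to a
T-layer feed-forward net with shared weights.

    # One "layer" per time step; W, U, b are shared across all steps.
    function rnn_forward(xs, W, U, b, h0)
        h = h0
        hs = typeof(h0)[]
        for x in xs
            h = tanh.(W * x .+ U * h .+ b)   # h_t = tanh(W x_t + U h_{t-1} + b)
            push!(hs, h)
        end
        return hs                            # hidden state at each time step
    end

    # Example: 3 time steps, 2-dim inputs, 4-dim hidden state
    xs = [randn(2) for _ in 1:3]
    hs = rnn_forward(xs, randn(4, 2), randn(4, 4), randn(4), zeros(4))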

Topology of an RNN in Mocha

To my understanding, there are split layers which allow a layer's output
to be sent to two different layers and still be able to play nice with
backprop. An RNN implementation would likely need to use this.
Additionally, would something like a join layer be necessary?
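
For concreteness, a rough sketch of how one time step might be wired with a
split layer; the layer names and blob symbols here are illustrative
assumptions, not taken from Mocha's docs:

    using Mocha

    # Hypothetical wiring for one time step: the hidden blob is duplicated
    # so one copy feeds the output path and the other feeds the next step.
    hidden = InnerProductLayer(name="hidden-t", output_dim=128,
                               tops=[:h_t], bottoms=[:x_t])
    split  = SplitLayer(name="split-t", bottoms=[:h_t],
                        tops=[:h_to_output, :h_to_next_step])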
Caffe

I think BVLC/caffe#1873 is the relevant thread from Caffe.

If I'm understanding correctly, one of the inputs to a recurrent layer is
a stream that represents the past states of that layer. Understandably, the
forward prop is exact as it only depends on the current value of an input
layer and the most recent past value, presumably stored at one end of the
stream. He mentions, however, that the backprop is approximate. This is
the part I don't understand at all: how is the backprop being approximated?

Thanks for reading!



pluskid commented:

Regarding the approximate gradient computation issue in the Caffe discussion: the back-propagation through time gets truncated at the boundary of minibatches when the sequence is longer than the minibatch size, so the gradient is approximate.
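
As an illustration, here is a minimal sketch of truncated BPTT in plain
Julia; `run_chunk!` is a hypothetical stand-in for a forward/backward pass
over one chunk, not part of any library:

    # A long sequence is processed in chunks of k steps. The hidden state
    # is carried across chunk boundaries, but it enters the next chunk as
    # a constant, so no gradient flows back past the boundary. Dependence
    # of the loss on earlier chunks is dropped, which is why the computed
    # gradient is approximate.
    function truncated_bptt(xs, k, h0, run_chunk!)
        h = h0
        for chunk in Iterators.partition(xs, k)
            h = run_chunk!(collect(chunk), h)   # forward + backward within chunk only
        end
        return h
    end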

RatanRSur commented:

Ah, ok. Any other comments on my post?

Andy-P commented:

Just thought I would mention that there is a pure Julia implementation of various RNN models (RNN, LSTM, etc.) in the RecurrentNN.jl package.

https://github.com/Andy-P/RecurrentNN.jl

That might be a useful starting point.

Andre

jskDr commented:

That's wonderful information.


RatanRSur commented:

Thanks for pointing me to that @Andy-P, I'll definitely take a look at those when I need help with the conceptual stuff :)

pluskid commented:

@RatanRSur Thanks for your interest in doing this! Some other comments:

  • Yes, SplitLayer is needed to pass the output blob both to the current outputs and to the input of the next time step. I do not think a join layer is needed, because ultimately the outputs at each time step go to the loss layer, and Mocha automatically accumulates all the losses.
  • When unrolled in time, an RNN becomes an ordinary deep net with a large depth (depth = number of time steps unrolled). You will need to use the parameter-sharing mechanism to make sure the unrolled layers share the same parameters (see the sketch after this list).
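
A sketch of what that sharing might look like, assuming Mocha's `param_key`
mechanism (layers constructed with the same `param_key` share parameters);
the blob names, dimensions, and step count here are illustrative:

    using Mocha

    n_steps = 3   # number of unrolled time steps (illustrative)

    # One InnerProductLayer per unrolled time step; the common param_key
    # ties their weights and biases together, so the unrolled copies stay
    # identical during training.
    recur_layers = map(1:n_steps) do t
        InnerProductLayer(name="recur-$t", output_dim=128,
                          param_key="recur-shared",
                          tops=[Symbol("h_$t")], bottoms=[Symbol("x_$t")])
    end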

RatanRSur commented:

Oops, didn't mean to close the issue.

jskDr commented:

It happens sometimes. That's okay.

Now, I have a question: is there anything in particular that has kept Mocha
from adopting RNNs so far? Maybe not, right?


RatanRSur commented:

Assuming a simple net like:

X_{t-1} ----> Y_{t-1} ----> H_{t-1}
                              |
                              V
X_t --------> Y_t --------> H_t

So, in some way, the user specifies the recurrence of the hidden layer (more on this later) and it is converted into the unrolled RNN by Net.jl? Is this what the solver eventually sees?
[image imag0486: https://cloud.githubusercontent.com/assets/4733314/9446758/4bab76be-4a61-11e5-8cae-0eefff87003b.jpg]

Regarding designating a layer as recurrent, I'm guessing this would be implemented through a characterization?

jskDr commented:

Thank you for sharing your model, Ratan.
It seems to be a canonical form of RNN. Is my understanding right?

Then, while we are implementing the RNN, we could provide modes for both
the full RNN and the canonical RNN. In linear modeling, if I remember
correctly, this is called a decision feedback model.


pluskid commented:

@RatanRSur Yes, conceptually the unrolled network looks exactly like what you described.

pluskid commented:

For those who are interested in RNN/LSTM in Julia: please check out the char-rnn LSTM example in MXNet.jl. It uses explicit unrolling, so everything fits in the current FeedForward model, and multi-GPU training can therefore be used directly. For general-purpose variable-length RNNs without unrolling, we will still need to develop the modeling interface. I will add a tutorial document soon.

RatanRSur commented:

Awesome, thanks!
