
Implementation of RNN (mocha.jl issue #89, closed)


Comments (14)

jskDr commented:

It is great news to hear about an RNN implementation for Mocha.
I expect it will be useful for modeling highly nonlinear chemical
molecule properties.

On Sun, Aug 23, 2015 at 2:09 PM, Ratan [email protected] wrote:

So I want to work on adding RNN functionality mainly to help myself
understand them better and to do something of a larger scale in Julia! I
did want to open this issue though so that there would be a forum for
discussion about implementation.

Here are my current thoughts; I don't know whether they're consistent with
Mocha's architecture, or even with the principles of RNNs, as I only spent
a little time getting acquainted, but here goes. Please point out any of my
misunderstandings!
RNN-Specific Stuff

  • Computation is not strictly a single forward and backward pass
  • Backprop is instead unrolled through time, which essentially yields an
    "equivalent" feed-forward net whose depth depends on the number of time
    steps to backprop through (see the sketch after this list)
  • LSTM to prevent exploding/vanishing gradients
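
A minimal sketch in plain Julia (not Mocha's API; all names here are
illustrative) of what unrolling means: the same parameters W, U, b are
reused at every time step, so the unrolled computation is equivalent to a
T-layer feed-forward net with shared weights.

    # One "layer" per time step; W, U, b are shared across all steps.
    function rnn_forward(xs, W, U, b, h0)
        h = h0
        hs = typeof(h0)[]
        for x in xs
            h = tanh.(W * x .+ U * h .+ b)   # h_t = tanh(W x_t + U h_{t-1} + b)
            push!(hs, h)
        end
        return hs                            # hidden state at each time step
    end

    # Example: 3 time steps, 2-dim inputs, 4-dim hidden state
    xs = [randn(2) for _ in 1:3]
    hs = rnn_forward(xs, randn(4, 2), randn(4, 4), randn(4), zeros(4))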

Topology of an RNN in Mocha

To my understanding, there are split layers which allow a layer's output
to be sent to two different layers and still be able to play nice with
backprop. An RNN implementation would likely need to use this.
Additionally, would something like a join layer be necessary?
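
For concreteness, a rough sketch of how one time step might be wired with a
split layer; the layer names and blob symbols here are illustrative
assumptions, not taken from Mocha's docs:

    using Mocha

    # Hypothetical wiring for one time step: the hidden blob is duplicated
    # so one copy feeds the output path and the other feeds the next step.
    hidden = InnerProductLayer(name="hidden-t", output_dim=128,
                               tops=[:h_t], bottoms=[:x_t])
    split  = SplitLayer(name="split-t", bottoms=[:h_t],
                        tops=[:h_to_output, :h_to_next_step])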
Caffe

I think BVLC/caffe#1873 is the relevant thread from Caffe.

If I'm understanding correctly, one of the inputs to a recurrent layer is
a stream that represents the past states of that layer. Understandably, the
forward prop is exact as it only depends on the current value of an input
layer and the most recent past value, presumably stored at one end of the
stream. He mentions, however, that the backprop is approximate. This is
the part I don't understand at all: how is the backprop being approximated?

Thanks for reading!



pluskid commented:

Regarding the approximate gradient computation issue in the Caffe discussion: the back-propagation through time gets truncated at the boundary of minibatches when the sequence is longer than the minibatch size, so the gradient is approximate.
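
As an illustration, here is a minimal sketch of truncated BPTT in plain
Julia; `run_chunk!` is a hypothetical stand-in for a forward/backward pass
over one chunk, not part of any library:

    # A long sequence is processed in chunks of k steps. The hidden state
    # is carried across chunk boundaries, but it enters the next chunk as
    # a constant, so no gradient flows back past the boundary. Dependence
    # of the loss on earlier chunks is dropped, which is why the computed
    # gradient is approximate.
    function truncated_bptt(xs, k, h0, run_chunk!)
        h = h0
        for chunk in Iterators.partition(xs, k)
            h = run_chunk!(collect(chunk), h)   # forward + backward within chunk only
        end
        return h
    end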

RatanRSur commented:

Ah, ok. Any other comments on my post?

Andy-P commented:

Just thought I would mention that there is a pure Julia implementation of various RNN models (RNN, LSTM, etc.) in the RecurrentNN.jl package.

https://github.com/Andy-P/RecurrentNN.jl

That might be a useful starting point.

Andre

jskDr commented:

That's wonderful information.


RatanRSur commented:

Thanks for pointing me to that @Andy-P, I'll definitely take a look at those when I need help with the conceptual stuff :)

pluskid commented:

@RatanRSur Thanks for your interest in doing this! Some other comments:

  • Yes, SplitLayer is needed to pass the output blob both to the current outputs and to the input of the next time step. I do not think a join layer is needed, because ultimately the outputs at each time step go to the loss layer, and Mocha automatically accumulates all the losses.
  • When unrolled in time, an RNN becomes an ordinary deep net with a large depth (depth = number of time steps unrolled). You will need to use the parameter-sharing mechanism to make sure the unrolled layers share the same parameters (see the sketch after this list).
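
A sketch of what that sharing might look like, assuming Mocha's `param_key`
mechanism (layers constructed with the same `param_key` share parameters);
the blob names, dimensions, and step count here are illustrative:

    using Mocha

    n_steps = 3   # number of unrolled time steps (illustrative)

    # One InnerProductLayer per unrolled time step; the common param_key
    # ties their weights and biases together, so the unrolled copies stay
    # identical during training.
    recur_layers = map(1:n_steps) do t
        InnerProductLayer(name="recur-$t", output_dim=128,
                          param_key="recur-shared",
                          tops=[Symbol("h_$t")], bottoms=[Symbol("x_$t")])
    end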

RatanRSur commented:

Oops, didn't mean to close the issue.

jskDr commented:

It happens sometimes. That's okay.

Now, I have a question: is there anything in particular that has kept Mocha
from adopting RNNs so far? Maybe not, right?


RatanRSur commented:

Assuming a simple net like:

X_{t-1} ----> Y_{t-1} ----> H_{t-1}
                              |
                              V
X_t --------> Y_t --------> H_t

So, in some way, the user specifies the recurrence of the hidden layer (more on this later) and it is converted into the unrolled RNN by Net.jl? Is this what the solver eventually sees?
[image imag0486: https://cloud.githubusercontent.com/assets/4733314/9446758/4bab76be-4a61-11e5-8cae-0eefff87003b.jpg]

Regarding designating a layer as recurrent, I'm guessing this would be implemented through a characterization?

jskDr commented:

Thank you for sharing your model, Ratan.
It seems to be a canonical form of RNN. Is my understanding right?

Then, while we are implementing the RNN, we could provide modes for both
the full RNN and the canonical RNN. In linear modeling, if I remember
correctly, this is called a decision feedback model.


pluskid commented:

@RatanRSur Yes, conceptually the unrolled network looks exactly like what you described.

pluskid commented:

For those who are interested in RNN/LSTM in Julia: please check out the char-rnn LSTM example in MXNet.jl. It uses explicit unrolling, so everything fits in the current FeedForward model, and multi-GPU training can therefore be used directly. For general-purpose variable-length RNNs without unrolling, we will still need to develop the modeling interface. I will add a tutorial document soon.

RatanRSur commented:

Awesome, thanks!
