Comments (14)
It is great news to hear about an RNN implementation for Mocha. I expect it
will be useful for modeling highly nonlinear chemical molecule properties.
On Sun, Aug 23, 2015 at 2:09 PM, Ratan [email protected] wrote:
So I want to work on adding RNN functionality, mainly to help myself
understand RNNs better and to do something of a larger scale in Julia! I
did want to open this issue, though, so that there would be a forum for
discussion about implementation.

Here are my current thoughts. I don't know if they're consistent with
Mocha's architecture, or even with the principles of RNNs, as I only spent
a little time getting acquainted, but here goes. Please point out any of my
misunderstandings!
RNN-Specific Stuff
- Not strictly forward and back
- Backprop is unrolled through time instead, which essentially means a
final "equivalent" feed-forward net whose size depends on the number of
time steps to backprop
- LSTM to prevent exploding/vanishing gradients
Topology of an RNN in Mocha
To my understanding, there are split layers which allow a layer's output
to be sent to two different layers and still be able to play nice with
backprop. An RNN implementation would likely need to use this.
Additionally, would something like a join layer be necessary?
Caffe
I think BVLC/caffe#1873 is the relevant thread from Caffe.

If I'm understanding correctly, one of the inputs to a recurrent layer is
a stream that represents the past states of that layer. Understandably, the
forward prop is exact, as it only depends on the current value of an input
layer and the most recent past value, presumably stored at one end of the
stream. He mentions, however, that the backprop is approximate. This is
the part I don't understand at all: how is the backprop being approximated?

Thanks for reading!
—
Reply to this email directly or view it on GitHub
#89.
Best regards,
(James) Sungjin Kim, Ph.D.
- Post-doc, CCB department in Harvard
[email protected]
(Tech-consultant in Samsung Elec.)
from mocha.jl.
Regarding the approximate gradient computation issue in the Caffe discussion: backpropagation through time gets truncated at the boundaries of minibatches when the sequence is longer than the minibatch size, so the computed gradient is approximate.
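The truncation can be illustrated with a minimal sketch (hypothetical, not Mocha's actual code; `bptt_grad` and its `window` keyword are invented names). For a scalar RNN h_t = w*h_{t-1} + x_t with loss L = h_T, the exact gradient dL/dw sums a contribution from every time step, while truncated BPTT only sums over the last `window` steps:

```julia
# Toy sketch of truncated BPTT (hypothetical, not Mocha code).
# Scalar RNN: h_t = w * h_{t-1} + x_t, loss L = h_T.
# Exact gradient: dL/dw = sum over t of w^(T-t) * h_{t-1}.
# Truncation drops the terms from time steps before the window.
function bptt_grad(w, xs; window=length(xs))
    T = length(xs)
    hs = zeros(T + 1)                  # hs[t+1] holds h_t, with h_0 = 0
    for t in 1:T
        hs[t+1] = w * hs[t] + xs[t]    # forward pass
    end
    grad = 0.0
    for t in max(1, T - window + 1):T  # only the last `window` steps contribute
        grad += w^(T - t) * hs[t]      # hs[t] is h_{t-1}
    end
    return grad
end

xs = [1.0, 2.0, 3.0, 4.0]
full  = bptt_grad(0.5, xs)            # exact gradient over all 4 steps → 5.75
trunc = bptt_grad(0.5, xs; window=2)  # truncated to last 2 steps → 5.5
```

The difference between `full` and `trunc` is exactly the contribution of the dropped early time steps, which is what gets lost at a minibatch boundary.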
from mocha.jl.
Ah, ok. Any other comments on my post?
from mocha.jl.
Just thought I would mention that there is a pure Julia implementation of various RNN models (RNN, LSTM, etc.) in the RecurrentNN.jl package.
https://github.com/Andy-P/RecurrentNN.jl
That might be a useful starting point.
Andre
from mocha.jl.
That's wonderful information.
from mocha.jl.
Thanks for pointing me to that @Andy-P, I'll definitely take a look at those when I need help with the conceptual stuff :)
from mocha.jl.
@RatanRSur Thanks for your interest in doing this! Some other comments:
- Yes, a SplitLayer is needed to pass the output blob both to the current outputs and to the input of the next time step. I do not think a join layer is needed, because ultimately the outputs at each time step go to the loss layer, and Mocha automatically accumulates all the losses.
- When unrolled in time, an RNN becomes an ordinary deep net with a large depth (depth = number of time steps unrolled). You will need to use the parameter-sharing mechanism to make the unrolled layers share the same parameters.
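The parameter-sharing point can be sketched in plain Julia (hypothetical, not Mocha's actual API; `W_xh`, `W_hh`, and `unrolled_forward` are made-up names for illustration): every unrolled time step references the same weight arrays, so there is only one set of parameters to learn, no matter how deep the unrolled net is:

```julia
# Sketch of parameter sharing in an unrolled RNN (not Mocha's API).
# Every iteration of the loop is one "unrolled layer", and all of them
# reference the SAME weight arrays W_xh and W_hh.
function unrolled_forward(xs, W_xh, W_hh)
    h = zeros(size(W_hh, 1))          # initial hidden state h_0 = 0
    hs = Vector{Vector{Float64}}()
    for x in xs                       # one pass per time step
        h = tanh.(W_xh * x .+ W_hh * h)
        push!(hs, h)
    end
    return hs                         # hidden state at each time step
end

W_xh = randn(3, 2)                    # input -> hidden, shared across steps
W_hh = randn(3, 3)                    # hidden -> hidden recurrence, shared
xs = [rand(2) for _ in 1:5]           # a length-5 input sequence
hs = unrolled_forward(xs, W_xh, W_hh) # 5 hidden states, 1 set of weights
```

A gradient step on `W_xh`/`W_hh` then updates every unrolled copy at once, which is what the parameter-sharing mechanism has to guarantee.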
from mocha.jl.
Oops, didn't mean to close the issue.
from mocha.jl.
It happens sometimes. That's okay.
Now, I have a question: is there any particular obstacle that has kept
Mocha from adopting RNNs so far?
Maybe not, right?
from mocha.jl.
Assuming a simple net like:

X_{t-1} ----> Y_{t-1} ----> H_{t-1}
                              |
                              v
X_t --------> Y_t ---------> H_t
So, in some way, the user specifies the recurrence of the hidden layer (more on this later) and it is converted into the unrolled RNN by Net.jl? Is this what the solver eventually sees?

Regarding designating a layer as recurrent, I'm guessing this would be implemented through a characterization?
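For what it's worth, the "unrolled view" the solver would see can be sketched as a toy (hypothetical, not Net.jl's actual mechanism; `step` and `unroll` are invented names): a single recurrent step expanded into a plain chain of identical layers, which is just an ordinary feed-forward computation:

```julia
# Toy sketch of unrolling (not Net.jl's actual mechanism).
# One recurrent "layer": combines the previous hidden state with the input.
step(h, x; w=0.5, u=1.0) = w * h + u * x

# Unrolling expands the recurrence into a plain chain: after this, each
# loop iteration is just an ordinary layer the solver can see directly.
function unroll(xs; h0=0.0)
    h = h0
    outs = Float64[]
    for x in xs
        h = step(h, x)
        push!(outs, h)
    end
    return outs
end

unroll([1.0, 1.0, 1.0])    # → [1.0, 1.5, 1.75]
```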
from mocha.jl.
Thank you for sharing your model, Ratan.
It seems to be a canonical form of RNN. Is my understanding right?
Then, while we are implementing RNN, we could provide modes for both the
full RNN and the canonical RNN.
In linear models, if I remember correctly, this is called a decision
feedback model.
[image: imag0486]
https://cloud.githubusercontent.com/assets/4733314/9446758/4bab76be-4a61-11e5-8cae-0eefff87003b.jpg
from mocha.jl.
@RatanRSur Yes, conceptually the unrolled network looks exactly like what you described.
from mocha.jl.
For those who are interested in RNN/LSTM in Julia: please check out this char-rnn LSTM example in MXNet.jl. It uses explicit unrolling, so everything fits in the current FeedForward model and multi-GPU training can be used directly. For more general-purpose variable-length RNNs without unrolling, we will still need to develop the modeling interface. I will add a tutorial document soon.
from mocha.jl.
Awesome, thanks!
from mocha.jl.