Understand how scan is used in different models, including RNN, RBM, etc.

Understand how scan is used in different models (deeplearningresearch issue, 6 comments, closed)

chenditc commented on September 18, 2024

Understand how scan is used in different models

from deeplearningresearch.

Comments (6)

chenditc commented on September 18, 2024

scan is mostly used to construct a tensor variable that has recursive behavior. The usual shared-variable update is still done by passing the "updates" option to theano.function.
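To make the recursive behavior concrete, here is a minimal pure-Python sketch of what scan does conceptually: it applies a step function along a sequence and feeds each output back in as the previous value for the next step. The name `scan_sketch` is hypothetical, and this is not Theano's implementation (Theano builds a symbolic loop rather than running one eagerly).

```python
def scan_sketch(fn, sequence, initial):
    """Eagerly apply fn(x_t, y_tm1) along `sequence`, collecting every output.

    Hypothetical helper for illustration only; theano.scan builds a symbolic
    loop instead of executing a Python one.
    """
    outputs = []
    y_tm1 = initial
    for x_t in sequence:
        y_t = fn(x_t, y_tm1)   # the step sees the current input and the previous output
        outputs.append(y_t)
        y_tm1 = y_t            # feed the output back in for the next step
    return outputs


# Running sum: each output depends on the previous output.
running_sum = scan_sketch(lambda x_t, y_tm1: y_tm1 + x_t, [1, 2, 3, 4, 5], 100.0)
print(running_sum)
```

The argument order `fn(x_t, y_tm1)` mirrors the convention scan uses for step functions: sequence elements first, then previous outputs.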

chenditc commented on September 18, 2024

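Verification Test:
Construct a variable that accumulates over the dataset, so y_t = y_tm1 + a * x_t + b,
then calculate the gradient of y_t with respect to a and b at each step. With the full (untruncated) gradient this should give:
g_b = t + 1 (b contributes 1 at every step, so 1 per step)
g_a = x_t + x_tm1 + ... + x_0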
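The expected gradients can be checked without Theano. The sketch below is plain Python and makes two assumptions: it uses the untruncated gradient, and it accumulates over all steps up to t, so b contributes 1 per step (t + 1 in total for y_t). The helper names are hypothetical. It compares the closed-form values against central finite differences.

```python
def accumulate(xs, a, b, y0):
    """Return [y_0, ..., y_T] for y_t = y_tm1 + a * x_t + b, starting from y0."""
    ys = []
    y = y0
    for x_t in xs:
        y = y + a * x_t + b
        ys.append(y)
    return ys


def analytic_grads(xs):
    """Closed-form (dy_t/da, dy_t/db) for every step t."""
    grads = []
    prefix = 0.0
    for t, x_t in enumerate(xs):
        prefix += x_t                          # x_0 + ... + x_t
        grads.append((prefix, float(t + 1)))   # b is added once per step
    return grads


def numeric_grads(xs, a, b, y0, eps=1e-6):
    """Central finite differences of every y_t with respect to a and b."""
    ya_hi, ya_lo = accumulate(xs, a + eps, b, y0), accumulate(xs, a - eps, b, y0)
    yb_hi, yb_lo = accumulate(xs, a, b + eps, y0), accumulate(xs, a, b - eps, y0)
    return [((ah - al) / (2 * eps), (bh - bl) / (2 * eps))
            for ah, al, bh, bl in zip(ya_hi, ya_lo, yb_hi, yb_lo)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
expected = analytic_grads(xs)
measured = numeric_grads(xs, a=1.0, b=0.0, y0=100.0)
print(expected)
```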

chenditc commented on September 18, 2024

Test program:
import theano
import theano.tensor as T

inputData = [1, 2, 3, 4, 5]
x = T.vector("x")
a = T.scalar("a")
b = T.scalar("b")

# One scan step: accumulate a * x_t + b onto the previous output.
def recurrent(x_t, y_tm1):
    y_t = y_tm1 + a * x_t + b
    return y_t

# y0 is a length-1 vector, so each step's output is also a length-1 vector.
y0 = T.vector("y0")
result, updates = theano.scan(fn=recurrent,
                              outputs_info=y0,
                              sequences=x,
                              truncate_gradient=2,
                              non_sequences=None)

# Gradient of the output at step `index` with respect to a.
index = T.iscalar("i")
g_result = T.grad(result[index][0], a)

sum_all = theano.function(inputs=[y0, x, a, b], outputs=result)
grad_all = theano.function(inputs=[y0, x, a, b, index], outputs=g_result)

print("result:")
print(sum_all([100.0], inputData, 1, 0))

print("gradient:")
for i in range(len(inputData)):
    print(grad_all([100.0], inputData, 1, 0, i))

chenditc commented on September 18, 2024

output:
result:
[[ 101.]
 [ 103.]
 [ 106.]
 [ 110.]
 [ 115.]]
gradient:
0.0
0.0
0.0
4.0
9.0

chenditc commented on September 18, 2024

This shows that:

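The result is wrapped in a 2-D array here: the first dimension indexes the steps of the input sequence, the second indexes the elements of each step's output (a length-1 vector, because y0 is a vector).
The truncated gradient counts back from the last step of the whole sequence, not from the step being differentiated. With truncate_gradient=2 only the last two steps contribute, which is why the gradients are 0, 0, 0, 4, 9 instead of the full 1, 3, 6, 10, 15. So we should probably always use the full unfold: NO TRUNCATED GRADIENT. (Rather, we should put a ceiling on the gradient.)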
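The gradient output above can be reproduced with a small plain-Python model. This is only an interpretation inferred from the printed numbers, not taken from Theano's source: it assumes truncation keeps the contributions of the last `truncate` steps of the whole sequence and drops everything earlier. The names `grad_a` and `clip_gradient` are hypothetical, and the clipping helper is one possible reading of "ceiling for gradient".

```python
def grad_a(xs, index, truncate=None):
    """d y_index / d a for y_t = y_tm1 + a * x_t + b.

    Assumption (inferred from the observed output, not from Theano's source):
    truncation keeps only the last `truncate` steps of the whole sequence.
    """
    first_kept = 0 if truncate is None else max(0, len(xs) - truncate)
    # Step s contributes x_s to y_index only if it is kept and s <= index.
    return float(sum(xs[s] for s in range(first_kept, index + 1)))


def clip_gradient(g, ceiling):
    """Bound a gradient value to the interval [-ceiling, ceiling]."""
    return max(-ceiling, min(ceiling, g))


xs = [1, 2, 3, 4, 5]
truncated = [grad_a(xs, i, truncate=2) for i in range(len(xs))]
full = [grad_a(xs, i) for i in range(len(xs))]
clipped = [clip_gradient(g, 10.0) for g in full]
print(truncated)
print(full)
print(clipped)
```

Under this model the truncated values match the run with truncate_gradient=2, while the full unfold gives the prefix sums of x. Clipping keeps every step's contribution but bounds its magnitude.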

chenditc commented on September 18, 2024

Done
