
Shallow RNN in GroundHog (groundhog, CLOSED, 13 comments)

pascanur commented on August 18, 2024

Comments (13)

tomsbergmanis commented on August 18, 2024

Set state['rec_layer'] = 'RecurrentLayer'

Then the main parts of the code affected would be:

#### Word Embedding

emb_words = MultiLayer(
    rng,
    n_in=state['n_in'],
    n_hids=eval(state['inp_nhids']),
    activation=eval(state['inp_activ']),
    init_fn='sample_weights_classic',
    weight_noise=state['weight_noise'],
    rank_n_approx=state['rank_n_approx'],
    scale=state['inp_scale'],
    sparsity=state['inp_sparse'],
    learn_bias=True,
    bias_scale=eval(state['inp_bias']),
    name='emb_words')

#### Recurrent Layer
rec = eval(state['rec_layer'])(
        rng,
        eval(state['nhids']),
        activation=eval(state['rec_activ']),
        bias_scale=eval(state['rec_bias']),
        scale=eval(state['rec_scale']),
        sparsity=eval(state['rec_sparse']),
        init_fn=eval(state['rec_init']),
        weight_noise=state['weight_noise'],
        name='rec')

#### Stitching them together
##### (1) Get the embedding of a word
x_emb = emb_words(x, no_noise_bias=state['no_noise_bias'])
##### (2) Embedding + Hidden State via Recurrent Layer
reset = TT.scalar('reset')

rec_layer = rec(x_emb,
                no_noise_bias=state['no_noise_bias'],
                truncate_gradient=state['truncate_gradient'],
                batch_size=state['bs'])


#### Softmax Layer
output_layer = SoftmaxLayer(
    rng,
    eval(state['nhids']),
    state['n_out'],
    scale=state['out_scale'],
    bias_scale=state['out_bias_scale'],
    init_fn="sample_weights_classic",
    weight_noise=state['weight_noise'],
    sparsity=state['out_sparse'],
    sum_over_time=True,
    name='out')
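
Note that the snippets above eval() most of the state entries, so those entries hold strings containing Python expressions. A hypothetical sketch of the relevant configuration (the keys mirror the code above, but the values are purely illustrative, not GroundHog defaults):

state = {
    'rec_layer': 'RecurrentLayer',   # plain RNN instead of a gated layer
    'n_in': 10000,                   # input dimension, e.g. vocabulary size
    'n_out': 10000,
    'inp_nhids': '[200]',            # eval()-ed into a list of layer sizes
    'inp_activ': '[lambda x: x]',    # eval()-ed into activation functions
    'nhids': '200',                  # eval()-ed into the hidden-layer size
    'rec_activ': 'TT.nnet.sigmoid',  # eval()-ed into the activation
    # ... plus the remaining keys used above (scales, biases, sparsity, etc.)
}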

There might be something else you need to change, but you will have to
figure that out.

Toms Bergmanis

On 16 October 2014 16:28, mkudinov [email protected] wrote:

I want to train a simple shallow RNN with 1 hidden layer (like in
Mikolov's work). In this case I don't need an embedding layer and I would
like to turn it off. What is the right way to do it? I tried to do this:

rec_layer = rec(x, n_steps=x.shape[0],
                init_state=h0*reset,
                no_noise_bias=state['no_noise_bias'],
                truncate_gradient=state['truncate_gradient'],
                batch_size=1)

but it, obviously, didn't work. I'm completely new to Theano, so I didn't
succeed in guessing it just by looking through the code.

P.S. Obviously, this is not a bug report and should not be here, but I
don't have another way of communicating with you.



kyunghyuncho commented on August 18, 2024

From Toms' example: you can ignore the emb_words part and feed 'x' directly to 'rec' as well. This will be closer to what Mikolov uses.
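
In code that would look something like the following (a minimal sketch derived from Toms' snippet above; as the rest of this thread shows, the dtype of 'x' then needs care):

rec_layer = rec(x,
                no_noise_bias=state['no_noise_bias'],
                truncate_gradient=state['truncate_gradient'],
                batch_size=state['bs'])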


tomsbergmanis commented on August 18, 2024

Would that make training simpler?

kyunghyuncho commented on August 18, 2024

The size of the embedding matrix is, in the latter case, O(H x V). If your
hidden state size H and vocabulary size V are large, the embedding matrix
grows quite large. Instead, you can do a lower-rank approximation by having
an embedding layer of size M, resulting in the number of parameters being
O(H x M + M x V).

This was our justification for this approach, but it's not necessary, as
long as you have enough data and a moderately sized hidden state.
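
For a concrete sense of the savings (illustrative numbers, not from the thread):

# Hypothetical sizes: hidden state H, vocabulary V, embedding width M.
H, V, M = 1000, 100000, 128
print H * V          # full-rank input matrix:  100,000,000 parameters
print H * M + M * V  # low-rank factorization:   12,928,000 parameters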


mkudinov commented on August 18, 2024

@kyunghyuncho In this case I get the following error:
Traceback (most recent call last):
  File "mikolovStyle.py", line 416, in <module>
    jobman(state, None)
  File "mikolovStyle.py", line 134, in jobman
    name='rec')
  File "/home/mkudinov/workspace/GroundHog-master/groundhog/layers/rec_layers.py", line 974, in __init__
    self._init_params()
  File "/home/mkudinov/workspace/GroundHog-master/groundhog/layers/rec_layers.py", line 1007, in _init_params
    self.nG_hh = theano.shared(self.G_hh.get_value()*0, name='noise_'+self.G_hh.name)
AttributeError: 'RecurrentLayer' object has no attribute 'G_hh'

What is the update gate?


kyunghyuncho commented on August 18, 2024

Sorry about the late reply (I'm travelling now.)

Can you replace the following lines

self.nW_hh = theano.shared(self.W_hh.get_value()*0, name='noise_'+self.W_hh.name)     
self.nG_hh = theano.shared(self.G_hh.get_value()*0, name='noise_'+self.G_hh.name) 
self.noise_params = [self.nW_hh, self.nG_hh]                                                      

to

self.nW_hh = theano.shared(self.W_hh.get_value()*0, name='noise_'+self.W_hh.name)     
self.noise_params = [self.nW_hh]                                                      
if self.gating:
    self.nG_hh = theano.shared(self.G_hh.get_value()*0, name='noise_'+self.G_hh.name) 
    self.noise_params += [self.nG_hh]                                                 
if self.reseting:
    self.nR_hh = theano.shared(self.R_hh.get_value()*0, name='noise_'+self.R_hh.name) 
    self.noise_params += [self.nR_hh]

and try again?

If it works for you, I'll commit my changes.


mkudinov commented on August 18, 2024

I made the change and now I get

Original exception was:
Traceback (most recent call last):
  File "scripts/mikolovStyle.py", line 415, in <module>
    jobman(state, None)
  File "scripts/mikolovStyle.py", line 144, in jobman
    batch_size=state['bs'])
  File "/home/mkudinov/workspace/GroundHog-master/groundhog/layers/basic.py", line 464, in __call__
    new_obj.fprop(*args, **kwargs)
  File "/home/mkudinov/workspace/GroundHog-master/groundhog/layers/rec_layers.py", line 1170, in fprop
    n_steps = nsteps)
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py", line 1007, in scan
    scan_outs = local_op(*scan_inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 399, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 370, in make_node
    inner_sitsot_out.type.ndim))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32 and 2 dimension(s), while the result of the inner function for this output has dtype float64 and 1 dimension(s). This could happen if the inner graph of scan results in an upcast or downcast. Please make sure that you use dtypes consistently


kyunghyuncho commented on August 18, 2024

Currently, GroundHog only supports single-precision floating point variables (float32). When you run your script, you should explicitly set floatX to 'float32' in Theano configuration variables:

> THEANO_FLAGS=device=gpu,floatX=float32 python your_script_name.py


mkudinov commented on August 18, 2024

I set it through .theanorc.

>>> print theano.config.floatX
float32
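
For reference, the equivalent .theanorc would be something like this minimal sketch:

[global]
floatX = float32
device = gpu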


caglar commented on August 18, 2024

I didn't follow the whole discussion, but probably somewhere in your code
you forgot to do a cast. You need to cast every variable whose type you
change. For example, if you don't explicitly cast the output of a
multiplication of a float32 variable with an int64 variable, the result
will be float64.
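
A minimal sketch of that upcast (variable names are illustrative):

import theano
import theano.tensor as TT

x = TT.lvector('x')                         # int64, e.g. word indices
h = TT.fvector('h')                         # float32
y = x * h                                   # upcasts to float64
y_fixed = TT.cast(y, theano.config.floatX)  # float32 when floatX=float32
print y.dtype, y_fixed.dtype                # prints: float64 float32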


Caglar GULCEHRE


mkudinov commented on August 18, 2024

I solved the problem. It was caused by an implicit conversion inside RecurrentLayer.step_fprop(): the dtype of the input vector was set to int64. Inside RecurrentLayer.step_fprop() there is the line

preactiv = TT.dot(state_before_, W_hh) + state_below

Adding

preactiv = TT.cast(preactiv, theano.config.floatX)

solves the problem.

But this means that in the recurrent layer the input is simply added to the previous hidden state. That is not the same as what Mikolov does, so the 1-layer embedding is required. I.e. tomsbergmanis was right, wasn't he?


kyunghyuncho commented on August 18, 2024

You're right. I was somehow mistaken about this whole thing.

Can you make a pull request for the casting code there? Or, I can make a direct change later.


mkudinov commented on August 18, 2024

I'll try.

