
Shallow RNN in GroundHog (groundhog, CLOSED, 13 comments)

pascanur commented on August 18, 2024

Comments (13)

tomsbergmanis commented on August 18, 2024

Set state['rec_layer'] = 'RecurrentLayer'

Then the main parts of the code affected would be:

#### Word Embedding

emb_words = MultiLayer(
    rng,
    n_in=state['n_in'],
    n_hids=eval(state['inp_nhids']),
    activation=eval(state['inp_activ']),
    init_fn='sample_weights_classic',
    weight_noise=state['weight_noise'],
    rank_n_approx=state['rank_n_approx'],
    scale=state['inp_scale'],
    sparsity=state['inp_sparse'],
    learn_bias=True,
    bias_scale=eval(state['inp_bias']),
    name='emb_words')

#### Recurrent Layer
rec = eval(state['rec_layer'])(
        rng,
        eval(state['nhids']),
        activation=eval(state['rec_activ']),
        bias_scale=eval(state['rec_bias']),
        scale=eval(state['rec_scale']),
        sparsity=eval(state['rec_sparse']),
        init_fn=eval(state['rec_init']),
        weight_noise=state['weight_noise'],
        name='rec')

#### Stitching them together
##### (1) Get the embedding of a word
x_emb = emb_words(x, no_noise_bias=state['no_noise_bias'])
##### (2) Embedding + Hidden State via Recurrent Layer
reset = TT.scalar('reset')

rec_layer = rec(x_emb,
                no_noise_bias=state['no_noise_bias'],
                truncate_gradient=state['truncate_gradient'],
                batch_size=state['bs'])


#### Softmax Layer
output_layer = SoftmaxLayer(
    rng,
    eval(state['nhids']),
    state['n_out'],
    scale=state['out_scale'],
    bias_scale=state['out_bias_scale'],
    init_fn="sample_weights_classic",
    weight_noise=state['weight_noise'],
    sparsity=state['out_sparse'],
    sum_over_time=True,
    name='out')
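
Note that the snippets above eval() most of the state entries, so those entries hold strings containing Python expressions. A hypothetical sketch of the relevant configuration (the keys mirror the code above, but the values are purely illustrative, not GroundHog defaults):

state = {
    'rec_layer': 'RecurrentLayer',   # plain RNN instead of a gated layer
    'n_in': 10000,                   # input dimension, e.g. vocabulary size
    'n_out': 10000,
    'inp_nhids': '[200]',            # eval()-ed into a list of layer sizes
    'inp_activ': '[lambda x: x]',    # eval()-ed into activation functions
    'nhids': '200',                  # eval()-ed into the hidden-layer size
    'rec_activ': 'TT.nnet.sigmoid',  # eval()-ed into the activation
    # ... plus the remaining keys used above (scales, biases, sparsity, etc.)
}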

There might be something else you need to change, but you will have to
figure that out.

Toms Bergmanis

On 16 October 2014 16:28, mkudinov [email protected] wrote:

I want to train a simple shallow RNN with 1 hidden layer (like in
Mikolov's work). In this case I don't need an embedding layer and I would
like to turn it off. What is the right way to do it? I tried to do this:

rec_layer = rec(x, n_steps=x.shape[0],
                init_state=h0*reset,
                no_noise_bias=state['no_noise_bias'],
                truncate_gradient=state['truncate_gradient'],
                batch_size=1)

but it, obviously, didn't work. I'm completely new to Theano, so I didn't
succeed in guessing it just by looking through the code.

P.S. Obviously, this is not a bug report and should not be here, but I
don't have another way of communicating with you.



kyunghyuncho commented on August 18, 2024

From Toms' example: you can ignore the emb_words part and feed 'x' directly to 'rec' as well. This will be closer to what Mikolov uses.
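
In code that would look something like the following (a minimal sketch derived from Toms' snippet above; as the rest of this thread shows, the dtype of 'x' then needs care):

rec_layer = rec(x,
                no_noise_bias=state['no_noise_bias'],
                truncate_gradient=state['truncate_gradient'],
                batch_size=state['bs'])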


tomsbergmanis commented on August 18, 2024

Would that make training simpler?

kyunghyuncho commented on August 18, 2024

The size of the embedding matrix is, in the latter case, O(H x V). If your
hidden state size H and vocabulary size V are large, the embedding matrix
grows quite large. Instead, you can do a lower-rank approximation by having
an embedding layer of size M, resulting in the number of parameters being
O(H x M + M x V).

This was our justification for this approach, but it's not necessary, as
long as you have enough data and a moderately sized hidden state.
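
For a concrete sense of the savings (illustrative numbers, not from the thread):

# Hypothetical sizes: hidden state H, vocabulary V, embedding width M.
H, V, M = 1000, 100000, 128
print H * V          # full-rank input matrix:  100,000,000 parameters
print H * M + M * V  # low-rank factorization:   12,928,000 parameters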


mkudinov commented on August 18, 2024

@kyunghyuncho In this case I get the following error:
Traceback (most recent call last):
  File "mikolovStyle.py", line 416, in <module>
    jobman(state, None)
  File "mikolovStyle.py", line 134, in jobman
    name='rec')
  File "/home/mkudinov/workspace/GroundHog-master/groundhog/layers/rec_layers.py", line 974, in __init__
    self._init_params()
  File "/home/mkudinov/workspace/GroundHog-master/groundhog/layers/rec_layers.py", line 1007, in _init_params
    self.nG_hh = theano.shared(self.G_hh.get_value()*0, name='noise_'+self.G_hh.name)
AttributeError: 'RecurrentLayer' object has no attribute 'G_hh'

What is the update gate?


kyunghyuncho commented on August 18, 2024

Sorry about the late reply (I'm travelling now.)

Can you replace the following lines

self.nW_hh = theano.shared(self.W_hh.get_value()*0, name='noise_'+self.W_hh.name)     
self.nG_hh = theano.shared(self.G_hh.get_value()*0, name='noise_'+self.G_hh.name) 
self.noise_params = [self.nW_hh, self.nG_hh]                                                      

to

self.nW_hh = theano.shared(self.W_hh.get_value()*0, name='noise_'+self.W_hh.name)     
self.noise_params = [self.nW_hh]                                                      
if self.gating:
    self.nG_hh = theano.shared(self.G_hh.get_value()*0, name='noise_'+self.G_hh.name) 
    self.noise_params += [self.nG_hh]                                                 
if self.reseting:
    self.nR_hh = theano.shared(self.R_hh.get_value()*0, name='noise_'+self.R_hh.name) 
    self.noise_params += [self.nR_hh]

and try again?

If it works for you, I'll commit my changes.


mkudinov commented on August 18, 2024

I made the change and now I get

Original exception was:
Traceback (most recent call last):
  File "scripts/mikolovStyle.py", line 415, in <module>
    jobman(state, None)
  File "scripts/mikolovStyle.py", line 144, in jobman
    batch_size=state['bs'])
  File "/home/mkudinov/workspace/GroundHog-master/groundhog/layers/basic.py", line 464, in __call__
    new_obj.fprop(*args, **kwargs)
  File "/home/mkudinov/workspace/GroundHog-master/groundhog/layers/rec_layers.py", line 1170, in fprop
    n_steps = nsteps)
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan.py", line 1007, in scan
    scan_outs = local_op(*scan_inputs)
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 399, in __call__
    node = self.make_node(*inputs, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 370, in make_node
    inner_sitsot_out.type.ndim))
ValueError: When compiling the inner function of scan the following error has been encountered: The initial state (outputs_info in scan nomenclature) of variable IncSubtensor{Set;:int64:}.0 (argument number 1) has dtype float32 and 2 dimension(s), while the result of the inner function for this output has dtype float64 and 1 dimension(s). This could happen if the inner graph of scan results in an upcast or downcast. Please make sure that you use dtypes consistently


kyunghyuncho commented on August 18, 2024

Currently, GroundHog only supports single-precision floating point variables (float32). When you run your script, you should explicitly set floatX to 'float32' in Theano configuration variables:

> THEANO_FLAGS=device=gpu,floatX=float32 python your_script_name.py


mkudinov commented on August 18, 2024

I set it through .theanorc.

>>> print theano.config.floatX
float32
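
For reference, the equivalent .theanorc would be something like this minimal sketch:

[global]
floatX = float32
device = gpu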


caglar commented on August 18, 2024

I didn't follow the whole discussion, but probably somewhere in your code
you forgot to do a cast. You need to cast every variable whose type you
change. For example, if you don't explicitly cast the output of a
multiplication of a float32 variable with an int64 variable, the result
will be float64.
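
A minimal sketch of that upcast (variable names are illustrative):

import theano
import theano.tensor as TT

x = TT.lvector('x')                         # int64, e.g. word indices
h = TT.fvector('h')                         # float32
y = x * h                                   # upcasts to float64
y_fixed = TT.cast(y, theano.config.floatX)  # float32 when floatX=float32
print y.dtype, y_fixed.dtype                # prints: float64 float32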


Caglar GULCEHRE


mkudinov commented on August 18, 2024

I solved the problem. It was caused by an implicit conversion inside RecurrentLayer.step_fprop(): the dtype of the input vector was set to int64. Inside RecurrentLayer.step_fprop() there is the line

preactiv = TT.dot(state_before_, W_hh) + state_below

Adding

preactiv = TT.cast(preactiv, theano.config.floatX)

solves the problem.

But this means that in the recurrent layer the input is simply added to the previous hidden state. That is not the same as what Mikolov does, so the 1-layer embedding is required. I.e. tomsbergmanis was right, wasn't he?


kyunghyuncho commented on August 18, 2024

You're right. I was somehow mistaken about this whole thing.

Can you make a pull request for the casting code there? Or, I can make a direct change later.


mkudinov commented on August 18, 2024

I'll try.

