oxford-cs-ml-2015 / practical6
Practical 6: LSTM language models
Home Page: https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
Hi Brendan,
I was just wondering if you could give me some advice on how to use model_utils.combine_all_parameters. I tracked down that that's where I was doing something fundamentally wrong, and I've now got my code to work; I just don't fully understand why it works. Or maybe I just got a few lucky runs?
The problem I'm working on is a variational autoencoder. In its simplest form it consists of three gModules: an encoder, a Q-sampler and a decoder. If I put these gModules into another nngraph gModule, say system_module, use the standard parameter/gradient flattening tool,
params, grad_params = system_module:getParameters()
and then clone system_module, everything works perfectly (and the code in feval is much more concise).
If instead I didn't put the encoder, sampler and decoder into one big system_module, and used
params, grad_params = model_utils.combine_all_parameters( system_t.encoder_t , system_t.sampler_t , system_t.decoder_t )
I ran into problems whenever I did not wrap the forward and backward calls of, say, system_t.sampler_t[t] inside unpack. So it seems you need to unpack the table outputs of the gModules in exactly the same order as you define them; otherwise things do not line up with the parameters/gradients stored in place in memory.
So as a concrete example, I wonder if you could clarify the difference between your new method, where you unpack the output of the LSTM gModule directly,
-- backprop through LSTM timestep
dembeddings[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.lstm[t]:backward(
{embeddings[t], lstm_c[t-1], lstm_h[t-1]},
{dlstm_c[t], dlstm_h[t]}
))
and the old method, where for readability (to help me) I use intermediate tables for both input and output (which I guess messes up the memory addresses in the new method),
local input_of_LSTM_at_t = {embeddings[t], lstm_c[t-1], lstm_h[t-1]}
local doutput_of_LSTM_at_t = {dlstm_c[t], dlstm_h[t]}
local dinput_of_LSTM_at_t = clones.lstm[t]:backward( input_of_LSTM_at_t , doutput_of_LSTM_at_t )
dembeddings[t] = dinput_of_LSTM_at_t[1]
dlstm_c[t-1] = dinput_of_LSTM_at_t[2]
dlstm_h[t-1] = dinput_of_LSTM_at_t[3]
I'm still new to Torch/Lua, and don't really understand model_utils.combine_all_parameters, getParameters() or torch.pointer. Sorry for the long question; any chance of a little explanation?
Best,
Aj
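For intuition about the in-place flattening this question turns on: getParameters-style tools copy every module's weights into one contiguous buffer and rebind the modules' tensors as views into it. A minimal numpy sketch of that idea (illustrative only, not the actual Torch internals):

```python
import numpy as np

# Toy "modules", each with its own parameter array.
w1 = np.array([1.0, 2.0])
w2 = np.array([3.0, 4.0, 5.0])

# Flatten: copy all parameters into one contiguous buffer, then rebind
# each module's weights to a *view* into that buffer.
flat = np.concatenate([w1, w2])
w1_view = flat[0:2]   # shares memory with flat
w2_view = flat[2:5]

# An optimizer step on the flat vector updates every module at once.
flat -= 0.1 * np.ones_like(flat)
print(w1_view)   # reflects the update

# The *original* arrays w1, w2 are now stale copies: they were never
# rebound, so updating flat does not touch them. Any code still holding
# the old tensors silently operates on dead memory.
print(w1)        # unchanged: [1. 2.]
```

This is why keeping references to tensors from before the flattening (or unpacking a module's outputs in a different order than they were defined) can appear to work while actually reading stale storage.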
I'm curious about why we should unroll the LSTM into T clones. Since all copies share the same parameters, why not just use one LSTM, run its forward pass repeatedly from t = 1 to T, and then run its backward pass from t = T down to 1?
I ask because I want to use an LSTM as a decoder to generate sentences, so I have to handle sequences of variable length. I tried this single-module approach, but it failed.
Could someone help me? Thanks.
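One way to see why per-timestep clones are used: the backward pass at step t needs the activations produced by step t's forward pass, so every timestep needs its own activation storage even though the weights are shared. A tiny scalar BPTT sketch (a toy recurrence, not the practical's LSTM):

```python
# Toy linear recurrence h_t = w * h_{t-1} + x_t, with loss = h_T.
# Weights are shared across timesteps, but backprop must revisit the
# saved activation from every forward step - exactly what the
# per-timestep clones store.
w = 0.5
xs = [1.0, 2.0, 3.0]

hs = [0.0]                           # h_0, then one saved h per step
for x in xs:                         # forward pass
    hs.append(w * hs[-1] + x)

dh = 1.0                             # dL/dh_T
dw = 0.0                             # gradient accumulates over timesteps
for t in reversed(range(len(xs))):   # backward pass
    dw += dh * hs[t]                 # needs the activation saved at step t
    dh *= w                          # propagate into h_{t-1}

print(dw)   # matches d/dw (w^2*x1 + w*x2 + x3) = 2*w*x1 + x2 = 3.0
```

If a single module overwrote its internal buffers on every forward call, the saved activations above would be gone by the time the backward loop needed them.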
Thanks for this code, it's very clear. But I don't understand these two lines:
-- LSTM final state's backward message (dloss/dfinalstate) is 0, since it doesn't influence predictions
local dfinalstate_c = initstate_c:clone()
local dfinalstate_h = initstate_c:clone()
Why is the LSTM final state's backward message (dloss/dfinalstate) 0?
I'm not sure I understand the backprop-through-LSTM-timestep lines (110-112 in train.lua). Any chance of an explanation? Thanks :)
Is it possible to swap the order of combine_all_parameters() and clone_many_times() in train.lua? That is, can I extract parameters after making a bunch of clones? If I extract params like this,
local params, grad_params = model_utils.combine_all_parameters(unpack(clones.embed), unpack(clones.lstm), unpack(clones.softmax))
the model fails to reduce the loss over iterations.
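Two things are worth noting here. First, a Lua detail: in a call like f(unpack(a), unpack(b), unpack(c)), only the last unpack expands to all of its elements; the earlier ones are truncated to a single value each. Second, flattening after cloning changes what gets flattened. A numpy sketch of one plausible failure mode (illustrative only; the real combine_all_parameters tracks storage sharing via torch.pointer, which this sketch ignores):

```python
import numpy as np

# clone_many_times-style sharing: clones reference one weight array.
shared_w = np.array([1.0, 2.0])
clone_a = shared_w          # same underlying storage
clone_b = shared_w

# Flattening *after* cloning concatenates fresh copies into a new
# buffer, so the sharing with the clones is severed...
flat = np.concatenate([clone_a, clone_b])
flat -= 0.1                 # "optimizer step" on the flat vector

# ...and the step never reaches the model the clones actually use,
# so the loss stops decreasing.
print(clone_a)   # still [1. 2.]
```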
I have a question about line 39: ydata[-1] = data[1]
The key here is to shift the x's (input characters) forward by one character to get the y's (the target characters we want to predict from the x's). So line 38 makes perfect sense to me:
ydata:sub(1,-2):copy(data:sub(2,-1))
However, why do we want to assign the very first character of the text to the last element of ydata? We certainly do not want to predict the first character of the text, right? It would make sense if the final element of ydata were instead the actual next character of the text, in case the number of characters is not divisible by (seq_length times batch_size), but it looks like the code just cuts off any remaining characters.
Could anyone help me make sense of this? Maybe I am completely misunderstanding the code. Thank you in advance.
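For reference, the shift-plus-wraparound indexing can be reproduced in a few lines of Python (a sketch of the indexing, not the Torch code). One plausible reading is that the wraparound simply gives the final position some target cheaply; over a large corpus, a single slightly wrong label is negligible:

```python
# Next-character targets: shift inputs left by one, wrapping the first
# character around as the target for the last position (the Python
# analogue of ydata:sub(1,-2):copy(data:sub(2,-1)) and ydata[-1] = data[1]).
data = list("hello")
ydata = data[1:] + data[:1]

print(list(zip(data, ydata)))
# [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o'), ('o', 'h')]
```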
Hi,
I've been experimenting with using the model_utils.lua file on some of my own concatenations of gModules. I was just wondering if you could give an example of how to use model_utils.combine_all_parameters and model_utils.clone_many_times to get the params and grad_params of a saved protos, which could then be used with the appropriate lines of train.lua to restart training?
Just to give some context: instead of saving the full protos, what I've tried is saving just the following table,
table_to_save = { options = opt , saved_params = params , saved_grad_params = grad_params }
Then I used basically all of train.lua, with the following,
saved_data = torch.load(saved_filename)
opt = saved_data.options
params:copy( saved_data.saved_params )
grad_params:copy( saved_data.saved_grad_params )
That is, I recreate the system using the same options and clone it in the same way; the main change is simply transferring the saved params and grad_params before starting the optimization.
I was just wondering if this is the right way to do it?
Thanks for your help :)
Best regards,
Aj
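The copy-into-the-flat-vector pattern described above can be sketched in numpy (names are illustrative; the key point is the in-place copy, which keeps the modules' views into the flat buffer valid):

```python
import numpy as np

def save_checkpoint(path, params):
    # Save only the flat parameter vector (plus, in practice, the opts
    # needed to rebuild an identically shaped model).
    np.save(path, params)

def restore_into(params, path):
    # In-place copy (the analogue of params:copy(saved)), so every
    # module tensor viewing this buffer sees the restored values.
    params[...] = np.load(path)

params = np.zeros(4)          # freshly rebuilt model's flat buffer
weight_view = params[0:2]     # one module's weights, a view into it

save_checkpoint("ckpt.npy", np.array([1.0, 2.0, 3.0, 4.0]))
restore_into(params, "ckpt.npy")
print(weight_view)   # [1. 2.]: the view picked up the restored values
```

One side note: for a plain restart, grad_params doesn't strictly need restoring if feval zeroes it before accumulating (as this practical's feval does), since the next backward pass recomputes it from scratch.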