oxford-cs-ml-2015 / practical6
Practical 6: LSTM language models
Home Page: https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
Hi Brendan,
I was just wondering if you could give me some advice on how to use model_utils.combine_all_parameters. I tracked down that that's where I was doing something fundamentally wrong, and I've now got my code to work; I just don't fully understand why it works. Or maybe I just got a few lucky runs?
The problem I'm working on is a variational autoencoder. In its simplest form it consists of three gModules: an encoder, a Q-sampler and a decoder. If I put these gModules into another nngraph gModule, say system_module, use the standard parameter/gradient flattening tool,
params, grad_params = system_module:getParameters()
and then clone system_module, everything works perfectly (and the code in feval is much more concise).
If instead I didn't put the encoder, sampler and decoder into one big system_module, and used
params, grad_params = model_utils.combine_all_parameters( system_t.encoder_t , system_t.sampler_t , system_t.decoder_t )
I ran into problems whenever I did not wrap the forward and backward calls of, say, system_t.sampler_t[t] inside unpack. So it seems you need to unpack the table outputs of the gModules in exactly the same order as you define them; otherwise things do not line up with the parameters/gradients stored in place in memory.
So as a concrete example, I wonder if you could clarify the difference between your new method, where you unpack the output of the LSTM gModule directly,
-- backprop through LSTM timestep
dembeddings[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.lstm[t]:backward(
{embeddings[t], lstm_c[t-1], lstm_h[t-1]},
{dlstm_c[t], dlstm_h[t]}
))
and the old method, where for readability (to help me) I use intermediate tables for both input and output (which I guess messes up the memory addresses in the new method),
local input_of_LSTM_at_t = {embeddings[t], lstm_c[t-1], lstm_h[t-1]}
local doutput_of_LSTM_at_t = {dlstm_c[t], dlstm_h[t]}
local dinput_of_LSTM_at_t = clones.lstm[t]:backward( input_of_LSTM_at_t , doutput_of_LSTM_at_t )
dembeddings[t] = dinput_of_LSTM_at_t[1]
dlstm_c[t-1] = dinput_of_LSTM_at_t[2]
dlstm_h[t-1] = dinput_of_LSTM_at_t[3]
I'm still new to Torch/Lua, and don't really understand model_utils.combine_all_parameters, getParameters() or torch.pointer. Sorry for the long question; any chance of a little explanation?
Best,
Aj
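For intuition about the in-place flattening this question turns on: getParameters-style tools copy every module's weights into one contiguous buffer and rebind the modules' tensors as views into it. A minimal numpy sketch of that idea (illustrative only, not the actual Torch internals):

```python
import numpy as np

# Toy "modules", each with its own parameter array.
w1 = np.array([1.0, 2.0])
w2 = np.array([3.0, 4.0, 5.0])

# Flatten: copy all parameters into one contiguous buffer, then rebind
# each module's weights to a *view* into that buffer.
flat = np.concatenate([w1, w2])
w1_view = flat[0:2]   # shares memory with flat
w2_view = flat[2:5]

# An optimizer step on the flat vector updates every module at once.
flat -= 0.1 * np.ones_like(flat)
print(w1_view)   # reflects the update

# The *original* arrays w1, w2 are now stale copies: they were never
# rebound, so updating flat does not touch them. Any code still holding
# the old tensors silently operates on dead memory.
print(w1)        # unchanged: [1. 2.]
```

This is why keeping references to tensors from before the flattening (or unpacking a module's outputs in a different order than they were defined) can appear to work while actually reading stale storage.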
I'm curious about why we should unroll the LSTM into T clones. Since all copies share the same parameters, why not just use one LSTM, run its forward pass repeatedly from t = 1 to T, and then run its backward pass from t = T down to 1?
I ask because I want to use an LSTM as a decoder to generate sentences, so I have to handle sequences of variable length. I tried this single-module approach, but it failed.
Could someone help me? Thanks.
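One way to see why per-timestep clones are used: the backward pass at step t needs the activations produced by step t's forward pass, so every timestep needs its own activation storage even though the weights are shared. A tiny scalar BPTT sketch (a toy recurrence, not the practical's LSTM):

```python
# Toy linear recurrence h_t = w * h_{t-1} + x_t, with loss = h_T.
# Weights are shared across timesteps, but backprop must revisit the
# saved activation from every forward step - exactly what the
# per-timestep clones store.
w = 0.5
xs = [1.0, 2.0, 3.0]

hs = [0.0]                           # h_0, then one saved h per step
for x in xs:                         # forward pass
    hs.append(w * hs[-1] + x)

dh = 1.0                             # dL/dh_T
dw = 0.0                             # gradient accumulates over timesteps
for t in reversed(range(len(xs))):   # backward pass
    dw += dh * hs[t]                 # needs the activation saved at step t
    dh *= w                          # propagate into h_{t-1}

print(dw)   # matches d/dw (w^2*x1 + w*x2 + x3) = 2*w*x1 + x2 = 3.0
```

If a single module overwrote its internal buffers on every forward call, the saved activations above would be gone by the time the backward loop needed them.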
Thanks for this code, it's very clear. But I don't understand these two lines:
-- LSTM final state's backward message (dloss/dfinalstate) is 0, since it doesn't influence predictions
local dfinalstate_c = initstate_c:clone()
local dfinalstate_h = initstate_c:clone()
Why is the LSTM final state's backward message (dloss/dfinalstate) 0?
I'm not sure I understand the backprop-through-LSTM-timestep lines (110-112 in train.lua). Any chance of an explanation? Thanks :)
Is it possible to swap the order of combine_all_parameters() and clone_many_times() in train.lua? That is, can I extract parameters after making a bunch of clones? If I extract params like this,
local params, grad_params = model_utils.combine_all_parameters(unpack(clones.embed), unpack(clones.lstm), unpack(clones.softmax))
the model fails to reduce the loss over iterations.
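Two things are worth noting here. First, a Lua detail: in a call like f(unpack(a), unpack(b), unpack(c)), only the last unpack expands to all of its elements; the earlier ones are truncated to a single value each. Second, flattening after cloning changes what gets flattened. A numpy sketch of one plausible failure mode (illustrative only; the real combine_all_parameters tracks storage sharing via torch.pointer, which this sketch ignores):

```python
import numpy as np

# clone_many_times-style sharing: clones reference one weight array.
shared_w = np.array([1.0, 2.0])
clone_a = shared_w          # same underlying storage
clone_b = shared_w

# Flattening *after* cloning concatenates fresh copies into a new
# buffer, so the sharing with the clones is severed...
flat = np.concatenate([clone_a, clone_b])
flat -= 0.1                 # "optimizer step" on the flat vector

# ...and the step never reaches the model the clones actually use,
# so the loss stops decreasing.
print(clone_a)   # still [1. 2.]
```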
I have a question about line 39: ydata[-1] = data[1]
The key here is to shift the x's (input characters) forward by one character to get the y's (the target characters we want to predict from the x's). So line 38 makes perfect sense to me:
ydata:sub(1,-2):copy(data:sub(2,-1))
However, why do we want to assign the very first character of the text to the last element of ydata? We certainly do not want to predict the first character of the text, right? It would make sense if the final element of ydata were instead the actual next character of the text, in case the number of characters is not divisible by (seq_length times batch_size), but it looks like the code just cuts off any remaining characters.
Could anyone help me make sense of this? Maybe I am completely misunderstanding the code. Thank you in advance.
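For reference, the shift-plus-wraparound indexing can be reproduced in a few lines of Python (a sketch of the indexing, not the Torch code). One plausible reading is that the wraparound simply gives the final position some target cheaply; over a large corpus, a single slightly wrong label is negligible:

```python
# Next-character targets: shift inputs left by one, wrapping the first
# character around as the target for the last position (the Python
# analogue of ydata:sub(1,-2):copy(data:sub(2,-1)) and ydata[-1] = data[1]).
data = list("hello")
ydata = data[1:] + data[:1]

print(list(zip(data, ydata)))
# [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o'), ('o', 'h')]
```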
Hi,
I've been experimenting with using the model_utils.lua file on some of my own concatenations of gModules. I was just wondering if you could give an example of how to use model_utils.combine_all_parameters and model_utils.clone_many_times to get the params and grad_params of a saved protos, which could then be used with the appropriate lines of train.lua to restart training?
Just to give some context: instead of saving the full protos, what I've tried is saving just the following table,
table_to_save = { options = opt , saved_params = params , saved_grad_params = grad_params }
Then I used basically all of train.lua, with the following,
saved_data = torch.load(saved_filename)
opt = saved_data.options
params:copy( saved_data.saved_params )
grad_params:copy( saved_data.saved_grad_params )
That is, I recreate the system using the same options and clone it in the same way; the main change is simply transferring the saved params and grad_params before starting the optimization.
I was just wondering if this is the right way to do it?
Thanks for your help :)
Best regards,
Aj
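The copy-into-the-flat-vector pattern described above can be sketched in numpy (names are illustrative; the key point is the in-place copy, which keeps the modules' views into the flat buffer valid):

```python
import numpy as np

def save_checkpoint(path, params):
    # Save only the flat parameter vector (plus, in practice, the opts
    # needed to rebuild an identically shaped model).
    np.save(path, params)

def restore_into(params, path):
    # In-place copy (the analogue of params:copy(saved)), so every
    # module tensor viewing this buffer sees the restored values.
    params[...] = np.load(path)

params = np.zeros(4)          # freshly rebuilt model's flat buffer
weight_view = params[0:2]     # one module's weights, a view into it

save_checkpoint("ckpt.npy", np.array([1.0, 2.0, 3.0, 4.0]))
restore_into(params, "ckpt.npy")
print(weight_view)   # [1. 2.]: the view picked up the restored values
```

One side note: for a plain restart, grad_params doesn't strictly need restoring if feval zeroes it before accumulating (as this practical's feval does), since the next backward pass recomputes it from scratch.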