practical6's People

Contributors

bshillingford


practical6's Issues

How to use `model_utils.combine_all_parameters`?

Hi Brendan,

I was just wondering if you could give me some advice on how to use model_utils.combine_all_parameters. I tracked the problem down to how I was using it, and I've now got my code to work - I just don't fully understand why it works. Or maybe I just got a few lucky runs?

The problem I'm working on is a variational autoencoder - in its simplest form it consists of 3 gModules: an encoder, a Q-sampler and a decoder. If I put these gModules into another nngraph gModule, say system_module, and use the standard parameter/gradient flattening tool,

params, grad_params = system_module:getParameters()

and then clone system_module, everything's fine and it works perfectly (and the code in feval is much more concise).

If instead I didn't put the encoder, sampler and decoder into a big system_module, and used

params, grad_params = model_utils.combine_all_parameters( system_t.encoder_t , system_t.sampler_t , system_t.decoder_t )

I was running into problems unless I wrapped the forward and backward calls of, say, system_t.sampler_t[t] inside unpack. So it seems you need the ordering of the unpacked table outputs of the gModules to match exactly the order in which you define them; otherwise the outputs don't line up with the parameters/grads stored in place in memory.

So, as a concrete example, I wonder if you could clarify the difference between your new method model_utils.combine_all_parameters, where you unpack the output of the LSTM gModule directly,

-- backprop through LSTM timestep
dembeddings[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.lstm[t]:backward(
{embeddings[t], lstm_c[t-1], lstm_h[t-1]},
{dlstm_c[t], dlstm_h[t]}
))

and the old method, where for readability (to help me) I use intermediate tables for both input and output (which I guess messes up the memory addresses in the new method):

local input_of_LSTM_at_t = {embeddings[t], lstm_c[t-1], lstm_h[t-1]}

local doutput_of_LSTM_at_t = {dlstm_c[t], dlstm_h[t]}

local dinput_of_LSTM_at_t = clones.lstm[t]:backward( input_of_LSTM_at_t , doutput_of_LSTM_at_t )

dembeddings[t] = dinput_of_LSTM_at_t[1]
dlstm_c[t-1] = dinput_of_LSTM_at_t[2]
dlstm_h[t-1] = dinput_of_LSTM_at_t[3]

I'm still new to Torch/Lua, and don't really understand either model_utils.combine_all_parameters or getParameters() and torch.pointer. Sorry for the long question, any chance of a little explanation?

Best,

Aj
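A minimal sketch of the flattening idea behind getParameters() (and, across several modules, combine_all_parameters) - this is an illustration I put together, not the actual model_utils code:

```lua
require 'nn'

-- Two separate modules, analogous to the encoder/sampler/decoder above.
local a = nn.Linear(3, 4)
local b = nn.Linear(4, 2)

-- Flattening gathers every weight and bias into one contiguous vector and
-- re-points each module's tensors into views of that shared storage.
local params, grad_params = nn.Container():add(a):add(b):getParameters()

print(params:nElement())  -- 4*3 + 4 + 2*4 + 2 = 26 parameters in one vector
-- torch.pointer shows a.weight now lives inside the flat storage:
print(torch.pointer(a.weight:storage()) == torch.pointer(params:storage()))
```

After this call, writing through params updates a and b in place (and vice versa), which is why the optimizer can work on a single flat vector.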

Question about clone_many_times()

I'm curious about why we should unroll the LSTM to T timesteps. Since all copies share the same parameters, every backprop step changes the same parameters and gradParameters. Why don't we just use one LSTM, running the forward pass repeatedly from t = 1 to T and then the backward pass from t = T to 1?
Because I want to apply the LSTM as a decoder to generate sentences, I have to handle sequences of variable length. I tried my approach but it failed.
Could someone help me? Thanks.
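For context, the usual answer is that backward() at timestep t needs the activations that forward() cached at timestep t; a single un-cloned module would overwrite those caches on every forward call. A sketch following train.lua's pattern (names assumed, not quoted exactly):

```lua
-- The clones share one flat parameter vector but have separate buffers.
local clones_lstm = model_utils.clone_many_times(protos.lstm, opt.seq_length)

for t = 1, opt.seq_length do
  -- each clone caches its own intermediate activations for this t
  lstm_c[t], lstm_h[t] = unpack(clones_lstm[t]:forward{embeddings[t], lstm_c[t-1], lstm_h[t-1]})
end

for t = opt.seq_length, 1, -1 do
  -- backward() relies on the activations clone t cached above; with a
  -- single shared module, the forward call at t = T would have destroyed
  -- the activations from every earlier timestep
  dembeddings[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones_lstm[t]:backward(
    {embeddings[t], lstm_c[t-1], lstm_h[t-1]},
    {dlstm_c[t], dlstm_h[t]}
  ))
end
```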

Why is LSTM final state's backward message (dloss/dfinalstate) 0?

Thanks for this code, it's very clear. But I don't understand these two lines:

-- LSTM final state's backward message (dloss/dfinalstate) is 0, since it doesn't influence predictions
local dfinalstate_c = initstate_c:clone()
local dfinalstate_h = initstate_c:clone()

Why is LSTM final state's backward message (dloss/dfinalstate) 0?
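One way to read it (my understanding, not an authoritative answer): these tensors are the backward messages arriving from timestep T+1, and since there is no timestep T+1, nothing beyond the sequence depends on the final state - so that message is zero. The final h's influence on the softmax at time T is added separately inside the backward loop. Cloning initstate_c, which train.lua creates with torch.zeros, is just a convenient way to allocate zeros of the right shape:

```lua
-- initstate_c is created as torch.zeros(opt.batch_size, opt.rnn_size),
-- so cloning it gives a zero "gradient from the future" of the right shape
local dfinalstate_c = initstate_c:clone()
local dfinalstate_h = initstate_c:clone()
assert(dfinalstate_c:sum() == 0 and dfinalstate_h:sum() == 0)
```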

Backprop through clones

I'm not sure I understand the backprop through the LSTM timestep on lines 110-112 in train.lua.

Any chance of an explanation? Thanks:)
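For reference, here is the snippet in question with my own annotations (the comments are my reading of it): backward(input, gradOutput) returns dloss/dinput in the same order as the input table given to forward.

```lua
dembeddings[t], dlstm_c[t-1], dlstm_h[t-1] = unpack(clones.lstm[t]:backward(
    {embeddings[t], lstm_c[t-1], lstm_h[t-1]},  -- the clone's inputs at time t
    {dlstm_c[t], dlstm_h[t]}                    -- dloss w.r.t. its two outputs
))
-- The returned table matches the input table element-for-element, so the
-- gradients w.r.t. lstm_c[t-1] and lstm_h[t-1] become the incoming backward
-- messages for the clone at timestep t-1 on the next loop iteration.
```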

Understanding model_utils.lua

Is it possible to switch combine_all_parameters() and clone_many_times() in train.lua? Can I extract parameters after making a bunch of clones? If I extract params like this,

local params, grad_params = model_utils.combine_all_parameters(unpack(clones.embed), unpack(clones.lstm), unpack(clones.softmax))

the model fails to reduce loss over iterations.

CharLMMinibatchLoader.lua, Line 39 Question

I have a question about Line 39: ydata[-1] = data[1]

The key here is to shift x's (input characters) by one character forward to get y's (target characters we want to predict based on x's given). So Line 38 makes perfect sense to me:

ydata:sub(1,-2):copy(data:sub(2,-1)).

However, why do we want to assign the very first character of the text to the last element of ydata? We certainly don't want to predict the first character of the text, right? It would make sense if the final element of ydata were instead the actual next character in the text, in case the number of characters is not divisible by (seq_length times batch_size) - but it looks like the code just cuts off any remaining characters.

Could anyone help me to make sense of this? Maybe I am completely misunderstanding the code. Thank you in advance.
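To make the shift concrete, a tiny plain-Lua sketch of what lines 38-39 compute (tables standing in for tensors); the wraparound value is essentially a filler target for the very last x, which just needs to be some valid character index:

```lua
local data  = {10, 11, 12, 13}   -- x's: input character codes
local ydata = {}

-- ydata:sub(1,-2):copy(data:sub(2,-1)): shift everything left by one
for i = 1, #data - 1 do ydata[i] = data[i + 1] end
-- ydata[-1] = data[1]: wrap the first character into the last target slot
ydata[#data] = data[1]

-- ydata is now {11, 12, 13, 10}: every input has a next-character target,
-- and only the final, wrapped-around target is not a real continuation
```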

How to restart training of a saved model?

Hi,

I've been experimenting with using the model_utils.lua file on some of my own concatenations of gModules. I was just wondering if you could give an example of how to use,

model_utils.combine_all_parameters

and

model_utils.clone_many_times

to get the params and grad_params of a saved protos table, which can then be used with the appropriate lines of train.lua to restart training?

Just to give some context - what I've tried is instead of saving the full protos, just saving the following table,

table_to_save = { options = opt , saved_params=params, saved_grad_params=grad_params }

Then used basically all of train.lua, with the following,

saved_data = torch.load(saved_filename)

opt = saved_data.options

params:copy( saved_data.saved_params )

grad_params:copy( saved_data.saved_grad_params )

That is, I recreate the system using the same options and clone it in the same way - the main change is simply transferring the saved params and grad_params before starting the optimization.

I was just wondering if this is the right way to do it?

Thanks for your help 👍

Best regards,

Aj
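A sketch of the pattern described above, with names taken from this thread rather than the repo. One caveat worth noting: grad_params doesn't strictly need to be restored, since feval zeroes and recomputes it on every step - what a params-only checkpoint does lose is any optimizer state (e.g. adagrad's accumulated squared gradients), which restarts from scratch.

```lua
-- Hypothetical save/restore helpers for the approach described above
local function save_checkpoint(filename, opt, params)
  torch.save(filename, { options = opt, saved_params = params:clone() })
end

local function restore_checkpoint(filename)
  local saved = torch.load(filename)
  -- Rebuild protos and clones from saved.options exactly as train.lua
  -- does, call combine_all_parameters to get the flat params, and then:
  --   params:copy(saved.saved_params)
  return saved.options, saved.saved_params
end
```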
