torch-autograd's Introduction

Autograd

Autograd automatically differentiates native Torch code. Inspired by the original Python version.

Scope

Autograd has multiple goals:

  • provide automatic differentiation of Torch expressions
  • support arbitrary Torch types (e.g. transparent and full support for CUDA-backed computations)
  • full integration with nn modules: mix and match auto-differentiation with user-provided gradients
  • the ability to define any new nn compliant Module with automatic differentiation
  • represent complex evaluation graphs, which is very useful to describe models with multiple loss functions and/or inputs
  • graphs are dynamic, i.e. they can be different at each function call: for loops or conditionals can depend on intermediate results, or on input parameters
  • enable gradients of gradients for transparent computation of Hessians

Updates

Jan 21, 2016: Two big new user-facing features:

  • First, we now support direct assignment: you can now do x[k] = v inside optimize=true autograd code, where k can be a number, table or LongTensor, and v can be a tensor or number, whichever is appropriate. Here are a few examples (see also the sketch after this list).
  • Second, you can now take 2nd-order and higher gradients (supported in optimized mode; either run autograd.optimize(true) or take the derivative of your function using df = autograd(f, {optimize = true})). Check out a simple example in our tests.
  • Plus, lots of misc bugfixes and new utilities to help with tensor manipulation (autograd.util.cat can work with numbers, or tensors of any type; autograd.util.cast can cast a nested table of tensors to any type you like).
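
Here is a minimal sketch of direct assignment in optimized mode (the function body, shapes and names below are made up for illustration):

grad = require 'autograd'
grad.optimize(true)

-- assign directly into an intermediate tensor inside the differentiated function:
f = function(params, x)
   local out = params.W * x
   out[1] = 0       -- k is a number here; a table or LongTensor also works
   return torch.sum(out)
end

df = grad(f)
grads, loss = df({W = torch.randn(3,3)}, torch.randn(3))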

Nov 16, 2015: Runtime performance was improved dramatically, and ease of use is much better thanks to new debugging tools. Performance is now within 30% of a statically described version of an equivalent model (nn and nngraph).

  • a compute DAG is now generated and cached based on the input tensors' dimensions
  • the DAG is compiled into Lua code, with several optimizations
  • all intermediate states (tensors) are saved and re-used in a tensor pool
  • debugging facilities have been added: when debugging is enabled, a nan or inf will trigger a callback that can be used to render a DOT representation of the graph (see debugging)
  • now restricting user code to the functional API of Torch (a:add(b) forbidden, use res = torch.add(a,b) instead)
  • additional control flags can be passed to d(f, {...}) to compute subparts of the graph (fprop or bprop), useful to generate a compiled fprop (see fine-grained control)

Nov 6, 2015: initial release.

Install

  • Install Torch (instructions here).
  • Retrieve this repo
  • Run: luarocks make

Examples

Autograd example

A simple neural network with a multinomial logistic loss:

-- libraries:
t = require 'torch'
grad = require 'autograd'

-- define trainable parameters:
params = {
   W = {
      t.randn(100,50),
      t.randn(50,10),
   },
   b = {
      t.randn(50),
      t.randn(10),
   }
}

-- define model
neuralNet = function(params, x, y)
   local h1 = t.tanh(x * params.W[1] + params.b[1])
   local h2 = t.tanh(h1 * params.W[2] + params.b[2])
   local yHat = h2 - t.log(t.sum(t.exp(h2)))
   local loss = - t.sum(t.cmul(yHat, y))
   return loss
end

-- gradients:
dneuralNet = grad(neuralNet)

-- some data:
x = t.randn(1,100)
y = t.Tensor(1,10):zero() y[1][3] = 1

-- compute loss and gradients wrt all parameters in params:
dparams, loss = dneuralNet(params, x, y)

-- in this case:
--> loss: is a scalar (Lua number)
--> dparams: is a table that mimics the structure of params; for
--  each Tensor in params, dparams provides the derivatives of the
--  loss wrt that Tensor.

Important note: only variables packed in the first argument of the eval function will have their gradients computed. In the example above, if the gradients wrt x are needed, then x simply has to be moved into params (see the sketch below). The params table can be arbitrarily nested.
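
For example, here is a minimal sketch (the names neuralNetX, dneuralNetX and dinputs are hypothetical) that reuses neuralNet, params, x and y from above to also get gradients wrt x:

-- pack x alongside the weights so that the gradient wrt x is computed too:
neuralNetX = function(inputs, y)
   return neuralNet(inputs.params, inputs.x, y)
end
dneuralNetX = grad(neuralNetX)

dinputs, loss = dneuralNetX({params = params, x = x}, y)
-- dinputs.params mimics the structure of params, and dinputs.x holds the
-- derivatives of the loss wrt x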

See more complete examples in the examples folder.

Assuming the model defined above, and a training set of {x,y} pairs, the model can easily be optimized using SGD:

for i,sample in datasetIterator() do
   -- estimate gradients wrt params:
   local grads, loss = dneuralNet(params, sample.x, sample.y)

   -- SGD step:
   for i = 1,#params.W do
      -- update params with an arbitrary learning rate:
      params.W[i]:add(-.01, grads.W[i])
      params.b[i]:add(-.01, grads.b[i])
   end
end

Optimization

To enable the optimizer, which produces optimized representations of your loss and gradient functions (as generated Lua code):

grad = require 'autograd'
grad.optimize(true) -- global
local df = grad(f, { optimize = true }) -- for this function only
local grads = df(params)

Benefits:

  • Intermediate tensors are re-used between invocations of df(), dramatically reducing the amount of garbage produced.
  • Zero overhead from autograd itself, once the code for computing your gradients has been generated.
  • On average, a 2-3x overall performance improvement.

Caveats:

  • The generated code is cached based on the dimensions of the input tensors. If your problem is such that you have thousands of unique tensor configurations, you won't see any benefit.
  • Each invocation of grad(f) produces a new context for caching, so be sure to only call this once (see the sketch after this list).
  • WARNING: Variables that you close over in an autograd function in optimize mode will never be updated -- they are treated as static as soon as the function is defined.
  • WARNING: If you make extensive use of control flow (any if-statements, for-loops or while-loops), you're better off using direct mode. In the best case, the variables used for control flow will be passed in as arguments, and trigger recompilation for as many possible branches as exist in your code. In the worst case, the variables used for control flow will be either computed internally, closed over, or not change in size or rank, and control flow changes will be completely ignored.
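
For instance, here is a minimal sketch of the intended usage, reusing f, params and datasetIterator from the examples above: wrap the function once, outside the training loop, and reuse the returned df.

-- wrap once, so the compiled code and tensor pool are reused across iterations:
local df = grad(f, { optimize = true })

for i,sample in datasetIterator() do
   local grads, loss = df(params, sample.x, sample.y)
   -- ...parameter update as in the SGD example above
end

-- anti-pattern: calling grad(f, {optimize = true}) inside the loop would create
-- a fresh caching context at every iteration.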

Wrapping nn modules

The nn library provides all sorts of heavily optimized primitives, with gradient code written and optimized manually. Sometimes it's useful to rely on these for maximum performance.

Here we rewrite the neural net example from above, but this time relying on a mix of nn primitives and autograd-inferred gradients:

-- libraries:
t = require 'torch'
grad = require 'autograd'

-- define trainable parameters:
params = {
   linear1 = {
      t.randn(50,100), -- note that parameters are transposed (nn convention for nn.Linear)
      t.randn(50),
   },
   linear2 = {
      t.randn(10,50),
      t.randn(10),
   }
}

-- instantiate nn primitives:
-- Note: we do this outside of the eval function, so that memory
-- is only allocated once; moving these calls to within the body
-- of neuralNet would work too, but would be quite a bit slower.
linear1 = grad.nn.Linear(100, 50)
acts1 = grad.nn.Tanh()
linear2 = grad.nn.Linear(50, 10)
acts2 = grad.nn.Tanh()

-- define model
neuralNet = function(params, x, y)
   local h1 = acts1(linear1(params.linear1, x))
   local h2 = acts2(linear2(params.linear2, h1))
   local yHat = h2 - t.log(t.sum(t.exp(h2)))
   local loss = - t.sum(t.cmul(yHat, y))
   return loss
end

-- gradients:
dneuralNet = grad(neuralNet)

-- some data:
x = t.randn(1,100)
y = t.Tensor(1,10):zero() y[1][3] = 1

-- compute loss and gradients wrt all parameters in params:
dparams, loss = dneuralNet(params, x, y)

This code is strictly equivalent to the code above, but will be more efficient (this is especially true for more complex primitives like convolutions).

3rd party libraries that provide a similar API to nn can be registered like this:

local customnnfuncs = grad.functionalize('customnn')  -- requires 'customnn' and wraps it
module = customnnfuncs.MyNnxModule(...)

-- under the hood, this is already done for nn:
grad.nn = grad.functionalize('nn')

On top of this functional API, existing nn modules and containers, with arbitrarily nested parameters, can also be wrapped into functions. This is particularly handy when doing transfer learning from existing models:

-- Define a standard nn model:
local model = nn.Sequential()
model:add(nn.SpatialConvolutionMM(3, 16, 3, 3, 1, 1, 1, 1))
model:add(nn.Tanh())
model:add(nn.Reshape(16*8*8))
model:add(nn.Linear(16*8*8, 10))
model:add(nn.Tanh())
-- Note that this model could have been pre-trained, and reloaded from disk.

-- Functionalize the model:
local modelf, params = autograd.functionalize(model)

-- The model can now be used as part of a regular autograd function:
local loss = autograd.nn.MSECriterion()
neuralNet = function(params, x, y)
   local h = modelf(params, x)
   return loss(h, y)
end

-- Note: the parameters are always handled as an array, passed as the first
-- argument to the model function (modelf). This API is similar to the other
-- model primitives we provide (see below in "Model Primitives").

-- Note 2: if there are no parameters in the model, then you need to pass the input only, e.g.:
local model = nn.Sigmoid()
-- Functionalize :
local sigmoid = autograd.functionalize(model)

-- The sigmoid can now be used as part of a regular autograd function:
local loss = autograd.nn.MSECriterion()
neuralNet = function(params, x, y)
   local h = sigmoid(x) -- please note the absence of params arg
   return loss(h, y)
end

Creating auto-differentiated nn modules

For those who have a training pipeline that heavily relies on the torch/nn API, torch-autograd provides the autograd.nn.AutoModule and autograd.nn.AutoCriterion functions. Given a name, they create a new class locally under autograd.auto.<name>. The class is instantiated by providing a function (plus, for modules, a weight and a bias), and the resulting objects are clonable, savable and loadable. Here we show an example of writing a 2-layer fully-connected module and an MSE criterion using AutoModule and AutoCriterion:


-- Define functions for modules
-- Linear
local linear  = function(input, weight, bias)
   local y = weight * input + bias
   return y
end

-- Linear + ReLU
local linearReLU  = function(input, weight, bias)
   local y = weight * input + bias
   local output = torch.mul( torch.abs( y ) + y, 0.5)
   return output
end

-- Define function for criterion
-- MSE
local mse = function(input, target)
   local buffer = input-target
   return torch.sum( torch.cmul(buffer, buffer) ) / (input:dim() == 2 and input:size(1)*input:size(2) or input:size(1))
end

-- Input size, nb of hiddens
local inputSize, outputSize = 100, 1000

-- Define auto-modules and auto-criteria
-- and instantiate them immediately
local autoModel = nn.Sequential()
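-- (assumption: linear1 and linear2 are pre-existing nn.Linear modules, not shown
-- here, whose weights and biases are cloned to initialize the auto-modules)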
local autoLinear1ReLU = autograd.nn.AutoModule('AutoLinearReLU')(linearReLU, linear1.weight:clone(), linear1.bias:clone())
local autoLinear2 = autograd.nn.AutoModule('AutoLinear')(linear, linear2.weight:clone(), linear2.bias:clone())
autoModel:add( autoLinear1ReLU )
autoModel:add( autoLinear2 )
local autoMseCriterion = autograd.nn.AutoCriterion('AutoMSE')(mse)
-- At this point, print(autograd.auto) should yield
-- {
--   AutoLinearReLU : {...}
--   AutoMSE : {...}
--   AutoLinear : {...}
-- }

-- Define number of iterations and learning rate
local n = 100000
local lr = 0.001
local autoParams,autoGradParams = autoModel:parameters()
local uniformMultiplier = torch.Tensor(inputSize):uniform()

-- Train: this should learn how to approximate e^(\alpha * x)
-- with an MLP with both auto-modules and regular nn modules
for i=1,n do
   autoModel:zeroGradParameters()
   local input = torch.Tensor(inputSize):uniform(-5,5):cmul(uniformMultiplier)
   local target = input:clone():exp()
   -- Forward
   local output = autoModel:forward(input)
   local mseOut = autoMseCriterion:forward(output, target)
   -- Backward
   local gradOutput = autoMseCriterion:backward(output, target)
   local gradInput = autoModel:backward(input, gradOutput)
   for i=1,#autoParams do
      autoParams[i]:add(-lr, autoGradParams[i])
   end
end

Gradient checks

For peace of mind (and to write proper tests), a simple grad checker is provided. See test.lua for complete examples. In short, it can be used like this:

-- Parameters:
local W = t.Tensor(32,100):normal()
local x = t.Tensor(100):normal()

-- Function:
local func = function(inputs)
   return t.sum(inputs.W * inputs.x)
end

-- Check grads wrt all inputs:
tester:assert(gradcheck(func, {W=W, x=x}), 'incorrect gradients on W and x')

Model Primitives

To ease the construction of new models, we provide primitives to generate standard models.

Each constructor returns 2 things:

  • f: the function, can be passed to grad(f) to get gradients
  • params: the list of trainable parameters

Once instantiated, f and params can be used like this:

input = torch.randn(10)
pred = f(params, input)
grads = autograd(f)(params, input)

Current list of model primitives includes:

autograd.model.NeuralNetwork

API:

f,params = autograd.model.NeuralNetwork({
   -- number of input features:
   inputFeatures = 10,

   -- number of hidden features, per layer, in this case
   -- 2 layers, each with 100 and 10 features respectively:
   hiddenFeatures = {100,10},

   -- activation functions:
   activations = 'ReLU',

   -- if true, then no activation is used on the last layer;
   -- this is useful to feed a loss function (logistic, ...)
   classifier = false,

   -- dropouts:
   dropoutProbs = {.5, .5},
})

autograd.model.SpatialNetwork

API:

f,params = autograd.model.SpatialNetwork({
   -- number of input features (maps):
   inputFeatures = 3,

   -- number of hidden features, per layer:
   hiddenFeatures = {16, 32},

   -- poolings, for each layer:
   poolings = {2, 2},

   -- activation functions:
   activations = 'Sigmoid',

   -- kernel size:
   kernelSize = 3,

   -- dropouts:
   dropoutProbs = {.1, .1},
})

autograd.model.RecurrentNetwork

API:

f,params = autograd.model.RecurrentNetwork({
   -- number of input features (maps):
   inputFeatures = 100,

   -- number of output features:
   hiddenFeatures = 200,

   -- output is either the last h at step t,
   -- or the concatenation of all h states at all steps
   outputType = 'last', -- or 'all'
})

autograd.model.RecurrentLSTMNetwork

API:

f,params = autograd.model.RecurrentLSTMNetwork({
   -- number of input features (maps):
   inputFeatures = 100,

   -- number of output features:
   hiddenFeatures = 200,

   -- output is either the last h at step t,
   -- or the concatenation of all h states at all steps
   outputType = 'last', -- or 'all'
})

Loss Primitives

Similarly to model primitives, we provide common loss functions in autograd.loss:

-- cross entropy between 2 vectors:
-- (for categorical problems, the target should be encoded as one-hot)
loss = loss.crossEntropy(prediction, target)

-- binary cross entropy - same as above, but labels are considered independent Bernoulli variables:
loss = loss.binaryEntropy(prediction, target)

-- least squares - mean square error between 2 vectors:
loss = loss.leastSquares(prediction, target)
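
For instance, here is a minimal sketch (shapes and names are hypothetical) that combines a model primitive with the least-squares loss:

local autograd = require 'autograd'
local loss = require 'autograd.loss'

-- a model primitive (see "Model Primitives" above):
local f, params = autograd.model.NeuralNetwork({
   inputFeatures = 10,
   hiddenFeatures = {100, 10},
   classifier = true,
})

-- wrap model + loss into a single scalar-valued function:
local trainf = function(params, x, y)
   local prediction = f(params, x)
   return loss.leastSquares(prediction, y)
end

-- gradients wrt all trainable parameters:
local grads, l = autograd(trainf)(params, torch.randn(10), torch.randn(10))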

Gradients of gradients

autograd can be called from within an autograd function, and the resulting gradients can be used as part of your outer function:

local d = require 'autograd'
d.optimize(true)
local innerFn = function(params)
   -- compute something...
end
local ddf = d(function(params)
   local grads = d(innerFn)(params)
   -- do something with grads of innerFn...
end)
local gradGrads = ddf(params) -- second order gradient of innerFn
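
For a more concrete sketch (the names and the cubic function are made up; it assumes the operators involved are twice differentiable), the diagonal of the Hessian of f(x) = sum(x^3) can be recovered like this:

local d = require 'autograd'
d.optimize(true)

-- inner function: f(x) = sum(x^3), so df/dx = 3*x^2 and d2f/dx2 = 6*x
local innerFn = function(params)
   return torch.sum(torch.pow(params.x, 3))
end

-- outer function: sum of the inner gradients
local outerFn = function(params)
   local grads = d(innerFn)(params)
   return torch.sum(grads.x)
end

local ddf = d(outerFn)
local params = { x = torch.randn(5) }
local gradGrads = ddf(params)  -- gradGrads.x should match 6 * params.x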

Debugging and fine-grained control

Debugging hooks can be inserted when wrapping the function with autograd. The debugger will turn off any optimizations and insert NaN/Inf checks after every computation. If any of these trips, the debugHook will be called with a message providing as much information as possible about the offending function, call stack and values. The debugHook also provides an interface to save or render a GraphViz dot file of the computation graph. We don't recommend leaving the debugHook installed all the time, as your training speed will be significantly slower.

grad(f, {
   debugHook = function(debugger, msg, gen)
      -- dump a dot representation of the graph:
      debugger.generateDot('result.dot')

      -- or show it (OSX only, uses Safari):
      debugger.showDot()

      -- print the generated source line that caused the inf/nan
      print(string.split(gen.source, "\n")[gen.line])
   end
})

Consider this usage of autograd; it clearly contains a division by zero.

local W = torch.Tensor(32,100):fill(.5)
local x = torch.Tensor(100):fill(.5)
local func = function(inputs)
   return torch.sum(torch.div(inputs.W * inputs.x, 0))  -- DIV ZERO!
end
local dFunc = autograd(func, {
   debugHook = function(debugger, msg)
      debugger.showDot()
      print(msg)
      os.exit(0)
   end
})
dFunc({W=W, x=x})

This will output:

autograd debugger detected a nan or inf value for locals[1]
   1: fn@path/to/code/example.lua:4

And render the computation graph in Safari.

Finer-grained control over execution can also be achieved using these flags:

-- All of these options default to true:
grad(f, {
   withForward = true | false,    -- compute the forward path
   withGradients = true | false,  -- compute the gradients (after forward)
   partialGrad = true | false     -- partial grad means that d(f) expects grads wrt output
})

-- Running this:
pred = grad(f, {withForward=true, withGradients=false})(inputs)
-- is equivalent to:
pred = f(inputs)
-- ... but the function is compiled, and benefits from tensor re-use!

Licensed under the Apache License, Version 2.0. See LICENSE file.

torch-autograd's Issues

repeatTensor

I'm seeing the error:

/usr/local/share/lua/5.1/autograd/direct/DirectTape.lua:56: attempt to index field 'gradFun' (a nil value)

in a test involving repeatTensor. It looks like it might just not be supported yet?

Variable sized input and target tensors

In some cases it's useful to have variable sized input and target tensors and batch sizes, e.g. RNNs. This was reasonably efficient before the Nov 16 overhaul but I'm now seeing extreme slowdowns after the overhaul. Any plans to introduce a flag to switch between the pre- and post-overhaul behaviors, or suggestions on how to improve performance? I assume that moving to fixed BPTT dimension and batch size would address the speed issue (to match the provided LM example), but maybe there's another way to improve performance without changing the data?

Error "attempt to compare number with table" when parameters involved in comparison

I would like to differentiate functions involving Bernoulli trials whose probabilities depend on the parameters, where I would like the result of the trial to be treated as a constant by the differentiation.
The following code, which tries to do that, raises "attempt to compare number with table":

t = require 'torch'
grad = require 'autograd'

params = {
   a = t.randn(1,1)
}

f = function(params, x)
   local result = t.sum(x * params.a)

   -- sample from Bernoulli dist
   local bernoulli = torch.FloatTensor(1):zero()
   if torch.uniform() < t.sum(params.a) then
         bernoulli[1] = 1
   end

   return t.sum(bernoulli * result)
end

df = grad(f)

x = t.randn(1,1)

print(f(params, x))

print(df(params, x))

Replacing the if block by bernoulli[1] = t.bernoulli(t.sum(params.a)), which should do the same thing, also raises an error. Replacing the condition by 0.5 < t.sum(params.a) does not eliminate the error, which seems to show that the problem is not the use of random numbers.

Is there a way to tell autograd not to try to differentiate through comparisons and instead treat the resulting truth values as constants (which is possible in Theano)?

Constant tensor with more than one dimension

Hi, I've been hitting this error quite frequently. It happens when I declare a new Tensor and pass it into a torch function.

local d = require 'autograd'
d.optimize(true)

local lossf = d.nn.ClassNLLCriterion()
local f = function(params, y)
  local yhat = torch.Tensor(2, 400003):uniform(-0.1,0.1)

  -- also, if you uncomment this block, the autograd will print out the entire tensor :\
  -- local t = {
  --   torch.Tensor(400003):double():uniform(-0.1, 0.1),
  --   torch.Tensor(400003):double():uniform(-0.1, 0.1)
  -- }
  -- yhat = torch.cat(t,1)

  local loss = lossf(yhat, y)
  return loss
end

local df = d(f)
local y = torch.Tensor({9516,400003})
local params = {}

df(params, y)
/torch/install/bin/luajit: .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:25: constant tensor with more than one dimension. is this an upvalue that should be a function argument?
stack traceback:
    [C]: in function 'error'
    .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:25: in function 'init'
    .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:10: in function 'new'
    ...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:19: in function 'lossf'
    train.lua:154: in function 'fn'
    ...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:281: in function 'protectedFn'
    ...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:309: in function 'record'
    .../install/share/lua/5.1/autograd/runtime/codegen/init.lua:44: in function 'generateFn'
    .../install/share/lua/5.1/autograd/runtime/codegen/init.lua:66: in function 'df'
    train.lua:161: in main chunk
    [C]: in function 'dofile'
    ...gfei/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

Remove unnecessary trepl require

Since trepl is not used in the main autograd source, only in the benchmark file, it could probably be removed as a dependency from the init.lua file. See this commit:

762a404

Got NaN gradients after N batches on FloatTensor.

Hi,

On Cuda, I'm getting NaN for all gradients after 114 batches in my model. They are fine when trained on CPU. I have tried using different learning rates (00.2, 0.0002), and they return NaN after the same batch number.

On Cuda, if I increase the model's hidden dimension, I get an out-of-memory error after batch 28. (Each batch has a similar amount of data.)

Could this be a memory leak? Can you advise how I should debug this?

Grads needed

  • repeatTensor
  • max
  • min
  • cat
  • new
  • fill
  • index
  • __index (operator b = a[...])

A node type was not returned.

Hi,
I wrote a recursive autoencoder model, but get this message:

A node type was not returned. This is either because a gradient was not defined, or the input is independent of the output.

require 'torch'
grad = require 'autograd'


embeddingSize = 5

-- model parameters
-- encode parameters:
--      encodeW1, encodeW2, encodeB1
--
-- decode parameters
--      decodeW1, decodeW2, decodeB1, decodeB2
model_params = {
    W = {
        torch.randn(embeddingSize, embeddingSize), -- encodeW1
        torch.randn(embeddingSize, embeddingSize), -- encodeW2
        torch.randn(embeddingSize, embeddingSize), -- decodeW1
        torch.randn(embeddingSize, embeddingSize)  -- decodeW2
    },
    b = {
        torch.randn(embeddingSize, 1), -- encodeB1
        torch.randn(embeddingSize, 1), -- decodeB1
        torch.randn(embeddingSize, 1)  -- decodeB2
    }
}


--[[
    parameters:
        - input
            sentence embedding; the i-th column is w_i's embedding
--]]
-- pass model_params to RAEModel
RAEModel = function(params, input)
    if input:size(2) == 1 then 
        return 0
    end

    local nodes = torch.Tensor(embeddingSize, input:size(2) * 2 - 1)
    local candidates_index = torch.range(1, input:size(2))
    candidates_index = candidates_index:long()
    nodes:indexCopy(2, candidates_index, input)
    local loss = 0.0
    -- run (sentence length - 1) turns
    for turn = 1, input:size(2) - 1 do
        local candidates = nodes:index(2, candidates_index)

        local C1 = candidates:narrow(2, 1, candidates:size(2) - 1) 
        local C2 = candidates:narrow(2, 2, candidates:size(2) - 1) 

        --     tanh(encodeW1 * C1 + encodeW2 * C2 + encodeB1)
        local P = torch.tanh(params.W[1] * C1 + params.W[2] * C2 +
                    torch.expand(params.b[1], C1:size(1), C1:size(2)))

        -- decode
        --      tanh(decodeW1 * P + decodeB1)
        --      tanh(decodeW2 * P + decodeB2)
        local C1Rec = torch.tanh(params.W[3] * P +
                        torch.expand(params.b[2], C1:size(1), C1:size(2)))

        local C2Rec = torch.tanh(params.W[4] * P +
                        torch.expand(params.b[3], C1:size(1), C1:size(2)))

        -- reconstruction error
        local recC1Error = C1Rec - C1
        local recC2Error = C2Rec - C2


        local recError = torch.sum(torch.pow(recC1Error, 2) +  
                                    torch.pow(recC2Error, 2), 1)

        -- get best pos 
        local mask = torch.eq(recError, torch.min(recError))[1]
        local index = torch.nonzero(mask)[1][1]
        loss = loss + recError[1][index]

        -- generate new candidates
        local p = P[{{}, {index}}]

        -- add p to nodes, at location input:size(2) + i
        nodes:indexCopy(2, torch.LongTensor{input:size(2) + turn}, p)

        -- Debug
        print('============' .. turn .. '===============')
        print('====> reconstruction error')
        print(recError[1])
        print('====> index: ' .. index)
        print('====> content in candidates_index: ')
        print(candidates_index)
        print('====> new generate content: ')
        print(nodes[{{}, input:size(2) + turn}])

        -- get the result, then return
        if turn == input:size(2) - 1 then
            print('====> nodes')
            print(nodes)
            return loss
        end


        -- generate new candidate indices
        -- replace index and (index + 1) with (input:size(2) + turn)
        -- if there is only one candidate, we get the result
        candidates_index[index] = input:size(2) + turn
        if index == 1 then
            torch.cat(candidates_index, candidates_index[{{index}}],
                                        candidates_index[{{index+2,
                                            candidates_index:size(1)}}])

        elseif index == candidates_index:size(1) - 1 then 
            torch.cat(candidates_index, candidates_index[{{1, 
                                                            index - 1}}],
                                        candidates_index[{{index}}])

        else
            torch.cat(candidates_index, candidates_index[{{1, 
                                                            index}}],
                            candidates_index[{{index+2, 
                                            candidates_index:size(1)}}])
        end

    end
end


input = torch.randn(5, 10)
dmodel = grad(RAEModel)
grad, loss = dmodel(model_params, input)

print('=========== Result ===============')
print('====> loss:')
print(loss)

Support assignment

It's annoying to not be able to assign values to tensors. For example, we don't support any operations that look like

x[i] = y

We can easily support this via a utility, like

x = util.set(x, i, y)

which will return a new wrapped value that we can track. However, using the built-in __newindex and __index syntax is a great deal more convenient. We have some ideas cooking up on how to support this. Just need a gradient for __newindex, and then some bookkeeping magic.

Proxy class methods for torch Tensors

Right now, A:size() will fail if A is a Node (if it's been wrapped to track computation in the forward pass).
However, getValue(A):size() will work.
Should proxy all class functions through the Node to the tensor.

Adding two Criterions?

Hi,

What are the possible ways to return the sum of two losses? ParallelCriterion doesn't seem to have the :add() function.

Tried the code below; no luck either. But it did successfully sum the values.

local lossf1 = d.nn.ClassNLLCriterion()
local lossf2 = d.nn.ClassNLLCriterion()
local loss1 = lossf1(attentionGates, attentionY)
local loss2 = lossf2(predictions, y)

loss = loss1 + loss2

print(loss1.value)
print(loss2.value)
print(loss.value)

return loss

-- output
0.68027830123901    
12.994906425476 
13.675184726715 

/torch/install/share/lua/5.1/torch/Tensor.lua:462: Wrong size for view. Input size: 2x1. Output size: 1x400004
stack traceback:
    [C]: in function 'error'
    /home/yongfei/torch/install/share/lua/5.1/torch/Tensor.lua:462: in function 'gf'
    ...all/share/lua/5.1/autograd/runtime/direct/DirectTape.lua:58: in function 'gradOnly'

A node type was not returned. This is either because a gradient was not defined, or the input is independent of the output

I am basically mimicking your example code for AutoCriterion, but with my own function, and I get an error. Here is the function in question:

autograd = require 'autograd'

local cmd = torch.CmdLine()

cmd:text()
cmd:text('Script for training model.')

cmd:option('-inputSize' , 61, 'number of input dimension')
cmd:option('-dimSize' , 2, 'dim size for U')
cmd:option('-batchSize' , 4, 'mini batch size')
cmd:option('-numMixture' , 10, 'number of mixture components in output layer')

cmd:text()
opt = cmd:parse(arg)

function logDet(matrix)
    output = torch.zeros(opt.batchSize, 1)
    for i = 1, opt.batchSize do
        eig_vals = (torch.eig(matrix[i], 'N'))
        output[i] = (torch.log(eig_vals:select(2, 1))):sum()
    end
    return output
end

function inverse(matrix)
    output = torch.zeros(opt.batchSize, opt.inputSize, opt.inputSize)
    for i = 1, opt.batchSize do
        output[i] = (torch.inverse(matrix[i]))
    end
    return output
end

function getEps()
   eps = torch.eye(opt.inputSize,opt.inputSize) * 1e-2
   eps:resize(1,opt.inputSize,opt.inputSize)
   fulleps = eps:clone()
   for i = 2, opt.batchSize do
       fulleps = torch.cat(fulleps,eps,1)
    end
    return fulleps
end

local mixtureMultvarGauss = function(input, target)
    local sizeMeanInput = opt.inputSize * opt.numMixture
    local sizeCovarianceInput = opt.inputSize * opt.numMixture * opt.dimSize

    local piStart = 1
    local piEnd = opt.numMixture
    local hat_pi_t = input[{{},{piStart,piEnd}}]

    local muStart = piEnd + 1
    local muEnd = piEnd + sizeMeanInput
    local hat_mu_t = input[{{},{muStart,muEnd}}]

    local sigmaStart = muEnd + 1
    local sigmaEnd = muEnd + sizeCovarianceInput
    local hat_sigma_t = input[{{},{sigmaStart,sigmaEnd}}]

    local mask = input[{{},{sigmaEnd + 1}}]

    hat_mu_t:resize(opt.batchSize, opt.numMixture, 1, opt.inputSize)
    hat_sigma_t:resize(opt.batchSize, opt.numMixture, opt.dimSize, opt.inputSize)
    target:resize(opt.batchSize, 1, opt.inputSize)
    eps = getEps()

    local join_mixture_result = torch.zeros(opt.batchSize, opt.numMixture)

    for i = 1, opt.numMixture do
        local u = hat_sigma_t[{{},{i},{},{}}]:squeeze(2)
        local mu = hat_mu_t[{{},{i},{},{}}]:squeeze(2)
        local pi = hat_pi_t[{{},{i}}]
        local sigma = torch.bmm(u:transpose(2,3), u)
        sigma:add(eps)
        local det_sigma_2_pi = logDet(sigma) + (opt.inputSize * torch.log(2 * math.pi)) 
        local sqr_det_sigma_2_pi = (det_sigma_2_pi) * -0.5

        local target_mu = target - mu 
        local transpose_target_mu = target_mu:transpose(2,3)

        local inv_sigma = inverse(sigma)
        local transpose_target_mu_sigma = torch.bmm(target_mu, inv_sigma)

        local transpose_target_mu_sigma_target_mu = torch.bmm(transpose_target_mu_sigma, 
            transpose_target_mu)

        local exp_term = transpose_target_mu_sigma_target_mu * -0.5

        local mixture_result = pi + sqr_det_sigma_2_pi + exp_term

        join_mixture_result[{{},{i}}] = mixture_result
    end

    local max_mixture = torch.max(join_mixture_result, 2)
    local max_expanded = max_mixture:expandAs(join_mixture_result) * -1
    local norm_mixture = max_expanded + join_mixture_result
    local norm_mixture_exp = torch.exp(norm_mixture)
    local norm_mixture_sumexp = torch.sum(norm_mixture_exp, 2)
    local norm_mixture_logsumexp = torch.log(norm_mixture_sumexp)
    local norm_mixture_addlogsumexp = (max_mixture + norm_mixture_logsumexp)*-1

    return torch.cmul(mask, norm_mixture_addlogsumexp)
end 

local autoMseCriterion = autograd.nn.AutoCriterion('AutoMixGauss')(mixtureMultvarGauss)

pi = torch.rand(4,10)
mask = torch.ones(4,1)
sig = torch.rand(4, 61*2*10)
mu = torch.rand(4, 61*10)
input = torch.cat(pi, mu, 2)
input = torch.cat(input, sig, 2)
input = torch.cat(input, mask, 2)
target = torch.rand(4,61)
print(autoMseCriterion:forward(input, target))

Failed to parse generated code

Hi,

I'm having problems running with optimization enabled.
I printed out the error from the loadstring/load function in init.lua; here's the error.

[string "return function(locals, rlocals, vlocals, mod..."]:634: function at line 38 has more than 60 upvalues  
/home/yongfei/torch/install/bin/luajit: ...re/lua/5.1/autograd/runtime/codegen/backend/lua/init.lua:760: failed to parse generated code

And here's the generated code.

https://gist.github.com/yongfei25/8992651ec5fd55805b7f

The error above was running on CPU, when I try on GPU, got

/home/ubuntu/lib/torch/install/bin/luajit: ...nstall/share/lua/5.1/autograd/runtime/codegen/Source.lua:35: invalid value (table) at index 1 in table for 'concat'
stack traceback:
    [C]: in function 'concat'
    ...nstall/share/lua/5.1/autograd/runtime/codegen/Source.lua:35: in function 'symbolPath'

non-optimizing still optimizes

I think it's the case that somewhere around commit cd6b9e0 optimization started happening even when not in optimization mode. main.lua says this around line 32:

   if optimize then
      return RuntimeCodegen.create(fn, opt)
   else
      return RuntimeDirect.create(fn, opt)
   end

I believe the test should be on opt.optimize?

'nodeApply' is nil when calling functionalized nn module

When I run the code below

require 'env'
local t = require 'torch'
local ag = require 'autograd'

local n_enc = 10

local nn_enc = nn.Sequential()
:add(nn.SpatialConvolutionMM(1, 16, 3, 3, 1, 1, 1, 1))
:add(nn.Tanh())
:add(nn.SpatialMaxPooling(2,2,2,2))
:add(nn.SpatialConvolutionMM(16, 32, 3, 3, 1, 1, 1, 1))
:add(nn.Tanh())
:add(nn.Reshape(32*16*16))
:add(nn.ConcatTable()
    :add(nn.Linear(32*16*16, n_enc))
    :add(nn.Linear(32*16*16, n_enc))
    )

-- quick test
local x = torch.randn(10, 1, 32, 32)
local y_nn = nn_enc:forward(x)

-- Functionalize the model:
local ag_enc, enc_params = ag.functionalize(nn_enc)

local y_ag = ag_enc(enc_params, x)
print{params=params,y_ag=y_ag,y_nn=y_nn,x=x}

I get the error:

autograd/nnwrapper.lua:291: attempt to call upvalue 'nodeApply' (a nil value)
stack traceback:
[C]: in function 'nodeApply'
.../autograd/nnwrapper.lua:291: in function 'ag_enc'
test.lua:29: in main chunk

Errors with the "Reorganize codegen" commit

Commit 29a2399 introduced various errors that should get fixed.

In two places, the require paths were not updated correctly:

In src/gradfuns.lua, line 2 should be
local DirectNode = require 'autograd.runtime.direct.DirectNode'

In src/runtime/direct/DirectTape.lua, line 3 should be
local DirectNode = require 'autograd.runtime.direct.DirectNode'

The reason these were not caught by tests is because of an error that was simultaneously introduced in CMakeLists.txt. If you take a look at https://travis-ci.org/twitter/torch-autograd/jobs/96998417 you see that the files under src/runtime are getting built multiple times. This has the effect of installing the src/runtime/direct/DirectNode.lua file into autograd/direct/DirectNode.lua, an incorrect location.

Instead, lines 15 - 18 of CMakeLists.txt:

INSTALL(DIRECTORY "src/runtime/direct" DESTINATION "${Torch_INSTALL_LUA_PATH_SUBDIR}/autograd")
INSTALL(DIRECTORY "src/runtime/codegen" DESTINATION "${Torch_INSTALL_LUA_PATH_SUBDIR}/autograd")
INSTALL(DIRECTORY "src/runtime/codegen/backend" DESTINATION "${Torch_INSTALL_LUA_PATH_SUBDIR}/autograd")
INSTALL(DIRECTORY "src/runtime/codegen/backend/lua" DESTINATION "${Torch_INSTALL_LUA_PATH_SUBDIR}/autograd")

should be removed. I wonder if there's a way to even install all the files under src/, rather than having to list out all the subfolders.

train-penn-lstm.lua crashes after finishing 1st epoch

Running th train-penn-lstm.lua without passing any parameters (processing done on CPU, default parameters) I got the following error right after the first epoch finishes training and validation/test perplexities are computed:

Test set [just indicative, not used for training]...
Test set perplexity = 176.93627527427

Training Epoch #2
/Users/jfsantos/torch/install/bin/luajit: [string "return function(locals, rlocals, vlocals, Cla..."]:35: attempt to perform arithmetic on local 'p4' (a table value)
stack traceback:
    [string "return function(locals, rlocals, vlocals, Cla..."]:35: in function 'df'
    train-penn-lstm.lua:185: in main chunk
    [C]: in function 'dofile'
    ...ntos/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
    [C]: at 0x010b42a320

p4 seems to be an automatically generated variable inside the df function, but I have no clue on how I could debug that. Please let me know if I can run any specific tests to help fixing this issue.

"expected FloatTensor in tensor array" error

After pulling the latest commits I am running into the following error on the LSTM example:
[string "return function(locals, rlocals, vlocals, cri..."]:254: expected FloatTensor in tensor array
stack traceback:
[C]: in function 'torch_FloatTensor_cat'
[string "return function(locals, rlocals, vlocals, cri..."]:254: in function 'df'
train-penn-lstm.lua:183: in main chunk
[C]: in function 'dofile'
....edu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405b50

Similar errors also occur when I changed to the CUDA backend. What has gone wrong?

Convolutional LSTM

I'm implementing convolution LSTMs with autograd - the pseudo code is here https://github.com/ankurhanda/convlstm.autograd/blob/master/RecurrentConvLSTMNetwork.lua. The code is adapted from RecurrentLSTMNetwork.lua in src/model which is very nicely structured. However, I always get bad argument #1 (field weight does not exist) in function 'conv_ixinput' when I test my RecurrentConvLSTMNetwork.lua. I wonder if nn.SpatialConvolution allows conv(x, W, b). Otherwise what's the best way to implement this while keeping the structure similar to RecurrentLSTMNetwork?

Mac OS X install

Even after updating to the latest torch-distro, I'm unable to install. The log suggests it's not happy with my gcc installation, but gcc was capable of installing torch. I'm on El Capitan.

[davidkelley]$ sudo luarocks make
Password:
cmake -E make_directory build && cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/Users/davidkelley/software/torch/install/bin/.." -DCMAKE_INSTALL_PREFIX="/Users/davidkelley/software/torch/install/lib/luarocks/rocks/autograd/scm-1" && make
-- The C compiler identification is GNU 4.9.2
-- The CXX compiler identification is GNU 4.9.2
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc-4.9
-- Check for working C compiler: /usr/local/bin/gcc-4.9 -- broken
CMake Error at /usr/local/Cellar/cmake/3.2.2/share/cmake/Modules/CMakeTestCCompiler.cmake:61 (message):
  The C compiler "/usr/local/bin/gcc-4.9" is not able to compile a simple
  test program.

  It fails with the following output:

   Change Dir: /Users/davidkelley/software/torch-autograd/build/CMakeFiles/CMakeTmp

  Run Build Command:"/usr/bin/make" "cmTryCompileExec384197817/fast"

  /Applications/Xcode.app/Contents/Developer/usr/bin/make -f
  CMakeFiles/cmTryCompileExec384197817.dir/build.make
  CMakeFiles/cmTryCompileExec384197817.dir/build

  /usr/local/Cellar/cmake/3.2.2/bin/cmake -E cmake_progress_report
  /Users/davidkelley/software/torch-autograd/build/CMakeFiles/CMakeTmp/CMakeFiles
  1

  Building C object
  CMakeFiles/cmTryCompileExec384197817.dir/testCCompiler.c.o

  /usr/local/bin/gcc-4.9 -o
  CMakeFiles/cmTryCompileExec384197817.dir/testCCompiler.c.o -c
  /Users/davidkelley/software/torch-autograd/build/CMakeFiles/CMakeTmp/testCCompiler.c

  Linking C executable cmTryCompileExec384197817

  /usr/local/Cellar/cmake/3.2.2/bin/cmake -E cmake_link_script
  CMakeFiles/cmTryCompileExec384197817.dir/link.txt --verbose=1

  /usr/local/bin/gcc-4.9 -Wl,-search_paths_first
  -Wl,-headerpad_max_install_names
  CMakeFiles/cmTryCompileExec384197817.dir/testCCompiler.c.o -o
  cmTryCompileExec384197817

  ld: library not found for -lSystem

  collect2: error: ld returned 1 exit status

  make[1]: *** [cmTryCompileExec384197817] Error 1

  make: *** [cmTryCompileExec384197817/fast] Error 2

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):

-- Configuring incomplete, errors occurred!
See also "/Users/davidkelley/software/torch-autograd/build/CMakeFiles/CMakeOutput.log".
See also "/Users/davidkelley/software/torch-autograd/build/CMakeFiles/CMakeError.log".

Error: Build error: Failed building.

Cloning AutoModule fails

The following simple AutoModule fails at module:clone() with the error: torch/File.lua:107: Unwritable object

local autograd = require 'autograd'

local f = function(x, W, b)
  return x * W + torch.expand(b, torch.size(x, 1), torch.size(b, 2))
end

local batchSize = 5
local inputFeatures = 6
local outputFeatures = 7

local weight = torch.rand(inputFeatures, outputFeatures)
local bias = torch.rand(1, outputFeatures)
local module = autograd.nn.AutoModule('AutoModule')(f, weight, bias)

module:clone()

Is cloning not supported, or is it a bug?

Thanks

Documentation Needed

Hi,

Thanks for creating autograd for torch7; it's really cool. It would be good to have documentation for the autograd package. The examples are good and help to get started, but I think we need documentation too, which would make the package much easier to use.

LSTM now only supports batch size of 1

I had implemented an LSTM using the initial release. Since updating to the latest release my code fails with the error error("constant tensor with more than one dimension") due to the input to the LSTM layer being a 3-dimensional tensor, i.e. a batch of multi-dimensional sequences. Is this no longer supported functionality?

Problem with optim interoperability

I have tried to modify the train-mnist-logistc.lua example to use optim for the SGD optimization (code below). When I run this I get the error nil parameter value from autograd/Value.lua originating from line 19 in the code. What am I doing wrong?

-- Libs
local grad = require 'autograd'
local util = require 'autograd.util'
local lossFuns = require 'autograd.loss'
local optim = require 'optim'

grad.optimize(true)

-- Load in MNIST
local trainData, testData, classes = require('./get-mnist.lua')()
local inputSize = trainData.x[1]:nElement()
local confusionMatrix = optim.ConfusionMatrix(classes)

-- What model to train:
local predict,f,params

-- Define our neural net
function predict(params, input, target)
   local h1 = input * params.W[1] + params.B[1]
   local out = util.logSoftMax(h1)
   return out
end

-- Define our loss function
function f(params, input, target)
   local prediction = predict(params, input, target)
   local loss = lossFuns.logMultinomialLoss(prediction, target)
   return loss
end

-- Define our parameters
-- [-1/sqrt(#output), 1/sqrt(#output)]
torch.manualSeed(0)
local W1 = torch.FloatTensor(inputSize,#classes):uniform(-1/math.sqrt(#classes),1/math.sqrt(#classes))
local B1 = torch.FloatTensor(#classes):fill(0)

-- Trainable parameters:
params = {
   W = {W1},
   B = {B1},
}

state = {
   learningRate = 1e-3,
   momentum = 0.5
}

-- Train a neural network
for epoch = 1,100 do
   print('Training Epoch #'..epoch)
   for i = 1,trainData.size do
      -- Next sample:
      local input = trainData.x[i]:view(1,inputSize)
      local target = torch.view(trainData.y[i], 1, 10)

      local feval = grad(f, { optimize = true })

      optim.sgd(feval,params,state)

      print(loss)
   end
end

network crashes when run with CUDA (dropout-related?)

The example from #17 now works in the new release, again (thanks!).

However, when I change it to use CUDA (on the https://github.com/thouis/denoise_cnn/tree/cuda branch), it crashes, with:

/Users/thouis/torch/install/bin/luajit: /Users/thouis/torch/install/share/lua/5.1/autograd/Node.lua:25: constant tensor with more than one dimension
stack traceback:
    [C]: in function 'error'
    /Users/thouis/torch/install/share/lua/5.1/autograd/Node.lua:25: in function 'init'
    /Users/thouis/torch/install/share/lua/5.1/autograd/Node.lua:10: in function 'new'
    /Users/thouis/torch/install/share/lua/5.1/autograd/main.lua:126: in function 'bernoulli'
    /Users/thouis/torch/install/share/lua/5.1/autograd/util.lua:60: in function 'regularize'
    fcnnlr.lua:72: in function 'predict'
    fcnnlr.lua:88: in function 'fn'
    /Users/thouis/torch/install/share/lua/5.1/autograd/main.lua:403: in function 'createGraph'
    /Users/thouis/torch/install/share/lua/5.1/autograd/main.lua:650: in function 'generateCode'
    /Users/thouis/torch/install/share/lua/5.1/autograd/main.lua:903: in function 'df'
    fcnnlr.lua:176: in main chunk
    [C]: in function 'dofile'
    ...ouis/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010b90d2d0

Run with

th fcnnlr.lua --type cuda z00000123_orig.png z00000123_labels.png

To see the crash, or remove --type cuda to see it work.

grad of grad

Below I've pasted some self-contained code to test the gradient-of-gradient stuff that was added recently to the README. It's described in the comments in the code. Unfortunately, it crashes, with this error:

/usr/local/torch/install/bin/luajit: .../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:135: missing gradient for function util.fillSameSizeAsInPlace
stack traceback:
[C]: in function 'error'
.../install/share/lua/5.1/autograd/runtime/codegen/Node.lua:135: in function 'evaluateBackward'
...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:298: in function 'protectedFn'
...install/share/lua/5.1/autograd/runtime/codegen/Graph.lua:309: in function 'record'
.../install/share/lua/5.1/autograd/runtime/codegen/init.lua:44: in function 'generateFn'
.../install/share/lua/5.1/autograd/runtime/codegen/init.lua:66: in function 'ddf'
testGradGrad2.lua:32: in main chunk

Has anyone seen this before?
Thanks, David

local grad = require 'autograd'
grad.optimize(true)

--simple linear model with squared loss
local params = torch.randn(100,10)

local innerFn = function(params, x, y)
local yHat = x*params
local squaredLoss = torch.sum(torch.pow(y - yHat,2))
return squaredLoss
end

local dneuralNet = grad(innerFn)

--synthetic data
local x = torch.randn(1,100)
local y = torch.randn(1,10)

print('first derivative size')
print(dneuralNet(params,x,y):size())

--the outer function computes the sum of the gradient of the neural network. Therefore, differentiating this returns the diagonal of the Hessian
local outerFn = function(params,x,y)
local grad = dneuralNet(params,x,y)
local sum = torch.sum(grad)
return sum
end

print('outer function value')
print(outerFn(params,x,y))

local ddf = grad(outerFn)
local gradGrads = ddf(params,x,y)

print('second derivative')
print(gradGrads)

Docs or example for writing gradfuns

I have a workaround for specific application of #50 that involves using torch.eq(A, B), but returning the same type as A or B. I was able to hack this into my system by putting this function into the torch module, and adding an entry in autograd.lua to define gradients (zero everywhere).

I'm sure there's a much cleaner way to do this, without making changes to autograd itself, but this was the easiest way I could discern. A short example would be enough, I think.

support to torch.FloatTensor.view

Hi,

Thanks for this really interesting module!

I'd like to try a contribution to torch-autograd. torch.FloatTensor.view isn't supported yet but I believe it can be pretty useful.
My first goal was to calculate a set of pairwise distances between the rows of two matrices. In python that would be

sq = ((X[:, None, :] - Z)**2).sum(-1)

I was trying the same in Torch with following code:

local x = X:view(X:size(1), 1, X:size(2))
 x = x:expand(X:size(1), X:size(2), X:size(2))
local z = Z:view(Z:size(1), Z:size(2), 1)
z = z:expandAs(x)
local diffs = torch.sum(x - z, 3)
local sq = torch.cmul(diffs, diffs)

But autograd doesn't support torch.view yet. I believe that view only reshapes the gradient, without needing any extra operations; the same should apply to expand.

So, my question is, could guys either tell me how I can contribute with support to view or suggest me another way to get the sq matrix above?

Incorrect gradfun for torch.RepeatTensor

I cannot submit a pull request right now, but there is an issue using torch.RepeatTensor with non-double tensors.

Within the gradient function for torch.RepeatTensor in gradfuns.lua, torch.cat is called but this is incorrect for non-double tensors.

The correct module should call util.cat. The new module would be:

 module.gradient("repeatTensor", {
      function(g, ans, x, ...)
         local Dg = torch.nDimension(g)
         local Dx = torch.nDimension(x)
         for i=Dx,1,-1 do
            local D = torch.nDimension(g)
            local c = util.cat(torch.split(g,torch.size(x,i), Dg-Dx+i), D+1)
            g = torch.squeeze(torch.sum(c,D+1))
         end
         for i=1,Dg-Dx do
            g = torch.squeeze(torch.sum(g,1))
         end
         return g
      end,
      function(g, ans, ...) return nil end,
      function(g, ans, ...) return nil end,
      function(g, ans, ...) return nil end,
      function(g, ans, ...) return nil end,
      function(g, ans, ...) return nil end, -- five dimensions should be enough
   })

To expose the error, you can augment the RepeatTensor test to include a cuda tensor as input:

   RepeatTensor = function()
      local function f2to2(params)
         local y = torch.repeatTensor(params.x, 2, 2)*3
         return torch.sum(y)
      end
      tester:assert(gradcheck(f2to2, {x=torch.randn(3,3)}), "Incorrect gradient")

      local function f3to3(params)
         local y = torch.repeatTensor(params.x, 2, 2, 2)*3
         return torch.sum(y)
      end
      tester:assert(gradcheck(f3to3, {x=torch.randn(3,3,3)}), "Incorrect gradient")

      local function f2to3(params)
         local y = torch.repeatTensor(params.x, 2, 2, 2)*3
         return torch.sum(y)
      end
      tester:assert(gradcheck(f2to3, {x=torch.randn(3,3)}), "Incorrect gradient")

      local function f3to4(params)
         local y = torch.repeatTensor(params.x, 2, 2, 2, 2)*3
         return torch.sum(y)
      end
      tester:assert(gradcheck(f3to4, {x=torch.randn(3,3,3)}), "Incorrect gradient")


      local function f4to5(params)
         local y = torch.repeatTensor(params.x, 2, 2, 2, 2)*3
         return torch.sum(y)
      end

      tester:assert(autograd(f4to5)({x=torch.randn(3,3,3):cuda()}))


      -- tester:assert(gradcheck(f3to4, {x=torch.randn(3,3,3):float()}), "Incorrect gradient")
   end,

Elementwise assignments of intermediate tensors (based on parameter-dependent values)

I would like to calculate my scalar result based on intermediate tensors that are filled with parameter-dependent values. An example would be to generate a rotation matrix from axis and angle parameters while being interested in the gradient w.r.t. axis and angle...

Unfortunately, even the simplest things I try fail as soon as my input parameters are 'passed' through intermediate tensors. It seems as if autograd treats my intermediate tensors simply as constants which do not depend on the parameters, even though all of their elements are assigned from computations that are functions of the parameters...

Trivial example:

function f(W)
  local x = torch.zeros(1)
  x[1] = torch.sin(W[1])
  return torch.sum(torch.abs(x))
end
df = grad(f)
dparams, loss = df(torch.ones(1))

Fails with A node type was not returned. This is either because a gradient was not defined, or the input is independent of the output...

Is this related to #3 and __newindex not being implemented yet, or is this deliberately not supported?

error when importing the library after installing it

I am getting this error when calling require 'autograd' after installing it.

torch/install/share/lua/5.1/trepl/init.lua:363: torch/install/share/lua/5.1/trepl/init.lua:363: ...nsim/torch/install/share/lua/5.1/autograd/optim/init.lua:27: bad argument #1 to 'pairs' (table expected, got boolean)
stack traceback:
    [C]: in function 'error'
    torch/install/share/lua/5.1/trepl/init.lua:363: in function 'require'
    [string "_RESULT={require 'autograd'}"]:1: in main chunk
    [C]: in function 'xpcall'
    torch/install/share/lua/5.1/trepl/init.lua:630: in function 'repl'
    ...nsim/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
    [C]: at 0x00406240

Does anyone have any idea what causes the problem?

Unpooling with torch.autograd

I'm max pooling and saving the indices of max values. I'd like to use these indices when upsampling later in the network. However, the unpooling layer doesn't have any parameters, and whenever I run the program I get this error:

"A node type was not returned. This is either because a gradient was not defined, or the input is independent of the output"

Could you suggest where exactly I'd need to make changes? Here's the code

-- Define a standard nn model:
require 'nn'
t = require 'torch'
grad = require 'autograd'

local model = nn.Sequential()
model:add(nn.SpatialConvolutionMM(1, 1, 3, 3, 1, 1, 1, 1))
local pool1 = nn.SpatialMaxPooling(2,2)
local idx = pool1.indices
model:add(pool1)

-- Note that this model could have been pre-trained, and reloaded from disk.

-- Functionalize the model:
local modelf, params = grad.functionalize(model)

-- The model can now be used as part of a regular autograd function:
local loss = grad.nn.MSECriterion()

function Unpool(h1,idx)

    local pred = t.Tensor(1,4,4):zero()

    D,H,W = h1:size(1),h1:size(2),h1:size(3)

    for d=1,D do
        for h=1,H do
            for w=1,W do
                pred[d][h*2][w*2] = h1[d][h][w]
        ---     idx_h = math.floor(idx[d][h][w]/4)+1
        ---     idx_w = math.fmod(idx[d][h][w],4)
        ---     pred[d][idx_h][idx_w] = h[d][h][w]
            end
        end
    end 

    return pred
end

neuralNet = function(params, x, y)
   local h1 = modelf(params, x)
   local pred= Unpool(h1,idx)
   return loss(pred, y)
end

-- gradients:
dneuralNet = grad(neuralNet)

-- some data:
x = t.randn(1,4,4)
y = t.Tensor(1,4,4):zero() 

-- compute loss and gradients wrt all parameters in params:
dparams, loss = dneuralNet(params,x, y)

print(loss)
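
Not a definitive fix, but one direction that may be worth trying (an assumption on my part, not something confirmed for this code path): let nn perform the unpooling and wrap that module like the others, instead of writing into pred element by element, which runs into the same new-index limitation discussed in the issue above. A rough sketch:

-- hypothetical: wrap nn.SpatialMaxUnpooling, which reads pool1's indices at runtime
local unpool = grad.nn.SpatialMaxUnpooling(pool1)

neuralNet = function(params, x, y)
   local h1 = modelf(params, x)
   local pred = unpool(h1)
   return loss(pred, y)
end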


wrapping plain numbers (not tensors)

This is more of a usage question than an issue: is any special handling needed for constant scaling factors? Specifically, will autograd properly handle constants which are (1) upvalues and (2) passed arguments (separate from parameters)? For instance:

local loss = lossf(yhat,y) * a1 + lossf(zhat,z) * a2

where a1 and a2 are numbers.
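
Not an authoritative answer, but here is a minimal sketch of the two cases being asked about (the parameter names Wy, Wz and the factors a1, a2 are made up for illustration). My understanding is that plain Lua numbers are treated as constants, so autograd does not try to produce gradients for them; whether the scalar multiplication itself is always handled cleanly is a separate question (see the L2 regularization issue further down).

local autograd = require 'autograd'

local a1 = 0.7                                    -- (1) constant captured as an upvalue

-- (2) a2 arrives as a plain-number argument, separate from params
local f = function(params, x, y, z, a2)
   local yhat = x * params.Wy
   local zhat = x * params.Wz
   local ly = torch.sum(torch.pow(yhat - y, 2))
   local lz = torch.sum(torch.pow(zhat - z, 2))
   return ly * a1 + lz * a2
end

local df = autograd(f)
local params = { Wy = torch.randn(4, 3), Wz = torch.randn(4, 3) }
print(df(params, torch.randn(1, 4), torch.randn(1, 3), torch.randn(1, 3), 0.3))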

__mul function error

Hi,

I have the expression below inside a function passed to autograd(...):

-- x is a table of Nodes, { Node, Node, ... }
local xt = torch.view(x[1], 1, inputFeatures)

-- line 77
dots = torch.cat(xt,hp,2) * p.W + torch.expand(p.b, 1, 2*hiddenFeatures)

and I keep getting:

/home/yongfei/torch/install/bin/luajit: bad argument #2 to '?' (number expected, got table)
stack traceback:
    [C]: at 0x7f3b08895ac0
    [C]: in function '__mul'
    /home/yongfei/codes/dmn-1/model/SimpleGRU.lua:77: in function 'episodeGRU'
    /home/yongfei/codes/dmn-1/model/DMN.lua:184: in function 'dmn'
    train.lua:112: in function 'fun'
    ...rch/install/share/lua/5.1/autograd/direct/DirectTape.lua:19: in function 'funOnly'
    ...rch/install/share/lua/5.1/autograd/direct/DirectTape.lua:114: in function 'dDmn'
    train.lua:122: in main chunk
    [C]: in function 'dofile'
    ...gfei/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:133: in main chunk
    [C]: at 0x00406670

Here are the values in the expression:

xt.value ~= nil 
> false 
xt:type()   
> torch.DoubleTensor    

hp.value ~= nil 
> false 
hp:type()   
> torch.DoubleTensor    

p.W.value ~= nil    
> true  
p.W.value:type()    
> torch.DoubleTensor    

(torch.cat(xt,hp,2).value) ~= nil   
> false 
(torch.cat(xt,hp,2)):type() 
> torch.DoubleTensor    

functionalize on modules without parameters

It looks like autograd.functionalize assumes the passed module has a :parameters() method, which is not true for criterions, for instance. The nnwrapper correctly handles this case for packages, however.
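
A possible interim workaround (an assumption based on usage shown elsewhere in these issues, not a confirmed fix for functionalize itself) is to wrap criterions through the nn wrapper instead:

-- hypothetical sketch: criterions via autograd.nn rather than autograd.functionalize
local autograd = require 'autograd'
local mse = autograd.nn.MSECriterion()   -- no :parameters() call involved on this path

local f = function(params, x, y)
   local yhat = x * params.W
   return mse(yhat, y)
end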

Error when loss contains scalar multiplication (eg. L2 regularization)

I ran across an issue trying to implement a sparsity-inducing term in the cost function, and in trying to put together a minimal repro case, it turned out that the example train-mnist-autoencoder.lua fails with the following output:

Training Epoch #1   
/home/alex/torch/install/bin/luajit: invalid arguments: number 
expected arguments: DoubleTensor | [*DoubleTensor*] DoubleTensor index
stack traceback:
    [C]: at 0x7f6b925e2940
    [C]: in function 'gf'
    ...all/share/lua/5.1/autograd/runtime/direct/DirectTape.lua:58: in function 'gradOnly'
    ...all/share/lua/5.1/autograd/runtime/direct/DirectTape.lua:124: in function 'df'
    train-mnist-autoencoder.lua:90: in main chunk
    [C]: in function 'dofile'
    ...alex/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x00406670

Note: to get this to run, I had to remove the line
local Value = require 'autograd.Value'
and change the following:

--for i=1,Value.len(params.W) do
for i=1,#params.W do

Hopefully these modifications don't break the example.

There seems to be an issue around the scalar multiplication of the summed square weights with the l2Lambda parameter. If the l2Lambda term is removed from the loss calculation, but the sum of squares of weights is left in, then the above error does not occur:

--loss = loss + l2Lambda * torch.sum(torch.pow(params.W[i],2))
  loss = loss + torch.sum(torch.pow(params.W[i],2))

The same behaviour happens both with and without optimization enabled. If I set a debugHook, it is never triggered (presumably because this issue doesn't involve nan or inf). I've also tried rebuilding torch with Lua 5.1 instead of LuaJIT, and a similar error occurs.
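
For what it's worth, here is a minimal sketch of the pattern that seems to trigger this, my assumption being that the plain-number scalar multiplied by a summed tensor is the culprit, as described above. This is an attempted repro, not a diagnosis:

local autograd = require 'autograd'
local l2Lambda = 1e-3

local f = function(params)
   local loss = torch.sum(torch.abs(params.W))
   -- the scalar * summed-squares term suspected above
   loss = loss + l2Lambda * torch.sum(torch.pow(params.W, 2))
   return loss
end

local df = autograd(f)
print(df({ W = torch.randn(5, 5) }))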

error using grad.nn.SpatialBatchNormalization

I get this error:

luajit: ...h/install/share/lua/5.1/nn/SpatialBatchNormalization.lua:119: attempt to index field 'weight' (a nil value)
stack traceback:
    ...h/install/share/lua/5.1/nn/SpatialBatchNormalization.lua:119: in function 'normalizer'
    test_BN.lua:12: in function 'testbn'
    test_BN.lua:16: in main chunk
    [C]: in function 'dofile'
    ...ouis/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x01011c2bd0

From this code:

local grad = require 'autograd'

local im = torch.FloatTensor(1, 1, 10, 10):normal()
local W = torch.FloatTensor(1, 9):normal()
local b = torch.FloatTensor(1):zero()

local conv = grad.nn.SpatialConvolutionMM(1, 1, 3, 3)
local normalizer = grad.nn.SpatialBatchNormalization(1)

local testbn = function(params, input)
   local c = conv(input, params.W, params.B)
   local n = normalizer(c)
   return torch.sum(n)
end

print(testbn({W=W, B=b}, im))
df = grad(testbn)
testbn({W=W, B=b}, im)
grads, val = df({W=W, B=b}, im)
print(grads.W)
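
In case it helps narrow this down: in the snippet above the convolution wrapper is called with its weight and bias passed in explicitly (conv(input, params.W, params.B)), while the batch-normalization wrapper is called with the input only. One guess (an assumption, not a verified fix) is that the wrapped SpatialBatchNormalization also expects its own weight and bias to be passed the same way:

-- hypothetical sketch, assuming wrapped nn modules take (input, weight, bias)
local bnW = torch.FloatTensor(1):fill(1)   -- scale (gamma)
local bnB = torch.FloatTensor(1):zero()    -- shift (beta)

local testbn = function(params, input)
   local c = conv(input, params.W, params.B)
   local n = normalizer(c, params.bnW, params.bnB)
   return torch.sum(n)
end

grads, val = grad(testbn)({ W = W, B = b, bnW = bnW, bnB = bnB }, im)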

Error when using require autograd

Hi, I've installed torch-autograd as described on the GitHub page. After installation, whenever I try to run an example code, it throws the following error:

/usr/local/share/lua/5.1/autograd/gradfuns.lua:290: table index is nil
stack traceback:
    [C]: in function 'error'
    /home/monarch/torch/install/share/lua/5.1/trepl/init.lua:363: in function 'require'
    [string "_RESULT={require 'autograd'}"]:1: in main chunk
    [C]: in function 'xpcall'
    /home/monarch/torch/install/share/lua/5.1/trepl/init.lua:630: in function 'repl'
    ...arch/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
    [C]: at 0x00406670

Then I noticed that the same error pops up when I simply require 'autograd'. Any help would be appreciated. Thanks in advance.

Optimized_tests failing, Direct_tests very slow

Is there a stable version of this package yet? Currently, if running against HEAD, 24/35 of the Optimized_* tests fail with an error. The Direct tests pass for me, but some are very slow (>30s per test):

  • Direct_NNFunc_CNN
  • Direct_Models_SpatialNetwork

gradCheck test

Hi,

I have a model which uses a standard Torch nn network with an autograd custom criterion. The gradCheck call returns true, but I think I may be using it incorrectly.

The following file contains the test:
https://github.com/ronanmoynihan/autograd-criterion/blob/master/gradCheck.lua

If I comment out the following lines (27-30) then the gradCheck fails, even though the model is not referenced anywhere else in the code.

local model = nn.Sequential()
model:add(nn.Linear(d, numhid1))
model:add(nn.Sigmoid())
model:add(nn.Linear(numhid1, n_outputs))

issue using functionalize

Below is a self-contained bit of code that I think should work, but it crashes. Basically, I took the example for autograd.functionalize in the README, changed it to a simpler architecture, and tried to actually evaluate the function on some data. Any thoughts as to what I'm doing wrong?

Here's the error:

torch/install/share/lua/5.1/nn/Threshold.lua:20: bad argument #1 (field threshold does not exist)
stack traceback:
[C]: in function 'Threshold_updateOutput'
...rs/belanger/torch/install/share/lua/5.1/nn/Threshold.lua:20: in function 'updateOutput'

Here's the code:

require 'nn'
local autograd = require 'autograd'

local model = nn.Sequential():add(nn.Linear(5,10)):add(nn.ReLU())
local testdata = torch.Tensor(32,5)

local modelf, params = autograd.functionalize(model)

local x = modelf(params,testdata)
print(x)

Batch matrix multiplication

I'm using nn.MM for batch matrix multiplication, but I got an error in torch-autograd. Does torch-autograd currently support 3D tensor multiplication?

Here's the example code I ran:

local t = require 'torch'
local d = require 'autograd'

local mm = d.nn.MM(false, true)
-- toy examples
params = {}
params.x = torch.randn(2,1,5)
params.y = torch.randn(2,3,5)

test = function(params)
    local w = mm({params.x, params.y})
    return t.sum(w)
end

dt = d(test)
dparams, loss = dt(params)
print(loss)

And I got the following error:

/torch/install/share/lua/5.1/autograd/nnwrapper.lua:110: attempt to call method 'type' (a nil value)
stack traceback:
    ...etran/torch/install/share/lua/5.1/autograd/nnwrapper.lua:110: in function 'fun'
    ...rch/install/share/lua/5.1/autograd/direct/DirectNode.lua:73: in function 'mm'
    toy.lua:11: in function 'fun'
    ...rch/install/share/lua/5.1/autograd/direct/DirectTape.lua:18: in function 'funOnly'
    ...rch/install/share/lua/5.1/autograd/direct/DirectTape.lua:100: in function 'dt'
    toy.lua:16: in main chunk
    [C]: in function 'dofile'
    ...tran/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
    [C]: at 0x010ec1b2d0

Latest commit increased memory requirements in both modes (direct and cached)

Just a heads up: it looks like one of the latest commits blew up memory requirements, probably cd6b9e0. The error is below; it's asking for more than 12 GB, which might indicate a bug. The same trainable function was working in direct mode prior to the new commit, but I did not test it in cached mode.

/usr/local/bin/luajit: ...re/lua/5.1/autograd/runtime/codegen/backend/lua/init.lua:262: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-8457/cutorch/lib/THC/THCStorage.cu:44
stack traceback:
        [C]: in function 'new'
        ...re/lua/5.1/autograd/runtime/codegen/backend/lua/init.lua:262: in function 'createSymbolTable'
        ...re/lua/5.1/autograd/runtime/codegen/backend/lua/init.lua:413: in function 'generateCode'
        ...re/lua/5.1/autograd/runtime/codegen/backend/lua/init.lua:574: in function 'generateFn'
        /usr/local/share/lua/5.1/autograd/runtime/codegen/init.lua:66: in function 'd_train_net'

Tapes of tapes

We need to add functionality for 2nd- and higher-order derivatives, e.g. for Hessian-vector products.
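
For context, a hypothetical sketch of the kind of usage this would enable, here a Hessian-vector product obtained by differentiating through the gradient (the nested call pattern is an assumption, not a documented API):

local autograd = require 'autograd'

local f = function(x) return torch.sum(torch.cmul(x, x)) end    -- f(x) = sum(x_i^2)
local df = autograd(f, { optimize = true })

-- g(x, v) = <df/dx, v>; its gradient w.r.t. x is the Hessian-vector product H*v
local g = function(x, v)
   local dx = df(x)
   return torch.sum(torch.cmul(dx, v))
end
local dg = autograd(g, { optimize = true })

local x, v = torch.randn(3), torch.randn(3)
print(dg(x, v))    -- for this f, H = 2*I, so the gradient w.r.t. x should equal 2*v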

Proper use of optim with autograd

Let's say I have the following simple self-contained model and dataset:

local autograd = require 'autograd'
torch.manualSeed(123)

local params = {
   W = {
      torch.randn(10, 1)
   },
   b = {
      torch.randn(1)
   }
}

-- simple model
local f = function(params, x, y)
   local h1 = torch.tanh(x * params.W[1] + params.b[1])
   return torch.sqrt(torch.sum(torch.pow(y - h1, 2)))
end

local df = autograd(f)

-- some easy data
local nData = 5000
local xs = torch.randn(nData, 10)
local ys = torch.Tensor(nData, 1)
for i=1, nData do ys[i][1] = math.tanh(xs[i]:sum()) end

I can train it easily with:

local learningRate = 1e-3
for e=1, 10 do
   local loss = 0
   for i=1,nData  do
      local grads, l = df(params, xs:narrow(1, i, 1), ys:narrow(1, i, 1))
      loss = loss + l
      params.W[1]:add(-learningRate, grads.W[1])
      params.b[1]:add(-learningRate, grads.b[1])
   end
   print('epoch #' .. e .. ', loss = ' .. loss / nData)
end

But what if I need an optimisation algorithm other than SGD, such as Adam or Adadelta?
The optim package makes this very easy in Torch, but I am struggling to use it with autograd.

The following code does run, but it does not get close to the results of the training loop above and takes much more time (certainly because I flatten the grads every time).

require 'nn'
require 'optim'
local flattenParams = nn.Module.flatten({params.W[1], params.b[1]})
local state = { learningRate = 1e-3 }
for e=1, 10 do
   local loss = 0
   for i=1,nData do
      local feval = function(x)
         local grads, l = df(params, xs:narrow(1, i, 1), ys:narrow(1, i, 1))
         return l, nn.Module.flatten({grads.W[1], grads.b[1]}) -- ugly
      end
      local _, l = optim.sgd(feval, flattenParams, state)
      loss = loss + l[1]
   end
   print('epoch #' .. e .. ', loss = ' .. loss / nData)
end

What would be a good approach to using optim with autograd?

Thank you for your help.
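
Not an authoritative answer, but one pattern that might help (a sketch under the assumption that nn.Module.flatten re-points the tensors it is given into one contiguous storage, as getParameters does; it reuses params, df, xs, ys and nData from the snippets above): flatten the parameters once up front, so the tensors inside params become views of the flat vector that optim updates, and copy each freshly allocated grads table into a reusable flat gradient buffer inside feval.

require 'nn'
require 'optim'

-- flatten once: params.W[1] and params.b[1] now view into flatParams
local flatParams = nn.Module.flatten({ params.W[1], params.b[1] })
local flatGrads  = flatParams:clone():zero()

local state = { learningRate = 1e-3 }
for e = 1, 10 do
   local loss = 0
   for i = 1, nData do
      local feval = function(_)
         local grads, l = df(params, xs:narrow(1, i, 1), ys:narrow(1, i, 1))
         -- copy autograd's grads into the flat buffer optim expects
         local offset = 1
         for _, g in ipairs({ grads.W[1], grads.b[1] }) do
            flatGrads:narrow(1, offset, g:nElement()):copy(g)
            offset = offset + g:nElement()
         end
         return l, flatGrads
      end
      local _, fs = optim.adam(feval, flatParams, state)
      loss = loss + fs[1]
   end
   print('epoch #' .. e .. ', loss = ' .. loss / nData)
end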

Error on Require autograd

I'm getting this error when trying to use autograd. I used 'luarocks install autograd' to install it.

So far I've tried:

  • reinstalling with luarocks remove autograd
  • luarocks install optim
th> grad = require 'autograd'
/home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:363: /home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:363: /home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:363: ...buntu/torch/install/share/lua/5.1/autograd/nnwrapper.lua:297: bad argument #1 to 'pairs' (table expected, got boolean)
stack traceback:
        [C]: in function 'error'
        /home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:363: in function 'require'
        [string "grad = require 'autograd'"]:1: in main chunk
        [C]: in function 'xpcall'
        /home/ubuntu/torch/install/share/lua/5.1/trepl/init.lua:648: in function 'repl'
        ...untu/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
        [C]: at 0x00406670

Link: nnwrapper.lua:297
