I'm trying to use the implementation of the adam optimizer available in the ...

config for Adam about vae-torch HOT 8 CLOSED

y0ast commented on July 21, 2024

config for Adam

from vae-torch.

Comments (8)

AjayTalati commented on July 21, 2024

Update - the adam optimizer was fixed yesterday - it works now, with default parameters on the Rosenbrock test problem.

I'm testing it now to see if it gives an improvement.

y0ast commented on July 21, 2024

Great! Curious to see your result.

AjayTalati commented on July 21, 2024

Hi Joost,

Unfortunately I haven't managed to get any convergence yet with adam, over a range of different config parameters.

Looking at figure 4 a) of the adam paper, it seems that

beta_1 = 0.1
beta_2 = 0.0001
alpha = 0.002
epsilon = 1e-8
lambda = 1 - 1e-8

with the model they state,

dim_hidden = 50
hidden_units_encoder = 500
hidden_units_decoder = 500

should get convergence after 10 epochs, but I can't reproduce their results.

y0ast commented on July 21, 2024

Clearly the result is much better after 100 epochs (figure 4 b), so those figures do not indicate convergence. They show the value of the negLL after a set number of epochs for different learning rates (x-axis) and illustrate the necessity of the bias-correction factor.

I am not sure exactly how many epochs are necessary with Adam, I never tested that.

AjayTalati commented on July 21, 2024

Hi, yes, sorry, I was sloppy with my language, apologies.

Basically I've tried all the grid points they mention in their paper, i.e. the bias correction terms beta_1 and beta_2 and the learning rate, but the negLL after 10 epochs still comes out at about

-4e+155

i.e. basically -infinity. Maybe you want to give it a try? If you pull the latest optim module

luarocks install optim

I think it's then just a matter of

i) changing your while loop to a for loop, running for 10 epochs, and writing the last negLL to a table

ii) constructing a small grid and iterating your above code over the grid points.
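
The two steps above can be sketched roughly as follows. This is only an illustration in plain Python, not the vae-torch code: `run_adam` is a hypothetical stand-in for one training run (here it minimizes a toy quadratic instead of the negLL), the grid values are placeholders rather than the ones from the paper, and the Adam update uses the now-common convention m = beta1 * m + (1 - beta1) * g, which may differ from the convention behind the numbers quoted earlier in this thread.

```python
import itertools
import math


def run_adam(alpha, beta1, beta2, epochs=10, steps_per_epoch=50, eps=1e-8):
    """Hypothetical stand-in for one training run; returns the last loss.

    Minimizes the toy loss f(w) = (w - 3)^2 with Adam, starting at w = 0.
    """
    w, m, v, t = 0.0, 0.0, 0.0, 0
    loss = (w - 3.0) ** 2
    for _ in range(epochs):                  # fixed epoch budget (for loop, not while)
        for _ in range(steps_per_epoch):
            t += 1
            g = 2.0 * (w - 3.0)              # gradient of the toy loss
            m = beta1 * m + (1.0 - beta1) * g
            v = beta2 * v + (1.0 - beta2) * g * g
            m_hat = m / (1.0 - beta1 ** t)   # bias-corrected first moment
            v_hat = v / (1.0 - beta2 ** t)   # bias-corrected second moment
            w -= alpha * m_hat / (math.sqrt(v_hat) + eps)
        loss = (w - 3.0) ** 2                # keep only the most recent epoch's loss
    return loss


# Placeholder grid; substitute the values you actually want to test.
grid = {
    "alpha": [1e-3, 1e-2, 1e-1],
    "beta1": [0.0, 0.9],
    "beta2": [0.99, 0.999],
}

# Table of "last loss after 10 epochs", keyed by grid point.
results = {}
for alpha, beta1, beta2 in itertools.product(
    grid["alpha"], grid["beta1"], grid["beta2"]
):
    results[(alpha, beta1, beta2)] = run_adam(alpha, beta1, beta2)

# Best finite result first; runs that blew up (inf/nan) are pushed to the end.
ranked = sorted(
    results.items(),
    key=lambda kv: kv[1] if math.isfinite(kv[1]) else math.inf,
)
for params, loss in ranked:
    print(params, loss)
```

If every grid point ends up non-finite or absurdly large, as reported above, that points at the objective or the optimizer rather than at the hyper-parameters.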

If you really want to be fancy, there's a Bayesian optimization package on git called spearmint, which uses a fancy iterative Gaussian process scheme for continuous hyper-parameter optimization. It's coded in Python. I'm trying it now.

I'm guessing if we both can't get it to work, it might be a problem with the adam.lua code?

AjayTalati commented on July 21, 2024

I'm not sure the adam.lua code in the optim package works.

Maybe it's best to try to use Diederik Kingma's adam implementation here,

https://github.com/dpkingma/nips14-ssl/blob/master/adam.py

with your theano implementation?

y0ast commented on July 21, 2024

I have tried adam before (in Theano) and it works nicely, so this does indeed sound like a bug, either in my code or in the adam implementation.

I will take a look at it tomorrow.

y0ast commented on July 21, 2024

Fixed by setting a negative learning rate (gradient ascent).
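
A minimal sketch of why the sign matters, assuming the training code hands the optimizer the gradient of a quantity to be maximized while the optimizer itself always steps against the gradient. It is plain Python with a toy concave objective, not the actual vae-torch or optim code; `adam_step`, `lower_bound` and `optimize` are hypothetical names, and the update uses the common beta convention.

```python
import math


def adam_step(w, g, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a scalar parameter; always computes w - lr * direction."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * g
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * g * g
    m_hat = state["m"] / (1.0 - beta1 ** state["t"])
    v_hat = state["v"] / (1.0 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def lower_bound(w):
    """Toy concave objective standing in for a bound to maximize; peak at w = 2."""
    return -(w - 2.0) ** 2


def lower_bound_grad(w):
    """Gradient of the toy objective with respect to w."""
    return -2.0 * (w - 2.0)


def optimize(lr, steps=200):
    """Run Adam from w = 0 using the gradient of the objective to be maximized."""
    w = 0.0
    state = {"t": 0, "m": 0.0, "v": 0.0}
    for _ in range(steps):
        w = adam_step(w, lower_bound_grad(w), state, lr)
    return w


w_descent = optimize(lr=0.05)    # steps against the gradient: drifts away from the peak
w_ascent = optimize(lr=-0.05)    # steps along the gradient: climbs toward the peak
print(lower_bound(w_descent), lower_bound(w_ascent))
```

An equivalent alternative is to keep the learning rate positive and return the negated objective and negated gradient from the evaluation function, so that the optimizer minimizes the negative of the bound.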
