openai / iaf
Code for reproducing key results in the paper "Improving Variational Inference with Inverse Autoregressive Flow"
Home Page: https://arxiv.org/abs/1606.04934
License: MIT License
Using gpu device 0: GeForce GTX 980 (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5105)
/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
warnings.warn(warn)
[graphy] floatX = float32
Logpath: /media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/log
WARNING - Deep VAE - No observers have been added to this run
INFO - Deep VAE - Running command 'train'
INFO - Deep VAE - Started
CVAE1 with {'depths': [2, 2, 2], 'nl': u'elu', 'n_h2': 64, 'n_z': 32, 'shape_x': [3, 32, 32], 'optim': u'adamax', 'weightsharing': False, 'px': u'logistic', 'kernel_x': [5, 5], 'n_h1': 64, 'prior': u'diag', 'posterior': u'down_iaf2_nl', 'pad_x': 0, 'beta2': 0.001, 'beta1': 0.1, 'depth_ar': 1, 'alpha': 0.002, 'kl_min': 0.25, 'downsample_type': u'nn', 'kernel_h': [3, 3]}
ERROR - Deep VAE - Failed after 0:00:01!
Traceback (most recent calls WITHOUT Sacred internals):
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/train.py", line 185, in train
model = construct_model(data_init)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/train.py", line 128, in construct_model
model = models.cvae1(**margs)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/models.py", line 543, in cvae1
f_encode_decode(w)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/models.py", line 446, in f_encode_decode
h = layers[i][j].up(h, w)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/models.py", line 144, in up
qz[0] = N.rand.gaussian_diag(qz_mean, 2*qz_logsd)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/graphy/nodes/rand.py", line 81, in gaussian_diag
eps = G.rng_curand.normal(size=mean.shape)
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/sandbox/cuda/rng_curand.py", line 368, in normal
self.next_seed())
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/sandbox/cuda/rng_curand.py", line 108, in new_auto_update
o_gen, sample = self(generator, cast(v_size, 'int32'))
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/op.py", line 668, in call
required = thunk()
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/op.py", line 883, in rval
fill_storage()
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/cc.py", line 1707, in call
reraise(exc_type, exc_value, exc_trace)
File "", line 2, in reraise
RuntimeError: curand error generating random normals 102
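For what it's worth, CURAND status 102 is CURAND_STATUS_ALLOCATION_FAILED, i.e. CURAND could not allocate device memory, which is plausible here given that CNMeM pre-reserves 80% of the card. A minimal sketch to check whether the sampler fails outside the model (run with the same THEANO_FLAGS/device as the failing job; this assumes the old theano.sandbox.cuda backend that produced the trace above):

```python
# Minimal repro sketch for the failing op, independent of the IAF model.
# If this alone raises "curand error ... 102", the problem is in the
# CURAND/memory setup (e.g. CNMeM's reservation), not in the model code.
import theano
from theano.sandbox.cuda.rng_curand import CURAND_RandomStreams

rng = CURAND_RandomStreams(seed=42)
sample = rng.normal(size=(4, 4))
f = theano.function([], sample)
print(f())
```

If that is the cause, lowering CNMeM's initial size in .theanorc (e.g. cnmem = 0.4) may leave enough free memory for CURAND's own allocations.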
Hi,
I am running the tf_train.py and tf_utils code out of the box. Our TensorFlow version is 1.3.0 and the GPU is a GeForce GTX TITAN X. The conv2d function in tf_utils/layers.py runs very slowly. Specifically, the following two lines in conv2d take a long time:
_ = tf.get_variable("g", initializer=tf.log(scale_init) / 3.0)
_ = tf.get_variable("b", initializer=-m_init * scale_init)
I think due to lazy evaluation, what is actually taking time is this line:
m_init, v_init = tf.nn.moments(x_init, [0, 2, 3])
as both m_init and scale_init depend on the moments.
While conv2d is running, nvidia-smi shows 'No running processes found' and 0% under 'GPU-Util Compute M.'. CPU utilization is ~95%, so it isn't exploiting the multi-core CPU architecture either. I wonder how I can speed it up.
Thank you!
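For context, the cost is inherent to the data-dependent initialization used with weight normalization (Salimans & Kingma, 2016) rather than to those two lines specifically. A minimal sketch of the pattern, with illustrative names and shapes, not the exact tf_utils/layers.py code:

```python
import tensorflow as tf

# Illustrative stand-in for the layer's input during the init pass
# (NCHW, matching the moments axes [0, 2, 3] above).
x_init = tf.random_normal([16, 64, 32, 32])

# Per-channel statistics of the initial activations...
m_init, v_init = tf.nn.moments(x_init, [0, 2, 3])
scale_init = 1.0 / tf.sqrt(v_init + 1e-10)

# ...baked into the variable initializers. Because these initializers are
# graph tensors that depend on x_init, initializing "g" and "b" re-evaluates
# the whole subgraph feeding x_init; in a deep model that, not the two
# get_variable calls themselves, is where the time goes.
g = tf.get_variable("g", initializer=tf.log(scale_init) / 3.0)
b = tf.get_variable("b", initializer=-m_init * scale_init)
```

If this init pass runs on CPU despite a visible GPU, checking that the installed TensorFlow build is GPU-enabled (tf.test.is_gpu_available()) would be a first step.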
If I run the TensorFlow version of this code (tf_train.py) with #8 applied, I get a NaN within the first few iterations and training stops. If I remove that change, training proceeds fine. @pukkapies, were you ever able to get the model training appropriately with your changes applied? If so, what hyperparameter settings were you using?
flake8 testing of https://github.com/openai/iaf on Python 3.6.3
$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
./models.py:339:49: E999 SyntaxError: invalid syntax
print "TODO: SAMPLES FROM MADE PRIOR"
^
./train.py:5:29: E999 SyntaxError: invalid syntax
from __builtin__ import False
^
./graphy/__init__.py:15:26: E999 SyntaxError: invalid syntax
print '[graphy] floatX = '+floatX
^
./graphy/function.py:16:44: E999 SyntaxError: invalid syntax
print '*** NaN detected ***'
^
./graphy/ndict.py:157:19: E999 SyntaxError: invalid syntax
print d.keys()
^
./graphy/misc/data.py:49:33: E999 SyntaxError: invalid syntax
print "Full training set"
^
./graphy/misc/optim.py:10:15: E999 SyntaxError: invalid syntax
print 'SGD', 'alpha:',alpha
^
./graphy/nodes/__init__.py:72:68: E999 SyntaxError: invalid syntax
print 'WARNING: constant rescale, these weights arent saved'
^
./graphy/nodes/ar.py:61:68: E999 SyntaxError: invalid syntax
print 'WARNING: constant rescale, these weights arent saved'
^
./graphy/nodes/conv.py:87:115: E999 SyntaxError: invalid syntax
print 'new code, requires that the minibatch size "x.tag.test_value.shape[0]" is the same during execution'
^
10 E999 SyntaxError: invalid syntax
10
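All ten hits are Python 2 print statements, so the codebase simply predates Python 3. (The `from __builtin__ import False` flagged in train.py is likewise Python-2-only and cannot be ported directly, since False is a keyword in Python 3; it looks like an editor artifact and can presumably just be deleted.) The mechanical fix, sketched:

```python
# Make `print` a function on Python 2 as well (add at the top of each module):
from __future__ import print_function

# Each flagged statement then converts mechanically, e.g. models.py:339
#   print "TODO: SAMPLES FROM MADE PRIOR"
# becomes:
print("TODO: SAMPLES FROM MADE PRIOR")
```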
Hi, I was trying to test the project implementation, but I'm running into errors with the following command:
python train.py with problem=cifar10 n_z=32 n_h=64 depths=[2,2,2] margs.depth_ar=1 margs.posterior=down_iaf2_NL margs.kl_min=0.25
[graphy] floatX = float32
Traceback (most recent call last):
  File "train.py", line 1, in <module>
    import graphy as G
  File "/home/user/projects/python/theano/iaf/graphy/__init__.py", line 45, in <module>
    import misc.data
  File "/home/user/projects/python/theano/iaf/graphy/misc/data.py", line 6, in <module>
    basepath = os.environ['ML_DATA_PATH']
  File "/home/user/anaconda2/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'ML_DATA_PATH'
Any suggestions much appreciated!
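Per the traceback, graphy/misc/data.py reads the dataset root from the ML_DATA_PATH environment variable at import time, so it has to be set before graphy is imported (or exported in the shell before launching train.py). A minimal sketch, with an illustrative path:

```python
import os

# Hypothetical dataset root; graphy resolves its dataset paths under this
# directory. It is read at import time, so set it before `import graphy`.
os.environ['ML_DATA_PATH'] = '/home/user/ml_data'

import graphy as G
```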
On line 70, I think there is a small mistake; it should be:
int(input_shape[2] * strides[2]), int(input_shape[3] * strides[3])]
It's not a major bug, as it would throw an error if the output shape didn't work out correctly. I think it runs OK just because the input is square.
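To spell out the claim, here is a sketch of the corrected shape computation, under the assumption that this is the transposed-convolution output-shape code (the surrounding names are guesses, not the repo's exact code):

```python
# Hypothetical reconstruction of the surrounding code; NCHW layout assumed.
input_shape = [16, 64, 32, 48]   # N, C, H, W -- deliberately non-square
strides = [1, 1, 2, 2]
num_filters = 128                # illustrative

output_shape = [int(input_shape[0]), num_filters,
                int(input_shape[2] * strides[2]),   # H scaled by the H stride
                int(input_shape[3] * strides[3])]   # W scaled by the W stride,
                                                    # not by strides[2]
```

With a square input (H == W) and equal strides the two versions agree, which is why the bug stays hidden.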
I am executing tf_train.py (num_gpus=1). In the forward function, the two nested for loops run fine for i=0,j=0 and i=0,j=1, but memory usage keeps growing. At i=1,j=0 the loop calls sub_layer.up, which in turn calls conv2d; execution becomes very slow, and memory keeps increasing at line 49 of the conv2d function in layers.py. Could anyone please help me resolve this growing memory issue?
On lines 373 and 374 of models.py you have "if posterior_conv3 != None: modules.append(posterior_conv4)", which looks very much like a copy-paste bug given the surrounding context. I might be mistaken, since I have only started to look at the source, but I wanted to make you aware in case it is a bug. The context of these lines is below:
361: def postup(updates, w):
362- modules = [up_conv1,up_conv2,down_conv1,down_conv2]
363- if downsample and downsample_type == 'conv':
364- modules += [up_conv3,down_conv3]
365- if prior_conv1 != None:
366- modules.append(prior_conv1)
367- if posterior_conv1 != None:
368- modules.append(posterior_conv1)
369- if posterior_conv2 != None:
370- modules.append(posterior_conv2)
371- if posterior_conv3 != None:
372- modules.append(posterior_conv3)
373- if posterior_conv3 != None:
374- modules.append(posterior_conv4)
375- for m in modules:
376- updates = m.postup(updates, w)
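Presumably line 373 was meant to test posterior_conv4, following the pattern of the pairs above it. A sketch of the likely intended code (an inference from context, not a confirmed fix):

```python
if posterior_conv4 != None:      # matches the posterior_conv1..3 pattern above
    modules.append(posterior_conv4)
```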
After some fixes to the summary & split calls (they were refactored in TF 1.0), I still can't get this code to work:
(.venv) ➜ iaf git:(master) ✗ CIFAR10_PATH="./CIFAR10" optirun -b primus python tf_train.py --logdir ./logs --hpconfig depth=1,num_blocks=20,kl_min=0.1,learning_rate=0.002,batch_size=32 --num_gpus 1 --mode train
Traceback (most recent call last):
File "tf_train.py", line 397, in <module>
tf.app.run()
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "tf_train.py", line 391, in main
run(hps)
File "tf_train.py", line 237, in run
model = CVAE1(hps, "train", x)
File "tf_train.py", line 152, in __init__
self.train_op = opt.apply_gradients(grad, global_step=self.global_step)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
self._create_slots([_get_variable_for(v) for v in var_list])
File "/home/jramapuram/Dropbox/projects/iaf/tf_utils/adamax.py", line 37, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
dtype)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
validate_shape=validate_shape)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
"VarScope?" % name)
ValueError: Variable model/model/dec_log_stdv/Adamax/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
The same result occurs when utilizing opt = tf.train.AdamOptimizer(hps.learning_rate):
(.venv) ➜ iaf git:(master) ✗ CIFAR10_PATH="./CIFAR10" optirun -b primus python tf_train.py --logdir ./logs --hpconfig depth=1,num_blocks=20,kl_min=0.1,learning_rate=0.002,batch_size=32 --num_gpus 1 --mode train
Traceback (most recent call last):
File "tf_train.py", line 397, in <module>
tf.app.run()
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "tf_train.py", line 391, in main
run(hps)
File "tf_train.py", line 237, in run
model = CVAE1(hps, "train", x)
File "tf_train.py", line 152, in __init__
self.train_op = opt.apply_gradients(grad, global_step=self.global_step)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
self._create_slots([_get_variable_for(v) for v in var_list])
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 122, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
dtype)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
validate_shape=validate_shape)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
"VarScope?" % name)
ValueError: Variable model/model/dec_log_stdv/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
I tried setting reuse=None, to no avail. I'm probably missing something stupid here.
Here is my fork with the changes: https://github.com/jramapuram/iaf/tree/hotfix/tf1.0
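For anyone hitting the same thing: in TF 1.x this error usually means apply_gradients is being called while a variable scope with reuse=True is active, so the optimizer's fresh Adam/Adamax slot variables cannot be created with tf.get_variable (the doubled "model/model" prefix in the error also hints at nested scopes). A minimal sketch of the failure mode and the usual fix; this is not the iaf code itself:

```python
import tensorflow as tf

x = tf.random_normal([8, 4])
with tf.variable_scope("model") as scope:
    w = tf.get_variable("w", [4, 1])
    loss = tf.reduce_mean(tf.matmul(x, w))
    scope.reuse_variables()  # e.g. set after building a second tower

    # Applying gradients here, while reuse is active, reproduces the error:
    # the optimizer calls tf.get_variable() for the new "model/w/Adam" slot
    # and the reusing scope refuses to create it.

# Fix: build the train op outside any scope with reuse=True, so the
# optimizer is free to create its slot variables.
opt = tf.train.AdamOptimizer(0.001)
train_op = opt.apply_gradients(opt.compute_gradients(loss))
```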
Why do we use a constant variance for the generative network of the autoencoder, instead of learning it from the network like the mean? What advantage does this have over a learnable variance? This is done in models.py at lines 473 and 681:
mean_x = T.clip(output+.5, 0+1/512., 1-1/512.)  # decoded mean, clipped to stay strictly inside (0, 1)
logsd_x = 0*mean_x + w['logsd_x']  # w['logsd_x'] is a trainable parameter, broadcast to mean_x's shape
On line 93 of the above file, it looks like the normalisation is performed over the out_channels instead of the in_channels. I think it should instead be:
v_norm = tf.nn.l2_normalize(v, [0, 1, 3])
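For reference, the right axes depend on the filter layout, so here is a sketch of both cases with illustrative shapes (whether line 93 sits on the conv or the deconv path is for the authors to confirm). tf.nn.conv2d filters are [ksize, ksize, in_channels, out_channels], while tf.nn.conv2d_transpose filters are [ksize, ksize, out_channels, in_channels]; weight normalization should reduce over every axis except the out_channels axis of the layout in use:

```python
import tensorflow as tf

# conv2d filters:           [ksize, ksize, in_channels, out_channels]
v_conv = tf.get_variable("v_conv", [3, 3, 64, 128])
# conv2d_transpose filters: [ksize, ksize, out_channels, in_channels]
v_deconv = tf.get_variable("v_deconv", [3, 3, 128, 64])

# Normalize each output filter to unit L2 norm: reduce over every axis
# except the layout's out_channels axis.
v_conv_norm = tf.nn.l2_normalize(v_conv, [0, 1, 2])      # out_channels = axis 3
v_deconv_norm = tf.nn.l2_normalize(v_deconv, [0, 1, 3])  # out_channels = axis 2
```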
What command was used with the Lasagne implementation for the MNIST experiment in the paper? In particular, which option modifies the fully connected layer to have 450 neurons?