openai / iaf
Code for reproducing key results in the paper "Improving Variational Inference with Inverse Autoregressive Flow"
Home Page: https://arxiv.org/abs/1606.04934
License: MIT License
Using gpu device 0: GeForce GTX 980 (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5105)
/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
warnings.warn(warn)
[graphy] floatX = float32
Logpath: /media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/log
WARNING - Deep VAE - No observers have been added to this run
INFO - Deep VAE - Running command 'train'
INFO - Deep VAE - Started
CVAE1 with {'depths': [2, 2, 2], 'nl': u'elu', 'n_h2': 64, 'n_z': 32, 'shape_x': [3, 32, 32], 'optim': u'adamax', 'weightsharing': False, 'px': u'logistic', 'kernel_x': [5, 5], 'n_h1': 64, 'prior': u'diag', 'posterior': u'down_iaf2_nl', 'pad_x': 0, 'beta2': 0.001, 'beta1': 0.1, 'depth_ar': 1, 'alpha': 0.002, 'kl_min': 0.25, 'downsample_type': u'nn', 'kernel_h': [3, 3]}
ERROR - Deep VAE - Failed after 0:00:01!
Traceback (most recent calls WITHOUT Sacred internals):
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/train.py", line 185, in train
model = construct_model(data_init)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/train.py", line 128, in construct_model
model = models.cvae1(**margs)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/models.py", line 543, in cvae1
f_encode_decode(w)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/models.py", line 446, in f_encode_decode
h = layers[i][j].up(h, w)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/models.py", line 144, in up
qz[0] = N.rand.gaussian_diag(qz_mean, 2*qz_logsd)
File "/media/vismod/148fd670-6cd2-4eda-9e3c-339a23098e8d/vismod/iaf-master/graphy/nodes/rand.py", line 81, in gaussian_diag
eps = G.rng_curand.normal(size=mean.shape)
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/sandbox/cuda/rng_curand.py", line 368, in normal
self.next_seed())
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/sandbox/cuda/rng_curand.py", line 108, in new_auto_update
o_gen, sample = self(generator, cast(v_size, 'int32'))
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/op.py", line 668, in call
required = thunk()
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/op.py", line 883, in rval
fill_storage()
File "/home/vismod/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/cc.py", line 1707, in call
reraise(exc_type, exc_value, exc_trace)
File "", line 2, in reraise
RuntimeError: curand error generating random normals 102
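For what it's worth, CURAND status 102 is CURAND_STATUS_ALLOCATION_FAILED, i.e. CURAND could not allocate device memory, which is plausible here given that CNMeM pre-reserves 80% of the card. A minimal sketch to check whether the sampler fails outside the model (run with the same THEANO_FLAGS/device as the failing job; this assumes the old theano.sandbox.cuda backend that produced the trace above):

```python
# Minimal repro sketch for the failing op, independent of the IAF model.
# If this alone raises "curand error ... 102", the problem is in the
# CURAND/memory setup (e.g. CNMeM's reservation), not in the model code.
import theano
from theano.sandbox.cuda.rng_curand import CURAND_RandomStreams

rng = CURAND_RandomStreams(seed=42)
sample = rng.normal(size=(4, 4))
f = theano.function([], sample)
print(f())
```

If that is the cause, lowering CNMeM's initial size in .theanorc (e.g. cnmem = 0.4) may leave enough free memory for CURAND's own allocations.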
Hi,
I am running the tf_train.py and tf_utils code out of the box. Our TensorFlow version is 1.3.0 and the GPU is a GeForce GTX TITAN X. The conv2d function in tf_utils/layers.py runs very slowly. Specifically, the following two lines in conv2d take a long time:
_ = tf.get_variable("g", initializer=tf.log(scale_init) / 3.0)
_ = tf.get_variable("b", initializer=-m_init * scale_init)
I think due to lazy evaluation, what is actually taking time is this line:
m_init, v_init = tf.nn.moments(x_init, [0, 2, 3])
as both m_init and scale_init depend on the moments.
While conv2d is running, nvidia-smi shows 'No running processes found' and 0% under 'GPU-Util Compute M.'. CPU utilization is ~95%, so it isn't exploiting the multi-core CPU architecture either. I wonder how I can speed it up.
Thank you!
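For context, the cost is inherent to the data-dependent initialization used with weight normalization (Salimans & Kingma, 2016) rather than to those two lines specifically. A minimal sketch of the pattern, with illustrative names and shapes, not the exact tf_utils/layers.py code:

```python
import tensorflow as tf

# Illustrative stand-in for the layer's input during the init pass
# (NCHW, matching the moments axes [0, 2, 3] above).
x_init = tf.random_normal([16, 64, 32, 32])

# Per-channel statistics of the initial activations...
m_init, v_init = tf.nn.moments(x_init, [0, 2, 3])
scale_init = 1.0 / tf.sqrt(v_init + 1e-10)

# ...baked into the variable initializers. Because these initializers are
# graph tensors that depend on x_init, initializing "g" and "b" re-evaluates
# the whole subgraph feeding x_init; in a deep model that, not the two
# get_variable calls themselves, is where the time goes.
g = tf.get_variable("g", initializer=tf.log(scale_init) / 3.0)
b = tf.get_variable("b", initializer=-m_init * scale_init)
```

If this init pass runs on CPU despite a visible GPU, checking that the installed TensorFlow build is GPU-enabled (tf.test.is_gpu_available()) would be a first step.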
If I run the TensorFlow version of this code (tf_train.py) with #8 applied, I get a NaN within the first few iterations and training stops. If I remove that change, training proceeds fine. @pukkapies, were you ever able to get the model training appropriately with your changes applied? If so, what hyperparameter settings were you using?
flake8 testing of https://github.com/openai/iaf on Python 3.6.3
$ flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics
./models.py:339:49: E999 SyntaxError: invalid syntax
print "TODO: SAMPLES FROM MADE PRIOR"
^
./train.py:5:29: E999 SyntaxError: invalid syntax
from __builtin__ import False
^
./graphy/__init__.py:15:26: E999 SyntaxError: invalid syntax
print '[graphy] floatX = '+floatX
^
./graphy/function.py:16:44: E999 SyntaxError: invalid syntax
print '*** NaN detected ***'
^
./graphy/ndict.py:157:19: E999 SyntaxError: invalid syntax
print d.keys()
^
./graphy/misc/data.py:49:33: E999 SyntaxError: invalid syntax
print "Full training set"
^
./graphy/misc/optim.py:10:15: E999 SyntaxError: invalid syntax
print 'SGD', 'alpha:',alpha
^
./graphy/nodes/__init__.py:72:68: E999 SyntaxError: invalid syntax
print 'WARNING: constant rescale, these weights arent saved'
^
./graphy/nodes/ar.py:61:68: E999 SyntaxError: invalid syntax
print 'WARNING: constant rescale, these weights arent saved'
^
./graphy/nodes/conv.py:87:115: E999 SyntaxError: invalid syntax
print 'new code, requires that the minibatch size "x.tag.test_value.shape[0]" is the same during execution'
^
10 E999 SyntaxError: invalid syntax
10
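All ten hits are Python 2 print statements, so the codebase simply predates Python 3. (The `from __builtin__ import False` flagged in train.py is likewise Python-2-only and cannot be ported directly, since False is a keyword in Python 3; it looks like an editor artifact and can presumably just be deleted.) The mechanical fix, sketched:

```python
# Make `print` a function on Python 2 as well (add at the top of each module):
from __future__ import print_function

# Each flagged statement then converts mechanically, e.g. models.py:339
#   print "TODO: SAMPLES FROM MADE PRIOR"
# becomes:
print("TODO: SAMPLES FROM MADE PRIOR")
```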
Hi, I was trying to test the project implementation, but I'm running into errors with the following command:
python train.py with problem=cifar10 n_z=32 n_h=64 depths=[2,2,2] margs.depth_ar=1 margs.posterior=down_iaf2_NL margs.kl_min=0.25
[graphy] floatX = float32
Traceback (most recent call last):
  File "train.py", line 1, in <module>
    import graphy as G
  File "/home/user/projects/python/theano/iaf/graphy/__init__.py", line 45, in <module>
    import misc.data
  File "/home/user/projects/python/theano/iaf/graphy/misc/data.py", line 6, in <module>
    basepath = os.environ['ML_DATA_PATH']
  File "/home/user/anaconda2/lib/python2.7/UserDict.py", line 40, in __getitem__
    raise KeyError(key)
KeyError: 'ML_DATA_PATH'
Any suggestions much appreciated!
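Per the traceback, graphy/misc/data.py reads the dataset root from the ML_DATA_PATH environment variable at import time, so it has to be set before graphy is imported (or exported in the shell before launching train.py). A minimal sketch, with an illustrative path:

```python
import os

# Hypothetical dataset root; graphy resolves its dataset paths under this
# directory. It is read at import time, so set it before `import graphy`.
os.environ['ML_DATA_PATH'] = '/home/user/ml_data'

import graphy as G
```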
On line 70, I think there is a small mistake; it should be:
int(input_shape[2] * strides[2]), int(input_shape[3] * strides[3])]
It's not a major bug, as it would throw an error if the output shape didn't work out correctly. I think it runs OK just because the input is square.
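To spell out the claim, here is a sketch of the corrected shape computation, under the assumption that this is the transposed-convolution output-shape code (the surrounding names are guesses, not the repo's exact code):

```python
# Hypothetical reconstruction of the surrounding code; NCHW layout assumed.
input_shape = [16, 64, 32, 48]   # N, C, H, W -- deliberately non-square
strides = [1, 1, 2, 2]
num_filters = 128                # illustrative

output_shape = [int(input_shape[0]), num_filters,
                int(input_shape[2] * strides[2]),   # H scaled by the H stride
                int(input_shape[3] * strides[3])]   # W scaled by the W stride,
                                                    # not by strides[2]
```

With a square input (H == W) and equal strides the two versions agree, which is why the bug stays hidden.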
I am executing tf_train.py (num_gpus=1). In the forward function, the two nested for loops run fine for i=0,j=0 and i=0,j=1, but memory usage keeps growing. At i=1,j=0 the loop calls sub_layer.up, which in turn calls conv2d; execution becomes very slow, and memory keeps increasing at line 49 of the conv2d function in layers.py. Could anyone please help me resolve this growing memory issue?
On lines 373 and 374 of models.py you have "if posterior_conv3 != None: modules.append(posterior_conv4)", which looks very much like a copy-paste bug given the surrounding context. I might be mistaken, since I have only started to look at the source, but I wanted to make you aware in case it is a bug. The context of these lines is below:
361: def postup(updates, w):
362- modules = [up_conv1,up_conv2,down_conv1,down_conv2]
363- if downsample and downsample_type == 'conv':
364- modules += [up_conv3,down_conv3]
365- if prior_conv1 != None:
366- modules.append(prior_conv1)
367- if posterior_conv1 != None:
368- modules.append(posterior_conv1)
369- if posterior_conv2 != None:
370- modules.append(posterior_conv2)
371- if posterior_conv3 != None:
372- modules.append(posterior_conv3)
373- if posterior_conv3 != None:
374- modules.append(posterior_conv4)
375- for m in modules:
376- updates = m.postup(updates, w)
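Presumably line 373 was meant to test posterior_conv4, following the pattern of the pairs above it. A sketch of the likely intended code (an inference from context, not a confirmed fix):

```python
if posterior_conv4 != None:      # matches the posterior_conv1..3 pattern above
    modules.append(posterior_conv4)
```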
After some fixes to the summary & split calls (they were refactored in TF 1.0), I still can't get this code to work:
(.venv) ➜ iaf git:(master) ✗ CIFAR10_PATH="./CIFAR10" optirun -b primus python tf_train.py --logdir ./logs --hpconfig depth=1,num_blocks=20,kl_min=0.1,learning_rate=0.002,batch_size=32 --num_gpus 1 --mode train
Traceback (most recent call last):
File "tf_train.py", line 397, in <module>
tf.app.run()
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "tf_train.py", line 391, in main
run(hps)
File "tf_train.py", line 237, in run
model = CVAE1(hps, "train", x)
File "tf_train.py", line 152, in __init__
self.train_op = opt.apply_gradients(grad, global_step=self.global_step)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
self._create_slots([_get_variable_for(v) for v in var_list])
File "/home/jramapuram/Dropbox/projects/iaf/tf_utils/adamax.py", line 37, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
dtype)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
validate_shape=validate_shape)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
"VarScope?" % name)
ValueError: Variable model/model/dec_log_stdv/Adamax/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
The same result occurs when utilizing opt = tf.train.AdamOptimizer(hps.learning_rate):
(.venv) ➜ iaf git:(master) ✗ CIFAR10_PATH="./CIFAR10" optirun -b primus python tf_train.py --logdir ./logs --hpconfig depth=1,num_blocks=20,kl_min=0.1,learning_rate=0.002,batch_size=32 --num_gpus 1 --mode train
Traceback (most recent call last):
File "tf_train.py", line 397, in <module>
tf.app.run()
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "tf_train.py", line 391, in main
run(hps)
File "tf_train.py", line 237, in run
model = CVAE1(hps, "train", x)
File "tf_train.py", line 152, in __init__
self.train_op = opt.apply_gradients(grad, global_step=self.global_step)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 446, in apply_gradients
self._create_slots([_get_variable_for(v) for v in var_list])
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/adam.py", line 122, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/optimizer.py", line 766, in _zeros_slot
named_slots[_var_key(var)] = slot_creator.create_zeros_slot(var, op_name)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 174, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 146, in create_slot_with_initializer
dtype)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
validate_shape=validate_shape)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1049, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 948, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 356, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 341, in _true_getter
use_resource=use_resource)
File "/home/jramapuram/.venv/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 671, in _get_single_variable
"VarScope?" % name)
ValueError: Variable model/model/dec_log_stdv/Adam/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
I tried setting reuse=None, to no avail. I'm probably missing something stupid here.
Here is my fork with the changes: https://github.com/jramapuram/iaf/tree/hotfix/tf1.0
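For anyone hitting the same thing: in TF 1.x this error usually means apply_gradients is being called while a variable scope with reuse=True is active, so the optimizer's fresh Adam/Adamax slot variables cannot be created with tf.get_variable (the doubled "model/model" prefix in the error also hints at nested scopes). A minimal sketch of the failure mode and the usual fix; this is not the iaf code itself:

```python
import tensorflow as tf

x = tf.random_normal([8, 4])
with tf.variable_scope("model") as scope:
    w = tf.get_variable("w", [4, 1])
    loss = tf.reduce_mean(tf.matmul(x, w))
    scope.reuse_variables()  # e.g. set after building a second tower

    # Applying gradients here, while reuse is active, reproduces the error:
    # the optimizer calls tf.get_variable() for the new "model/w/Adam" slot
    # and the reusing scope refuses to create it.

# Fix: build the train op outside any scope with reuse=True, so the
# optimizer is free to create its slot variables.
opt = tf.train.AdamOptimizer(0.001)
train_op = opt.apply_gradients(opt.compute_gradients(loss))
```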
Why do we use a constant variance for the generative network of the autoencoder, instead of learning it from the network like the mean? What advantage does this have over a learnable variance? This is done in models.py at lines 473 and 681:
mean_x = T.clip(output+.5, 0+1/512., 1-1/512.)  # decoded mean, clipped to stay strictly inside (0, 1)
logsd_x = 0*mean_x + w['logsd_x']  # w['logsd_x'] is a trainable parameter, broadcast to mean_x's shape
On line 93 of the above file, it looks like the normalisation is performed over the out_channels instead of the in_channels. I think it should instead be:
v_norm = tf.nn.l2_normalize(v, [0, 1, 3])
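For reference, the right axes depend on the filter layout, so here is a sketch of both cases with illustrative shapes (whether line 93 sits on the conv or the deconv path is for the authors to confirm). tf.nn.conv2d filters are [ksize, ksize, in_channels, out_channels], while tf.nn.conv2d_transpose filters are [ksize, ksize, out_channels, in_channels]; weight normalization should reduce over every axis except the out_channels axis of the layout in use:

```python
import tensorflow as tf

# conv2d filters:           [ksize, ksize, in_channels, out_channels]
v_conv = tf.get_variable("v_conv", [3, 3, 64, 128])
# conv2d_transpose filters: [ksize, ksize, out_channels, in_channels]
v_deconv = tf.get_variable("v_deconv", [3, 3, 128, 64])

# Normalize each output filter to unit L2 norm: reduce over every axis
# except the layout's out_channels axis.
v_conv_norm = tf.nn.l2_normalize(v_conv, [0, 1, 2])      # out_channels = axis 3
v_deconv_norm = tf.nn.l2_normalize(v_deconv, [0, 1, 3])  # out_channels = axis 2
```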
What command was used with the Lasagne implementation for the MNIST experiment in the paper? In particular, which option modifies the fully connected layer to have 450 neurons?