jaanli / variational-autoencoder

Variational autoencoder implemented in tensorflow and pytorch (including inverse autoregressive flow)

Home Page: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

License: MIT License

Python 99.41% Shell 0.59%

vae machine-learning tensorflow variational-autoencoder variational-inference probabilistic-graphical-models deep learning deep-learning deep-neural-networks

variational-autoencoder's Introduction

Variational Autoencoder in tensorflow and pytorch

Reference implementation for a variational autoencoder in TensorFlow and PyTorch.

I recommend the PyTorch version. It includes an example of a more expressive variational family, the inverse autoregressive flow.

Variational inference is used to fit the model to binarized MNIST images of handwritten digits. An inference network (encoder) is used to amortize the inference and share parameters across datapoints. The likelihood is parameterized by a generative network (decoder).

Blog post: https://jaan.io/what-is-variational-autoencoder-vae-tutorial/

PyTorch implementation

(anaconda environment is in environment-jax.yml)

Importance sampling is used to estimate the marginal likelihood on Hugo Larochelle's Binary MNIST dataset. The final marginal likelihood on the test set, -97.10 nats, is comparable to published numbers.

$ python train_variational_autoencoder_pytorch.py --variational mean-field --use_gpu --data_dir $DAT --max_iterations 30000 --log_interval 10000
Step 0          Train ELBO estimate: -558.027   Validation ELBO estimate: -384.432      Validation log p(x) estimate: -355.430  Speed: 2.72e+06 examples/s
Step 10000      Train ELBO estimate: -111.323   Validation ELBO estimate: -109.048      Validation log p(x) estimate: -103.746  Speed: 2.64e+04 examples/s
Step 20000      Train ELBO estimate: -103.013   Validation ELBO estimate: -107.655      Validation log p(x) estimate: -101.275  Speed: 2.63e+04 examples/s
Step 29999      Test ELBO estimate: -106.642    Test log p(x) estimate: -100.309
Total time: 2.49 minutes
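
The relationship between the "ELBO estimate" and "log p(x) estimate" columns can be sketched with a toy example. This is a minimal illustration and not the repository's code: the one-dimensional Gaussian model, the proposal, and all function names are assumptions chosen so that the exact log p(x) is known. The ELBO averages the log importance weights, while the importance-sampling estimate of log p(x) takes the log of their average.

```python
import math
import random


def normal_log_prob(value, loc, scale):
    """Log density of a univariate Normal(loc, scale) at value."""
    var = scale ** 2
    return -0.5 * math.log(2.0 * math.pi * var) - (value - loc) ** 2 / (2.0 * var)


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    largest = max(values)
    return largest + math.log(sum(math.exp(v - largest) for v in values) / len(values))


def estimate_elbo_and_log_p_x(x, n_samples, rng):
    """Toy model (an assumption): z ~ N(0, 1), x | z ~ N(z, 1).

    The proposal q(z | x) = N(x / 2, 1) is deliberately wider than the
    exact posterior N(x / 2, sqrt(1/2)), so the ELBO is not tight.
    """
    q_loc, q_scale = x / 2.0, 1.0
    log_weights = []
    for _ in range(n_samples):
        z = rng.gauss(q_loc, q_scale)
        log_p_x_and_z = normal_log_prob(z, 0.0, 1.0) + normal_log_prob(x, z, 1.0)
        log_q_z = normal_log_prob(z, q_loc, q_scale)
        log_weights.append(log_p_x_and_z - log_q_z)
    elbo = sum(log_weights) / n_samples      # mean of log weights
    log_p_x = log_mean_exp(log_weights)      # log of mean weight
    return elbo, log_p_x


rng = random.Random(0)
x = 1.5
elbo_estimate, log_p_x_estimate = estimate_elbo_and_log_p_x(x, 5000, rng)
# Under this toy model the marginal is x ~ N(0, sqrt(2)).
exact_log_p_x = normal_log_prob(x, 0.0, math.sqrt(2.0))
print(elbo_estimate, log_p_x_estimate, exact_log_p_x)
```

By Jensen's inequality the ELBO estimate never exceeds the importance-sampling estimate computed from the same samples, which matches the ordering of the two columns in the logs.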

Using a non mean-field, more expressive variational posterior approximation (inverse autoregressive flow, https://arxiv.org/abs/1606.04934), the test marginal log-likelihood improves to -95.33 nats:

$ python train_variational_autoencoder_pytorch.py --variational flow
step:   0       train elbo: -578.35
step:   0               valid elbo: -407.06     valid log p(x): -367.88
step:   10000   train elbo: -106.63
step:   10000           valid elbo: -110.12     valid log p(x): -104.00
step:   20000   train elbo: -101.51
step:   20000           valid elbo: -105.02     valid log p(x): -99.11
step:   30000   train elbo: -98.70
step:   30000           valid elbo: -103.76     valid log p(x): -97.71
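
As a rough illustration of the idea, a single inverse autoregressive flow step can be sketched in plain Python. This is not the code in flow.py: the strictly lower-triangular linear maps, the parameter names, and the three-dimensional latent are assumptions chosen for brevity. The update z_new = sigma * z + (1 - sigma) * m and the log-determinant sum(log sigma) follow the paper linked above.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def masked_linear(z, weights, bias):
    """Autoregressive linear map: output i only sees z[j] for j < i."""
    return [bias[i] + sum(weights[i][j] * z[j] for j in range(i))
            for i in range(len(z))]


def iaf_step(z, params):
    """One flow step; returns the transformed sample and log |det Jacobian|."""
    m = masked_linear(z, params["w_m"], params["b_m"])
    s = masked_linear(z, params["w_s"], params["b_s"])
    sigma = [sigmoid(s_i) for s_i in s]
    z_new = [sig * z_i + (1.0 - sig) * m_i for sig, z_i, m_i in zip(sigma, z, m)]
    # The Jacobian is lower triangular with sigma on the diagonal.
    log_det = sum(math.log(sig) for sig in sigma)
    return z_new, log_det


def standard_normal_log_prob(z):
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * z_i ** 2 for z_i in z)


rng = random.Random(0)
dim = 3  # hypothetical latent size, kept tiny for readability
params = {
    "w_m": [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)],
    "b_m": [rng.gauss(0.0, 1.0) for _ in range(dim)],
    "w_s": [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)],
    "b_s": [rng.gauss(0.0, 1.0) for _ in range(dim)],
}

z0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]   # sample from the base density
log_q_z0 = standard_normal_log_prob(z0)
z1, log_det = iaf_step(z0, params)
log_q_z1 = log_q_z0 - log_det                    # change of variables
print(z1, log_q_z1)
```

Because every diagonal entry of the Jacobian is just sigma, the density of the transformed sample is available without inverting the flow, which is what makes it usable as a variational posterior.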

JAX implementation

Using JAX (the anaconda environment is in environment-jax.yml) gives a 3x speedup over PyTorch:

$ python train_variational_autoencoder_jax.py --variational mean-field 
Step 0          Train ELBO estimate: -566.059   Validation ELBO estimate: -565.755      Validation log p(x) estimate: -557.914  Speed: 2.56e+11 examples/s
Step 10000      Train ELBO estimate: -98.560    Validation ELBO estimate: -105.725      Validation log p(x) estimate: -98.973   Speed: 7.03e+04 examples/s
Step 20000      Train ELBO estimate: -109.794   Validation ELBO estimate: -105.756      Validation log p(x) estimate: -97.914   Speed: 4.26e+04 examples/s
Step 29999      Test ELBO estimate: -104.867    Test log p(x) estimate: -96.716
Total time: 0.810 minutes

Inverse autoregressive flow in jax:

$ python train_variational_autoencoder_jax.py --variational flow 
Step 0          Train ELBO estimate: -727.404   Validation ELBO estimate: -726.977      Validation log p(x) estimate: -713.389  Speed: 2.56e+11 examples/s
Step 10000      Train ELBO estimate: -100.093   Validation ELBO estimate: -106.985      Validation log p(x) estimate: -99.565   Speed: 2.57e+04 examples/s
Step 20000      Train ELBO estimate: -113.073   Validation ELBO estimate: -108.057      Validation log p(x) estimate: -98.841   Speed: 3.37e+04 examples/s
Step 29999      Test ELBO estimate: -106.803    Test log p(x) estimate: -97.620
Total time: 2.350 minutes

(The difference between the mean-field and inverse autoregressive flow results may be due to several factors, chief among them the lack of convolutions in the implementation. Residual blocks are used in https://arxiv.org/pdf/1606.04934.pdf to get the ELBO closer to -80 nats.)

Generating the GIFs

Run python train_variational_autoencoder_tensorflow.py
Install imagemagick (homebrew for Mac: https://formulae.brew.sh/formula/imagemagick or Chocolatey in Windows: https://community.chocolatey.org/packages/imagemagick.app)
Go to the directory where the jpg files are saved, and run the imagemagick command to generate the .gif: convert -delay 20 -loop 0 *.jpg latent-space.gif
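
The last step can also be scripted. The sketch below only assembles the same convert invocation shown above; the helper names build_convert_command and make_gif are hypothetical, and make_gif assumes that imagemagick's convert is on the PATH.

```python
import glob
import os
import subprocess


def build_convert_command(frame_paths, output_path="latent-space.gif", delay=20):
    """Return the argument list for: convert -delay 20 -loop 0 *.jpg latent-space.gif."""
    # Sorting mirrors the order in which a shell expands *.jpg.
    return ["convert", "-delay", str(delay), "-loop", "0",
            *sorted(frame_paths), output_path]


def make_gif(image_dir):
    """Run convert on every .jpg in image_dir (requires imagemagick)."""
    frames = glob.glob(os.path.join(image_dir, "*.jpg"))
    if not frames:
        raise FileNotFoundError("no .jpg files found in " + image_dir)
    output_path = os.path.join(image_dir, "latent-space.gif")
    subprocess.run(build_convert_command(frames, output_path), check=True)


# Inspect the command without running it.
print(build_convert_command(["frame_2.jpg", "frame_1.jpg"]))
```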

variational-autoencoder's Issues

SystemExit

Hi Altosaar,

I ran your code on colab.research.google.com and it gave this error:
"SystemExit
/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py:2890: UserWarning: To exit: use 'exit', 'quit', or Ctrl-D.
warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)"

Can you please run it on Colab?

Thank you!

Possible error in loss function

The KL divergence part of your loss is inconsistent with your documentation in https://jaan.io/what-is-variational-autoencoder-vae-tutorial/ .

In the documentation, you say to use the posterior q(z|x), but in the code you apply the prior, q(z).

https://github.com/altosaar/variational-autoencoder/blob/96337fa367e720bd59da6979b64b2a526e96cfc1/train_variational_autoencoder_tensorflow.py#L132

ELBO_i(\theta, \phi) = \mathbb{E}_{q_\theta(z \mid x_i)}[\log p_\phi(x_i \mid z)] - \mathbb{KL}(q_\theta(z \mid x_i) \mid\mid p(z)).

Error in variational-autoencoder code

Hi @altosaar, thank you very much for sharing. There is an error when executing the following code:

CODE:

for i in range(30000):
    batch = [np.reshape(b, [28, 28]) for b in mnist.train.next_batch(batch_size=batch_size)[0]]
    sess.run(optimizer, feed_dict = {X_in: batch, Y: batch, keep_prob: 0.8})
        
    if not i % 200:
        ls, d, i_ls, d_ls, mu, sigm = sess.run([loss, dec, img_loss, latent_loss, mn, sd], feed_dict = {X_in: batch, Y: batch, keep_prob: 1.0})
        plt.imshow(np.reshape(batch[0], [28, 28]), cmap='gray')
        plt.show()
        plt.imshow(d[0], cmap='gray')
        plt.show()
        print(i, ls, np.mean(i_ls), np.mean(d_ls))

ERROR:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1277     try:
-> 1278       return fn(*args)
   1279     except errors.OpError as e:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1262       return self._call_tf_sessionrun(
-> 1263           options, feed_dict, fetch_list, target_list, run_metadata)
   1264 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _call_tf_sessionrun(self, options, feed_dict, fetch_list, target_list, run_metadata)
   1349         self._session, options, feed_dict, fetch_list, target_list,
-> 1350         run_metadata)
   1351 

InvalidArgumentError: Input to reshape is a tensor with 3200 values, but the requested shape requires a multiple of 49
	 [[Node: decoder/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/dense_1/Maximum, decoder/Reshape/shape)]]

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-13-eec091ce993b> in <module>()
      1 for i in range(30000):
      2     batch = [np.reshape(b, [28, 28]) for b in mnist.train.next_batch(batch_size=batch_size)[0]]
----> 3     sess.run(optimizer, feed_dict = {X_in: batch, Y: batch, keep_prob: 0.8})
      4 
      5     if not i % 200:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    875     try:
    876       result = self._run(None, fetches, feed_dict, options_ptr,
--> 877                          run_metadata_ptr)
    878       if run_metadata:
    879         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1098     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1099       results = self._do_run(handle, final_targets, final_fetches,
-> 1100                              feed_dict_tensor, options, run_metadata)
   1101     else:
   1102       results = []

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1270     if handle is None:
   1271       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1272                            run_metadata)
   1273     else:
   1274       return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1289         except KeyError:
   1290           pass
-> 1291       raise type(e)(node_def, op, message)
   1292 
   1293   def _extend_graph(self):

InvalidArgumentError: Input to reshape is a tensor with 3200 values, but the requested shape requires a multiple of 49
	 [[Node: decoder/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/dense_1/Maximum, decoder/Reshape/shape)]]

Caused by op 'decoder/Reshape', defined at:
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py", line 16, in <module>
    app.launch_new_instance()
  File "/usr/local/lib/python3.6/dist-packages/traitlets/config/application.py", line 658, in launch_instance
    app.start()
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelapp.py", line 477, in start
    ioloop.IOLoop.instance().start()
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/ioloop.py", line 177, in start
    super(ZMQIOLoop, self).start()
  File "/usr/local/lib/python3.6/dist-packages/tornado/ioloop.py", line 888, in start
    handler_func(fd_obj, events)
  File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 440, in _handle_events
    self._handle_recv()
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 472, in _handle_recv
    self._run_callback(callback, msg)
  File "/usr/local/lib/python3.6/dist-packages/zmq/eventloop/zmqstream.py", line 414, in _run_callback
    callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tornado/stack_context.py", line 277, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 283, in dispatcher
    return self.dispatch_shell(stream, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 235, in dispatch_shell
    handler(stream, idents, msg)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/kernelbase.py", line 399, in execute_request
    user_expressions, allow_stdin)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/ipkernel.py", line 196, in do_execute
    res = shell.run_cell(code, store_history=store_history, silent=silent)
  File "/usr/local/lib/python3.6/dist-packages/ipykernel/zmqshell.py", line 533, in run_cell
    return super(ZMQInteractiveShell, self).run_cell(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2718, in run_cell
    interactivity=interactivity, compiler=compiler, result=result)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2822, in run_ast_nodes
    if self.run_code(code, result):
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-69f122965864>", line 3, in <module>
    dec = decoder(sampled, keep_prob)
  File "<ipython-input-9-2223e58c19ec>", line 6, in decoder
    x = tf.reshape(x, reshaped_dim)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 6199, in reshape
    "Reshape", tensor=tensor, shape=shape, name=name)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1717, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): Input to reshape is a tensor with 3200 values, but the requested shape requires a multiple of 49
	 [[Node: decoder/Reshape = Reshape[T=DT_FLOAT, Tshape=DT_INT32, _device="/job:localhost/replica:0/task:0/device:GPU:0"](decoder/dense_1/Maximum, decoder/Reshape/shape)]]

Reduce_sum instead of reduce_mean

Hi @altosaar

First, thank you for this small tutorial on using tf.distributions with a VAE!

elbo = tf.reduce_sum(expected_log_likelihood - kl, 0)

I wanted to know why you chose to use reduce_sum instead of reduce_mean when computing the ELBO?

Thanks!

Size of output weights file

Hi, all.
I'm curious about the size of the output weights file (PyTorch), since I plan to use this in non-academic programs.

Shapes are inconsistent in your implementation

Hello, while I like your implementation very much, I've found something strange.
Specifically, I think you should correct the shape of the Normal distribution p_z as follows:

# Take samples from the prior
  with tf.variable_scope('model', reuse=True):
    p_z = distributions.Normal(loc=np.zeros(FLAGS.latent_dim, dtype=np.float32),
                               scale=np.ones(FLAGS.latent_dim, dtype=np.float32))
    p_z_sample = p_z.sample(FLAGS.n_samples)
    p_x_given_z_logits = generative_network(z=p_z_sample,
                                            hidden_size=FLAGS.hidden_size)
    prior_predictive = distributions.Bernoulli(logits=p_x_given_z_logits)
    prior_predictive_samples = prior_predictive.sample()
    tf.summary.image('prior_predictive',
                     tf.cast(prior_predictive_samples, tf.float32))

Obviously, p_z has shape latent_dim, so it produces a 1D tensor when sampled.
As a result, the computation of the KL divergence won't be correct.

With regards

Beta lower than one

Is it right to multiply the KL divergence by a beta lower than one?

no attribute 'stochastic_graph'

Hi Jaan,
there seem to have been some changes in tensorflow. I get the error

module 'tensorflow.contrib.bayesflow' has no attribute 'stochastic_graph'

tensorflow-1.5.0
python 3.6

Bye
parsifal9

Adaptation to CNN

Hello again!

I was wondering if you had any advice on what it would take to turn this into a CVAE implementation. Would it be enough to change the VAE part, or would the impact trickle down to all other networks (mainly thinking of the flow.py models)?

Thanks in advance!

Kind regards,
Theodore.

Working version for Python 3+

Hello,

I ran your code with Python 3.5+ and it throws several errors. I suppose it is meant to run with Python 2.7?
Some lines that could be edited for Python 3.5+:

uid = map(lambda x: profile2id[x], tp['userId'])
sid = map(lambda x: show2id[x], tp['movieId'])

Should be:

uid = list(map(lambda x: profile2id[x], tp['userId'])) 
sid = list(map(lambda x: show2id[x], tp['movieId']))

idxlist = range(N) should be idxlist = np.arange(N)
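
The underlying change is that map and range return lazy objects in Python 3 rather than lists. A small sketch with made-up data (the contents of profile2id and the user_ids list are hypothetical stand-ins for the variables quoted above, and list(range(N)) is used here as a standard-library substitute for the suggested np.arange(N)):

```python
import random

profile2id = {"alice": 0, "bob": 1}   # hypothetical mapping
user_ids = ["bob", "alice", "bob"]    # hypothetical stand-in for tp['userId']

# In Python 3, map returns an iterator, so it has no length and cannot be indexed.
lazy_uid = map(lambda x: profile2id[x], user_ids)
try:
    len(lazy_uid)
    map_has_len = True
except TypeError:
    map_has_len = False

# Wrapping in list(...) restores the Python 2 behaviour.
uid = list(map(lambda x: profile2id[x], user_ids))
# A list comprehension is an equivalent, more idiomatic spelling.
uid_comprehension = [profile2id[x] for x in user_ids]

# A range object cannot be shuffled in place; a materialized list can.
try:
    random.Random(0).shuffle(range(5))
    range_shuffles = True
except TypeError:
    range_shuffles = False

idxlist = list(range(5))
random.Random(0).shuffle(idxlist)
print(map_has_len, uid, range_shuffles, idxlist)
```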

Surprising results with no convergence

Hi there,
I am training this VAE on an input space of size 2048 (a sentence embedding space). I am trying to tweak the parameters so that it reconstructs the input correctly, but I cannot make it converge.

With config as follows:

config = """
latent_size: 128
variational: flow
flow_depth: 2
data_size: 2048
learning_rate: 0.00001
batch_size: 128
test_batch_size: 64
max_iterations: 100000
log_interval: 1000
early_stopping_interval: 5
n_samples: 128
use_gpu: false
train_dir: $TMPDIR
data_dir: $TMPDIR
seed: 582838
"""

It gives the following results:

step:   0       train elbo: -1458.64
step:   0               valid elbo: -1452.85    valid log p(x): -1437.75
step:   1000    train elbo: 175476434.95
step:   1000            valid elbo: 177791955.55        valid log p(x): 177795714.31
step:   2000    train elbo: nan
step:   2000            valid elbo: nan valid log p(x): nan

I thought about reducing the learning rate, changing the size of the latent space (I don't know whether increasing or decreasing it is better when input_dim = 2048), and changing the batch size, but nothing seems conclusive.

Also, why does the log-likelihood reach such high values? (p(x) should be in [0, 1]...)

Any idea on this matter? :)
Thanks for the great work.

Why not average over batch dimension?

In the train_variational_autoencoder_pytorch.py file,

# sum over batch dimension
loss = -elbo.sum(0)

could be changed to

# average over batch dimension
loss = -elbo.mean(0)
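
The two reductions differ only by a constant factor equal to the batch size. The sketch below uses a made-up quadratic per-example loss as a stand-in for -elbo (all names and numbers are hypothetical) and shows that, for plain gradient descent, summing with learning rate lr takes the same step as averaging with learning rate lr * batch_size.

```python
def per_example_grads(theta, batch):
    """Gradient of the toy per-example loss (theta - x)^2 with respect to theta."""
    return [2.0 * (theta - x) for x in batch]


batch = [0.5, 1.0, 2.0, 4.5]   # hypothetical data
theta = 0.0
learning_rate = 0.01

grads = per_example_grads(theta, batch)
grad_of_sum = sum(grads)                 # gradient of loss.sum(0)
grad_of_mean = sum(grads) / len(batch)   # gradient of loss.mean(0)

# One plain gradient-descent step under each reduction.
theta_after_sum = theta - learning_rate * grad_of_sum
theta_after_mean = theta - (learning_rate * len(batch)) * grad_of_mean
print(grad_of_sum, grad_of_mean, theta_after_sum, theta_after_mean)
```

Under this assumption the choice between sum and mean amounts to a rescaling of the learning rate. Optimizers that normalize the gradient magnitude may behave differently.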

The expected_log_likelihood is not an expected value, but only a log-likelihood

May I confirm line 133 in train_variational_autoencoder_tensorflow.py:

 expected_log_likelihood = tf.reduce_sum(p_x_given_z.log_prob(x),
                                          [1, 2, 3])

It seems the reduce_sum returns not an expected value, but only a log likelihood.
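
One common reading, offered here as an assumption rather than the author's answer, is that the log-likelihood evaluated at a single sample z ~ q(z | x) is a one-sample Monte Carlo estimate of the expectation: noisy, but correct on average. A toy sketch with a hypothetical Gaussian q and an unnormalized Gaussian log-likelihood:

```python
import random


def log_likelihood(x, z):
    """Toy log p(x | z) up to an additive constant: unit-variance Gaussian."""
    return -0.5 * (x - z) ** 2


rng = random.Random(0)
x, q_loc, q_scale = 1.0, 0.3, 0.8   # hypothetical values

# Closed form of E_q[log_likelihood(x, z)] for z ~ N(q_loc, q_scale).
exact_expectation = -0.5 * ((x - q_loc) ** 2 + q_scale ** 2)

# What a single forward pass computes: one sample, one log-likelihood.
single_sample_estimate = log_likelihood(x, rng.gauss(q_loc, q_scale))

# Averaging many such single-sample estimates approaches the expectation.
n_repeats = 20000
average_estimate = sum(
    log_likelihood(x, rng.gauss(q_loc, q_scale)) for _ in range(n_repeats)
) / n_repeats
print(single_sample_estimate, average_estimate, exact_expectation)
```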

Regarding the loss function

Hi, thank you for the nice implementation. Are there any references for the loss function "log_p_x_and_z - log_q_z" used in the PyTorch implementation? Both logarithm terms refer to a "NormalLogProb" class. It seems quite different from the one in the paper "Tutorial: Deriving the Standard Variational Autoencoder (VAE) Loss Function". Thank you for your time reading this.
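
For reference, the two forms are connected by a standard identity. The derivation below is a sketch in generic notation (the symbols are not taken from the code) showing that the expectation of log p(x, z) - log q(z | x) under q equals the reconstruction term minus the KL divergence:

```latex
\begin{aligned}
\mathrm{ELBO}(x)
  &= \mathbb{E}_{q(z \mid x)}\big[\log p(x, z) - \log q(z \mid x)\big] \\
  &= \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z) + \log p(z) - \log q(z \mid x)\big] \\
  &= \mathbb{E}_{q(z \mid x)}\big[\log p(x \mid z)\big]
     - \mathrm{KL}\big(q(z \mid x) \,\|\, p(z)\big).
\end{aligned}
```

The first line is estimated by sampling z from q, while the last line is the form that uses an analytic KL term when one is available.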

Compatibility with TF v1.2

Great tutorial! Thanks.

To run with TensorFlow version 1.2, I had to change:

kl = tf.reduce_sum(distributions.kl(q_z.distribution, p_z), 1)

to:

kl = tf.reduce_sum(distributions.kl_divergence(q_z.distribution, p_z), 1)

Eitan

Interpretation of each dimension on the shape

In the train_variational_autoencoder_pytorch.py

if __name__ == '__main__':
...
    for step, batch in enumerate(cycle(train_data)):
        x = batch[0].to(device)
        model.zero_grad()  # Sets gradients of all model parameters to zero.
        variational.zero_grad()
        z, log_q_z = variational(x) 
        log_p_x_and_z = model(z, x)  
        # average over sample dimension
        elbo = (log_p_x_and_z - log_q_z).mean(1) 
...

The mean is taken over dimension 1, not mean(0) or mean(2), because:
log_p_x_and_z and log_q_z are both 3D tensors. According to my understanding of the ELBO, the dimensions of the shape are interpreted as follows:
(batch_size, sample_times, latent_size)
In your code, the latent variables are sampled once, so shape[1] = 1.
So mean(1) is an estimate of the ELBO expectation.

Is my understanding correct?

Graphs for PyTorch version

Hello! Thanks for your implementation, it was really helpful when setting it up on my own dataset!

I was wondering if you had a script somewhere that includes the sampling graphs for the PyTorch version, as I feel they were only included in the TF code.

Thanks in advance!

Kind regards,
Theodore.

A question regarding q_z

I follow the torch version.
ELBO = E_q[log p(x|z) - KL(q(z|x) || p(z))]
Denote log p(x|z) - KL(q(z|x) || p(z)) as t(z, x); then
ELBO = E_q[t(z, x)]

For each data point you calculate only t(z, x). That is, for a batch of 16 samples, each t(z, x_i) gets the same weight. How come?

Dataset is lost

The URL "http://www.cs.toronto.edu/~larocheh/public/datasets/binarized_mnist" is missing now. Where can I find a new one? Thank you.

Follow up on why inputs must be between 0 and 1

I was wondering why the inputs must be scaled to be between 0 and 1. Ref: #25

Reading your blog article, it seems to me that there is no such requirement imposed mathematically. (Or is there?)
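
One way to see where such a requirement could come from is to look at a Bernoulli log-likelihood written in terms of logits. The sketch below assumes the usual form x * logit - softplus(logit); whether the repository computes it exactly this way is an assumption, and the function names are hypothetical. For x in [0, 1] the value is never positive, but for x outside that range it can be made arbitrarily large, so it no longer behaves like a log-probability.

```python
import math


def softplus(a):
    """Numerically stable log(1 + exp(a))."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def bernoulli_log_prob(x, logit):
    """log Bernoulli(x; sigmoid(logit)) in its logit form."""
    return x * logit - softplus(logit)


# Inputs inside [0, 1]: the log-likelihood is at most 0.
inside_unit_interval = [
    bernoulli_log_prob(x, logit)
    for x in (0.0, 0.3, 1.0)
    for logit in (-4.0, 0.0, 4.0)
]

# An input outside [0, 1]: the same formula grows without bound with the logit.
outside_small_logit = bernoulli_log_prob(5.0, 10.0)
outside_large_logit = bernoulli_log_prob(5.0, 100.0)
print(max(inside_unit_interval), outside_small_logit, outside_large_logit)
```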

deprecation warnings for tensorflow

Using a pretty up-to-date container https://github.com/jupyter/docker-stacks/tree/master/tensorflow-notebook I encountered the following Deprecation Warnings... there are a lot of them.

WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py:1624: flatten (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.flatten instead.
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From variational-autoencoder/train_variational_autoencoder_tensorflow.py:98: Normal.__init__ (from tensorflow.python.ops.distributions.normal) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/distributions/normal.py:160: Distribution.__init__ (from tensorflow.python.ops.distributions.distribution) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From variational-autoencoder/train_variational_autoencoder_tensorflow.py:106: Bernoulli.__init__ (from tensorflow.python.ops.distributions.bernoulli) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From variational-autoencoder/train_variational_autoencoder_tensorflow.py:132: kl_divergence (from tensorflow.python.ops.distributions.kullback_leibler) is deprecated and will be removed after 2019-01-01.
Instructions for updating:
The TensorFlow Distributions library has moved to TensorFlow Probability (https://github.com/tensorflow/probability). You should update all references to use `tfp.distributions` instead of `tf.distributions`.
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
2019-09-11 20:55:51.252283: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-09-11 20:55:51.274283: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2592000000 Hz
2019-09-11 20:55:51.275538: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5638c76eb780 executing computations on platform Host. Devices:
2019-09-11 20:55:51.275599: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
WARNING:tensorflow:From variational-autoencoder/train_variational_autoencoder_tensorflow.py:151: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: _internal_retry.<locals>.wrap.<locals>.wrapped_fn (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please use urllib or similar directly.
Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:262: extract_images (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting /tmp/dat/train-images-idx3-ubyte.gz
Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:267: extract_labels (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.data to implement this functionality.
Extracting /tmp/dat/train-labels-idx1-ubyte.gz
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:110: dense_to_one_hot (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use tf.one_hot on tensors.
Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
Extracting /tmp/dat/t10k-images-idx3-ubyte.gz
Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
Extracting /tmp/dat/t10k-labels-idx1-ubyte.gz
WARNING:tensorflow:From /opt/conda/lib/python3.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:290: DataSet.__init__ (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.

Then this one follows for each iteration summary:

Saving TensorBoard summaries and images to: /tmp/log/
Iteration: 0 ELBO: -529.328 s/iter: 2.030e-04
Lossy conversion from float32 to uint8. Range [0, 1]. Convert image to uint8 prior to saving to suppress this warning.

Thanks

This excellent explanation of VAEs for Bayesians was very useful for me. Thanks!

unable to open file: name = 'dat/binarized_mnist.hdf5'

Can anyone direct me to where this file can be downloaded, or tell me how to generate it? Thank you in advance!

AttributeError: module 'flow' has no attribute 'InverseAutoregressiveFlow'

Hi, I am currently facing this error but have no idea how to solve it. Can you help me out?

Tensor size mismatch in VariationalMeanField.forward

I'm trying to train the model with variational: mean-field and get the following error:

user@maccie:prj/variational-autoencoder ‹master*›$ python3 train_variational_autoencoder_pytorch.py --variational mean-field
train_variational_autoencoder_pytorch.py:184: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  dictionary = yaml.load(config)
step:   0       train elbo: -557.94
Traceback (most recent call last):
  File "train_variational_autoencoder_pytorch.py", line 233, in <module>
    valid_elbo, valid_log_p_x = evaluate(cfg.n_samples, model, variational, valid_data)
  File "train_variational_autoencoder_pytorch.py", line 173, in evaluate
    elbo = log_p_x_and_z - log_q_z
RuntimeError: The size of tensor a (128) must match the size of tensor b (512) at non-singleton dimension 1

I very much hope that you can also provide a code implementation of this paper

Many people base their code on this paper (https://arxiv.org/pdf/1611.01144.pdf), but none of them implement the part for Bernoulli variables. According to https://arxiv.org/pdf/1611.00712.pdf, a Bernoulli-distributed hidden variable is a very important special case of Gumbel-Softmax, and its formula is different from that of the general Categorical distribution.

I think the VAE you implemented is very authoritative (because you insist on using the original form of the ELBO and only sum at the end, rather than taking an empirical average; in https://arxiv.org/pdf/1611.00712.pdf the authors also say that the ELBO should be expressed in its expectation form, and that including the KL divergence is actually unreasonable. I think you and they are of one mind), so I very much hope that, if you have time, you can also provide a code implementation of this paper (https://arxiv.org/pdf/1611.00712.pdf).
Thanks, :)
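
For readers curious about the Bernoulli special case mentioned in this issue, sampling from a binary Concrete relaxation can be sketched in a few lines. This is an illustration only, not part of the repository: the function names and parameter values are hypothetical, and the sampling rule X = sigmoid((log_alpha + L) / temperature), with L logistic noise, follows the description in https://arxiv.org/pdf/1611.00712.pdf.

```python
import math
import random


def sigmoid(a):
    """Numerically stable logistic function."""
    if a >= 0.0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def sample_binary_concrete(log_alpha, temperature, rng):
    """Draw one relaxed Bernoulli sample in (0, 1)."""
    # Clamp the uniform draw away from 0 and 1 so both logs are finite.
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    logistic_noise = math.log(u) - math.log(1.0 - u)
    return sigmoid((log_alpha + logistic_noise) / temperature)


rng = random.Random(0)
log_alpha, temperature = 1.0, 0.5   # hypothetical values
samples = [sample_binary_concrete(log_alpha, temperature, rng) for _ in range(5000)]

# Thresholding at 0.5 recovers a Bernoulli with probability sigmoid(log_alpha).
fraction_above_half = sum(s > 0.5 for s in samples) / len(samples)
print(fraction_above_half, sigmoid(log_alpha))
```

Lower temperatures push the samples toward 0 and 1, while the probability of landing above 0.5 stays at sigmoid(log_alpha) regardless of the temperature.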