jiamings / wgan

TensorFlow implementation of Wasserstein GAN (and improved version in wgan_v2)

wgan's Introduction

Wasserstein GAN

TensorFlow implementation of Wasserstein GAN.

Two versions:

  • wgan.py: the original weight-clipping method.
  • wgan_v2.py: the gradient penalty method (Improved Training of Wasserstein GANs); a sketch of the penalty term follows the run example below.

How to run (an example):

python wgan_v2.py --data mnist --model mlp --gpus 0
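For reference, here is a minimal sketch of the gradient penalty term from the improved-training paper. It is illustrative only: gradient_penalty and critic are hypothetical names, not identifiers from this repository, and samples are assumed to be flattened to shape [batch, dim] as in the mlp model.

    import tensorflow as tf

    def gradient_penalty(critic, real, fake, lam=10.0):
        # Interpolate randomly between real and generated samples.
        eps = tf.random_uniform([tf.shape(real)[0], 1], 0.0, 1.0)
        x_hat = eps * real + (1.0 - eps) * fake
        # Gradient of the critic output with respect to the interpolates.
        ddx = tf.gradients(critic(x_hat), x_hat)[0]
        ddx_norm = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1))
        # Penalize deviations of the gradient norm from 1 (lambda = 10 in the paper).
        return lam * tf.reduce_mean(tf.square(ddx_norm - 1.0))

The returned penalty is added to the critic's Wasserstein loss before the optimizer step.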

wgan's People

Contributors: jiamings

wgan's Issues

Weight clipping should occur AFTER critic update

Thanks again for providing this code. One thing I wanted to point out: in the pseudo-code provided in the paper, the weights are clipped after the gradient update for the critic. You would just need to move sess.run(d_clip) after the RMSProp update.
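A minimal sketch of the suggested ordering. d_clip is the clipping op mentioned above; d_rmsprop, n_critic, and the feed tensors are hypothetical names standing in for the repo's actual variables.

    # Hypothetical training-loop excerpt: clip AFTER the critic's gradient step,
    # matching Algorithm 1 of the WGAN paper.
    for _ in range(n_critic):
        # bx, bz: a real-data batch and a noise batch (placeholders here)
        sess.run(d_rmsprop, feed_dict={x: bx, z: bz})  # critic gradient step first
        sess.run(d_clip)                               # then clip the critic weights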

Batch Normalization

Thanks for sharing the code. It is elegant and well structured, and I'm going to use it as a starting point.
But there might be a glitch in the batch normalization.
Are you using batch normalization in the same way during training and testing?
Or am I missing something in the paper that suggests using batch normalization in this way?

If you're using BN in the usual sense, it seems that in your training the moving_mean and moving_variance are not updated.
Check this link: tensorflow/tensorflow#1122
And perhaps also this: http://r2rt.com/implementing-batch-normalization-in-tensorflow.html
A way to update the variables might be the following:

    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        update_ops = tf.no_op()  # running this op triggers the collected BN updates

Please let me know if I missed anything.
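For context, the more common TF1 pattern is to wrap the train op itself, so the BN statistics update on every step. A sketch, where optimizer, d_loss, and d_train_op are illustrative names rather than the repo's actual ops:

    # optimizer could be e.g. tf.train.RMSPropOptimizer(5e-5); d_loss is the critic loss.
    update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    with tf.control_dependencies(update_ops):
        d_train_op = optimizer.minimize(d_loss)  # moving averages update with each step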

Wrong objective

self.g_loss = tf.reduce_mean(self.d_)

I think that, according to the paper, a minus sign is missing in this line: the paper optimizes the generator by minimizing -D(G(z)).

Why is the loss in wgan.py different from the original paper?

I am confused about which one is correct.
As implemented in wgan.py, we have
self.g_loss = tf.reduce_mean(self.d_)
self.d_loss = tf.reduce_mean(self.d) - tf.reduce_mean(self.d_)
However, according to the original WGAN paper, it seems that we should minimize (-1)*self.g_loss instead of self.g_loss. Could you tell me why the losses are implemented in the above form? In any case, I can still get reasonable results with either wgan.py or wgan_v2.py, which confuses me even more.

How about the following losses instead?
self.g_loss = tf.reduce_mean(tf.scalar_mul(-1, self.d_))
self.d_loss = tf.reduce_mean(self.d_) - tf.reduce_mean(self.d)

Thank you!
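For anyone hitting the same question: the two conventions differ only by a global sign flip of the critic, so gradient descent yields the same dynamics either way. A minimal sketch (my own illustration, not the author's code; d and d_ stand for self.d and self.d_):

    # Convention in wgan.py: minimizing d_loss drives D(real) down and D(fake) up,
    # so the network effectively learns f = -D, where f is the paper's critic.
    g_loss_repo  = tf.reduce_mean(d_)                      # = -E[f(G(z))]
    d_loss_repo  = tf.reduce_mean(d) - tf.reduce_mean(d_)  # = -(E[f(x)] - E[f(G(z))])

    # Convention as written in the paper (signs flipped back):
    g_loss_paper = -tf.reduce_mean(d_)
    d_loss_paper = tf.reduce_mean(d_) - tf.reduce_mean(d)

    # Either pair trains correctly; the critic's sign convention is arbitrary,
    # which is why both versions produce reasonable samples.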

License question

Hey there, thanks a lot for this TensorFlow implementation of WGAN. I was just wondering whether it's actually free to use. Without an explicit license in the repo, the code defaults to all rights reserved, which makes it effectively unusable by other people (including potential contributors): https://choosealicense.com/no-permission/

It's fine if that is actually your intent; just making sure!

Thank you for posting this. I have some TensorFlow language candy to share.

Thank you for sharing your code. It actually helped me a lot!

Instead of ddx = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1)),
you can use ddx = tf.norm(ddx, axis=1); I tried it and it gives the same result.
For the discriminator in the mlp model, I use tf.layers, which saves a lot of lines of code, as below.

def discriminator(x):
    with tf.variable_scope('discriminator'):
        # Reshape flat MNIST vectors into NHWC images.
        nn_x  = tf.reshape(x, [tf.shape(x)[0], 28, 28, 1])
        conv1 = tf.layers.conv2d(nn_x, filters=64, kernel_size=4, strides=2, activation=leaky_relu)
        conv2 = tf.layers.conv2d(conv1, filters=128, kernel_size=4, strides=2, activation=leaky_relu)
        bn    = tf.layers.batch_normalization(conv2, training=True)
        flt   = tf.contrib.layers.flatten(bn)
        dense = tf.layers.dense(flt, 1024, activation=leaky_relu)  # leaky_relu defined elsewhere, e.g. tf.nn.leaky_relu
        logits = tf.layers.dense(dense, 1)  # single unbounded critic score, no activation
        return logits

Generator loss Interpretation

In standard GANs, the generator is optimized so that D mistakes a generated sample for a real one, and the discriminator's output (sigmoid activation) represents the probability that a sample comes from the real (1) rather than the generated (0) distribution.

However, in WGANs the losses are defined as Dloss = D(real) - D(fake) and Gloss = D(fake). We minimize both losses in an alternating fashion, with more iterations for the critic.

What does minimizing Gloss = D(fake) mean? Since the critic's output does not correspond to a probability or any other direct physical quantity, what does minimizing Gloss actually do?
