Comments (9)
Yes, the trick states that you should train D on one mini-batch of only real samples and one mini-batch of only synthetic samples. Why this performs better, I do not know.
@soumith Do you have any explanation as to why pooling samples is not recommended?
Batchnorm is a very tricky layer: after each forward pass through the discriminator D, the layer changes, namely its exponential moving average statistics accumulators get updated. Therefore calling D(real) and then D(fake) gives forward passes through slightly different networks. I suspect that by doing this, some extra information about the synthetic / real samples could be involuntarily leaked to the discriminator through batchnorm's statistics accumulators.
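For illustration, a minimal sketch of the accumulator update described above, assuming TF2/Keras (the tensors here are made-up stand-ins):

import tensorflow as tf

# Every training-mode forward pass updates batchnorm's moving statistics,
# so two successive calls go through a slightly different layer.
bn = tf.keras.layers.BatchNormalization()
x_real = tf.random.normal([16, 4])        # stand-in for a real mini-batch
x_fake = tf.random.normal([16, 4]) + 3.0  # stand-in for a synthetic mini-batch

bn(x_real, training=True)
print(bn.moving_mean.numpy())  # nudged toward the real batch's statistics
bn(x_fake, training=True)
print(bn.moving_mean.numpy())  # nudged again, now toward the fake batch's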
I ran a simple experiment in theano/lasagne: a simple 4-layer GAN trained to generate scikit-learn's circles dataset, with 10 updates of the discriminator per 1 update of the generator.
Without BN layers the networks trained slowly, but in the end the generator won. After introducing BN layers and feeding first the real samples D(X) and then the synthetic ones D(G(Z)), every experiment ended with the discriminator completely defeating the generator (the generator's output was also wildly unstable). Tuning the number of updates didn't solve the problem.
To remedy this, having observed the global effect of the batchnorm layer, I pooled the real and fake samples (lasagne's ConcatLayer along the batch axis), fed the joint batch through the discriminator, and then split D's output accordingly. This resulted in both faster training and a winning generator.
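The same pooling remedy could look like this in TF2/Keras (a sketch, not the original lasagne code; discriminator is any Keras model):

import tensorflow as tf

def pooled_discriminator_pass(discriminator, real_batch, fake_batch):
    # Concatenate real and fake samples along the batch axis so batchnorm
    # computes its statistics over the joint batch.
    joint = tf.concat([real_batch, fake_batch], axis=0)
    joint_out = discriminator(joint, training=True)
    # Split D's output back into its real and fake parts.
    n_real = tf.shape(real_batch)[0]
    return joint_out[:n_real], joint_out[n_real:]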
I wonder how one would implement this trick in code, e.g. in TensorFlow.
With a loss like this
disc_loss = -tf.reduce_mean(tf.log(disc_corpus_prediction) + tf.log(1 - disc_from_gen_prediction))
it is not obvious how to split the loss function into its parts.
Does anyone have a small example of how to do this?
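One possible answer, written as a minimal TF2/Keras sketch rather than the TF1 graph style of the snippet above (the models, sizes, and names are illustrative assumptions): split the combined objective into a real term and a fake term and minimize each on its own mini-batch.

import tensorflow as tf

discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),
])
d_opt = tf.keras.optimizers.Adam(1e-4)
eps = 1e-8  # guards against log(0)

def train_d_step(real_batch, noise_batch):
    # Mini-batch of only real samples.
    with tf.GradientTape() as tape:
        d_real = discriminator(real_batch, training=True)
        loss_real = -tf.reduce_mean(tf.math.log(d_real + eps))
    grads = tape.gradient(loss_real, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))
    # Mini-batch of only synthetic samples.
    with tf.GradientTape() as tape:
        d_fake = discriminator(generator(noise_batch), training=True)
        loss_fake = -tf.reduce_mean(tf.math.log(1.0 - d_fake + eps))
    grads = tape.gradient(loss_fake, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))
    return loss_real, loss_fake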
I think the reason this trick works is partly described in this paper, especially in Section 3.2.
@spurra
Thanks for the reply.
In practice, do I need to train in this fashion: Train D(positive) -> Train G -> Train D(negative)? Or do I need to Train D(positive) -> Train D(negative) -> Train G?
@shuzhangcasia Train D(positive) -> Train D(negative) -> Train G makes more sense, as you first train D completely and then G can learn from D. I haven't seen the first ordering you mentioned, but that does not mean it would not work :)
I tried alternating D(positive) and D(negative) with G training, and the resulting GAN oscillated wildly. I got good results by training D(positive) and D(negative) each time before training G.
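A sketch of that schedule per training iteration, reusing train_d_step, discriminator, generator, and eps from the snippet above (the dataset iterator is assumed):

g_opt = tf.keras.optimizers.Adam(1e-4)

for real_batch, noise_batch in dataset:
    # D(positive) then D(negative), both before the generator update.
    train_d_step(real_batch, noise_batch)
    # One generator update per iteration.
    with tf.GradientTape() as tape:
        d_fake = discriminator(generator(noise_batch, training=True), training=True)
        g_loss = -tf.reduce_mean(tf.math.log(d_fake + eps))
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))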
This trick is working for me. However, do you have any reference or ideas as to why putting real and fake examples in the same batch does not work? Thanks :D
My discriminator is unable to learn anything when I create two separate batches, even if I don't update the generator at all...
@vojavocni That is typical of a bug in the implementation. Check your code; your error is not in the loss.