Comments (9)
Yes, the trick states that you should train D on one mini-batch of only real samples and one mini-batch of only synthetic samples. Why this performs better, I do not know.
@soumith Do you have any explanation as to why pooling samples is not recommended?
Batchnorm is a very tricky layer: after each forward pass through the discriminator D, the layer changes, namely its exponential moving average statistics accumulators get updated. Therefore calling D(real) and then D(fake) gives forward passes through slightly different networks. I suspect that by doing this, some extra information about the synthetic / real samples could be involuntarily leaked to the discriminator through batchnorm's statistics accumulators.
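For illustration, a minimal sketch of the accumulator update described above, assuming TF2/Keras (the tensors here are made-up stand-ins):

import tensorflow as tf

# Every training-mode forward pass updates batchnorm's moving statistics,
# so two successive calls go through a slightly different layer.
bn = tf.keras.layers.BatchNormalization()
x_real = tf.random.normal([16, 4])        # stand-in for a real mini-batch
x_fake = tf.random.normal([16, 4]) + 3.0  # stand-in for a synthetic mini-batch

bn(x_real, training=True)
print(bn.moving_mean.numpy())  # nudged toward the real batch's statistics
bn(x_fake, training=True)
print(bn.moving_mean.numpy())  # nudged again, now toward the fake batch's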
I ran a simple experiment in theano/lasagne: a simple 4-layer GAN trained to generate scikit-learn's circles dataset, with 10 updates of the discriminator per 1 update of the generator.
Without BN layers the networks trained slowly, but in the end the generator won. After introducing BN layers and feeding first the real samples D(X) and then the synthetic ones D(G(Z)), every experiment ended with the discriminator completely defeating the generator (the generator's output was also wildly unstable). Tuning the number of updates didn't solve the problem.
To remedy this, having observed the global effect of the batchnorm layer, I pooled the real and fake samples (lasagne's ConcatLayer along the batch axis), fed the joint batch through the discriminator, and then split D's output accordingly. This resulted in both faster training and a winning generator.
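The same pooling remedy could look like this in TF2/Keras (a sketch, not the original lasagne code; discriminator is any Keras model):

import tensorflow as tf

def pooled_discriminator_pass(discriminator, real_batch, fake_batch):
    # Concatenate real and fake samples along the batch axis so batchnorm
    # computes its statistics over the joint batch.
    joint = tf.concat([real_batch, fake_batch], axis=0)
    joint_out = discriminator(joint, training=True)
    # Split D's output back into its real and fake parts.
    n_real = tf.shape(real_batch)[0]
    return joint_out[:n_real], joint_out[n_real:]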
I wonder how one would implement this trick in code, e.g. in TensorFlow.
With a loss like this
disc_loss = -tf.reduce_mean(tf.log(disc_corpus_prediction) + tf.log(1 - disc_from_gen_prediction))
it is not obvious how to split the loss function into its parts.
Does anyone have a small example of how to do this?
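One possible answer, written as a minimal TF2/Keras sketch rather than the TF1 graph style of the snippet above (the models, sizes, and names are illustrative assumptions): split the combined objective into a real term and a fake term and minimize each on its own mini-batch.

import tensorflow as tf

discriminator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
generator = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2),
])
d_opt = tf.keras.optimizers.Adam(1e-4)
eps = 1e-8  # guards against log(0)

def train_d_step(real_batch, noise_batch):
    # Mini-batch of only real samples.
    with tf.GradientTape() as tape:
        d_real = discriminator(real_batch, training=True)
        loss_real = -tf.reduce_mean(tf.math.log(d_real + eps))
    grads = tape.gradient(loss_real, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))
    # Mini-batch of only synthetic samples.
    with tf.GradientTape() as tape:
        d_fake = discriminator(generator(noise_batch), training=True)
        loss_fake = -tf.reduce_mean(tf.math.log(1.0 - d_fake + eps))
    grads = tape.gradient(loss_fake, discriminator.trainable_variables)
    d_opt.apply_gradients(zip(grads, discriminator.trainable_variables))
    return loss_real, loss_fake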
I think the reason this trick works is partly described in this paper, especially in Section 3.2.
@spurra
Thanks for the reply.
In practice, do I need to train in this fashion: Train D(positive) -> Train G -> Train D(negative)? Or do I need to Train D(positive) -> Train D(negative) -> Train G?
@shuzhangcasia Train D(positive) -> Train D(negative) -> Train G makes more sense, as you first train D completely and then G can learn from D. I haven't seen the first ordering you mentioned, but that does not mean it would not work :)
I tried alternating D(positive) and D(negative) with G training, and the resulting GAN oscillated wildly. I got good results by training D(positive) and D(negative) each time before training G.
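A sketch of that schedule per training iteration, reusing train_d_step, discriminator, generator, and eps from the snippet above (the dataset iterator is assumed):

g_opt = tf.keras.optimizers.Adam(1e-4)

for real_batch, noise_batch in dataset:
    # D(positive) then D(negative), both before the generator update.
    train_d_step(real_batch, noise_batch)
    # One generator update per iteration.
    with tf.GradientTape() as tape:
        d_fake = discriminator(generator(noise_batch, training=True), training=True)
        g_loss = -tf.reduce_mean(tf.math.log(d_fake + eps))
    grads = tape.gradient(g_loss, generator.trainable_variables)
    g_opt.apply_gradients(zip(grads, generator.trainable_variables))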
This trick is working for me. However, do you have any reference or ideas as to why putting real and fake examples in the same batch does not work? Thanks :D
My discriminator is unable to learn anything when I create two separate batches, even if I don't update the generator at all...
@vojavocni That is typical of a bug in the implementation. Check your code; your error is not in the loss.