Making use of generative models to defend against adversarial attacks. Here a Variational Auto-Encoder with multiple decoders is used to test the idea. Network structure:
Dependencies: TensorFlow 1.1.0
Repository layout:
- assets
- adversarial_attacks
- generative_classifier
  - VAE_subnet.py: training of the generative model
  - generator_models.py: network structures
- original_classifier
- utils
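The core idea behind the generative classifier can be sketched as follows: with one decoder (subnet) per class, an input is assigned to the class whose decoder reconstructs it with the lowest error. This is a minimal numpy sketch with toy linear decoders standing in for the VAE decoder subnets; all names (`decoder`, `classify`, the weight shapes) are illustrative and not taken from the repo.

```python
import numpy as np

rng = np.random.default_rng(0)

def decoder(z, W):
    # Toy linear "decoder" standing in for a per-class VAE decoder subnet.
    return np.tanh(z @ W)

n_classes, latent_dim, img_dim = 10, 8, 16
# One set of decoder weights per class (illustrative; the real model is a VAE).
weights = [rng.normal(size=(latent_dim, img_dim)) for _ in range(n_classes)]

def classify(x, z):
    # Pick the class whose decoder reconstructs x best (lowest L2 error).
    errs = [np.sum((decoder(z, W) - x) ** 2) for W in weights]
    return int(np.argmin(errs))

z = rng.normal(size=(latent_dim,))
x = decoder(z, weights[3])  # an input that class 3's decoder reconstructs exactly
print(classify(x, z))       # -> 3
```

Because adversarial perturbations tend to lie off the data manifold, reconstruction error under the class-conditional decoders gives a classification signal that is harder to attack with small input perturbations.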
You can find pretrained models in /assets/pretrained_generator and /assets/pretrained_lenet
You can also train the discriminator from scratch by running:
python ./original_classifier/lenet_mnist.py
To train the generator, run:
python ./generative_classifier/VAE_subnet.py
To monitor the training process, run:
$ tensorboard --logdir=./saved_logs
To obtain classification accuracy on MNIST and adversarial images, run:
python ./generative_classifier/classification.py
To obtain a church window plot, run:
python ./adversarial_attacks/plot_church_window.py
Here is a comparison of the overall accuracy with a binary-filtered classifier:
Visualization of different defense methods against gradient-based attacks, using church window plots:
The X axis is the adversarial gradient direction; the Y axis is an arbitrary orthogonal direction. Zoomed in around the origin:
Critical FGSM images at the decision boundary along the X axis:
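A church window plot samples the classifier's prediction over a 2D plane through the input: one axis follows the adversarial (FGSM) gradient direction, the other an orthogonal direction, and each grid cell is colored by the predicted label. This is a hedged numpy sketch of how such a grid can be built; the classifier `predict`, the dimensions, and the epsilon range are all illustrative stand-ins, not the repo's actual code.

```python
import numpy as np

rng = np.random.default_rng(1)

def predict(x):
    # Toy stand-in for the classifier: a fixed linear decision rule.
    w = np.ones_like(x)
    return int(x @ w > 0)

x0 = 0.01 * rng.normal(size=(16,))   # clean input (near the boundary, for illustration)
g = rng.normal(size=(16,))           # stand-in for the loss gradient at x0
g_dir = np.sign(g)                   # FGSM steps along the sign of the gradient
g_dir = g_dir / np.linalg.norm(g_dir)

# Build an orthogonal direction via one Gram-Schmidt step.
v = rng.normal(size=(16,))
v -= (v @ g_dir) * g_dir
v /= np.linalg.norm(v)

# Sample predictions over the plane spanned by g_dir (X) and v (Y).
eps = np.linspace(-0.5, 0.5, 41)
grid = np.array([[predict(x0 + ex * g_dir + ey * v) for ex in eps] for ey in eps])
print(grid.shape)  # -> (41, 41); coloring this grid by label gives the church window
```

The "critical FGSM images" mentioned above correspond to points along the X axis where the predicted label first flips, i.e. where the row of `grid` at Y = 0 changes value.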
- [ ] weight sharing in decoders
- [ ] smallNORB dataset
- [ ] overlapping MNIST