This repo contains code to reproduce the experiments presented in "Adversarial examples detection in features distance spaces". The code trains models for adversarial detection based on intermediate features of the attacked classifier embedded into dissimilarity spaces.
The main requirements are:
- Python 3
- pytorch 0.4 + torchvision
- tensorflow 1.8 + cleverhans
They can be installed with:
pip3 install -r requirements.txt
You will also need the NIPS DEV images and the ILSVRC'12 dataset. To set them up and replicate the experiments:
- Create the folder images/original in the project folder and put the NIPS DEV images in it
- Modify the IMAGENET variable in reproduce.sh to point to the folder containing the ILSVRC'12 dataset (the script will use the $IMAGENET/train/ folder)
- Run reproduce.sh:
./reproduce.sh
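Before launching the script, it can help to confirm that both input folders are in place. The snippet below is a minimal sketch of such a check and is not part of this repo: the helper name missing_inputs and its two parameters are hypothetical, and the only paths it assumes are the images/original and train/ folders described above.

```python
import os


def missing_inputs(project_dir, imagenet_dir):
    """Return the expected input folders that do not exist yet.

    Hypothetical helper: project_dir is the root of this project and
    imagenet_dir is the value you set for IMAGENET in reproduce.sh.
    """
    expected = [
        os.path.join(project_dir, "images", "original"),  # NIPS DEV images
        os.path.join(imagenet_dir, "train"),              # ILSVRC'12 train split
    ]
    return [path for path in expected if not os.path.isdir(path)]
```

An empty list means both folders were found; otherwise the returned paths are the ones still to be created or corrected.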
The reproduce.sh bash script runs all the steps needed to reproduce the experiments presented in the paper, that is:
- Feature extraction from the ILSVRC'12 TRAIN dataset
- Class centroid / medoid computation
- Generation of adversarial examples
- Training of multiple detectors
- Reproducing ROC plots
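To illustrate the centroid / medoid step and the idea of embedding features into a dissimilarity space, here is a minimal sketch on toy data. It is not the code used by reproduce.sh: the function names, the toy class labels, and the choice of Euclidean distance are assumptions made for illustration, and math.dist requires Python 3.8 or later.

```python
import math


def centroid(features):
    """Per-dimension mean of a list of equal-length feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def medoid(features):
    """Member vector with the smallest total Euclidean distance to the others."""
    return min(features, key=lambda f: sum(math.dist(f, g) for g in features))


def embed(feature, pivots):
    """Map one feature vector to its distances from each class pivot."""
    return [math.dist(feature, pivot) for pivot in pivots]


# Toy 2-D features for two hypothetical classes (assumed data).
features_by_class = {
    "class_a": [[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]],
    "class_b": [[10.0, 10.0], [12.0, 10.0], [11.0, 10.0]],
}
centroids = [centroid(v) for v in features_by_class.values()]
medoids = [medoid(v) for v in features_by_class.values()]

# One distance per class pivot: the input's coordinates in the dissimilarity space.
embedding = embed([11.0, 10.0], centroids)
```

In this sketch each input ends up with one distance per class, so the embedded vector has as many dimensions as there are classes. A detector would then be trained on such embedded vectors rather than on the raw features.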