The pytorch-adversarial-attack-baselines-for-imagenet-cifar10-mnist from hikmatkhan

PyTorch Adversarial Attack Baselines for ImageNet, CIFAR10, and MNIST

This repository provides simple PyTorch implementations for evaluating various adversarial attacks.
It reports the success rates of state-of-the-art attacks for each dataset.
- It uses attack libraries such as Advertorch and Foolbox.
If you have questions about this repository, please e-mail me ([email protected]) or open an issue.

What does the distance metric mean?

Generally, each pixel value is normalized to the range [0, 1].
A perturbation with an L0 norm of 1,000 changes 1,000 pixel values (L0 counts how many values differ from the original).
A perturbation with an L2 norm of 1.0 could change one pixel value by 255 (on the 0-255 scale), ten values by about 80, 100 values by about 25, or 1,000 values by about 8.
A perturbation with an Linf norm of 0.003922 (1/255) could change every pixel value by 1 on the 0-255 scale (Linf is the largest change applied to any single value).
A perturbation with an MSE of 0.001 or lower generally seems imperceptible to humans.
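
The arithmetic behind these statements can be checked with a few lines of plain Python. This is a minimal sketch, not code from the repository: `perturbation_metrics` and `sparse_delta` are hypothetical helper names, and the perturbation is represented as a flat list of per-value differences on the [0, 1] scale, assuming MSE is averaged over all values of one input.

```python
import math


def perturbation_metrics(delta):
    """Distances for a flat list of per-value differences (x_adv - x) on the [0, 1] scale."""
    squared = sum(d * d for d in delta)
    return {
        "L0": sum(1 for d in delta if d != 0.0),  # number of changed values
        "L2": math.sqrt(squared),                  # Euclidean length of the perturbation
        "Linf": max(abs(d) for d in delta),        # largest change to any single value
        "MSE": squared / len(delta),               # mean squared change over all values
    }


NUM_VALUES = 224 * 224 * 3  # 150,528 values in one ImageNet-sized input


def sparse_delta(changed, step):
    """Hypothetical helper: change `changed` values by step/255 and leave the rest untouched."""
    return [step / 255.0] * changed + [0.0] * (NUM_VALUES - changed)


# The four L2 examples: each should land close to an L2 norm of 1.0.
for changed, step in [(1, 255), (10, 80), (100, 25), (1000, 8)]:
    metrics = perturbation_metrics(sparse_delta(changed, step))
    print(changed, step, round(metrics["L2"], 4))

# The Linf example: every value moved by 1/255.
dense = perturbation_metrics(sparse_delta(NUM_VALUES, 1))
print(dense)
```

In the dense case the L2 norm comes out at about 1.52, which is the largest L2 distance an Linf perturbation of 1/255 can reach on a 224 x 224 x 3 input.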

1. ImageNet Dataset

This repository provides a small ImageNet validation set covering all 1,000 classes.
- It is a subset of the ImageNet validation dataset with 5 images per class (5,000 images in total).
The size of adversarial examples: 224 x 224 x 3 (150,528 parameters)
The basic architecture: ResNet-50 (top-1 accuracy: 76.06%)

1) Linf FGSM (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].

Epsilon size	1/255	2/255	4/255	8/255	16/255	32/255
Robust accuracy	10.96%	6.34%	5.40%	6.86%	8.98%	7.36%
Average L0 distance	148600	148600	148600	148600	148600	148600
Average L2 distance	1.5113	3.0161	6.0128	11.9674	23.7250	46.5814
Average MSE	0.00001518	0.00006047	0.00024037	0.00095245	0.00374475	0.01444550
Average Linf distance	0.003922	0.007843	0.015686	0.031373	0.062745	0.125490
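
For readers who want to see what the attack does, here is a minimal hand-written sketch of untargeted Linf FGSM in PyTorch. It is not the repository's code (the results above come from Advertorch and Foolbox); the function names, the toy linear model, and the random inputs are assumptions made only so the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def fgsm_linf(model, x, y, eps):
    """One-step untargeted Linf FGSM for inputs scaled to [0, 1]."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad, = torch.autograd.grad(loss, x_adv)
    # Move every value by eps in the direction that increases the loss,
    # then clip back to the valid pixel range.
    return (x_adv + eps * grad.sign()).clamp(0.0, 1.0).detach()


def robust_accuracy(model, x_adv, y):
    """Fraction of adversarial examples that are still classified correctly."""
    with torch.no_grad():
        return (model(x_adv).argmax(dim=1) == y).float().mean().item()


# Toy stand-ins (assumptions): a random linear classifier and random "images".
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()
x = torch.rand(4, 3, 8, 8)
y = torch.randint(0, 10, (4,))

x_adv = fgsm_linf(toy_model, x, y, eps=8 / 255)
print(robust_accuracy(toy_model, x_adv, y))
```

Because the sign of the gradient is applied to every value, almost all values change by exactly epsilon, which is consistent with the L0 distance staying constant across epsilon sizes in the table.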

2) Linf PGD (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].
The PGD attack runs for 7 steps.

Epsilon size	1/255	2/255	4/255	8/255	16/255	32/255
Robust accuracy	0.56%	0.06%	0.02%	0.02%	0.00%	0.00%
Average L0 distance	148808	147740	147280	147590	147956	148511
Average L2 distance	1.1457	2.1242	4.0156	7.7791	15.2752	29.9894
Average MSE	0.00000874	0.00003002	0.00010719	0.00040217	0.00155080	0.00597913
Average Linf distance	0.003922	0.007843	0.015686	0.031373	0.062745	0.125490
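
A 7-step Linf PGD attack can be sketched as repeated small FGSM-like steps, each followed by a projection back into the epsilon ball. The sketch below is hand-written for illustration and is not the Advertorch or Foolbox implementation; the step size (`2.5 * eps / steps`), the random start, and the toy model are assumptions, since the text above only fixes the number of steps.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_linf(model, x, y, eps, steps=7, step_size=None, random_start=True):
    """Untargeted Linf PGD for inputs scaled to [0, 1]."""
    if step_size is None:
        step_size = 2.5 * eps / steps  # assumed heuristic, not taken from the repository
    x = x.detach()
    x_adv = x.clone()
    if random_start:
        # Assumed: start from a random point inside the epsilon ball.
        x_adv = (x_adv + torch.empty_like(x_adv).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = x_adv.detach() + step_size * grad.sign()
        # Project back into the Linf ball around x, then into the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0.0, 1.0)
    return x_adv.detach()


# Toy stand-ins (assumptions): a random linear classifier and random "images".
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()
x = torch.rand(4, 3, 8, 8)
y = torch.randint(0, 10, (4,))

x_adv = pgd_linf(toy_model, x, y, eps=4 / 255)
print((x_adv - x).abs().max().item())
```

Unlike FGSM, the iterates do not have to end on the corner of the epsilon ball, which fits the table: the Linf distance equals epsilon while the average L2 distance is smaller than for FGSM at the same epsilon.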

3) L2 PGD (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].
The PGD attack runs for 7 steps.

Epsilon size	0.25	0.5	1.0	2.0	4.0	8.0
Robust accuracy	23.56%	5.78%	0.76%	0.14%	0.08%	0.06%
Average L0 distance	149952	150033	150127	150197	150236	150262
Average L2 distance	0.25	0.5	1.0	2.0	4.0	8.0
Average MSE	0.00000041	0.00000166	0.00000664	0.00002657	0.00010629	0.00042517
Average Linf distance	0.009559	0.016910	0.029153	0.050194	0.088268	0.156057
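
The L2 variant differs from Linf PGD in two places: the step follows the normalized gradient instead of its sign, and the projection rescales the perturbation so that its L2 norm does not exceed epsilon. The following is a minimal hand-written sketch under the same caveats as any illustration here: it is not the library code used for the results, and the step size, the zero start, and the toy model are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def pgd_l2(model, x, y, eps, steps=7, step_size=None):
    """Untargeted L2 PGD for 4D image batches scaled to [0, 1]."""
    if step_size is None:
        step_size = 2.5 * eps / steps  # assumed heuristic, not taken from the repository
    x = x.detach()
    x_adv = x.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        # Step along the gradient direction, normalized per example.
        g_norm = grad.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        x_adv = x_adv.detach() + step_size * grad / g_norm
        # Project onto the L2 ball of radius eps around x.
        delta = x_adv - x
        d_norm = delta.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, 1, 1, 1)
        delta = delta * torch.clamp(eps / d_norm, max=1.0)
        x_adv = (x + delta).clamp(0.0, 1.0)
    return x_adv.detach()


# Toy stand-ins (assumptions): a random linear classifier and random "images".
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()
x = torch.rand(4, 3, 8, 8)
y = torch.randint(0, 10, (4,))

x_adv = pgd_l2(toy_model, x, y, eps=0.5)
print((x_adv - x).flatten(1).norm(dim=1))
```

With this default step size the total step length over 7 steps exceeds epsilon, so the projection usually leaves the perturbation on or near the surface of the ball. That is consistent with the table, where the average L2 distance equals epsilon.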

4) L2 CW (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].
We set the binary search steps to 4.

Number of iterations	100
Robust accuracy	0.00%
Average L0 distance	148926
Average L2 distance	0.45
Average MSE	0.00000251
Average Linf distance	0.004477

5) L2 CW (Targeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].
We set the binary search steps to 4.
We assign a random target label to each original image.
The attack success rate counts only the cases in which the adversarial example is classified as the target class.

Number of iterations	100
Attack success rate	99.92%
Average L0 distance	148926
Average L2 distance	0.70
Average MSE	0.00000382
Average Linf distance	0.025385
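
The targeted setup described above (a random target label per image, and success only when the prediction equals that target) can be sketched as follows. This is an illustration rather than the repository's code: the function names are hypothetical, and excluding the true label when sampling the target is an assumption that the text does not state explicitly.

```python
import torch


def random_target_labels(y, num_classes, generator=None):
    """Sample one target label per example, assumed to differ from the true label."""
    # An offset in [1, num_classes - 1] guarantees target != y after the modulo.
    offset = torch.randint(1, num_classes, y.shape, generator=generator)
    return (y + offset) % num_classes


def targeted_success_rate(predictions, targets):
    """Fraction of adversarial examples classified as their assigned target class."""
    return (predictions == targets).float().mean().item()


gen = torch.Generator().manual_seed(0)
true_labels = torch.randint(0, 1000, (5000,), generator=gen)  # stand-in for the 5,000 labels
targets = random_target_labels(true_labels, num_classes=1000, generator=gen)

# A prediction that merely differs from the true label does not count as a success.
predictions = true_labels.clone()
print(targeted_success_rate(predictions, targets))
```

This stricter criterion is why the targeted tables report an attack success rate rather than robust accuracy.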

6) Boundary Attack (Untargeted)

Google Colab source code
Foolbox provides a Boundary Attack implementation.
Each pixel (parameter) value is normalized between [0, 1].
The basic untargeted method uses the Blended Uniform Noise Attack for initialization by default.
- This attack finds a noise image that is not classified as the original class.
Boundary Attack always succeeds because it starts from an adversarial image and every result image remains adversarial.

Number of iterations	100	500	1000
Robust accuracy	0.00%	0.00%	0.00%
Average L0 distance	114484	114372	117345
Average L2 distance	36.31	33.34	30.97
Average MSE	0.01487032	0.01322446	0.01143493
Average Linf distance	0.232628	0.215504	0.203864
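
The initialization idea can be illustrated with a rough sketch: draw uniform noise until it is misclassified, then blend it with the original image and keep the blend closest to the original that is still misclassified. This is a simplified hand-written version of that idea, not Foolbox's implementation; the function name, the number of draws, the number of blend steps, and the toy model are all assumptions.

```python
import torch
import torch.nn as nn


@torch.no_grad()
def blended_noise_init(model, x, y, draws=100, blend_steps=50):
    """Find a starting point that is not classified as y, as close to x as the search allows."""
    found = torch.zeros(len(x), dtype=torch.bool)
    noise = torch.zeros_like(x)
    for _ in range(draws):
        candidate = torch.rand_like(x)  # uniform noise in [0, 1]
        is_adv = model(candidate).argmax(dim=1) != y
        take = is_adv & ~found
        noise[take] = candidate[take]
        found |= is_adv
        if found.all():
            break
    best = noise.clone()
    done = ~found  # examples without adversarial noise are skipped
    # Walk from the original image towards the noise and keep the first misclassified blend.
    for frac in torch.linspace(0.0, 1.0, blend_steps + 1):
        blended = (1 - frac) * x + frac * noise
        is_adv = model(blended).argmax(dim=1) != y
        take = is_adv & ~done
        best[take] = blended[take]
        done |= take
    return best, found


# Toy stand-ins (assumptions): a random linear classifier and random "images".
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10)).eval()
x = torch.rand(4, 3, 8, 8)
y = toy_model(x).argmax(dim=1)  # use the model's own predictions as labels

start, found = blended_noise_init(toy_model, x, y)
print(found, (start - x).flatten(1).norm(dim=1))
```

Boundary Attack then refines such a starting point, which explains why the distances in the table are large after few iterations and shrink as the iteration count grows.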

7) Boundary Attack (Targeted)

Google Colab source code
Foolbox provides a Boundary Attack implementation.
Each pixel (parameter) value is normalized between [0, 1].
We assign a random target image and label to each original image.
The attack success rate counts only the cases in which the adversarial example is classified as the target class.
Boundary Attack always succeeds because it starts from an adversarial image and every result image remains adversarial.

Number of iterations	100	500	1000	3000	5000
Attack success rate	100.00%	100.00%	100.00%	100.00%	100.00%
Average L0 distance	150515	150517	150516	150512	150521
Average L2 distance	85.19	76.85	74.54	55.04	41.63
Average MSE	0.05373115	0.04324187	0.04044148	0.02298234	0.01335199
Average Linf distance	0.585978	0.534655	0.526516	0.416445	0.337554

2. CIFAR10 Dataset

The size of adversarial examples: 32 x 32 x 3 (3,072 parameters)
The basic architecture: ResNet-18 (top-1 accuracy: 95.28%)

1) Linf FGSM (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].

Epsilon size	1/255	2/255	4/255	8/255	16/255	32/255
Robust accuracy	67.40%	58.08%	52.53%	48.15%	35.29%	16.74%
Average L0 distance	3053	3053	3053	3053	3053	3053
Average L2 distance	0.2166	0.4328	0.8640	1.7232	3.4283	6.7710
Average MSE	0.00001529	0.00006098	0.00024307	0.00096716	0.00382864	0.01494123
Average Linf distance	0.003922	0.007843	0.015686	0.031373	0.062745	0.125490

2) Linf PGD (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].
The PGD attack runs for 7 steps.

Epsilon size	1/255	2/255	4/255	8/255	16/255	32/255
Robust accuracy	51.26%	26.75%	8.28%	1.04%	0.08%	0.00%
Average L0 distance	3047	3011	3001	3015	3029	3042
Average L2 distance	0.1867	0.3461	0.6319	1.1733	2.2501	4.4030
Average MSE	0.00001362	0.00003906	0.00013022	0.00044856	0.00164901	0.00631511
Average Linf distance	0.003922	0.007843	0.015686	0.031373	0.062745	0.125490

3) L2 PGD (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].
The PGD attack runs for 7 steps.

Epsilon size	0.25	0.5	1.0	2.0	4.0	8.0
Robust accuracy	31.74%	13.56%	2.38%	0.13%	0.00%	0.00%
Average L0 distance	3066	3067	3068	3069	3069	3069
Average L2 distance	0.25	0.5	1.0	2.0	4.0	8.0
Average MSE	0.00002035	0.00008138	0.00032551	0.00130208	0.00520833	0.02083333
Average Linf distance	0.025261	0.043822	0.077334	0.141376	0.267958	0.523127
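
The MSE row of this table follows directly from the L2 row. Assuming MSE is the squared L2 distance averaged over all 3,072 values of an input, and given that the average L2 distance equals epsilon in every column, the MSE should be close to epsilon squared divided by 3,072. The short check below uses only the numbers reported above; `mse_from_l2` is a hypothetical helper name.

```python
def mse_from_l2(l2_distance, num_values):
    """MSE of a perturbation whose L2 norm is l2_distance, spread over num_values values."""
    return l2_distance ** 2 / num_values


NUM_VALUES = 32 * 32 * 3  # 3,072 values in one CIFAR10 input

# Epsilon -> average MSE as reported in the table above.
reported_mse = {
    0.25: 0.00002035,
    0.5: 0.00008138,
    1.0: 0.00032551,
    2.0: 0.00130208,
    4.0: 0.00520833,
    8.0: 0.02083333,
}

for eps, reported in reported_mse.items():
    print(eps, f"{mse_from_l2(eps, NUM_VALUES):.8f}", f"{reported:.8f}")
```

The same relation holds for the ImageNet and MNIST L2 PGD tables when 3,072 is replaced by 150,528 or 784.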

4) Boundary Attack (Targeted)

Google Colab source code
Foolbox provides a Boundary Attack implementation.
Each pixel (parameter) value is normalized between [0, 1].
We assign a random target image and label to each original image.
The attack success rate counts only the cases in which the adversarial example is classified as the target class.
Boundary Attack always succeeds because it starts from an adversarial image and every result image remains adversarial.

Number of iterations	100	500	1000	3000	5000	10000
Attack success rate	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%
Average L0 distance	3071	3071	3071	3059	3062	3056
Average L2 distance	9.63	7.36	6.02	2.17	1.08	0.42
Average MSE	0.03306946	0.01989065	0.01330044	0.00196387	0.00054490	0.00008164
Average Linf distance	0.440098	0.344276	0.286082	0.110370	0.061091	0.032690

5) HopSkipJump Attack (Targeted)

Google Colab source code
Foolbox provides a HopSkipJump Attack implementation.
Each pixel (parameter) value is normalized between [0, 1].
We assign a random target image and label to each original image.
The attack success rate counts only the cases in which the adversarial example is classified as the target class.
HopSkipJump Attack always succeeds because the result images are always adversarial.
Each iteration includes 100 gradient approximation steps.

Number of iterations	10	30	50
Attack success rate	100.00%	100.00%	100.00%
Average L0 distance	3065	3059	3059
Average L2 distance	3.33	1.79	1.32
Average MSE	0.00402476	0.00115264	0.00062992
Average Linf distance	0.166263	0.093042	0.070633

3. MNIST Dataset

The size of adversarial examples: 28 x 28 x 1 (784 parameters)
The basic architecture: LeNet (top-1 accuracy: 98.99%)

1) Linf FGSM (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].

Epsilon size	1/255	2/255	4/255	8/255	16/255	32/255
Robust accuracy	98.86%	98.65%	98.19%	96.88%	91.80%	61.40%
Average L0 distance	459	459	459	459	459	459
Average L2 distance	0.0839	0.1671	0.3299	0.6554	1.3049	2.5937
Average MSE	0.00000901	0.00003569	0.00013916	0.00054924	0.00217703	0.00860155
Average Linf distance	0.003922	0.007843	0.015686	0.031373	0.062745	0.125490

2) Linf PGD (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].
The PGD attack runs for 7 steps.

Epsilon size	1/255	2/255	4/255	8/255	16/255	32/255
Robust accuracy	98.86%	98.65%	98.16%	96.73%	90.79%	46.27%
Average L0 distance	495	496	499	503	509	518
Average L2 distance	0.0830	0.1650	0.3258	0.6460	1.2826	2.5349
Average MSE	0.00000881	0.00003484	0.00013578	0.00053386	0.00210479	0.00822191
Average Linf distance	0.003922	0.007843	0.015686	0.031373	0.062745	0.125490

3) L2 PGD (Untargeted)

Google Colab source code
Advertorch and Foolbox show almost the same results.
Each pixel (parameter) value is normalized between [0, 1].
The PGD attack runs for 7 steps.

Epsilon size	0.25	0.5	1.0	2.0	4.0	8.0
Robust accuracy	97.91%	96.06%	86.44%	26.70%	0.10%	0.00%
Average L0 distance	672	672	672	672	670	673
Average L2 distance	0.25	0.5	1.0	2.0	4.0	8.0
Average MSE	0.00007970	0.00031879	0.00127532	0.00510183	0.02040816	0.08163265
Average Linf distance	0.045673	0.092980	0.190897	0.391071	0.752917	0.914406
