kupl / adapt
ADAPT is an open-source white-box testing framework for deep neural networks.
License: MIT License
Apply ADAPT to various models, domains, and coverage metrics.
I picked one correctly-predicted image per CIFAR-10 category, set the metric to TKNC(30), and allowed 15 seconds of generation time per image. I compared ADAPT and DeepXplore following your tutorial, but I get poor results from ADAPT and good results from DeepXplore. Is this normal?
Here is the result :
Method | Total inputs (10 images) | Total adversarials (10 images) | Labels
---|---|---|---
ADAPT | 2047 | 654 | 20
DeepXplore | 2618 | 2394 | 87
"adapt/fuzzer/fuzzer.py line 173" Why do you divide the distance between the new image and the original image by the norm of the original image? I'm not sure what this corresponds to.
(, which introduced in) -> (, which is introduced in)
When perturbing an image, I don't think you constrain the pixel values to remain in valid ranges (e.g. for MNIST, this would be [0, 1]). I'm worried this means that many of the tests are not valid images. I've noticed this is done in DeepXplore's code as well; is this intentional? It seems like it may be unfair to test a classifier on inputs that aren't valid images.
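The constraint the question asks about is usually enforced with a clamp after each perturbation step. A minimal sketch, assuming NumPy arrays and a [0, 1] pixel range (the helper `perturb_and_clip` is illustrative, not part of ADAPT):

```python
import numpy as np

def perturb_and_clip(image, gradient, step=0.01, low=0.0, high=1.0):
    """Apply a gradient step, then clamp pixels back to the valid range.

    Without the clip, repeated perturbation can push pixels outside
    [low, high], producing tensors that no real image could have.
    """
    perturbed = image + step * gradient
    return np.clip(perturbed, low, high)

img = np.array([0.0, 0.5, 1.0])
grad = np.array([-1.0, 1.0, 1.0])
out = perturb_and_clip(img, grad, step=0.5)
print(out)  # → [0. 1. 1.]
```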
In the paper, you fixed the average L2 distance between the initial input and the mutants at 0.05.
Does the performance of ADAPT depend on the value of this L2 distance?
If so, what happens to the performance if I change the distance to a smaller value like 0.01?
There are a lot of hyperparameters used in the paper, like the number of selected neurons.
Is the effectiveness of ADAPT heavily influenced by the values of these hyperparameters?
When calculating neuron coverage, I initially expected each element of a layer's full activation tensor to be a neuron. However, in `adapt/network/network.py` (line 69), you take a tensor of dimensions H×W×C and turn it into a vector of dimension C by taking the mean over the first two dimensions. I believe DeepXplore does this as well, but I'm not sure I understand why.
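For reference, the averaging the question describes can be sketched in a few lines of NumPy. Each channel of a convolutional layer corresponds to one filter, so collapsing the H×W spatial positions leaves one value per filter; treating every spatial position as a separate neuron would make the neuron count explode for convolutional layers:

```python
import numpy as np

# A conv layer's activation map: height x width x channels.
activations = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)

# Average over the two spatial dimensions, leaving one value per
# channel -- each filter is then treated as a single "neuron".
per_channel = activations.mean(axis=(0, 1))
print(per_channel.shape)  # → (3,)
```

This is only a sketch of the tensor operation being asked about, not a claim about why the authors chose this definition.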
In the paper, we discussed the important features.
However, we did not discuss the redundant ones among the 29 features. Right?
We want to apply ADAPT to other deep neural network models beyond the ones used in the paper.
How can we do that?
Test LeNet-5 that trained for MNIST. -> Test LeNet-5 that is trained for MNIST.
If I understand ADAPT correctly, it seems that it only considers two types of neuron coverage as targets to increase.
As someone interested in deep neural networks, I am wondering whether ADAPT is also applicable to other coverage metrics, like surprise coverage, which was first introduced at ICSE '19 ("Guiding deep learning system testing using surprise adequacy").
Design a sophisticated algorithm that adapts the neuron-selection strategy
I tried to use test_vgg19.ipynb to run the VGG19 testing experiment, but the performance is worse than the examples in the paper, in terms of both "total adversarials" and "total inputs", even though I didn't modify any files or arguments in the ADAPT code.
By the way, the performance of the MNIST testing experiment matches the examples. I don't know the reason for this problem.
The output of `archives_adapt[0].summary()` in my testing:
```
Total inputs: 10
Average distance: 0.001166449161246419
Total adversarials: 0
Average distance: -
Coverage
Original: 0.04568829113924051
Achieved: 0.04720464135021097
Original label: Pomeranian
Count: 414
Average distance: 0.016496244817972183
Total inputs: 413
Average distance: 0.002656622789800167
Total adversarials: 0
Average distance: -
Coverage
Original: 0.04562236286919831
Achieved: 0.05195147679324894
```
And the example is:
```
Total inputs: 8471
Average distance: 0.02087555266916752
Total adversarials: 1135
Average distance: 0.0999956950545311
Coverage
Original: 0.04555643459915612
Achieved: 0.15928270042194093
Original label: Pomeranian
Count: 7336
Average distance: 0.008634363301098347
Label: hyena
Count: 217
Average distance: 0.05843096598982811
Label: meerkat
Count: 255
Average distance: 0.07762718945741653
Label: tick
Count: 224
Average distance: 0.10655633360147476
Label: guinea_pig
Count: 33
Average distance: 0.09237903356552124
Label: English_setter
Count: 44
Average distance: 0.10023358464241028
Label: Petri_dish
Count: 6
Average distance: 0.10553994029760361
Label: flatworm
Count: 343
Average distance: 0.14118780195713043
Label: shower_cap
Count: 13
Average distance: 0.04866204783320427
```