aaml

Adversarial AutoML testing suite

Code for the thesis Data Poisoning Attacks against Automated Machine Learning

Organization

Each AutoML framework has its own Docker container as its testing environment. Test accuracies for all frameworks can be obtained with the Python script "task.py". The following table lists the averaged results and the location of each framework. Google Cloud AutoML has no path because it was tested online, and H2O AutoML has no poisoning results because of its poor performance on clean data.

Framework	Path	Test Accuracy Clean	Test Accuracy Untargeted Ɛ = 4	Test Accuracy Targeted Ɛ = 64
AutoKeras	automl-tester/autokeras	88.25%	34.00%	27.21%
P-DARTS	automl-tester/darts	86.98%	30.33%	32.44%
Auto-Sklearn	automl-tester/auto-sklearn	86.52%	62.57%	52.39%
TPOT	automl-tester/tpot	84.74%	55.88%	46.48%
H2O	automl-tester/h2o	12.45%	-	-
MLJAR	automl-tester/mljar	86.55%	58.15%	30.56%
AutoGluon	automl-tester/autogluon	83.08%	33.28%	21.99%
GCloud AutoML	Tested Online	86.90%	86.15%	37.52%

The results were obtained using 10% of the training data from the Fashion-MNIST dataset and 100% of the test dataset.
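As an illustration of the 10% training subset, here is a minimal sketch of drawing a reproducible random sample of row indices. The function name `subsample_indices`, the fixed seed, and the use of plain uniform sampling are assumptions made for this example; the repository may select its subset differently (for instance, stratified by class).

```python
import random


def subsample_indices(n_rows, fraction=0.10, seed=0):
    """Return a sorted list of distinct row indices covering `fraction` of n_rows.

    Hypothetical helper: uniform sampling without replacement, seeded so
    that repeated runs pick the same rows.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    k = round(n_rows * fraction)
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_rows), k))


# Fashion-MNIST has 60,000 training images, so a 10% subset has 6,000 rows.
indices = subsample_indices(60_000)
print(len(indices))
```

The returned indices can then be used to select rows from the unzipped training CSV before it is passed to a framework.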

Preparations

The malicious and clean training data are located under /home/aaml/storage. They can be generated by launching the gpu-benchmarker/generatePoisons Docker container, but precomputed copies are also provided. Regenerating the data uses the benchmark model inside the storage folder, which was trained on the full Fashion-MNIST dataset.

The clean CSV files must be unzipped before use (they are located at /home/aaml/storage/data/clean/fashion-mnist):

unzip storage/data/clean/fashion-mnist/fashion-mnist_train.zip -d storage/data/clean/fashion-mnist/

Results for experiments are saved for each container under /aaml/server-output.

Once the data is in place, an experiment can be started, for example for AutoKeras:

docker-compose up --build automltester_autokeras

Architecture Evaluation

We also tested whether architectures generated with poisoned data are weaker than ones generated with clean data. To do so, we trained 100 architectures with clean/poisoned data, then saved each architecture, reset its weights, and retrained both kinds using clean data.

The results show 81.86% accuracy for clean architectures and 80.33% for poisoned architectures. This difference is not statistically significant, so the experiment gives no evidence that poisoned architectures are weaker than clean ones. The calculations can be reproduced by executing testArchitectures.py in the autokeras folder. Results for each architecture are in the results folder.
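To make the significance check concrete, here is a minimal sketch of a two-sided test on the difference between two groups of accuracies. It is an assumption that a Welch-style test fits this comparison; the README does not say which test testArchitectures.py applies. The function name `welch_test` is hypothetical, the p-value uses a normal approximation (reasonable only for larger samples), and the numbers in the usage lines are placeholders, not results from the thesis.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_test(a, b):
    """Return (statistic, p_value) for a two-sided Welch-style test.

    The statistic is the difference of means divided by the unpooled
    standard error. The p-value comes from the standard normal
    distribution, an approximation to the t distribution.
    """
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        raise ValueError("both samples have zero variance")
    stat = (mean(a) - mean(b)) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return stat, p_value


# Placeholder accuracies, for illustration only.
clean_acc = [0.82, 0.80, 0.83, 0.81, 0.84, 0.79]
poisoned_acc = [0.81, 0.79, 0.82, 0.80, 0.83, 0.78]
stat, p_value = welch_test(clean_acc, poisoned_acc)
print(f"statistic={stat:.3f}, p={p_value:.3f}")
```

A p-value above the chosen threshold (commonly 0.05) means the observed difference in mean accuracy could plausibly arise by chance.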
