ebagdasa / backdoors101
Backdoors Framework for Deep Learning and Federated Learning. A light-weight tool to conduct your research on backdoors.
License: MIT License
Hi,
I'm researching defenses against the blind backdoor attack. I have a couple of questions regarding the backdoors 101 framework w.r.t. defenses:
I can't seem to find the implementations of the defenses (NC and SentiNet) mentioned in the Readme. Are these implemented and if so: how are they implemented and how could I add new defenses myself?
Are the backdoor tasks that change the task of the model (MultiMNIST addition, MultiMNIST multiply) also implemented?
Does the framework provide anything to evaluate the models retrieved from the training process?
Thanks for your help in advance. If possible, I will contribute some defenses after my research is done.
Hi, thanks for the code!
I have some questions regarding the code implementation.
On line 119 of attack.py, I think the purpose is to scale the local update of a compromised client so that it can replace the global model, as described in equation (3) of the paper How To Backdoor Federated Learning. In the implementation, the scaling factor is set to self.params.fl_weight_scale, and in the config file it is set to the total number of participants. However, I think this is not correct, because it does not take the parameter fl_eta (the server-side step size) into account, which is used here to perform the global weight update. It also ignores the fact that the training protocol allows partial participation, as implied by this line here. As I understand it, the scaling factor should be num_of_participants_at_the_attacked_round / fl_eta.
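The arithmetic behind the proposed factor can be checked on a toy example. The sketch below assumes the server applies G <- G + (eta / n) * (sum of submitted updates), where n is whatever count the server divides by, and that benign participants submit zero updates. The helpers aggregate and replacement_update are hypothetical and are not functions from the repository.

```python
# Toy model-replacement arithmetic. All names are illustrative assumptions.

def aggregate(global_w, updates, eta, n):
    """Server step: G <- G + (eta / n) * sum of submitted updates."""
    scale = eta / n
    summed = [sum(column) for column in zip(*updates)]
    return [g + scale * s for g, s in zip(global_w, summed)]


def replacement_update(global_w, target_w, eta, n):
    """Attacker submits gamma * (X - G) with gamma = n / eta."""
    gamma = n / eta
    return [gamma * (x - g) for x, g in zip(target_w, global_w)]


global_w = [0.5, -1.0, 2.0]   # current global model G
target_w = [1.5, 0.0, -2.0]   # backdoored model X the attacker wants installed
eta, n = 0.5, 100             # assumed server step size and divisor

# Assumption: benign participants have converged and submit zero updates.
benign = [[0.0, 0.0, 0.0] for _ in range(9)]
malicious = replacement_update(global_w, target_w, eta, n)
new_global = aggregate(global_w, benign + [malicious], eta, n)
print(new_global)  # matches target_w up to floating-point error
```

With gamma = n / eta, the attacker's scaled update cancels the server's eta / n factor, so the divisor n must be the one the server actually uses in its update rule.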
In the model simple.py, F.log_softmax is applied. But later, the attack uses nn.CrossEntropyLoss, which ends up "normalizing" the neural net's output twice. This seems a bit odd to me. Is there a specific reason for this?
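One way to see what the double normalization does numerically: nn.CrossEntropyLoss combines a log-softmax with a negative log-likelihood loss, and log-softmax is idempotent because the exponentials of its outputs already sum to 1. The pure-Python sketch below checks this on a single example. The helper names are illustrative and PyTorch is not used.

```python
import math


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    lse = m + math.log(sum(math.exp(v - m) for v in scores))
    return [v - lse for v in scores]


def cross_entropy(scores, target):
    """Log-softmax followed by negative log-likelihood, for one example."""
    return -log_softmax(scores)[target]


logits = [2.0, -1.0, 0.5]
once = log_softmax(logits)    # what a model ending in log_softmax would output
twice = log_softmax(once)     # what the loss computes internally on that output

loss_raw = cross_entropy(logits, 0)     # loss on raw logits
loss_double = cross_entropy(once, 0)    # loss on already-normalized output
print(loss_raw, loss_double)
```

Under this reading the extra F.log_softmax leaves the loss value unchanged, although only the author can say whether it is intentional.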
Thank you!
Hi there, I would like to ask how you measure the effectiveness of the attack. For instance, I tried to launch a pixel-pattern attack on CIFAR-10 using the code. The paper "Blind Backdoors in Deep Learning Models" reports a main-task accuracy and a backdoor-task accuracy.
Is it possible to produce these results with the code? If so, how do I proceed? If not, what other measures can be used to assess the effectiveness of an attack?
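As a rough illustration of the two measures named above, the sketch below computes main-task accuracy on clean inputs and backdoor accuracy on triggered inputs from lists of predicted labels. The predictions, labels, and the accuracy helper are made up for the example and do not come from the framework.

```python
def accuracy(predictions, labels):
    """Percentage of predictions that equal the expected labels."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical model outputs on clean test inputs and their true labels.
clean_preds = [3, 5, 1, 7]
clean_labels = [3, 5, 2, 7]

# Hypothetical outputs on the same inputs with the trigger applied.
# The backdoor succeeds when the model outputs the attacker-chosen label.
backdoor_label = 8
triggered_preds = [8, 8, 8, 4]

main_acc = accuracy(clean_preds, clean_labels)
backdoor_acc = accuracy(triggered_preds, [backdoor_label] * len(triggered_preds))
print(f"main-task accuracy: {main_acc:.2f}, backdoor accuracy: {backdoor_acc:.2f}")
```

A successful attack keeps the first number close to that of an unpoisoned model while pushing the second one up.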
Hello, I am working on my graduation project, which focuses on person recognition.
However, I could not find the PIPA dataset on the Internet, since the public link to the dataset no longer works.
Could you share PIPA? Thanks very much in advance.
Looking forward to your reply.
Hey, Eugene Bagdasaryan.
Thanks a lot for sharing the code for "How To Backdoor Federated Learning".
However, I ran into a problem when trying to run cifar_fed with the default settings in your code:
python training.py --name cifar10 --params configs/cifar_fed.yaml
I got a very low accuracy for Test_backdoor_True.
I'd really appreciate it if you could tell me why.
Thanks a lot.
Hi,
the save_model function does not properly save the best checkpoint. The reason being the following two lines of code.
Line 50 in 70869e5
Line 138 in 70869e5
Fix:
Change the initial value of self.best_loss and modify the comparison (maybe rename self.best_loss and val_loss as well):
self.best_loss = float(0)
if val_loss >= self.best_loss:
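The proposed fix implies that the tracked value behaves like an accuracy, where higher is better. A minimal sketch of best-checkpoint tracking under that assumption follows. BestCheckpointTracker is a hypothetical class, not part of the framework.

```python
class BestCheckpointTracker:
    """Tracks the best score seen so far, assuming higher is better."""

    def __init__(self):
        # Start at the worst possible accuracy so the first epoch always wins.
        self.best_accuracy = 0.0

    def update(self, accuracy):
        """Return True when this epoch should be saved as the best checkpoint."""
        if accuracy >= self.best_accuracy:
            self.best_accuracy = accuracy
            return True
        return False


tracker = BestCheckpointTracker()
decisions = [tracker.update(acc) for acc in [71.2, 69.8, 74.5, 74.5]]
print(decisions)
```

If the tracked value were a true loss instead, the initial value would need to be float('inf') and the comparison reversed.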
Hi there,
As I am a beginner in federated learning and backdoor attacks, could you tell me how to view the training results in TensorBoard? Nothing shows up there.
Even when I abort the training, I get the error "Aborted training. No output generated". I have created the folders "runs" and "saved_models" as described in the instructions.
I'm new to this area and want to reproduce your work, but after 'python training.py --name mnist --params configs/mnist_params.yaml --commit none' finishes, I can't get a clear result.
I can see some output while the program is running, but there are no logs in runs/ or saved_models/, and TensorBoard reports 'No scalar data was found.'
The output looks like this:
2022-11-26 22:03:54 - WARNING - Backdoor True . Epoch: 349. Accuracy: Top-1: 100.00 | Loss: value: 0.00
0it [00:00, ?it/s]2022-11-26 22:03:54 - INFO - Epoch: 350. Batch: 0/938. Losses: ['backdoor: 0.00', 'normal: 0.00', 'total: 0.00']. Scales: ['backdoor: 0.25', 'normal: 0.75']
99it [00:03, 28.02it/s]2022-11-26 22:03:58 - INFO - Epoch: 350. Batch: 100/938. Losses: ['backdoor: 0.00', 'normal: 0.00', 'total: 0.00']. Scales: ['backdoor: 0.23', 'normal: 0.77']
197it [00:07, 28.73it/s]2022-11-26 22:04:01 - INFO - Epoch: 350. Batch: 200/938. Losses: ['backdoor: 0.00', 'normal: 0.00', 'total: 0.00']. Scales: ['backdoor: 0.21', 'normal: 0.79']
Hi! What do eta and fl_weight_scale stand for in the federated learning setup? Thank you!
In the update_global_model function, I don't understand the role of the variable model_weight. It doesn't seem to be used. Could you explain how it is used? Thanks!
I don't know if this affects the rest of how everything runs. You seem to have pyyaml in requirement.txt though.
When I try to run training with 'python training.py --name mnist --params configs/mnist_params.yaml --commit none', the following error occurs:
Traceback (most recent call last):
  File "training.py", line 119, in <module>
    helper = Helper(params)
  File "D:\lab\backdoors101\helper.py", line 40, in __init__
    self.make_task()
  File "D:\lab\backdoors101\helper.py", line 64, in make_task
    self.task = task_class(self.params)
  File "D:\lab\backdoors101\tasks\task.py", line 43, in __init__
    self.init_task()
  File "D:\lab\backdoors101\tasks\task.py", line 49, in init_task
    self.model = self.model.to(self.params.device)
AttributeError: 'NoneType' object has no attribute 'to'
I then found that the function build_model() in class Task is 'NotImplemented'. Does this mean that I have to change the code before I can use 'python training.py --name mnist --params configs/mnist_params.yaml --commit none'?
Hi, Eugene Bagdasaryan,
Congratulations on the acceptance of your paper "Blind Backdoors in Deep Learning Models", and thanks for sharing its code.
However, when we run your code on the CIFAR-10 and GTSRB datasets with the default settings, we get a very low benign accuracy (CIFAR: BA: 18.24, ASR: 98.64; GTSRB: BA: 5.7, ASR: 100). (PS: we get satisfactory results on MNIST (BA: 98.86, ASR: 99.99).) We are not sure where the problem lies or whether you used different settings in the experiments of your paper. Could you kindly help us with this problem?
Besides, we also reimplemented the attack in our open-source toolbox (https://github.com/THUYimingLi/BackdoorBox/blob/main/core/attacks/Blind.py) based on your code, and we encounter the same problem. I would be very grateful if you could also help us check our reimplementation.
Best regards,
Yiming Li
Hi,
Thanks for sharing the code.
I am trying to reproduce the results in the USENIX paper Blind Backdoors in Deep Learning Models that evade the Neural Cleanse defense. I am using the MNIST dataset. I assume that if I uncomment the line "- neural_cleanse" in "loss_tasks" in configs/mnist_params.yaml, this should give the same loss function as the one described in Section 6.1 of the paper. Correct me if this is not the case.
So I train a model using the above setting, which is supposed to evade the detection by Neural Cleanse. However, when I use Neural Cleanse to scan this trained model, I get an anomaly index larger than 2, which means the trained model is still considered to be backdoored.
Is there anything not configured properly? Would you be able to take a look? I'd really appreciate it.
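For reference, the anomaly index described in the Neural Cleanse paper is a median-absolute-deviation outlier score over the L1 norms of the reverse-engineered per-label masks, with values above 2 flagged. A minimal sketch follows, using made-up norms and a hypothetical helper name. It is not the official implementation.

```python
import statistics


def anomaly_indices(mask_l1_norms, consistency=1.4826):
    """MAD-based outlier score for each label's mask norm.

    Assumes the norms are not all identical, otherwise the MAD is zero.
    """
    med = statistics.median(mask_l1_norms)
    mad = statistics.median([abs(v - med) for v in mask_l1_norms])
    return [abs(v - med) / (consistency * mad) for v in mask_l1_norms]


# Made-up L1 norms for ten labels; label 4 needs an unusually small mask.
norms = [95.0, 102.0, 98.0, 101.0, 20.0, 99.0, 97.0, 103.0, 100.0, 96.0]
indices = anomaly_indices(norms)
median_norm = statistics.median(norms)

# Only unusually small masks are treated as backdoor candidates.
flagged = [label for label, (idx, norm) in enumerate(zip(indices, norms))
           if idx > 2 and norm < median_norm]
print(flagged)
```

An evasion attempt would aim to keep the backdoor label's mask norm close to the median, so that its index stays below the threshold.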
I saw that there are different attacks in synthesizer.py, but I cannot find any code related to the semantic backdoor attack. Is it implemented in the code?
There are multiple issues with installing the pinned versions of the packages:
error: subprocess-exited-with-error
ERROR: Ignored the following versions that require a different python version: 0.7 Requires-Python >=3.6, <3.7; 0.8 Requires-Python >=3.6, <3.7 ERROR: Could not find a version that satisfies the requirement torchtext~=0.7.0 (from versions: 0.1.1, 0.2.0, 0.2.1, 0.2.3, 0.3.1, 0.4.0, 0.5.0, 0.6.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.1, 0.15.2) ERROR: No matching distribution found for torchtext~=0.7.0
Dear authors, thank you very much for your nice work! From the two papers you wrote, I found some of your experiments were run on either 2 or 4 Titan X GPUs. I was wondering if some experiments in this repo, e.g., cifar federated, can run on multiple GPUs as well? Could you please point me to the point where this is achieved in the code (I couldn't find any code related to torch.dp or torch.ddp)? Many thanks!
Does the fact that the function synthesizes_inputs is not implemented mean that none of the attacks in the paper are implemented yet, or only batch poisoning?
Hi there, I was wondering where are the datasets (e.g. CIFAR-10) stored? As I am trying to launch a backdoor attack with image-scaling, how or where can I store my own images? After training with the poisoned images, a model will be saved into the saved_models folder as shown here:
From here, how should I proceed to test whether the attack is successful?
I am sorry for these questions as I am still a beginner in machine learning.
Cifar is downloaded using PyTorch
Originally posted by @ebagdasa in #12 (comment)
Hi there, I was wondering if there is a way to insert my own image into the dataset, as I am performing an image-scaling attack. Otherwise, will ImageNet work instead?
Hi @ebagdasa,
Thanks for sharing code.
I am trying to run cifar_fed with the command
python training.py --name cifar --params configs/cifar_fed.yaml --commit none
I am a little confused about the parameter fl_eta.
In the function run_fl_round (training.py), the variable round_participants,
round_participants = hlpr.task.sample_users_for_round(epoch)
uses the parameter fl_no_models (cifar_fed.yaml) to decide how many users send weight updates to the server, for example 10 in cifar_fed.yaml.
Then, the code
hlpr.task.update_global_model(weight_accumulator, global_model)
calls the function update_global_model (fl_task.py).
In the function update_global_model (fl_task.py),
def update_global_model(self, weight_accumulator, global_model: Module):
    for name, sum_update in weight_accumulator.items():
        if self.check_ignored_weights(name):
            continue
        scale = self.params.fl_eta / self.params.fl_total_participants
        average_update = scale * sum_update
        self.dp_add_noise(average_update)
        model_weight = global_model.state_dict()[name]
        model_weight.add_(average_update)
sum_update is the sum of all users' updates, which I would expect to be divided by the value of fl_no_models. In the code, however, you use the variable scale
scale = self.params.fl_eta / self.params.fl_total_participants
average_update = scale * sum_update
to process sum_update. I couldn't find any explanation of this logic in the papers or in the code comments.
Would you mind giving more details about the usage of fl_eta?
My questions are:
1. Why is sum_update not divided by fl_no_models?
2. What is the meaning of self.params.fl_eta / self.params.fl_total_participants?
3. How should I set fl_eta if I increase the value of fl_no_models?
Thanks
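One way to relate the quoted scale to a plain average: if fl_eta equals fl_total_participants / fl_no_models, then fl_eta / fl_total_participants reduces to 1 / fl_no_models. The sketch below only checks that arithmetic. The total of 100 participants and both helper names are assumptions made for illustration, and the intended setting of fl_eta is for the author to confirm.

```python
def server_scale(fl_eta, fl_total_participants):
    """The factor applied to sum_update in the quoted update_global_model."""
    return fl_eta / fl_total_participants


def eta_for_plain_average(fl_total_participants, fl_no_models):
    """The fl_eta for which the server step becomes a mean over sampled users."""
    return fl_total_participants / fl_no_models


total, sampled = 100, 10      # assumed totals; 10 matches the example above
eta = eta_for_plain_average(total, sampled)
scale = server_scale(eta, total)

# Ten sampled users each submit an update of 0.2 for some weight.
sum_update = sum([0.2] * sampled)
average_update = scale * sum_update
print(eta, scale, average_update)
```

Under this reading, values of fl_eta below fl_total_participants / fl_no_models damp the averaged update, and increasing fl_no_models lowers the fl_eta needed to reach a plain average.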
When I run "python training.py --name mnist --params configs/mnist_params.yaml --commit none",
my terminal says "urllib.error.HTTPError: HTTP Error 503: Service Unavailable".