ebagdasa / backdoors101

Backdoors Framework for Deep Learning and Federated Learning. A light-weight tool to conduct your research on backdoors.

License: MIT License

Python 99.49% HTML 0.51%

backdoors machine-learning research security pytorch adversarial-machine-learning adversarial deep-learning-security neural-trojan ml-backdoors

backdoors101's Issues

General questions regarding the framework

Hi,

I'm researching defenses against the blind backdoor attack. I have a couple of questions regarding the backdoors 101 framework w.r.t. defenses:

I can't seem to find the implementations of the defenses (NC and SentiNet) mentioned in the Readme. Are these implemented and if so: how are they implemented and how could I add new defenses myself?
Are the backdoor tasks that change the task of the model (MultiMNIST addition, MultiMNIST multiply) also implemented?
Does the framework provide anything to evaluate the models retrieved from the training process?

Thanks for your help in advance. If possible, I will contribute some defenses after my research is done.

Questions Regarding the code Implementation

Hi, thanks for the code!

I have some questions regarding the code implementation.

In line 119 of attack.py, I think the purpose is to scale the local update of a compromised client so that it can replace the global model, as described in equation (3) of the paper How To Backdoor Federated Learning. In the implementation, the scaling factor is set to self.params.fl_weight_scale, and in the config file it is set to the total number of participants. However, I think this is not correct, as it does not take into account the parameter fl_eta (server-side step size), which is used here to perform the global weight update. It also ignores the fact that the training protocol allows partial participation, as implied by this line here. From what I have in mind, the scaling factor should be num_of_participants_at_the_attacked_round / fl_eta.
In the model simple.py, F.log_softmax is applied. But later, the attack uses nn.CrossEntropyLoss, which ends up "normalizing" the neural net's output twice. This seems a bit odd to me. Is there any specific reason for this?

Thank you!
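On the second question, a small numeric check may help. nn.CrossEntropyLoss combines a log-softmax with a negative log-likelihood loss, so feeding it outputs that already went through F.log_softmax applies log-softmax twice. The sketch below uses plain Python rather than PyTorch, and the helper names `log_softmax` and `cross_entropy` are illustrative stand-ins, not the framework's code. It suggests that the second application leaves the values unchanged, so the loss should be the same either way:

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]

def cross_entropy(logits, target):
    """Per-sample cross-entropy: negative log-softmax at the target index."""
    return -log_softmax(logits)[target]

raw_logits = [2.0, -1.0, 0.5]      # hypothetical network outputs
once = log_softmax(raw_logits)     # what a model ending in log_softmax returns
twice = log_softmax(once)          # what the loss then applies internally

loss_on_raw = cross_entropy(raw_logits, target=0)
loss_on_log_probs = cross_entropy(once, target=0)

print(once)
print(twice)
print(loss_on_raw, loss_on_log_probs)
```

The reason is that exponentiated log-probabilities already sum to one, so the normalizer of the second pass is log(1) = 0. Whether the double application in the repository is intentional is a question for the author.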

How do you measure the effectiveness of the attack?

Hi there, I would like to ask how you measure the effectiveness of the attack. For instance, I tried to launch a pixel pattern attack on CIFAR-10 via the code. In the paper "Blind Backdoors in Deep Learning Models", I saw that there are main-task accuracy and backdoor-task accuracy measures, as shown below

Is it possible to produce these results via the code? If so, how do I proceed? If not, what are other measures to measure the effectiveness of an attack?
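As a rough illustration of the two measures named above, here is a minimal sketch. It assumes that main-task accuracy is plain accuracy on clean test inputs, and that backdoor accuracy is the share of triggered inputs classified as the attacker's chosen label. All predictions, labels, and the `backdoor_label` value are made up for the example; how the framework itself computes these numbers is not shown here.

```python
def accuracy(predictions, labels):
    """Percentage of predictions that match the labels."""
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)

# Hypothetical model predictions on a clean test set.
clean_preds = [3, 5, 1, 1, 7]
clean_labels = [3, 5, 1, 2, 7]

# Hypothetical predictions on the same inputs after the trigger is applied.
# Every triggered input is expected to be classified as the backdoor label.
backdoor_label = 8
triggered_preds = [8, 8, 8, 3, 8]
triggered_labels = [backdoor_label] * len(triggered_preds)

main_task_accuracy = accuracy(clean_preds, clean_labels)
backdoor_accuracy = accuracy(triggered_preds, triggered_labels)

print(f"main task: {main_task_accuracy:.2f}, backdoor: {backdoor_accuracy:.2f}")
```

A successful attack in this sense keeps the first number close to that of an unpoisoned model while pushing the second one up.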

about the PIPA dataset

Hello, I am preparing for my graduation project, which focuses on person recognition.
However, I could not find the PIPA dataset on the Internet, since the public link to the dataset is gone.

Could you share the PIPA dataset? Thanks very much in advance.
Looking forward to your reply.

Questions about low accuracy of Test_backdoor_True

Hey, Eugene Bagdasaryan.
Thanks a lot for sharing the code of "How To Backdoor Federated Learning".

But I ran into a problem when trying to run cifar_fed with the default settings in your code:

python training.py --name cifar10 --params configs/cifar_fed.yaml

I got very low accuracy of Test_backdoor_True.

I'd really appreciate it if you could tell me why.
Thanks a lot.

Bug in save_model function

Hi,
The save_model function does not properly save the best checkpoint. The cause is the following two lines of code.

backdoors101/helper.py

Line 50 in 70869e5

self.best_loss = float('inf')

backdoors101/helper.py

Line 138 in 70869e5

if val_loss < self.best_loss:

During training, save_model is called and val_loss contains the accuracy of the current iteration on the test set, not the loss value.

Fix:
Change the initial value of self.best_loss and modify the comparison (maybe rename self.best_loss and val_loss as well).
self.best_loss = float(0) and if val_loss >= self.best_loss:
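The proposed fix can be sketched in isolation. The class below is a hypothetical stand-in for the relevant part of the helper, not the repository's code. It assumes the value passed in is an accuracy, where higher is better, so the tracker starts at zero and treats a value that is greater than or equal to the previous best as a new best:

```python
class BestCheckpointTracker:
    """Tracks the best validation accuracy seen so far (higher is better)."""

    def __init__(self):
        # Start at the lowest possible accuracy, not at float('inf').
        self.best_accuracy = 0.0

    def is_new_best(self, val_accuracy):
        """Return True when the checkpoint should be saved as the best one."""
        if val_accuracy >= self.best_accuracy:
            self.best_accuracy = val_accuracy
            return True
        return False

tracker = BestCheckpointTracker()
# Hypothetical accuracies reported after four evaluation rounds.
decisions = [tracker.is_new_best(acc) for acc in [71.2, 69.8, 74.5, 74.5]]
print(decisions)
```

With the original initial value of float('inf') and a `<` comparison, an accuracy would only count as "best" when it decreased, which matches the behaviour described in this report.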

Problem saving results into "runs" and "saved_models"

Hi there,

As I am a beginner in Federated Learning and its backdoor attacks, may I ask how to view the training results in TensorBoard? Nothing shows up in TensorBoard.

Even when I aborted the training, it showed the error "Aborted training. No output generated". I have created the folders "runs" and "saved_models" as mentioned in the instructions.

Do I need to write multi_mnist_params.yaml if I want to run the multi_mnist task?

There is no multi_mnist_params.yaml here.

can't get a clear result

I'm new to this field, so I want to reproduce your work, but after 'python training.py --name mnist --params configs/mnist_params.yaml --commit none' finishes, I can't get a clear result.
I can see some of the progress while the program is running, but there are no logs in runs/ or saved_models/, and TensorBoard shows 'No scalar data was found.'
The output looks like this:
2022-11-26 22:03:54 - WARNING - Backdoor True . Epoch: 349. Accuracy: Top-1: 100.00 | Loss: value: 0.00
0it [00:00, ?it/s]2022-11-26 22:03:54 - INFO - Epoch: 350. Batch: 0/938. Losses: ['backdoor: 0.00', 'normal: 0.00', 'total: 0.00']. Scales: ['backdoor: 0.25', 'normal: 0.75']
99it [00:03, 28.02it/s]2022-11-26 22:03:58 - INFO - Epoch: 350. Batch: 100/938. Losses: ['backdoor: 0.00', 'normal: 0.00', 'total: 0.00']. Scales: ['backdoor: 0.23', 'normal: 0.77']
197it [00:07, 28.73it/s]2022-11-26 22:04:01 - INFO - Epoch: 350. Batch: 200/938. Losses: ['backdoor: 0.00', 'normal: 0.00', 'total: 0.00']. Scales: ['backdoor: 0.21', 'normal: 0.79']

test_loader is NoneType Object

When I run training.py, I get this error. I then checked the task.py file, where test_loader is initialized to None. How can I solve this?

Running FL

Hi! What do eta and fl_weight_scale stand for in the Federated Learning setup? Thank you!
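Going only by what other issues on this page quote from the code, fl_eta appears to be the server-side step size in `scale = fl_eta / fl_total_participants`, and fl_weight_scale appears to be the factor by which a compromised client multiplies its update. The sketch below works through the arithmetic on a single scalar weight, under the assumptions that benign updates are zero and that the server uses exactly that scale. The function `aggregate` and all numbers are hypothetical.

```python
def aggregate(global_weight, updates, eta, n_total):
    """Server step: add (eta / n_total) times the sum of client updates."""
    return global_weight + (eta / n_total) * sum(updates)

n_total = 100            # assumed total number of participants
eta = 10.0               # assumed server-side step size
global_weight = 0.5      # current global model (one scalar for simplicity)
backdoored_weight = 2.0  # model the attacker wants the server to end up with

# Scale the attacker's update so that the server step lands on the target.
weight_scale = n_total / eta
malicious_update = weight_scale * (backdoored_weight - global_weight)

benign_updates = [0.0] * 9   # simplification: benign clients barely move
new_global = aggregate(global_weight, benign_updates + [malicious_update],
                       eta, n_total)
print(weight_scale, new_global)
```

Under these assumptions a scale of n_total / eta makes the aggregated model equal the attacker's model. With non-zero benign updates the replacement is only approximate.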

Questions about fl_task.update_global_model

In the update_global_model function, I don't understand the role of the variable 'model_weight'. I think it is not used. Could you tell me how it is used? Thanks!

Pip has no package named "yaml" in requirement.txt

I don't know if this affects the rest of how everything runs. You seem to have pyyaml in requirement.txt though.
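For context, the module imported as `yaml` is provided by the PyPI distribution named PyYAML, so listing pyyaml in the requirements is what makes `import yaml` work. A quick way to check whether the module is importable in the current environment, using only the standard library (the variable name `yaml_available` is just for this example):

```python
import importlib.util

def module_is_importable(module_name):
    """Return True if the named top-level module can be found."""
    return importlib.util.find_spec(module_name) is not None

# The import name is "yaml" even though the distribution is called PyYAML.
yaml_available = module_is_importable("yaml")
print("yaml importable:", yaml_available)
```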

AttributeError: 'NoneType' object has no attribute 'to'

When I try to run training with ‘python training.py --name mnist --params configs/mnist_params.yaml --commit none’, the following error occurs:

    Traceback (most recent call last):
      File "training.py", line 119, in <module>
        helper = Helper(params)
      File "D:\lab\backdoors101\helper.py", line 40, in __init__
        self.make_task()
      File "D:\lab\backdoors101\helper.py", line 64, in make_task
        self.task = task_class(self.params)
      File "D:\lab\backdoors101\tasks\task.py", line 43, in __init__
        self.init_task()
      File "D:\lab\backdoors101\tasks\task.py", line 49, in init_task
        self.model = self.model.to(self.params.device)
    AttributeError: 'NoneType' object has no attribute 'to'

Then I found that the function build_model() in class Task is 'NotImplemented'. Does it mean that I have to make some changes to the code before I can use 'python training.py --name mnist --params configs/mnist_params.yaml --commit none'?

Questions about the low benign accuracy on CIFAR-10 and GTSRB dataset of Blind Backdoor

Hi, Eugene Bagdasaryan,

Congratulations on the acceptance of your paper 'Blind Backdoors in Deep Learning Models' and thanks for sharing its code.

However, when we run your code on the CIFAR-10 and GTSRB datasets, we get a very low benign accuracy (CIFAR: BA: 18.24, ASR: 98.64; GTSRB: BA: 5.7, ASR: 100) with the default settings in your code. (PS: we get satisfactory results on MNIST (BA: 98.86, ASR: 99.99).) We are not sure where the problem is or whether you used different settings in the experiments of your paper. Could you kindly help us with this problem?

Besides, we also reproduced your code in our open-source toolbox (https://github.com/THUYimingLi/BackdoorBox/blob/main/core/attacks/Blind.py) and encountered the same problem. I would be very grateful if you could also help us check our reproduced code.

Best regards,
Yiming Li

Questions regarding evading Neural Cleanse

Hi,

Thanks for sharing the code.

I am trying to reproduce the results in the USENIX paper Blind Backdoors in Deep Learning Models that evade the Neural Cleanse defense. I am using the MNIST dataset. I assume if I uncomment the line "- neural_cleanse" in "loss_tasks" in configs/mnist_params.yaml, this should be the same loss function as the one described in Section 6.1 in the paper. Correct me if this is not the case.

So I train a model using the above setting, which is supposed to evade the detection by Neural Cleanse. However, when I use Neural Cleanse to scan this trained model, I get an anomaly index larger than 2, which means the trained model is still considered to be backdoored.

Is there anything not configured properly? Would you be able to take a look? I'd really appreciate it.
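For readers unfamiliar with the anomaly index mentioned here, the sketch below shows how such an index can be computed. It assumes the index follows the median-absolute-deviation rule described in the Neural Cleanse paper: the deviation of the smallest reconstructed-trigger L1 norm from the median, divided by the MAD scaled with the constant 1.4826, and flagged when above 2. The function name and the norm values are invented for illustration and are not taken from any run.

```python
import statistics

def anomaly_index(l1_norms):
    """MAD-based outlier score for the smallest trigger norm."""
    consistency_constant = 1.4826  # makes MAD comparable to a std. deviation
    median = statistics.median(l1_norms)
    deviations = [abs(x - median) for x in l1_norms]
    mad = consistency_constant * statistics.median(deviations)
    return abs(min(l1_norms) - median) / mad

# Hypothetical per-label trigger norms: one label needs a far smaller trigger.
suspicious_norms = [100, 105, 98, 102, 97, 101, 103, 99, 104, 20]
# Hypothetical norms where no label stands out.
unremarkable_norms = [100, 105, 98, 102, 97, 101, 103, 99, 104, 96]

print(anomaly_index(suspicious_norms))
print(anomaly_index(unremarkable_norms))
```

Under this reading, an evasion attempt would need the reconstructed trigger for the backdoor label to be about as large as the triggers for the other labels.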

How do I perform a semantic backdoor attack?

I saw that there are different attacks in synthesizer.py, but I cannot find any code related to the semantic backdoor attack. Is it implemented in the code?

pip install failing

There are multiple issues with installing the pinned versions of the packages:

numpy~=1.18.4 : Throws error: subprocess-exited-with-error
torch, torchtext versions missing, or is it because of a different Python version. It throws this: ERROR: Ignored the following versions that require a different python version: 0.7 Requires-Python >=3.6, <3.7; 0.8 Requires-Python >=3.6, <3.7 ERROR: Could not find a version that satisfies the requirement torchtext~=0.7.0 (from versions: 0.1.1, 0.2.0, 0.2.1, 0.2.3, 0.3.1, 0.4.0, 0.5.0, 0.6.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.11.1, 0.11.2, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.1, 0.15.2) ERROR: No matching distribution found for torchtext~=0.7.0
Is updating the functions according to the recent versions of the libraries/packages used the only way to go forward?

Question regarding federated experiment with multiple GPUs on one node (machine)

Dear authors, thank you very much for your nice work! From the two papers you wrote, I found some of your experiments were run on either 2 or 4 Titan X GPUs. I was wondering if some experiments in this repo, e.g., cifar federated, can run on multiple GPUs as well? Could you please point me to the point where this is achieved in the code (I couldn't find any code related to torch.dp or torch.ddp)? Many thanks!

Enquiries about the attacks

Does the fact that the function synthesizes_inputs is not implemented mean that none of the attacks in the paper are implemented yet? Or only batch poisoning?

Where can I find the dataset for training the model?

Hi there, I was wondering where the datasets (e.g. CIFAR-10) are stored. As I am trying to launch a backdoor attack with image-scaling, how or where can I store my own images? After training with the poisoned images, a model will be saved into the saved_models folder as shown here:

From here, how should I proceed to test whether the attack is successful?

I am sorry for these questions as I am still a beginner in machine learning.

"Cifar is downloaded using PyTorch": Is there a way to insert own image into the Cifar10 dataset?

Cifar is downloaded using PyTorch

Originally posted by @ebagdasa in #12 (comment)

Hi there, I was wondering if there is a way to insert my own image into the dataset, since I am performing an image-scaling attack. Otherwise, will ImageNet work instead?

Question about parameter fl_eta in cifar_fed.yaml

Hi @ebagdasa,

Thanks for sharing code.

I am trying to run cifar_fed with the command

    python training.py --name cifar --params configs/cifar_fed.yaml --commit none

I am a little confused about the parameter fl_eta.

In the function run_fl_round (training.py), the variable round_participants,

    round_participants = hlpr.task.sample_users_for_round(epoch)

uses the parameter fl_no_models (cifar_fed.yaml) to decide the number of users sending weight updates to the server, for example 10 in cifar_fed.yaml.

Then, the code

    hlpr.task.update_global_model(weight_accumulator, global_model)

calls the function update_global_model (fl_task.py).

In the function update_global_model (fl_task.py),

    def update_global_model(self, weight_accumulator, global_model: Module):
        for name, sum_update in weight_accumulator.items():
            if self.check_ignored_weights(name):
                continue
            scale = self.params.fl_eta / self.params.fl_total_participants
            average_update = scale * sum_update
            self.dp_add_noise(average_update)
            model_weight = global_model.state_dict()[name]
            model_weight.add_(average_update)

sum_update is the sum of all users' weight updates, which I would expect to be divided by the value of fl_no_models. In the code, however, you use the variable scale

    scale = self.params.fl_eta / self.params.fl_total_participants
    average_update = scale * sum_update

to process sum_update. I didn't find any explanation of this logic in the papers or any comments in the code.

Would you mind giving more details about the usage of fl_eta?
My questions are:
1. Why isn't sum_update divided by fl_no_models?
2. What is the meaning of self.params.fl_eta / self.params.fl_total_participants?
3. How should I set fl_eta if I try to increase the value of fl_no_models?

Thanks
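The arithmetic behind the quoted `scale` line can be checked on a single scalar weight. The sketch below re-implements only that one line in plain Python, with the hypothetical function `update_global` and made-up numbers. It suggests that choosing fl_eta equal to fl_total_participants / fl_no_models turns the rule into a plain average over the sampled clients, while fl_eta = 1 gives a much smaller step. This is an observation about the formula, not a statement of how the author intends fl_eta to be set.

```python
def update_global(global_weight, client_updates, fl_eta, fl_total_participants):
    """Scalar version of the quoted rule: scale = fl_eta / fl_total_participants."""
    scale = fl_eta / fl_total_participants
    return global_weight + scale * sum(client_updates)

fl_total_participants = 100   # assumed total number of users
fl_no_models = 10             # assumed number of users sampled per round
client_updates = [0.2] * fl_no_models   # hypothetical identical updates

# With this choice the scale becomes 1 / fl_no_models, i.e. a plain mean.
fl_eta_for_mean = fl_total_participants / fl_no_models
with_mean_eta = update_global(1.0, client_updates, fl_eta_for_mean,
                              fl_total_participants)
plain_average = 1.0 + sum(client_updates) / fl_no_models

# With fl_eta = 1 the same updates move the global weight ten times less.
with_eta_one = update_global(1.0, client_updates, 1.0, fl_total_participants)

print(with_mean_eta, plain_average, with_eta_one)
```

If fl_no_models is increased while fl_eta and fl_total_participants stay fixed, the summed update grows with the number of sampled clients, so the effective step grows as well.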

I can't download the dataset

When I run "python training.py --name mnist --params configs/mnist_params.yaml --commit none",
my terminal says "urllib.error.HTTPError: HTTP Error 503: Service Unavailable".