zhuangdizhu / fedgen
Code and data accompanying the FedGen paper
If you hit wrong-tensor-type errors when running experiments with the FedGen algorithm, see the changes in #3.
Thank you for the great work.
Also, has anyone tried training on CIFAR-10? I followed the setup for Mnist: replaced the Mnist data loader with CIFAR-10, changed the input dimension from 1 to 3, and kept the same models. However, the result is not good (about 31%) with FedAvg.
Is there any special setting needed when experimenting with a new dataset? Thank you.
Thanks
Hi.
Does your implementation of FedProx correspond to Algorithm 2 in the original FedProx paper? More specifically, the update formula at lines 53-54 of "fedoptimizer.py" looks a little strange to me. In particular, what does lambda mean in the FedProx algorithm?
The update formula as I understand it should be:
p.data = p.data - group['lr'] * (p.grad.data + group['mu'] * (p.data - pstar.data.clone()))
Looking forward to your reply.
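For reference, my reading of the FedProx paper: each client k minimizes a proximal objective around the global model w^t, so a single SGD step takes the form below (here \mu is the proximal coefficient, matching group['mu'] in the code, and \eta is the learning rate):

h_k(w; w^t) = F_k(w) + \frac{\mu}{2}\,\lVert w - w^t \rVert^2
w \leftarrow w - \eta\,\bigl(\nabla F_k(w) + \mu\,(w - w^t)\bigr)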
No issue.
I ran the example experiment for FedGen on Mnist from README.md with the option "--device cuda", but found that no process was deployed on the GPU. Exploring the code further, it seems that "args.device" is not handled in any of the scripts. I also added "os.environ["CUDA_VISIBLE_DEVICES"] = '0'" in main.py, but the model is still deployed only on the CPU. I wonder how I can utilize the GPU for FedGen. I really appreciate your help!
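As a workaround, the generic PyTorch pattern is sketched below (this is not the repo's actual wiring; the model here is a stand-in): resolve the device once, then move both the models and every batch onto it.

import torch

# Minimal sketch: pick the device, falling back to CPU when CUDA is unavailable.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = torch.nn.Linear(784, 10)   # stand-in for the FedGen/user model
model.to(device)                   # parameters now live on the GPU if available

X = torch.randn(32, 784)
X = X.to(device)                   # every batch must be moved as well
out = model(X)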
FedGen/FLAlgorithms/users/userpFedGen.py
Line 58 in 0bfd4e1
Inside this loop, the first time user_output_logp is used it is the one defined at line 47, outside the loop; in every later iteration it is the one defined at line 64.
The former is the label of the locally trained batch, while the latter is the label of a randomly chosen batch. Isn't that a bit odd?
It seems the code as implemented does not perform partial parameter sharing. As shown in line 103 of serverpFedGen.py, the partial-parameter option defaults to False, but the pseudo-code in the paper shows that only the classifier layer of the user's model is shared. Is this a bug, or is there something I misunderstand in the code?
self.aggregate_parameters()
It seems that the function 'visualize_image' doesn't work when using the commands.
Hello,
I have been working with the FedGen implementation and have a question regarding the broadcasting of the updated generative model w to users after it has been trained on the server.
In the FedGen class, the generative model w is trained using the train_generator method. However, I couldn't find the part of the code where the updated generative model parameters are broadcast to the users after each iteration.
I noticed that the send_parameters method broadcasts the global model parameters to users but does not broadcast the generative model parameters.
def train(self, args):
    #### pretraining
    for glob_iter in range(self.num_glob_iters):
        print("\n\n-------------Round number: ", glob_iter, " -------------\n\n")
        self.selected_users, self.user_idxs = self.select_users(glob_iter, self.num_users, return_idx=True)
        if not self.local:
            self.send_parameters(mode=self.mode)  # broadcast averaged prediction model
        self.evaluate()
        chosen_verbose_user = np.random.randint(0, len(self.users))
        self.timestamp = time.time()  # log user-training start time
        for user_id, user in zip(self.user_idxs, self.selected_users):  # allow selected users to train
            verbose = user_id == chosen_verbose_user
            # perform regularization using generated samples after the first communication round
            user.train(
                glob_iter,
                personalized=self.personalized,
                early_stop=self.early_stop,
                verbose=verbose and glob_iter > 0,
                regularization=glob_iter > 0)
        curr_timestamp = time.time()  # log user-training end time
        train_time = (curr_timestamp - self.timestamp) / len(self.selected_users)
        self.metrics['user_train_time'].append(train_time)
        if self.personalized:
            self.evaluate_personalized_model()
        self.timestamp = time.time()  # log server-agg start time
        self.train_generator(
            self.batch_size,
            epoches=self.ensemble_epochs // self.n_teacher_iters,
            latent_layer_idx=self.latent_layer_idx,
            verbose=True
        )
        self.aggregate_parameters()
        curr_timestamp = time.time()  # log server-agg end time
        agg_time = curr_timestamp - self.timestamp
        self.metrics['server_agg_time'].append(agg_time)
        if glob_iter > 0 and glob_iter % 20 == 0 and self.latent_layer_idx == 0:
            self.visualize_images(self.generative_model, glob_iter, repeats=10)
    self.save_results(args)
    self.save_model()
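If one wanted the generator to be broadcast as well, a minimal sketch could look like the following. This is an assumption on my part, not the repo's behavior; a user-side generative_model attribute is hypothetical.

def send_generator(self):
    # Hypothetical helper: copy the server generator's weights into
    # each selected user's local copy of the generative model.
    for user in self.selected_users:
        for server_param, user_param in zip(
                self.generative_model.parameters(),
                user.generative_model.parameters()):
            user_param.data = server_param.data.clone()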
Hello,
Sorry, I have a problem with main_plot.py.
The problem:
FileNotFoundError: [Errno 2] No such file or directory: 'figs\Mnist/ratio0.5\Mnist-ratio0.5.png'
I hope you can take a look when you have time; I have only just started in this direction. Thank you!
Thank you for open-sourcing your project. I notice that "FedDF" (Ensemble Distillation for Robust Model Fusion in Federated Learning) is one of the baselines in your paper; however, the repository provides code only for FedAvg, FedProx, FedDistill, and FedGen. Could you please help me reproduce the results of FedDF? I really appreciate your help.
I think that in plot_utils.py the variable 'all_curves', used outside the loop, only keeps the last algorithm's results. As a result, when several algorithms are listed in the config, the plotted figure clips the other algorithms' curves to the last algorithm's range.
max_acc = np.max([max_acc, np.max(all_curves) ]) + 4e-2
python main_plot.py --dataset EMnist-alpha0.1-ratio0.1 --algorithms FedAvg,FedGen,FedProx,FedDistill --batch_size 32 --local_epochs 20 --num_users 10 --num_glob_iters 200 --plot_legend 1
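A possible fix, sketched under assumptions (load_curves and plot_curves are hypothetical stand-ins for the per-algorithm logic in plot_utils.py): accumulate every algorithm's curves before computing the axis limit.

import numpy as np

all_curves = []                        # accumulate across ALL algorithms
for algorithm in algorithms:
    curves = load_curves(algorithm)    # hypothetical: one algorithm's accuracy curves
    plot_curves(algorithm, curves)     # hypothetical: per-algorithm plotting
    all_curves.extend(curves)          # extend instead of overwriting

# The y-axis limit now covers every algorithm, not just the last one.
max_acc = np.max([np.max(c) for c in all_curves]) + 4e-2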
Hi Zhuang,
Can you share the script used to generate the Celeb data? Thanks.
When I run "python main.py --dataset Mnist-alpha0.01-ratio0.05 --algorithm FedAvg --batch_size 32 --num_glob_iters 200 --local_epochs 20 --num_users 10 --lamda 1 --learning_rate 0.01 --model cnn --personal_learning_rate 0.01 --times 3", I get the following problem. How can I solve it?
Average Global Accurancy = 0.0950, Loss = 2.31.
Traceback (most recent call last):
File "C:\kust\xuesu\code\FedGen-main\FedGen-main\FLAlgorithms\users\userbase.py", line 163, in get_next_train_batch
(X, y) = next(self.iter_trainloader)
File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 633, in next
data = self._next_data()
File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 676, in _next_data
index = self._next_index() # may raise StopIteration
File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 623, in _next_index
return next(self._sampler_iter) # may raise StopIteration
StopIteration
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\kust\xuesu\code\FedGen-main\FedGen-main\main.py", line 85, in
main(args)
File "C:\kust\xuesu\code\FedGen-main\FedGen-main\main.py", line 42, in main
run_job(args, i)
File "C:\kust\xuesu\code\FedGen-main\FedGen-main\main.py", line 37, in run_job
server.train(args)
File "C:\kust\xuesu\code\FedGen-main\FedGen-main\FLAlgorithms\servers\serveravg.py", line 35, in train
user.train(glob_iter, personalized=self.personalized) #* user.train_samples
File "C:\kust\xuesu\code\FedGen-main\FedGen-main\FLAlgorithms\users\useravg.py", line 23, in train
result =self.get_next_train_batch(count_labels=count_labels)
File "C:\kust\xuesu\code\FedGen-main\FedGen-main\FLAlgorithms\users\userbase.py", line 167, in get_next_train_batch
(X, y) = next(self.iter_trainloader)
File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 633, in next
data = self._next_data()
File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 676, in _next_data
index = self._next_index() # may raise StopIteration
File "C:\Users\Administrator\anaconda3\envs\FedGen\lib\site-packages\torch\utils\data\dataloader.py", line 623, in _next_index
return next(self._sampler_iter) # may raise StopIteration
StopIteration
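For what it's worth, the traceback suggests get_next_train_batch already catches the first StopIteration, recreates the iterator, and then next() raises again; that usually means the user's train loader yields zero batches (e.g. fewer samples than --batch_size with drop_last=True). A sketch of the usual guard, with names assumed from userbase.py:

def get_next_train_batch(self):
    try:
        (X, y) = next(self.iter_trainloader)
    except StopIteration:
        # Epoch exhausted: restart the iterator and try once more.
        # If this second next() also raises StopIteration, the loader
        # yields zero batches, e.g. batch_size > len(dataset) with
        # drop_last=True; lowering --batch_size or creating the
        # DataLoader with drop_last=False avoids that.
        self.iter_trainloader = iter(self.trainloader)
        (X, y) = next(self.iter_trainloader)
    return X, y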
I wonder why the user_latent_loss is not mentioned in your paper.
It seems that the code does not support CUDA?
--device "cuda" can be set, but it seems that it always runs on the CPU.
Thanks
It seems that torch.rand generates values uniformly in [0, 1) according to the official documentation, rather than from a standard Gaussian. Is this intended? Thanks
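For context, this is standard PyTorch behavior: torch.rand samples Uniform[0, 1), while torch.randn samples a standard Gaussian, so swapping one for the other is a one-character change:

import torch

u = torch.rand(4)    # uniform on [0, 1)
z = torch.randn(4)   # standard normal: mean 0, std 1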
Full error message: RuntimeError: Can't call numpy() on Tensor that requires grad. Use tensor.detach().numpy() instead.
Added the following as line 227 to serverbase.py to resolve:
test_losses = [t.detach() for t in test_losses]
Python version: 3.8.6
When I ran the EMNIST experiment after generating the emnist dataset, I got:
(pt) wangshu@ubuntu:~/projects/FedGen$ CUDA_VISIBLE_DEVICES=3 python main.py --dataset EMnist-alpha0.1-ratio0.1 --algorithm FedGen --batch_size 32 --local_epochs 20 --num_users 10 --lamda 1 --model cnn --learning_rate 0.01 --personal_learning_rate 0.01 --num_glob_iters 200 --times 3
================================================================================
Summary of training process:
Algorithm: FedGen
Batch size: 32
Learing rate : 0.01
Ensemble learing rate : 0.0001
Average Moving : 1.0
Subset of users : 10
Number of global rounds : 200
Number of local rounds : 20
Dataset : EMnist-alpha0.1-ratio0.1
Local Model : cnn
Device : cpu
================================================================================
[ Start training iteration 0 ]
Creating model for emnist
Network configs: [6, 16, 'F']
Dataset emnist
/home/wangshu/miniconda3/envs/pt/lib/python3.9/site-packages/torch/nn/_reduction.py:42: UserWarning: size_average and reduce args will be deprecated, please use reduction='none' instead.
warnings.warn(warning.format(ret))
Build layer 57 X 256
Build last layer 256 X 32
ensemble_lr: 0.0001
ensemble_batch_size: 128
unique_labels: 25
latent_layer_idx: -1
label embedding 0
ensemeble learning rate: 0.0001
ensemeble alpha = 1, beta = 0, eta = 1
generator alpha = 10, beta = 1
Number of Train/Test samples: 12480 8120
Data from 20 users in total.
Finished creating FedAvg server.
-------------Round number: 0 -------------
Traceback (most recent call last):
File "/home/wangshu/projects/FedGen/main.py", line 85, in <module>
main(args)
File "/home/wangshu/projects/FedGen/main.py", line 42, in main
run_job(args, i)
File "/home/wangshu/projects/FedGen/main.py", line 37, in run_job
server.train(args)
File "/home/wangshu/projects/FedGen/FLAlgorithms/servers/serverpFedGen.py", line 78, in train
self.evaluate()
File "/home/wangshu/projects/FedGen/FLAlgorithms/servers/serverbase.py", line 226, in evaluate
test_ids, test_samples, test_accs, test_losses = self.test(selected=selected)
File "/home/wangshu/projects/FedGen/FLAlgorithms/servers/serverbase.py", line 165, in test
ct, c_loss, ns = c.test()
File "/home/wangshu/projects/FedGen/FLAlgorithms/users/userbase.py", line 137, in test
loss += self.loss(output, y)
File "/home/wangshu/miniconda3/envs/pt/lib/python3.9/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/wangshu/miniconda3/envs/pt/lib/python3.9/site-packages/torch/nn/modules/loss.py", line 216, in forward
return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
File "/home/wangshu/miniconda3/envs/pt/lib/python3.9/site-packages/torch/nn/functional.py", line 2388, in nll_loss
ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
IndexError: Target 25 is out of bounds.
(pt) wangshu@ubuntu:~/projects/FedGen$
PyTorch 1.8.1, Python 3.9.4.
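For what it's worth, "Target 25 is out of bounds" usually means a label equals or exceeds the model's class count: the log shows unique_labels: 25, while the EMNIST letters labels appear to run 0..25, i.e. 26 classes. A quick diagnostic sketch (all_train_labels is a hypothetical stand-in for the loaded labels):

import torch

labels = torch.as_tensor(all_train_labels)   # hypothetical: every training label
num_classes = int(labels.max().item()) + 1   # e.g. labels 0..25 -> 26 classes
print(num_classes)  # the output layer must have at least this many units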
The performance of FedAvg is not as good as FedGen simply because the train loader does not shuffle. After fixing this bug, FedGen is not as effective as FedAvg.
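For reference, shuffling is controlled by the DataLoader flag in standard PyTorch (the dataset below is a stand-in, not the repo's loader):

import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(100, 784), torch.randint(0, 10, (100,)))
train_loader = DataLoader(dataset, batch_size=32, shuffle=True)  # reshuffle every epoch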
[ Start training iteration 0 ]
Creating model for mnist
Network configs: [6, 16, 'F']
Algorithm FedDistll-FL has not been implemented.