litian96 / FedProx
Federated Optimization in Heterogeneous Networks (MLSys '20)
License: MIT License
Hi,
I studied your paper/code and I am trying to obtain \nabla h_k(w_t, w_t) to use as a local optimization criterion. In the fedprox and pgd code, it is not clear to me where the gradients \nabla h_k(w, w_t) are evaluated. Could you help me with this?
If I can understand where these gradients are evaluated, I could simply pass w_t (self.latest_model in fedprox) to this function instead of using the local model.
Best Regards,
Mairton
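For reference, the paper defines the local proximal objective and its gradient as

    h_k(w, w_t) = F_k(w) + (\mu / 2) \|w - w_t\|^2
    \nabla h_k(w, w_t) = \nabla F_k(w) + \mu (w - w_t)

so at w = w_t the proximal part vanishes and \nabla h_k(w_t, w_t) reduces to \nabla F_k(w_t).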
The following lines are probably out of date:
FedProx/flearn/trainers/fedavg.py
Lines 72 to 74 in d9cdfdd
Update with:
stats = self.test()
stats_train = self.train_error_and_loss()
Hi there,
I see you have the option of using a CNN on the MNIST dataset, but I don't see its implementation in the model code. Will you provide it later?
BTW, I was also at ICML this year but was unable to attend the poster session. Would you put your poster on your homepage as well?
H.
Hi there, I am wondering whether there is a PyTorch version of FedProx?
After running this: !python experiments/centralized/moleculenet/molecule_classification_multilabel.py
I get this error message:
Traceback (most recent call last):
  File "experiments/centralized/moleculenet/molecule_classification_multilabel.py", line 11, in <module>
    from data_preprocessing.molecule.data_loader import get_dataloader, get_data
  File "/content/drive/My Drive/Colab Notebooks/FedGraphNN/data_preprocessing/molecule/data_loader.py", line 12, in <module>
    from FedML.fedml_core.non_iid_partition.noniid_partition import partition_class_samples_with_dirichlet_distribution
ModuleNotFoundError: No module named 'FedML'
I would like to use parts of the FedProx code for my dissertation. In particular, I would like to use generate_synthetic.py (https://github.com/litian96/FedProx/blob/master/data/synthetic_1_1/generate_synthetic.py) to generate my own synthetic federated datasets.
Unfortunately, FedProx does not seem to have a license. This means all rights are reserved, and I am not allowed to use its code.
Could you add an appropriate software license to this repository?
According to Algorithm 2, there is a parameter gamma in the input which measures how much local
computation is performed to solve the local subproblem on device k at the t-th round.
But I can't find gamma in the code implementation.
In https://github.com/litian96/FedProx/blob/master/flearn/models/mnist/mclr.py there is only a variable num_epochs.
def solve_inner(self, data, num_epochs=1, batch_size=32):
    '''Solves local optimization problem'''
    for _ in trange(num_epochs, desc='Epoch: ', leave=False, ncols=120):
        for X, y in batch_data(data, batch_size):
            with self.graph.as_default():
                self.sess.run(self.train_op,
                              feed_dict={self.features: X, self.labels: y})
    soln = self.get_params()
    comp = num_epochs * (len(data['y']) // batch_size) * batch_size * self.flops
    return soln, comp
So could you please help me find gamma?
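For reference, Definition 1 in the paper calls w* a \gamma-inexact solution when \|\nabla h_k(w*, w_t)\| <= \gamma \|\nabla h_k(w_t, w_t)\|; the released code appears to control this only implicitly through num_epochs. A minimal sketch of an explicit \gamma-based stopping rule (grad_fn and step_fn are hypothetical helpers, not part of the repo):

import numpy as np

def solve_inner_gamma(grad_fn, step_fn, w_t, gamma=0.1, max_iters=1000):
    '''Hypothetical gamma-inexact local solver (illustration only).

    grad_fn(w): gradient of the proximal objective \nabla h_k(w, w_t)
    step_fn(w): one local SGD step, returning the new iterate
    '''
    g0 = np.linalg.norm(grad_fn(w_t))  # baseline gradient norm at the global model
    w = w_t
    for _ in range(max_iters):
        w = step_fn(w)
        # stop as soon as w satisfies the gamma-inexactness condition
        if np.linalg.norm(grad_fn(w)) <= gamma * g0:
            break
    return w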
What Python version are you using? I use Python 3.6 and some packages fail to import, so I tried switching to 3.5, but 3.5 has been deprecated and the dependencies cannot be downloaded.
Hi,
Is there a TFF implementation of your algorithm?
Hi, I got this problem on macOS and Windows:
~ % pip install tensorflow-gpu==1.10
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.10 (from versions: none)
ERROR: No matching distribution found for tensorflow-gpu==1.10
The same happens with pip3.
Did I miss anything?
Thanks
I studied your paper and code, and I'm struggling to understand exactly what the PerturbedGradientDescent optimizer contributes to your work. If you could kindly explain what the following line of your code does, it may clear things up for me.
FedProx/flearn/trainers/fedprox.py
Line 55 in 0f9c2e8
Thank you in advance.
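As I read it, the proximal correction is applied inside the optimizer rather than in the loss, so that line points the optimizer at the latest global model w^t. A minimal NumPy sketch of a perturbed-gradient step under that reading (my own illustration, not the repo's TF op):

import numpy as np

def perturbed_gradient_step(w, grad_Fk, w_t, lr=0.01, mu=0.1):
    '''One PerturbedGradientDescent-style update (sketch only).

    w:       current local parameters (NumPy array)
    grad_Fk: gradient of the local loss F_k at w
    w_t:     latest global model broadcast by the server
    '''
    # SGD on F_k plus the gradient of the proximal term (mu/2) ||w - w_t||^2
    return w - lr * (grad_Fk + mu * (w - w_t))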
Could you clarify how you calculate the avg_gradient that is given as input in the pggd file? I could not find the script that imports this file.
https://github.com/litian96/FedProx/blob/master/flearn/optimizer/pggd.py#L56
Thank you
In the my_sample.py file for generating FEMNIST data, the < on the referenced line seems like it should be >. Otherwise, the retrieved samples will be the same for a given class at the beginning. I checked the data files shared via Google Drive, and there are indeed several identical images for the same class for each user.
FedProx/flearn/trainers/fedbase.py
Line 17 in d2a4501
Please take a look at this line. It seems that all clients are using the same ML model for local training. In other words, there is no local model, only a global model that is sequentially trained on each client.
This can be verified by the following code snippet (I have tested it on flearn/trainers/fedavg.py).
csolns = []  # buffer for receiving client solutions
lastc = None  # parameters of the previously trained client
for idx, c in enumerate(active_clients.tolist()):  # simply drop the slow devices
    print(i, idx)  # i is the round index from the enclosing training loop
    if lastc is not None:
        for j in range(len(lastc)):
            # Before set_params, the current client's parameters already equal
            # the previous client's post-training parameters, i.e. the model is shared.
            print('Are the parameters of the current client (before training) '
                  'the same as those of the previous client (after training)?: %s'
                  % (c.get_params()[j] == lastc[j]).all())
        from time import sleep
        sleep(1)
    else:
        print('The first client.')

    # communicate the latest model
    c.set_params(self.latest_model)

    # solve minimization locally
    soln, stats = c.solve_inner(num_epochs=self.num_epochs, batch_size=self.batch_size)
    lastc = c.get_params()

    # gather solutions from client
    csolns.append(soln)

    # track communication cost
    self.metrics.update(rnd=i, cid=c.id, stats=stats)

# update models
self.latest_model = self.aggregate(csolns)
In my opinion, this is not the expected behavior for federated learning.
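If it helps, my reading is that c.set_params(self.latest_model) overwrites the shared graph before every local solve, so the single model object acts as per-client scratch space. A sketch of the equivalent explicit-copy loop (train_locally is a hypothetical helper, not the repo's API):

import copy

csolns = []
for c in active_clients:
    w_local = copy.deepcopy(self.latest_model)  # explicit per-client copy of w^t
    soln = train_locally(c, w_local)            # hypothetical local solver
    csolns.append(soln)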
Dear Tian:
When I run the following on CPU:
python3 -u main.py --dataset='sent140' --optimizer='fedprox' \
    --learning_rate=0.01 --num_rounds=200 --clients_per_round=10 \
    --mu=0 --eval_every=1 --batch_size=10 \
    --num_epochs=1 \
    --model='stacked_lstm' | tee logs/logs_sent140_mu0_E1_fedprox
it runs very, very slowly, and worse, the outputs oscillate between the same two numbers! The results are below:
5726 Clients in Total
Training with 10 workers ---
At round 0 accuracy: 0.4060871469235822
At round 0 training accuracy: 0.40770690942001303
At round 0 training loss: 0.6931471925528921
gradient difference: 0.3779687893000023
At round 1 accuracy: 0.5939128530764178
At round 1 training accuracy: 0.5922930905799869
At round 1 training loss: 0.682659032131717
gradient difference: 0.6406151359028104
At round 2 accuracy: 0.4060871469235822
At round 2 training accuracy: 0.40770690942001303
At round 2 training loss: 0.6951613189004014
gradient difference: 1.0240842395041418
At round 3 accuracy: 0.5939128530764178
At round 3 training accuracy: 0.5922930905799869
At round 3 training loss: 0.6845133630735032
gradient difference: 1.334649037607692
At round 4 accuracy: 0.4060871469235822
At round 4 training accuracy: 0.40770690942001303
At round 4 training loss: 0.7872438000397856
gradient difference: 3.8706158347478246
At round 5 accuracy: 0.5939128530764178
At round 5 training accuracy: 0.5922930905799869
At round 5 training loss: 0.676954747225743
gradient difference: 2.8532703690523324
At round 6 accuracy: 0.4060871469235822
At round 6 training accuracy: 0.40770690942001303
At round 6 training loss: 0.6952778442305486
gradient difference: 2.9297919740883964
At round 7 accuracy: 0.5939128530764178
At round 7 training accuracy: 0.5922930905799869
At round 7 training loss: 0.7021283723042158
gradient difference: 4.2864026772781
At round 8 accuracy: 0.5939128530764178
At round 8 training accuracy: 0.5922930905799869
At round 8 training loss: 0.6761318949424154
gradient difference: 4.987087255237341
At round 9 accuracy: 0.4060871469235822
At round 9 training accuracy: 0.40770690942001303
At round 9 training loss: 0.8113437744137745
gradient difference: 9.235964830922306
At round 10 accuracy: 0.5939128530764178
At round 10 training accuracy: 0.5922930905799869
At round 10 training loss: 0.7755919640498169
gradient difference: 6.982072813031079
At round 11 accuracy: 0.5939128530764178
At round 11 training accuracy: 0.5922930905799869
At round 11 training loss: 0.7091725448816267
gradient difference: 6.115867566149534
At round 12 accuracy: 0.5939128530764178
At round 12 training accuracy: 0.5922930905799869
At round 12 training loss: 0.7398191231275261
gradient difference: 7.72441549160035
At round 13 accuracy: 0.5939128530764178
At round 13 training accuracy: 0.5922930905799869
At round 13 training loss: 1.0417891773572328
gradient difference: 15.32712477985914
The same thing happens when I run shakespeare, but mnist and nist perform well.
How can I solve this? Is there something wrong with stacked_lstm?
Good job. I have read the paper and code. I have some questions:
Thank you.
Hey~
When running main.py I get an error:
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py', wdir='C:/Users/Administrator/Desktop/federated learning/code/FedProx-master')
  File "E:\anaconda\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 678, in runfile
    execfile(filename, namespace)
  File "E:\anaconda\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 106, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 130, in <module>
    main()
  File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 118, in main
    options, learner, optimizer = read_options()
  File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 94, in read_options
    mod = importlib.import_module(model_path)
  File "E:\anaconda\Anaconda\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 936, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 948, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'flearn.models.nist.stacked_lstm'
And flearn/models/nist/stacked_lstm.py indeed does not exist. Why?
Thank you so much.
In FedProx, only the performance on clients' own test sets is considered, without a global test set. I know that personalized FL often focuses on each client's own test set, but why don't we compare personalized FL with purely local training on each client?
If we only care about clients' own test sets, I think comparison experiments against the performance of local training are necessary.
Hi,
Is there a PyTorch version of FedProx?
Best regards
Dear Tian:
Thank you very much for your code. I have a question:
Should the difference calculation weight each client by its proportion of the data, rather than simply summing?
Lines 44-45 of https://github.com/litian96/FedProx/blob/master/flearn/trainers/fedprox.py
Thank you.
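For illustration, weighting per-client quantities by data proportion (as FedAvg does when averaging models) might look like this sketch; weighted_aggregate and the (num_samples, params) layout are hypothetical, not the repo's API:

def weighted_aggregate(csolns):
    '''Sketch of data-proportional averaging (illustration only).

    csolns: list of (num_samples, params) pairs, one per client,
            where params is a list of NumPy arrays.
    '''
    total = float(sum(n for n, _ in csolns))
    num_layers = len(csolns[0][1])
    # weight each client's contribution by its share of the training data
    return [sum(n * params[i] for n, params in csolns) / total
            for i in range(num_layers)]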
Does the current implementation provide the option for heuristic μ as discussed in "C.3.3 Adaptively setting μ" from https://arxiv.org/pdf/1812.06127.pdf?
"We decrease μ by 0.1 when the loss continues to decrease for 5 rounds and increase μ by 0.1 when we see the loss increase."
I assume that you mean that you use the same μ for all clients, and that you refer to the global loss, right?
Thank you
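The quoted rule is straightforward to sketch; adapt_mu and loss_history are hypothetical names of mine (the released code does not appear to include this):

def adapt_mu(mu, loss_history, window=5, delta=0.1):
    '''Sketch of the adaptive-mu heuristic from Appendix C.3.3 (illustration only).

    loss_history: global training loss recorded once per round, latest last
    '''
    if len(loss_history) >= 2 and loss_history[-1] > loss_history[-2]:
        return mu + delta  # loss increased -> strengthen the proximal term
    recent = loss_history[-(window + 1):]
    if len(recent) == window + 1 and all(a > b for a, b in zip(recent, recent[1:])):
        return max(mu - delta, 0.0)  # loss fell for `window` straight rounds -> relax
    return mu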
Hi, I read your paper and code, and this work has inspired me a lot in my work on federated learning optimization. I am trying to reproduce FedProx in PyTorch, and I am confused about a small detail. In the algorithm in the paper, the local client model seems to have no replacement operation, i.e., w_k^t = w^t.
But when I read your code, I found that there is actually a replace operation:
FedProx/flearn/trainers/fedprox.py
Lines 77 to 78 in d2a4501
And I also found a similar operation in a PyTorch replication repo, FedMA:
https://github.com/IBM/FedMA/blob/4b586a5a22002dc955d025b890bc632daa3c01c7/main.py#L863-L883
Q1: Actually, should I use this aggregated model to replace the local client model after aggregation?
Q2: When not replacing, can it be interpreted as each client keeping its own local model?
If I have misunderstood something, please let me know. I look forward to hearing from you.
Please update the Google Drive link to MNIST, it has expired now.
I checked the loss function and other parts, but I cannot figure out how you use the proximal term.
Thanks for the work :)
I have read the code and the corresponding issue #10, but there are some places that still feel inconsistent with the paper. Please correct me if I am wrong.
In Algorithm 2, line 7, we calculate the norm between the local model and the global model. But the code uses the l2 norm of the local model alone, without considering the global model. Take mnist/mclr.py line 40, for example.
I also checked the NLP experiment Shakespeare, but I didn't find the regularization part in create_model (shakespeare/stacked_lstm.py, create_model).
Thank you!
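For comparison, expressing the Algorithm 2, line 7 term directly in the TF1 graph would look roughly like the sketch below, where mu, base_loss, and global_weights (a list of tensors holding the broadcast w^t) are assumed to exist; in the repo the proximal term appears to live in the PerturbedGradientDescent optimizer instead:

import tensorflow as tf

# prox_term = (mu / 2) * squared l2 distance between local and global weights
prox_term = (mu / 2.0) * tf.add_n(
    [tf.reduce_sum(tf.square(v - v_star))
     for v, v_star in zip(tf.trainable_variables(), global_weights)])
loss = base_loss + prox_term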
Could you pin the correct module versions in requirements.txt? Otherwise it will download the latest versions.