litian96 / FedProx
Federated Optimization in Heterogeneous Networks (MLSys '20)
License: MIT License
Hi,
I studied your paper/code and I am trying to obtain \nabla h_k(w_t, w_t) to use as a local optimization criterion. In the fedprox and pgd code, it is not clear to me where the gradients \nabla h_k(w, w_t) are evaluated. Could you help me with this?
If I can understand where these gradients are evaluated, I could simply pass w_t (self.latest_model in fedprox) to this function instead of using the local model.
Best Regards,
Mairton
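For reference, the paper defines the local proximal objective and its gradient as

    h_k(w, w_t) = F_k(w) + (\mu / 2) \|w - w_t\|^2
    \nabla h_k(w, w_t) = \nabla F_k(w) + \mu (w - w_t)

so at w = w_t the proximal part vanishes and \nabla h_k(w_t, w_t) reduces to \nabla F_k(w_t).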
The following lines are probably out of date:
FedProx/flearn/trainers/fedavg.py
Lines 72 to 74 in d9cdfdd
Update with:
stats = self.test()
stats_train = self.train_error_and_loss()
Hi there,
I see you have the option of using a CNN on the MNIST dataset, but I don't see its implementation in the model code. Will you provide it later?
BTW, I was also at ICML this year but was unable to attend the poster session. Would you put your poster on your homepage as well?
H.
Hi there, I am wondering whether there is a PyTorch version of FedProx?
After running this: !python experiments/centralized/moleculenet/molecule_classification_multilabel.py
I get this error message:
Traceback (most recent call last):
  File "experiments/centralized/moleculenet/molecule_classification_multilabel.py", line 11, in <module>
    from data_preprocessing.molecule.data_loader import get_dataloader, get_data
  File "/content/drive/My Drive/Colab Notebooks/FedGraphNN/data_preprocessing/molecule/data_loader.py", line 12, in <module>
    from FedML.fedml_core.non_iid_partition.noniid_partition import partition_class_samples_with_dirichlet_distribution
ModuleNotFoundError: No module named 'FedML'
I would like to use parts of the FedProx code for my dissertation. In particular, I would like to use generate_synthetic.py (https://github.com/litian96/FedProx/blob/master/data/synthetic_1_1/generate_synthetic.py) to generate my own synthetic federated datasets.
Unfortunately, FedProx does not seem to have a license. This means all rights are reserved, and I am not allowed to use its code.
Could you add an appropriate software license to this repository?
According to Algorithm 2, there is a parameter gamma in the input which measures how much local
computation is performed to solve the local subproblem on device k at the t-th round.
But I can't find gamma in the code implementation.
In https://github.com/litian96/FedProx/blob/master/flearn/models/mnist/mclr.py there is only a variable num_epochs.
def solve_inner(self, data, num_epochs=1, batch_size=32):
    '''Solves local optimization problem'''
    for _ in trange(num_epochs, desc='Epoch: ', leave=False, ncols=120):
        for X, y in batch_data(data, batch_size):
            with self.graph.as_default():
                self.sess.run(self.train_op,
                              feed_dict={self.features: X, self.labels: y})
    soln = self.get_params()
    comp = num_epochs * (len(data['y']) // batch_size) * batch_size * self.flops
    return soln, comp
So could you please help me find gamma?
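For reference, Definition 1 in the paper calls w* a \gamma-inexact solution when \|\nabla h_k(w*, w_t)\| <= \gamma \|\nabla h_k(w_t, w_t)\|; the released code appears to control this only implicitly through num_epochs. A minimal sketch of an explicit \gamma-based stopping rule (grad_fn and step_fn are hypothetical helpers, not part of the repo):

import numpy as np

def solve_inner_gamma(grad_fn, step_fn, w_t, gamma=0.1, max_iters=1000):
    '''Hypothetical gamma-inexact local solver (illustration only).

    grad_fn(w): gradient of the proximal objective \nabla h_k(w, w_t)
    step_fn(w): one local SGD step, returning the new iterate
    '''
    g0 = np.linalg.norm(grad_fn(w_t))  # baseline gradient norm at the global model
    w = w_t
    for _ in range(max_iters):
        w = step_fn(w)
        # stop as soon as w satisfies the gamma-inexactness condition
        if np.linalg.norm(grad_fn(w)) <= gamma * g0:
            break
    return w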
What Python version are you using? I use Python 3.6 and some packages fail to import, so I tried switching to 3.5, but 3.5 has been deprecated and the dependencies cannot be downloaded.
Hi,
Is there a TFF implementation of your algorithm?
Hi, I got this problem on macOS and Windows:
~ % pip install tensorflow-gpu==1.10
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.10 (from versions: none)
ERROR: No matching distribution found for tensorflow-gpu==1.10
The same happens with pip3.
Did I miss anything?
Thanks
I studied your paper and code, and I'm struggling to understand exactly what the PerturbedGradientDescent optimizer contributes to your work. If you could kindly explain what the following line of your code does, it may clear things up for me.
FedProx/flearn/trainers/fedprox.py
Line 55 in 0f9c2e8
Thank you in advance.
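As I read it, the proximal correction is applied inside the optimizer rather than in the loss, so that line points the optimizer at the latest global model w^t. A minimal NumPy sketch of a perturbed-gradient step under that reading (my own illustration, not the repo's TF op):

import numpy as np

def perturbed_gradient_step(w, grad_Fk, w_t, lr=0.01, mu=0.1):
    '''One PerturbedGradientDescent-style update (sketch only).

    w:       current local parameters (NumPy array)
    grad_Fk: gradient of the local loss F_k at w
    w_t:     latest global model broadcast by the server
    '''
    # SGD on F_k plus the gradient of the proximal term (mu/2) ||w - w_t||^2
    return w - lr * (grad_Fk + mu * (w - w_t))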
Could you clarify how you calculate the avg_gradient that is given as input in the pggd file? I could not find the script that imports this file.
https://github.com/litian96/FedProx/blob/master/flearn/optimizer/pggd.py#L56
Thank you
In the my_sample.py file for generating FEMNIST data, the < on the referenced line seems like it should be >. Otherwise, the retrieved samples will be the same for a given class at the beginning. I checked the data files shared via Google Drive, and there are indeed several identical images for the same class for each user.
FedProx/flearn/trainers/fedbase.py
Line 17 in d2a4501
Please take a look at this line. It seems that all clients are using the same ML model for local training. In other words, there is no local model, only a global model that is sequentially trained on each client.
This can be verified by the following code snippet (I have tested it on flearn/trainers/fedavg.py).
csolns = []  # buffer for receiving client solutions
lastc = None  # parameters of the previously trained client
for idx, c in enumerate(active_clients.tolist()):  # simply drop the slow devices
    print(i, idx)  # i is the round index from the enclosing training loop
    if lastc is not None:
        for j in range(len(lastc)):
            # Before set_params, the current client's parameters already equal
            # the previous client's post-training parameters, i.e. the model is shared.
            print('Are the parameters of the current client (before training) '
                  'the same as those of the previous client (after training)?: %s'
                  % (c.get_params()[j] == lastc[j]).all())
        from time import sleep
        sleep(1)
    else:
        print('The first client.')

    # communicate the latest model
    c.set_params(self.latest_model)

    # solve minimization locally
    soln, stats = c.solve_inner(num_epochs=self.num_epochs, batch_size=self.batch_size)
    lastc = c.get_params()

    # gather solutions from client
    csolns.append(soln)

    # track communication cost
    self.metrics.update(rnd=i, cid=c.id, stats=stats)

# update models
self.latest_model = self.aggregate(csolns)
In my opinion, this is not the expected behavior for federated learning.
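If it helps, my reading is that c.set_params(self.latest_model) overwrites the shared graph before every local solve, so the single model object acts as per-client scratch space. A sketch of the equivalent explicit-copy loop (train_locally is a hypothetical helper, not the repo's API):

import copy

csolns = []
for c in active_clients:
    w_local = copy.deepcopy(self.latest_model)  # explicit per-client copy of w^t
    soln = train_locally(c, w_local)            # hypothetical local solver
    csolns.append(soln)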
Dear Tian:
When I run the following on CPU:
python3 -u main.py --dataset='sent140' --optimizer='fedprox' \
    --learning_rate=0.01 --num_rounds=200 --clients_per_round=10 \
    --mu=0 --eval_every=1 --batch_size=10 \
    --num_epochs=1 \
    --model='stacked_lstm' | tee logs/logs_sent140_mu0_E1_fedprox
it runs very, very slowly, and worse, the outputs oscillate between the same two numbers! The results are below:
5726 Clients in Total
Training with 10 workers ---
At round 0 accuracy: 0.4060871469235822
At round 0 training accuracy: 0.40770690942001303
At round 0 training loss: 0.6931471925528921
gradient difference: 0.3779687893000023
At round 1 accuracy: 0.5939128530764178
At round 1 training accuracy: 0.5922930905799869
At round 1 training loss: 0.682659032131717
gradient difference: 0.6406151359028104
At round 2 accuracy: 0.4060871469235822
At round 2 training accuracy: 0.40770690942001303
At round 2 training loss: 0.6951613189004014
gradient difference: 1.0240842395041418
At round 3 accuracy: 0.5939128530764178
At round 3 training accuracy: 0.5922930905799869
At round 3 training loss: 0.6845133630735032
gradient difference: 1.334649037607692
At round 4 accuracy: 0.4060871469235822
At round 4 training accuracy: 0.40770690942001303
At round 4 training loss: 0.7872438000397856
gradient difference: 3.8706158347478246
At round 5 accuracy: 0.5939128530764178
At round 5 training accuracy: 0.5922930905799869
At round 5 training loss: 0.676954747225743
gradient difference: 2.8532703690523324
At round 6 accuracy: 0.4060871469235822
At round 6 training accuracy: 0.40770690942001303
At round 6 training loss: 0.6952778442305486
gradient difference: 2.9297919740883964
At round 7 accuracy: 0.5939128530764178
At round 7 training accuracy: 0.5922930905799869
At round 7 training loss: 0.7021283723042158
gradient difference: 4.2864026772781
At round 8 accuracy: 0.5939128530764178
At round 8 training accuracy: 0.5922930905799869
At round 8 training loss: 0.6761318949424154
gradient difference: 4.987087255237341
At round 9 accuracy: 0.4060871469235822
At round 9 training accuracy: 0.40770690942001303
At round 9 training loss: 0.8113437744137745
gradient difference: 9.235964830922306
At round 10 accuracy: 0.5939128530764178
At round 10 training accuracy: 0.5922930905799869
At round 10 training loss: 0.7755919640498169
gradient difference: 6.982072813031079
At round 11 accuracy: 0.5939128530764178
At round 11 training accuracy: 0.5922930905799869
At round 11 training loss: 0.7091725448816267
gradient difference: 6.115867566149534
At round 12 accuracy: 0.5939128530764178
At round 12 training accuracy: 0.5922930905799869
At round 12 training loss: 0.7398191231275261
gradient difference: 7.72441549160035
At round 13 accuracy: 0.5939128530764178
At round 13 training accuracy: 0.5922930905799869
At round 13 training loss: 1.0417891773572328
gradient difference: 15.32712477985914
The same thing happens when I run shakespeare, but mnist and nist perform well.
How can I solve this? Is there something wrong with stacked_lstm?
Good job. I have read the paper and code. I have some questions:
Thank you.
Hey~
When running main.py I get an error:
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py', wdir='C:/Users/Administrator/Desktop/federated learning/code/FedProx-master')
  File "E:\anaconda\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 678, in runfile
    execfile(filename, namespace)
  File "E:\anaconda\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 106, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 130, in <module>
    main()
  File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 118, in main
    options, learner, optimizer = read_options()
  File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 94, in read_options
    mod = importlib.import_module(model_path)
  File "E:\anaconda\Anaconda\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 936, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 205, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 948, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'flearn.models.nist.stacked_lstm'
And flearn/models/nist/stacked_lstm.py indeed does not exist. Why?
Thank you so much.
In FedProx, only the performance on clients' own test sets is considered, without a global test set. I know that personalized FL often focuses on each client's own test set, but why don't we compare personalized FL with purely local training on each client?
If we only care about clients' own test sets, I think comparison experiments against the performance of local training are necessary.
Hi,
Is there a PyTorch version of FedProx?
Best regards
Dear Tian:
Thank you very much for your code. I have a question:
Should the difference calculation weight each client by its proportion of the data, rather than simply summing?
Lines 44-45 of https://github.com/litian96/FedProx/blob/master/flearn/trainers/fedprox.py
Thank you.
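For illustration, weighting per-client quantities by data proportion (as FedAvg does when averaging models) might look like this sketch; weighted_aggregate and the (num_samples, params) layout are hypothetical, not the repo's API:

def weighted_aggregate(csolns):
    '''Sketch of data-proportional averaging (illustration only).

    csolns: list of (num_samples, params) pairs, one per client,
            where params is a list of NumPy arrays.
    '''
    total = float(sum(n for n, _ in csolns))
    num_layers = len(csolns[0][1])
    # weight each client's contribution by its share of the training data
    return [sum(n * params[i] for n, params in csolns) / total
            for i in range(num_layers)]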
Does the current implementation provide the option for heuristic μ as discussed in "C.3.3 Adaptively setting μ" from https://arxiv.org/pdf/1812.06127.pdf?
"We decrease μ by 0.1 when the loss continues to decrease for 5 rounds and increase μ by 0.1 when we see the loss increase."
I assume that you mean that you use the same μ for all clients, and that you refer to the global loss, right?
Thank you
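The quoted rule is straightforward to sketch; adapt_mu and loss_history are hypothetical names of mine (the released code does not appear to include this):

def adapt_mu(mu, loss_history, window=5, delta=0.1):
    '''Sketch of the adaptive-mu heuristic from Appendix C.3.3 (illustration only).

    loss_history: global training loss recorded once per round, latest last
    '''
    if len(loss_history) >= 2 and loss_history[-1] > loss_history[-2]:
        return mu + delta  # loss increased -> strengthen the proximal term
    recent = loss_history[-(window + 1):]
    if len(recent) == window + 1 and all(a > b for a, b in zip(recent, recent[1:])):
        return max(mu - delta, 0.0)  # loss fell for `window` straight rounds -> relax
    return mu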
Hi, I read your paper and code, and this work has inspired me a lot in my work on federated learning optimization. I am trying to reproduce FedProx in PyTorch, and I am confused about a small detail. In the algorithm in the paper, the local client model seems to have no replacement operation, i.e., w_k^t = w^t.
But when I read your code, I found that there is actually a replace operation:
FedProx/flearn/trainers/fedprox.py
Lines 77 to 78 in d2a4501
And I also found a similar operation in a PyTorch replication repo, FedMA:
https://github.com/IBM/FedMA/blob/4b586a5a22002dc955d025b890bc632daa3c01c7/main.py#L863-L883
Q1: Actually, should I use this aggregated model to replace the local client model after aggregation?
Q2: When not replacing, can it be interpreted as each client keeping its own local model?
If I have misunderstood something, please let me know. I look forward to hearing from you.
Please update the Google Drive link to MNIST, it has expired now.
I checked the loss function and other parts, but I cannot figure out how you use the proximal term.
Thanks for the work :)
I have read the code and the corresponding issue #10, but there are some places that still feel inconsistent with the paper. Please correct me if I am wrong.
In Algorithm 2, line 7, we calculate the norm between the local model and the global model. But the code uses the l2 norm of the local model alone, without considering the global model. Take mnist/mclr.py line 40, for example.
I also checked the NLP experiment Shakespeare, but I didn't find the regularization part in create_model (shakespeare/stacked_lstm.py, create_model).
Thank you!
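For comparison, expressing the Algorithm 2, line 7 term directly in the TF1 graph would look roughly like the sketch below, where mu, base_loss, and global_weights (a list of tensors holding the broadcast w^t) are assumed to exist; in the repo the proximal term appears to live in the PerturbedGradientDescent optimizer instead:

import tensorflow as tf

# prox_term = (mu / 2) * squared l2 distance between local and global weights
prox_term = (mu / 2.0) * tf.add_n(
    [tf.reduce_sum(tf.square(v - v_star))
     for v, v_star in zip(tf.trainable_variables(), global_weights)])
loss = base_loss + prox_term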
Could you pin the correct module versions in requirements.txt? Otherwise it will download the latest versions.