
mit-han-lab / once-for-all


[ICLR 2020] Once for All: Train One Network and Specialize it for Efficient Deployment

Home Page: https://ofa.mit.edu/

License: MIT License

Python 78.01% Shell 0.02% Jupyter Notebook 21.97%
tinyml edge-ai efficient-model acceleration nas automl

once-for-all's People

Contributors

han-cai, jpablomch, kentang-mit, lmxyy, lyken17, mzahran001, songhan, synxlin, usedtobe97


once-for-all's Issues

About training of once-for-all network

Hi, thanks for your great work!
I am interested in training the once-for-all network, but I ran into some problems when diving into your training code.
Line 198 in train_ofa_net.py loads the weights of a teacher model. Is this an already trained teacher model, so that the training code only performs the progressive shrinking?
Besides, the arguments arg.task and arg.phase never seem to change during training. Am I right? If so, do I need to run training multiple times with different arguments?
Thanks.

Some questions about accuracy predictor

Hi, I'm very interested in your work.

I want to use the accuracy predictor with some other configurations (for example, a ResNet-based OFA network).

I read the acc_predictor tutorial code you uploaded, so I understand roughly what it looks like.
I also read the appendix of your paper, which describes the accuracy predictor in detail.

My question is how much training data the accuracy predictor needs.

When you trained the acc_predictor, the ground truths were measured on the whole ImageNet validation set.

How many ground-truth samples are needed?

I would also like to know the hyperparameters used to train the accuracy predictor.

I hope you can answer my questions. Thank you.

How to measure the latency correctly?

Hi, Thanks for your great work!
When I was testing the latency on V100, the results confused me.
I used the following code to measure the latency table.
torch.cuda.empty_cache()
img_L = img_L.cuda()
start.record()
out = ofa_network(img_L)
end.record()
torch.cuda.synchronize()
run_time.update(start.elapsed_time(end))
img_L is a single image.
Is this correct?
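
One common way to make such a measurement more stable is to run a few warm-up iterations first and to report the median over many timed runs. The sketch below shows that pattern in plain Python. The names `measure_latency_ms`, `warmup`, `repeats`, and `sync` are hypothetical and not part of the OFA code base. On a GPU, `sync` would typically be `torch.cuda.synchronize`, so that asynchronous kernels have finished before the clock is read.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, repeats=50, sync=None):
    """Return the median wall-clock latency of fn() in milliseconds.

    sync is an optional callable that blocks until pending work is done
    (for CUDA code this would be torch.cuda.synchronize).
    """
    for _ in range(warmup):  # warm-up runs are not timed
        fn()
    if sync is not None:
        sync()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for asynchronous work before stopping the clock
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


# Stand-in workload; replace with a forward pass of the network under test.
latency = measure_latency_ms(lambda: sum(i * i for i in range(10_000)))
print(f"median latency: {latency:.3f} ms")
```

Timing a single forward pass right after `torch.cuda.empty_cache()` may include one-off setup costs, which is why the warm-up loop is kept separate from the timed loop here.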

Question about GPU Memory for OFA Progressive Shrinking.

Hi Han Cai, thank you so much for responding to my previous question so quickly.

I'm trying to train the OFA model (ofa_mbv3) using 4 Nvidia Titan V and 2 Titan RTX GPUs.

But there is a problem when validating the subnets.

I looked at the code below, which validates subnets during progressive shrinking.

for setting, name in subnet_settings:
    run_manager.write_log('-' * 30 + ' Validate %s ' % name + '-' * 30, 'train', should_print=False)
    run_manager.run_config.data_provider.assign_active_img_size(setting.pop('image_size'))
    dynamic_net.set_active_subnet(**setting)
    run_manager.write_log(dynamic_net.module_str, 'train', should_print=False)

    run_manager.reset_running_statistics(dynamic_net)
    loss, top1, top5 = run_manager.validate(epoch=epoch, is_test=is_test, run_str=name, net=dynamic_net)
    losses_of_subnets.append(loss)
    top1_of_subnets.append(top1)
    top5_of_subnets.append(top5)
    valid_log += '%s (%.3f), ' % (name, top1)

The first loop iteration (the first subnet) validates without problems.
But when the second subnet is validated, a "CUDA out of memory" error occurs.

My GPUs have 12 GB (Titan V) and 24 GB (Titan RTX) of memory each.

How much memory do your GPUs have?
Also, please let me know if you have any guesses or recommendations for solving this error.

Thank you so much.

args.valid_size is wrong?

It looks like args.valid_size in train_ofa_net.py is set to 10000. Is that right? It seems to me that the target size is much smaller than that (~200).

once-for-all/ofa/imagenet_classification/data_providers/base_provider.py", line 42, in random_sample_valid_set
assert train_size > valid_size

what does MACs mean?

Sorry to ask such a simple question, but I cannot find the answer anywhere. Could anyone help me?
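
MACs usually stands for multiply-accumulate operations: one MAC is one multiplication followed by one addition into an accumulator. As a rough sketch of how the count is obtained for a single convolution layer (the function name and the example layer shape are hypothetical, chosen only for illustration):

```python
def conv2d_macs(in_channels, out_channels, kernel_size, out_h, out_w, groups=1):
    """Multiply-accumulate count of one 2D convolution (bias ignored)."""
    # Each output value needs one MAC per weight in its receptive field.
    macs_per_output = (in_channels // groups) * kernel_size * kernel_size
    return macs_per_output * out_channels * out_h * out_w


# Example: 3x3 convolution, 3 -> 16 channels, 112x112 output feature map.
macs = conv2d_macs(3, 16, 3, 112, 112)
print(macs)  # 5419008
```

Many papers count one MAC as two floating-point operations, so reported FLOPs are often about twice the MACs, although conventions vary between papers.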

Question regarding implementation detail - re_organize_middle_weights

For channel selection in width control, the function re_organize_middle_weights in dynamic_layers applies the following operation at line 144: importance[target_width:] = torch.arange(0, target_width - importance.size(0), -1).
I don't really understand this line. If importance is assumed to be sorted, then it does nothing to the order of importance. If it is not, then important channels can effectively be discarded.
What am I missing?
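
One reading of that line, offered as an assumption rather than the authors' explanation: an L1 importance is never negative, so overwriting the tail with 0, -1, -2, ... pins the channels beyond target_width in their current relative order, and a descending sort then only reorders the first target_width channels. The plain-Python sketch below mimics the assignment on a made-up importance vector; `mask_tail_importance` and `sort_descending` are hypothetical helpers, not repository functions.

```python
def mask_tail_importance(importance, target_width):
    """Mimic `importance[target_width:] = arange(0, target_width - n, -1)`."""
    n = len(importance)
    masked = list(importance)
    masked[target_width:] = range(0, target_width - n, -1)  # 0, -1, -2, ...
    return masked


def sort_descending(importance):
    """Return channel indices ordered by decreasing importance."""
    return sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)


# Made-up L1 importances for 6 channels; the last 2 lie beyond target_width.
importance = [0.5, 2.0, 1.0, 3.0, 9.0, 8.0]
masked = mask_tail_importance(importance, target_width=4)
print(masked)                       # [0.5, 2.0, 1.0, 3.0, 0, -1]
print(sort_descending(masked))      # [3, 1, 2, 0, 4, 5]
print(sort_descending(importance))  # [4, 5, 3, 1, 2, 0]
```

With the mask, channels 4 and 5 stay at the end even though their raw importance is the largest; without it, they would move to the front.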

Error when run train_ofa_net.py

Hi, this project is excellent work on NAS. I am very interested in it and tried it on my machine, but I get the following error when running 'horovodrun -np 4 -H localhost:4 python train_ofa_net.py':


[1,1]:Traceback (most recent call last):
[1,1]: File "train_ofa_net.py", line 194, in
[1,1]: distributed_run_manager.broadcast()
[1,1]: File "/home/xiaobingt/xueshengke/code/once-for-all/ofa/imagenet_codebase/run_manager/distributed_run_manager.py", line 183, in broadcast
[1,1]: hvd.broadcast_parameters(self.net.state_dict(), 0)
[1,1]: File "/home/xiaobingt/horovod/env/lib/python3.7/site-packages/horovod/torch/__init__.py", line 476, in broadcast_parameters
[1,1]: handle = broadcast_async_(p, root_rank, name)
[1,1]: File "/home/xiaobingt/horovod/env/lib/python3.7/site-packages/horovod/torch/mpi_ops.py", line 449, in broadcast_async_
[1,1]: return _broadcast_async(tensor, tensor, root_rank, name)
[1,1]: File "/home/xiaobingt/horovod/env/lib/python3.7/site-packages/horovod/torch/mpi_ops.py", line 359, in _broadcast_async
[1,1]: tensor, output, root_rank, name.encode() if name is not None else _NULL)
[1,1]:RuntimeError: Internal error. Requested ReadyEvent with GPU device but not compiled with CUDA.

It seems this issue comes from my horovod installation. But I have installed 'horovod' successfully and can run its examples without errors. I also googled, but have not found a solution yet. Can you help me?

Here is my environment:

  • Cudnn 7.6.5
  • Cudatoolkit 10.1.243
  • Openmpi 4.0.5
  • Python 3.7.8
  • Pytorch 1.5.1
  • Tensorflow-gpu 2.1.1

Bug for the implementation of knowledge distillation?

Thanks for sharing your code!
I'm wondering whether there is a bug in the implementation of knowledge distillation.
Since cross_entropy_loss_with_soft_target already uses nn.LogSoftmax,

logsoftmax = nn.LogSoftmax()

does softmax need to be applied to soft_logits here again? Thanks!

soft_label = F.softmax(soft_logits, dim=1)
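
In the usual soft-target formulation, the log-softmax and the softmax act on different tensors: log-softmax turns the student's logits into log-probabilities, and softmax turns the teacher's logits into the target distribution. A plain-Python sketch under that assumption follows. The helper names and the logits are made up, and this is not the repository's implementation.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def log_softmax(logits):
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def soft_target_cross_entropy(student_logits, soft_target):
    """-sum_k q_k * log p_k for one sample."""
    log_p = log_softmax(student_logits)
    return -sum(q * lp for q, lp in zip(soft_target, log_p))


teacher_logits = [2.0, 1.0, 0.1]
student_logits = [1.5, 0.5, 0.2]

soft_label = softmax(teacher_logits)  # teacher probabilities q
loss = soft_target_cross_entropy(student_logits, soft_label)
print(soft_label, loss)
```

If the teacher's raw logits were used as the target instead of `soft_label`, the weights q_k would not form a probability distribution.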

What should I do after train_ofa_net

I ran train_ofa_net.py, and there are now three folders under 'exp/': 'kernel2kernel_depth', 'kernel_depth2kernel_depth_width', and 'normal2kernel'. What should I do next? After training, each exp subfolder contains 'checkpoint logs net.config net_info.txt run.config'. Does anybody know what I should do with them?

I cannot find any connection between the training results under exp/ and 'eval_ofa_net.py'. Please help this poor kid. \doge

How to deploy to mobile?

Thanks for the great work! This code uses a PyTorch model, but you mention that the models are deployed on mobile with TF-Lite. Do you convert the PyTorch model via ONNX, or implement it separately in TensorFlow?

channel sorting for elastic width

Hi, thanks for your work.
The paper introduces a channel sorting algorithm, based on the norm of each channel, to support elastic width. However, I can't find this part in the code. Could anyone tell me where it is?

top5 performance

Hi and thanks for the amazing work,

What's the top5 accuracy on ImageNet of the model that achieved top1=80% reported in the paper?
This would help for my literature review where I only have top5 for some models.

Thanks,
Boris

Subnet retraining code

The project seems to contain only the training code for the supernet. Could you provide the retraining code for the subnets?

Questions about training supernet

Hi,

Thanks for your time on this issue.

I have some questions about the OFA supernet training phase.

  1. Will the performance of the supernet always surpass that of the original model?

  2. How should we modify the hyperparameter settings (LR, optimizer type) relative to those of the original model's task?

  3. Is the performance of the supernet the ceiling for the performance of its subnets?

Thanks for your help and happy Chinese New Year!

Question about the calculation of importance(L1 Norm)

Thank you for your great work.

I have a question about the calculation of importance.
In Once-for-All, the importance is calculated over the input dimension.

importance = torch.sum(torch.abs(self.point_linear.conv.conv.weight.data), dim=(0, 2, 3))

But in Pruning_filters_for_efficient_convnets, the importance is calculated over the output dimension.

https://github.com/tyui592/Pruning_filters_for_efficient_convnets/blob/00ec7b7ae9e8f9bd3973888590728477e73537d9/prune.py#L69

sum_of_kernel = torch.sum(torch.abs(kernel.view(kernel.size(0), -1)), dim=1)

Is there an intrinsic reason to calculate it over the input dimension?

Thanks!
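
The two snippets rank different things. A PyTorch convolution weight is laid out as (out_channels, in_channels / groups, kH, kW), so summing over dim=(0, 2, 3) leaves one score per input channel, while flattening to (out_channels, -1) and summing over dim=1 leaves one score per output filter. A toy illustration with nested lists (the helper name and the weight values are made up):

```python
def l1_importance(weight, axis):
    """Sum |w| over every axis except `axis` of a 4D nested-list conv weight
    laid out as [out_channels][in_channels][kH][kW]."""
    n = len(weight) if axis == 0 else len(weight[0])
    scores = [0.0] * n
    for o, per_out in enumerate(weight):
        for i, kernel in enumerate(per_out):
            s = sum(abs(v) for row in kernel for v in row)
            scores[o if axis == 0 else i] += s
    return scores


# Toy 1x1 convolution weight: 2 output channels, 3 input channels.
weight = [
    [[[1.0]], [[-2.0]], [[0.5]]],
    [[[3.0]], [[0.0]], [[-0.5]]],
]
print(l1_importance(weight, axis=1))  # per input channel:  [4.0, 2.0, 1.0]
print(l1_importance(weight, axis=0))  # per output channel: [3.5, 3.5]
```

Since the input channels of a layer are the output channels of the layer feeding it, ranking a layer's input channels is one way to rank the intermediate feature channels between the two layers.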

Question about progressive shrinking

Greetings
There is a function re_organize_middle_weights which re-sorts the convolution weights. However, the order of the channels of x remains the same after this operation.
Thus, the weights are no longer aligned with the input x. A mismatch between weights and input will change the output. Is this a big problem?
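
A general fact that may be relevant here: reordering the channels of an intermediate feature map leaves the output unchanged as long as the layer that produces those channels and the layer that consumes them are permuted consistently. Whether the repository does exactly this should be checked against its source. The sketch below only demonstrates the principle on a toy two-layer network with made-up weights.

```python
def linear(x, weight):
    """y[j] = sum_i weight[j][i] * x[i]"""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def relu(x):
    return [max(0.0, v) for v in x]


def two_layer(x, w1, w2):
    return linear(relu(linear(x, w1)), w2)


# Toy weights: 2 inputs -> 3 hidden channels -> 2 outputs.
w1 = [[1.0, -1.0], [0.5, 2.0], [-3.0, 1.0]]
w2 = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]
x = [2.0, 1.0]

order = [2, 0, 1]  # new ordering of the hidden channels
w1_perm = [w1[j] for j in order]                   # reorder producer rows
w2_perm = [[row[j] for j in order] for row in w2]  # reorder consumer columns

print(two_layer(x, w1, w2))            # [7.0, -3.0]
print(two_layer(x, w1_perm, w2_perm))  # [7.0, -3.0]
```

Permuting only `w2` while leaving `w1` untouched would change the output, which is the mismatch the question describes.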

In set_running_statistics, CPU is used by default to forward images

forward_model is created by deep copying the incoming model.
However, it is not moved to any GPU device.
Computing the mean and variance by forwarding batches of images on the CPU is time-consuming.
I think it would be better to assign a default device and move the copied model to it.

Evolution details

Hi, thanks for your excellent work.

How is the network architecture encoded and decoded during the evolution?

After reading the description of the accuracy predictor in the paper, it seems that the kernel size and expansion ratio of each layer are encoded first. If one architecture is [3,4, ....., 0,0 ..... 3,6] and another is [3,4, ....., 7,4 ..... 3,6], I have two questions about the evolution:

  1. What if crossing over [0,0] and [7,4] produces [0,4]? This is not a valid gene.

  2. If one stage is [1,1,0,0], the last two layers are skipped. If mutation turns it into [1,1,0,1] during the evolution, the last layer is not skipped but the third layer is, which is not in line with the rules.
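
One possible encoding that avoids both cases, shown here as an assumption and not as the repository's actual search code, keeps kernel size and expansion ratio as separate per-layer genes and stores depth as one count per stage. Crossover then copies whole genes from one parent, and decoding keeps only the first d layers of each stage. The choice sets below follow the model name ofa_mbv3_d234_e346_k357; all function names are hypothetical.

```python
import random

KS_CHOICES = (3, 5, 7)
E_CHOICES = (3, 4, 6)
D_CHOICES = (2, 3, 4)
NUM_STAGES = 5
MAX_DEPTH = max(D_CHOICES)


def random_arch(rng):
    n_layers = NUM_STAGES * MAX_DEPTH
    return {
        "ks": [rng.choice(KS_CHOICES) for _ in range(n_layers)],
        "e": [rng.choice(E_CHOICES) for _ in range(n_layers)],
        "d": [rng.choice(D_CHOICES) for _ in range(NUM_STAGES)],
    }


def crossover(a, b, rng):
    """Each gene is copied whole from one parent, so values stay valid."""
    return {key: [rng.choice(pair) for pair in zip(a[key], b[key])] for key in a}


def mutate(arch, rng, prob=0.1):
    """Resample each gene from its own choice set with probability prob."""
    choices = {"ks": KS_CHOICES, "e": E_CHOICES, "d": D_CHOICES}
    return {
        key: [rng.choice(choices[key]) if rng.random() < prob else g for g in genes]
        for key, genes in arch.items()
    }


def active_layers(arch):
    """Decode: in each stage, only the first d layers are kept."""
    layers = []
    for stage, depth in enumerate(arch["d"]):
        for i in range(depth):
            idx = stage * MAX_DEPTH + i
            layers.append((arch["ks"][idx], arch["e"][idx]))
    return layers


rng = random.Random(0)
child = mutate(crossover(random_arch(rng), random_arch(rng), rng), rng)
print(len(active_layers(child)), child["d"])
```

Because depth is a count and not a per-layer mask, a pattern such as [1,1,0,1] cannot be expressed, and a skipped layer's kernel and expansion genes are simply ignored during decoding.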

What is the role of 'reset_running_statistics' ?

On line 67 of progressive_shrinking.py, why is the 'reset_running_statistics' function needed to reset the 'mean' and 'var' values of the batch normalization layers to the 'mean' and 'var' computed from 2000 random images?
run_manager.reset_running_statistics(dynamic_net)

Errors when using ofa_resnet50 for efficient deployment in tutorial/ofa.ipynb

  1. First, I searched for a network:

Searching with note10 constraint (25): 100%|██████████| 500/500 [00:09<00:00, 51.03it/s]
Found best architecture on note10 with latency <= 25.00 ms in 9.84 seconds! It achieves 81.71% predicted accuracy with 24.73 ms latency on note10.
Architecture of the searched sub-net:
DyConv(O32, K3, S2)
(DyConv(O32, K3, S1), Identity)
DyConv(O64, K3, S1)
max_pooling(ks=3, stride=2)
(3x3_BottleneckConv_in->768->256_S1, avgpool_conv)
(3x3_BottleneckConv_in->768->256_S1, Identity)
(3x3_BottleneckConv_in->1536->256_S1, Identity)
(3x3_BottleneckConv_in->768->256_S1, Identity)
(3x3_BottleneckConv_in->2048->512_S2, avgpool_conv)
(3x3_BottleneckConv_in->2048->512_S1, Identity)
(3x3_BottleneckConv_in->3072->512_S1, Identity)
(3x3_BottleneckConv_in->2048->512_S1, Identity)
(3x3_BottleneckConv_in->6144->1024_S2, avgpool_conv)
(3x3_BottleneckConv_in->3072->1024_S1, Identity)
(3x3_BottleneckConv_in->4096->1024_S1, Identity)
(3x3_BottleneckConv_in->6144->1024_S1, Identity)
(3x3_BottleneckConv_in->4096->1024_S1, Identity)
(3x3_BottleneckConv_in->4096->1024_S1, Identity)
(3x3_BottleneckConv_in->8192->2048_S2, avgpool_conv)
(3x3_BottleneckConv_in->6144->2048_S1, Identity)
(3x3_BottleneckConv_in->12288->2048_S1, Identity)
(3x3_BottleneckConv_in->12288->2048_S1, Identity)
MyGlobalAvgPool2d(keep_dim=False)
DyLinear(2048, 1000)

However, I think the middle dimensions of the searched network look a bit untrustworthy.

  2. When I tried to evaluate this sub-network, I got this error:

Evaluating the sub-network with latency = 24.7 ms on note10
RuntimeError Traceback (most recent call last)
in
6 _, net_config, latency = result
7 print('Evaluating the sub-network with latency = %.1f ms on %s' % (latency, target_hardware))
----> 8 top1 = evaluate_ofa_subnet(
9 ofa_network,
10 imagenet_data_path,
~/桌面/once-for-all-master/ofa/tutorial/imagenet_eval_helper.py in evaluate_ofa_subnet(ofa_net, path, net_config, data_loader, batch_size, device)
18 assert len(net_config['ks']) == 20 and len(net_config['e']) == 20 and len(net_config['d']) == 5
19 ofa_net.set_active_subnet(ks=net_config['ks'], d=net_config['d'], e=net_config['e'])
---> 20 subnet = ofa_net.get_active_subnet().to(device)
21 calib_bn(subnet, path, net_config['r'][0], batch_size)
22 top1 = validate(subnet, path, net_config['r'][0], data_loader, batch_size, device)
~/桌面/once-for-all-master/ofa/imagenet_classification/elastic_nn/networks/ofa_resnets.py in get_active_subnet(self, preserve_weight)
226 active_idx = block_idx[:len(block_idx) - depth_param]
227 for idx in active_idx:
--> 228 blocks.append(self.blocks[idx].get_active_subnet(input_channel, preserve_weight))
229 input_channel = self.blocks[idx].active_out_channel
230 classifier = self.classifier.get_active_subnet(input_channel, preserve_weight)
~/桌面/once-for-all-master/ofa/imagenet_classification/elastic_nn/modules/dynamic_layers.py in get_active_subnet(self, in_channel, preserve_weight)
540
541 # copy weight from current layer
--> 542 sub_layer.conv1.conv.weight.data.copy_(
543 self.conv1.conv.get_active_filter(self.active_middle_channels, in_channel).data)
544 copy_bn(sub_layer.conv1.bn, self.conv1.bn.bn)

RuntimeError: The size of tensor a (768) must match the size of tensor b (88) at non-singleton dimension 0

I guess the code needs to be modified for the ResNet-50 network. Could you tell me how to modify it? Thanks a lot.

How many subnets does knowledge distillation optimize?

I have a question that is not made clear in the paper. During knowledge distillation, do you optimize for all 10^19 networks? The elastic_nn portion of the code seems to point to that:

	subnet_settings = []
	for d in depth_list:
		for e in expand_ratio_list:
			for k in ks_list:
				for w in width_mult_list:
					for img_size in image_size_list:
						subnet_settings.append([{
							'image_size': img_size,
							'd': d,
							'e': e,
							'ks': k,
							'w': w,
						}, 'R%s-D%s-E%s-K%s-W%s' % (img_size, d, e, k, w)])
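
The nested loops enumerate the Cartesian product of the five lists, and each setting carries a single value of d, e, and ks. The number of settings is therefore the product of the list lengths, which is far smaller than the number of sub-networks with per-layer choices. A sketch with placeholder lists (the values below are assumptions, not the repository's defaults):

```python
from itertools import product

# Placeholder value lists; the real lists come from the caller.
image_size_list = [224]
depth_list = [2, 4]
expand_ratio_list = [3, 6]
ks_list = [3, 7]
width_mult_list = [0]

# Same iteration order as the nested loops: d, e, k, w, image size.
subnet_settings = [
    (
        {"image_size": img_size, "d": d, "e": e, "ks": k, "w": w},
        "R%s-D%s-E%s-K%s-W%s" % (img_size, d, e, k, w),
    )
    for d, e, k, w, img_size in product(
        depth_list, expand_ratio_list, ks_list, width_mult_list, image_size_list
    )
]

print(len(subnet_settings))   # 8
print(subnet_settings[0][1])  # R224-D2-E3-K3-W0
```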

Details about finetuning (25 / 75 epochs)

Thanks for sharing your code for this excellent work!

Could you share more details about how you fine-tune your specialized sub-networks? I didn't find the code in the repo, but hyperparameters such as batch size, optimizer, learning rate, LR decay, and weight decay would also be very helpful.

Thanks again.

Tutorial for deploying with FPGA

Hi,

Congratulations on this great work. I was amazed by your solution in the CVPR 2020 competition.

Is there any tutorial for using this work on a Zynq UltraScale ZU3EG or ZU9EG FPGA?

Best regards,

Jorge

Incorrect accuracy while testing the pretrained ofa network

Thanks for sharing your code.

I have some problems when I test the ofa pretrained network.

I built an OFA network using the code provided in the README.

from model_zoo import ofa_net
ofa_network = ofa_net('ofa_mbv3_d234_e346_k357_w1.0', pretrained=True)
    
ofa_network.set_active_subnet(ks=7, e=6, d=4)
subnet = ofa_network.get_active_subnet(preserve_weight=True)

# test the subset on the validation set of the ImageNet

When I set the parameters ks, e, and d to different values, the accuracy of the OFA network drops to about 0 in some cases. The test results are shown below:
[image: test results]

Have I made some mistake while testing the ofa pretrained network?

Question about training for model `ofa_D4_E6_K7`

Hello,
I want to train the OFA model (ofa_mbv3) on CIFAR-100 or on custom datasets.

So I would like some training details about the first supernet.

When I checked the model in the progressive shrinking phase,
I saw that the weights of the F.Linear (kernel transformer) layers were also trained.

When I train the first supernet (ofa_D4_E6_K7), should I also train the kernel transformation matrices?

I am also wondering whether you have any information about training the OFA network on other datasets (such as CIFAR-10 or CIFAR-100). I would like to know it.

Thank you, as always.

two questions about ofa

Thanks for sharing your excellent work. I have two questions about OFA.

  1. Different hardware platforms optimize operators differently, and we often choose efficient operators according to the target platform. Can OFA handle the situation where different hardware platforms prefer different operators?
  2. On mobile platforms, different camera sensors produce different data, so the training data differs per hardware platform. When we use OFA for a generative network such as SRGAN, which platform's training data should be used?

Use for our Custom Dataset

Hi,

Thanks for the amazing work.

I want to train the OFA network on our custom dataset. How can I do that?
Looking forward to your reply.

Thanks,
Darshan
