
brecq's Introduction

BRECQ

PyTorch implementation of BRECQ, ICLR 2021

@article{li2021brecq,
  title={BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction},
  author={Li, Yuhang and Gong, Ruihao and Tan, Xu and Yang, Yang and Hu, Peng and Zhang, Qi and Yu, Fengwei and Wang, Wei and Gu, Shi},
  journal={arXiv preprint arXiv:2102.05426},
  year={2021}
}

Update (Jul 30): Add Multi-GPU Reconstruction

We release the code for multi-GPU reconstruction.

Note that this cannot simply be done with torch.nn.DataParallel or DDP. To synchronize the gradients, activation scales, etc., we have to manually call torch.distributed.all_reduce.

The first step is to initialize the distributed environment; then use a distributed sampler for data loading.

Please use main_imagenet_dist for multi-GPU reconstruction. With this, you can reconstruct larger models and use more data samples!

python -m main_imagenet_dist **KWARGS_FOR_RECON
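A minimal sketch of the setup described above, not taken from the repo: the launch method, the NCCL backend, and the calib_dataset placeholder are assumptions.

import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

# Initialize the distributed environment. This assumes the script is launched by a tool
# that sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT per process (e.g. torchrun).
dist.init_process_group(backend='nccl')
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Shard the calibration data across ranks with a distributed sampler.
# calib_dataset is a placeholder for whatever dataset feeds reconstruction.
sampler = DistributedSampler(calib_dataset)
loader = DataLoader(calib_dataset, batch_size=32, sampler=sampler, num_workers=4)

# Gradients, activation scales, etc. are synchronized by hand, e.g. averaged across ranks:
def sync_mean(t: torch.Tensor) -> torch.Tensor:
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    return t / dist.get_world_size()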

Pretrained models

We provide all the pretrained models; they can be accessed via torch.hub.

For example: use res18 = torch.hub.load('yhhhli/BRECQ', model='resnet18', pretrained=True) to get the pretrained ResNet-18 model.

If you encounter a URLError when downloading the pretrained network, it is probably a network failure. An alternative is to download the file manually with wget and move it to ~/.cache/torch/checkpoints, where the load_state_dict_from_url function checks before downloading.

For example:

wget https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet50_imagenet.pth.tar 
mv resnet50_imagenet.pth.tar ~/.cache/torch/checkpoints
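As a short illustration of the cache fallback (the URL matches the wget example above; map_location and the exact contents of the checkpoint are assumptions, and the cache directory may be ~/.cache/torch/checkpoints or ~/.cache/torch/hub/checkpoints depending on the PyTorch version):

import torch

# After the manual download, load_state_dict_from_url finds the cached file
# in the checkpoints directory and skips the download.
checkpoint = torch.hub.load_state_dict_from_url(
    'https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet50_imagenet.pth.tar',
    map_location='cpu')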

Usage

python main_imagenet.py --data_path PATH/TO/DATA --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration

You should see output similar to the following:

Quantized accuracy before brecq: 0.13599999248981476
Weight quantization accuracy: 66.32799530029297
Full quantization (W2A4) accuracy: 65.21199798583984

MobileNetV2 Quantization:

python main_imagenet.py --data_path PATH/TO/DATA --arch mobilenetv2 --n_bits_w 2 --channel_wise --weight 0.1

Results: Weight quantization accuracy: 59.48799896240234

brecq's People

Contributors: blackandredplayerinfuture, yhhhli

brecq's Issues

Activation quantization question

Excellent work! I have a small question: quant_block defines intermediate activation layers, but they are not used in forward, for example conv1.activation_function in the screenshots below.
[screenshot]
[screenshot]
Looking forward to your reply.

8-bit results?

Can you provide standard W8A8 quantization results for ResNet-18 and MobileNetV2 on ImageNet, since this is the most widely used setting?

YOLOv5 quantization problem

Thank you very much for your work.
I adapted your code to YOLOv5; with W4A8 quantization there is a loss of nearly 3 points. Have you experimented with YOLOv5?

Why not quantize the activation of the last conv layer in a block?

Hi,
Thanks for releasing your code, but I have a question about one implementation detail.
In quant_block.py, take the following code for ResNet-18 and ResNet-34 as an example.
disable_act_quant is set to True for conv2, which disables quantization of conv2's output.

class QuantBasicBlock(BaseQuantBlock):
    """
    Implementation of Quantized BasicBlock used in ResNet-18 and ResNet-34.
    """
    def __init__(self, basic_block: BasicBlock, weight_quant_params: dict = {}, act_quant_params: dict = {}):
        super().__init__(act_quant_params)
        self.conv1 = QuantModule(basic_block.conv1, weight_quant_params, act_quant_params)
        self.conv1.activation_function = basic_block.relu1
        self.conv2 = QuantModule(basic_block.conv2, weight_quant_params, act_quant_params, disable_act_quant=True)

        # modify the activation function to ReLU
        self.activation_function = basic_block.relu2

        if basic_block.downsample is None:
            self.downsample = None
        else:
            self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params,
                                          disable_act_quant=True)
        # copying all attributes in original block
        self.stride = basic_block.stride

This causes a boost in accuracy; the following is the result I get using your code and the same ImageNet dataset used in the paper.
[1] and [2] denote the modifications I made to the original code.

[screenshot of results]

[1]: In quant_block.py → QuantBasicBlock.__init__, change disable_act_quant from True to False for self.conv2 and self.downsample;
[2]: In quant_block.py → QuantInvertedResidual.__init__, change disable_act_quant from True to False for the QuantModule inside self.conv.

However, I do not think this is applicable to most NPUs, which quantize the output of every conv layer.
So why not quantize the activation of the last conv layer in a block? Is there a particular reason for this?
Also, for the methods you compared against in your paper, have you checked whether they do the same thing?

Suggest replacing .view with .reshape in the accuracy() function

Got an error:

Traceback (most recent call last):
  File "main_imagenet.py", line 198, in <module>
    print('Quantized accuracy before brecq: {}'.format(validate_model(test_loader, qnn)))
  File "/home/xxxx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "main_imagenet.py", line 108, in validate_model
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
  File "main_imagenet.py", line 77, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

So I suggest replacing .view with .reshape in the accuracy() function; a sketch of the fix is below.
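A minimal sketch of the suggested fix, mirroring the standard top-k accuracy helper rather than the repo's exact code:

import torch

def accuracy(output: torch.Tensor, target: torch.Tensor, topk=(1,)):
    """Compute top-k accuracy; .reshape avoids the non-contiguous .view error."""
    maxk = max(topk)
    batch_size = target.size(0)
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()
    correct = pred.eq(target.reshape(1, -1).expand_as(pred))
    res = []
    for k in topk:
        # .reshape works even when the slice is not contiguous in memory
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res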

What does BRECQ stand for?

Hey,

I'm just wondering what BRECQ stands for. BR as in Block Reconstruction, but what about the other letters?

Issues regarding Layer-wise reconstruction

Greetings,

I was trying to reproduce layer-wise reconstruction as opposed to block-wise. I commented out the first two lines of this loop in quant_model.py:
[screenshot]

However, the accuracy drops drastically to 0.62% for ResNet-18 (W2), far from the 65.19% reported in your paper. Could you briefly describe how to apply layer-wise reconstruction?

Thanks.

Quantization doesn't seem to produce good accuracy. Are there additional settings I missed?

So I tried running your code on my dataset with pre-trained MobileNetV2 and ResNet-50 models. I got these results:

Full-precision accuracy: MobileNetV2 58.19
Quantized (W8A8) accuracy: MobileNetV2 12.02
Quantized (W6A6) accuracy: MobileNetV2 10.12

Full-precision accuracy: ResNet-50 65.16
Quantized (W8A8) accuracy: ResNet-50 13.22
Quantized (W6A6) accuracy: ResNet-50 11.02

[screenshot]
https://github.com/yhhhli/BRECQ/blob/main/main_imagenet.py#L201C1-L229C87

My accuracy after quantization, however, does not come anywhere near the float models (around 58.19% and 65.16%).
Are there additional settings I missed?

Quantization doesn't work?

Hi,

So I tried running your code on CIFAR-10 with a pre-trained ResNet-50 model. I've attached the code below.
My accuracy after quantization, however, does not come anywhere near the float model (around 93%). I get:

  • Accuracy of the network on the 10000 test images: 10.0 % top5: 52.28 %

Please help me with this. The code is inside the zip file.

main_cifar.zip

CUDA error when launching the example

user@machine:/path_to/BRECQ# python main_imagenet.py --data_path /path_to/IMAGENET_2012/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration
You are using fake SyncBatchNorm2d who is actually the official BatchNorm2d
==> Using Pytorch Dataset
Downloading: "https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet18_imagenet.pth.tar" to /root/.cache/torch/hub/checkpoints/resnet18_imagenet.pth.tar
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44.6M/44.6M [00:27<00:00, 1.70MB/s]
Traceback (most recent call last):
File "main_imagenet.py", line 178, in
cnn.cuda()
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 680, in cuda
return self._apply(lambda t: t.cuda(device))
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 593, in _apply
param_applied = fn(param)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 680, in
return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Some questions about implementation details

Hello, thank you for an interesting paper and nice code.

I have two questions concerning implementation details.

  1. Does the "one-by-one" block reconstruction mentioned in the paper mean that input to each block comes from already quantized preceding blocks, i.e. each block may correct quantization errors coming from previous blocks? Or maybe input to each block is collected from the full-precision model?
  2. Am I correct in understanding that in the block-wise reconstruction objective you use the gradients of each calibration sample independently (i.e. no gradient averaging or the like, as with the Adam variant mentioned in the paper)? Also, what is happening here in data_utils.py; why do you add 1.0 to the gradients?
cached_grads = cached_grads.abs() + 1.0
# scaling to make sure its mean is 1
# cached_grads = cached_grads * torch.sqrt(cached_grads.numel() / cached_grads.pow(2).sum())

Thank you for your time and consideration!

W4A4 quantization problem of resnet18

I used the ResNet-18 model provided by the project and the default parameters of the code to run W4A4 quantization. The results are quite different from those in the paper. Are there any other specific settings required to quantize activations to 4 bits?

Achieving very low accuracy

Hi, I am running the command exactly as written but am receiving very low accuracy:
This is the command:
python3 main_imagenet.py --data_path '/home/ofekglick/BRECQ/tiny-imagenet-200' --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration

I am receiving an accuracy of about 0.05%, both before and after quantization.
I am running the code on the tiny-imagenet-200 dataset.

Any idea why this could happen?

Outcome differs with and without the 'test_before_calibration' flag

CUDA_VISIBLE_DEVICES=3 python main_imagenet.py --data_path /disk2/imagenet/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration | tee w2a4_test.log
CUDA_VISIBLE_DEVICES=3 python main_imagenet.py --data_path /disk2/imagenet/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant | tee w2a4.log

The W2A4 and W2A32 outcomes after reconstruction differ when I remove the 'test_before_calibration' flag, even though that flag should not modify the seed. I'm wondering why, and I look forward to your reply.

How can I get the scale and offset in scalar form?

The scale and offset of UniformAffineQuantizer are tensors after quantization finishes.
How can I convert them to scalar values in order to generate a quantization encoding? @yhhhli
For example:

Encoding:{
bitwidth: integer
is_symmetric: string
max: float
min: float
offset: integer
scale: float
}

and

[screenshot]
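Not an answer from the authors, just a minimal sketch of the conversion being asked about. It assumes per-tensor (not channel-wise) quantization, illustrative attribute names delta / zero_point as single-element tensors, and the usual asymmetric scheme x_q = clamp(round(x / delta) + zero_point, 0, 2**n_bits - 1):

import torch

def encoding_from_quantizer(delta: torch.Tensor, zero_point: torch.Tensor, n_bits: int) -> dict:
    scale = delta.item()              # .item() converts a one-element tensor to a Python float
    offset = int(zero_point.item())
    qmin, qmax = 0, 2 ** n_bits - 1
    return {
        'bitwidth': n_bits,
        'is_symmetric': 'False',
        'min': (qmin - offset) * scale,   # smallest representable real value
        'max': (qmax - offset) * scale,   # largest representable real value
        'offset': offset,
        'scale': scale,
    }

For channel-wise quantization the scale and offset are per-channel tensors, so a scalar encoding like the one above would have to be produced per channel.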

Faster R-CNN quantization

Hello, I have read the Faster R-CNN quantization results reported in your paper. Could you please release the quantized weight file? Thank you.

channel_wise quantization

Hi, nice idea for quantization.
But it seems that the paper (excluding the appendix) does not state that channel-wise quantization is used, while the code shows that it is.
As we know, channel-wise quantization naturally outperforms layer-wise quantization.
So it may be hard to say that the performance of your method is close to QAT.

Cannot reproduce the accuracy

Greetings,

Really appreciate your open source contribution.

However, it seems the accuracy reported in the paper cannot be reproduced with standard ImageNet. For instance, for the full-precision models I measure ResNet-18 at 70.186% and MobileNetV2 at 71.618%, which is slightly lower than the paper's results (71.08 and 72.49, respectively).

Have you utilized any preprocessing techniques other than imagenet.build_imagenet_data?

Thanks

Using the Fisher-diag Hessian estimate proposed in the paper raises "Trying to backward through the graph a second time"

Estimating the Hessian with the Fisher-diag approach proposed in the paper requires the gradient of each layer's pre-activations. When actually running the code, however, cur_grad = get_grad(cali_data[i * batch_size:(i + 1) * batch_size]) in save_grad_data raises "Trying to backward through the graph a second time" on the second batch; the first batch runs without error. Has the author encountered a similar situation?

The bit setting for the first and last layer.

Hi, impressive work.

The implementation of quant_layer forward is shown below:

BRECQ/quant/quant_layer.py, lines 193 to 210 at commit 2888b29:

def forward(self, input: torch.Tensor):
    if self.use_weight_quant:
        weight = self.weight_quantizer(self.weight)
        bias = self.bias
    else:
        weight = self.org_weight
        bias = self.org_bias
    out = self.fwd_func(input, weight, bias, **self.fwd_kwargs)
    # disable act quantization is designed for convolution before elemental-wise operation,
    # in that case, we apply activation function and quantization after ele-wise op.
    if self.se_module is not None:
        out = self.se_module(out)
    out = self.activation_function(out)
    if self.disable_act_quant:
        return out
    if self.use_act_quant:
        out = self.act_quantizer(out)
    return out

I found that the layer only quantizes the weight and the output, not the input. The inner convolution layers can still be quantized correctly by the act_quantizer at the end of the quant block, but two problems remain.

  1. The first conv layer and the last FC layer receive full-precision (FP32) input feature maps, even if a bit-width is set for the input.
  2. Moreover, if disable_8bit_head_stem is False (the default), the first conv and the FC layer are set to W8A32 (FP32 input) quantization.
    The first conv then produces an INT8 output feature map, which effectively makes the first convolution of the first block W4A8 under the W4A4 setting, and W2A8 under the W2A4 setting.

Are there additional settings I missed?
If this is the intended setting, the paper does not seem to mention it.
Are the settings the same for AdaQuant and the other baselines/workloads?

Thanks.

Basic questions about algorithm and measuring sensitivity

Hi, this is nice code for quantization.
However, I have some questions about the sensitivity measurement and the genetic algorithm.

  1. In the paper, you state that layer sensitivity can be measured with the diagonal loss, while the off-diagonal loss expresses cross-layer sensitivity. However, I could not find such a loss term in the code in this repo. If you do not mind, could you point me to where those terms are implemented?

  2. Also, could you please tell me where the scales for weights and activations are implemented? Where are they calculated and optimized?

  3. It was quite interesting that a genetic algorithm is used to find the optimal bit-width configuration for each block. But, as with the sensitivity question, I could not find the code for this algorithm in the repo. Again, if you do not mind, please let me know where it is.

By the way, I am really impressed with your paper and code for this new quantization method. Hope you have a nice day. Thank you, and sorry for the questions!

RuntimeError: `Trying to backward through the graph a second time` when setting opt_mode to fisher_diag

Hi Yuhang,

Thank you for open sourcing this project.

As the paper notes that the diagonal Fisher information matrix is applied in place of the pre-activation Hessian, we tried setting opt_mode to fisher_diag instead of mse for reconstruction. However, a runtime error is thrown:

File "xxxx/quant/data_utils.py", line 184, in __call__
    loss.backward()
  File "xxxx/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "xxxx/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

It seems to occur during the backward pass used to save gradients:

handle = self.layer.register_backward_hook(self.data_saver)
with torch.enable_grad():
    try:
        self.model.zero_grad()
        inputs = model_input.to(self.device)
        self.model.set_quant_state(False, False)
        out_fp = self.model(inputs)
        quantize_model_till(self.model, self.layer, self.act_quant)
        out_q = self.model(inputs)
        loss = F.kl_div(F.log_softmax(out_q, dim=1), F.softmax(out_fp, dim=1), reduction='batchmean')
        # here....
        loss.backward()
    except StopForwardException:
        pass

As indicated by the error, the first backward succeeds but the second fails.

We created a very simple network to reproduce the issue, and the error still appears:

import torch.nn as nn
import torch.nn.functional as F

class DummyNet(nn.Module):
    def __init__(self):
        super(DummyNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, 3)
        self.conv2 = nn.Conv2d(32, 32, 3, 3)
        self.conv3 = nn.Conv2d(32, 1, 3, 3)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        output = F.log_softmax(x, dim=0)
        return output

The recon_model function is the same as the one in main_imagenet:

def recon_model(model: nn.Module):
        """
        Block reconstruction. For the first and last layers, we can only apply layer reconstruction.
        """
        for name, module in model.named_children():
            if isinstance(module, QuantModule):
                if module.ignore_reconstruction is True:
                    print('Ignore reconstruction of layer {}'.format(name))
                    continue
                else:
                    layer_reconstruction(qnn, module, **kwargs)
            elif isinstance(module, BaseQuantBlock):
                if module.ignore_reconstruction is True:
                    print('Ignore reconstruction of block {}'.format(name))
                    continue
                else:
                    print('Reconstruction for block {}'.format(name))
                    block_reconstruction(qnn, module, **kwargs)
            else:
                recon_model(module)

We are not quite sure why PyTorch complains here, since backward is only called once per batch. But we also noticed that after calling save_grad_data, the gradients are cached for later loss calculation:

# in block_reconstruction
err = loss_func(out_quant, cur_out, cur_grad)

Are the intermediate gradients still available at this point, given that backward has already been called? In our case, even if we work around the first error inside save_grad_data, we get the same error here (i.e. backward called twice).

Environment

Ubuntu 16.04 / Python 3.6.8 / PyTorch 1.7.1 / CUDA 10.1

Any advice would be appreciated.

Last layer quantization

Hi,
Very impressive coding.

The bit-width setting for the last layer, especially restoring it to 8-bit, seems strange:

module_list[-1].weight_quantizer.bitwidth_refactor(8)
module_list[-2].act_quantizer.bitwidth_refactor(8)

The weights of the last, usually dense, layer are set to 8 bits.
However, the activations of the preceding layer are also set to 8 bits.

Was this your intention or is it a bug?

Thanks,
Ilan.

Questions about measuring sensitivity and genetic algorithm application

Hi, this is nice code for quantization.
However, I have some questions about the sensitivity measurement and the genetic algorithm.

  1. In the paper, you state that layer sensitivity can be measured with the diagonal loss, while the off-diagonal loss expresses cross-layer sensitivity. However, I could not find such a loss term in the code in this repo. If you do not mind, could you point me to where those terms are implemented?

  2. It was quite interesting that a genetic algorithm is used to find the optimal bit-width configuration for each block. But, as with the sensitivity question, I could not find the code for this algorithm in the repo. Again, if you do not mind, please let me know where it is.

By the way, I am really impressed with your paper and code for this new quantization method. Hope you have a nice day. Thank you!

P.S. When you mention examining permutations, you say there would be 3^n permutations, where n is the number of layers in a block and 3 is the number of bit candidates (2, 4, 8). However, since the performance drop with 4-bit and 8-bit quantization is negligible, only the 2-bit permutations are considered according to your paper. But if only 2-bit is considered, shouldn't it be 1^n permutations per block? I am a bit confused by this part.

Restriction on the weight update range

I'd like to ask Yuhang: have you tried AdaQuant's approach of removing the restriction on the weight update range during reconstruction? In theory, would that further improve the quantization results?

Question about act-quant

[screenshot]
Is the third quantization setting the one where, after BRECQ finishes adjusting the weight rounding, the activation step sizes are then tuned with the LSQ method?

Why is the loss value so high? Is this the expected result?

I tried running your code with pre-trained ResNet-50 and MobileNetV2 models. I got the following loss values for the output and pred losses:

rec_loss = lp_loss(pred, tgt, p=self.p)
:param pred: output from quantized model
:param tgt: output from FP model
https://github.com/yhhhli/BRECQ/blob/main/quant/block_recon.py#L149

pd_loss = self.pd_loss(F.log_softmax(output / self.T, dim=1), F.softmax(output_fp / self.T, dim=1)) / self.lam
:param pred: output from quantized model
:param tgt: output from FP model
https://github.com/yhhhli/BRECQ/blob/main/quant/block_recon.py#L151

Are there additional settings I missed?

disable act quantization is designed for convolution

Hi,
Very impressive coding.

There is a question about the quantization of activation values.

In the code:

disable act quantization is designed for convolution before elemental-wise operation,

in that case, we apply activation function and quantization after ele-wise op.

Why can it be replaced like this?

Thanks

How to reproduce the MobileNetV2 W2A4 result?

command:

python3 main_imagenet.py --data_path image --arch mobilenetv2 --n_bits_w 2 --n_bits_a 4 --channel_wise --weight 0.1 --act_quant

result:

Full quantization (W2A4) accuracy: 0.1419999897480011

How can I reproduce the MobileNetV2 W2A4 result?
Thanks.
