
brecq's Introduction

BRECQ

PyTorch implementation of BRECQ, ICLR 2021

@article{li2021brecq,
  title={BRECQ: Pushing the Limit of Post-Training Quantization by Block Reconstruction},
  author={Li, Yuhang and Gong, Ruihao and Tan, Xu and Yang, Yang and Hu, Peng and Zhang, Qi and Yu, Fengwei and Wang, Wei and Gu, Shi},
  journal={arXiv preprint arXiv:2102.05426},
  year={2021}
}

Update (Jul 30): Add Multi-GPU Reconstruction

We release the code for multi-GPU reconstruction.

Note that this cannot simply be done with torch.nn.DataParallel or DDP. To synchronize the gradients, activation scales, etc., we have to manually call torch.distributed.all_reduce.

The first step is to initialize the distributed environment; then use a distributed sampler for data loading.

Please use main_imagenet_dist for multi-GPU reconstruction. With this, you can reconstruct larger models and use more data samples!

python -m main_imagenet_dist **KWARGS_FOR_RECON
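A minimal sketch of the setup described above, not taken from the repo: the launch method, the NCCL backend, and the calib_dataset placeholder are assumptions.

import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, DistributedSampler

# Initialize the distributed environment. This assumes the script is launched by a tool
# that sets RANK / WORLD_SIZE / MASTER_ADDR / MASTER_PORT per process (e.g. torchrun).
dist.init_process_group(backend='nccl')
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

# Shard the calibration data across ranks with a distributed sampler.
# calib_dataset is a placeholder for whatever dataset feeds reconstruction.
sampler = DistributedSampler(calib_dataset)
loader = DataLoader(calib_dataset, batch_size=32, sampler=sampler, num_workers=4)

# Gradients, activation scales, etc. are synchronized by hand, e.g. averaged across ranks:
def sync_mean(t: torch.Tensor) -> torch.Tensor:
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    return t / dist.get_world_size()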

Pretrained models

We provide all the pretrained models; they can be accessed via torch.hub.

For example: use res18 = torch.hub.load('yhhhli/BRECQ', model='resnet18', pretrained=True) to get the pretrained ResNet-18 model.

If you encounter a URLError when downloading the pretrained network, it is probably a network failure. An alternative is to download the file manually with wget and move it to ~/.cache/torch/checkpoints, where the load_state_dict_from_url function checks before downloading.

For example:

wget https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet50_imagenet.pth.tar 
mv resnet50_imagenet.pth.tar ~/.cache/torch/checkpoints
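As a short illustration of the cache fallback (the URL matches the wget example above; map_location and the exact contents of the checkpoint are assumptions, and the cache directory may be ~/.cache/torch/checkpoints or ~/.cache/torch/hub/checkpoints depending on the PyTorch version):

import torch

# After the manual download, load_state_dict_from_url finds the cached file
# in the checkpoints directory and skips the download.
checkpoint = torch.hub.load_state_dict_from_url(
    'https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet50_imagenet.pth.tar',
    map_location='cpu')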

Usage

python main_imagenet.py --data_path PATH/TO/DATA --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration

You should see output similar to the following:

Quantized accuracy before brecq: 0.13599999248981476
Weight quantization accuracy: 66.32799530029297
Full quantization (W2A4) accuracy: 65.21199798583984

MobileNetV2 Quantization:

python main_imagenet.py --data_path PATH/TO/DATA --arch mobilenetv2 --n_bits_w 2 --channel_wise --weight 0.1

Results: Weight quantization accuracy: 59.48799896240234

brecq's People

Contributors: blackandredplayerinfuture, yhhhli

brecq's Issues

Activation quantization question

Excellent work! I have a small question: quant_block defines intermediate activation layers, but they are not used in forward, for example conv1.activation_function in the screenshots below.
[screenshot]
[screenshot]
Looking forward to your reply.

8-bit results?

Can you provide standard W8A8 quantization results for ResNet-18 and MobileNetV2 on ImageNet, since this is the most widely used setting?

YOLOv5 quantization problem

Thank you very much for your work.
I adapted your code to YOLOv5; with W4A8 quantization there is a loss of nearly 3 points. Have you experimented with YOLOv5?

Why not quantize the activation of the last conv layer in a block?

Hi,
Thanks for releasing your code, but I have a question about one implementation detail.
In quant_block.py, take the following code for ResNet-18 and ResNet-34 as an example.
disable_act_quant is set to True for conv2, which disables quantization of conv2's output.

class QuantBasicBlock(BaseQuantBlock):
    """
    Implementation of Quantized BasicBlock used in ResNet-18 and ResNet-34.
    """
    def __init__(self, basic_block: BasicBlock, weight_quant_params: dict = {}, act_quant_params: dict = {}):
        super().__init__(act_quant_params)
        self.conv1 = QuantModule(basic_block.conv1, weight_quant_params, act_quant_params)
        self.conv1.activation_function = basic_block.relu1
        self.conv2 = QuantModule(basic_block.conv2, weight_quant_params, act_quant_params, disable_act_quant=True)

        # modify the activation function to ReLU
        self.activation_function = basic_block.relu2

        if basic_block.downsample is None:
            self.downsample = None
        else:
            self.downsample = QuantModule(basic_block.downsample[0], weight_quant_params, act_quant_params,
                                          disable_act_quant=True)
        # copying all attributes in original block
        self.stride = basic_block.stride

This causes a boost in accuracy; the following is the result I get using your code and the same ImageNet dataset used in the paper.
[1] and [2] denote the modifications I made to the original code.

[screenshot of results]

[1]: In quant_block.py → QuantBasicBlock.__init__, change disable_act_quant from True to False for self.conv2 and self.downsample;
[2]: In quant_block.py → QuantInvertedResidual.__init__, change disable_act_quant from True to False for the QuantModule inside self.conv.

However, I do not think this is applicable to most NPUs, which quantize the output of every conv layer.
So why not quantize the activation of the last conv layer in a block? Is there a particular reason for this?
Also, for the methods you compared against in your paper, have you checked whether they do the same thing?

Suggest replacing .view with .reshape in the accuracy() function

Got an error:

Traceback (most recent call last):
  File "main_imagenet.py", line 198, in <module>
    print('Quantized accuracy before brecq: {}'.format(validate_model(test_loader, qnn)))
  File "/home/xxxx/anaconda3/envs/torch/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "main_imagenet.py", line 108, in validate_model
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
  File "main_imagenet.py", line 77, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

So I suggest replacing .view with .reshape in the accuracy() function; a sketch of the fix is below.
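A minimal sketch of the suggested fix, mirroring the standard top-k accuracy helper rather than the repo's exact code:

import torch

def accuracy(output: torch.Tensor, target: torch.Tensor, topk=(1,)):
    """Compute top-k accuracy; .reshape avoids the non-contiguous .view error."""
    maxk = max(topk)
    batch_size = target.size(0)
    _, pred = output.topk(maxk, dim=1, largest=True, sorted=True)
    pred = pred.t()
    correct = pred.eq(target.reshape(1, -1).expand_as(pred))
    res = []
    for k in topk:
        # .reshape works even when the slice is not contiguous in memory
        correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
        res.append(correct_k.mul_(100.0 / batch_size))
    return res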

What does BRECQ stand for?

Hey,

I'm just wondering what BRECQ stands for. BR as in Block Reconstruction, but what about the other letters?

Issues regarding Layer-wise reconstruction

Greetings,

I was trying to reproduce layer-wise reconstruction as opposed to block-wise. I commented out the first two lines of this loop in quant_model.py:
[screenshot]

However, the accuracy drops drastically to 0.62% for ResNet-18 (W2), far from the 65.19% reported in your paper. Could you briefly describe how to apply layer-wise reconstruction?

Thanks.

Quantization doesn't seem to produce good accuracy. Are there additional settings I missed?

So I tried running your code on my dataset with pre-trained MobileNetV2 and ResNet-50 models. I got these results:

Full-precision accuracy: MobileNetV2 58.19
Quantized (W8A8) accuracy: MobileNetV2 12.02
Quantized (W6A6) accuracy: MobileNetV2 10.12

Full-precision accuracy: ResNet-50 65.16
Quantized (W8A8) accuracy: ResNet-50 13.22
Quantized (W6A6) accuracy: ResNet-50 11.02

[screenshot]
https://github.com/yhhhli/BRECQ/blob/main/main_imagenet.py#L201C1-L229C87

My accuracy after quantization, however, does not come anywhere near the float models (around 58.19% and 65.16%).
Are there additional settings I missed?

Quantization doesn't work?

Hi,

So I tried running your code on CIFAR-10 with a pre-trained ResNet-50 model. I've attached the code below.
My accuracy after quantization, however, does not come anywhere near the float model (around 93%). I get:

  • Accuracy of the network on the 10000 test images: 10.0 % top5: 52.28 %

Please help me with this. The code is inside the zip file.

main_cifar.zip

CUDA error when launching the example

user@machine:/path_to/BRECQ# python main_imagenet.py --data_path /path_to/IMAGENET_2012/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration
You are using fake SyncBatchNorm2d who is actually the official BatchNorm2d
==> Using Pytorch Dataset
Downloading: "https://github.com/yhhhli/BRECQ/releases/download/v1.0/resnet18_imagenet.pth.tar" to /root/.cache/torch/hub/checkpoints/resnet18_imagenet.pth.tar
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 44.6M/44.6M [00:27<00:00, 1.70MB/s]
Traceback (most recent call last):
File "main_imagenet.py", line 178, in
cnn.cuda()
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 680, in cuda
return self._apply(lambda t: t.cuda(device))
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 570, in _apply
module._apply(fn)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 593, in _apply
param_applied = fn(param)
File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 680, in
return self._apply(lambda t: t.cuda(device))
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Some questions about implementation details

Hello, thank you for an interesting paper and nice code.

I have two questions concerning implementation details.

  1. Does the "one-by-one" block reconstruction mentioned in the paper mean that input to each block comes from already quantized preceding blocks, i.e. each block may correct quantization errors coming from previous blocks? Or maybe input to each block is collected from the full-precision model?
  2. Am I correct in understanding that in the block-wise reconstruction objective you use the gradients of each calibration sample independently (i.e. no gradient averaging or the like, as with the Adam variant mentioned in the paper)? Also, what is happening here in data_utils.py; why do you add 1.0 to the gradients?
cached_grads = cached_grads.abs() + 1.0
# scaling to make sure its mean is 1
# cached_grads = cached_grads * torch.sqrt(cached_grads.numel() / cached_grads.pow(2).sum())

Thank you for your time and consideration!

W4A4 quantization problem of resnet18

I used the ResNet-18 model provided by the project and the default parameters of the code to run W4A4 quantization. The results are quite different from those in the paper. Are there any other specific settings required to quantize activations to 4 bits?

Achieving very low accuracy

Hi, I am running the command exactly as written but am receiving very low accuracy:
This is the command:
python3 main_imagenet.py --data_path '/home/ofekglick/BRECQ/tiny-imagenet-200' --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration

I am receiving an accuracy of about 0.05%, both before and after quantization.
I am running the code on the tiny-imagenet-200 dataset.

Any idea why this could happen?

Outcome differs with and without the 'test_before_calibration' flag

CUDA_VISIBLE_DEVICES=3 python main_imagenet.py --data_path /disk2/imagenet/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant --test_before_calibration | tee w2a4_test.log
CUDA_VISIBLE_DEVICES=3 python main_imagenet.py --data_path /disk2/imagenet/ --arch resnet18 --n_bits_w 2 --channel_wise --n_bits_a 4 --act_quant | tee w2a4.log

The W2A4 and W2A32 outcomes after reconstruction differ when I remove the 'test_before_calibration' flag, even though that flag should not modify the seed. I'm wondering why, and I look forward to your reply.

How can I get the scale and offset in scalar form?

The scale and offset of UniformAffineQuantizer are tensors after quantization finishes.
How can I convert them to scalar values in order to generate a quantization encoding? @yhhhli
For example:

Encoding:{
bitwidth: integer
is_symmetric: string
max: float
min: float
offset: integer
scale: float
}

and

[screenshot]
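Not an answer from the authors, just a minimal sketch of the conversion being asked about. It assumes per-tensor (not channel-wise) quantization, illustrative attribute names delta / zero_point as single-element tensors, and the usual asymmetric scheme x_q = clamp(round(x / delta) + zero_point, 0, 2**n_bits - 1):

import torch

def encoding_from_quantizer(delta: torch.Tensor, zero_point: torch.Tensor, n_bits: int) -> dict:
    scale = delta.item()              # .item() converts a one-element tensor to a Python float
    offset = int(zero_point.item())
    qmin, qmax = 0, 2 ** n_bits - 1
    return {
        'bitwidth': n_bits,
        'is_symmetric': 'False',
        'min': (qmin - offset) * scale,   # smallest representable real value
        'max': (qmax - offset) * scale,   # largest representable real value
        'offset': offset,
        'scale': scale,
    }

For channel-wise quantization the scale and offset are per-channel tensors, so a scalar encoding like the one above would have to be produced per channel.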

Faster R-CNN quantization

Hello, I have read the Faster R-CNN quantization results reported in your paper. Could you please release the quantized weight file? Thank you.

channel_wise quantization

Hi, nice idea for quantization.
But it seems that the paper (excluding the appendix) does not state that channel-wise quantization is used, while the code shows that it is.
As we know, channel-wise quantization naturally outperforms layer-wise quantization.
So it may be hard to say that the performance of your method is close to QAT.

Cannot reproduce the accuracy

Greetings,

Really appreciate your open source contribution.

However, it seems the accuracy reported in the paper cannot be reproduced with standard ImageNet. For instance, for the full-precision models I measure ResNet-18 at 70.186% and MobileNetV2 at 71.618%, which is slightly lower than the paper's results (71.08 and 72.49, respectively).

Have you utilized any preprocessing techniques other than imagenet.build_imagenet_data?

Thanks

Using the Fisher-diag Hessian estimate proposed in the paper raises "Trying to backward through the graph a second time"

Estimating the Hessian with the Fisher-diag approach proposed in the paper requires the gradient of each layer's pre-activations. When actually running the code, however, cur_grad = get_grad(cali_data[i * batch_size:(i + 1) * batch_size]) in save_grad_data raises "Trying to backward through the graph a second time" on the second batch; the first batch runs without error. Has the author encountered a similar situation?

The bit setting for the first and last layer.

Hi, impressive work.

The implementation of quant_layer forward is shown below:

BRECQ/quant/quant_layer.py, lines 193 to 210 at commit 2888b29:

def forward(self, input: torch.Tensor):
    if self.use_weight_quant:
        weight = self.weight_quantizer(self.weight)
        bias = self.bias
    else:
        weight = self.org_weight
        bias = self.org_bias
    out = self.fwd_func(input, weight, bias, **self.fwd_kwargs)
    # disable act quantization is designed for convolution before elemental-wise operation,
    # in that case, we apply activation function and quantization after ele-wise op.
    if self.se_module is not None:
        out = self.se_module(out)
    out = self.activation_function(out)
    if self.disable_act_quant:
        return out
    if self.use_act_quant:
        out = self.act_quantizer(out)
    return out

I found that the layer only quantizes the weight and the output, not the input. The inner convolution layers can still be quantized correctly by the act_quantizer at the end of the quant block, but two problems remain.

  1. The first conv layer and the last FC layer receive full-precision (FP32) input feature maps, even if a bit-width is set for the input.
  2. Moreover, if disable_8bit_head_stem is False (the default), the first conv and the FC layer are set to W8A32 (FP32 input) quantization.
    The first conv then produces an INT8 output feature map, which effectively makes the first convolution of the first block W4A8 under the W4A4 setting, and W2A8 under the W2A4 setting.

Are there additional settings I missed?
If this is the intended setting, the paper does not seem to mention it.
Are the settings the same for AdaQuant and the other baselines/workloads?

Thanks.

Basic questions about algorithm and measuring sensitivity

Hi, this is nice code for quantization.
However, I have some questions about the sensitivity measurement and the genetic algorithm.

  1. In the paper, you state that layer sensitivity can be measured with the diagonal loss, while the off-diagonal loss expresses cross-layer sensitivity. However, I could not find such a loss term in the code in this repo. If you do not mind, could you point me to where those terms are implemented?

  2. Also, could you please tell me where the scales for weights and activations are implemented? Where are they calculated and optimized?

  3. It was quite interesting that a genetic algorithm is used to find the optimal bit-width configuration for each block. But, as with the sensitivity question, I could not find the code for this algorithm in the repo. Again, if you do not mind, please let me know where it is.

By the way, I am really impressed with your paper and code for this new quantization method. Hope you have a nice day. Thank you, and sorry for the questions!

RuntimeError: `Trying to backward through the graph a second time` when setting opt_mode to fisher_diag

Hi Yuhang,

Thank you for open sourcing this project.

As the paper notes that the diagonal Fisher information matrix is applied in place of the pre-activation Hessian, we tried setting opt_mode to fisher_diag instead of mse for reconstruction. However, a runtime error is thrown:

File "xxxx/quant/data_utils.py", line 184, in __call__
    loss.backward()
  File "xxxx/lib/python3.6/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "xxxx/lib/python3.6/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

It seems to occur during the backward pass used to save gradients:

handle = self.layer.register_backward_hook(self.data_saver)
with torch.enable_grad():
    try:
        self.model.zero_grad()
        inputs = model_input.to(self.device)
        self.model.set_quant_state(False, False)
        out_fp = self.model(inputs)
        quantize_model_till(self.model, self.layer, self.act_quant)
        out_q = self.model(inputs)
        loss = F.kl_div(F.log_softmax(out_q, dim=1), F.softmax(out_fp, dim=1), reduction='batchmean')
        # here....
        loss.backward()
    except StopForwardException:
        pass

As indicated by the error, the first backward succeeds but the second fails.

We created a very simple network to reproduce the issue, and the error still appears:

import torch.nn as nn
import torch.nn.functional as F

class DummyNet(nn.Module):
    def __init__(self):
        super(DummyNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 3, 3)
        self.conv2 = nn.Conv2d(32, 32, 3, 3)
        self.conv3 = nn.Conv2d(32, 1, 3, 3)

    def forward(self, x):
        x = self.conv1(x)
        x = self.conv2(x)
        x = self.conv3(x)
        output = F.log_softmax(x, dim=0)
        return output

The recon_model function is the same as the one in main_imagenet:

def recon_model(model: nn.Module):
        """
        Block reconstruction. For the first and last layers, we can only apply layer reconstruction.
        """
        for name, module in model.named_children():
            if isinstance(module, QuantModule):
                if module.ignore_reconstruction is True:
                    print('Ignore reconstruction of layer {}'.format(name))
                    continue
                else:
                    layer_reconstruction(qnn, module, **kwargs)
            elif isinstance(module, BaseQuantBlock):
                if module.ignore_reconstruction is True:
                    print('Ignore reconstruction of block {}'.format(name))
                    continue
                else:
                    print('Reconstruction for block {}'.format(name))
                    block_reconstruction(qnn, module, **kwargs)
            else:
                recon_model(module)

We are not quite sure why PyTorch complains here, since backward is only called once per batch. But we also noticed that after calling save_grad_data, the gradients are cached for later loss calculation:

# in block_reconstruction
err = loss_func(out_quant, cur_out, cur_grad)

Are the intermediate gradients still available at this point, given that backward has already been called? In our case, even if we work around the first error inside save_grad_data, we get the same error here (i.e. backward called twice).

Environment

Ubuntu 16.04 / Python 3.6.8 / PyTorch 1.7.1 / CUDA 10.1

Any advice would be appreciated.

Last layer quantization

Hi,
Very impressive coding.

The bit-width setting for the last layer, especially restoring it to 8-bit, seems strange:

module_list[-1].weight_quantizer.bitwidth_refactor(8)
module_list[-2].act_quantizer.bitwidth_refactor(8)

The weights of the last, usually dense, layer are set to 8 bits.
However, the activations of the preceding layer are also set to 8 bits.

Was this your intention or is it a bug?

Thanks,
Ilan.

Questions about measuring sensitivity and genetic algorithm application

Hi, this is nice code for quantization.
However, I have some questions about the sensitivity measurement and the genetic algorithm.

  1. In the paper, you state that layer sensitivity can be measured with the diagonal loss, while the off-diagonal loss expresses cross-layer sensitivity. However, I could not find such a loss term in the code in this repo. If you do not mind, could you point me to where those terms are implemented?

  2. It was quite interesting that a genetic algorithm is used to find the optimal bit-width configuration for each block. But, as with the sensitivity question, I could not find the code for this algorithm in the repo. Again, if you do not mind, please let me know where it is.

By the way, I am really impressed with your paper and code for this new quantization method. Hope you have a nice day. Thank you!

P.S. When you mention examining permutations, you say there would be 3^n permutations, where n is the number of layers in a block and 3 is the number of bit candidates (2, 4, 8). However, since the performance drop with 4-bit and 8-bit quantization is negligible, only the 2-bit permutations are considered according to your paper. But if only 2-bit is considered, shouldn't it be 1^n permutations per block? I am a bit confused by this part.

Restriction on the weight update range

I'd like to ask Yuhang: have you tried AdaQuant's approach of removing the restriction on the weight update range during reconstruction? In theory, would that further improve the quantization results?

Question about act-quant

[screenshot]
Is the third quantization setting the one where, after BRECQ finishes adjusting the weight rounding, the activation step sizes are then tuned with the LSQ method?

Why is the loss value so high? Is this the expected result?

I tried running your code with pre-trained ResNet-50 and MobileNetV2 models. I got the following loss values for the output and pred losses:

rec_loss = lp_loss(pred, tgt, p=self.p)
:param pred: output from quantized model
:param tgt: output from FP model
https://github.com/yhhhli/BRECQ/blob/main/quant/block_recon.py#L149

pd_loss = self.pd_loss(F.log_softmax(output / self.T, dim=1), F.softmax(output_fp / self.T, dim=1)) / self.lam
:param pred: output from quantized model
:param tgt: output from FP model
https://github.com/yhhhli/BRECQ/blob/main/quant/block_recon.py#L151

Are there additional settings I missed?

disable act quantization is designed for convolution

Hi,
Very impressive coding.

There is a question about the quantization of activation values.

In the code:

disable act quantization is designed for convolution before elemental-wise operation,

in that case, we apply activation function and quantization after ele-wise op.

Why can it be replaced like this?

Thanks

How to reproduce the MobileNetV2 W2A4 result?

command:

python3 main_imagenet.py --data_path image --arch mobilenetv2 --n_bits_w 2 --n_bits_a 4 --channel_wise --weight 0.1 --act_quant

result:

Full quantization (W2A4) accuracy: 0.1419999897480011

How can I reproduce the MobileNetV2 W2A4 result?
Thanks.
