
repvgg's People

Contributors

dingxiaoh, lmk123568


repvgg's Issues

Inference environment

Hi,
Thanks for the great work and for sharing the code.

  1. Is the reported performance measured on a model compiled to TensorRT, or on native PyTorch?
  2. Do you have a latency comparison (batch size of 1), rather than the throughput comparison where you dominate?

The outputs of the RepVGG model (training mode) and the converted RepVGG model (deploy mode) are not the same.

I created a training RepVGGNet_A0 by calling this interface:
rep_vgg_a0_training = create_RepVGG_A0()

And I created a deploy RepVGGNet_A0 from the training RepVGGNet_A0 by calling this interface:
rep_vgg_a0_deploy = repvgg_model_convert(rep_vgg_a0_training, create_RepVGG_A0)

For the same input tensor:
in_tensors = torch.rand([args.batch_size, args.in_channels, args.height, args.width])

I regretfully found that the outputs of rep_vgg_a0_training and rep_vgg_a0_deploy are not the same.
Anything wrong with my code?
How can I get a correct converted deploy model?

Looking forward to your kind reply!
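
A likely explanation (not confirmed by the author): if the training-mode model is left in train mode, its BatchNorm layers use per-batch statistics, so its outputs will not match the converted model even when the conversion itself is correct. A minimal check, reusing the names from this issue:

rep_vgg_a0_training.eval()   # BN must use running statistics, not batch statistics
rep_vgg_a0_deploy.eval()
with torch.no_grad():
    y_train = rep_vgg_a0_training(in_tensors)
    y_deploy = rep_vgg_a0_deploy(in_tensors)
print((y_train - y_deploy).abs().max())  # expected to be tiny float error, around 1e-5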

weights

Hello, thank you very much for the work done by you and your team. I noticed that the provided pre-trained models come in many versions, and I don't know what the different names mean. Could I add you on WeChat or QQ for advice, if convenient? Thank you.

deploy model and training model inference value check error

I used the provided code for a model-output equivalence test, but it failed.
Here is my code; is there any error in it?


from repvgg import repvgg_model_convert, create_RepVGG_A0
import torch
import time
import numpy as np

def model_equivalence(model_1,
                      model_2,
                      device,
                      rtol=1e-05,
                      atol=1e-08,
                      num_tests=100,
                      input_size=(1, 3, 32, 32)):

    model_1.to(device)
    model_2.to(device)

    for _ in range(num_tests):
        x = torch.rand(size=input_size).to(device)
        y1 = model_1(x).detach().cpu().numpy()
        y2 = model_2(x).detach().cpu().numpy()
        if not np.allclose(a=y1, b=y2, rtol=rtol, atol=atol, equal_nan=False):
            print("Model equivalence test sample failed: ")
            print(y1)
            print(y2)
            return False
    return True

def measure_inference_latency(model,
                              device,
                              input_size=(1, 3, 32, 32),
                              num_samples=100):

    model.to(device)
    model.eval()

    x = torch.rand(size=input_size).to(device)

    start_time = time.time()
    for _ in range(num_samples):
        _ = model(x)
    end_time = time.time()
    elapsed_time = end_time - start_time
    elapsed_time_ave = elapsed_time / num_samples

    return elapsed_time_ave

if __name__ == "__main__":
    RepVGG_A0 = create_RepVGG_A0(deploy=False)
    RepVGG_A0.load_state_dict(torch.load('RepVGG-A0-train.pth'))  # or train from scratch
    # do whatever you want with train_model
    RepVGG_A0_deploy = repvgg_model_convert(RepVGG_A0, create_RepVGG_A0, save_path='RepVGG_A0_deploy.pth')
    print(model_equivalence(RepVGG_A0, RepVGG_A0_deploy, torch.device("cpu:0"),
                            rtol=1e-03, atol=1e-06,
                            num_tests=100, input_size=(1, 3, 224, 224)))
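
A hedged observation about the test above: model_equivalence never calls eval() on either model, so the un-fused model's BatchNorm layers run in training mode and use batch statistics, which alone can explain the failure. A variant of the test with that fixed:

import numpy as np
import torch

def model_equivalence_eval(model_1, model_2, device,
                           rtol=1e-03, atol=1e-05,
                           num_tests=10, input_size=(1, 3, 224, 224)):
    # Same comparison as above, but with both models in inference mode.
    model_1.to(device).eval()
    model_2.to(device).eval()
    with torch.no_grad():
        for _ in range(num_tests):
            x = torch.rand(size=input_size, device=device)
            y1 = model_1(x).cpu().numpy()
            y2 = model_2(x).cpu().numpy()
            if not np.allclose(y1, y2, rtol=rtol, atol=atol):
                return False
    return True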


Ask for help: lr_scheduler has multiple periods? Maybe a bug in local parallel training

Dear DingXiaoH, thanks for your innovative paper.
Could you help me with this question? Thanks!
lr_scheduler = CosineAnnealingLR(optimizer=optimizer, T_max=args.epochs * IMAGENET_TRAINSET_SIZE // args.batch_size // ngpus_per_node)

In my local multi-GPU test without distributed training, batch_size in the original code covers all GPUs, so when we use n GPUs the cosine LR goes through n period changes (0 to PI being one period).
Is such a multi-period cosine LR normal?

thanks in advance!!
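
A hedged sketch of the fix, assuming single-process (DataParallel-style) training where every optimizer step consumes the full global batch: T_max should then count all steps, without the per-GPU division used by the distributed script.

# IMAGENET_TRAINSET_SIZE, args, optimizer as in the training script above.
iters_per_epoch = IMAGENET_TRAINSET_SIZE // args.batch_size  # one step per global batch
lr_scheduler = CosineAnnealingLR(optimizer=optimizer,
                                 T_max=args.epochs * iters_per_epoch)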

Plug-in version implementation

Hi @DingXiaoH, this is a simple and intuitive implementation!!! I implemented a plug-in version of RepVGGBlock. I hope it helps you and others.

This plug-in version implements the following functions:

  1. The training model and the test model are separated;
  2. You can apply RepVGGBlock to other models;
  3. You can use RepVGGBlock and ACBlock together in training, in either order.

Framework

Implementation and test files are as follows:

Key implementation

The training and testing models are separated by the insert and fuse functions:

####### conv_helper
def insert_repvgg_block(model: nn.Module):
    items = list(model.named_children())
    idx = 0
    while idx < len(items):
        name, module = items[idx]
        if isinstance(module, nn.Conv2d) and module.kernel_size[0] > 1:
            # Replace the standard convolution with a RepVGGBlock
            in_channels = module.in_channels
            out_channels = module.out_channels
            kernel_size = module.kernel_size
            stride = module.stride
            padding = module.padding
            dilation = module.dilation
            groups = module.groups
            padding_mode = module.padding_mode

            repvgg_block = RepVGGBlock(in_channels,
                                       out_channels,
                                       kernel_size[0],
                                       stride[0],
                                       padding=padding[0],
                                       padding_mode=padding_mode,
                                       dilation=dilation,
                                       groups=groups)
            model.add_module(name, repvgg_block)
            # If the conv layer is followed by a BN layer, remove that BN layer
            # (see [About BN layer #35](https://github.com/DingXiaoH/ACNet/issues/35))
            if (idx + 1) < len(items) and isinstance(items[idx + 1][1], nn.BatchNorm2d):
                new_layer = nn.Identity()
                model.add_module(items[idx + 1][0], new_layer)
        else:
            insert_repvgg_block(module)
        idx += 1


def fuse_repvgg_block(model: nn.Module):
    for name, module in model.named_children():
        if isinstance(module, RepVGGBlock):
            # Replace the RepVGGBlock with a standard convolution
            kernel, bias = get_equivalent_kernel_bias(module.rbr_dense,
                                                      module.rbr_1x1,
                                                      module.rbr_identity,
                                                      module.in_channels,
                                                      module.groups,
                                                      module.padding)
            # Create a new standard convolution, assign the fused weight and bias, then re-insert it into the model
            fused_conv = nn.Conv2d(module.in_channels,
                                   module.out_channels,
                                   module.kernel_size,
                                   stride=module.stride,
                                   padding=module.padding,
                                   dilation=module.dilation,
                                   groups=module.groups,
                                   padding_mode=module.padding_mode,
                                   bias=True
                                   )
            fused_conv.weight = nn.Parameter(kernel.detach().cpu())
            fused_conv.bias = nn.Parameter(bias.detach().cpu())
            model.add_module(name, fused_conv)
        else:
            fuse_repvgg_block(module)

I modified the specific fusion function so that ACBlock and RepVGGBlock can be used in one training run, and so that the block can be inserted into other models with different conv kernel sizes.

################ repvgg_block.py
# -*- coding: utf-8 -*-

"""
@date: 2021/2/2 8:32 PM
@file: repvgg_block.py
@author: zj
@description: 
"""

import torch.nn as nn


def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
    result = nn.Sequential()
    result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                        kernel_size=kernel_size, stride=stride, padding=padding, groups=groups,
                                        bias=False))
    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
    return result


class RepVGGBlock(nn.Module):

    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros'):
        super(RepVGGBlock, self).__init__()

        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.padding_mode = padding_mode

        # assert kernel_size == 3                      # Commented out so that the block can be inserted into other models
        # assert padding == 1

        padding_11 = padding - kernel_size // 2

        self.rbr_identity = nn.BatchNorm2d(
            num_features=in_channels) if out_channels == in_channels and stride == 1 else None
        self.rbr_dense = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                                 stride=stride, padding=padding, groups=groups)
        self.rbr_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride,
                               padding=padding_11, groups=groups)

        self._init_weights()

    def _init_weights(self, gamma=0.01):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, gamma)
                nn.init.constant_(m.bias, gamma)

    def forward(self, inputs):
        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(inputs)

        return self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out

    def repvgg_convert(self):
        kernel, bias = self.get_equivalent_kernel_bias()
        return kernel.detach().cpu().numpy(), bias.detach().cpu().numpy(),
############## repvgg_util.py
# -*- coding: utf-8 -*-

"""
@date: 2021/2/2 8:51 PM
@file: repvgg_util.py
@author: zj
@description: 
"""

import torch
import torch.nn as nn
import numpy as np


#   This func derives the equivalent kernel and bias in a DIFFERENTIABLE way.
#   You can get the equivalent kernel and bias at any time and do whatever you want,
#   for example, apply some penalties or constraints during training, just like you do to the other models.
#   May be useful for quantization or pruning.
def get_equivalent_kernel_bias(rbr_dense, rbr_1x1, rbr_identity, in_channels, groups, padding_11):
    kernel3x3, bias3x3 = _fuse_bn_tensor(rbr_dense, in_channels, groups)
    kernel1x1, bias1x1 = _fuse_bn_tensor(rbr_1x1, in_channels, groups)
    kernelid, biasid = _fuse_bn_tensor(rbr_identity, in_channels, groups)
    return kernel3x3 + _pad_1x1_to_3x3_tensor(kernel1x1, padding_11) + kernelid, bias3x3 + bias1x1 + biasid


def _pad_1x1_to_3x3_tensor(kernel1x1, padding_11=1):  # ---------------> padding the 1x1 kernel by padding_11 makes it match the 3x3 kernel
    if kernel1x1 is None:
        return 0
    else:
        # return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])
        return torch.nn.functional.pad(kernel1x1, [padding_11] * 4)


def _fuse_bn_tensor(branch, in_channels, groups):
    if branch is None:
        return 0, 0
    if isinstance(branch, nn.Sequential):
        layer_list = list(branch)
        if len(layer_list) == 2 and isinstance(layer_list[1], nn.Identity):
            # conv/bn have already been fused inside the ACBlock
            return branch.conv.weight, branch.conv.bias
        kernel = branch.conv.weight
        running_mean = branch.bn.running_mean
        running_var = branch.bn.running_var
        gamma = branch.bn.weight
        beta = branch.bn.bias
        eps = branch.bn.eps
    else:
        assert isinstance(branch, nn.BatchNorm2d)
        input_dim = in_channels // groups
        kernel_value = np.zeros((in_channels, input_dim, 3, 3), dtype=np.float32)
        for i in range(in_channels):
            kernel_value[i, i % input_dim, 1, 1] = 1

        kernel = torch.from_numpy(kernel_value).to(branch.weight.device)
        running_mean = branch.running_mean
        running_var = branch.running_var
        gamma = branch.weight
        beta = branch.bias
        eps = branch.eps
    std = (running_var + eps).sqrt()
    t = (gamma / std).reshape(-1, 1, 1, 1)
    return kernel * t, beta - running_mean * gamma / std
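
For reference, the identity implemented by _fuse_bn_tensor, with BN running statistics \mu, \sigma^2, affine parameters \gamma, \beta, and \mathrm{std} = \sqrt{\sigma^2 + \epsilon}:

\mathrm{BN}(W * x) = \gamma \cdot \frac{W * x - \mu}{\mathrm{std}} + \beta = \left(\frac{\gamma}{\mathrm{std}}\,W\right) * x + \left(\beta - \frac{\gamma \mu}{\mathrm{std}}\right)

so the fused kernel is kernel * t with t = \gamma / \mathrm{std}, and the fused bias is beta - running_mean * gamma / std, exactly as returned above.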

About Test

I noticed that the accuracy-matching test in the repository can be further improved:

################## origin
print(((train_y - deploy_y) ** 2).sum())    # Will be around 1e-10
################## mine
print(torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2)))
print(torch.allclose(train_outputs, eval_outputs, atol=1e-8))
assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)

how to use

You can create the model as usual, then insert ACBlock or RepVGGBlock, individually or together, in any order:

...
    if cfg.MODEL.CONV.ADD_BLOCKS is not None:
        assert isinstance(cfg.MODEL.CONV.ADD_BLOCKS, tuple)
        for add_block in cfg.MODEL.CONV.ADD_BLOCKS:
            if add_block == 'RepVGGBlock':
                insert_repvgg_block(model)
            if add_block == 'ACBlock':
                insert_acblock(model)
...

Then carry out normal training and save the model parameters as usual. If you want to fuse the ACBlocks, use fuse_acblock; to fuse the RepVGGBlocks, use fuse_repvgg_block. Note: the order of insertion must be the opposite of the order of fusion:

insert_acblock -> insert_repvgg_block .... fuse_repvgg_block -> fuse_acblock
or
insert_repvgg_block -> insert_acblock .... fuse_acblock -> fuse_repvgg_block

The complete implementation is available in ZJCV/ZCls.

Why not compare with MobileNetV2/V3?

Dear author:
Thanks for this insightful idea. It's really useful for deployment, since a plain CNN offers simplicity and high efficiency. I wonder why you did not conduct experiments on MobileNetV2/V3. I am eager to see whether MobileNetV2/V3 would benefit from the plain structure. Thank you again.

Model modification problem

Hi, when using the RepVGG-A0 pretrained model for a semantic segmentation task, I did the following:
1. Replaced the stage4 layer, because I need a one-channel output:
stage4 = nn.Sequential(nn.ReLU(),
                       nn.Conv2d(192, 96, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)),
                       nn.ReLU(),
                       nn.Conv2d(96, 1, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1)))
model.stage4 = stage4
2. Removed gap and linear: nn.Sequential(*list(model.modules())[:-2])

But training raises an error:

Traceback (most recent call last):
File "/home/zgj/pycharmProject/competetion/kaggle/HuBMAP/1st_version/train.py", line 101, in
main()
File "/home/zgj/pycharmProject/competetion/kaggle/HuBMAP/1st_version/train.py", line 73, in main
predict = model(img) # ['out']
File "/home/zgj/anaconda3/envs/torch1.7.0py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/zgj/anaconda3/envs/torch1.7.0py3.7/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/home/zgj/anaconda3/envs/torch1.7.0py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/zgj/pycharmProject/competetion/kaggle/HuBMAP/input/RepVGG/repvgg.py", line 145, in forward
out = self.linear(out)

File "/home/zgj/anaconda3/envs/torch1.7.0py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/zgj/anaconda3/envs/torch1.7.0py3.7/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 93, in forward
return F.linear(input, self.weight, self.bias)
File "/home/zgj/anaconda3/envs/torch1.7.0py3.7/lib/python3.7/site-packages/torch/nn/functional.py", line 1690, in linear
ret = torch.addmm(bias, input, weight.t())
RuntimeError: mat1 dim 1 must match mat2 dim 0

How can I solve this?
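
A hedged diagnosis: the traceback shows self.linear is still being executed, because nn.Sequential(*list(model.modules())[:-2]) does not remove layers: modules() yields every nested submodule, and the original forward() keeps calling gap and linear anyway. A sketch of an alternative, assuming the stage0 through stage4, gap, and linear attributes of this repo's RepVGG class:

import torch.nn as nn

class RepVGGSeg(nn.Module):
    # Run only the convolutional stages plus the custom head, skipping gap/linear.
    def __init__(self, backbone, head):
        super().__init__()
        self.stages = nn.Sequential(backbone.stage0, backbone.stage1,
                                    backbone.stage2, backbone.stage3)
        self.head = head  # e.g. the replacement stage4 defined above

    def forward(self, x):
        return self.head(self.stages(x))

seg_model = RepVGGSeg(model, stage4)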

Distributed training: lr, batch_size and other parameter settings

Hi, the README gives the parameters "lr 0.1, 8 GPUs, global batch_size 256". Does "global batch_size" mean a batch of 256 per GPU, or 256 in total across the 8 GPUs?
I am doing distributed training on 4 servers (8 GPUs per server), currently with batch_size 1024 and lr 0.4, but the final accuracy is low, and I found the learning-rate decay is the problem. How should these parameters be set in distributed mode?
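
A hedged answer based on common practice (not an official recommendation): "global batch_size 256" normally means 256 summed over all GPUs, and the lr is scaled linearly with the global batch (Goyal et al., 2017); T_max must also count optimizer steps per process, which under DDP is dataset_size / global_batch per epoch. Hypothetical names below; adapt to the training script:

base_lr, base_batch = 0.1, 256
global_batch_size = 1024                        # 4 nodes x 8 GPUs x 32 per GPU
lr = base_lr * global_batch_size / base_batch   # = 0.4, linear scaling rule
steps_per_epoch = IMAGENET_TRAINSET_SIZE // global_batch_size
# CosineAnnealingLR(optimizer, T_max=args.epochs * steps_per_epoch)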

Even smaller models

Thanks for sharing this great model.

My question is: how can I make a model smaller than RepVGG-A0 by changing the blocks or the width multiplier?

Any ideas?
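
A hedged sketch, following the pattern of create_RepVGG_A0 in repvgg.py (A0 uses num_blocks=[2, 4, 14, 1] and width_multiplier=[0.75, 0.75, 0.75, 2.5]); shrinking either list yields a smaller network, untested here:

from repvgg import RepVGG

def create_RepVGG_tiny(deploy=False):
    # Fewer blocks per stage and narrower stages than A0.
    return RepVGG(num_blocks=[1, 2, 8, 1], num_classes=1000,
                  width_multiplier=[0.5, 0.5, 0.5, 2],
                  override_groups_map=None, deploy=deploy)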

More Examples

First of all, thanks for your great work. It would be wonderful if you could kindly provide more examples showing the capabilities of the proposed model.

mIoU drops when converting to int8 TensorRT

I use RepVGG-A2 as the segmentation backbone. After converting to a TensorRT engine, FP16 keeps the same mIoU as the PyTorch model, but when I convert to int8 TensorRT, mIoU drops by about 5%. Have you tried this?

Feature extraction

Hi, thanks for your great work

Can I use RepVGG as a feature extractor? And if it's possible, how can I do that with this source code?

Thanks
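
A hedged sketch (not an official API): the stage0 through stage4 submodules of this repo's RepVGG can be run directly, collecting the intermediate activations and skipping gap/linear:

import torch
from repvgg import create_RepVGG_A0

model = create_RepVGG_A0(deploy=True)
model.eval()

def extract_features(x):
    feats = []
    for stage in (model.stage0, model.stage1, model.stage2,
                  model.stage3, model.stage4):
        x = stage(x)
        feats.append(x)   # one feature map per stage, at decreasing resolution
    return feats

with torch.no_grad():
    features = extract_features(torch.rand(1, 3, 224, 224))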

speed test example

Thank you for your work. It would be helpful if you could provide examples with which we can reproduce the inference speeds of the RepVGG models reported in Table 4 of the paper.
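
Not the authors' benchmark, but a minimal GPU timing sketch: deploy mode, eval(), warm-up iterations, and torch.cuda.synchronize() around the timed loop are all needed for meaningful numbers:

import time
import torch
from repvgg import create_RepVGG_A0

model = create_RepVGG_A0(deploy=True).cuda().eval()
x = torch.rand(128, 3, 224, 224, device='cuda')

with torch.no_grad():
    for _ in range(10):                # warm-up
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(50):
        model(x)
    torch.cuda.synchronize()
    elapsed = time.time() - start
print('examples/second:', 128 * 50 / elapsed)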

Can I train this code on Windows?

The following error occurred when I trained this code on Windows:
AttributeError: module 'torch.multiprocessing' has no attribute 'spawn'

So I wonder if it's a system issue. Thank you very much!!

Winograd conv speed

Only MULs are presented in the paper; have you done experiments on the speed of Winograd conv?

Large accuracy discrepancy at deployment

Thanks for your great work.
In my use, small models (under 10 MFLOPs) deploy with negligible accuracy loss, but with a large model (~2 GFLOPs) the accuracy no longer matches:
LOG:
deploy param: stage0.rbr_reparam.weight torch.Size([64, 1, 3, 3]) -0.048573527
deploy param: stage0.rbr_reparam.bias torch.Size([64]) 0.23182523
deploy param: stage1.0.rbr_reparam.weight torch.Size([128, 64, 3, 3]) -0.0054542203
deploy param: stage1.0.rbr_reparam.bias torch.Size([128]) 1.0140312
deploy param: stage1.1.rbr_reparam.weight torch.Size([128, 64, 3, 3]) 0.0006282824
deploy param: stage1.1.rbr_reparam.bias torch.Size([128]) 0.32761782
deploy param: stage1.2.rbr_reparam.weight torch.Size([128, 128, 3, 3]) 0.0023862773
deploy param: stage1.2.rbr_reparam.bias torch.Size([128]) 0.34976208
deploy param: stage1.3.rbr_reparam.weight torch.Size([128, 64, 3, 3]) -9.027165e-05
deploy param: stage1.3.rbr_reparam.bias torch.Size([128]) 0.0063683093
deploy param: stage2.0.rbr_reparam.weight torch.Size([256, 128, 3, 3]) -8.460902e-05
deploy param: stage2.0.rbr_reparam.bias torch.Size([256]) 0.11033552
deploy param: stage2.1.rbr_reparam.weight torch.Size([256, 128, 3, 3]) -0.00010023986
deploy param: stage2.1.rbr_reparam.bias torch.Size([256]) -0.15826604
deploy param: stage2.2.rbr_reparam.weight torch.Size([256, 256, 3, 3]) -5.3966836e-05
deploy param: stage2.2.rbr_reparam.bias torch.Size([256]) -0.15924689
deploy param: stage2.3.rbr_reparam.weight torch.Size([256, 128, 3, 3]) -6.7551824e-05
deploy param: stage2.3.rbr_reparam.bias torch.Size([256]) -0.37404576
deploy param: stage2.4.rbr_reparam.weight torch.Size([256, 256, 3, 3]) -0.00012947948
deploy param: stage2.4.rbr_reparam.bias torch.Size([256]) -0.6853457
deploy param: stage2.5.rbr_reparam.weight torch.Size([256, 128, 3, 3]) 7.473848e-05
deploy param: stage2.5.rbr_reparam.bias torch.Size([256]) -0.16874048
deploy param: stage3.0.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.000433887
deploy param: stage3.0.rbr_reparam.bias torch.Size([512]) 0.18602118
deploy param: stage3.1.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 0.00048246872
deploy param: stage3.1.rbr_reparam.bias torch.Size([512]) -0.7235512
deploy param: stage3.2.rbr_reparam.weight torch.Size([512, 512, 3, 3]) 0.00021061227
deploy param: stage3.2.rbr_reparam.bias torch.Size([512]) -0.5657553
deploy param: stage3.3.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.00081703335
deploy param: stage3.3.rbr_reparam.bias torch.Size([512]) -0.37847003
deploy param: stage3.4.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.00033185782
deploy param: stage3.4.rbr_reparam.bias torch.Size([512]) -0.57922906
deploy param: stage3.5.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.0007206367
deploy param: stage3.5.rbr_reparam.bias torch.Size([512]) -0.56909364
deploy param: stage3.6.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.0003344199
deploy param: stage3.6.rbr_reparam.bias torch.Size([512]) -0.5628111
deploy param: stage3.7.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.00021987755
deploy param: stage3.7.rbr_reparam.bias torch.Size([512]) -0.34248477
deploy param: stage3.8.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.00010127398
deploy param: stage3.8.rbr_reparam.bias torch.Size([512]) -0.5895205
deploy param: stage3.9.rbr_reparam.weight torch.Size([512, 256, 3, 3]) -0.0005824505
deploy param: stage3.9.rbr_reparam.bias torch.Size([512]) -0.37577158
deploy param: stage3.10.rbr_reparam.weight torch.Size([512, 512, 3, 3]) -0.00012262027
deploy param: stage3.10.rbr_reparam.bias torch.Size([512]) -0.6199002
deploy param: stage3.11.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 1.503076e-06
deploy param: stage3.11.rbr_reparam.bias torch.Size([512]) -0.7054796
deploy param: stage3.12.rbr_reparam.weight torch.Size([512, 512, 3, 3]) 0.0006349176
deploy param: stage3.12.rbr_reparam.bias torch.Size([512]) -1.0350925
deploy param: stage3.13.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 0.00037807773
deploy param: stage3.13.rbr_reparam.bias torch.Size([512]) -1.1399512
deploy param: stage3.14.rbr_reparam.weight torch.Size([512, 512, 3, 3]) 0.00025178236
deploy param: stage3.14.rbr_reparam.bias torch.Size([512]) -0.27695537
deploy param: stage3.15.rbr_reparam.weight torch.Size([512, 256, 3, 3]) 0.00074805244
deploy param: stage3.15.rbr_reparam.bias torch.Size([512]) -0.8776718
deploy param: stage4.0.rbr_reparam.weight torch.Size([1024, 512, 3, 3]) -0.00013951868
deploy param: stage4.0.rbr_reparam.bias torch.Size([1024]) 0.021552037
deploy param: linear.weight torch.Size([372, 1024]) 0.0051029953
deploy param: linear.bias torch.Size([372]) 0.17604762

Printing code:

    deploy_model = build_func(deploy=True,**kwargs)
    for name, param in deploy_model.named_parameters():
        print('deploy param: ', name, param.size(), np.mean(converted_weights[name]))
        param.data = torch.from_numpy(converted_weights[name]).float()

Reproduce accuracy

Thanks for your inspiring work.
I'm trying to reproduce the light A0 and midsize B1 models, but I only got 69.5% top-1 accuracy for A0.
B1 accuracy is also lower than reported, by about 1-2%.
I followed the 120-epoch cosine schedule, batch size 8*256.
Are there any other specific settings or tricks employed in the training pipeline?

Different outputs from train-model and deploy-model

After I converted the trained model into the inference-time structure, I tested the two models with the same input and got different outputs from the train model (RepVGG-X-train.pth) and the deploy model (RepVGG-X-deploy.pth).

Have you done that kind of comparison?
Lots of thanks~

Different outputs in segmentation with a RepVGG encoder

I tried to use a RepVGG backbone in U-Net. After the training process, the result is pretty good.
Then I converted the model with my code (based on your code):

model = smp.Unet(
    encoder_name="RepVGG-A2",        # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
    classes=config_seg.NUM_CLASSES,  # model output channels (number of classes in your dataset)
    deploy=False
).cuda()
model.load_state_dict(pretrained_dict)

model_deploy = smp.Unet(
    encoder_name="RepVGG-A2",        # choose encoder, e.g. mobilenet_v2 or efficientnet-b7
    classes=config_seg.NUM_CLASSES,  # model output channels (number of classes in your dataset)
    deploy=True
).cuda()

all_weights = {}
for name, module in model.named_modules():
    if hasattr(module, 'repvgg_convert'):
        kernel, bias = module.repvgg_convert()
        print('>> name: ', name)
        all_weights[name + '.rbr_reparam.weight'] = kernel
        all_weights[name + '.rbr_reparam.bias'] = bias
        print('convert RepVGG block')
    else:
        for p_name, p_tensor in module.named_parameters():
            full_name = name + '.' + p_name
            print('>> not vgg block name: ', name, p_name, full_name)
            if full_name not in all_weights:
                all_weights[full_name] = p_tensor.detach().cpu().numpy()
        for p_name, p_tensor in module.named_buffers():
            full_name = name + '.' + p_name
            if full_name not in all_weights:
                all_weights[full_name] = p_tensor.cpu().numpy()

for name, param in model_deploy.named_parameters():
    print('deploy param: ', name, param.size(), np.mean(all_weights[name]))
    param.data = torch.from_numpy(all_weights[name]).float()
del model
model = model_deploy.cuda()

Deploy model files

Hello, could you please release the deploy model files?

I converted the train checkpoint into a deploy checkpoint, but found that the two models' outputs are inconsistent. I hope you can help~

bias=False?

I noticed the Conv2d bias in conv_bn() is set to False and wonder whether that is reasonable.

"If the Conv2d bias were True, and the final bias returned by _fuse_bn_tensor() took it into account, the accuracy would improve." Does that make sense?

Have you done some comparison?

Thank you!

Repeated output when training with multi-gpu

Hi DingXiaoH,
I encountered repeated outputs when training with multi-GPU (8 GPUs); the outputs are shown below.
(screenshot)
As you can see in the image, the program prints the training result of the same batch 8 times.
I use the command below to start the training.
python train.py -a RepVGG-A0 --dist-url 'tcp://127.0.0.1:23333' --dist-backend 'nccl' --multiprocessing-distributed --world-size 1 --rank 0 --workers 32 imagenet
How can I solve this problem? Or does it not influence the result?

Thanks,
Ema1997.
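
A hedged explanation, based on the PyTorch ImageNet example that train.py follows: with --multiprocessing-distributed, each of the 8 spawned processes runs its own training loop and prints its own log line, so training is unaffected; gating the logging on the first rank removes the duplicates:

# Inside the training loop; args and ngpus_per_node as in train.py.
# progress.display(i) is the ImageNet-example logging helper.
if not args.multiprocessing_distributed or args.rank % ngpus_per_node == 0:
    progress.display(i)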

Accuracy when replacing ResNet-18 as the backbone of a TRN network

Many thanks for the authors' work and the open-source code.

I recently replaced ResNet-18 with RepVGG-B0 as the backbone for training and testing. With the same hyperparameters, ResNet-18 reaches 85% while RepVGG-B0 only reaches 70%, which puzzles me.
The overall model is TRN, used for multi-frame action recognition. The network structure is mainly:

  1. A CNN extracts features from multiple frames;
  2. The per-frame features are concatenated;
  3. An MLP classifies the concatenated features.

Below are the training settings; both backbones used the same parameters:
Optimizer: Adam
Learning rate: 1.0e-5
betas: [0.9, 0.99]
eps: 1.0e-8
weight_decay: 1.0e-4

LR schedule: ExponentialLR
gamma: 0.99

Epochs: 150
Batch size: 64
Input size: 96×96
Frames trained at a time: 5

The test metric is F1-score. I trained 5 models with each backbone, took each model's best score on the test set, and averaged over the 5 models.
Note that the RepVGG-based models were tested without the deploy conversion.

I believe RepVGG is very friendly for industrial deployment and hope to use this model, hence this issue.

.pth model to onnx model with an error

RuntimeError: Error(s) in loading state_dict for RepVGG:
Missing key(s) in state_dict: "stage0.rbr_reparam.weight", "stage0.rbr_reparam.bias", "stage1.0.rbr_reparam.weight", ...........
Unexpected key(s) in state_dict: "stage0.rbr_dense.conv.weight", "stage0.rbr_dense.bn.weight", "stage0.rbr_dense.bn.bias", "stage0.rbr_dense.bn.running_mean", "stage0.rbr_dense.bn.running_var", "stage0.rbr_dense.bn.num_batches_tracked", "stage0.rbr_1x1.conv.weight",

What's wrong with this? Looking forward to your reply.
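
A hedged reading of the error: the missing rbr_reparam keys mean a training checkpoint is being loaded into a deploy-mode model. Building the model with deploy=False, converting, and only then exporting should avoid it:

import torch
from repvgg import create_RepVGG_A0, repvgg_model_convert

model = create_RepVGG_A0(deploy=False)                     # matches the train checkpoint
model.load_state_dict(torch.load('RepVGG-A0-train.pth', map_location='cpu'))
deploy_model = repvgg_model_convert(model, create_RepVGG_A0)
deploy_model.eval()
torch.onnx.export(deploy_model, torch.rand(1, 3, 224, 224), 'RepVGG-A0.onnx')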

Is repvgg_model_convert() a little wrong? The result of the deploy version is not the same as in training

Hi
I tried to download the train version of RepVGG-A2 and convert it to the deploy version.

model_o = create_RepVGG_A2()
model_o.load_state_dict(torch.load('/home/forrest/pycharm/data/RepVGG-A2-train.pth'), strict=False)
# original downloaded weights

model_o_copy = model_o

model = repvgg_model_convert(model_o_copy, create_RepVGG_A2, '/home/forrest/pycharm/data/RepVGG-A2-deploytest.pth')

x = torch.from_numpy(np.array(Image.open('/home/forrest/Downloads/data/syj/Haze20/HazeClear-train-test/train/00002/002.jpg'))).float().unsqueeze(0).permute(0, 3, 1, 2) / 255.0
model2 = create_RepVGG_A2(deploy=True)
model2.load_state_dict(torch.load('/home/forrest/pycharm/data/RepVGG-A2-deploytest.pth'), strict=False)

out1 = model_o(x)
out2 = model2(x)
out3 = model(x)
print(torch.sum(torch.abs(out1 - out2)), torch.sum(torch.abs(out1 - out3)))

print(out1[0, :10], out2[0, :10], out3[0, :10])

The result is
tensor(1090.8134, grad_fn=<SumBackward0>) tensor(1090.8134, grad_fn=<SumBackward0>) tensor([-1.1406e+00, -7.5946e-01, -9.9028e-01, -1.6798e+00, -1.5024e+00, -8.5764e-01, -1.3238e+00, -1.4038e-03, -4.1698e-01, -8.1957e-01], grad_fn=<SliceBackward>) tensor([ 1.9279, -0.5468, 0.9310, 0.5799, 0.7641, 1.0118, 0.0925, -0.8435, -0.9261, -0.0287], grad_fn=<SliceBackward>) tensor([ 1.9279, -0.5468, 0.9310, 0.5799, 0.7641, 1.0118, 0.0925, -0.8435, -0.9261, -0.0287], grad_fn=<SliceBackward>)

It seems the output of the model produced by repvgg_model_convert() is not the same as the train version's.
I wonder why?
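
A hedged diagnosis: out2 equals out3, so the conversion is self-consistent; out1 differs most likely because model_o was never put in eval() mode (its BN layers use batch statistics), and strict=False can additionally hide mismatched keys. A repeat of the comparison with both fixed:

model_o.load_state_dict(torch.load('/home/forrest/pycharm/data/RepVGG-A2-train.pth'))  # strict=True by default
model_o.eval()
model2.eval()
with torch.no_grad():
    print(torch.sum(torch.abs(model_o(x) - model2(x))))  # expected to be near zero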

bug of convert.py

When I try to convert A1-train.pth I get:
return kernel * t, beta - running_mean * gamma / std
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
Can you help me? Thank you!
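
A hedged workaround: the mix of cuda:0 and cpu tensors usually means the model and the checkpoint live on different devices during conversion; doing the conversion entirely on CPU sidesteps it. Sketch assuming the A1 builder from repvgg.py:

import torch
from repvgg import create_RepVGG_A1, repvgg_model_convert

model = create_RepVGG_A1(deploy=False)
model.load_state_dict(torch.load('RepVGG-A1-train.pth', map_location='cpu'))
model = model.cpu()    # keep every tensor on one device while fusing
deploy_model = repvgg_model_convert(model, create_RepVGG_A1)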

RepVGG in OD

Hello, I embedded the RepVGG module into a detection model for training. Without converting the model, I can predict results normally, but after I process the model with the method in convert.py, it can no longer predict correct results.

Why does the converted model become slower?

I used RepVGG-A0 in my task and converted the trained model with the whole_model_convert function, but at test time the trained model is much faster than the converted one: the trained model's test time is around 288 s, while the converted model's exceeds 400 s.

Can RepVGG block combine with SeparableConv2d?

Thanks for your great work. It helps me a lot. Now I want to speed up the 3x3 convolution even further.

My question is: can the RepVGG block be combined with SeparableConv2d? This is shown in the figure below.

(screenshot)
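
A hedged sketch of the idea in the figure, using the plug-in RepVGGBlock posted earlier in this thread: a depthwise RepVGG block (groups = in_channels) followed by a plain 1x1 pointwise conv mirrors SeparableConv2d; whether it trains equally well is untested here:

import torch.nn as nn

def separable_repvgg(in_channels, out_channels, stride=1):
    return nn.Sequential(
        # Depthwise 3x3 with re-parameterizable 1x1 and identity branches.
        RepVGGBlock(in_channels, in_channels, kernel_size=3,
                    stride=stride, padding=1, groups=in_channels),
        nn.ReLU(inplace=True),
        # Pointwise projection, as in SeparableConv2d.
        nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )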

The accuracy problem

I wonder what accuracy training RepVGG-A0 with your PyTorch script should achieve.

I tried to reproduce RepVGG-A0 with 0.1 label smoothing, but got an accuracy of 71.6%.


Style transfer

Hello,
How does RepVGG compare to using standard VGG in style transfer tasks (Perceptual Loss) where it's uncommon to use models such as ResNet?

Data loading time during training

Hi @DingXiaoH , thanks for the great work.
I have been running the training code in this repo recently. However, according to the log, the data loading time is unstable and slows down training a lot.
(screenshot)

My GPU utilization also fluctuates wildly during training, from 0% to 99%.
(screenshot)

Apparently, the bottleneck is the data loading of the ImageNet dataset. Do you have any practices or suggestions on how to accelerate the training?

Thank you.

RepVGG vs GENet

As far as I understand, your approach shares a similar idea with GPU-Efficient Networks. Have you done any comparison with them?

> Running a freshly initialized model is fine, but a model with loaded weights is not.

    x = torch.from_numpy(np.random.randn(1,*shape)).float()
    y = model(x)
    model_d = repvgg_model_convert(model,model_func,out_c=186*2,num_blocks=[4,6,16,1],in_c=1)
    y_d = model_d(x)
    print('diff abs: max {},\n**2:{}'.format(abs(y - y_d).max(),((y - y_d) ** 2).sum()))

Output:
diff abs: max 6.67572021484375e-06,
**2: 1.419987460948846e-09
This looks normal here, but after actual training the final export shows the large discrepancy posted earlier. I haven't figured out the details of convert, so I don't want to jump to conclusions.

I observed two phenomena when implementing RepVGG:

  1. Both the training-stage and test-stage models must be put in eval() before comparing accuracy, otherwise there will be a large discrepancy;
  2. When the weights are initialized as follows:
def init_weights(modules):
    for m in modules:
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.constant_(m.weight, 1)
            nn.init.constant_(m.bias, 0)

there is a large accuracy misalignment, whereas the following initialization guarantees consistency:

    def _init_weights(self, gamma=0.01):
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, gamma)
                nn.init.constant_(m.bias, gamma)

Here is the test code:

def test_regvgg():
    model = RepVGGRecognizer()
    model.eval()
    print(model)

    data = torch.randn(1, 3, 224, 224)
    insert_repvgg_block(model)
    model.eval()
    train_outputs = model(data)[KEY_OUTPUT]
    print(model)

    fuse_repvgg_block(model)
    model.eval()
    eval_outputs = model(data)[KEY_OUTPUT]
    print(model)

    print(torch.sqrt(torch.sum((train_outputs - eval_outputs) ** 2)))
    print(torch.allclose(train_outputs, eval_outputs, atol=1e-8))
    assert torch.allclose(train_outputs, eval_outputs, atol=1e-8)

Hope this helps.

Originally posted by @zjykzj in #23 (comment)

Error when converting a trained RepVGG-A0 model with PyTorch 1.4.0; how can I solve it?

RuntimeError: Error(s) in loading state_dict for RepVGG:
Missing key(s) in state_dict: "stage0.rbr_dense.conv.weight", "stage0.rbr_dense.bn.weight", "stage0.rbr_dense.bn.bias", "stage0.rbr_dense.bn.running_mean", "stage0.rbr_dense.bn.running_var", "stage0.rbr_1x1.conv.weight", "stage0.rbr_1x1.bn.weight", "stage0.rbr_1x1.bn.bias", "stage0.rbr_1x1.bn.running_mean", "stage0.rbr_1x1.bn.running_var", "stage1.0.rbr_dense.conv.weight", "stage1.0.rbr_dense.bn.weight", "stage1.0.rbr_dense.bn.bias", "stage1.0.rbr_dense.bn.running_mean", "stage1.0.rbr_dense.bn.running_var" .......
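
A hedged guess at the cause: checkpoints saved from DataParallel/DistributedDataParallel prefix every key with 'module.', which produces exactly this kind of missing-key list. Stripping the prefix before loading often resolves it:

import torch

ckpt = torch.load('RepVGG-A0-train.pth', map_location='cpu')
if 'state_dict' in ckpt:                     # some training scripts wrap the weights
    ckpt = ckpt['state_dict']
ckpt = {k.replace('module.', '', 1): v for k, v in ckpt.items()}
model.load_state_dict(ckpt)                  # model built with deploy=False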
