
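Training with RepVGG fails to converge when deploy is set to true (i.e. the single-path inference structure is used). Is this because every training run converts straight to the single-path structure? [flexible-yolov5, CLOSED]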

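wjli-debug commented on May 28, 2024

I am training with RepVGG. When I set the deploy parameter to true, which switches the network to the single-path inference structure, training fails to converge. Is this because every training run converts directly to the single-path structure?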

from flexible-yolov5.

Comments (30)

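Bobo-y commented on May 28, 2024

@wjli-debug That probably won't work. Building the network with deploy=True effectively removes the skip connections, and a deep network without them cannot be trained. Think about why ResNet was proposed in the first place.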

wjli-debug commented on May 28, 2024

So the fix would be to keep the deploy parameter set to False throughout training and only convert to the inference structure once training has finished, rather than passing deploy=True when constructing the RepVGG structure?

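wjli-debug commented on May 28, 2024

I ask because the official RepVGG repo provides switch_to_deploy conversion code. That suggests deploy should stay False during training, so the branches are trained throughout, and the conversion to the single-branch structure is a separate step run after training completes.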

Bobo-y commented on May 28, 2024

@wjli-debug Yes. The network is designed this way to speed up inference. The residual connections still matter a lot during training; without them a deep network collapses.
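To make the "train with branches, convert afterwards" point concrete, here is a minimal sketch of the re-parameterization arithmetic. It is a single-channel 1D miniature written in plain Python, not the RepVGG code itself: a 3-tap kernel stands in for the 3x3 convolution, a 1-tap kernel for the 1x1 convolution, and all parameter values are made up for illustration. It assumes inference-mode batch norm with fixed running statistics.

```python
import math

def conv1d(x, kernel, bias=0.0):
    """Single-channel 1D convolution (cross-correlation), zero padding, stride 1."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) + bias
            for i in range(len(x))]

def batchnorm(x, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using fixed running statistics."""
    std = math.sqrt(var + eps)
    return [gamma * (v - mean) / std + beta for v in x]

def fuse(kernel, gamma, beta, mean, var, eps=1e-5):
    """Fold a BN layer into the preceding bias-free conv; returns (kernel, bias)."""
    scale = gamma / math.sqrt(var + eps)
    return [k * scale for k in kernel], beta - mean * scale

# Hypothetical parameters for the three training-time branches.
k3, bn3 = [0.2, -0.5, 0.1], dict(gamma=1.1, beta=0.3, mean=0.05, var=0.9)
k1, bn1 = [0.7], dict(gamma=0.8, beta=-0.2, mean=-0.1, var=1.2)
bn_id = dict(gamma=1.3, beta=0.1, mean=0.2, var=0.7)

x = [1.0, -2.0, 0.5, 3.0, -1.5]

# Training-time structure: three parallel branches, summed.
multi = [a + b + c for a, b, c in zip(
    batchnorm(conv1d(x, k3), **bn3),
    batchnorm(conv1d(x, k1), **bn1),
    batchnorm(x, **bn_id))]

# Deploy-time structure: pad the 1-tap kernel to 3 taps, write the
# identity branch as the kernel [0, 1, 0], fuse each BN, then sum.
f3, b3 = fuse(k3, **bn3)
f1, b1 = fuse([0.0, k1[0], 0.0], **bn1)
fi, bi = fuse([0.0, 1.0, 0.0], **bn_id)
single_kernel = [a + b + c for a, b, c in zip(f3, f1, fi)]
single_bias = b3 + b1 + bi
single = conv1d(x, single_kernel, single_bias)

# The two structures agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(multi, single))
print(max_diff)
```

The fused kernel reproduces the output of the trained branches, which is why the conversion is safe after training. Building the single kernel from the start gives the optimizer none of the parallel paths to train through.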

wjli-debug commented on May 28, 2024

But when I try to convert the best.pt saved from RepVGG training into the inference structure, I keep hitting a problem. Whether I use the official conversion code or write my own, load the weights, and then call it, I get the error below. Could you take a look at what is causing it when you have time?

    Traceback (most recent call last):
      File "convert_1.py", line 6, in <module>
        model.load_state_dict(torch.load('best.pt'))
      File "/data1/docker_project/ENV/flex_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1223, in load_state_dict
        raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
    RuntimeError: Error(s) in loading state_dict for RepVGG:

wjli-debug commented on May 28, 2024

[image]
[image]
It turns out the saved weights are missing many parameters. What causes this?
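A quick way to see which parameters are "missing" is to compare key names before loading. The helper below is a sketch in plain Python; `diff_state_dict_keys` is a hypothetical name, and the example keys are assumptions modelled on the official RepVGG block layout, so the names in this repo may differ. It mirrors the missing/unexpected split that a strict `load_state_dict` reports.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) the way a strict state-dict load reports them."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model expects, checkpoint lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # checkpoint has, model does not
    return missing, unexpected

# Hypothetical key names for one block, saved from a branched (training) model.
branched_checkpoint = [
    "stage1.0.rbr_dense.conv.weight",
    "stage1.0.rbr_dense.bn.weight",
    "stage1.0.rbr_1x1.conv.weight",
    "stage1.0.rbr_1x1.bn.weight",
]
# Hypothetical key names the same block exposes when built in deploy mode.
deploy_model = ["stage1.0.rbr_reparam.weight", "stage1.0.rbr_reparam.bias"]

missing, unexpected = diff_state_dict_keys(deploy_model, branched_checkpoint)
print("missing:", missing)
print("unexpected:", unexpected)
```

With real PyTorch objects, the inputs would be `model.state_dict().keys()` and the keys of the loaded state dict. One thing worth checking in this thread's setup: a YOLOv5-style checkpoint is a dictionary with entries such as 'epoch' and 'model', so passing the result of `torch.load('best.pt')` straight to `load_state_dict` would compare against those top-level names rather than parameter names.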

Bobo-y commented on May 28, 2024

Please send me the weights.

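wjli-debug commented on May 28, 2024

Link: https://pan.baidu.com/s/1Evl3vG_hmPOJU_G-g76S-w
Extraction code: qwer
Yesterday I compared the two weight files. best.pt was produced by training with the yolov5 framework; the other is the weight file provided by the official RepVGG repo. The two differ in structure. I then tried modifying train.py just before the best checkpoint is saved, as follows: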
    if (not nosave) or (final_epoch and not evolve):  # if save
        ckpt = {
            'epoch': epoch,
            'best_fitness': best_fitness,
            'model': deepcopy(de_parallel(model)),
            'ema': deepcopy(ema.ema),
            'updates': ema.updates,
            'optimizer': optimizer.state_dict(),
            'wandb_id': loggers.wandb.wandb_run.id if loggers.wandb else None,
            'date': datetime.now().isoformat()}

        # Save last, best and delete
        torch.save(ckpt, last)
        if best_fitness == fi:
            # Switch to deploy mode
            model.switch_to_pretrained()  # Add this line
            torch.save(ckpt, best)
        if opt.save_period > 0 and epoch % opt.save_period == 0:
            torch.save(ckpt, w / f'epoch{epoch}.pt')
        del ckpt
        callbacks.run('on_model_save', last, epoch, final_epoch, best_fitness, fi)

I added model.switch_to_pretrained() before saving to do the conversion. I am retraining now and am not sure whether it will work.
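Two things stand out in the snippet above: ckpt is built from deep copies before the switch call, and the switch is applied to the live model that is still being trained. A possible alternative, sketched here with toy stand-in classes rather than real PyTorch modules, is to convert a deep copy and leave the training model branched. `ToyBlock`, `ToyModel`, and `deploy_copy` are hypothetical names used only for illustration.

```python
from copy import deepcopy

class ToyBlock:
    """Stand-in for a re-parameterizable block; not a real PyTorch module."""
    def __init__(self):
        self.deploy = False

    def switch_to_deploy(self):
        self.deploy = True

class ToyModel:
    """Stand-in for a detector whose backbone holds re-parameterizable blocks."""
    def __init__(self, n_blocks=3):
        self.blocks = [ToyBlock() for _ in range(n_blocks)]

    def named_modules(self):
        # Mimics the (name, module) pairs yielded by torch.nn.Module.named_modules().
        for i, block in enumerate(self.blocks):
            yield f"blocks.{i}", block

def deploy_copy(model):
    """Return a converted deep copy, leaving the model being trained untouched."""
    clone = deepcopy(model)
    for _, module in clone.named_modules():
        if hasattr(module, "switch_to_deploy"):
            module.switch_to_deploy()
    return clone

training_model = ToyModel()
export_model = deploy_copy(training_model)
print([b.deploy for b in training_model.blocks])  # training copy stays branched
print([b.deploy for b in export_model.blocks])    # exported copy is converted
```

In a real training loop the converted copy would be what gets written to the deploy checkpoint, while the optimizer keeps updating the branched model.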

Bobo-y commented on May 28, 2024

Baidu Cloud is not reachable from my intranet. Please send it by email instead: [email protected]

wjli-debug commented on May 28, 2024

    backbone = get_RepVGG_func_by_name('RepVGG-A0')()
    pretrained_dict = torch.load('../best.pt')
    backbone_state_dict = pretrained_dict['model'].backbone.state_dict()
    print(backbone_state_dict)

I used the code above to get the parameter tensors of the backbone of the model saved in best.pt. If I convert the backbone inside the model directly, will that break the structure of the whole model in best.pt?

Bobo-y commented on May 28, 2024

What do you mean by conversion?

wjli-debug commented on May 28, 2024

Uh, that did not work. I still hit some bugs, so this approach does not seem viable.

wjli-debug commented on May 28, 2024

By conversion I mean changing the branched structure into the single-path structure.

wjli-debug commented on May 28, 2024

The backbone is trained with the branched structure, isn't it? After saving, I want to convert just the backbone into single-path weights and then assign them back to the model's backbone.

Bobo-y commented on May 28, 2024

As I understand it, that won't work; the state dict most likely won't match.

wjli-debug commented on May 28, 2024

Do you have any suggestions or another approach?

Bobo-y commented on May 28, 2024

I am training a network right now, and then I will try converting the structure.

wjli-debug commented on May 28, 2024

OK, thank you for the trouble.

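Bobo-y commented on May 28, 2024

I just ran a training job, and exporting the saved weights to ONNX in deploy mode works fine. The order is: first build the network with the branched structure, then load the weights, then call the re-parameterization interface. I will update export_onnx.py.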

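Bobo-y commented on May 28, 2024

One more thing: if you want to load someone else's pretrained weights, check how their state dict was saved, with branches or without. You have to load the pretrained weights into a model built in the matching state.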
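One way to check how a state dict was saved is to look at its key names. The sketch below is plain Python and assumes the official RepVGG naming, where branched blocks carry `rbr_dense` and `rbr_1x1` entries and fused blocks carry `rbr_reparam`; these names are an assumption, and a checkpoint from another codebase may use different ones. `checkpoint_format` is a hypothetical helper.

```python
def checkpoint_format(keys):
    """Classify a state dict as 'branched', 'deploy', or 'unknown' from its key names."""
    keys = list(keys)
    has_branches = any("rbr_dense." in k or "rbr_1x1." in k for k in keys)
    has_fused = any("rbr_reparam." in k for k in keys)
    if has_branches and not has_fused:
        return "branched"
    if has_fused and not has_branches:
        return "deploy"
    return "unknown"

# Hypothetical key lists for illustration.
print(checkpoint_format(["stage1.0.rbr_dense.conv.weight", "stage1.0.rbr_1x1.bn.bias"]))
print(checkpoint_format(["stage1.0.rbr_reparam.weight", "stage1.0.rbr_reparam.bias"]))
print(checkpoint_format(["conv1.weight", "fc.bias"]))
```

The result tells you which value of deploy to use when constructing the model that will receive the weights.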

wjli-debug commented on May 28, 2024

OK, thanks. I will try it and see how it goes.

wjli-debug commented on May 28, 2024

[image]
I ran it with deploy=True and inspected the result, but the code below does not seem to re-parameterize all the branches into 3x3 convolutions:

    if deploy:
        for name, module in model.named_modules():
            if hasattr(module, 'switch_to_deploy'):
                module.switch_to_deploy()

wjli-debug commented on May 28, 2024

The model is exported as a deploy model, but it still has the multi-branch structure. RepVGG should be single-branch when exported for inference, so the code above is having no effect.

Bobo-y commented on May 28, 2024

I found the cause. Loading the pretrained weights fails because their keys are named differently from mine. Calling load_state_dict(, strict=False) is enough; it turns off the key-name matching check.

Bobo-y commented on May 28, 2024

The switch_to_deploy function did not exist by default; I have just added it. The default was switch_to_pretrain, which does not fit the deployment naming, so I added a new function.
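A `hasattr` guard skips silently when the method has a different name, which matches the symptom reported above. The sketch below, in plain Python, makes that case loud by counting conversions and raising if none happened. `switch_all_to_deploy` and its `method` parameter are hypothetical, and `OldBlock` is a toy stand-in rather than a class from this repo. The function accepts any iterable of (name, module) pairs, so the output of `model.named_modules()` could be passed in.

```python
def switch_all_to_deploy(named_modules, method="switch_to_deploy"):
    """Call `method` on every module that defines it and return the converted names."""
    converted = []
    for name, module in named_modules:
        convert = getattr(module, method, None)
        if callable(convert):
            convert()
            converted.append(name)
    if not converted:
        raise RuntimeError(f"no module defines {method}(); nothing was re-parameterized")
    return converted

class OldBlock:
    """Toy block that only offers the older method name."""
    def __init__(self):
        self.deploy = False

    def switch_to_pretrain(self):
        self.deploy = True

modules = [("backbone.0", OldBlock()), ("backbone.1", OldBlock())]

try:
    switch_all_to_deploy(modules)  # default name is absent on these blocks
except RuntimeError as err:
    print(err)

converted = switch_all_to_deploy(modules, method="switch_to_pretrain")
print(converted)
```

Checking the returned list against the number of blocks you expect is a cheap sanity test before exporting.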

wjli-debug commented on May 28, 2024

Where should load_state_dict(, strict=False) be added? Is it for loading best.pt separately and converting it?

wjli-debug commented on May 28, 2024

I don't quite understand which part you mean. I don't see any load_state_dict call in the code you provided.

Bobo-y commented on May 28, 2024

The brute-force way: at line 129 of train.py, add model.backbone.load_state_dict(torch.load('<downloaded pretrained weights>'), strict=False). As long as the weight shapes are fine, the weights will load directly.

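Bobo-y commented on May 28, 2024

Loading a pretrained model does need a small change: the keys of the downloaded pretrained model have to be renamed to match the ones used in this repo. I will look into it later when I have time. @wjli-debug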
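The key renaming mentioned here could look roughly like the sketch below. It is plain Python over ordinary dictionaries; `remap_keys`, the prefix rules, and every key name are hypothetical, since the thread does not give the actual naming differences. Note that in PyTorch, strict=False does not rename anything: keys that do not match are skipped, so a checkpoint whose names all differ would load nothing without a step like this.

```python
def remap_keys(state_dict, rename_rules, target_keys):
    """Rename checkpoint keys by prefix and report what still fails to line up."""
    remapped = {}
    for key, value in state_dict.items():
        for old_prefix, new_prefix in rename_rules:
            if key.startswith(old_prefix):
                key = new_prefix + key[len(old_prefix):]
                break  # apply at most one rule per key
        remapped[key] = value
    target_keys = set(target_keys)
    matched = sorted(k for k in remapped if k in target_keys)
    unmatched = sorted(k for k in remapped if k not in target_keys)
    return remapped, matched, unmatched

# Hypothetical downloaded checkpoint; strings stand in for tensors.
downloaded = {
    "stage0.rbr_dense.conv.weight": "w0",
    "stage1.0.rbr_1x1.conv.weight": "w1",
    "linear.weight": "w2",
}
# Hypothetical prefix rules and target model keys.
rules = [("stage0.", "stem."), ("stage1.", "layer1.")]
target = ["stem.rbr_dense.conv.weight", "layer1.0.rbr_1x1.conv.weight"]

remapped, matched, unmatched = remap_keys(downloaded, rules, target)
print("matched:", matched)
print("unmatched:", unmatched)
```

After remapping, the dictionary can be passed to `load_state_dict`, whose return value lists the missing and unexpected keys, so it is easy to confirm that the backbone weights were actually picked up.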

wjli-debug commented on May 28, 2024

OK, sounds good.
