Performance drops after adding a convolution layer (about transreid, 4 comments, closed)

damo-cv commented on August 17, 2024
Performance drops after adding a convolution layer

from transreid.

Comments (4)

michuanhaohao commented on August 17, 2024

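Hello, there are two things to consider when using a convolution stem:

  1. The convolution stem itself is an effective structure. That said, the better current design is the one in VOLO, which stacks three conv-bn-relu layers; a single layer like yours will not have a particularly noticeable effect.
  2. The root cause is that your newly added conv layer is randomly initialized rather than pretrained, while all the other ViT layers are pretrained on ImageNet. The pretrained layers therefore receive inputs that are completely scrambled relative to what they were trained on, and that causes the accuracy drop.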
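
To illustrate both points, here is a minimal PyTorch sketch, not the TransReID or VOLO code. The class names `ConvStemPatchEmbed` and `TinyModel`, the channel widths, and the stride layout are all assumptions made for illustration. It stacks three conv-bn-relu blocks before the patch projection, then loads a checkpoint that only covers the rest of the network, so the keys reported as missing are exactly the stem parameters that stay randomly initialized.

```python
import torch
import torch.nn as nn


class ConvStemPatchEmbed(nn.Module):
    """Hypothetical patch embedding: three conv-bn-relu blocks, then a projection."""

    def __init__(self, in_chans=3, stem_chans=64, embed_dim=192, patch_size=16):
        super().__init__()
        layers = []
        c_in = in_chans
        for i in range(3):
            # Only the first block downsamples (stride 2); this layout is an assumption.
            stride = 2 if i == 0 else 1
            layers += [
                nn.Conv2d(c_in, stem_chans, kernel_size=3, stride=stride,
                          padding=1, bias=False),
                nn.BatchNorm2d(stem_chans),
                nn.ReLU(inplace=True),
            ]
            c_in = stem_chans
        self.stem = nn.Sequential(*layers)
        # The stem already halves the resolution, so the projection covers the rest.
        self.proj = nn.Conv2d(stem_chans, embed_dim,
                              kernel_size=patch_size // 2, stride=patch_size // 2)

    def forward(self, x):
        x = self.proj(self.stem(x))          # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)  # (B, num_patches, embed_dim)


class TinyModel(nn.Module):
    """Toy stand-in: `head` plays the role of the ImageNet-pretrained layers."""

    def __init__(self, embed_dim=192, num_classes=10):
        super().__init__()
        self.patch_embed = ConvStemPatchEmbed(embed_dim=embed_dim)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        return self.head(self.patch_embed(x).mean(dim=1))


model = TinyModel()

# Pretend checkpoint that has weights for `head` only, not for the new stem.
checkpoint = {f"head.{k}": v.clone() for k, v in nn.Linear(192, 10).state_dict().items()}

# strict=False tolerates absent keys and reports them instead of raising.
result = model.load_state_dict(checkpoint, strict=False)
print(result.missing_keys)  # every patch_embed.* key: these stay randomly initialized

with torch.no_grad():
    tokens = model.patch_embed(torch.randn(2, 3, 224, 224))
print(tokens.shape)
```

Note that `strict=False` does not cover shape mismatches, so a pretrained patch projection with a different kernel size or input channel count would have to be filtered out of the checkpoint before loading.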

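You can run the following experiments to verify this:

  1. Use no ImageNet pretrained weights at all: train everything from random initialization, and check whether adding the conv helps.
  2. Following the official ViT training code, add the conv, re-pretrain the model on ImageNet, and check whether adding the conv helps.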

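One more note: our recent work shows that the bn+relu in the patch embed is the key to helping ViT training, while the conv layer itself contributes relatively little. See our recent paper: Scaled ReLU Matters for Training Vision Transformers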

ljwwwiop commented on August 17, 2024

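A bit of personal analysis:
1. First, your convolution kernel is fairly large, so with overlapping patches you may lose some of the more important feature blocks within the receptive field. You could try a smaller convolution that slides over the input block by block, which better matches how patch embed works. Also, the final conv in patch embed mainly serves to change the number of channels.
2. I think it is also worth trying max pooling or avg pooling in place of the convolution here.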
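
To make the kernel and stride trade-off concrete, the following sketch applies the standard convolution output-size formula to count how many patches a given setting produces and how many pixels adjacent patches share. The helper names and the 256x128 input size are assumptions for illustration; they do not come from the thread.

```python
def conv_out_size(size, kernel, stride, padding=0):
    """Standard convolution output length along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def patch_grid(img_hw, kernel, stride, padding=0):
    """Return (rows, cols, num_patches, overlap_px) for a conv-based patch embed."""
    h, w = img_hw
    rows = conv_out_size(h, kernel, stride, padding)
    cols = conv_out_size(w, kernel, stride, padding)
    overlap = max(kernel - stride, 0)  # pixels shared by neighbouring patches
    return rows, cols, rows * cols, overlap


# Assumed input size of 256x128 (height x width).
non_overlapping = patch_grid((256, 128), kernel=16, stride=16)
overlapping = patch_grid((256, 128), kernel=16, stride=12)

print(non_overlapping)  # (16, 8, 128, 0)
print(overlapping)      # (21, 10, 210, 4)
```

A smaller stride increases the token count, and with it the cost of self-attention, so the kernel and stride choice is a trade-off between finer coverage and compute.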

LiYanchao-lab commented on August 17, 2024

If I want to re-pretrain the model on ImageNet, is there an official tutorial you could point me to? Many thanks.

michuanhaohao commented on August 17, 2024

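> If I want to re-pretrain the model on ImageNet, is there an official tutorial you could point me to? Many thanks.

You can use the timm library; a lot of people use it.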
