Comments (5)

Booooooooooo avatar Booooooooooo commented on September 26, 2024

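In this reproduction, was only the student trained? If so, that differs from the goal of our paper. Our goal is to build a dual-branch network: in deployment only one model is loaded, and a branch is chosen at inference time according to the available compute, rather than loading two models of different sizes.
In short, our goal is not to make the small model as strong as possible, but to improve the sub-branch as much as possible while preserving the large model's performance.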
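
A minimal, framework-free sketch of the "one set of weights, two branches" idea described in this comment. The class name `SlimmableLinear`, the `width` argument, and the 0.25 ratio are assumptions made for illustration and are not taken from the CSD code; a real super-resolution network would slice convolution channels rather than rows of a toy dense layer.

```python
class SlimmableLinear:
    """Toy dense layer whose narrow branch reuses the leading rows and
    columns of the full weight matrix (hypothetical, illustration only)."""

    def __init__(self, weight):
        # weight[o][i]: one row per output channel, one column per input channel
        self.weight = weight

    def forward(self, x, width=1.0):
        # Keep only the first `width` fraction of output channels.
        n_out = max(1, int(len(self.weight) * width))
        # The input may already be narrowed by a previous layer,
        # so only the first len(x) input columns are used.
        return [
            sum(w * v for w, v in zip(row[:len(x)], x))
            for row in self.weight[:n_out]
        ]


# One model in memory: a layer with 4 inputs and 4 outputs.
layer = SlimmableLinear([[1, 0, 0, 0],
                         [0, 2, 0, 0],
                         [0, 0, 3, 0],
                         [0, 0, 0, 4]])

full = layer.forward([1, 1, 1, 1], width=1.0)    # large branch, all channels
small = layer.forward([1, 1, 1, 1], width=0.25)  # sub-branch, first 25% of channels
```

Both calls read the same `weight` object, which is the point being made: the sub-branch is a slice of the large model, not a second model loaded alongside it.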

from csd.

billionfish avatar billionfish commented on September 26, 2024

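The reproduction trains only the student.
I understand the purpose of your work: it is a joint optimization problem, so the small model's performance cannot be judged in isolation.

However: (1) the baseline student in the paper is also trained separately; (2) the comparison is unfair in terms of the information the student starts with; and (3) as you said, the goals differ. Given these points, is a model trained separately from scratch an appropriate baseline?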

from csd.

Booooooooooo avatar Booooooooooo commented on September 26, 2024

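From the perspective of the goal, we also compared against jointly training the two branches. The results were essentially the same as the 'baseline' in the paper, so we used the 'baseline' directly in later comparisons.
Also, although we use a pre-trained teacher, neither the model structure nor the data changes. Pre-training is needed only to give contrastive learning a solid positive sample. In practice, the effect is the same as training for a few epochs without the contrastive loss and then introducing it.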
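
The equivalence claimed here (a pre-trained teacher versus a few warm-up epochs without the contrastive term) can be pictured as a loss-weight schedule. This is only a sketch: `warmup_epochs`, `lambda_cl`, and both function names are hypothetical and do not come from the paper or the repository.

```python
def contrastive_weight(epoch, warmup_epochs=5, lambda_cl=1.0):
    """Return the weight on the contrastive term for a given epoch.

    During warm-up only the reconstruction loss is active, so the large
    branch can become a reasonable positive sample first.
    """
    return 0.0 if epoch < warmup_epochs else lambda_cl


def total_loss(recon_loss, contrastive_loss, epoch, **schedule):
    # Reconstruction (e.g. L1) is always on; the contrastive term is gated.
    return recon_loss + contrastive_weight(epoch, **schedule) * contrastive_loss
```

For example, `total_loss(0.5, 2.0, epoch=0)` uses the reconstruction term alone, while the same call at `epoch=10` adds the contrastive term at full weight.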

from csd.

splinter21 avatar splinter21 commented on September 26, 2024

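The baseline student is trained from scratch, whereas the CSD student uses pre-trained weights from a subset of the teacher's channels plus CSD. Personally, I don't think comparing the two is fair.

So we reproduced a baseline in which the student likewise loads part of the teacher's weights and is then trained directly with an L1 loss against HR, without any distillation. Preliminary results: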

0.25 Set5 32.34541767076197 39.64636575838763 0.8969442045190894
0.25 Set14 28.727600156122755 23.11639569902273 0.7848623259477279
0.25 B100 27.661651828426916 19.849210476012832 0.738650274225103
0.25 Urban100 26.30338385572442 12.644811790998192 0.792405167459895

The CSDx4 student metrics reported in the paper are:

Set5 32.34 0.8974 
Set14 28.72 0.7856 
B100 27.68 0.7396 
Urban100 26.34 0.7948

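Judging by these metrics, the results with and without distillation differ very little. Personally, I find that hard to be convinced by.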
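
To put numbers on "differ very little", the two tables can be compared directly. The sketch below assumes that the first and third numeric columns of the reproduced results are PSNR and SSIM, since their magnitudes line up with the paper's figures; the middle column is unlabeled in the thread and is left out. Values are rounded to four decimals.

```python
# (PSNR, SSIM) pairs copied from the two tables above.
reproduced = {
    "Set5": (32.3454, 0.8969),
    "Set14": (28.7276, 0.7849),
    "B100": (27.6617, 0.7387),
    "Urban100": (26.3034, 0.7924),
}
paper_csd = {
    "Set5": (32.34, 0.8974),
    "Set14": (28.72, 0.7856),
    "B100": (27.68, 0.7396),
    "Urban100": (26.34, 0.7948),
}


def gaps(baseline, reference):
    """Per-dataset (PSNR, SSIM) difference: reference minus baseline."""
    return {
        name: (
            round(reference[name][0] - baseline[name][0], 4),
            round(reference[name][1] - baseline[name][1], 4),
        )
        for name in baseline
    }


diff = gaps(reproduced, paper_csd)
largest_psnr_gap = max(abs(d[0]) for d in diff.values())
largest_ssim_gap = max(abs(d[1]) for d in diff.values())
```

Under that column assumption, the largest PSNR gap is below 0.04 dB and the largest SSIM gap is below 0.003, both on Urban100.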

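@billionfish Looks like the OP has discovered a new continent.
So the teacher's weights are that useful. In that case, simply loading part of the pre-trained weights is much more convenient than writing a whole distillation training framework.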
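
A rough sketch of what "loading part of the pre-trained weights" could look like, using plain nested lists instead of a real framework. The function name `slice_teacher_weights`, the layer name `body.0`, and the assumption that the student occupies the leading channels of every teacher layer are all hypothetical; the thread does not show how the reproduction did this.

```python
def slice_teacher_weights(teacher_state, ratio=0.25):
    """Keep the leading `ratio` of output and input channels of each weight.

    teacher_state maps a layer name to a 2-D list weight[out][in].
    Assumption: the student uses the first channels of the teacher.
    """
    student_state = {}
    for name, weight in teacher_state.items():
        n_out = max(1, int(len(weight) * ratio))
        n_in = max(1, int(len(weight[0]) * ratio))
        student_state[name] = [row[:n_in] for row in weight[:n_out]]
    return student_state


# Toy teacher layer with 8 output channels and 4 input channels.
teacher = {"body.0": [[float(o * 4 + i) for i in range(4)] for o in range(8)]}
student = slice_teacher_weights(teacher, ratio=0.25)
```

In a real network the first layer's input channels (the image channels) and the last layer's output channels would presumably be kept whole rather than sliced, which this toy version does not handle.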

from csd.

splinter21 avatar splinter21 commented on September 26, 2024

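Emmm, it may also simply be that the student was not trained well and is too weak.
@billionfish Could you reproduce the metrics of the student that does not load pre-trained weights, i.e. the first two rows of Table 1?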

from csd.
