Comments (5)
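Just to check: in this reproduction experiment, is only the student trained? If so, that differs from the aim of our paper. Our goal is to build a dual-branch network. In practice only one model is loaded, and a branch can be chosen at compute time according to the available computing resources, instead of loading two models of different sizes.
Overall, our goal is not to make the small model perform as well as possible. It is to improve the performance of the sub-branch as much as possible while preserving the performance of the large model.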
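The dual-branch deployment described above (load one model, pick a branch at run time) can be sketched with toy layers. This is a minimal illustration, not the CSD implementation. It assumes, as in channel-splitting or width-slimmable designs, that the narrow branch uses the first fraction of each layer's hidden channels. `forward`, `select_width`, the two width options and the width² cost model are all hypothetical.

```python
def forward(weights, x, width):
    """Run a stack of toy 1x1 layers (plain matrices) at a given width ratio.

    weights[l][o][i] holds the full-width weights of layer l (hypothetical toy
    layout). Hidden channels are truncated to the first `width` fraction, so the
    narrow branch is a sub-network that shares its parameters with the wide one.
    The first layer reads every input channel and the last layer writes every
    output channel, so both branches produce the same output shape.
    """
    h = list(x)
    for l, w in enumerate(weights):
        is_last = l == len(weights) - 1
        out_ch = len(w) if is_last else max(1, int(len(w) * width))
        # Each output channel only reads the channels that are active in h.
        h = [sum(w[o][i] * h[i] for i in range(len(h))) for o in range(out_ch)]
    return h


def select_width(budget, options=(1.0, 0.25)):
    """Pick the widest branch whose approximate relative cost fits `budget`.

    Assumption: cost scales roughly with width**2, because a conv layer's
    multiply-adds grow with in_channels * out_channels.
    """
    fitting = [w for w in options if w * w <= budget]
    return max(fitting) if fitting else min(options)


# One set of weights, two branches chosen at run time.
W = [
    [[1, 0], [0, 1], [1, 1], [2, 2]],  # layer 0: 2 -> 4 channels
    [[1, 1, 1, 1]],                    # layer 1: 4 -> 1 channel
]
for budget in (1.0, 0.1):
    width = select_width(budget)
    print(budget, width, forward(W, [1, 1], width))
```

Only `W` is ever stored. The budget decides how much of it is used, which is the "one model, choose a branch" setting rather than two separately loaded models.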
from csd.
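In the reproduction experiment, only the student is trained.
I understand the aim of your work: it is a joint optimization problem, so one cannot look only at the small model's performance.
However, (1) the baseline student in the paper is also trained separately; (2) the comparison is unfair in terms of the information the student starts from; and (3) as you said yourself, the goals are different. From these angles, is a model trained separately from scratch an appropriate baseline?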
from csd.
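Regarding the goal: we also compared jointly training the two branches, and the results were essentially the same as the paper's 'baseline', so from then on we compared directly against that 'baseline'.
Also, although we use a pre-trained teacher, neither the model architecture nor the data changes. Pre-training is needed only to give contrastive learning a solid positive sample. In effect, it is equivalent to first training for a few epochs without the contrastive loss and then adding the contrastive loss.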
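The claimed equivalence (a pre-trained teacher vs. a warm-up phase without the contrastive term) can be made concrete with a toy joint loss schedule. This is a minimal sketch under stated assumptions, not the CSD training code. `warmup_epochs` and `lambda_cl` are hypothetical names and values. The L1-ratio form of the contrastive term is an illustrative stand-in, since in practice such terms are usually computed on deep features (e.g. VGG activations) rather than raw vectors. Plain Python lists stand in for images or feature maps, and the source of the negatives is left abstract.

```python
def l1(a, b):
    """Mean absolute difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def contrastive_term(student, positive, negatives, eps=1e-7):
    """Illustrative ratio-style contrastive term: small when the student is
    close to the positive (here the teacher output) and far from the negatives."""
    pos = l1(student, positive)
    neg = sum(l1(student, n) for n in negatives) / len(negatives)
    return pos / (neg + eps)


def total_loss(epoch, sr_student, sr_teacher, hr, negatives,
               warmup_epochs=5, lambda_cl=0.1):
    """Joint loss for both branches (hypothetical schedule). The contrastive
    term is switched on only after `warmup_epochs`, i.e. "train a few epochs
    without the contrastive loss, then add it"."""
    rec = l1(sr_teacher, hr) + l1(sr_student, hr)  # both branches fit the HR target
    if epoch < warmup_epochs:
        return rec
    return rec + lambda_cl * contrastive_term(sr_student, sr_teacher, negatives)


# Toy usage: a teacher that already matches HR is a useful positive.
hr = [1.0, 1.0]
teacher = [1.0, 1.0]
student = [0.0, 0.0]
negatives = [[2.0, 2.0]]
print(total_loss(0, student, teacher, hr, negatives))   # warm-up: reconstruction only
print(total_loss(10, student, teacher, hr, negatives))  # reconstruction + contrastive
```

The teacher output only acts as a good positive once it is reasonably accurate. A pre-trained teacher and a warm-up phase are two ways of reaching that state before the contrastive term starts to matter.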
from csd.
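The baseline student is trained from scratch, whereas the CSD student starts from the pre-trained weights of part of the teacher's channels and is then trained with CSD. Personally, I don't think this is a fair comparison.
So we reproduced a baseline in which the student likewise loads part of the teacher's weights and is then trained directly with an L1 loss against the HR images, without any distillation. Preliminary training results: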
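| Width | Dataset | PSNR (dB) | Second value (unlabeled in the original) | SSIM |
|---|---|---|---|---|
| 0.25 | Set5 | 32.34541767076197 | 39.64636575838763 | 0.8969442045190894 |
| 0.25 | Set14 | 28.727600156122755 | 23.11639569902273 | 0.7848623259477279 |
| 0.25 | B100 | 27.661651828426916 | 19.849210476012832 | 0.738650274225103 |
| 0.25 | Urban100 | 26.30338385572442 | 12.644811790998192 | 0.792405167459895 |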
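The paper reports the following metrics for the CSD ×4 student:

| Dataset | PSNR (dB) | SSIM |
|---|---|---|
| Set5 | 32.34 | 0.8974 |
| Set14 | 28.72 | 0.7856 |
| B100 | 27.68 | 0.7396 |
| Urban100 | 26.34 | 0.7948 |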
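Judging by these metrics, the results with and without distillation differ very little. Personally, I find it hard to consider the comparison convincing.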
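To put numbers on "differ very little", the per-dataset gaps can be computed directly from the two sets of results quoted in this thread. The values below are copied from the comment above. The dictionary and function names are illustrative.

```python
# (PSNR in dB, SSIM) per dataset, copied from the comment above.
reproduced = {  # student loading part of the teacher's weights, plain L1, no distillation
    "Set5": (32.34541767076197, 0.8969442045190894),
    "Set14": (28.727600156122755, 0.7848623259477279),
    "B100": (27.661651828426916, 0.738650274225103),
    "Urban100": (26.30338385572442, 0.792405167459895),
}
paper = {  # CSD x4 student as reported in the paper
    "Set5": (32.34, 0.8974),
    "Set14": (28.72, 0.7856),
    "B100": (27.68, 0.7396),
    "Urban100": (26.34, 0.7948),
}


def deltas(a, b):
    """Per-dataset (PSNR, SSIM) difference a - b."""
    return {k: (a[k][0] - b[k][0], a[k][1] - b[k][1]) for k in a}


d = deltas(reproduced, paper)
for name, (dpsnr, dssim) in d.items():
    print(f"{name:9s} dPSNR={dpsnr:+.3f} dB  dSSIM={dssim:+.4f}")
```

With these inputs the no-distillation baseline is slightly higher in PSNR on Set5 and Set14 and slightly lower on B100 and Urban100. Every PSNR gap is below 0.04 dB and every SSIM gap is below 0.0025. The smallest gaps (about 0.005 dB on Set5) are on the order of the paper's rounding to two decimals.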
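@billionfish Looks like the OP has discovered a new continent.
So the teacher's weights are this useful after all: just loading part of the pre-trained weights is much more convenient than writing a whole distillation training framework.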
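In the baseline described earlier, loading "part of the teacher's weights" amounts to slicing each teacher weight down to the student's channel count before ordinary L1 training. The sketch below shows that slicing on toy nested lists. It assumes the student keeps the teacher's first channels, and the helper name `slice_teacher_weights` is hypothetical. In a PyTorch model the same idea would be applied to each conv weight of shape (out, in, kh, kw) in the teacher's `state_dict`, e.g. `w[:out_s, :in_s]`, with biases sliced to `b[:out_s]`.

```python
def slice_teacher_weights(teacher, width):
    """Initialise a narrow student from the first `width` fraction of a
    teacher's hidden channels (toy 1x1 layers stored as nested lists).

    teacher[l][o][i]: weight of layer l, output channel o, input channel i.
    The first layer keeps all input channels and the last layer keeps all
    output channels; hidden channels in between are truncated.
    """
    student = []
    prev_out = None
    for l, w in enumerate(teacher):
        is_last = l == len(teacher) - 1
        out_ch = len(w) if is_last else max(1, int(len(w) * width))
        in_ch = len(w[0]) if prev_out is None else prev_out
        # Keep the leading out_ch rows and, within each row, the leading in_ch inputs.
        student.append([row[:in_ch] for row in w[:out_ch]])
        prev_out = out_ch
    return student


teacher = [
    [[1, 0], [0, 1], [1, 1], [2, 2]],  # layer 0: 2 -> 4 channels
    [[1, 1, 1, 1]],                    # layer 1: 4 -> 1 channel
]
student = slice_teacher_weights(teacher, 0.25)
print(student)
```

The resulting student is a standalone small model whose starting point already encodes what the teacher learned. This is why the thread argues that a from-scratch student is not a like-for-like baseline.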
from csd.
baseline student是从头开始训练的(scratch),而csd student用了教师部分channel的预训练权重+CSD,个人觉得这两者比较并不公平。
所以我们复现了student同样载入教师部分权重后与hr 直接进行l1作为baseline,没有用任何蒸馏。初步训练结果如下:0.25 Set5 32.34541767076197 39.64636575838763 0.8969442045190894 0.25 Set14 28.727600156122755 23.11639569902273 0.7848623259477279 0.25 B100 27.661651828426916 19.849210476012832 0.738650274225103 0.25 Urban100 26.30338385572442 12.644811790998192 0.792405167459895
论文中CSDx4 student指标为:
Set5 32.34 0.8974 Set14 28.72 0.7856 B100 27.68 0.7396 Urban100 26.34 0.7948
从指标看,用不用蒸馏的结果相差并不大。个人觉得很难有说服力吧。
@billionfish 感觉楼主发现新大陆了啊 原来教师权重这么好用的,那直接加载pretrain的部分权重,比写一套蒸馏训练框架方便多了
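Hmm, maybe it's simply that the student wasn't trained well and is too weak.
@billionfish OP, can you reproduce the metrics of the student without loaded pre-trained weights, i.e., the first two rows of Table 1?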
from csd.
Related Issues (17)
- Can you provide the full text version of the paper?? I cant find it anywhere on the internet
- Can you provide the full text version of the paper?? I cant find it anywhere on the internet HOT 1
- Is it only works on existing models, or eventually it outputs a pertained model based on a reference one? HOT 1
- code HOT 1
- the code is different to the paper HOT 2
- CL negtive sample question HOT 1
- 关于baseline 0.25x HOT 3
- About EDSR+ teacher performance HOT 2
- 你好
- 关于动态分配两种支路的问题 HOT 3
- Some question about speed up HOT 3
- vgg19_ImageNet.ckpt
- 请问论文中的局部子分支的参数量是如何计算得到的?
- About Contrastive Loss
- ImportError: cannot import name 'SIGPIPE' from 'signal'
- 为什么不直接采用hr作为负样本?
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from csd.