Comments (10)
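- Models that are too small (or that contain special operators) may not benefit much from masked modeling: their supervised pretraining may still be underfitting, so the advantage of self-supervised pretraining has little room to show.
- It is also possible that the supervised and self-supervised checkpoints have different optimal fine-tuning hyperparameters on a specific downstream task.
- It may also depend on the type of downstream task.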
from spark.
One more question: when I pretrain a small model with self-supervision, the loss sometimes jumps suddenly. Have you run into anything like this?
"cur_ep": "28/1600", "last_L": 0.5888837552416637
"cur_ep": "29/1600", "last_L": 0.5947143313255203
"cur_ep": "30/1600", "last_L": 0.8912972437866619
"cur_ep": "31/1600", "last_L": 0.6176579332809323
"cur_ep": "32/1600", "last_L": 0.5972666382733802
"cur_ep": "33/1600", "last_L": 0.5940532513269771
"cur_ep": "34/1600", "last_L": 0.5805482207277741
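A quick way to locate spikes like the one at epoch 30 is to scan the log automatically. The sketch below assumes each log line is a fragment of the form shown above, so wrapping it in braces yields valid JSON; the 1.3x threshold is an arbitrary choice for illustration, not a SparK setting.

```python
import json
from typing import Iterable, List, Tuple


def parse_log_line(line: str) -> Tuple[int, float]:
    """Parse a fragment like '"cur_ep": "28/1600", "last_L": 0.5888'.

    Assumption: the fragment becomes valid JSON once wrapped in braces.
    """
    record = json.loads("{" + line.strip().rstrip(",") + "}")
    epoch = int(record["cur_ep"].split("/")[0])  # "28/1600" -> 28
    return epoch, float(record["last_L"])


def find_loss_spikes(lines: Iterable[str], ratio: float = 1.3) -> List[int]:
    """Return epochs whose loss exceeds `ratio` times the previous epoch's loss."""
    spikes: List[int] = []
    prev_loss = None
    for line in lines:
        epoch, loss = parse_log_line(line)
        if prev_loss is not None and loss > ratio * prev_loss:
            spikes.append(epoch)
        prev_loss = loss
    return spikes


# The log excerpt from the question above.
log = [
    '"cur_ep": "28/1600", "last_L": 0.5888837552416637',
    '"cur_ep": "29/1600", "last_L": 0.5947143313255203',
    '"cur_ep": "30/1600", "last_L": 0.8912972437866619',
    '"cur_ep": "31/1600", "last_L": 0.6176579332809323',
    '"cur_ep": "32/1600", "last_L": 0.5972666382733802',
    '"cur_ep": "33/1600", "last_L": 0.5940532513269771',
    '"cur_ep": "34/1600", "last_L": 0.5805482207277741',
]
print(find_loss_spikes(log))  # [30]
```

Comparing only against the previous epoch catches sudden jumps like this one; for noisy per-iteration losses, a rolling median as the baseline would be more robust.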
Not that I recall. Are you using fp16? My guess is that the batch size or learning rate may also be too large.
I'm not using fp16. The batch size is about 1000, smaller than the default 4096, and the learning rate is the one computed by your code.
Could you share the training log of your ResNet-50 run?
If your dataset is significantly smaller than ImageNet, a batch size of 1000 may be too large for it.
Here is the per-iteration loss of our 1600-epoch ResNet-50 pretraining run.
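The concern about batch size relative to dataset size comes down to how many optimizer updates each epoch provides. A small arithmetic sketch, assuming incomplete final batches are dropped by default; the 50,000-image dataset is a hypothetical example, not one mentioned in this thread.

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # ImageNet-1k training set size


def steps_per_epoch(num_images: int, batch_size: int, drop_last: bool = True) -> int:
    """Optimizer updates per epoch for a given dataset size and global batch size."""
    full, rest = divmod(num_images, batch_size)
    # With drop_last, the incomplete final batch is skipped.
    return full if drop_last or rest == 0 else full + 1


for n, bs in [(IMAGENET_TRAIN_IMAGES, 4096), (IMAGENET_TRAIN_IMAGES, 1000), (50_000, 1000)]:
    print(f"{n} images, batch {bs}: {steps_per_epoch(n, bs)} steps/epoch")
```

At a fixed epoch count, a much smaller dataset gives far fewer updates per epoch, which is one reason a batch size tuned for ImageNet can be too large for it.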
Have you compared 400 or 800 epochs against 1600 epochs?
See the ablation section of our paper. Also, you could try --base_lr=1e-4; our default of 2e-4 may be too large for your dataset.
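To see what a --base_lr change means in practice, here is a sketch assuming the linear scaling rule lr = base_lr × batch_size / 256 that is common in MAE-style pretraining code. Whether SparK uses exactly this formula and a reference batch of 256 is an assumption worth checking in its argument parsing.

```python
def scaled_lr(base_lr: float, batch_size: int, reference_batch: int = 256) -> float:
    """Linear scaling rule (assumed): lr grows in proportion to the global batch size."""
    return base_lr * batch_size / reference_batch


# Compare the default and the suggested base_lr at the default and the user's batch size.
for base_lr in (2e-4, 1e-4):
    for bs in (4096, 1000):
        print(f"base_lr={base_lr:g}, batch={bs} -> lr={scaled_lr(base_lr, bs):.3e}")
```

Under this rule, halving base_lr halves the effective lr at every batch size, and a batch of 1000 already gets roughly a quarter of the lr used at 4096.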
The dataset is also ImageNet; only the model is a MobileNet-scale network. Would you still recommend lowering the learning rate?
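I'm not sure; you could try adjusting it in both directions. Also, if the network contains special operators, you may need to define their sparse forms manually, because https://github.com/keyu-tian/SparK/blob/main/pretrain/encoder.py#L39-L110 only defines sparse versions of conv2d, maxpooling, avgpooling, bn2d, syncbn, and layernorm. For example, a linear layer in the middle of the network also needs a sparse definition: the input contains zeros, and after a linear layer 0 + bias becomes nonzero, so the output has to be masked back to zero.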
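The linear-layer case can be sketched in PyTorch. `MaskedLinear` here is hypothetical, not a class from SparK: it assumes channels-last features of shape (B, H, W, C) and receives the visibility mask as an explicit `active` argument, which may differ from how encoder.py wires masks into its sparse layers. It applies the ordinary linear map and then multiplies by the mask, so masked positions, which would otherwise equal the bias, are reset to zero.

```python
import torch
import torch.nn as nn


class MaskedLinear(nn.Linear):
    """Hypothetical sparse-aware Linear layer (illustration only, not SparK's API).

    x:      (B, H, W, C_in) channels-last features, zero at masked positions
    active: (B, H, W) bool tensor, True where the position is visible
    """

    def forward(self, x: torch.Tensor, active: torch.Tensor) -> torch.Tensor:
        out = super().forward(x)  # masked positions: 0 @ W^T + b = b, i.e. nonzero
        return out * active.unsqueeze(-1).to(out.dtype)  # reset them to zero


torch.manual_seed(0)
layer = MaskedLinear(4, 8)
active = torch.zeros(2, 3, 3, dtype=torch.bool)
active[:, 0, 0] = True  # keep only one visible position per sample
x = torch.randn(2, 3, 3, 4) * active.unsqueeze(-1).float()  # zero the masked positions

plain = nn.Linear.forward(layer, x)  # ordinary Linear, no re-masking
masked = layer(x, active)
print(plain[~active].abs().max().item())   # bias leaks into masked positions
print(masked[~active].abs().max().item())  # masked positions are zero again
```

The same pattern (compute densely, then multiply by the mask) applies to other operators that would otherwise turn zeros into nonzeros.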
Related Issues (20)
- Comparison with ConvNeXt V2
- reducing pre-training to 200 epochs
- Tutorial for finetune on my own dataset
- Are there any plans to make a port to tensorflow and Keras?
- ImageNet finetuning exploding
- there is no requirements.txt file.
- SparK for semantic segmentation
- Resuming ImageNet fine-tuning
- About sparse convolution
- How to transfer this method to 3D situation.
- ConvNext B for reconstruct images
- recommend a great library designed for sparse tensors
- Can SparK be used for few-shot learning?
- SparseBatchNorm2d can not mask correctly ?
- A Code Issue About “pretrain/main.py”
- SparK ResNet and global feature interaction
- ConvNext implementation performance
- Increasing batch size
- Necessity of Mask Tokens
- The versions of mmdet and mmcv?