
megvii-research / revcol

Official Code of Paper "Reversible Column Networks" "RevColv2"

License: Apache License 2.0

Python 99.70% Shell 0.30%
cnn computer-vision pytorch transformer iclr2023 mae vit

revcol's Introduction

Reversible Column Networks

This repo is the official implementation of:

Yuxuan Cai, Yizhuang Zhou, Qi Han, Jianjian Sun, Xiangwen Kong, Jun Li, Xiangyu Zhang
MEGVII Technology
International Conference on Learning Representations (ICLR) 2023
[arxiv]

Qi Han, Yuxuan Cai, Xiangyu Zhang
MEGVII Technology
[arxiv]

Updates

9/06/2023
RevColv2 will be released soon!

3/15/2023
RevCol Huge checkpoint for segmentation released! Add visualization tools.

3/9/2023
Detection, Segmentation Code and Model Weights Released.

2/10/2023
RevCol model weights released.

1/21/2023
RevCol was accepted by ICLR 2023!

12/23/2022
Initial commits: code for ImageNet-1k and ImageNet-22k classification is released.

To Do List

  • ImageNet-1K and 22k Training Code
  • ImageNet-1K and 22k Model Weights
  • Cascade Mask R-CNN COCO Object Detection Code & Model Weights
  • ADE20k Semantic Segmentation Code & Model Weights

Introduction

RevCol is composed of multiple copies of a subnetwork, called columns, which are linked by multi-level reversible connections. RevCol can serve as a foundation model backbone for various computer vision tasks, including classification, detection and segmentation.
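To illustrate what "reversible" means here, the following is a minimal sketch of a generic reversible connection on plain numbers. It is not the repository's implementation, and it is not claimed to be the exact formulation used in the paper: the function names `forward` and `inverse`, the scale `gamma`, and the toy transform `f` are all assumptions made for illustration. The point is only that an input can be recovered from the output, so it does not have to be kept in memory.

```python
def forward(x_old, x_recent, f, gamma):
    """Hypothetical reversible step: combine a transform of the recent
    input with a scaled copy of the older input."""
    return f(x_recent) + gamma * x_old


def inverse(y, x_recent, f, gamma):
    """Recover the older input from the output; requires gamma != 0."""
    return (y - f(x_recent)) / gamma


# Toy transform standing in for a learned block (assumption).
def f(v):
    return 3.0 * v + 1.0


y = forward(2.0, 4.0, f, gamma=0.5)
x_old_recovered = inverse(y, 4.0, f, gamma=0.5)
print(y, x_old_recovered)
```

Because `x_old` can be recomputed from `y` and `x_recent`, a training loop built on this idea could discard it after the forward pass and rebuild it during the backward pass.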

Main Results on ImageNet with Pre-trained Models

name | pretrain | resolution | #params | FLOPs | acc@1 | pretrained model | finetuned model
RevCol-T | ImageNet-1K | 224x224 | 30M | 4.5G | 82.2 | baidu/github | -
RevCol-S | ImageNet-1K | 224x224 | 60M | 9.0G | 83.5 | baidu/github | -
RevCol-B | ImageNet-1K | 224x224 | 138M | 16.6G | 84.1 | baidu/github | -
RevCol-B* | ImageNet-22K | 224x224 | 138M | 16.6G | 85.6 | baidu/github | baidu/github
RevCol-B* | ImageNet-22K | 384x384 | 138M | 48.9G | 86.7 | baidu/github | baidu/github
RevCol-L* | ImageNet-22K | 224x224 | 273M | 39G | 86.6 | baidu/github | baidu/github
RevCol-L* | ImageNet-22K | 384x384 | 273M | 116G | 87.6 | baidu/github | baidu/github
RevCol-H*+ | Megdata-168M | pretrain 224 / finetune 640 | 2.1B | 2537 | 90.0 | huggingface | huggingface

[+]: RevCol-H uses a slightly different model, with one more branch from the bottom level to the top one. Later experiments showed that this connection is unnecessary; however, given RevCol-H's training cost, we did not retrain it.

Getting Started

Please refer to INSTRUCTIONS.md for setting up, training and evaluation details.

Acknowledgement

This repo was inspired by several open source projects. We are grateful for these excellent projects.

License

RevCol is released under the Apache 2.0 license.

Contact Us

If you have any questions about this repo or the original paper, please contact Yuxuan at [email protected].

Citation

@inproceedings{cai2022reversible,
  title={Reversible Column Networks},
  author={Cai, Yuxuan and Zhou, Yizhuang and Han, Qi and Sun, Jianjian and Kong, Xiangwen and Li, Jun and Zhang, Xiangyu},
  booktitle={International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=Oc2vlWU0jFY}
}

revcol's People

Contributors

jacktesla, nightsnack

revcol's Issues

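Question about the semantic segmentation part

Hello, I have read your paper. The semantic segmentation part includes multi-scale experiments, but if the input image size is certain odd numbers, such as 229 or 223, the upsampled and downsampled branches no longer match in x = self.up(c_up) + self.down(c_down). I went through the data processing part (the ADE config file in mmsegmentation) several times and could not find any handling of this error, so I would like to ask how the authors deal with it.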

TypeError: main() takes 1 positional argument but 3 were given

When I run the following command :

python main.py  --cfg configs/revcol_base_1k_384_finetune.yaml --batch-size 4 \
    --data-path ../data/classification --finetune revcol_base_22k.pth

It outputs this:

=> merge config from configs/revcol_base_1k_384_finetune.yaml
Traceback (most recent call last):
  File "main.py", line 422, in <module>
    main(None, config, ngpus_per_node)
TypeError: main() takes 1 positional argument but 3 were given
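For readers unfamiliar with this message, the sketch below reproduces it in isolation. The one-parameter `main` here is a hypothetical stand-in: the actual signature of `main` in the repository's main.py is not shown in this report, so this only demonstrates how Python words the error when a function declared with one positional parameter is called with three.

```python
def main(config):
    """Hypothetical stand-in for a main() that accepts a single argument."""
    return config


try:
    # Same call shape as in the traceback: three positional arguments.
    main(None, "config", 1)
except TypeError as exc:
    message = str(exc)

print(message)
```

If this is what happens in main.py, the call site and the definition of `main` disagree on the number of parameters, and one of them has to be changed to match the other.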

How to export onnx model in save_memory=True?

We are trying to convert RevCol to TensorRT format, but the intermediate conversion to ONNX does not work properly when save_memory=True.
Here is our conversion test code:

import torch
from models.revcol import *
model = revcol_tiny(save_memory=True, inter_supv=False, drop_path = 0.1, num_classes=10, kernel_size = 3)

for i in range(model.num_subnet):
    getattr(model, f'subnet{str(i)}').save_memory = False

x = torch.zeros(1, 3, 224, 224)
torch.onnx.export(model, x, './weights/revcol_tiny.onnx', verbose=False, opset_version=17,
                        training=torch.onnx.TrainingMode.EVAL,
                        do_constant_folding=True,
                        input_names=['images'],
                        output_names=['output'],
                        dynamic_axes=None) 

When save_memory=True, the following error occurs:

File d:\SoftWare\anaconda3\envs\torch\lib\site-packages\torch\onnx\utils.py:506, in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)
    188 @_beartype.beartype
    189 def export(
    190     model: Union[torch.nn.Module, torch.jit.ScriptModule, torch.jit.ScriptFunction],
   (...)
    206     export_modules_as_functions: Union[bool, Collection[Type[torch.nn.Module]]] = False,
    207 ) -> None:
    208     r"""Exports a model into ONNX format.
    209 
    210     If ``model`` is not a :class:`torch.jit.ScriptModule` nor a
   (...)
    503             All errors are subclasses of :class:`errors.OnnxExporterError`.
...
    511         '(vmap, grad, jvp, jacrev, ...), it must override the setup_context '
    512         'staticmethod. For more details, please see '
    513         'https://pytorch.org/docs/master/notes/extending.func.html')

RuntimeError: invalid unordered_map<K, T> key

If you add the following code, the export works, but you can then no longer take advantage of the low memory footprint of the reversible network.

for i in range(model.num_subnet):
    getattr(model, f'subnet{str(i)}').save_memory = False

Is there any relevant solution?
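One way to keep the workaround contained is to switch the flag off only for the duration of the export and restore it afterwards. The sketch below assumes, as in the snippet above, that the model exposes `num_subnet` and attributes named `subnet0`, `subnet1`, ... each carrying a `save_memory` flag; the helper name `save_memory_disabled` and the stand-in classes are hypothetical and exist only so the example runs without the repository. It is also an assumption, not something confirmed by the authors, that the memory saving matters mainly for the backward pass during training, in which case disabling it for an inference-only export would cost little.

```python
from contextlib import contextmanager


@contextmanager
def save_memory_disabled(model):
    """Temporarily set save_memory=False on every subnet, then restore."""
    subnets = [getattr(model, f"subnet{i}") for i in range(model.num_subnet)]
    previous = [s.save_memory for s in subnets]
    for s in subnets:
        s.save_memory = False
    try:
        yield model
    finally:
        for s, flag in zip(subnets, previous):
            s.save_memory = flag


# Stand-ins for the real model (assumption: same attribute layout).
class _Subnet:
    def __init__(self):
        self.save_memory = True


class _Model:
    num_subnet = 2

    def __init__(self):
        for i in range(self.num_subnet):
            setattr(self, f"subnet{i}", _Subnet())


model = _Model()
with save_memory_disabled(model):
    # The torch.onnx.export(...) call would go here.
    during = [model.subnet0.save_memory, model.subnet1.save_memory]
after = [model.subnet0.save_memory, model.subnet1.save_memory]
print(during, after)
```

With the real model, the same `with` block would wrap the export call, leaving the training configuration untouched.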

Best checkpoint saving problem

At L204 in main.py, the checkpoint is saved, and the best checkpoint is checked, only when training hits a SAVE_FREQ epoch. The problem is that if the model reaches its best accuracy on an epoch outside SAVE_FREQ, the best weights are not saved.

Since most of our team members run the code on relatively small datasets, we tend to set a large SAVE_FREQ to save disk space. As a result, training always misses the chance to save best.pth.

Is this intended behavior? Could you please add an argument that controls whether best.pth is saved every time the model reaches its highest accuracy, even when that does not happen on a SAVE_FREQ epoch?
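The requested behavior can be sketched as two independent decisions per epoch: a periodic save tied to the save frequency, and a best-checkpoint save tied only to accuracy. This is a minimal illustration, not the repository's code: the function name `checkpoint_plan`, the exact periodic condition (every `save_freq` epochs plus the final epoch), and the sample accuracies are assumptions.

```python
def checkpoint_plan(acc_per_epoch, save_freq):
    """Return (periodic_epochs, best_epochs) for a run with the given
    per-epoch accuracies. The best check is decoupled from save_freq."""
    best_acc = float("-inf")
    periodic_epochs, best_epochs = [], []
    last_epoch = len(acc_per_epoch) - 1
    for epoch, acc in enumerate(acc_per_epoch):
        # Periodic checkpoint: assumed condition, not taken from main.py.
        if epoch % save_freq == 0 or epoch == last_epoch:
            periodic_epochs.append(epoch)
        # Best checkpoint: evaluated every epoch, regardless of save_freq.
        if acc > best_acc:
            best_acc = acc
            best_epochs.append(epoch)
    return periodic_epochs, best_epochs


accuracies = [70.0, 75.0, 74.0, 80.0, 79.0, 78.0]
periodic, best = checkpoint_plan(accuracies, save_freq=5)
print(periodic, best)
```

In this sample run the highest accuracy occurs at epoch 3, which is not a periodic epoch, so a best.pth written only on periodic epochs would miss it.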

Huggingface

Thank you so much for the great work.

I would like to use the RevCol segmentation model with Huggingface.
However, I could not figure out how to use it.

I tried to load the pretrained model from the following URL, following the general Huggingface usage, but it did not work.
https://huggingface.co/LarryTsai/RevCol

I would appreciate any help you could give me.
Thank you in advance.

The class “Fusion”

The source code reads:

self.down = nn.Sequential(
    nn.Conv2d(channels[level-1], channels[level], kernel_size=2, stride=2),
    LayerNorm(channels[level], eps=1e-6, data_format="channels_first"),
) if level in [1, 2, 3] else nn.Identity()

So when level=0 and first_col=True, how do we implement downsampling?

Any latency information to be reported?

Thank you for your impressive work, Tsai! Are there any latency comparisons against other convnet/transformer models? Since the network is built from efficient 3x3 convolutions and linear operators, it is expected to have better throughput.

'tools/dist_train.sh'

First of all, I think this is great work and I am very interested in it. I want to debug the segmentation task in the repository, but I could not find 'tools/dist_train.sh' when following README.md. Any help would be appreciated.

How to solve the missing package [teREVCOLolor] message?

Thanks for all your great work.

We get the error message below during installation. Please advise if you have any suggestions. Thanks.
'''
ERROR: Could not find a version that satisfies the requirement teREVCOLolor==1.1.0 (from versions: none)
ERROR: No matching distribution found for teREVCOLolor==1.1.0
'''

size mismatch testing revcol_tiny

Hi, congrats on the paper!

I tried testing the code, but running the test code below gives the following error:

RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 3

from models.revcol import revcol_tiny
import torch
nt = revcol_tiny(True, num_classes=10)
a = torch.ones([1,3,40,40])
out = nt.forward(a)
print(out.shape)
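A possible explanation, sketched under stated assumptions: if the stem reduces the input by a factor of 4 (an assumption, since the stem is not shown here), each later level halves the resolution with a kernel_size=2, stride=2 convolution as in the Fusion code quoted above, and the upsampling path doubles it, then any level with an odd size cannot be matched by upsampling the level below it. The helper names `level_sizes`, `first_mismatch` and `round_up` are hypothetical and use plain integer arithmetic only.

```python
def level_sizes(size, stem_stride=4, num_levels=4):
    """Spatial size at each level, assuming a stride-4 stem followed by
    stride-2 downsampling between levels (floor division, no padding)."""
    sizes = [size // stem_stride]
    for _ in range(num_levels - 1):
        sizes.append(sizes[-1] // 2)
    return sizes


def first_mismatch(size):
    """Return (level, upsampled_size, expected_size) for the first level
    where 2x upsampling of the level below does not match, else None."""
    sizes = level_sizes(size)
    for level in range(len(sizes) - 1):
        upsampled = sizes[level + 1] * 2
        if upsampled != sizes[level]:
            return level, upsampled, sizes[level]
    return None


def round_up(size, multiple=32):
    """Smallest multiple of `multiple` that is >= size."""
    return -(-size // multiple) * multiple


print(level_sizes(40), first_mismatch(40))
print(first_mismatch(224), round_up(40), round_up(229))
```

Under these assumptions a 40x40 input yields sizes 10, 5, 2, 1, and upsampling the size-2 map gives 4 where 5 is expected, which is consistent with the sizes in the reported error. Inputs whose height and width are multiples of 32, reached by padding or resizing, would avoid the mismatch in this model of the network.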
