
ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer

Chinese version

ParC-Net ECCV 2022

This repository was originally named EdgeFormer. It has been renamed to ParC-Net, as "Former" suggests that the model is some variant of transformer.

Official PyTorch implementation of ParC-Net


ParC-ConvNext, ParC-MobileNetV2 and ParC-ResNet50 have been uploaded; please find them in ParC-ConvNets.

Introduction

Recently, vision transformers have started to show impressive results, significantly outperforming large convolution-based models. However, in the area of small models for mobile or resource-constrained devices, ConvNets still have advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position-aware circular convolution (ParC), a lightweight convolution op which boasts a global receptive field while producing location-sensitive features as in local convolutions. We combine ParC and squeeze-excitation ops to form a MetaFormer-like model block, which also has a transformer-like attention mechanism. This block can be used in a plug-and-play manner to replace relevant blocks in ConvNets or transformers. Experiment results show that the proposed ParC-Net achieves better performance than popular lightweight ConvNets and vision-transformer-based models on common vision tasks and datasets, while having fewer parameters and faster inference speed. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5.0 million parameters, saving 11% of the parameters and 13% of the computational cost while gaining 0.2% higher accuracy and 23% faster inference speed (on the ARM-based Rockchip RK3288) compared with MobileViT, and it uses only 0.5× the parameters of DeiT while gaining 2.7% accuracy. On MS-COCO object detection and PASCAL VOC segmentation tasks, ParC-Net also shows better performance.

ParC block

Position aware circular convolution
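
The core op can be sketched in a few lines of PyTorch. The snippet below is an illustrative re-implementation of a vertical ParC (ParC-V), not the official code: the class and parameter names are ours, and the fixed-size meta kernel and base position embedding that are bilinearly interpolated to the input height at runtime are assumptions based on the paper's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParCV(nn.Module):
    """Illustrative vertical position-aware circular convolution (ParC-V)."""
    def __init__(self, channels, meta_kernel_size=16):
        super().__init__()
        # One global depthwise kernel per channel, spanning the full input height.
        self.weight = nn.Parameter(torch.randn(channels, 1, meta_kernel_size, 1))
        # Base position embedding; resized to the actual input height at runtime.
        self.base_pe = nn.Parameter(torch.randn(1, channels, meta_kernel_size, 1))

    def forward(self, x):
        B, C, H, W = x.shape
        # Instance PE: interpolate the base embedding to height H and add it,
        # which keeps the output location sensitive despite the circular conv.
        pe = F.interpolate(self.base_pe, size=(H, 1), mode="bilinear",
                           align_corners=True)
        x = x + pe.expand(B, C, H, W)
        # Circular "padding": appending the first H-1 rows makes the kernel
        # wrap around, so every output row sees the whole column (global RF).
        x = torch.cat([x, x[:, :, :-1, :]], dim=2)
        # Instance kernel: interpolate the meta kernel to height H.
        w = F.interpolate(self.weight, size=(H, 1), mode="bilinear",
                          align_corners=True)
        return F.conv2d(x, w, groups=C)

# Usage: output spatial size equals input spatial size.
y = ParCV(32)(torch.randn(2, 32, 24, 24))   # -> torch.Size([2, 32, 24, 24])
```

A horizontal ParC-H is the same construction transposed to the width axis; the ParC block pairs the two and adds the squeeze-excitation op mentioned above.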

Experimental results

EdgeFormer-S

| Task | Performance | #params (M) | Pretrained model |
|---|---|---|---|
| Classification | 78.6 (top-1 acc) | 5.0 | model |
| Detection | 28.8 (mAP) | 5.2 | model |
| Segmentation | 79.7 (mIoU) | 5.8 | model |

Inference speed

We deploy the proposed EdgeFormer (ParC-Net) and the baseline on the widely used low-power chip Rockchip RK3288 and on the DP chip for comparison. DP is the code name of an in-house, unpublished low-power neural network processor that heavily optimizes convolutions. We use ONNX [1] and MNN to port the models to the RK3288 and DP chips, and time each model for 100 iterations to measure its average inference speed (a desktop-side sketch of this timing loop follows the table below).

| Models | #params (M) | Madds (M) | RK3288 inference speed (ms) | DP (ms) | Top-1 acc |
|---|---|---|---|---|---|
| MobileViT-S | 5.6 | 2010 | 457 | 368 | 78.4 |
| ParC-Net-S | 5.0 (-11%) | 1740 (-13%) | 353 (23% faster) | 98 (3.77× faster) | 78.6 (+0.2%) |
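
For reference, the timing protocol above can be mimicked on a desktop machine with a loop like the following. This is only a sketch: the stand-in network is hypothetical, and the on-chip numbers in the table were obtained through the ONNX/MNN deployments, not with this snippet.

```python
import time
import torch
import torch.nn as nn

# Stand-in network (hypothetical); in practice load ParC-Net-S or MobileViT-S.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1000),
)
model.eval()
x = torch.randn(1, 3, 256, 256)   # classification input resolution used in the code

with torch.no_grad():
    for _ in range(10):           # warm-up runs, excluded from the measurement
        model(x)
    start = time.perf_counter()
    for _ in range(100):          # average over 100 iterations, as in the paper
        model(x)
    elapsed_ms = (time.perf_counter() - start) / 100 * 1e3

print(f"average inference latency: {elapsed_ms:.1f} ms")
```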

Applying ParC-Net designs to various lightweight backbones

Classification experiments. The CPU used here is a Xeon E5-2680 v4. *The authors of EdgeViT do not specify the type of CPU used in their paper. **We train ResNet50 with the training strategy proposed in ConvNext; it reaches 79.1 top-1 accuracy, much higher than the 76.5 reported in the original paper.

| Models | #params | Madds | Device | Speed (ms) | Top-1 acc | Source |
|---|---|---|---|---|---|---|
| MobileViT-S | 5.6 M | 2.0 G | RK3288 | 457 | 78.4 | ICLR 22 |
| ParC-Net-S | 5.0 M | 1.7 G | RK3288 | 353 | 78.6 | Ours |
| MobileViT-S | 5.6 M | 2.0 G | DP | 368 | 78.4 | ICLR 22 |
| ParC-Net-S | 5.0 M | 1.7 G | DP | 98 | 78.6 | Ours |
| ResNet50 | 26 M | 2.1 G | CPU | 98 | 79.1** | CVPR 22 new training setting |
| ParC-ResNet50 | 24 M | 2.0 G | CPU | 98 | 79.6 | Ours |
| MobileNetV2 | 3.5 M | 0.3 G | CPU | 24 | 70.2 | CVPR 18 |
| ParC-MobileNetV2 | 3.5 M | 0.3 G | CPU | 27 | 71.1 | Ours |
| ConvNext-XT | 7.4 M | 0.6 G | CPU | 47 | 77.5 | CVPR 22 |
| ParC-ConvNext-XT | 7.4 M | 0.6 G | CPU | 48 | 78.3 | Ours |
| EdgeViT-XS | 6.7 M | 1.1 G | CPU* | 54* | 77.5 | arXiv 22/05 |

Detection experiments

| Models | #params | AP (box) | AP50 (box) | AP75 (box) | AP (mask) | AP50 (mask) | AP75 (mask) |
|---|---|---|---|---|---|---|---|
| ConvNext-XT | - | 47.2 | 65.6 | 51.4 | 41.0 | 63.0 | 44.2 |
| ParC-ConvNext-XT | - | 47.7 | 66.2 | 52.0 | 41.5 | 63.6 | 44.6 |
| ResNet-50 | - | 47.5 | 65.6 | 51.6 | 41.1 | 63.1 | 44.6 |
| ParC-ResNet-50 | - | 48.1 | 66.4 | 52.3 | 41.8 | 64.0 | 45.1 |
| MobileNetV2 | - | 43.7 | 61.9 | 47.6 | 37.9 | 59.1 | 40.8 |
| ParC-MobileNetV2 | - | 44.3 | 62.7 | 47.8 | 39.0 | 60.3 | 42.1 |

Segmentation experiments

| Models | #params | mIoU | mAcc | aAcc |
|---|---|---|---|---|
| ConvNext-XT | - | 42.17 | 54.18 | 79.72 |
| ParC-ConvNext-XT | - | 42.32 | 54.48 | 80.30 |
| ResNet-50 | - | 42.27 | 52.91 | 79.88 |
| ParC-ResNet-50 | - | 43.85 | 54.66 | 80.43 |
| MobileNetV2 | - | 32.80 | 48.75 | 74.42 |
| ParC-MobileNetV2 | - | 35.13 | 49.64 | 75.73 |

ConvNext block and ConvNext-GCC block

In terms of designing a pure ConvNet by learning from ViTs, our proposed ParC-Net is most closely related to the parallel work ConvNext. Comparing ParC-Net with ConvNext, we notice that their improvements are different and complementary. To verify this point, we build a combination network in which ParC blocks replace several ConvNext blocks at the end of the last two stages. Experiment results show that this replacement significantly improves classification accuracy while slightly decreasing the number of parameters. Results on ResNet50, MobileNetV2 and ConvNext-T show that models which focus on optimizing the FLOPs-accuracy trade-off can also benefit from our ParC-Net designs. Corresponding code will be released soon.
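
As a rough illustration of this tail-block replacement (the helper, the stand-in block, and its dim attribute below are hypothetical, not the repo's API):

```python
import torch.nn as nn

class DummyBlock(nn.Module):
    """Stand-in for a ConvNext block (7x7 depthwise conv plus residual)."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.dwconv = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)

    def forward(self, x):
        return x + self.dwconv(x)

def replace_tail_blocks(stage, make_parc_block, num_replace):
    """Rebuild the last `num_replace` blocks of a stage with make_parc_block(dim);
    the earlier blocks keep their original (pre)trained weights."""
    blocks = list(stage)
    for i in range(len(blocks) - num_replace, len(blocks)):
        blocks[i] = make_parc_block(blocks[i].dim)
    return nn.Sequential(*blocks)

# Usage: swap the last two of four blocks in a stage for (here, dummy) ParC blocks.
stage = nn.Sequential(*[DummyBlock(64) for _ in range(4)])
stage = replace_tail_blocks(stage, DummyBlock, 2)
```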

Installation

We implement ParC-Net with PyTorch 1.9.0 and CUDA 11.1.

Pip

The environment can be built in a local Python environment using the command below:

pip install -r requirements.txt

Docker

A Docker image containing the environment will be provided soon.

Training

Training settings are listed in YAML files (./config/classification/xxx/xxxx.yaml, ./config/detection/xxx/xxxx.yaml, ./config/segmentation/xxx/xxxx.yaml).

Classification

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main_train.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml

Detection

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_train.py --common.config-file ./config/detection/ssd_edgeformer_s.yaml

Segmentation

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_train.py --common.config-file ./config/segmentation/deeplabv3_edgeformer_s.yaml

Evaluation

Classification

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0 python eval_cls.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml --model.classification.pretrained ./pretrained_models/classification/checkpoint_ema_avg.pt

Detection

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0 python eval_det.py --common.config-file ./config/detection/edgeformer/ssd_edgeformer_s.yaml --model.detection.pretrained ./pretrained_models/detection/checkpoint_ema_avg.pt --evaluation.detection.mode validation_set --evaluation.detection.resize-input-images

Segmentation

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0 python eval_seg.py --common.config-file ./config/segmentation/edgeformer/deeplabv3_edgeformer_s.yaml --model.segmentation.pretrained ./pretrained_models/segmentation/checkpoint_ema_avg.pt --evaluation.segmentation.mode validation_set --evaluation.segmentation.resize-input-images

Acknowledgement

We thank the authors of MobileViT for sharing their code; we implement EdgeFormer on top of their source code. If you find this code helpful in your research, please consider citing our paper and MobileViT:

@inproceedings{zhang2022parcnet,
  title={ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer},
  author={Zhang, Haokui and Hu, Wenze and Wang, Xiaoyu},
  booktitle={European Conference on Computer Vision},
  year={2022}
}

@inproceedings{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  booktitle={International Conference on Learning Representations},
  year={2022}
}


Issues

Runtime error

When running the code, I get "TypeError: unsupported operand type(s) for %: 'Sequential' and 'Sequential'". Could the problem be in self.kernel_generate_conv inside class gcc_dk(nn.Module)?

self.kernel_generate_conv = nn.Sequential(
    nn.Conv2d(channel, channel, kernel_size=(3, 1), padding=(1, 0), bias=False, groups=channel),
    nn.BatchNorm2d(channel),
    nn.Hardswish(),
    nn.Conv2d(channel, channel, kernel_size=(3, 1), padding=(1, 0), bias=False, groups=channel),
)

Sorry to bother you

May I use this network model as the backbone for feature extraction in my keypoint extraction and matching work? Thank you.

Training question

Hello, how can I train with a CPU?

About generating training logs

Hello, this is a very meaningful piece of work. When using your training code, I found that no training log file is generated; output only appears in the terminal. How can I solve this? Looking forward to your reply!

Global Circular Conv difference

In your code there are gcc_ca and gcc_dk. I am interested in the dynamic kernel; how does it perform? Your default config uses gcc_ca only.

The original version uses a constant kernel size and bilinearly interpolates the kernel to the H or W shape.

The dynamic version uses a dynamic kernel size: it averages the input along different axes and convolves the result to produce the kernel, which is an interesting idea. Does it actually perform better?
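
A minimal sketch of the dynamic-kernel generation described above, reusing the kernel_generate_conv stack quoted in the first issue; the class name and wiring are assumptions, not the repo's code:

```python
import torch
import torch.nn as nn

class DynamicKernelGen(nn.Module):
    """Sketch: generate a per-instance vertical kernel from the input itself.
    The input is averaged over the width axis, then passed through a
    depthwise conv stack to produce a (B, C, H, 1) kernel."""
    def __init__(self, channels):
        super().__init__()
        self.kernel_generate_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0),
                      bias=False, groups=channels),
            nn.BatchNorm2d(channels),
            nn.Hardswish(),
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0),
                      bias=False, groups=channels),
        )

    def forward(self, x):
        pooled = x.mean(dim=3, keepdim=True)       # average over W: (B, C, H, 1)
        return self.kernel_generate_conv(pooled)   # per-instance kernel

# Usage: the resulting kernel would then drive the circular convolution.
gen = DynamicKernelGen(32).eval()
kernel = gen(torch.randn(2, 32, 16, 16))           # -> torch.Size([2, 32, 16, 1])
```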

Module question

Can the token mixer in MetaFormer be used as a pooling layer in a CNN architecture?

Main_train.py

Thanks for your code. Could you release main_train.py? Thanks again.

ModuleNotFoundError: No module named 'data.sampler'

Hello, thank you for providing this creative idea. However, when I try to run `main_train.py`, there is no data module under the options path, and this missing module stops the code at the very first step. Please take a look. Thanks.

Traceback (most recent call last):
  File "main_train.py", line 6, in <module>
    from options.opts import get_training_arguments
  File "/content/EdgeFormer/options/opts.py", line 3, in <module>
    from data.sampler import arguments_sampler
ModuleNotFoundError: No module named 'data.sampler'

How to train our own dataset?

How can we train on our own dataset, and what format should it be in? For example, for a classification task.

Questions about improving MobileNet with ParC

Hello, sorry to bother you; I have two questions:
(1) In the paper, is MobileNet improved with ParC by replacing all of the InvertedResidualBlocks in MobileNet with gcc_ca_mf_block from the code, or by some other approach?
(2) Of the three ParC blocks gcc_ca_mf_block, gcc_dk_mf_block and gcc_dk_ca_mf_block, which one was used in the paper's results to improve MobileNet and the other networks?
Many thanks!

Combining the model with ConvNext

I want to train my own model that uses part of ConvNext's convolutional network. I saw in your results that combining ParC-Net with ConvNext works even better. If I want to combine them into a new model block, how should I do it?

No module named 'engine'

File "main_train.py", line 15, in
from engine import Trainer
ModuleNotFoundError: No module named 'engine'

Is the 'engine' file missing?

Dataset question

Hello, what format should the dataset have? I downloaded the dataset following your code, but it is incomplete.

Plug and play

Hello, I was very inspired by your code and would like to cite this project. However, I don't understand which module is the plug-and-play one, or how to insert it into my own model. Looking forward to your reply.

How to use gcc with pretrained model?

In the paper and code, meta_kernel_size is fixed during training. It is calculated from the input size (256 in the code), so the inference size should be no larger than 256. If I want to use the pretrained model in other work with a different input size, meta_kernel_size is incompatible.

Module replacement

Is gcc_dk in edgeformer_block.py the module that replaces the 7×7 dwconv structure of ConvNext, as described in the README?

Image segmentation dataset problem

I see the following examples of data sets given in your code:
Dataset class for the PASCAL VOC 2012 dataset

    The structure of PASCAL VOC dataset should be something like this
    + pascal_voc/VOCdevkit/VOC2012/
    + --- Annotations
    + --- JPEGImages
    + --- SegmentationClass
    + --- SegmentationClassAug_Visualization/
    + --- ImageSets
    + --- list
    + --- SegmentationClassAug
    + --- SegmentationObject

However, the dataset I downloaded contains only a few of these files. How are the following files generated?

  • --- SegmentationClassAug_Visualization/
  • --- list
  • --- SegmentationClassAug

cannot import name 'check_frozen_norm_layer' from 'utils.common_utils'

Hi author,
When I run the command:
CUDA_VISIBLE_DEVICES=0 python eval_det.py --common.config-file ./config/detection/edgeformer/ssd_edgeformer_s.yaml --model.detection.pretrained ./pretrained_models/detection/checkpoint_ema_avg.pt --evaluation.detection.mode validation_set --evaluation.detection.resize-input-images
I get this error. Can you tell me what to do now?

Inserting into existing convolutional models

Hello, you mention that the ParC block is plug-and-play and can easily be inserted into existing convolutional models. ParC-Res50 replaces a subset of the 3×3 convolutions with ParC blocks, so can the ResNet-50 pretrained weights still be used? Do you directly load the matching subset of pretrained weights and then train?

A small question about your paper

Hello, congratulations on having your paper accepted by ECCV. But I saw that your paper's name has changed, and some module names also changed: the EdgeFormer block was renamed to the ParC block, and GCC-V and GCC-H were renamed to ParC-V and ParC-H. First, in your Figure 1, GCC-H and GCC-V were not updated; second, why were they all renamed?

Dataset loading question

I want to train the model on a different classification dataset; how should I modify the code?

Picture shape question

Hello! Very interesting work. May I ask a question: must the width and height of the input images be equal in ParC-Net?

3D human pose estimation

Can this model be used directly in transformer-based 3D human pose estimation tasks?

Convert to ONNX problem

Hi!

I'm trying to convert the EdgeFormer model from .pt to .onnx via your example model_convert2onnx.py, and I get the following error:

RuntimeError: Unsupported: ONNX export of convolution for kernel of unknown shape.

GCC module

In which file is the GCC module implemented? I can't seem to find it.

License?

Hi,
thanks for the awesome work.
What's the license of this repo? Thanks.

RuntimeError: stack expects each tensor to be equal size, but got [3, 366, 500] at entry 0 and [3, 335, 500] at entry 1

Hi author,
I am trying the command of "CUDA_VISIBLE_DEVICES=0 python eval_seg.py --common.config-file ./config/detection/edgeformer/deeplabv3_edgeformer_s.yaml --model.segmentation.pretrained ./pretrained_models/segmentation/checkpoint_ema_avg.pt --evaluation.segmentation.mode validation_set --evaluation.segmentation.resize-input-images"
and run into this error. Can you please tell me how to fix it?

About position embedding

Hello, how is the position embedding in ParC-Net's position-aware circular convolution obtained? The paper says "peV is instance position embedding (PE) and it is generated from a base embedding epeV via bilinear interpolation function F()". Why does the base embedding have shape C×B×1?

A question about converting the model to ONNX

Many thanks for sharing the ParC-Net source code. When I call model_convert2onnx.py, I hit "RuntimeError: Unsupported: ONNX export of convolution for kernel of unknown shape." The command used is: CUDA_VISIBLE_DEVICES=0 python model_convert2onnx.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml. What are the correct conversion settings?

How does the PE adapt to changes in input image size?

When I integrate the ParC block into the YOLOv5 backbone, I train with 1280×1280 images, but the images are not resized at validation time, so adding the PE fails with a dimension mismatch. The paper says the PE changes with the input feature map size; how is this implemented, and in which lines of code?
