
ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer

Chinese version

ParC-Net ECCV 2022

This repository was originally named EdgeFormer. It has been renamed to ParC-Net, as "Former" suggests that the model is some variant of transformer.

Official PyTorch implementation of ParC-Net


ParC-ConvNext, ParC-MobileNetV2 and ParC-ResNet50 have been uploaded; please find them in ParC-ConvNets.

Introduction

Recently, vision transformers have started to show impressive results, significantly outperforming large convolution-based models. However, in the area of small models for mobile or resource-constrained devices, ConvNets still have advantages in both performance and model complexity. We propose ParC-Net, a pure ConvNet-based backbone that further strengthens these advantages by fusing the merits of vision transformers into ConvNets. Specifically, we propose position-aware circular convolution (ParC), a lightweight convolution op which boasts a global receptive field while producing location-sensitive features as in local convolutions. We combine ParC and squeeze-excitation ops to form a MetaFormer-like model block, which also has a transformer-like attention mechanism. This block can be used in a plug-and-play manner to replace relevant blocks in ConvNets or transformers. Experiment results show that the proposed ParC-Net achieves better performance than popular lightweight ConvNets and vision-transformer-based models on common vision tasks and datasets, while having fewer parameters and faster inference speed. For classification on ImageNet-1k, ParC-Net achieves 78.6% top-1 accuracy with about 5.0 million parameters, saving 11% of the parameters and 13% of the computational cost while gaining 0.2% higher accuracy and 23% faster inference speed (on the ARM-based Rockchip RK3288) compared with MobileViT, and it uses only 0.5× the parameters of DeiT while gaining 2.7% accuracy. On MS-COCO object detection and PASCAL VOC segmentation tasks, ParC-Net also shows better performance.

ParC block

Position aware circular convolution
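
The core op can be sketched in a few lines of PyTorch. The snippet below is an illustrative re-implementation of a vertical ParC (ParC-V), not the official code: the class and parameter names are ours, and the fixed-size meta kernel and base position embedding that are bilinearly interpolated to the input height at runtime are assumptions based on the paper's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParCV(nn.Module):
    """Illustrative vertical position-aware circular convolution (ParC-V)."""
    def __init__(self, channels, meta_kernel_size=16):
        super().__init__()
        # One global depthwise kernel per channel, spanning the full input height.
        self.weight = nn.Parameter(torch.randn(channels, 1, meta_kernel_size, 1))
        # Base position embedding; resized to the actual input height at runtime.
        self.base_pe = nn.Parameter(torch.randn(1, channels, meta_kernel_size, 1))

    def forward(self, x):
        B, C, H, W = x.shape
        # Instance PE: interpolate the base embedding to height H and add it,
        # which keeps the output location sensitive despite the circular conv.
        pe = F.interpolate(self.base_pe, size=(H, 1), mode="bilinear",
                           align_corners=True)
        x = x + pe.expand(B, C, H, W)
        # Circular "padding": appending the first H-1 rows makes the kernel
        # wrap around, so every output row sees the whole column (global RF).
        x = torch.cat([x, x[:, :, :-1, :]], dim=2)
        # Instance kernel: interpolate the meta kernel to height H.
        w = F.interpolate(self.weight, size=(H, 1), mode="bilinear",
                          align_corners=True)
        return F.conv2d(x, w, groups=C)

# Usage: output spatial size equals input spatial size.
y = ParCV(32)(torch.randn(2, 32, 24, 24))   # -> torch.Size([2, 32, 24, 24])
```

A horizontal ParC-H is the same construction transposed to the width axis; the ParC block pairs the two and adds the squeeze-excitation op mentioned above.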

Experimental results

EdgeFormer-S

| Task | Performance | #params (M) | Pretrained model |
|---|---|---|---|
| Classification | 78.6 (top-1 acc) | 5.0 | model |
| Detection | 28.8 (mAP) | 5.2 | model |
| Segmentation | 79.7 (mIoU) | 5.8 | model |

Inference speed

We deploy the proposed EdgeFormer (ParC-Net) and the baseline on the widely used low-power chip Rockchip RK3288 and on the DP chip for comparison. DP is the code name of an in-house, unpublished low-power neural network processor that heavily optimizes convolutions. We use ONNX [1] and MNN to port the models to the RK3288 and DP chips, and time each model for 100 iterations to measure its average inference speed (a desktop-side sketch of this timing loop follows the table below).

| Models | #params (M) | Madds (M) | RK3288 inference speed (ms) | DP (ms) | Top-1 acc |
|---|---|---|---|---|---|
| MobileViT-S | 5.6 | 2010 | 457 | 368 | 78.4 |
| ParC-Net-S | 5.0 (-11%) | 1740 (-13%) | 353 (23% faster) | 98 (3.77× faster) | 78.6 (+0.2%) |
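
For reference, the timing protocol above can be mimicked on a desktop machine with a loop like the following. This is only a sketch: the stand-in network is hypothetical, and the on-chip numbers in the table were obtained through the ONNX/MNN deployments, not with this snippet.

```python
import time
import torch
import torch.nn as nn

# Stand-in network (hypothetical); in practice load ParC-Net-S or MobileViT-S.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1000),
)
model.eval()
x = torch.randn(1, 3, 256, 256)   # classification input resolution used in the code

with torch.no_grad():
    for _ in range(10):           # warm-up runs, excluded from the measurement
        model(x)
    start = time.perf_counter()
    for _ in range(100):          # average over 100 iterations, as in the paper
        model(x)
    elapsed_ms = (time.perf_counter() - start) / 100 * 1e3

print(f"average inference latency: {elapsed_ms:.1f} ms")
```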

Applying ParC-Net designs to various lightweight backbones

Classification experiments. The CPU used here is a Xeon E5-2680 v4. *The authors of EdgeViT do not specify the type of CPU used in their paper. **We train ResNet50 with the training strategy proposed in ConvNext; it reaches 79.1 top-1 accuracy, much higher than the 76.5 reported in the original paper.

| Models | #params | Madds | Device | Speed (ms) | Top-1 acc | Source |
|---|---|---|---|---|---|---|
| MobileViT-S | 5.6 M | 2.0 G | RK3288 | 457 | 78.4 | ICLR 22 |
| ParC-Net-S | 5.0 M | 1.7 G | RK3288 | 353 | 78.6 | Ours |
| MobileViT-S | 5.6 M | 2.0 G | DP | 368 | 78.4 | ICLR 22 |
| ParC-Net-S | 5.0 M | 1.7 G | DP | 98 | 78.6 | Ours |
| ResNet50 | 26 M | 2.1 G | CPU | 98 | 79.1** | CVPR 22 new training setting |
| ParC-ResNet50 | 24 M | 2.0 G | CPU | 98 | 79.6 | Ours |
| MobileNetV2 | 3.5 M | 0.3 G | CPU | 24 | 70.2 | CVPR 18 |
| ParC-MobileNetV2 | 3.5 M | 0.3 G | CPU | 27 | 71.1 | Ours |
| ConvNext-XT | 7.4 M | 0.6 G | CPU | 47 | 77.5 | CVPR 22 |
| ParC-ConvNext-XT | 7.4 M | 0.6 G | CPU | 48 | 78.3 | Ours |
| EdgeViT-XS | 6.7 M | 1.1 G | CPU* | 54* | 77.5 | arXiv 22/05 |

Detection experiments

| Models | #params | AP (box) | AP50 (box) | AP75 (box) | AP (mask) | AP50 (mask) | AP75 (mask) |
|---|---|---|---|---|---|---|---|
| ConvNext-XT | - | 47.2 | 65.6 | 51.4 | 41.0 | 63.0 | 44.2 |
| ParC-ConvNext-XT | - | 47.7 | 66.2 | 52.0 | 41.5 | 63.6 | 44.6 |
| ResNet-50 | - | 47.5 | 65.6 | 51.6 | 41.1 | 63.1 | 44.6 |
| ParC-ResNet-50 | - | 48.1 | 66.4 | 52.3 | 41.8 | 64.0 | 45.1 |
| MobileNetV2 | - | 43.7 | 61.9 | 47.6 | 37.9 | 59.1 | 40.8 |
| ParC-MobileNetV2 | - | 44.3 | 62.7 | 47.8 | 39.0 | 60.3 | 42.1 |

Segmentation experiments

| Models | #params | mIoU | mAcc | aAcc |
|---|---|---|---|---|
| ConvNext-XT | - | 42.17 | 54.18 | 79.72 |
| ParC-ConvNext-XT | - | 42.32 | 54.48 | 80.30 |
| ResNet-50 | - | 42.27 | 52.91 | 79.88 |
| ParC-ResNet-50 | - | 43.85 | 54.66 | 80.43 |
| MobileNetV2 | - | 32.80 | 48.75 | 74.42 |
| ParC-MobileNetV2 | - | 35.13 | 49.64 | 75.73 |

ConvNext block and ConvNext-GCC block

In terms of designing a pure ConvNet by learning from ViTs, our proposed ParC-Net is most closely related to the parallel work ConvNext. Comparing ParC-Net with ConvNext, we notice that their improvements are different and complementary. To verify this point, we build a combination network in which ParC blocks replace several ConvNext blocks at the end of the last two stages. Experiment results show that this replacement significantly improves classification accuracy while slightly decreasing the number of parameters. Results on ResNet50, MobileNetV2 and ConvNext-T show that models which focus on optimizing the FLOPs-accuracy trade-off can also benefit from our ParC-Net designs. Corresponding code will be released soon.
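
As a rough illustration of this tail-block replacement (the helper, the stand-in block, and its dim attribute below are hypothetical, not the repo's API):

```python
import torch.nn as nn

class DummyBlock(nn.Module):
    """Stand-in for a ConvNext block (7x7 depthwise conv plus residual)."""
    def __init__(self, dim):
        super().__init__()
        self.dim = dim
        self.dwconv = nn.Conv2d(dim, dim, 7, padding=3, groups=dim)

    def forward(self, x):
        return x + self.dwconv(x)

def replace_tail_blocks(stage, make_parc_block, num_replace):
    """Rebuild the last `num_replace` blocks of a stage with make_parc_block(dim);
    the earlier blocks keep their original (pre)trained weights."""
    blocks = list(stage)
    for i in range(len(blocks) - num_replace, len(blocks)):
        blocks[i] = make_parc_block(blocks[i].dim)
    return nn.Sequential(*blocks)

# Usage: swap the last two of four blocks in a stage for (here, dummy) ParC blocks.
stage = nn.Sequential(*[DummyBlock(64) for _ in range(4)])
stage = replace_tail_blocks(stage, DummyBlock, 2)
```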

Installation

We implement ParC-Net with PyTorch 1.9.0 and CUDA 11.1.

Pip

The environment can be built in a local Python environment using the command below:

pip install -r requirements.txt

Docker

A Docker image containing the environment will be provided soon.

Training

Training settings are listed in YAML files (./config/classification/xxx/xxxx.yaml, ./config/detection/xxx/xxxx.yaml, ./config/segmentation/xxx/xxxx.yaml).

Classification

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main_train.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml

Detection

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_train.py --common.config-file ./config/detection/ssd_edgeformer_s.yaml

Segmentation

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0,1,2,3 python main_train.py --common.config-file ./config/segmentation/deeplabv3_edgeformer_s.yaml

Evaluation

Classification

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0 python eval_cls.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml --model.classification.pretrained ./pretrained_models/classification/checkpoint_ema_avg.pt

Detection

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0 python eval_det.py --common.config-file ./config/detection/edgeformer/ssd_edgeformer_s.yaml --model.detection.pretrained ./pretrained_models/detection/checkpoint_ema_avg.pt --evaluation.detection.mode validation_set --evaluation.detection.resize-input-images

Segmentation

cd EdgeFormer-main
CUDA_VISIBLE_DEVICES=0 python eval_seg.py --common.config-file ./config/segmentation/edgeformer/deeplabv3_edgeformer_s.yaml --model.segmentation.pretrained ./pretrained_models/segmentation/checkpoint_ema_avg.pt --evaluation.segmentation.mode validation_set --evaluation.segmentation.resize-input-images

Acknowledgement

We thank the authors of MobileViT for sharing their code; we implement EdgeFormer on top of their source code. If you find this code helpful in your research, please consider citing our paper and MobileViT:

@inproceedings{zhang2022parcnet,
  title={ParC-Net: Position Aware Circular Convolution with Merits from ConvNets and Transformer},
  author={Zhang, Haokui and Hu, Wenze and Wang, Xiaoyu},
  booktitle={European Conference on Computer Vision},
  year={2022}
}

@inproceedings{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  booktitle={International Conference on Learning Representations},
  year={2022}
}


Issues

Runtime error

When running the code, I get "TypeError: unsupported operand type(s) for %: 'Sequential' and 'Sequential'". Could the problem be in self.kernel_generate_conv inside class gcc_dk(nn.Module)?

self.kernel_generate_conv = nn.Sequential(
    nn.Conv2d(channel, channel, kernel_size=(3, 1), padding=(1, 0), bias=False, groups=channel),
    nn.BatchNorm2d(channel),
    nn.Hardswish(),
    nn.Conv2d(channel, channel, kernel_size=(3, 1), padding=(1, 0), bias=False, groups=channel),
)

Sorry to bother you

May I use this network model as the backbone for feature extraction in my keypoint extraction and matching work? Thank you.

Training question

Hello, how can I train with a CPU?

About generating training logs

Hello, this is a very meaningful piece of work. When using your training code, I found that no training log file is generated; output only appears in the terminal. How can I solve this? Looking forward to your reply!

Global Circular Conv difference

In your code there are gcc_ca and gcc_dk. I am interested in the dynamic kernel; how does it perform? Your default config uses gcc_ca only.

The original version uses a constant kernel size and bilinearly interpolates the kernel to the H or W shape.

The dynamic version uses a dynamic kernel size: it averages the input along different axes and convolves the result to produce the kernel, which is an interesting idea. Does it actually perform better?
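
A minimal sketch of the dynamic-kernel generation described above, reusing the kernel_generate_conv stack quoted in the first issue; the class name and wiring are assumptions, not the repo's code:

```python
import torch
import torch.nn as nn

class DynamicKernelGen(nn.Module):
    """Sketch: generate a per-instance vertical kernel from the input itself.
    The input is averaged over the width axis, then passed through a
    depthwise conv stack to produce a (B, C, H, 1) kernel."""
    def __init__(self, channels):
        super().__init__()
        self.kernel_generate_conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0),
                      bias=False, groups=channels),
            nn.BatchNorm2d(channels),
            nn.Hardswish(),
            nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0),
                      bias=False, groups=channels),
        )

    def forward(self, x):
        pooled = x.mean(dim=3, keepdim=True)       # average over W: (B, C, H, 1)
        return self.kernel_generate_conv(pooled)   # per-instance kernel

# Usage: the resulting kernel would then drive the circular convolution.
gen = DynamicKernelGen(32).eval()
kernel = gen(torch.randn(2, 32, 16, 16))           # -> torch.Size([2, 32, 16, 1])
```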

Module question

Can the token mixer in MetaFormer be used as a pooling layer in a CNN architecture?

Main_train.py

Thanks for your code. Could you release main_train.py? Thanks again.

ModuleNotFoundError: No module named 'data.sampler'

Hello, thank you for providing this creative idea. However, when I try to run `main_train.py`, there is no data module under the options path, and this missing module stops the code at the very first step. Please take a look. Thanks.

Traceback (most recent call last):
  File "main_train.py", line 6, in <module>
    from options.opts import get_training_arguments
  File "/content/EdgeFormer/options/opts.py", line 3, in <module>
    from data.sampler import arguments_sampler
ModuleNotFoundError: No module named 'data.sampler'

How to train our own dataset?

How can we train on our own dataset, and what format should it be in? For example, for a classification task.

Questions about improving MobileNet with ParC

Hello, sorry to bother you; I have two questions:
(1) In the paper, is MobileNet improved with ParC by replacing all of the InvertedResidualBlocks in MobileNet with gcc_ca_mf_block from the code, or by some other approach?
(2) Of the three ParC blocks gcc_ca_mf_block, gcc_dk_mf_block and gcc_dk_ca_mf_block, which one was used in the paper's results to improve MobileNet and the other networks?
Many thanks!

Combining the model with ConvNext

I want to train my own model that uses part of ConvNext's convolutional network. I saw in your results that combining ParC-Net with ConvNext works even better. If I want to combine them into a new model block, how should I do it?

No module named 'engine'

File "main_train.py", line 15, in
from engine import Trainer
ModuleNotFoundError: No module named 'engine'

Is the 'engine' file missing?

Dataset question

Hello, what format should the dataset have? I downloaded the dataset following your code, but it is incomplete.

Plug and play

Hello, I was very inspired by your code and would like to cite this project. However, I don't understand which module is the plug-and-play one, or how to insert it into my own model. Looking forward to your reply.

How to use gcc with pretrained model?

In the paper and code, meta_kernel_size is fixed during training. It is calculated from the input size (256 in the code), so the inference size should be no larger than 256. If I want to use the pretrained model in other work with a different input size, meta_kernel_size is incompatible.

Module replacement

Is gcc_dk in edgeformer_block.py the module that replaces the 7×7 dwconv structure of ConvNext, as described in the README?

Image segmentation dataset problem

I see the following examples of data sets given in your code:
Dataset class for the PASCAL VOC 2012 dataset

    The structure of PASCAL VOC dataset should be something like this
    + pascal_voc/VOCdevkit/VOC2012/
    + --- Annotations
    + --- JPEGImages
    + --- SegmentationClass
    + --- SegmentationClassAug_Visualization/
    + --- ImageSets
    + --- list
    + --- SegmentationClassAug
    + --- SegmentationObject

However, the dataset I downloaded contains only a few of these files. How are the following files generated?

  • --- SegmentationClassAug_Visualization/
  • --- list
  • --- SegmentationClassAug

cannot import name 'check_frozen_norm_layer' from 'utils.common_utils'

Hi author,
When I run the command:
CUDA_VISIBLE_DEVICES=0 python eval_det.py --common.config-file ./config/detection/edgeformer/ssd_edgeformer_s.yaml --model.detection.pretrained ./pretrained_models/detection/checkpoint_ema_avg.pt --evaluation.detection.mode validation_set --evaluation.detection.resize-input-images
I get this error. Can you tell me what to do now?

Inserting into existing convolutional models

Hello, you mention that the ParC block is plug-and-play and can easily be inserted into existing convolutional models. ParC-Res50 replaces a subset of the 3×3 convolutions with ParC blocks, so can the ResNet-50 pretrained weights still be used? Do you directly load the matching subset of pretrained weights and then train?

A small question about your paper

Hello, congratulations on having your paper accepted by ECCV. But I saw that your paper's name has changed, and some module names also changed: the EdgeFormer block was renamed to the ParC block, and GCC-V and GCC-H were renamed to ParC-V and ParC-H. First, in your Figure 1, GCC-H and GCC-V were not updated; second, why were they all renamed?

Dataset loading question

I want to train the model on a different classification dataset; how should I modify the code?

Picture shape question

Hello! Very interesting work. May I ask a question: must the width and height of the input images be equal in ParC-Net?

3D human pose estimation

Can this model be used directly in transformer-based 3D human pose estimation tasks?

Convert to ONNX problem

Hi!

I'm trying to convert the EdgeFormer model from .pt to .onnx via your example model_convert2onnx.py, and I get the following error:

RuntimeError: Unsupported: ONNX export of convolution for kernel of unknown shape.

GCC module

In which file is the GCC module implemented? I can't seem to find it.

License?

Hi,
thanks for the awesome work.
What's the license of this repo? Thanks.

RuntimeError: stack expects each tensor to be equal size, but got [3, 366, 500] at entry 0 and [3, 335, 500] at entry 1

Hi author,
I am trying the command of "CUDA_VISIBLE_DEVICES=0 python eval_seg.py --common.config-file ./config/detection/edgeformer/deeplabv3_edgeformer_s.yaml --model.segmentation.pretrained ./pretrained_models/segmentation/checkpoint_ema_avg.pt --evaluation.segmentation.mode validation_set --evaluation.segmentation.resize-input-images"
and run into this error. Can you please tell me how to fix it?

About position embedding

Hello, how is the position embedding in ParC-Net's position-aware circular convolution obtained? The paper says "peV is instance position embedding (PE) and it is generated from a base embedding epeV via bilinear interpolation function F()". Why does the base embedding have shape C×B×1?

A question about converting the model to ONNX

Many thanks for sharing the ParC-Net source code. When I call model_convert2onnx.py, I hit "RuntimeError: Unsupported: ONNX export of convolution for kernel of unknown shape." The command used is: CUDA_VISIBLE_DEVICES=0 python model_convert2onnx.py --common.config-file ./config/classification/edgeformer/edgeformer_s.yaml. What are the correct conversion settings?

How does the PE adapt to changes in input image size?

When I integrate the ParC block into the YOLOv5 backbone, I train with 1280×1280 images, but the images are not resized at validation time, so adding the PE fails with a dimension mismatch. The paper says the PE changes with the input feature map size; how is this implemented, and in which lines of code?
