d-li14 / involution

[CVPR 2021] Involution: Inverting the Inherence of Convolution for Visual Recognition, a brand new neural operator

Home Page: https://arxiv.org/abs/2103.06255

License: MIT License

Python 100.00%
involution operator pytorch image-classification object-detection instance-segmentation semantic-segmentation cvpr2021 pre-trained-model


involution's Issues

Two questions

Very interesting paper! I am wondering about the difference between the involution operator and the point processing of PointNet (Qi 2016). Also, how is the accuracy if the average pooling layers are all replaced by max pooling? Thanks!

The code in Algorithm 1

Hello, in your paper, Algorithm 1 reads:
#################### forward pass ####################
x_unfolded = unfold(x) # B,CxKxK,HxW
x_unfolded = x_unfolded.view(B, G, C//G, K * K, H, W)

However, in PyTorch the actual shape of x_unfolded may be B,KxKxC,HxW, so it seems to me the code should be as follows:
x_unfolded = unfold(x) # B,KxKxC,HxW
x_unfolded = x_unfolded.view(B, K * K, G, C//G, H, W).permute(0, 2, 3, 1, 4, 5)
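
For reference, nn.Unfold's layout can be probed empirically; here is a tiny check (my own sketch, not code from this repo):

import torch

# Probe torch.nn.Unfold's layout: is dim 1 ordered channel-major
# (all K*K taps of channel 0 first) or kernel-major?
B, C, H, W, K = 1, 4, 5, 5, 3
x = torch.arange(B * C * H * W, dtype=torch.float32).view(B, C, H, W)
u = torch.nn.Unfold(kernel_size=K, padding=K // 2)(x)  # (B, C*K*K, H*W)

center = K * K // 2                   # index of each patch's center tap
v1 = u.view(B, C, K * K, H, W)        # channel-major interpretation
v2 = u.view(B, K * K, C, H, W)        # kernel-major interpretation

# The center tap of every patch must equal the input pixel itself.
print(torch.equal(v1[:, :, center], x))  # True  -> layout is B,CxKxK,HxW
print(torch.equal(v2[:, center], x))     # False -> not B,KxKxC,HxW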

Reproduction of results?

I used the rednet50.pth from this repo to train Faster R-CNN in mmdetection, but the training loss is NaN.

out_channel

Hello, what should I do if I want to define out_channel myself?

Walkthrough of the involution_cuda code

I want to integrate involution into my own code, but involution_cuda is hard to follow. Could someone walk through the relevant code?

Concerns about the repository/model

I have read your paper and was very excited about the results you report. However, after having used your repo, I am left with some concerns:

  1. The code apparently does not work for mixed precision training. Are there any plans to extend it to work for this? As you can imagine, mixed precision training is quite important when working with limited resources.
  2. It appears memory usage during training is substantially higher for RedNet26 than for ResNet26. Is this expected behavior? I don't believe this was mentioned in the paper, so I would just like to make sure that this is not an issue on my part.

Thanks a lot in advance.

Nan

When I use it, I run into the following:
[screenshot attached]

Which Cupy/CUDA version is required to run the involution_cuda?

Which Cupy/CUDA version is required to run the involution_cuda? I installed the following with conda:

cudatoolkit               11.0.221             h6bb024c_0    anaconda
cudnn                     8.1.0.77             h90431f1_0    conda-forge
cupy                      9.0.0            py37h4fdb0f7_1    conda-forge
cutensor                  1.2.2.5              h96e36e3_3    conda-forge

And although I can import cupy successfully, the following error occurs when I try to use involution_cuda:

cupy.cuda.compiler.CompileException: /tmp/tmpkhpxkpnj/817e920ba5cfc69af1016cdb64b60bad_2.cubin.cu(6): error: identifier "None" is undefined

/tmp/tmpkhpxkpnj/817e920ba5cfc69af1016cdb64b60bad_2.cubin.cu(6): error: identifier "None" is undefined

/tmp/tmpkhpxkpnj/817e920ba5cfc69af1016cdb64b60bad_2.cubin.cu(6): error: identifier "None" is undefined

/tmp/tmpkhpxkpnj/817e920ba5cfc69af1016cdb64b60bad_2.cubin.cu(13): error: identifier "None" is undefined

4 errors detected in the compilation of "/tmp/tmpkhpxkpnj/817e920ba5cfc69af1016cdb64b60bad_2.cubin.cu".

Would appreciate any help, thanks!

What version of mmdet does det support?

I use mmdet2.7, but it doesn't work.


Traceback (most recent call last):
  File "/home/ubuntu/bigdisk/part1/mmdet2/tools/train.py", line 185, in <module>
    main()
  File "/home/ubuntu/bigdisk/part1/mmdet2/tools/train.py", line 159, in main
    cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/mmcv/utils/config.py", line 363, in __getattr__
    return getattr(self._cfg_dict, name)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/mmcv/utils/config.py", line 43, in __getattr__
    raise ex
AttributeError: 'ConfigDict' object has no attribute 'train_cfg'

No module named 'mmdet.models.dense_heads.rpn_test_mixin'

As the title says: I got this when using mmdetection to run python demo/image_demo.py.

I got the error report, as follows:

Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.

TorchVision: 0.6.0+cu101
OpenCV: 4.5.4-dev
MMCV: 1.3.8
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.1
MMDetection: 2.18.0+

How you installed PyTorch [e.g., pip, conda, source]
from source

Error traceback
Traceback (most recent call last):
  File "demo/image_demo.py", line 5, in <module>
    from mmdet.apis import (async_inference_detector, inference_detector,
  File "/home/YJJ/CodeReuse/mmdetection-master/mmdet/apis/__init__.py", line 2, in <module>
    from .inference import (async_inference_detector, inference_detector,
  File "/home/YJJ/CodeReuse/mmdetection-master/mmdet/apis/inference.py", line 12, in <module>
    from mmdet.datasets import replace_ImageToTensor
  File "/home/YJJ/CodeReuse/mmdetection-master/mmdet/datasets/__init__.py", line 12, in <module>
    from .utils import (NumClassCheckHook, get_loading_pipeline,
  File "/home/YJJ/CodeReuse/mmdetection-master/mmdet/datasets/utils.py", line 9, in <module>
    from mmdet.models.dense_heads import GARPNHead, RPNHead, RPNHead_involution
  File "/home/YJJ/CodeReuse/mmdetection-master/mmdet/models/__init__.py", line 7, in <module>
    from .dense_heads import *  # noqa: F401,F403
  File "/home/YJJ/CodeReuse/mmdetection-master/mmdet/models/dense_heads/__init__.py", line 24, in <module>
    from .rpn_head_involution import RPNHead_involution
  File "/home/YJJ/CodeReuse/mmdetection-master/mmdet/models/dense_heads/rpn_head_involution.py", line 13, in <module>
    from .rpn_test_mixin import RPNTestMixin
ModuleNotFoundError: No module named 'mmdet.models.dense_heads.rpn_test_mixin'

Error when using involution_cuda.py

I wanted to use involution in place of Conv in YOLOv5. With involution_naive.py everything runs fine,
but with involution_cuda.py a NotImplementedError is raised inside the _involution_cuda function.
I'd appreciate it if the author could take a look.

Traceback (most recent call last):
  File "train.py", line 532, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 87, in train
    model = Model(opt.cfg, ch=3, nc=nc).to(device)  # create
  File "/home/admin/xiewei/yolov5/models/yolo.py", line 88, in __init__
    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])  # forward
  File "/home/admin/xiewei/yolov5/models/yolo.py", line 118, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/home/admin/xiewei/yolov5/models/yolo.py", line 134, in forward_once
    x = m(x)  # run
  File "/home/admin/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/admin/xiewei/yolov5/models/experimental.py", line 286, in forward
    out = _involution_cuda(x, weight, stride=self.stride, padding=(self.kernel_size-1)//2)
  File "/home/admin/xiewei/yolov5/models/experimental.py", line 247, in _involution_cuda
    raise NotImplementedError
NotImplementedError

[feature request] involution 3D

Hi. The work on involution is awesome, and I'd like to try it on medical imaging, which requires 3D operations.
I was wondering if you could spare some time to implement a 3D version of involution_cuda?
I found that involution_naive cannot be extended to 3D because torch.nn.Fold and torch.nn.Unfold only support 4D tensors.
In previous issues, the authors also mentioned that the naive version is slower and consumes more memory. I guess the problem will be aggravated in a 3D version.
Looking forward to trying this new op. Thank you.
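
In the meantime, one possible workaround is Tensor.unfold, which works on tensors of any rank; below is a rough naive 3D involution sketch (my own, untested at scale, and likely just as memory-hungry as the naive 2D version):

import torch
import torch.nn as nn

def involution3d(x, kernel, k, pad):
    """Naive 3D involution. x: (B, C, D, H, W); kernel: (B, G, k^3, D, H, W)."""
    B, C, D, H, W = x.shape
    G = kernel.shape[1]
    x = nn.functional.pad(x, [pad] * 6)  # pad W, H, D on both sides
    # Extract k*k*k neighborhoods along the three spatial dims.
    patches = x.unfold(2, k, 1).unfold(3, k, 1).unfold(4, k, 1)  # (B,C,D,H,W,k,k,k)
    patches = patches.reshape(B, G, C // G, D, H, W, k ** 3)
    # One kernel per group and spatial site, shared across the C//G channels.
    weight = kernel.permute(0, 1, 3, 4, 5, 2).unsqueeze(2)  # (B,G,1,D,H,W,k^3)
    return (weight * patches).sum(dim=-1).reshape(B, C, D, H, W)

B, C, D, H, W, G, k = 2, 8, 6, 6, 6, 2, 3
x = torch.randn(B, C, D, H, W)
kern = torch.randn(B, G, k ** 3, D, H, W)  # stand-in for a kernel-generating branch
print(involution3d(x, kern, k, pad=(k - 1) // 2).shape)  # torch.Size([2, 8, 6, 6, 6])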

bug

Hello!
I successfully used det/mmdet/models/utils/involution_naive.py in YOLO, but I ran into trouble with involution_cuda.py.
I solved the previous problem (https://github.com/d-li14/involution/issues/15), but hit a new one, as follows:
from n params module arguments
0 -1 1 3520 models.common.Focus [3, 32, 3]
1 -1 1 1154 models.experimental.involution [32, 7, 2]
2 -1 1 16768 models.common.C3 [32, 64, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 156928 models.common.C3 [128, 128, 3]
5 -1 1 295424 models.common.Conv [128, 256, 3, 2]
6 -1 1 625152 models.common.C3 [256, 256, 3]
7 -1 1 1180672 models.common.Conv [256, 512, 3, 2]
8 -1 1 656896 models.common.SPP [512, 512, [5, 9, 13]]
9 -1 1 1182720 models.common.C3 [512, 512, 1, False]
10 -1 1 131584 models.common.Conv [512, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 1 361984 models.common.C3 [512, 256, 1, False]
14 -1 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 1 90880 models.common.C3 [256, 128, 1, False]
18 -1 1 147712 models.common.Conv [128, 128, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 1 296448 models.common.C3 [256, 256, 1, False]
21 -1 1 590336 models.common.Conv [256, 256, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 1 1182720 models.common.C3 [512, 512, 1, False]
24 [17, 20, 23] 1 229245 models.yolo.Detect [80, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model Summary: 288 layers, 7257151 parameters, 7257151 gradients, 16.1 GFLOPS

Scaled weight_decay = 0.0005
Optimizer groups: 63 .bias, 63 conv.weight, 59 other
train: Scanning '../coco128/labels/train2017.cache' for images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|██████| 128/128 [00:00<?, ?it/s]
val: Scanning '../coco128/labels/train2017.cache' for images and labels... 128 found, 0 missing, 2 empty, 0 corrupted: 100%|██████| 128/128 [00:00<?, ?it/s]
Plotting labels...

autoanchor: Analyzing anchors... anchors/target = 4.26, Best Possible Recall (BPR) = 0.9946
Image sizes 640 train, 640 test
Using 2 dataloader workers
Logging results to runs/train/exp22
Starting training for 300 epochs...

 Epoch   gpu_mem       box       obj       cls     total   targets  img_size

0%| | 0/64 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 532, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 297, in train
    pred = model(imgs)  # forward
  File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/home/dhh135246/anaconda3/envs/pytorch1.7.0-gpu/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
TypeError: __init__() missing 3 required positional arguments: 'source', 'name', and 'options'

I would appreciate any suggestions for solving this problem. Thank you!

Why use 7x7 involution but not 3x3 involution?

Congratulations! Nice work on rethinking the conv module!
One question: why do you use a 7x7 involution to replace the Bottleneck module's 3x3 convolution? Why not a 3x3 involution?
In my view, modern CNN architectures rarely use large kernels like 5x5 or 7x7, so I am just wondering about the reason for the large-kernel involution.

cuda

[screenshot attached]
Hello, thank you very much for your work. I would like to ask what the difference between these two versions is. Is it feasible to use the naive version without configuring a CUDA environment?

CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: too many resources requested for launch

After I fixed the last problem, I encountered this problem:

log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
work_dir = '/home/ubuntu/mmdet2.x_model/faster_rcnn_red50_neck_fpn_1x_voc_SIXray'
gpu_ids = range(0, 1)

2021-03-17 16:17:23,116 - mmdet - INFO - Start running, host: ubuntu@mcj, work_dir: /home/ubuntu/mmdet2.x_model/faster_rcnn_red50_neck_fpn_1x_voc_SIXray
2021-03-17 16:17:23,117 - mmdet - INFO - workflow: [('train', 1)], max: 12 epochs
Traceback (most recent call last):
  File "/home/ubuntu/bigdisk/part1/mmdet2/tools/train.py", line 185, in <module>
    main()
  File "/home/ubuntu/bigdisk/part1/mmdet2/tools/train.py", line 181, in main
    meta=meta)
  File "/home/ubuntu/bigdisk/part1/mmdet2/mmdet/apis/train.py", line 150, in train_detector
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 125, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/ubuntu/bigdisk/part1/mmdet2/mmdet/models/detectors/base.py", line 246, in train_step
    losses = self(**data)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/home/ubuntu/bigdisk/part1/mmdet2/mmdet/models/detectors/base.py", line 180, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/ubuntu/bigdisk/part1/mmdet2/mmdet/models/detectors/two_stage.py", line 142, in forward_train
    x = self.extract_feat(img)
  File "/home/ubuntu/bigdisk/part1/mmdet2/mmdet/models/detectors/two_stage.py", line 82, in extract_feat
    x = self.backbone(img)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/bigdisk/part1/mmdet2/mmdet/models/backbones/rednet.py", line 592, in forward
    x = self.stem(x)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/ubuntu/anaconda3/envs/mmdet2.x/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/bigdisk/part1/mmdet2/mmdet/models/utils/involution_cuda.py", line 278, in forward
    out = _involution_cuda(x, weight, stride=self.stride, padding=(self.kernel_size-1)//2)
  File "/home/ubuntu/bigdisk/part1/mmdet2/mmdet/models/utils/involution_cuda.py", line 235, in _involution_cuda
    out = _involution.apply(input, weight, _pair(stride), _pair(padding), _pair(dilation))
  File "/home/ubuntu/bigdisk/part1/mmdet2/mmdet/models/utils/involution_cuda.py", line 171, in forward
    stream=Stream(ptr=torch.cuda.current_stream().cuda_stream))
  File "cupy/cuda/function.pyx", line 182, in cupy.cuda.function.Function.__call__
  File "cupy/cuda/function.pyx", line 164, in cupy.cuda.function._launch
  File "cupy_backends/cuda/api/driver.pyx", line 299, in cupy_backends.cuda.api.driver.launchKernel
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: too many resources requested for launch

CUDA version error: the provided PTX was compiled with an unsupported toolchain.

Thanks for your work, it's very useful. I've used the PyTorch version of involution for a while, but recently I need to try larger feature maps than before and my GPU memory is not enough, so I tried the CUDA version. However, I hit an error I couldn't find a solution for online:

File "cupy/cuda/function.pyx", line 241, in cupy.cuda.function.Module.load
File "cupy/cuda/function.pyx", line 243, in cupy.cuda.function.Module.load
File "cupy_backends/cuda/api/driver.pyx", line 246, in cupy_backends.cuda.api.driver.moduleLoadData
File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_UNSUPPORTED_PTX_VERSION: the provided PTX was compiled with an unsupported toolchain.

It seems to be a CUDA version error, but I really don't know how to solve it. Could you please help me with this?
Here is my CUDA info:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Mon_Oct_12_20:09:46_PDT_2020
Cuda compilation tools, release 11.1, V11.1.105
Build cuda_11.1.TC455_06.29190527_0

NVIDIA-SMI 455.23.04 Driver Version: 455.23.04 CUDA Version: 11.1

Thanks a lot!

Missing mmcv

Hi, in rednet.py:
from mmcv.cnn import (ConvModule, build_conv_layer, build_norm_layer, constant_init, kaiming_init)
But mmcv does not seem to be included with the code.

batch_size error

Hi, when conducting the classification experiment on ImageNet, the default batch_size is 32 x 8 on my machine if I follow the instructions, while in your training log the batch_size is 32 x 64. Why the difference?

Version: latest mmcv, mmcls, and mmdet, with pytorch 1.7.1 and torchvision 0.8.2.

Some thoughts on my understanding of this work

Ha, it's me again. My English is a bit shaky, so I originally wrote this in Chinese.

My understanding is that in standard convolution, the kernel is shared across the spatial domain (e.g., a 3x3 kernel simply slides over the feature map as a sliding window), while being independent across the channel domain.

Your idea is to use two 1x1 standard convolutions in a bottleneck-like structure to generate kernel weights of shape (N, Groups, KernelSize, KernelSize, H, W), then split the input into groups and convolve.

Since the two preceding 1x1 convolutions have already aggregated information across channels, the kernels are not channel-independent here (I guess this also saves computation?); instead, the feature maps within each group share one kernel across channels, while every spatial position along H and W gets its own kernel.

[schematic diagram attached]

If my understanding is wrong, I hope the authors can correct me. Many thanks!
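
To make this concrete, here is a minimal stride-1 naive involution forward pass (my own sketch following the description above and the paper's Algorithm 1; the plain ReLU and the reduction ratio r=4 in the kernel-generating branch are assumptions, since the paper's σ also includes BN):

import torch
import torch.nn as nn

class Involution2d(nn.Module):
    """Minimal stride-1 naive involution (a sketch, not the repo's module)."""
    def __init__(self, channels, k=7, groups=16, r=4):
        super().__init__()
        self.k, self.groups = k, groups
        self.reduce = nn.Conv2d(channels, channels // r, 1)      # bottleneck 1x1
        self.span = nn.Conv2d(channels // r, k * k * groups, 1)  # kernel generator
        self.unfold = nn.Unfold(k, padding=(k - 1) // 2)

    def forward(self, x):
        B, C, H, W = x.shape
        # Kernel of shape (B, G, K*K, H, W): shared across the C//G channels
        # of each group, but distinct at every spatial position.
        kernel = self.span(torch.relu(self.reduce(x)))
        kernel = kernel.view(B, self.groups, self.k * self.k, H, W).unsqueeze(2)
        patches = self.unfold(x).view(B, self.groups, C // self.groups,
                                      self.k * self.k, H, W)
        return (kernel * patches).sum(dim=3).view(B, C, H, W)

x = torch.randn(2, 64, 16, 16)
print(Involution2d(64, k=7, groups=4)(x).shape)  # torch.Size([2, 64, 16, 16])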

CUDA vs Naive Speedup?

@d-li14 hi, thanks for your contributions and for this amazing idea!

I'd like to try your involution() module in a non-mmdetection repo (YOLOv5), and was trying to figure out the best technical way to do this using your existing code here:

The naive implementation seems easier to integrate into new works, so I'd like to use that, and my main question is:
How much of a speed change do you see in training (and inference) when moving from naive to cuda? Thanks!

Why must the feature maps maintain the same size H*W?

If I have not misunderstood involution, it always keeps the same size as the input. That is:
Input shape: (B, C, W, H)
Output shape: (B, C, W, H)
I also confirmed this with Involution2d in your Involution.py.
If I use dilation=k>1 and kernel size (1,1), does that mean I have to use padding=1 to keep the image (or feature map) the same size?
In fact, in your code, that means there are H*W patches (kernels):

batch_size, in_channels, height, width = input.shape
# Unfold and reshape input tensor
input_unfolded = self.unfold(self.initial_mapping(input))
input_unfolded = input_unfolded.view(batch_size, self.groups, self.out_channels // self.groups,
                                     self.kernel_size[0] * self.kernel_size[1], height, width)
kernel = self.span_mapping(self.sigma_mapping(self.reduce_mapping(self.o_mapping(input))))
kernel = kernel.view(batch_size, self.groups, self.kernel_size[0] * self.kernel_size[1],
                     height, width).unsqueeze(dim=2)

However, I think this does not quite make sense; involution should keep convolution's ability to shrink feature maps via its kernels. For example:
[hand-drawn diagram attached]

Sorry for my rough draft; thanks a lot if you can reply!
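
For what it's worth, nn.Unfold itself accepts a stride, so an involution built on a strided unfold downsamples exactly like a strided convolution does; a tiny check (my own sketch, independent of Involution.py):

import torch

# With stride > 1 the unfold grid is subsampled, so the involution output
# shrinks just like a strided convolution's would.
x = torch.randn(1, 16, 32, 32)
unfold = torch.nn.Unfold(kernel_size=3, padding=1, stride=2)
L = unfold(x).shape[-1]  # number of sampled patch centers
print(L)                 # 256 == 16 * 16, i.e. a 16x16 output grid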

cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

@d-li14 Hi,

I am using involution_cuda.py to replace convolution with the involution module you provide in this repo. The training process is totally fine; however, I encounter this error when doing evaluation. I have no idea what causes it or how to solve it.

Traceback (most recent call last):
  File "extract_emb.py", line 100, in <module>
    main()
  File "extract_emb.py", line 96, in main
    store_emb(model, args)
  File "extract_emb.py", line 30, in store_emb
    output = model(data)
  File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "models/rednet.py", line 126, in forward
    out = self.layer3(out)
  File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "models/rednet.py", line 58, in forward
    out = F.relu(self.bn2(self.conv2(out)))
  File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "models/involution_cuda.py", line 278, in forward
    out = _involution_cuda(x, weight, stride=self.stride, padding=(self.kernel_size - 1) // 2)
  File "models/involution_cuda.py", line 235, in _involution_cuda
    out = _involution.apply(input, weight, _pair(stride), _pair(padding), _pair(dilation))
  File "models/involution_cuda.py", line 167, in forward
    pad_h=padding[0], pad_w=padding[1])
  File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
  File "models/involution_cuda.py", line 27, in load_kernel
    kernel_code = cupy.cuda.compile_with_cache(code)
  File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 376, in compile_with_cache
    cache_in_memory)
  File "local/miniconda3/envs/pytorch/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 431, in _compile_with_cache_cuda
    mod.load(cubin)
  File "cupy/cuda/function.pyx", line 222, in cupy.cuda.function.Module.load
  File "cupy/cuda/function.pyx", line 224, in cupy.cuda.function.Module.load
  File "cupy_backends/cuda/api/driver.pyx", line 246, in cupy_backends.cuda.api.driver.moduleLoadData
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered
Exception ignored in: 'cupy.cuda.function.Module.__dealloc__'
Traceback (most recent call last):
  File "cupy_backends/cuda/api/driver.pyx", line 253, in cupy_backends.cuda.api.driver.moduleUnload
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

The calculation of the FLOPs

This is an interesting work, but I wonder why the FLOPs of RedNet are lower than ResNet's with the same number of layers. Compared with ResNet, RedNet must additionally generate the involution weights and then slide them over the feature map just like a convolution.
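
A back-of-envelope count (my own arithmetic under assumed settings C=256, K=7, G=16, r=4, ignoring bias and BN) suggests why the generated kernels still come out cheaper per pixel:

# A 3x3 conv mixes all C input channels into each of C output channels at
# every pixel, while involution generates and applies a kernel that is
# shared across the C//G channels of each group.
C, K, G, r = 256, 7, 16, 4
conv3x3   = C * C * 3 * 3                      # 589,824 multiplies per pixel
inv_gen   = C * (C // r) + (C // r) * K*K*G    # kernel generation: 66,560
inv_apply = C * K * K                          # applying the kernel: 12,544
print(conv3x3, inv_gen + inv_apply)            # involution is roughly 7.5x cheaper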

How about replacing involution for convolution in BasicBlock?

Hi,

great work! I have run simple experiments with RedNet on CIFAR10. I notice that all RedNets in the paper use the Bottleneck block for ResLayer. I've tried replacing the second 3x3 convolution in BasicBlock with involution to get RedNet18 and RedNet34 from ResNet18 and ResNet34, respectively, but the performance decreases according to my results: the accuracy of ResNet18 on CIFAR10 is 93.9%, while RedNet18 reaches 92.1%.

I've noticed one sentence in the paper, which says "Indispensably, linear transformations (realized by 1×1 convolutions) are interspersed for channel information exchange". So I am wondering whether involution can succeed in BasicBlock for ResLayer, or whether the 1×1 convolution is an indispensable part of involution's success, since there is no 1×1 convolution in the BasicBlock of ResNet18.
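
For concreteness, the swap described above would look roughly like this (my reconstruction, not the issue author's code; Involution2d refers to the stride-1 sketch earlier on this page, and the stride/downsample path is omitted):

import torch.nn as nn

class RedBasicBlock(nn.Module):
    """torchvision-style BasicBlock with the second 3x3 conv swapped for involution."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.inv2 = Involution2d(channels, k=7)  # was: a second 3x3 conv
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.inv2(out))
        return self.relu(out + x)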

Thanks!

Is it equivalent to a specific form of "attention"?

Thanks for your interesting idea!

I have not looked into the entire code yet, but from the code for involution, it looks like a form of attention (connecting one pixel with only its K*K neighboring pixels). How did you use involution in a specific framework? I mean, what did you use it to replace in, for example, ResNet? Or did you just add the involution/"attention" on top of the existing framework without removing anything (likely not, since you claim your network uses fewer parameters)?

Thanks!

train model

error:
ModuleNotFoundError: No module named 'mmdet.models.dense_heads.rpn_test_mixin'

I didn't see any file named rpn_test_mixin.

CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: too many resources requested for launch

Traceback (most recent call last):
  File "tools/train.py", line 163, in <module>
    main()
  File "tools/train.py", line 159, in main
    meta=meta)
  File "/home/zxl/mm/mmsegmentation/mmseg/apis/train.py", line 116, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 131, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/mmcv/runner/iter_based_runner.py", line 66, in train
    self.call_hook('after_train_iter')
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/mmcv/runner/base_runner.py", line 308, in call_hook
    getattr(hook, fn_name)(self)
  File "/home/zxl/mm/mmsegmentation/mmseg/core/evaluation/eval_hooks.py", line 89, in after_train_iter
    gpu_collect=self.gpu_collect)
  File "/home/zxl/mm/mmsegmentation/mmseg/apis/test.py", line 140, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 619, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/segmentors/base.py", line 124, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/segmentors/base.py", line 106, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 265, in simple_test
    seg_logit = self.inference(img, img_meta, rescale)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 250, in inference
    seg_logit = self.whole_inference(img, img_meta, rescale)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 217, in whole_inference
    seg_logit = self.encode_decode(img, img_meta)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 87, in encode_decode
    x = self.extract_feat(img)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/segmentors/encoder_decoder.py", line 79, in extract_feat
    x = self.backbone(img)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/backbones/rednet.py", line 456, in forward
    x = self.stem(x)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/utils/involution_cuda.py", line 281, in forward
    out = _involution_cuda(x, weight, stride=self.stride, padding=(self.kernel_size-1)//2)
  File "/home/zxl/mm/mmsegmentation/mmseg/models/utils/involution_cuda.py", line 238, in _involution_cuda
    out = _involution.apply(input, weight, _pair(stride), _pair(padding), _pair(dilation))
  File "/home/zxl/mm/mmsegmentation/mmseg/models/utils/involution_cuda.py", line 174, in forward
    stream=Stream(ptr=torch.cuda.current_stream().cuda_stream))
  File "cupy/cuda/function.pyx", line 182, in cupy.cuda.function.Function.__call__
  File "cupy/cuda/function.pyx", line 164, in cupy.cuda.function._launch
  File "cupy_backends/cuda/api/driver.pyx", line 299, in cupy_backends.cuda.api.driver.launchKernel
  File "cupy_backends/cuda/api/driver.pyx", line 124, in cupy_backends.cuda.api.driver.check_status
cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES: too many resources requested for launch
Traceback (most recent call last):
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/zxl/anaconda3/envs/ms/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)

Loss becomes NaN

Hello, on the classification task, when I replace the 3x3 convolutions of ResNet-50 with involution, the loss becomes NaN. What could be the reason?

Positional information encoding

What does it mean that "the pixel's positional information is implicitly encoded when generating the involution kernel"?

The version of mmcv

Hello, thank you very much for your work. Could you tell me which version of mmcv your code uses? Thank you very much.

Error when using involution

I wanted to try involution in other networks, so I just copied involution_cuda.py from this project and imported the involution class from it.
But I hit the following error during use; I'd appreciate it if the author could help figure out what causes it.
The error is raised at line 27 of the following file in the project:

det/mmdet/models/utils/involution_cuda.py  # file

kernel_code = cupy.cuda.compile_with_cache(code)  # failing line

My setup is Python 3.7 and CUDA 9.1 with cupy-cuda91 installed. Below is the full error message (xxx.. is manual redaction):

File "xxx../lib/models/involution_cuda.py", line 281, in forward
    out = _involution_cuda(x, weight, stride=self.stride, padding=(self.kernel_size-1)//2)
File "xxx../lib/models/involution_cuda.py", line 238, in _involution_cuda
    out = _involution.apply(input, weight, _pair(stride), _pair(padding), _pair(dilation))
File "xxx../lib/models/involution_cuda.py", line 170, in forward
    pad_h=padding[0], pad_w=padding[1])
File "cupy/util.pyx", line 81, in cupy.util.memoize.decorator.ret
File "xxx../lib/models/involution_cuda.py", line 27, in load_kernel
    kernel_code = cupy.cuda.compile_with_cache(code)
File "xxx../anaconda3/envs/python37/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 298, in compile_with_cache
    extra_source, backend)
File "xxx../anaconda3/envs/python37/lib/python3.7/site-packages/cupy/cuda/compiler.py", line 352, in _compile_with_cache_cuda
    ls.add_ptr_data(ptx, 'cupy.ptx')
File "cupy/cuda/function.pyx", line 230, in cupy.cuda.function.LinkState.add_ptr_data
File "cupy/cuda/function.pyx", line 232, in cupy.cuda.function.LinkState.add_ptr_data
File "cupy/cuda/driver.pyx", line 198, in cupy.cuda.driver.linkAddData
File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_PTX: a PTX JIT compilation failed

Why is the GPU memory usage of rednet50 larger than resnet50's?

I used a Tesla V100.
When I train resnet50-retinanet, the GPU memory usage is 9.26G.
But when I train rednet50-retinanet, it is 10.98G.
In your paper, the params and FLOPs of rednet50 are lower than resnet50's.
[table screenshot attached]

I only changed resnet50 to rednet50, without changing the neck, the head, the input size, or other params.
I don't know why; looking forward to your reply.

Why can involution summarize the context over a wider spatial extent?

Nice work on rethinking conv modules.

My question is why involution can summarize the context into a wider spatial array.

In my view, changing ResNet's 3x3 convolutions to 7x7 involutions to create RedNet seems to be the only factor that widens the receptive field.

Is there anything inherent to involution that summarizes the context into a wider spatial array?

raise error

Like this

File "/cloud/seg/modules/trainer2.py", line 426, in train_epoch loss_m.backward() File "/environment/python/versions/miniconda3-4.7.12/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs) File "/environment/python/versions/miniconda3-4.7.12/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag File "/environment/python/versions/miniconda3-4.7.12/lib/python3.7/site-packages/torch/autograd/function.py", line 89, in apply return self._forward_cls.backward(self, *args) # type: ignore File "/cloud/seg/modules/inovo.py", line 245, in backward assert grad_output.is_cuda and grad_output.is_contiguous() AssertionError

involution_cuda.py # line 27

Hi, my bug occurs at line 27 in involution_cuda.py, kernel_code = cupy.cuda.compile_with_cache(code), and seems to be a cupy compile error:

cupy.cuda.compiler.CompileException: /tmp/tmp69btppfa/45f36f9abded28e374e19885d0b4818c_2.cubin.cu(6): error: identifier "None" is undefined
/tmp/tmp69btppfa/45f36f9abded28e374e19885d0b4818c_2.cubin.cu(6): error: identifier "None" is undefined
/tmp/tmp69btppfa/45f36f9abded28e374e19885d0b4818c_2.cubin.cu(6): error: identifier "None" is undefined
/tmp/tmp69btppfa/45f36f9abded28e374e19885d0b4818c_2.cubin.cu(13): error: identifier "None" is undefined
/tmp/tmp69btppfa/45f36f9abded28e374e19885d0b4818c_2.cubin.cu(13): error: identifier "None" is undefined

I have no idea how to deal with this; could you offer any help?
PyTorch environment: torch1.6+cu9.2, cupy-cuda9.2
