christophreich1996 / involution

PyTorch reimplementation of the paper "Involution: Inverting the Inherence of Convolution for Visual Recognition" (2D and 3D Involution) [CVPR 2021].

Home Page: https://arxiv.org/pdf/2103.06255.pdf

License: MIT License

Python 100.00%
involution deep-learning computer-vision machine-learning 2d-involution visual-recognition pytorch 3d-involution cvpr2021

involution's Introduction

Involution: Inverting the Inherence of Convolution for Visual Recognition

License: MIT

Unofficial PyTorch reimplementation of the paper Involution: Inverting the Inherence of Convolution for Visual Recognition by Duo Li, Jie Hu, Changhu Wang et al., published at CVPR 2021.

This repository includes a pure PyTorch implementation of a 2D and 3D involution.

Please note that the official implementation provides a more memory-efficient CuPy implementation of the 2D involution. Additionally, shikishima-TasakiLab provides a fast and memory-efficient CUDA implementation of the 2D involution.

Installation

The 2D and 3D involutions can be installed easily with pip:

pip install git+https://github.com/ChristophReich1996/Involution

Example Usage

Additional examples, such as strided involutions or transposed-convolution-like involutions, can be found in the examples.py file.

The 2D involution can be used as an nn.Module as follows:

import torch
from involution import Involution2d

# 2D involution mapping 32 input channels to 64 output channels
involution = Involution2d(in_channels=32, out_channels=64)
# Input shape: (batch size, channels, height, width)
output = involution(torch.rand(1, 32, 128, 128))

The 2D involution takes the following parameters.

| Parameter | Description | Type |
| --- | --- | --- |
| in_channels | Number of input channels | int |
| out_channels | Number of output channels | int |
| sigma_mapping | Non-linear mapping as introduced in the paper. If None, BN + ReLU is utilized (default=None) | Optional[nn.Module] |
| kernel_size | Kernel size to be used (default=(7, 7)) | Union[int, Tuple[int, int]] |
| stride | Stride factor to be utilized (default=(1, 1)) | Union[int, Tuple[int, int]] |
| groups | Number of groups to be employed (default=1) | int |
| reduce_ratio | Reduction ratio of the involution channels (default=1) | int |
| dilation | Dilation to be employed in the unfold operation (default=(1, 1)) | Union[int, Tuple[int, int]] |
| padding | Padding to be used in the unfold operation (default=(3, 3)) | Union[int, Tuple[int, int]] |
| bias | If true, a bias is utilized in each convolution layer (default=False) | bool |
| force_shape_match | If true, a potential shape mismatch is solved by average pooling (default=False) | bool |
| **kwargs | Unused additional keyword arguments | Any |
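For instance, a strided 2D involution can be configured as follows. This is a minimal sketch with assumed parameter values, not code taken verbatim from examples.py:

import torch
from involution import Involution2d

# Strided 2D involution (a sketch; the parameter values are assumptions).
# The padding has to fit the kernel size, since it is applied around the unfold operation.
involution = Involution2d(
    in_channels=32,
    out_channels=64,
    kernel_size=(3, 3),
    padding=(1, 1),
    stride=(2, 2),
)
# The stride halves the spatial resolution: (1, 32, 128, 128) -> (1, 64, 64, 64)
output = involution(torch.rand(1, 32, 128, 128))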

The 3D involution can be used as an nn.Module as follows:

import torch
from involution import Involution3d

# 3D involution mapping 8 input channels to 16 output channels
involution = Involution3d(in_channels=8, out_channels=16)
# Input shape: (batch size, channels, depth, height, width)
output = involution(torch.rand(1, 8, 32, 32, 32))

The 3D involution takes the following parameters.

| Parameter | Description | Type |
| --- | --- | --- |
| in_channels | Number of input channels | int |
| out_channels | Number of output channels | int |
| sigma_mapping | Non-linear mapping as introduced in the paper. If None, BN + ReLU is utilized (default=None) | Optional[nn.Module] |
| kernel_size | Kernel size to be used (default=(7, 7, 7)) | Union[int, Tuple[int, int, int]] |
| stride | Stride factor to be utilized (default=(1, 1, 1)) | Union[int, Tuple[int, int, int]] |
| groups | Number of groups to be employed (default=1) | int |
| reduce_ratio | Reduction ratio of the involution channels (default=1) | int |
| dilation | Dilation to be employed in the unfold operation (default=(1, 1, 1)) | Union[int, Tuple[int, int, int]] |
| padding | Padding to be used in the unfold operation (default=(3, 3, 3)) | Union[int, Tuple[int, int, int]] |
| bias | If true, a bias is utilized in each convolution layer (default=False) | bool |
| force_shape_match | If true, a potential shape mismatch is solved by average pooling (default=False) | bool |
| **kwargs | Unused additional keyword arguments | Any |
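Grouping and channel reduction work analogously in 3D. A minimal sketch, again with assumed settings rather than values from examples.py:

import torch
from involution import Involution3d

# 3D involution with groups and a reduced kernel-generation branch (assumed settings).
involution = Involution3d(
    in_channels=8,
    out_channels=16,
    kernel_size=(3, 3, 3),
    padding=(1, 1, 1),
    groups=2,
    reduce_ratio=2,
)
# Shapes: (1, 8, 32, 32, 32) -> (1, 16, 32, 32, 32)
output = involution(torch.rand(1, 8, 32, 32, 32))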

Reference

@inproceedings{Li2021,
    author = {Li, Duo and Hu, Jie and Wang, Changhu and Li, Xiangtai and She, Qi and Zhu, Lei and Zhang, Tong and Chen, Qifeng},
    title = {Involution: Inverting the Inherence of Convolution for Visual Recognition},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2021}
}

involution's People

Contributors

christophreich1996, deh40, memmelma, shikishima-tasakilab


involution's Issues

Unable to run Involution on ImageNet dataset

RuntimeError: CUDA out of memory. Tried to allocate 37.52 GiB (GPU 0; 10.76 GiB total capacity; 1.61 GiB already allocated; 7.97 GiB free; 1.63 GiB reserved in total by PyTorch)

Such an extremely high memory requirement (37.52 GiB) is not reasonable!

Question: could there be something wrong?

Thanks for your contribution!
For some reason I need to implement the 2D/3D involution myself, and I take this project for validation.
However, my results are not the same as yours. In the beginning I thought it might be my fault, but after checking I am not sure.
So could you help me?
Here is my question:
1. I think the Tensor.unfold() used in involution.py may not be right.
Here is the code (with the potential problem):

input_unfolded = self.pad(input_initial) \
    .unfold(dimension=2, size=self.kernel_size[0], step=self.stride[0]) \
    .unfold(dimension=3, size=self.kernel_size[1], step=self.stride[1]) \
    .unfold(dimension=4, size=self.kernel_size[2], step=self.stride[2])
input_unfolded = input_unfolded.reshape(batch_size, self.groups, self.out_channels // self.groups,
                                        self.kernel_size[0] * self.kernel_size[1] * self.kernel_size[2], -1)
input_unfolded = input_unfolded.reshape(tuple(input_unfolded.shape[:-1])
                                        + (out_depth, out_height, out_width))

The official implementation uses nn.Unfold(), which is right: Tensor.unfold() returns B, C, H, W, K, K, while nn.Unfold() returns B, C*K*K, H*W.
So I think a permute is needed when Tensor.unfold() is used.
Here is an example for comparison:
################The Code:##############

import torch
import torch.nn as nn

def nnUnfold_Tensorunfold():
    input = torch.ones((1, 1, 5, 5))
    # ---------------- nn.Unfold ----------------- #
    unfold = nn.Unfold(kernel_size=3, dilation=1, padding=(3 - 1) // 2, stride=1)
    input_unfolded = unfold(input)  # ==> B, C*K*K, H*W
    input_unfolded = input_unfolded.contiguous().view(1, 9, 5, 5)
    print("Official: nn.Unfold():", input_unfolded)
    # --------------- Tensor.unfold --------------- #
    pad = nn.ConstantPad2d(padding=(1, 1, 1, 1), value=0.)
    input = pad(input)
    input_unfolded = input.unfold(dimension=2, size=3, step=1)
    input_unfolded = input_unfolded.unfold(dimension=3, size=3, step=1)  # ==> B, C, H, W, K, K
    before = input_unfolded.contiguous().view(1, 9, 5, 5)
    print("Wrong: Tensor.unfold():", before)
    # The kernel dimensions must be moved next to the channels before flattening
    after = input_unfolded.permute(0, 1, 4, 5, 2, 3).contiguous().view(1, 9, 5, 5)
    print("Right: after permute:", after)

if __name__ == '__main__':
    nnUnfold_Tensorunfold()

################The Results:##############
Official: nn.Unfold(): tensor([[[[0., 0., 0., 0., 0.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.]],

     [[0., 0., 0., 0., 0.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[0., 0., 0., 0., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.]],

     [[0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.]],

     [[0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 0., 0., 0., 0.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [0., 0., 0., 0., 0.]],

     [[1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [0., 0., 0., 0., 0.]]]])

Wrong: Tensor.unfold(): tensor([[[[0., 0., 0., 0., 1.],
[1., 0., 1., 1., 0.],
[0., 0., 1., 1., 1.],
[1., 1., 1., 0., 0.],
[0., 1., 1., 1., 1.]],

     [[1., 1., 0., 0., 0.],
      [1., 1., 1., 1., 1.],
      [1., 0., 0., 0., 1.],
      [1., 0., 1., 1., 0.],
      [0., 1., 1., 0., 1.]],

     [[1., 0., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 0., 1.],
      [1., 0., 1., 1., 0.],
      [0., 1., 1., 0., 1.],
      [1., 0., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 0., 1.],
      [1., 0., 1., 1., 0.],
      [0., 1., 1., 0., 1.],
      [1., 0., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 0., 1.]],

     [[1., 0., 1., 1., 0.],
      [0., 1., 1., 0., 1.],
      [1., 0., 0., 0., 1.],
      [1., 1., 1., 1., 1.],
      [0., 0., 0., 1., 1.]],

     [[1., 1., 1., 1., 0.],
      [0., 0., 1., 1., 1.],
      [1., 1., 1., 0., 0.],
      [0., 1., 1., 0., 1.],
      [1., 0., 0., 0., 0.]]]])

Right: after permute: tensor([[[[0., 0., 0., 0., 0.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.]],

     [[0., 0., 0., 0., 0.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[0., 0., 0., 0., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.]],

     [[0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.]],

     [[0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 0., 0., 0., 0.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [0., 0., 0., 0., 0.]],

     [[1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [0., 0., 0., 0., 0.]]]])

########################################
Maybe I am wrong... could you help me?

Problem with tensor size

I replaced a Conv2d with an Involution2d and it stopped working for me. The parameters are the same; what can I do?

conv_transpose2d

Hi,

What is the correct way to use the 2D involution as a transposed convolution (i.e., like F.conv_transpose2d)?

Does this module support kernels whose width and height differ?

I have read part of your code and was very excited about the results. However, after reading your repo, I am left with some concerns:
1. Should "The 2D involution takes the following parameters." be changed to "The 3D involution takes the following parameters." in the description of the 3D involution in the readme file?
2. The official PyTorch documentation of nn.Unfold describes the output shape of the function. As issue #7 describes, when I change line 6 of examples.py from involution_2d = Involution2d(in_channels=4, out_channels=8) to involution_2d = Involution2d(in_channels=4, out_channels=8, kernel_size=(2, 3)), the following exception appears:

  File "C:/Users/Desktop/InvolutionA/examples.py", line 8, in <module>
    output = involution_2d(input)
  File "D:\anaconda\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Desktop\InvolutionA\involution\involution.py", line 127, in forward
    input_unfolded = input_unfolded.view(batch_size, self.groups, self.out_channels // self.groups,
RuntimeError: shape '[2, 1, 8, 6, 68, 68]' is invalid for input of size 450432

Process finished with exit code 1

Thanks for your response.

The problem of width and height

Hello, I noticed that in line 118 of the Involution2d module, height and width represent the input size. If I change the stride parameter, an error occurs on line 121. Should the output size be computed instead, such as:

height = (height + 2 * self.padding[0] - self.dilation[0] * (self.kernel_size[0] - 1) - 1) // self.stride[0] + 1
width = (width + 2 * self.padding[1] - self.dilation[1] * (self.kernel_size[1] - 1) - 1) // self.stride[1] + 1
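These are the standard convolution output-size formulas. A small standalone sketch (the helper name is ours, not from the repository):

def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Standard output-size formula, as used by nn.Conv2d and nn.Unfold.
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# With the involution defaults (kernel_size=7, padding=3) and stride=2,
# a 128-pixel input yields 64 output pixels.
assert conv_output_size(128, kernel_size=7, padding=3, stride=2) == 64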

the tensor sizes don't match

>>> import torch
>>> from involution import Involution2d, Involution3d
>>> involution_2d = Involution2d(3, 16, kernel_size=3, padding=1, stride=2, bias=False)
>>> input_ = torch.rand(2, 3, 507, 684)
>>> output = involution_2d(input_)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ouc/anaconda3/envs/sttr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ouc/anaconda3/envs/sttr/lib/python3.6/site-packages/involution/involution.py", line 133, in forward
    output = (kernel * input_unfolded).sum(dim=3)
RuntimeError: The size of tensor a (253) must match the size of tensor b (254) at non-singleton dimension 4

the sequence of padding in involution3d

Hi, if the padding sequence passed to nn.ConstantPad3d() is (self.padding[0], self.padding[0], self.padding[1], self.padding[1], self.padding[2], self.padding[2]), that means self.padding = (W_pad, H_pad, D_pad). However, (D_pad, H_pad, W_pad) may be what users expect, and in nn.Conv3d() the padding sequence is also (D_pad, H_pad, W_pad). So I suggest changing the padding sequence passed to nn.ConstantPad3d() to (self.padding[2], self.padding[2], self.padding[1], self.padding[1], self.padding[0], self.padding[0]). Or maybe you can add an ordering annotation to the 'padding' parameter.
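For reference, nn.ConstantPad3d expects the padding tuple ordered last dimension first, i.e. (W_left, W_right, H_top, H_bottom, D_front, D_back). A minimal sketch of the suggested reordering, where padding is assumed to be given as (D_pad, H_pad, W_pad):

import torch
import torch.nn as nn

padding = (1, 2, 3)  # assumed order: (D_pad, H_pad, W_pad), as in nn.Conv3d

# nn.ConstantPad3d pads the last (width) dimension first, so the tuple is reversed:
pad = nn.ConstantPad3d(
    (padding[2], padding[2],   # W_left, W_right
     padding[1], padding[1],   # H_top, H_bottom
     padding[0], padding[0]),  # D_front, D_back
    value=0.,
)
x = torch.rand(1, 4, 8, 16, 32)  # (B, C, D, H, W)
print(pad(x).shape)  # torch.Size([1, 4, 10, 20, 38])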
