christophreich1996 / involution

PyTorch reimplementation of the paper "Involution: Inverting the Inherence of Convolution for Visual Recognition" (2D and 3D Involution) [CVPR 2021].

Home Page: https://arxiv.org/pdf/2103.06255.pdf

License: MIT License

Python 100.00%
involution deep-learning computer-vision machine-learning 2d-involution visual-recognition pytorch 3d-involution cvpr2021

involution's Introduction

Involution: Inverting the Inherence of Convolution for Visual Recognition

License: MIT

Unofficial PyTorch reimplementation of the paper Involution: Inverting the Inherence of Convolution for Visual Recognition by Duo Li, Jie Hu, Changhu Wang et al., published at CVPR 2021.

This repository includes a pure PyTorch implementation of a 2D and 3D involution.

Please note that the official implementation provides a more memory-efficient CuPy implementation of the 2D involution. Additionally, shikishima-TasakiLab provides a fast and memory-efficient CUDA implementation of the 2D involution.

Installation

The 2D and 3D involutions can be installed easily with pip:

pip install git+https://github.com/ChristophReich1996/Involution

Example Usage

Additional examples, such as strided involutions or transposed-convolution-like involutions, can be found in the examples.py file.

The 2D involution can be used as an nn.Module as follows:

import torch
from involution import Involution2d

# 2D involution mapping 32 input channels to 64 output channels
involution = Involution2d(in_channels=32, out_channels=64)
# Input shape: (batch size, channels, height, width)
output = involution(torch.rand(1, 32, 128, 128))

The 2D involution takes the following parameters.

| Parameter | Description | Type |
| --- | --- | --- |
| in_channels | Number of input channels | int |
| out_channels | Number of output channels | int |
| sigma_mapping | Non-linear mapping as introduced in the paper. If None, BN + ReLU is utilized (default=None) | Optional[nn.Module] |
| kernel_size | Kernel size to be used (default=(7, 7)) | Union[int, Tuple[int, int]] |
| stride | Stride factor to be utilized (default=(1, 1)) | Union[int, Tuple[int, int]] |
| groups | Number of groups to be employed (default=1) | int |
| reduce_ratio | Reduction ratio of the involution channels (default=1) | int |
| dilation | Dilation to be employed in the unfold operation (default=(1, 1)) | Union[int, Tuple[int, int]] |
| padding | Padding to be used in the unfold operation (default=(3, 3)) | Union[int, Tuple[int, int]] |
| bias | If true, a bias is utilized in each convolution layer (default=False) | bool |
| force_shape_match | If true, a potential shape mismatch is solved by average pooling (default=False) | bool |
| **kwargs | Unused additional keyword arguments | Any |
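For instance, a strided 2D involution can be configured as follows. This is a minimal sketch with assumed parameter values, not code taken verbatim from examples.py:

import torch
from involution import Involution2d

# Strided 2D involution (a sketch; the parameter values are assumptions).
# The padding has to fit the kernel size, since it is applied around the unfold operation.
involution = Involution2d(
    in_channels=32,
    out_channels=64,
    kernel_size=(3, 3),
    padding=(1, 1),
    stride=(2, 2),
)
# The stride halves the spatial resolution: (1, 32, 128, 128) -> (1, 64, 64, 64)
output = involution(torch.rand(1, 32, 128, 128))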

The 3D involution can be used as an nn.Module as follows:

import torch
from involution import Involution3d

# 3D involution mapping 8 input channels to 16 output channels
involution = Involution3d(in_channels=8, out_channels=16)
# Input shape: (batch size, channels, depth, height, width)
output = involution(torch.rand(1, 8, 32, 32, 32))

The 3D involution takes the following parameters.

| Parameter | Description | Type |
| --- | --- | --- |
| in_channels | Number of input channels | int |
| out_channels | Number of output channels | int |
| sigma_mapping | Non-linear mapping as introduced in the paper. If None, BN + ReLU is utilized (default=None) | Optional[nn.Module] |
| kernel_size | Kernel size to be used (default=(7, 7, 7)) | Union[int, Tuple[int, int, int]] |
| stride | Stride factor to be utilized (default=(1, 1, 1)) | Union[int, Tuple[int, int, int]] |
| groups | Number of groups to be employed (default=1) | int |
| reduce_ratio | Reduction ratio of the involution channels (default=1) | int |
| dilation | Dilation to be employed in the unfold operation (default=(1, 1, 1)) | Union[int, Tuple[int, int, int]] |
| padding | Padding to be used in the unfold operation (default=(3, 3, 3)) | Union[int, Tuple[int, int, int]] |
| bias | If true, a bias is utilized in each convolution layer (default=False) | bool |
| force_shape_match | If true, a potential shape mismatch is solved by average pooling (default=False) | bool |
| **kwargs | Unused additional keyword arguments | Any |
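Grouping and channel reduction work analogously in 3D. A minimal sketch, again with assumed settings rather than values from examples.py:

import torch
from involution import Involution3d

# 3D involution with groups and a reduced kernel-generation branch (assumed settings).
involution = Involution3d(
    in_channels=8,
    out_channels=16,
    kernel_size=(3, 3, 3),
    padding=(1, 1, 1),
    groups=2,
    reduce_ratio=2,
)
# Shapes: (1, 8, 32, 32, 32) -> (1, 16, 32, 32, 32)
output = involution(torch.rand(1, 8, 32, 32, 32))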

Reference

@inproceedings{Li2021,
    author = {Li, Duo and Hu, Jie and Wang, Changhu and Li, Xiangtai and She, Qi and Zhu, Lei and Zhang, Tong and Chen, Qifeng},
    title = {Involution: Inverting the Inherence of Convolution for Visual Recognition},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month = {June},
    year = {2021}
}

involution's People

Contributors

christophreich1996, deh40, memmelma, shikishima-tasakilab


involution's Issues

Unable to run Involution on ImageNet dataset

RuntimeError: CUDA out of memory. Tried to allocate 37.52 GiB (GPU 0; 10.76 GiB total capacity; 1.61 GiB already allocated; 7.97 GiB free; 1.63 GiB reserved in total by PyTorch)

Such an extremely high memory requirement (37.52 GiB) is not reasonable!

Question: could there be something wrong?

Thanks for your contribution!
For some reason I need to implement the 2D/3D involution myself, and I take this project for validation.
However, my results are not the same as yours. In the beginning I thought it might be my fault, but after checking I am not sure.
So could you help me?
Here is my question:
1. I think the Tensor.unfold() used in involution.py may not be right.
Here is the code (with the potential problem):

input_unfolded = self.pad(input_initial) \
    .unfold(dimension=2, size=self.kernel_size[0], step=self.stride[0]) \
    .unfold(dimension=3, size=self.kernel_size[1], step=self.stride[1]) \
    .unfold(dimension=4, size=self.kernel_size[2], step=self.stride[2])
input_unfolded = input_unfolded.reshape(batch_size, self.groups, self.out_channels // self.groups,
                                        self.kernel_size[0] * self.kernel_size[1] * self.kernel_size[2], -1)
input_unfolded = input_unfolded.reshape(tuple(input_unfolded.shape[:-1])
                                        + (out_depth, out_height, out_width))

The official implementation uses nn.Unfold(), which is right: Tensor.unfold() returns B, C, H, W, K, K, while nn.Unfold() returns B, C*K*K, H*W.
So I think a permute is needed when Tensor.unfold() is used.
Here is an example for comparison:
################The Code:##############

import torch
import torch.nn as nn

def nnUnfold_Tensorunfold():
    input = torch.ones((1, 1, 5, 5))
    # ---------------- nn.Unfold ----------------- #
    unfold = nn.Unfold(kernel_size=3, dilation=1, padding=(3 - 1) // 2, stride=1)
    input_unfolded = unfold(input)  # ==> B, C*K*K, H*W
    input_unfolded = input_unfolded.contiguous().view(1, 9, 5, 5)
    print("Official: nn.Unfold():", input_unfolded)
    # --------------- Tensor.unfold --------------- #
    pad = nn.ConstantPad2d(padding=(1, 1, 1, 1), value=0.)
    input = pad(input)
    input_unfolded = input.unfold(dimension=2, size=3, step=1)
    input_unfolded = input_unfolded.unfold(dimension=3, size=3, step=1)  # ==> B, C, H, W, K, K
    before = input_unfolded.contiguous().view(1, 9, 5, 5)
    print("Wrong: Tensor.unfold():", before)
    # The kernel dimensions must be moved next to the channels before flattening
    after = input_unfolded.permute(0, 1, 4, 5, 2, 3).contiguous().view(1, 9, 5, 5)
    print("Right: after permute:", after)

if __name__ == '__main__':
    nnUnfold_Tensorunfold()

################The Results:##############
Official: nn.Unfold(): tensor([[[[0., 0., 0., 0., 0.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.]],

     [[0., 0., 0., 0., 0.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[0., 0., 0., 0., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.]],

     [[0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.]],

     [[0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 0., 0., 0., 0.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [0., 0., 0., 0., 0.]],

     [[1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [0., 0., 0., 0., 0.]]]])

Wrong: Tensor.unfold(): tensor([[[[0., 0., 0., 0., 1.],
[1., 0., 1., 1., 0.],
[0., 0., 1., 1., 1.],
[1., 1., 1., 0., 0.],
[0., 1., 1., 1., 1.]],

     [[1., 1., 0., 0., 0.],
      [1., 1., 1., 1., 1.],
      [1., 0., 0., 0., 1.],
      [1., 0., 1., 1., 0.],
      [0., 1., 1., 0., 1.]],

     [[1., 0., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 0., 1.],
      [1., 0., 1., 1., 0.],
      [0., 1., 1., 0., 1.],
      [1., 0., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 0., 1.],
      [1., 0., 1., 1., 0.],
      [0., 1., 1., 0., 1.],
      [1., 0., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 0., 1.]],

     [[1., 0., 1., 1., 0.],
      [0., 1., 1., 0., 1.],
      [1., 0., 0., 0., 1.],
      [1., 1., 1., 1., 1.],
      [0., 0., 0., 1., 1.]],

     [[1., 1., 1., 1., 0.],
      [0., 0., 1., 1., 1.],
      [1., 1., 1., 0., 0.],
      [0., 1., 1., 0., 1.],
      [1., 0., 0., 0., 0.]]]])

Right: after permute: tensor([[[[0., 0., 0., 0., 0.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.],
[0., 1., 1., 1., 1.]],

     [[0., 0., 0., 0., 0.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[0., 0., 0., 0., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.]],

     [[0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.]],

     [[1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.]],

     [[0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 1., 1., 1., 1.],
      [0., 0., 0., 0., 0.]],

     [[1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [1., 1., 1., 1., 1.],
      [0., 0., 0., 0., 0.]],

     [[1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [1., 1., 1., 1., 0.],
      [0., 0., 0., 0., 0.]]]])

########################################
Maybe I am wrong... could you help me?

Problem with tensor size

I replaced a Conv2d with an Involution2d and it stopped working for me. The parameters are the same; what can I do?

conv_transpose2d

Hi,

What is the correct way to use the 2D involution as a transposed convolution (i.e., like F.conv_transpose2d)?

Does this module support kernels whose width and height differ?

I have read part of your code and was very excited about the results. However, after reading your repo, I am left with some concerns:
1. Should "The 2D involution takes the following parameters." be changed to "The 3D involution takes the following parameters." in the description of the 3D involution in the readme file?
2. The official PyTorch documentation of nn.Unfold describes the output shape of the function. As issue #7 describes, when I change line 6 of examples.py from involution_2d = Involution2d(in_channels=4, out_channels=8) to involution_2d = Involution2d(in_channels=4, out_channels=8, kernel_size=(2, 3)), the following exception appears:

  File "C:/Users/Desktop/InvolutionA/examples.py", line 8, in <module>
    output = involution_2d(input)
  File "D:\anaconda\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Desktop\InvolutionA\involution\involution.py", line 127, in forward
    input_unfolded = input_unfolded.view(batch_size, self.groups, self.out_channels // self.groups,
RuntimeError: shape '[2, 1, 8, 6, 68, 68]' is invalid for input of size 450432

Process finished with exit code 1

Thanks for your response.

The problem of width and height

Hello, I noticed that in line 118 of the Involution2d module, height and width represent the input size. If I change the stride parameter, an error occurs on line 121. Should the output size be computed instead, such as:

height = (height + 2 * self.padding[0] - self.dilation[0] * (self.kernel_size[0] - 1) - 1) // self.stride[0] + 1
width = (width + 2 * self.padding[1] - self.dilation[1] * (self.kernel_size[1] - 1) - 1) // self.stride[1] + 1
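These are the standard convolution output-size formulas. A small standalone sketch (the helper name is ours, not from the repository):

def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    # Standard output-size formula, as used by nn.Conv2d and nn.Unfold.
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# With the involution defaults (kernel_size=7, padding=3) and stride=2,
# a 128-pixel input yields 64 output pixels.
assert conv_output_size(128, kernel_size=7, padding=3, stride=2) == 64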

the tensor sizes don't match

>>> import torch
>>> from involution import Involution2d, Involution3d
>>> involution_2d = Involution2d(3, 16, kernel_size=3, padding=1, stride=2, bias=False)
>>> input_ = torch.rand(2, 3, 507, 684)
>>> output = involution_2d(input_)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ouc/anaconda3/envs/sttr/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ouc/anaconda3/envs/sttr/lib/python3.6/site-packages/involution/involution.py", line 133, in forward
    output = (kernel * input_unfolded).sum(dim=3)
RuntimeError: The size of tensor a (253) must match the size of tensor b (254) at non-singleton dimension 4

the sequence of padding in involution3d

Hi, if the padding sequence passed to nn.ConstantPad3d() is (self.padding[0], self.padding[0], self.padding[1], self.padding[1], self.padding[2], self.padding[2]), that means self.padding = (W_pad, H_pad, D_pad). However, (D_pad, H_pad, W_pad) may be what users expect, and in nn.Conv3d() the padding sequence is also (D_pad, H_pad, W_pad). So I suggest changing the padding sequence passed to nn.ConstantPad3d() to (self.padding[2], self.padding[2], self.padding[1], self.padding[1], self.padding[0], self.padding[0]). Or maybe you can add an ordering annotation to the 'padding' parameter.
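For reference, nn.ConstantPad3d expects the padding tuple ordered last dimension first, i.e. (W_left, W_right, H_top, H_bottom, D_front, D_back). A minimal sketch of the suggested reordering, where padding is assumed to be given as (D_pad, H_pad, W_pad):

import torch
import torch.nn as nn

padding = (1, 2, 3)  # assumed order: (D_pad, H_pad, W_pad), as in nn.Conv3d

# nn.ConstantPad3d pads the last (width) dimension first, so the tuple is reversed:
pad = nn.ConstantPad3d(
    (padding[2], padding[2],   # W_left, W_right
     padding[1], padding[1],   # H_top, H_bottom
     padding[0], padding[0]),  # D_front, D_back
    value=0.,
)
x = torch.rand(1, 4, 8, 16, 32)  # (B, C, D, H, W)
print(pad(x).shape)  # torch.Size([1, 4, 10, 20, 38])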
