adobe / antialiased-cnns
pip install antialiased-cnns to improve stability and accuracy
Home Page: https://richzhang.github.io/antialiased-cnns/
License: Other
From the paper:
How do the learned convolutional filters change? Our proposed change smooths the internal feature maps for purposes of downsampling. How does training with this layer affect the learned convolutional layers? ... the anti-aliased networks (red-purple) actually learn smoother filters throughout the network, relative to the baseline (black). Adding in more aggressive low-pass filtering further decreases the TV (increasing smoothness). This indicates that our method actually induces a smoother feature extractor overall.
I wonder: is it possible to smooth the convolutional filters themselves instead of the feature maps?
hi @richzhang
Is it possible not to use a pad layer in the Downsample layer? Or could a depthwise conv replace it?
thanks
Hi, it's really refreshing to see signal processing principles used in deep networks. I have a question about the upsampling mechanism.
After going through the original code and a related issue #28, the following is my attempt to implement a bilinear upsampler:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearUpsample(nn.Module):
    def __init__(self, channels=None, interpolation_factor=2):
        super(BilinearUpsample, self).__init__()
        self.channels = channels
        self.stride = interpolation_factor
        # 3x3 bilinear (tent) kernel, scaled so that a constant input stays constant
        lpf = torch.tensor([[1., 2., 1.],
                            [2., 4., 2.],
                            [1., 2., 1.]])
        lpf = lpf / torch.sum(lpf) * 4
        lpf = lpf.unsqueeze(dim=0).unsqueeze(dim=0)
        lpf = lpf.repeat([self.channels, 1, 1, 1])
        self.register_buffer('lpf', lpf)

    def forward(self, x):
        # Zero-stuff by the stride, then low-pass filter (per channel)
        return F.conv_transpose2d(input=x, weight=self.lpf, stride=self.stride,
                                  padding=1, output_padding=1, groups=self.channels)

Is this implementation correct?
The result seems very different from PyTorch bilinear interpolation.
Using
BilinearUpsample(channels=num_channels)(a)
vs
F.interpolate(a, scale_factor=2, mode='bilinear', align_corners=True)
returns very different output.
Hi,
we tried max-blur-pool for a text-reading model (CRNN).
This model consists of convolutional layers followed by recurrent layers (LSTM).
We also have pooling strides of e.g. (1, 2); in this case we use a filter of size 1 in one direction and of size 5 in the other, and compute their outer product.
I evaluate max-pool and max-blur-pool by shifting the input images by -4, -3, ..., +3 pixels in horizontal direction.
In this experiment we get a median amplitude of 0.107 for max-pool and 0.063 for max-blur-pool.
So here max-blur-pool performs better (the same applies when computing the mean of the amplitude values).
In a second experiment, max-pool performs much better, with a max. abs. difference of only 0.04% in character error rate, while max-blur-pool reaches up to 0.55%.
In this second experiment blur-pool performed worse than max-pool; however, we were comparing against the ground truth, so the difference could be caused by overall model performance.
We also want to know how consistent the model is: even if the recognized text is wrong, we want it to be the same for all shifts.
Again, max-pool performed (slightly) better than max-blur-pool, with a summed edit distance of 51 instead of 53.
P.S.: experiments were repeated for a blur filter size of 3 instead of 5, however, the results were worse, and I only wanted to use the best max-blur-pool model for comparison.
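The outer-product construction mentioned above can be sketched like this (my own illustration with numpy; size-1 and size-5 binomial kernels, as in the (1, 2)-stride setup):

```python
import numpy as np

# Asymmetric blur filter for a pooling stride of (1, 2): a size-1 kernel in
# one direction, a size-5 binomial kernel in the other, combined via an
# outer product and normalized so a constant input is preserved.
k1 = np.array([1.0])                       # size-1 kernel: no blur along this axis
k5 = np.array([1.0, 4.0, 6.0, 4.0, 1.0])   # size-5 binomial kernel
filt = np.outer(k1, k5)
filt = filt / filt.sum()
```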
import numpy as np
from keras import backend as K
from keras import layers as KL

class BlurPool(KL.Layer):
    """
    https://arxiv.org/abs/1904.11486 https://github.com/adobe/antialiased-cnns
    """
    def __init__(self, filt_size=5, stride=2, **kwargs):
        self.strides = (stride, stride)
        self.filt_size = filt_size
        pad_l = int(1. * (filt_size - 1) / 2)
        pad_r = int(np.ceil(1. * (filt_size - 1) / 2))
        self.padding = ((pad_l, pad_r), (pad_l, pad_r))
        # 1D binomial coefficients (rows of Pascal's triangle)
        if self.filt_size == 1:
            self.a = np.array([1.])
        elif self.filt_size == 2:
            self.a = np.array([1., 1.])
        elif self.filt_size == 3:
            self.a = np.array([1., 2., 1.])
        elif self.filt_size == 4:
            self.a = np.array([1., 3., 3., 1.])
        elif self.filt_size == 5:
            self.a = np.array([1., 4., 6., 4., 1.])
        elif self.filt_size == 6:
            self.a = np.array([1., 5., 10., 10., 5., 1.])
        elif self.filt_size == 7:
            self.a = np.array([1., 6., 15., 20., 15., 6., 1.])
        super(BlurPool, self).__init__(**kwargs)

    def compute_output_shape(self, input_shape):
        height = input_shape[1] // self.strides[0]
        width = input_shape[2] // self.strides[1]
        channels = input_shape[3]
        return (input_shape[0], height, width, channels)

    def call(self, x):
        # Build the 2D filter as an outer product, normalize, tile per channel
        k = self.a
        k = k[:, None] * k[None, :]
        k = k / np.sum(k)
        k = np.tile(k[:, :, None, None], (1, 1, K.int_shape(x)[-1], 1))
        k = K.constant(k, dtype=K.floatx())
        x = K.spatial_2d_padding(x, padding=self.padding)
        x = K.depthwise_conv2d(x, k, strides=self.strides, padding='valid')
        return x
Thank you for sharing this repo.
I remember that the kernel_size of the initial max pooling in the original ResNet is 3.
Why did you change the kernel_size of the initial max pooling in ResNet from 3 to 2?
https://github.com/adobe/antialiased-cnns/blob/master/models_lpf/resnet.py#L170
Is there a special reason?
Model weights are downloaded from AWS, but the download fails:
urllib.error.HTTPError: HTTP Error 403: Forbidden
Same issue as #45
Ubuntu 16.04 and 22.04
Do you have code for the downsample layer in TensorFlow? I really need it. Thank you for your help!
Why Conv->Relu->BlurPool and not Conv->BlurPool->Relu?
Just as a side note: for max pooling there is no difference. https://stackoverflow.com/questions/35543428/activation-function-after-pooling-layer-or-convolutional-layer
Thanks for your great work! I have two questions regarding the implementation details:
(1) In the case of a strided convolution, why is the BlurPool layer placed after the ReLU rather than right next to the convolution?
It would be much more flexible if the conv and blurpool can be coupled.
I was considering the implementation in the pre-activation resnet.
(2) This question might be silly, but why not apply bilinear interpolation layers to downsample the feature map? I haven't seen any work use it.
How would one use this for depthwise convolutions? Could groups be taken as an input and passed through to F.conv2d instead of input.shape[1]?
Thanks very much for the great work!
I am very curious about the visualization method in this paper. Do you have any plans to release the code for that part?
Thanks!
Thank you for your awesome work. I have a question:
What if I want to perform upsampling instead of downsampling? As I understand, aliasing is a problem only when downsampling. But I came across this paper A Style-Based Generator Architecture for Generative Adversarial Networks where they also blur during upsampling, citing your work. Here the blur was applied after upsampling (instead of before as in downsampling). Could you comment on that?
I intend to apply your technique in a VAE or GAN, and I would like to know whether I should include the blur in the decoder/generator.
Hello,
very nice work!
I also struggle with the spatial shift sensitivity of my 3D CNN, and I would like to test your solution.
Is anyone aware of a 3D implementation of these filters? Or can you guide us on the implementation?
Many thanks!
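Not an official implementation, but the 1D binomial kernel extends naturally to 3D by taking outer products along all three axes; the result could then be registered as a (C, 1, 3, 3, 3) buffer and applied with a grouped F.conv3d in PyTorch, mirroring the 2D layer. A numpy sketch of the kernel construction:

```python
import numpy as np

# Separable 3D blur kernel (sketch): outer product of the 1D binomial kernel
# along depth, height and width, normalized to unit sum.
a = np.array([1.0, 2.0, 1.0])  # 1D binomial kernel (filt_size=3)
filt3d = a[:, None, None] * a[None, :, None] * a[None, None, :]
filt3d = filt3d / filt3d.sum()
# In PyTorch this would then be applied per channel, e.g. (hypothetically)
# F.conv3d(x_padded, weight, stride=2, groups=channels).
```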
Dear Author:
Thank you for sharing your insight and code. I have a question as below
For the shift-equivariance visualization in Figure 5, it looks like the feature map always has the same resolution as the input image (32x32). How did you achieve that? As far as I know, the resolution of the feature map should decrease as downsampling is applied.
If bilinear interpolation is used, is that a fair feature-distance metric?
A more general question: how could one measure shift-equivariance between a pixel-level shift of the input image and the corresponding sub-pixel shift of the feature map?
Thank you in advance!
Files from AWS should load properly, but AWS returns Access Denied for any file:
urllib.error.HTTPError: HTTP Error 403: Forbidden
pip install antialiased_cnns
Then
import antialiased_cnns
model = antialiased_cnns.resnet50(pretrained=True)
Reproduced this both on my Mac and 2 Linux servers
Hi. Thanks for your brilliant work on solving the antialiasing problem. I find in the implementation of Antialiased-ResNet50, the order to apply the antialiased layers in the skip connections is different from the one in the main pathway.
The original skip connection is:
...->conv(stride=2)->...
Now it is:
...->blurpool(stride=2)->conv(stride=1)->...
But from my understanding, it should be:
...->conv(stride=1)->blurpool(stride=2)->...
And in this way, it is identical to how you deal with the Bottleneck.conv2 and Bottleneck.conv3.
Is there any reason the calculation order is inverse in the skip connections?
With baseline CNNs with no anti-aliasing, we see better shift consistency if we increase the CNN's depth, e.g. VGG11 -> VGG19, Resnet18 -> Resnet152. Why is that so?
Hello all,
I would like to train the model with CIFAR10.
But it gives an error when the blurring kernel size is larger than 3:
RuntimeError: Padding size should be less than the corresponding input dimension, but got: padding (2, 2) at dimension 3 of input [64, 512, 2, 2]
What do you suggest? Is there a way to apply blurpool for small images?
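One possible workaround (my own sketch, assuming the failure comes from the reflection pad needing to be smaller than the feature map): cap the filter size on tiny late-stage maps, such as the 2x2 maps a ResNet produces on CIFAR10.

```python
# Hypothetical helper (not from the repo): reflection padding of
# ceil((filt_size - 1) / 2) must be strictly smaller than the spatial
# dimension, so the largest safe size is 2 * spatial_dim - 1.
def safe_filt_size(filt_size, spatial_dim):
    return min(filt_size, 2 * spatial_dim - 1)
```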
Hello,
First of all: fantastic paper and contribution -- and the pypi package is the cherry on top :D
I decided to try switching one of my model trainings to use antialiased_cnns.resnet34
as a drop-in replacement for torchvision.models.resnet34
. It seems however that the memory needs are almost 1.5x higher with the anti-aliased CNN. This is based on the fact that with the torchvision version, my model trains with a batch size of 16 per GPU (it's a sequence model, so the actual number of images going through the CNN per batch is actually much higher). With the anti-aliased CNN, I get CUDA out of memory errors for any batch size above 11.
Were you aware of this? I'm not really expecting you to post a fix, just wondering if it makes sense to you and if you were already aware of it.
Thanks again!
Hi, could you provide the anti-aliased U-Net code that you used in pix2pix? I wonder how you apply blurring after upsampling. Thank you in advance!
@richzhang Hi. Thank you for work.
In TensorFlow 2.0, tf.image.resize supports gradients. Would it be more efficient, or better performing, to downsample with that function (various interpolation methods are provided)?
I wonder: is AvgPool already an anti-aliased downsample, just a special case with filter size 2?
antialiased-cnns/models_lpf/__init__.py
Line 29 in 3f3f960
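For what it's worth, a quick numpy check (my own sketch, not from the repo) shows that blurring with the size-2 box filter [1, 1] x [1, 1] / 4 and subsampling by 2 is exactly 2x2 average pooling:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))

# 2x2 average pooling with stride 2
avg_pool = x.reshape(4, 2, 4, 2).mean(axis=(1, 3))

# BlurPool with filt_size=2: blur with the [1,1]x[1,1]/4 box filter,
# then subsample every other output
filt = np.outer([1.0, 1.0], [1.0, 1.0]) / 4.0
blurred = np.zeros((7, 7))
for i in range(7):
    for j in range(7):
        blurred[i, j] = (x[i:i + 2, j:j + 2] * filt).sum()
blur_pool = blurred[::2, ::2]
```

The two results match element-for-element, so filter size 2 does reduce to average pooling.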
Hi folks,
A lot of implementations use adaptive versions of poolings to support images with different resolutions (e.g. most of the models from this repo use it).
Is it possible to modify downsampling layer to support adaptive behaviour?
In a new python virtualenv, after pip installing the requirements, I ran python main.py --data /home/datasets/ImageNet -e -f 3 -a alexnet_lpf --weights ./weights/alexnet_lpf3.pth.tar
and expected evaluation output.
Traceback (most recent call last):
File "main.py", line 62, in <module>
import models_lpf
File "/home/aswanberg/antialiased-cnns/models_lpf/__init__.py", line 12, in <module>
from IPython import embed
ModuleNotFoundError: No module named 'IPython'
Seems like this would happen any time someone uses this repo with a fresh python install.
Ubuntu
Thanks for your great work.
Models applied to other tasks usually require an ImageNet-pretrained backbone.
If I want to use this module in my own backbone, is it necessary to pretrain on ImageNet?
Or can I just replace MaxPool with MaxBlurPool, load the original pretrained weights, and then train on the other task?
For example, you don't have ResNet-101 in weights/download_antialiased_models.sh, but I would like to fairly compare ResNet-101 using the same pretrained weights.
Hi,
interesting approach! How is this related to networks doing pooling in Fourier Domain, as discussed in http://ecmlpkdd2017.ijs.si/papers/paperID11.pdf (and the references therein)?
Hi author,
I want to cite your Figure 4 in my paper.
However, I can't find information about rights/permissions.
May I reuse your figure if I give clear attribution in my paper?
Looking forward to your answer, many thanks!
Hi
I am confused about the meaning of the pool_only argument in resnet.py and alexnet.py.
The default is False for AlexNet and True for ResNet, and I can't find where this argument is changed in main.py. I think pool_only=True corresponds to the original version of the model. Is that correct?
Thank you for sharing this great work and for open-sourcing your code!
I think that the things to watch out for could be avoided by defining the weights in the Downsample layer as buffers instead of parameters. See torch.nn.Module.register_buffer.
Hi!
I am confused about the code at L170-171 of resnet.py.
In the original implementation of ResNet, it should be 2D max pooling, i.e.
nn.MaxPool2d(kernel_size=2, stride=2)
Why do you use nn.MaxPool2d(kernel_size=2, stride=1) plus a Downsample layer (with a blur filter) instead?
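A small sketch (my own illustration, not the repo's code) of the decomposition the paper proposes, where a strided max pool is split into a dense max followed by blurred subsampling:

```python
import torch
import torch.nn.functional as F

# Anti-aliased max pooling (sketch):
#   MaxPool(k, stride=2)  ->  MaxPool(k, stride=1) + BlurPool(stride=2)
x = torch.randn(1, 3, 8, 8)

# 1) Dense (stride-1) max pooling: keeps the max operator, no subsampling yet
dense_max = F.max_pool2d(x, kernel_size=2, stride=1)            # (1, 3, 7, 7)

# 2) Blur with a size-3 binomial filter, then subsample by 2 (depthwise conv)
a = torch.tensor([1., 2., 1.])
filt = (a[:, None] * a[None, :]) / 16.0
weight = filt[None, None].repeat(3, 1, 1, 1)                    # (C, 1, 3, 3)
blurred = F.conv2d(F.pad(dense_max, (1, 1, 1, 1), mode='reflect'),
                   weight, stride=2, groups=3)                  # (1, 3, 4, 4)
```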
If the channel argument is set to None, keep the filter kernel with only one channel, and then in the forward pass use PyTorch's .expand() function to match the input's channel count. I'm uncertain of the performance impact, so having this only as an optional behavior seems safest.
This helps with testing before finalizing a design, since you don't have to change the channel argument each time the channel count changes.
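A minimal sketch of what I mean (hypothetical, simplified: fixed filt_size=3 and zero padding for brevity, unlike the repo's reflection padding):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpandDownsample(nn.Module):
    """Keep a single-channel blur kernel; expand it to the input's channel
    count at call time, so channels never has to be specified up front."""
    def __init__(self, stride=2):
        super().__init__()
        self.stride = stride
        a = torch.tensor([1., 2., 1.])        # binomial coefficients, filt_size=3
        filt = (a[:, None] * a[None, :])
        filt = filt / filt.sum()
        # One-channel buffer; .expand() below broadcasts it without copying
        self.register_buffer('filt', filt[None, None, :, :])

    def forward(self, x):
        c = x.shape[1]
        weight = self.filt.expand(c, 1, -1, -1)   # (C, 1, 3, 3) view, no copy
        return F.conv2d(x, weight, stride=self.stride, padding=1, groups=c)
```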
The filters are built by multiplying the 1D kernel against its transposed copy, so it should work fine if it were kept as, for example, 1x7 instead of 7x7. Then run conv2d twice, with the second conv2d using the weights after swapping the width and height dimensions.
I'm uncertain whether this provides much of an improvement at size 3, but as the filter size grows it should be faster, since the number of reads per output grows linearly (twice the width) instead of quadratically (the width squared).
Edit: Sorry for the closed / reopen notifications, I thought I did something wrong when trying this again recently.
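A quick numpy sketch (my own) confirming the separable equivalence for a size-7 filter; the two-pass version reads 14 values per output instead of 49:

```python
import numpy as np

a = np.array([1.0, 6.0, 15.0, 20.0, 15.0, 6.0, 1.0])  # size-7 binomial kernel
a = a / a.sum()
rng = np.random.default_rng(0)
x = rng.random((16, 16))

# One pass: full 7x7 kernel (outer product of a with itself)
full = np.outer(a, a)
one_pass = np.zeros((10, 10))
for i in range(10):
    for j in range(10):
        one_pass[i, j] = (x[i:i + 7, j:j + 7] * full).sum()

# Two passes: 1x7 horizontal blur, then 7x1 vertical blur
horiz = np.apply_along_axis(lambda r: np.convolve(r, a, mode='valid'), 1, x)
two_pass = np.apply_along_axis(lambda c: np.convolve(c, a, mode='valid'), 0, horiz)
```

Since the 2D kernel is an outer product of a symmetric 1D kernel, the two results agree to floating-point precision.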
Hi. Have you measured how BlurPool affects the total number of multiply-adds needed by the model? I suppose it makes the models a bit more computationally heavy, and I wonder how large the increase is.
Hi,
As mentioned in the paper, a low-pass filter is used for the convolution after blurring. But I can't find its implementation in the code; the Conv2d filters seem to be learned by backpropagation.
Am I missing something?
Thanks,
Hi, according to the theory, strided downsampling layers should be revised. Should pooling with size=3, stride=1 also be replaced with your version?
Thx!
Hi, thanks for sharing the code!
I really enjoyed reading your paper.
As I understand it, you have mainly considered downsampling layers (pooling, strided convolution).
Does the same shift-variance issue exist for conv2d_transpose layers as well?
If it does, could you share your insights on how to replace the layer?
Thanks very much!
Hi,
I have trained a resnet50 model successfully, but when I train resnext50_32x4d, there is an error:
models_lpf/resnet.py", line 147, in forward
out += identity
RuntimeError: The size of tensor a (20) must match the size of tensor b (80) at non-singleton dimension 3.
In addition, in models_lpf/resnet.py, "groups=4, width_per_group=32" in resnext50_32x4d is different from "groups=32, width_per_group=4" in the official PyTorch code, torchvision/models/resnet.py.
Do you have any advice?
Thank you for your great work.
I have a question: is the blur_kernel trainable or fixed? I think it's a fixed Gaussian-like kernel used for blurring?
Is it good for autoencoders?
Hi, I want to train antialias-cnn on cub200 dataset. I use ResNet-50 backbone.
However, I do not see increase of accuracy with BlurPool after several experiments. Do you have any suggestions for antialias-cnn on fine-grained datasets (e.g. cub200)?
Thank you.
If stride=1, is there a difference between BlurPool and MaxPool?
First, thanks for the very nice work!
In your implementation as well as in the paper, it seems that the proposed filters (which are the binomial coefficients) are only valid for strides/downsampling factors of 2. Extrapolating from this, does it mean that I need to use the trinomial coefficients for stride 3, quadrinomial coefficients for stride 4, and so on?
By the way, you could simplify your code in downsample.py by using scipy.special.binom instead of hard-coding each filter, something like a = np.asarray([binom(filt_size-1, i) for i in range(filt_size)]), which will take care of arbitrary filt_size.
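The suggestion above as a complete helper (assuming scipy is available; the helper name is my own):

```python
import numpy as np
from scipy.special import binom

def blur_kernel_1d(filt_size):
    """Row (filt_size - 1) of Pascal's triangle, e.g. [1, 4, 6, 4, 1] for size 5."""
    return np.asarray([binom(filt_size - 1, i) for i in range(filt_size)])
```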
Awesome work.
Take a look at alias-free gan by Nvidia if you haven't already.
https://nvlabs.github.io/alias-free-gan/
The filters in this work only reduce aliasing by smoothing, but using a sinc-Kaiser filter as Alias-Free GAN does can almost completely remove aliasing. It would be very interesting to see how the network performs.