adobe / antialiased-cnns
pip install antialiased-cnns to improve stability and accuracy
Home Page: https://richzhang.github.io/antialiased-cnns/
License: Other
From the paper:
How do the learned convolutional filters change? Our proposed change smooths the internal feature maps for purposes of downsampling. How does training with this layer affect the learned convolutional layers? ... the anti-aliased networks (red-purple) actually learn smoother filters throughout the network, relative to the baseline (black). Adding in more aggressive low-pass filtering further decreases the TV (increasing smoothness). This indicates that our method actually induces a smoother feature extractor overall.
I wonder: is it possible to smooth the convolutional filters themselves instead of the feature maps?
hi @richzhang
Is it possible not to use a pad layer in the Downsample layer? Or could a depthwise conv replace it?
thanks
Hi, it's really refreshing to see signal processing principles used in deep networks. I have a question about the upsampling mechanism.
After going through the original code and a related issue #28, the following is my attempt to implement a bilinear upsampler:
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearUpsample(nn.Module):
    def __init__(self, channels=None, interpolation_factor=2):
        super(BilinearUpsample, self).__init__()
        self.channels = channels
        self.stride = interpolation_factor
        # 3x3 bilinear (tent) kernel, scaled so that a constant input stays constant
        lpf = torch.tensor([[1., 2., 1.],
                            [2., 4., 2.],
                            [1., 2., 1.]])
        lpf = lpf / torch.sum(lpf) * 4
        lpf = lpf.unsqueeze(dim=0).unsqueeze(dim=0)
        lpf = lpf.repeat([self.channels, 1, 1, 1])
        self.register_buffer('lpf', lpf)

    def forward(self, x):
        # Zero-stuff by the stride, then low-pass filter (per channel)
        return F.conv_transpose2d(input=x, weight=self.lpf, stride=self.stride,
                                  padding=1, output_padding=1, groups=self.channels)

Is this implementation correct?
The result seems very different from PyTorch bilinear interpolation.
Using
BilinearUpsample(channels=num_channels)(a)
vs
F.interpolate(a, scale_factor=2, mode='bilinear', align_corners=True)
returns very different output.
Hi,
we tried max-blur-pool for a text-reading model (CRNN).
This model consists of convolutional layers followed by recurrent layers (LSTM).
We also have pooling strides of e.g. (1, 2); in this case we use a filter of size 1 in one direction and of size 5 in the other, and compute their outer product.
I evaluate max-pool and max-blur-pool by shifting the input images by -4, -3, ..., +3 pixels in horizontal direction.
In this experiment we get a median amplitude of 0.107 for max-pool and 0.063 for max-blur-pool.
So here max-blur-pool performs better (the same applies when computing the mean of the amplitude values).
In a second experiment, max-pool performs much better, with a max. abs. difference of only 0.04% in character error rate, while max-blur-pool reaches up to 0.55%.
In this second experiment blur-pool performed worse than max-pool; however, we were comparing against the ground truth, so the difference could be caused by overall model performance.
We also want to know how consistent the model is: even if the recognized text is wrong, we want it to be the same for all shifts.
Again, max-pool performed (slightly) better than max-blur-pool, with a summed edit distance of 51 instead of 53.
P.S.: experiments were repeated for a blur filter size of 3 instead of 5, however, the results were worse, and I only wanted to use the best max-blur-pool model for comparison.
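The outer-product construction mentioned above can be sketched like this (my own illustration with numpy; size-1 and size-5 binomial kernels, as in the (1, 2)-stride setup):

```python
import numpy as np

# Asymmetric blur filter for a pooling stride of (1, 2): a size-1 kernel in
# one direction, a size-5 binomial kernel in the other, combined via an
# outer product and normalized so a constant input is preserved.
k1 = np.array([1.0])                       # size-1 kernel: no blur along this axis
k5 = np.array([1.0, 4.0, 6.0, 4.0, 1.0])   # size-5 binomial kernel
filt = np.outer(k1, k5)
filt = filt / filt.sum()
```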
import numpy as np
from keras import backend as K
from keras import layers as KL

class BlurPool(KL.Layer):
    """
    https://arxiv.org/abs/1904.11486 https://github.com/adobe/antialiased-cnns
    """
    def __init__(self, filt_size=5, stride=2, **kwargs):
        self.strides = (stride, stride)
        self.filt_size = filt_size
        pad_l = int(1. * (filt_size - 1) / 2)
        pad_r = int(np.ceil(1. * (filt_size - 1) / 2))
        self.padding = ((pad_l, pad_r), (pad_l, pad_r))
        # 1D binomial coefficients (rows of Pascal's triangle)
        if self.filt_size == 1:
            self.a = np.array([1.])
        elif self.filt_size == 2:
            self.a = np.array([1., 1.])
        elif self.filt_size == 3:
            self.a = np.array([1., 2., 1.])
        elif self.filt_size == 4:
            self.a = np.array([1., 3., 3., 1.])
        elif self.filt_size == 5:
            self.a = np.array([1., 4., 6., 4., 1.])
        elif self.filt_size == 6:
            self.a = np.array([1., 5., 10., 10., 5., 1.])
        elif self.filt_size == 7:
            self.a = np.array([1., 6., 15., 20., 15., 6., 1.])
        super(BlurPool, self).__init__(**kwargs)

    def compute_output_shape(self, input_shape):
        height = input_shape[1] // self.strides[0]
        width = input_shape[2] // self.strides[1]
        channels = input_shape[3]
        return (input_shape[0], height, width, channels)

    def call(self, x):
        # Build the 2D filter as an outer product, normalize, tile per channel
        k = self.a
        k = k[:, None] * k[None, :]
        k = k / np.sum(k)
        k = np.tile(k[:, :, None, None], (1, 1, K.int_shape(x)[-1], 1))
        k = K.constant(k, dtype=K.floatx())
        x = K.spatial_2d_padding(x, padding=self.padding)
        x = K.depthwise_conv2d(x, k, strides=self.strides, padding='valid')
        return x
Thank you for sharing this repo.
I remember that the kernel_size of the initial max pooling in the original ResNet is 3.
Why did you change the kernel_size of the initial max pooling in ResNet from 3 to 2?
https://github.com/adobe/antialiased-cnns/blob/master/models_lpf/resnet.py#L170
Is there a special reason?
Model weights are downloaded from AWS, but the download fails:
urllib.error.HTTPError: HTTP Error 403: Forbidden
Same issue as #45
Ubuntu 16.04 and 22.04
Do you have code for the downsample layer in TensorFlow? I really need it. Thank you for your help!
Why Conv->Relu->BlurPool and not Conv->BlurPool->Relu?
Just as a side note: for max pooling there is no difference. https://stackoverflow.com/questions/35543428/activation-function-after-pooling-layer-or-convolutional-layer
Thanks for your great work! I have two questions regarding the implementation details:
(1) In the case of a strided convolution, why is the BlurPool layer placed after the ReLU rather than right next to the convolution?
It would be much more flexible if the conv and blurpool can be coupled.
I was considering the implementation in the pre-activation resnet.
(2) This question might be silly, but why not apply bilinear interpolation layers to downsample the feature map? I haven't seen any work use it.
How would one use this for depthwise convolutions? Could groups be taken as an input and passed through to F.conv2d instead of input.shape[1]?
Thanks very much for the great work!
I am very curious about the visualization method in this paper. Do you have any plans to release the code for that part?
Thanks!
Thank you for your awesome work. I have a question:
What if I want to perform upsampling instead of downsampling? As I understand, aliasing is a problem only when downsampling. But I came across this paper A Style-Based Generator Architecture for Generative Adversarial Networks where they also blur during upsampling, citing your work. Here the blur was applied after upsampling (instead of before as in downsampling). Could you comment on that?
I intend to apply your technique in a VAE or GAN, and I would like to know whether I should include the blur in the decoder/generator.
Hello,
very nice work!
I also struggle with the spatial shift sensitivity of my 3D CNN, and I would like to test your solution.
Is anyone aware of a 3D implementation of these filters? Or can you guide us on the implementation?
Many thanks!
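Not an official implementation, but the 1D binomial kernel extends naturally to 3D by taking outer products along all three axes; the result could then be registered as a (C, 1, 3, 3, 3) buffer and applied with a grouped F.conv3d in PyTorch, mirroring the 2D layer. A numpy sketch of the kernel construction:

```python
import numpy as np

# Separable 3D blur kernel (sketch): outer product of the 1D binomial kernel
# along depth, height and width, normalized to unit sum.
a = np.array([1.0, 2.0, 1.0])  # 1D binomial kernel (filt_size=3)
filt3d = a[:, None, None] * a[None, :, None] * a[None, None, :]
filt3d = filt3d / filt3d.sum()
# In PyTorch this would then be applied per channel, e.g. (hypothetically)
# F.conv3d(x_padded, weight, stride=2, groups=channels).
```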
Dear Author:
Thank you for sharing your insight and code. I have a question as below
For the shift-equivariance visualization in Figure 5, it looks like the feature map always has the same resolution as the input image (32x32). How did you achieve that? As far as I know, the resolution of the feature map should decrease as downsampling is applied.
If bilinear interpolation is used, is that a fair feature-distance metric?
A more general question: how could one measure shift-equivariance between a pixel-level shift of the input image and the corresponding sub-pixel shift of the feature map?
Thank you in advance!
Files from AWS should load properly, but AWS returns Access Denied for any file:
urllib.error.HTTPError: HTTP Error 403: Forbidden
pip install antialiased_cnns
Then
import antialiased_cnns
model = antialiased_cnns.resnet50(pretrained=True)
Reproduced this both on my Mac and 2 Linux servers
Hi. Thanks for your brilliant work on solving the antialiasing problem. I find in the implementation of Antialiased-ResNet50, the order to apply the antialiased layers in the skip connections is different from the one in the main pathway.
The original skip connection is:
...->conv(stride=2)->...
Now it is:
...->blurpool(stride=2)->conv(stride=1)->...
But from my understanding, it should be:
...->conv(stride=1)->blurpool(stride=2)->...
And in this way, it is identical to how you deal with the Bottleneck.conv2 and Bottleneck.conv3.
Is there any reason the calculation order is inverse in the skip connections?
With baseline CNNs with no anti-aliasing, we see better shift consistency if we increase the CNN's depth, e.g. VGG11 -> VGG19, Resnet18 -> Resnet152. Why is that so?
Hello all,
I would like to train the model with CIFAR10.
But it gives an error when the blurring kernel size is larger than 3:
RuntimeError: Padding size should be less than the corresponding input dimension, but got: padding (2, 2) at dimension 3 of input [64, 512, 2, 2]
What do you suggest? Is there a way to apply blurpool for small images?
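One possible workaround (my own sketch, assuming the failure comes from the reflection pad needing to be smaller than the feature map): cap the filter size on tiny late-stage maps, such as the 2x2 maps a ResNet produces on CIFAR10.

```python
# Hypothetical helper (not from the repo): reflection padding of
# ceil((filt_size - 1) / 2) must be strictly smaller than the spatial
# dimension, so the largest safe size is 2 * spatial_dim - 1.
def safe_filt_size(filt_size, spatial_dim):
    return min(filt_size, 2 * spatial_dim - 1)
```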
Hello,
First of all: fantastic paper and contribution -- and the pypi package is the cherry on top :D
I decided to try switching one of my model trainings to use antialiased_cnns.resnet34
as a drop-in replacement for torchvision.models.resnet34
. It seems however that the memory needs are almost 1.5x higher with the anti-aliased CNN. This is based on the fact that with the torchvision version, my model trains with a batch size of 16 per GPU (it's a sequence model, so the actual number of images going through the CNN per batch is actually much higher). With the anti-aliased CNN, I get CUDA out of memory errors for any batch size above 11.
Were you aware of this? I'm not really expecting you to post a fix, just wondering if it makes sense to you and if you were already aware of it.
Thanks again!
Hi, could you provide the anti-aliased U-Net code that you used in pix2pix? I wonder how you apply blurring after upsampling. Thank you in advance!
@richzhang Hi. Thank you for work.
In TensorFlow 2.0, tf.image.resize supports gradients. Would it be more efficient, or better performing, to downsample with that function (various interpolation methods are provided)?
I wonder: is AvgPool already an anti-aliased downsample, just a special case with filter size 2?
antialiased-cnns/models_lpf/__init__.py
Line 29 in 3f3f960
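For what it's worth, a quick numpy check (my own sketch, not from the repo) shows that blurring with the size-2 box filter [1, 1] x [1, 1] / 4 and subsampling by 2 is exactly 2x2 average pooling:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((8, 8))

# 2x2 average pooling with stride 2
avg_pool = x.reshape(4, 2, 4, 2).mean(axis=(1, 3))

# BlurPool with filt_size=2: blur with the [1,1]x[1,1]/4 box filter,
# then subsample every other output
filt = np.outer([1.0, 1.0], [1.0, 1.0]) / 4.0
blurred = np.zeros((7, 7))
for i in range(7):
    for j in range(7):
        blurred[i, j] = (x[i:i + 2, j:j + 2] * filt).sum()
blur_pool = blurred[::2, ::2]
```

The two results match element-for-element, so filter size 2 does reduce to average pooling.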
Hi folks,
A lot of implementations use adaptive versions of poolings to support images with different resolutions (e.g. most of the models from this repo use it).
Is it possible to modify downsampling layer to support adaptive behaviour?
In a new python virtualenv, after pip installing the requirements, I ran python main.py --data /home/datasets/ImageNet -e -f 3 -a alexnet_lpf --weights ./weights/alexnet_lpf3.pth.tar
and expected evaluation output.
Traceback (most recent call last):
File "main.py", line 62, in <module>
import models_lpf
File "/home/aswanberg/antialiased-cnns/models_lpf/__init__.py", line 12, in <module>
from IPython import embed
ModuleNotFoundError: No module named 'IPython'
Seems like this would happen any time someone uses this repo with a fresh python install.
Ubuntu
Thanks for your great work.
Models applied to other tasks usually require an ImageNet-pretrained backbone.
If I want to use this module in my own backbone, is it necessary to pretrain on ImageNet?
Or can I just replace MaxPool with MaxBlurPool, load the original pretrained weights, and then train on the other task?
For example, you don't have ResNet-101 in weights/download_antialiased_models.sh, but I would like to fairly compare ResNet-101 using the same pretrained weights.
Hi,
interesting approach! How is this related to networks doing pooling in Fourier Domain, as discussed in http://ecmlpkdd2017.ijs.si/papers/paperID11.pdf (and the references therein)?
Hi author,
I want to cite your Figure 4 in my paper.
However, I can't find information about rights/permissions.
May I reuse your figure if I give clear attribution in my paper?
Looking forward to your answer, many thanks!
Hi
I am confused about the meaning of the pool_only argument in resnet.py and alexnet.py.
The default is False for AlexNet and True for ResNet, and I can't find where this argument is changed in main.py. I think pool_only=True corresponds to the original version of the model. Is that correct?
Thank you for sharing this great work and for open-sourcing your code!
I think that the things to watch out for could be avoided by defining the weights in the Downsample layer as buffers instead of parameters. See torch.nn.Module.register_buffer.
Hi!
I am confused about the code at L170-171 of resnet.py.
In the original implementation of ResNet, it should be 2D max pooling, i.e.
nn.MaxPool2d(kernel_size=2, stride=2)
Why do you use nn.MaxPool2d(kernel_size=2, stride=1) plus a Downsample layer (with a blur filter) instead?
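A small sketch (my own illustration, not the repo's code) of the decomposition the paper proposes, where a strided max pool is split into a dense max followed by blurred subsampling:

```python
import torch
import torch.nn.functional as F

# Anti-aliased max pooling (sketch):
#   MaxPool(k, stride=2)  ->  MaxPool(k, stride=1) + BlurPool(stride=2)
x = torch.randn(1, 3, 8, 8)

# 1) Dense (stride-1) max pooling: keeps the max operator, no subsampling yet
dense_max = F.max_pool2d(x, kernel_size=2, stride=1)            # (1, 3, 7, 7)

# 2) Blur with a size-3 binomial filter, then subsample by 2 (depthwise conv)
a = torch.tensor([1., 2., 1.])
filt = (a[:, None] * a[None, :]) / 16.0
weight = filt[None, None].repeat(3, 1, 1, 1)                    # (C, 1, 3, 3)
blurred = F.conv2d(F.pad(dense_max, (1, 1, 1, 1), mode='reflect'),
                   weight, stride=2, groups=3)                  # (1, 3, 4, 4)
```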
If the channel argument is set to None, keep the filter kernel with only one channel, and then in the forward pass use PyTorch's .expand() function to match the input's channel count. I'm uncertain of the performance impact, so having this only as an optional behavior seems safest.
This helps with testing before finalizing a design, since you don't have to change the channel argument each time the channel count changes.
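A minimal sketch of what I mean (hypothetical, simplified: fixed filt_size=3 and zero padding for brevity, unlike the repo's reflection padding):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpandDownsample(nn.Module):
    """Keep a single-channel blur kernel; expand it to the input's channel
    count at call time, so channels never has to be specified up front."""
    def __init__(self, stride=2):
        super().__init__()
        self.stride = stride
        a = torch.tensor([1., 2., 1.])        # binomial coefficients, filt_size=3
        filt = (a[:, None] * a[None, :])
        filt = filt / filt.sum()
        # One-channel buffer; .expand() below broadcasts it without copying
        self.register_buffer('filt', filt[None, None, :, :])

    def forward(self, x):
        c = x.shape[1]
        weight = self.filt.expand(c, 1, -1, -1)   # (C, 1, 3, 3) view, no copy
        return F.conv2d(x, weight, stride=self.stride, padding=1, groups=c)
```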
The filters are built by multiplying the 1D kernel against its transposed copy, so it should work fine if it were kept as, for example, 1x7 instead of 7x7. Then run conv2d twice, with the second conv2d using the weights after swapping the width and height dimensions.
I'm uncertain whether this provides much of an improvement at size 3, but as the filter size grows it should be faster, since the number of reads per output grows linearly (twice the width) instead of quadratically (the width squared).
Edit: Sorry for the closed / reopen notifications, I thought I did something wrong when trying this again recently.
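A quick numpy sketch (my own) confirming the separable equivalence for a size-7 filter; the two-pass version reads 14 values per output instead of 49:

```python
import numpy as np

a = np.array([1.0, 6.0, 15.0, 20.0, 15.0, 6.0, 1.0])  # size-7 binomial kernel
a = a / a.sum()
rng = np.random.default_rng(0)
x = rng.random((16, 16))

# One pass: full 7x7 kernel (outer product of a with itself)
full = np.outer(a, a)
one_pass = np.zeros((10, 10))
for i in range(10):
    for j in range(10):
        one_pass[i, j] = (x[i:i + 7, j:j + 7] * full).sum()

# Two passes: 1x7 horizontal blur, then 7x1 vertical blur
horiz = np.apply_along_axis(lambda r: np.convolve(r, a, mode='valid'), 1, x)
two_pass = np.apply_along_axis(lambda c: np.convolve(c, a, mode='valid'), 0, horiz)
```

Since the 2D kernel is an outer product of a symmetric 1D kernel, the two results agree to floating-point precision.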
Hi. Have you measured how BlurPool affects the total number of multiply-adds needed by the model? I suppose it makes the models a bit more computationally heavy, and I wonder how large the increase is.
Hi,
As mentioned in the paper, a low-pass filter is used for the convolution after blurring. But I can't find its implementation in the code; the Conv2d filters seem to be learned by backpropagation.
Am I missing something?
Thanks,
Hi, according to the theory, strided downsampling layers should be revised. Should pooling with size=3, stride=1 also be replaced with your version?
Thx!
Hi, thanks for sharing the code!
I really enjoyed reading your paper.
As I understand it, you have mainly considered downsampling layers (pooling, strided convolution).
Does the same shift-variance issue exist for conv2d_transpose layers as well?
If it does, could you share your insights on how to replace the layer?
Thanks very much!
Hi,
I have trained a resnet50 model successfully, but when I train resnext50_32x4d, there is an error:
models_lpf/resnet.py", line 147, in forward
out += identity
RuntimeError: The size of tensor a (20) must match the size of tensor b (80) at non-singleton dimension 3.
In addition, in models_lpf/resnet.py, "groups=4, width_per_group=32" in resnext50_32x4d is different from "groups=32, width_per_group=4" in the official PyTorch code, torchvision/models/resnet.py.
Do you have any advice?
Thank you for your great work.
I have a question: is the blur_kernel trainable or fixed? I think it's a fixed Gaussian-like kernel used for blurring?
Is it good for autoencoders?
Hi, I want to train antialias-cnn on cub200 dataset. I use ResNet-50 backbone.
However, I do not see increase of accuracy with BlurPool after several experiments. Do you have any suggestions for antialias-cnn on fine-grained datasets (e.g. cub200)?
Thank you.
If stride=1, is there a difference between BlurPool and MaxPool?
First, thanks for the very nice work!
In your implementation as well as in the paper, it seems that the proposed filters (which are the binomial coefficients) are only valid for strides/downsampling factors of 2. Extrapolating from this, does it mean that I need to use the trinomial coefficients for stride 3, quadrinomial coefficients for stride 4, and so on?
By the way, you could simplify your code in downsample.py by using scipy.special.binom instead of hard-coding each filter, something like a = np.asarray([binom(filt_size-1, i) for i in range(filt_size)]), which will take care of arbitrary filt_size.
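The suggestion above as a complete helper (assuming scipy is available; the helper name is my own):

```python
import numpy as np
from scipy.special import binom

def blur_kernel_1d(filt_size):
    """Row (filt_size - 1) of Pascal's triangle, e.g. [1, 4, 6, 4, 1] for size 5."""
    return np.asarray([binom(filt_size - 1, i) for i in range(filt_size)])
```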
Awesome work.
Take a look at alias-free gan by Nvidia if you haven't already.
https://nvlabs.github.io/alias-free-gan/
The filters in this work only reduce aliasing by smoothing, but using a sinc-Kaiser filter as Alias-Free GAN does can almost completely remove aliasing. It would be very interesting to see how the network performs.