
keras-resnext's Introduction

Keras ResNeXt

Implementation of ResNeXt models from the paper Aggregated Residual Transformations for Deep Neural Networks in Keras 2.0+.

Contains code for building the general ResNeXt model (optimized for datasets similar to CIFAR) and ResNeXtImageNet (optimized for the ImageNet dataset).

Salient Features

ResNeXt updates the ResNet block with a new, expanded block architecture that depends on the cardinality parameter. It is visualised in the diagram below, taken from the paper.

[Figure: the ResNeXt building block, parameterised by cardinality (from the paper)]


However, since grouped convolutions are not directly available in Keras, an equivalent variant is used in this repository (see block 2 in the figure below).

[Figure: the equivalent block forms (from the paper)]
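For illustration, here is a minimal sketch of how such an equivalent block can be emulated in Keras by slicing the channel axis with Lambda layers, running one small convolution per cardinality group, and concatenating the results. The helper name and layer sizes below are illustrative, not the repository's exact code, and a channels_last data format is assumed.

from keras.layers import Conv2D, Lambda, concatenate

def grouped_conv2d(x, grouped_channels, cardinality, strides=1):
    # Split the channel axis into `cardinality` slices, convolve each slice
    # independently, then concatenate the outputs (channels_last assumed).
    groups = []
    for i in range(cardinality):
        s = Lambda(lambda z, i=i: z[:, :, :, i * grouped_channels:(i + 1) * grouped_channels])(x)
        groups.append(Conv2D(grouped_channels, (3, 3), padding='same',
                             strides=strides, use_bias=False)(s))
    return concatenate(groups)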

Usage

For the general ResNeXt model (for all datasets other than ImageNet),

from resnext import ResNext

model = ResNext(image_shape, depth, cardinality, width, weight_decay)
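
A concrete (hedged) example for a CIFAR-10-sized input; the keyword names follow this README and the issues below, and the exact defaults may differ in resnext.py:

from keras import backend as K
from resnext import ResNext

image_shape = (32, 32, 3) if K.image_data_format() == 'channels_last' else (3, 32, 32)
model = ResNext(image_shape, depth=29, cardinality=8, width=4, weight_decay=5e-4, classes=10)
model.summary()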

For the ResNeXt model which has been optimized for ImageNet,

from keras import backend as K
from resnext import ResNextImageNet

image_shape = (112, 112, 3) if K.image_data_format() == 'channels_last' else (3, 112, 112)
model = ResNextImageNet(image_shape)

Note that there are other parameters, such as depth, cardinality, width and weight_decay, just as in the general model; however, the defaults are set according to the paper.
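
For example, the ResNeXt-50 (32x4d) configuration from the paper could be requested explicitly as below. The keyword names are assumed to mirror the general model, and 224x224 is the paper's standard ImageNet input size:

from keras import backend as K
from resnext import ResNextImageNet

image_shape = (224, 224, 3) if K.image_data_format() == 'channels_last' else (3, 224, 224)
model = ResNextImageNet(image_shape, cardinality=32, width=4, weight_decay=5e-4)
model.summary()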

keras-resnext's People

Contributors

atamazian, titu1994


keras-resnext's Issues

Where is the pretrained model?

Hi Somshubra,

I noticed that you didn't mention where to download the pretrained ResNeXt model, i.e. the ImageNet-pretrained weights without the top.

Best,
Lele

Parameter Count

Hi, if it's not too much trouble, I wish to implement ResNeXt-29, 16×64d according to the paper. May I know why the parameter count using (depth = 29, cardinality = 16, width = 64) is 320,956,352 in model.summary() instead of the 68.1M reported in the paper? Similarly, (depth = 29, cardinality = 8, width = 64) gives 89,700,288 parameters. Thanks a lot!
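
For anyone wanting to reproduce the numbers being compared, a short (hedged) sketch using the keyword names from the README:

from resnext import ResNext

for c in (8, 16):
    model = ResNext((32, 32, 3), depth=29, cardinality=c, width=64, classes=10)
    print('cardinality=%d -> %d parameters' % (c, model.count_params()))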

IMAGENET_TF_WEIGHTS_PATH isn't defined in resnext.py

I'm trying to use the ImageNet pretrained weights for the ResNeXt architecture.

While executing resnext.py, I'm receiving the error below.

raise ValueError("unknown url type: %r" % self.full_url)
ValueError: unknown url type: ''

After checking the code in detail, I found that the variable IMAGENET_TF_WEIGHTS_PATH is not properly defined; it must contain a valid URL.

Can you please provide your suggestion to resolve this issue?

Thanks.
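
Until the URL is filled in, one workaround is to build the architecture without requesting pretrained weights. The weights and include_top keywords are assumed here, following the Keras-applications-style constructors used elsewhere by this author:

from resnext import ResNextImageNet

# Build the graph only; no weight file is downloaded when weights=None.
model = ResNextImageNet((224, 224, 3), weights=None, include_top=False)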

Number of parameters

When I try to follow the original paper and instantiate the CIFAR-10 network (ResNeXt-29, 8x64d), the paper lists 34.4M parameters, but when I run ResNext(img_dim, depth=29, cardinality=8, width=64, classes=10), the resulting model summary outputs:

Total params: 89,700,288
Trainable params: 89,599,808
Non-trainable params: 100,480

What is the reason for the big difference?

Hello

First, I want to thank you for your implementation of Squeeze-and-Excitation Networks in Keras.
I tried your network and it worked very well, but I have a question.
As I understand it, ResNet is very similar to ResNeXt.
ResNet reduces the size of the input 5 times, for example:
256 -> 128, 128 -> 64, 64 -> 32, 32 -> 16, 16 -> 8.
But when I used the code you provided to create a ResNeXt, I end up with only 3 reductions of the size:
256 -> 128, 128 -> 64, 64 -> 32.
Is this normal?
Does this affect the precision of the network?

I used these inputs to create the network:
resnet_base = SEResNext(input_shape=input_shape,
                        # depth=155,
                        depth=56,
                        cardinality=64,
                        width=4,
                        weight_decay=5e-4,
                        include_top=False,
                        weights=None,
                        input_tensor=input_layer,
                        pooling=None)
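
One way to check this yourself is to collect the distinct spatial resolutions that appear in the model. A hedged sketch, using the ResNext constructor from this repository with illustrative arguments:

from resnext import ResNext

model = ResNext((256, 256, 3), depth=56, cardinality=8, width=4, classes=10)
sizes = set()
for layer in model.layers:
    shape = layer.output_shape
    if isinstance(shape, tuple) and len(shape) == 4:
        sizes.add(shape[1])  # spatial height, channels_last assumed
print(sorted(sizes, reverse=True))  # e.g. [256, 128, 64] would mean two reductions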

Pull request pending

Can you review pull request #25? It will convert this code into a package that can be installed with pip.

Splitting Tensors/Grouped Convolutions

@titu1994,

Awesome work!

In figure 3 of the paper:

[Figure 3 from the paper: three equivalent forms of the aggregated transform block]

Can you shed light on how these three different forms of the aggregated transform are equivalent? From looking at your code, it looks like you chose to implement method (b). Is this accurate? Also, I saw another implementation that uses Lambda layers to do something more akin to item (c). That is, if the previous layer's channel dimension is 64-d, for instance, and C = 32 (cardinality groups), then this would result in 64/32 = 2 feature maps per cardinality group as input to the 32 different convolutions. These feature maps would not overlap, and the sum of them across the cardinality groups will always equal 64-d in our example.

How is this the same as having 32 different convolutions all with 64-d channels as input? Your thoughts would be much appreciated!

EDIT: Other implementation - https://gist.github.com/mjdietzx/0cb95922aac14d446a6530f87b3a04ce
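
One way to convince yourself of the equivalence is by parameter counting: in forms (a)/(b), each branch's own 1x1 convolution reduces the full input, and the concatenation of those 1x1 convolutions is exactly one wider 1x1 convolution, after which the 3x3 convolutions act on disjoint channel slices, i.e. a grouped convolution as in (c). A hedged sketch with illustrative sizes (channels_last assumed):

from keras.layers import Input, Conv2D, Lambda, concatenate
from keras.models import Model

cardinality, group_width, in_channels = 32, 4, 64
inp = Input(shape=(32, 32, in_channels))

# Form (b): every branch sees the full 64-d input, but reduces it with its own
# 1x1 convolution before the 3x3 convolution.
branches = [Conv2D(group_width, (3, 3), padding='same')(Conv2D(group_width, (1, 1))(inp))
            for _ in range(cardinality)]
model_b = Model(inp, concatenate(branches))

# Form (c): one wide 1x1 convolution (the concatenation of all the branch 1x1
# convolutions), followed by a grouped 3x3 convolution emulated with Lambda slices.
x = Conv2D(cardinality * group_width, (1, 1))(inp)
groups = [Conv2D(group_width, (3, 3), padding='same')(
              Lambda(lambda z, i=i: z[:, :, :, i * group_width:(i + 1) * group_width])(x))
          for i in range(cardinality)]
model_c = Model(inp, concatenate(groups))

print(model_b.count_params(), model_c.count_params())  # the two counts match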

cannot import name 'ResNeXt'

When running either cifar10.py or cifar100.py I get:

Using TensorFlow backend.
Traceback (most recent call last):
  File "cifar10.py", line 15, in <module>
    from resnext import hello_ResNeXt
ImportError: cannot import name 'hello_ResNeXt'

I've checked that I have both TensorFlow and Keras 2.0.8. I've searched on SO, but everything I tried didn't solve the issue. One of the things I tried was putting everything in a single file, but it keeps crashing for the same reason.

Any ideas? Thanks!

Where is ImageNet Weights file?

Thanks for your implementation of ResNeXt, but I can't find 'resnext_imagenet_32_4_th_dim_ordering_th_kernels_no_top.h5'. I think this file is the ImageNet weights file. How can I get it? Thank you very much!

Running out of Memory

Hi, I am running out of memory while running it for CIFAR-10, with a cardinality of 8 and a width of 64. I am using an Nvidia 1080 Ti for training, with a batch size of 64. Is there any way I can avoid this issue? I tried using the multi_gpu option in Keras, but that slowed my training down by at least 10 times. Any suggestion would be really appreciated.
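
A hedged sketch of the simplest mitigations: a smaller batch size and/or a narrower configuration (the values below are illustrative, not a recommendation from the repository):

from keras.datasets import cifar10
from keras.utils import to_categorical
from resnext import ResNext

(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

model = ResNext((32, 32, 3), depth=29, cardinality=8, width=16, classes=10)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=32, epochs=100, validation_data=(x_test, y_test))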

Add a license

Can you add a LICENSE file? I suggest Apache License 2.0.

Is the model structure in the code the same as in the paper?

Hello, I have read your code and have some doubts about the structure; it does not seem the same as in the paper, especially this part:
I have seen the grouped conv, which has a 3x3 kernel, but I don't know where the 1x1 kernel's filters are in the residual block?
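
For reference, a hedged sketch of the bottleneck layout the paper describes: a 1x1 convolution to reduce the width, the grouped 3x3 stage (emulated with channel slices, as in the Salient Features section above), and a second 1x1 convolution to expand back. Layer sizes and names are illustrative, not the repository's exact code, and the shortcut is assumed to already have the expanded width:

from keras.layers import Conv2D, BatchNormalization, Activation, Lambda, add, concatenate

def resnext_bottleneck(x, grouped_channels=4, cardinality=32, expansion=256):
    # Assumes x already has `expansion` channels; otherwise a 1x1 projection
    # shortcut would be needed (channels_last assumed throughout).
    shortcut = x
    # First 1x1 convolution: reduce to cardinality * grouped_channels features.
    y = Conv2D(grouped_channels * cardinality, (1, 1), use_bias=False)(x)
    y = Activation('relu')(BatchNormalization()(y))
    # Grouped 3x3 stage, emulated by convolving disjoint channel slices.
    groups = []
    for i in range(cardinality):
        s = Lambda(lambda z, i=i: z[:, :, :, i * grouped_channels:(i + 1) * grouped_channels])(y)
        groups.append(Conv2D(grouped_channels, (3, 3), padding='same', use_bias=False)(s))
    y = Activation('relu')(BatchNormalization()(concatenate(groups)))
    # Second 1x1 convolution: this is where the 1x1 filters of the residual
    # block expand the width back to `expansion`.
    y = Conv2D(expansion, (1, 1), use_bias=False)(y)
    y = BatchNormalization()(y)
    return Activation('relu')(add([shortcut, y]))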
