brjathu / deepcaps
Official implementation of "DeepCaps: Going Deeper with Capsule Networks" (CVPR 2019).
License: MIT License
Hello,
The topic I'm referring to was already discussed in Issue 15.
Your paper says: "Reducing the number of routing iterations in the initial layers that are larger in size reduces the complexity ... In addition, using 3D-convolution-inspired routing in the middle layers ... reduces the number of parameters."
But in fact, you use your routing algorithm in just one single residual connection in the last block (not in the 'middle layers'), or have I misunderstood something here?
Do you have an experiment that shows that the routing in this single layer actually improves the accuracy?
If you follow the installation instructions and install a current version of scipy, you will end up with the following error:
Traceback (most recent call last):
File "ensemble.py", line 82, in <module>
x_test64 = resize(x_test)
File "ensemble.py", line 77, in resize
resized = scipy.misc.imresize(data_set[i], (64, 64))
AttributeError: module 'scipy' has no attribute 'misc'
After changing import scipy to import scipy.misc, the error changes to:
Traceback (most recent call last):
File "ensemble.py", line 82, in <module>
x_test64 = resize(x_test)
File "ensemble.py", line 77, in resize
resized = scipy.misc.imresize(data_set[i], (64, 64))
AttributeError: module 'scipy.misc' has no attribute 'imresize'
A potential workaround I found was manually downgrading to scipy 1.2, but it still gives this large warning:
ensemble.py:77: DeprecationWarning: `imresize` is deprecated!
`imresize` is deprecated in SciPy 1.0.0, and will be removed in 1.3.0.
Use Pillow instead: ``numpy.array(Image.fromarray(arr).resize())``.
resized = scipy.misc.imresize(data_set[i], (64, 64))
The DeprecationWarning should probably be fixed.
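The deprecation message itself points at the fix. A sketch of a Pillow-based replacement for the resize helper (the function name and the 64x64 target come from the traceback above; note that, unlike imresize, this does not rescale dtypes automatically):

```python
import numpy as np
from PIL import Image

def resize(data_set, size=(64, 64)):
    """Resize a batch of images with Pillow, replacing the removed scipy.misc.imresize."""
    resized = [np.array(Image.fromarray(img).resize(size)) for img in data_set]
    return np.stack(resized)
```

This keeps the current scipy installed and removes the DeprecationWarning entirely.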
Thank you so much for your excellent work; I am very interested in it. But I am a PyTorch user, and it is a little difficult for me to read some of the TensorFlow code. For the 28x28 DeepCaps model, I noticed that the feature map going into the Conv3D capsule layer is 2x2 while the kernel is 3x3, so I would like to know the details of your padding. If possible, I hope you can explain it in natural language.
l_skip = ConvCapsuleLayer3D(kernel_size=3, num_capsule=32, num_atoms=8, strides=1, padding='same', routings=3)(l)
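Regarding the 2x2 feature map with a 3x3 kernel: under TF/Keras semantics, padding='same' with stride 1 zero-pads so the output keeps the input's spatial size, so no shape conflict arises. A quick sketch of the standard output-size arithmetic (a generic helper, not code from this repo):

```python
import math

def conv_output_size(n, kernel, stride, padding):
    """Spatial output size of a convolution under TF/Keras padding rules."""
    if padding == 'same':
        # 'same' pads so that output = ceil(input / stride), regardless of kernel size
        return math.ceil(n / stride)
    if padding == 'valid':
        return math.floor((n - kernel) / stride) + 1
    raise ValueError(f"unknown padding: {padding}")

# A 2x2 map convolved with a 3x3 kernel, stride 1, 'same' padding stays 2x2:
print(conv_output_size(2, 3, 1, 'same'))  # -> 2
```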
@brjathu Hi, thanks for your great work. I want to know whether this network is suitable for two-class image data, such as cat and dog images with shape (64, 64, 3). I trained the model on my private medical images (two classes), but I get the following strange loss, and the accuracy never improves. Can you give me some advice about it?
Epoch 46/500
107/106 [==============================] - 15s 137ms/step - loss: 0.6319 - capsnet_loss: 0.2150 - decoder_loss: 1.0422 - capsnet_acc: 0.4690 - val_loss: 0.6452 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0853 - val_capsnet_acc: 0.4788
Epoch 47/500
107/106 [==============================] - 15s 136ms/step - loss: 0.6323 - capsnet_loss: 0.2158 - decoder_loss: 1.0413 - capsnet_acc: 0.4670 - val_loss: 0.6449 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0846 - val_capsnet_acc: 0.4788
Epoch 48/500
107/106 [==============================] - 15s 138ms/step - loss: 0.6331 - capsnet_loss: 0.2150 - decoder_loss: 1.0453 - capsnet_acc: 0.4690 - val_loss: 0.6457 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0867 - val_capsnet_acc: 0.4788
Epoch 49/500
107/106 [==============================] - 15s 139ms/step - loss: 0.6330 - capsnet_loss: 0.2158 - decoder_loss: 1.0428 - capsnet_acc: 0.4670 - val_loss: 0.6446 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0838 - val_capsnet_acc: 0.4788
Epoch 50/500
107/106 [==============================] - 15s 136ms/step - loss: 0.6324 - capsnet_loss: 0.2158 - decoder_loss: 1.0415 - capsnet_acc: 0.4670 - val_loss: 0.6435 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0811 - val_capsnet_acc: 0.4788
Epoch 51/500
107/106 [==============================] - 15s 140ms/step - loss: 0.6295 - capsnet_loss: 0.2150 - decoder_loss: 1.0361 - capsnet_acc: 0.4690 - val_loss: 0.6431 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0802 - val_capsnet_acc: 0.4788
Epoch 52/500
107/106 [==============================] - 15s 138ms/step - loss: 0.6317 - capsnet_loss: 0.2167 - decoder_loss: 1.0377 - capsnet_acc: 0.4650 - val_loss: 0.6433 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0807 - val_capsnet_acc: 0.4788
Epoch 53/500
107/106 [==============================] - 15s 136ms/step - loss: 0.6274 - capsnet_loss: 0.2150 - decoder_loss: 1.0309 - capsnet_acc: 0.4690 - val_loss: 0.6460 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0874 - val_capsnet_acc: 0.4788
Epoch 54/500
107/106 [==============================] - 15s 138ms/step - loss: 0.6333 - capsnet_loss: 0.2175 - decoder_loss: 1.0395 - capsnet_acc: 0.4629 - val_loss: 0.6434 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0809 - val_capsnet_acc: 0.4788
Epoch 55/500
107/106 [==============================] - 15s 137ms/step - loss: 0.6316 - capsnet_loss: 0.2158 - decoder_loss: 1.0394 - capsnet_acc: 0.4670 - val_loss: 0.6433 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0806 - val_capsnet_acc: 0.4788
Epoch 56/500
107/106 [==============================] - 15s 136ms/step - loss: 0.6315 - capsnet_loss: 0.2150 - decoder_loss: 1.0412 - capsnet_acc: 0.4690 - val_loss: 0.6440 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0823 - val_capsnet_acc: 0.4788
Epoch 57/500
107/106 [==============================] - 15s 139ms/step - loss: 0.6295 - capsnet_loss: 0.2158 - decoder_loss: 1.0341 - capsnet_acc: 0.4670 - val_loss: 0.6434 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0808 - val_capsnet_acc: 0.4788
Epoch 58/500
107/106 [==============================] - 14s 135ms/step - loss: 0.6300 - capsnet_loss: 0.2158 - decoder_loss: 1.0355 - capsnet_acc: 0.4670 - val_loss: 0.6428 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0794 - val_capsnet_acc: 0.4788
Epoch 59/500
107/106 [==============================] - 15s 136ms/step - loss: 0.6305 - capsnet_loss: 0.2150 - decoder_loss: 1.0388 - capsnet_acc: 0.4690 - val_loss: 0.6430 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0800 - val_capsnet_acc: 0.4788
Epoch 60/500
107/106 [==============================] - 14s 134ms/step - loss: 0.6297 - capsnet_loss: 0.2158 - decoder_loss: 1.0347 - capsnet_acc: 0.4670 - val_loss: 0.6429 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0796 - val_capsnet_acc: 0.4788
Epoch 61/500
107/106 [==============================] - 15s 139ms/step - loss: 0.6319 - capsnet_loss: 0.2167 - decoder_loss: 1.0382 - capsnet_acc: 0.4650 - val_loss: 0.6451 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0852 - val_capsnet_acc: 0.4788
Epoch 62/500
107/106 [==============================] - 14s 135ms/step - loss: 0.6306 - capsnet_loss: 0.2158 - decoder_loss: 1.0368 - capsnet_acc: 0.4670 - val_loss: 0.6430 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0799 - val_capsnet_acc: 0.4788
Epoch 63/500
107/106 [==============================] - 15s 138ms/step - loss: 0.6267 - capsnet_loss: 0.2142 - decoder_loss: 1.0314 - capsnet_acc: 0.4710 - val_loss: 0.6425 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0787 - val_capsnet_acc: 0.4788
Epoch 64/500
107/106 [==============================] - 15s 137ms/step - loss: 0.6302 - capsnet_loss: 0.2167 - decoder_loss: 1.0339 - capsnet_acc: 0.4650 - val_loss: 0.6424 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0783 - val_capsnet_acc: 0.4788
Epoch 65/500
107/106 [==============================] - 15s 137ms/step - loss: 0.6321 - capsnet_loss: 0.2158 - decoder_loss: 1.0406 - capsnet_acc: 0.4670 - val_loss: 0.6425 - val_capsnet_loss: 0.2110 - val_decoder_loss: 1.0787 - val_capsnet_acc: 0.4788
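A flat loss like the one above is easier to interpret against the margin loss that capsule networks typically use. A NumPy sketch of the margin loss from Sabour et al. (2017); the constants m+ = 0.9, m- = 0.1, lambda = 0.5 are that paper's defaults, not values read from this log:

```python
import numpy as np

def margin_loss(y_true, v_norm, m_plus=0.9, m_minus=0.1, lam=0.5):
    """Capsule margin loss. y_true: one-hot labels; v_norm: output-capsule lengths."""
    present = y_true * np.maximum(0.0, m_plus - v_norm) ** 2
    absent = lam * (1.0 - y_true) * np.maximum(0.0, v_norm - m_minus) ** 2
    return float(np.mean(np.sum(present + absent, axis=-1)))

# If both capsule lengths sit near 0.5 regardless of the label, the loss
# plateaus around 0.16 + 0.08 ~ 0.24, the same order as the stuck value above.
print(margin_loss(np.array([[1.0, 0.0]]), np.array([[0.5, 0.5]])))
```

A plateau of this shape often means the output capsule lengths are frozen near a fixed value, which usually points at the data pipeline or labels rather than the loss itself.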
I followed your instructions precisely, but it fails because it tries to create nested folders with mkdir without using the -p flag.
mkdir: cannot create directory ‘model/CIFAR10/13’: No such file or directory
cp: cannot create regular file 'model/CIFAR10/13/deepcaps.py': No such file or directory
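Until the script gains mkdir -p, the nested directory can be created up front; a minimal Python equivalent (the path is taken from the error message above):

```python
import os

save_dir = 'model/CIFAR10/13'  # path taken from the error message above
os.makedirs(save_dir, exist_ok=True)  # creates intermediate dirs; no error if present
```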
There seems to be some outdated code in ensemble.py, which causes it to fail towards the end:
Traceback (most recent call last):
File "ensemble.py", line 96, in <module>
d1 = np.load("deepcaps_1.npy")
File "/home/ubuntu/miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 428, in load
fid = open(os_fspath(file), "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'deepcaps_1.npy'
Hi, I am working with your code, but I got poor results on the Tiny-ImageNet-200 dataset. The input images are 64 x 64 and DeepCaps has about 138 M parameters.
Do you know the best hyperparameters for this dataset, and what accuracy did you obtain in the end?
Thanks for your help!
Best regards.
Hi. Thank you for uploading your code.
Could you please clarify the architecture you used for MNIST and FashionMNIST? Is it the same as the one used for CIFAR-10 and SVHN?
Thank you
Hi. Thank you for uploading your code.
Would you let me know the hyperparameters for CIFAR10?
I tried
model, eval_model = DeepCapsNet(input_shape=x_train.shape[1:], n_class=y_train.shape[1], routings=args.routings) # for 64*64
batch_size = 32
to train on a single 1080 Ti GPU, but it does not seem to converge.
Thank you.
@brjathu Hi. I've read your paper and it's great that you've shared your code. Thanks a lot. I have a few doubts about the ConvCapsuleLayer3D layer. Could you clarify them for me?
1. A softmax_3D function is given, with the denominator summed over the spatial positions and the capsules in the (l+1)-th layer. The implementation in update_routing, however, uses nn.softmax, whose denominator sums over only the capsules. Could you clarify this?
2. In the update_routing function, I am unable to understand the reshaping operations. It seems that the batch-size dimension is being changed. Could you clarify the sizes used?

I am facing this error while training.
I have used the following args:
class args:
    numGPU = 1
    epochs = 100
    batch_size = 8
    lr = 0.001
    lr_decay = 0.96
    lam_recon = 0.4
    r = 3
    routings = 3
    shift_fraction = 0.1
    debug = False
    digit = 5
    save_dir = 'model/CIFAR10/13'
    t = False
    w = None
    ep_num = 0
    dataset = "CIFAR10"
The error is common to all datasets.
@brjathu Hi, thanks for your great work. I found that after one epoch of training, a "NameError: name 'warnings' is not defined" error is thrown. Why does this happen? Can you give me some advice about it?
Hello author, I am very interested in your paper and experiments. Could you please provide a detailed environment configuration file, such as a requirements.txt? I see that your README doesn't give many operational details for reproducing the experiments. Thank you very much if you can reply.
Hi @brjathu, thanks for your great work and for providing the code. I have a doubt about the dimension along which the squash function is applied.
From capslayers.py lines 579-585, I understand that the norm of the tensor is computed along the last dimension (axis=-1).
In capslayers.py line 351, is it correct that the activations have shape
[batch_size, # of output capsules, # of neurons in output capsules, height, width]
? If this is correct, squashing along axis=-1 would take the norm along the width rather than along the capsule neurons.
Can you please explain what I am missing in the code? Thank you very much.
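For reference, the squash nonlinearity from Sabour et al. (2017) and the effect of its axis argument can be sketched in NumPy; whether axis=-1 hits the capsule neurons or the image width depends entirely on the tensor layout quoted above (this is a generic sketch, not this repo's code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-7):
    """Scale vectors along `axis` to length in [0, 1) while preserving direction."""
    sq_norm = np.sum(np.square(s), axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# With layout [batch, # capsules, # neurons, height, width], squashing the
# capsule neurons means axis=2; axis=-1 would normalise along the width instead.
x = np.random.default_rng(0).standard_normal((1, 4, 8, 6, 6))
lengths = np.linalg.norm(squash(x, axis=2), axis=2)
print(bool(lengths.max() < 1.0))  # -> True
```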
When running training on multiple GPUs, it fails while creating the Keras model:
AttributeError: '_TfDeviceCaptureOp' object has no attribute '_set_device_from_string'
This seems to be related to TF1.14.0 and Keras<2.3.0:
tensorflow/tensorflow#30728
I ran the code directly and only got about 10% accuracy on the MNIST training set. On the validation set, the accuracy stays at 0.0000e+00. Is there anything wrong? lol.
Epoch 00001: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00001: capsnet_accuracy improved from -inf to 0.10517, saving model to model/CIFAR10/13/best_weights_2.h5
234/234 [==============================] - 157s 672ms/step - loss: 0.5766 - capsnet_loss: 0.5524 - decoder_loss: 0.0605 - capsnet_accuracy: 0.1052 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 2/100
234/234 [==============================] - ETA: 0s - loss: 0.5477 - capsnet_loss: 0.5241 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1057
Epoch 00002: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00002: capsnet_accuracy improved from 0.10517 to 0.10573, saving model to model/CIFAR10/13/best_weights_2.h5
234/234 [==============================] - 156s 668ms/step - loss: 0.5477 - capsnet_loss: 0.5241 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1057 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 3/100
234/234 [==============================] - ETA: 0s - loss: 0.5477 - capsnet_loss: 0.5241 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1057
Epoch 00003: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00003: capsnet_accuracy did not improve from 0.10573
234/234 [==============================] - 156s 666ms/step - loss: 0.5477 - capsnet_loss: 0.5241 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1057 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 4/100
234/234 [==============================] - ETA: 0s - loss: 0.5478 - capsnet_loss: 0.5241 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1051
Epoch 00004: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00004: capsnet_accuracy did not improve from 0.10573
234/234 [==============================] - 156s 666ms/step - loss: 0.5478 - capsnet_loss: 0.5241 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1051 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 5/100
234/234 [==============================] - ETA: 0s - loss: 0.5477 - capsnet_loss: 0.5241 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1068
Epoch 00005: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00005: capsnet_accuracy improved from 0.10573 to 0.10682, saving model to model/CIFAR10/13/best_weights_2.h5
234/234 [==============================] - 156s 667ms/step - loss: 0.5477 - capsnet_loss: 0.5241 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1068 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 6/100
234/234 [==============================] - ETA: 0s - loss: 0.5475 - capsnet_loss: 0.5239 - decoder_loss: 0.0590 - capsnet_accuracy: 0.1073
Epoch 00006: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00006: capsnet_accuracy improved from 0.10682 to 0.10732, saving model to model/CIFAR10/13/best_weights_2.h5
234/234 [==============================] - 156s 666ms/step - loss: 0.5475 - capsnet_loss: 0.5239 - decoder_loss: 0.0590 - capsnet_accuracy: 0.1073 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 7/100
234/234 [==============================] - ETA: 0s - loss: 0.5476 - capsnet_loss: 0.5240 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1080
Epoch 00007: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00007: capsnet_accuracy improved from 0.10732 to 0.10803, saving model to model/CIFAR10/13/best_weights_2.h5
234/234 [==============================] - 156s 667ms/step - loss: 0.5476 - capsnet_loss: 0.5240 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1080 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 8/100
234/234 [==============================] - ETA: 0s - loss: 0.5475 - capsnet_loss: 0.5239 - decoder_loss: 0.0590 - capsnet_accuracy: 0.1092
Epoch 00008: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00008: capsnet_accuracy improved from 0.10803 to 0.10917, saving model to model/CIFAR10/13/best_weights_2.h5
234/234 [==============================] - 156s 666ms/step - loss: 0.5475 - capsnet_loss: 0.5239 - decoder_loss: 0.0590 - capsnet_accuracy: 0.1092 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 9/100
234/234 [==============================] - ETA: 0s - loss: 0.5475 - capsnet_loss: 0.5238 - decoder_loss: 0.0590 - capsnet_accuracy: 0.1059
Epoch 00009: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00009: capsnet_accuracy did not improve from 0.10917
234/234 [==============================] - 156s 667ms/step - loss: 0.5475 - capsnet_loss: 0.5238 - decoder_loss: 0.0590 - capsnet_accuracy: 0.1059 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 10/100
234/234 [==============================] - ETA: 0s - loss: 0.5475 - capsnet_loss: 0.5239 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1072
Epoch 00010: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00010: capsnet_accuracy did not improve from 0.10917
234/234 [==============================] - 156s 666ms/step - loss: 0.5475 - capsnet_loss: 0.5239 - decoder_loss: 0.0591 - capsnet_accuracy: 0.1072 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 0.0010
Epoch 11/100
234/234 [==============================] - ETA: 0s - loss: 0.5472 - capsnet_loss: 0.5236 - decoder_loss: 0.0589 - capsnet_accuracy: 0.1115
Epoch 00011: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00011: capsnet_accuracy improved from 0.10917 to 0.11146, saving model to model/CIFAR10/13/best_weights_2.h5
234/234 [==============================] - 156s 668ms/step - loss: 0.5472 - capsnet_loss: 0.5236 - decoder_loss: 0.0589 - capsnet_accuracy: 0.1115 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 5.0000e-04
Epoch 12/100
234/234 [==============================] - ETA: 0s - loss: 0.5472 - capsnet_loss: 0.5236 - decoder_loss: 0.0590 - capsnet_accuracy: 0.1114
Epoch 00012: saving model to model/CIFAR10/13/best_weights_1.h5
Epoch 00012: capsnet_accuracy did not improve from 0.11146
234/234 [==============================] - 156s 667ms/step - loss: 0.5472 - capsnet_loss: 0.5236 - decoder_loss: 0.0590 - capsnet_accuracy: 0.1114 - val_loss: 0.0000e+00 - val_capsnet_loss: 0.0000e+00 - val_decoder_loss: 0.0000e+00 - val_capsnet_accuracy: 0.0000e+00 - lr: 5.0000e-04
Hi, I really want to understand from the code how the 3D-convolution-based dynamic routing works.
In the code it just applies Conv2D without any routing procedure.
If so, the proposed 3D dynamic routing is just Conv2D with a different interpretation.
Could you explain more about the details?
Thanks!
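For comparison, plain dynamic routing by agreement from Sabour et al. (2017), which the 3D-convolution-inspired variant is meant to generalise, can be sketched as follows (a reference sketch, not a transcription of this repo's code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-7):
    """Scale vectors along `axis` to length in [0, 1)."""
    n2 = np.sum(s * s, axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def dynamic_routing(u_hat, iterations=3):
    """u_hat: prediction vectors [in_caps, out_caps, dim] -> output capsules [out_caps, dim]."""
    b = np.zeros(u_hat.shape[:2])  # routing logits, one per (input, output) pair
    for _ in range(iterations):
        # softmax over output capsules: each input capsule distributes its vote
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)  # coupling-weighted sum of predictions
        v = squash(s)
        b = b + (u_hat * v[None]).sum(axis=-1)  # agreement (dot product) updates logits
    return v

v = dynamic_routing(np.random.default_rng(0).standard_normal((8, 3, 4)))
print(v.shape)  # (3, 4)
```

The key difference from a plain convolution is the iterative, data-dependent coupling coefficients c; a Conv2D/Conv3D alone computes only the prediction vectors, not the routing.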