
naturomics / capsnet-tensorflow

3.8K stars · 249 watchers · 1.2K forks · 1.59 MB

A TensorFlow implementation of CapsNet (Capsule Network) from the paper Dynamic Routing Between Capsules

License: Apache License 2.0

Languages: Python 98.18%, R 1.82%
Topics: capsnet, tensorflow, capsule, capsule-network, routing-algorithm, dynamic-routing

capsnet-tensorflow's People

Contributors

cove9988, naman-bhalla, naturomics, parthsuresh, wfus, www0wwwjs1, xdxx


capsnet-tensorflow's Issues

Improper loop of b_IJ

Hi,
Thanks for your great work. I found that b_IJ is updated in order of j in your code.
In CapsConv

            for j in range(self.num_outputs):
                with tf.variable_scope('caps_' + str(j)):
                    caps_j, b_IJ = capsule(input, b_IJ, j)
                    capsules.append(caps_j)

In capsule
c_IJ = tf.nn.softmax(b_IJ, dim=2)

In your case, b_I(J+1) is not independent of b_IJ, which means the order matters to the routing process. But in my opinion, all b_IJ should be updated in parallel (see the sketch below). Thanks in advance for your reply!
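
A minimal sketch of the parallel update being suggested, assuming b_IJ has shape [batch_size, 1152, 10, 1, 1] and u_hat has shape [batch_size, 1152, 10, 16, 1] (shapes taken from the code quoted in other issues here); the softmax and the agreement update run over all j at once instead of inside a per-j loop:

    # hypothetical parallel routing step, TensorFlow 1.x style
    c_IJ = tf.nn.softmax(b_IJ, dim=2)                          # softmax over all output capsules j at once
    s_J = tf.reduce_sum(c_IJ * u_hat, axis=1, keep_dims=True)  # [batch_size, 1, 10, 16, 1]
    v_J = squash(s_J)
    v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
    b_IJ += tf.matmul(u_hat, v_J_tiled, transpose_a=True)      # every b_ij updated in one shot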

Routing algorithm

To the owner and all other visitors:

I do not mean to be offensive, but I decided to share my understanding of this routing algorithm, as I have not yet seen any implementation I consider correct.

The routing algorithm should be implemented like a dynamic RNN in TensorFlow. In other words, if you implement it statically and run 3 iterations, the two caps layers are actually 6 such layers: the primary layer performs line 4 and outputs to the digit layer, then the digit layer performs lines 5, 6, and 7 with b_ij updated, and then control loops back to the primary layer again. Doing this dynamically requires tf.while_loop, as sketched below.
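
For what it's worth, a rough tf.while_loop skeleton of that idea; it is a sketch under the shape assumptions quoted elsewhere in these issues (u_hat: [batch_size, 1152, 10, 16, 1]), not the repo's actual code:

    def condition(b_IJ, v_J, r_iter):
        return tf.less(r_iter, cfg.iter_routing)

    def body(b_IJ, v_J, r_iter):
        c_IJ = tf.nn.softmax(b_IJ, dim=2)                          # line 4
        s_J = tf.reduce_sum(c_IJ * u_hat, axis=1, keep_dims=True)  # line 5
        v_J = squash(s_J)                                          # line 6
        v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
        b_IJ += tf.matmul(u_hat, v_J_tiled, transpose_a=True)      # line 7
        return b_IJ, v_J, r_iter + 1

    v_J_init = tf.zeros([cfg.batch_size, 1, 10, 16, 1])
    b_IJ, v_J, _ = tf.while_loop(condition, body, [b_IJ, v_J_init, tf.constant(0)])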

What confuses me, or stops me from implementing it myself, is that I am not sure how the weights and biases associated with the conv units are updated; I assume that, in addition to the weights and biases associated with the capsules, each individual conv unit inside still carries its own parameters. Maybe I missed this when reading the paper.

Feel free to correct me if you believe I am wrong. Thanks.

Training on different input dimensions than MNIST

Thanks for writing the code so soon after the article was released. I'm trying to change the structure so that the capsule network can be trained on any image (x, y, z), but I am having trouble restructuring the code. Can you help me identify which lines need to be modified? I am guessing all lines with ... 28, 28, 1) -> ... 32, 32, 3) for CIFAR-10, but I am still not able to make it work (see the shape sketch below).

Thank you again 👍
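
For anyone hitting the same wall, a hedged sketch of the shape arithmetic involved for CIFAR-10; the names mirror code quoted in other issues here, and the derived sizes follow standard VALID-padding conv arithmetic:

    X = tf.placeholder(tf.float32, shape=(cfg.batch_size, 32, 32, 3))  # was (..., 28, 28, 1)
    # Conv1: 9x9 kernel, stride 1, VALID     -> 32 - 9 + 1 = 24        (was 20)
    # PrimaryCaps conv: 9x9, stride 2, VALID -> (24 - 9) // 2 + 1 = 8  (was 6)
    num_primary_caps = 8 * 8 * 32   # 2048, so every hard-coded 1152 must change as well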

reshape question

In CapsNet's __init__() else branch, how can the label (a placeholder with shape (batch_size,)) be reshaped to (batch_size, 10, 1)?

Reshape is correct?

capsules = tf.reshape(capsules, (cfg.batch_size, -1, self.vec_len, 1))

I think this line does not preserve the following from the paper:

"Each primary capsule output sees the outputs of all 256 × 81 Conv1 units whose receptive fields overlap with the location of the center of the capsule."

i.e., we should ensure that the first capsule after the reshape corresponds to pixel [0, 0] of the first 8 filters, the second to [0, 1], and so on.
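
A minimal sketch of a reshape that makes the correspondence explicit, assuming the PrimaryCaps conv output conv_out has shape [batch_size, 6, 6, 256] in NHWC layout with the channels grouped as 32 capsule types x 8 dimensions (an illustration, not a claim about the repo's version):

    caps = tf.reshape(conv_out, (cfg.batch_size, 6, 6, 32, 8))  # split channels into 32 capsules x 8 dims per location
    capsules = tf.reshape(caps, (cfg.batch_size, -1, 8, 1))     # [batch_size, 6*6*32, 8, 1], location-major order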

How to test a new image?

Hi,

If I want to load an image and get its softmax score, how do I write the script?
I've been trying for several hours; since I'm a beginner in TensorFlow it's kind of difficult for me.

    with tf.Graph().as_default():
        image = tf.cast(image_arr, tf.float32)
        image = tf.image.per_image_standardization(image)
        image = tf.reshape(image, [1,28, 28, 1])
        #x = tf.placeholder(tf.float32,shape = [1,28, 28, 1])
        feature=CapsNet.test_net(image)
        logits = tf.nn.softmax(feature)
        #saver = tf.train.Saver()
        aaa=1
        with tf.Session() as sess:
            saver = tf.train.import_meta_graph('./logdir/model_epoch_0048_step_23899.meta')
            saver.restore(sess, './logdir/model_epoch_0048_step_23899')
            print(image.shape)
            test_result = sess.run(aaa,image)
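
For reference, a hedged sketch of how such an inference script might look. The attribute names capsNet.X and capsNet.softmax_v are assumptions mirroring code quoted in other issues here, cfg.batch_size is assumed to be set to 1, and image_arr is the array from the snippet above:

    import numpy as np
    import tensorflow as tf
    from capsNet import CapsNet

    capsNet = CapsNet(is_training=False)   # rebuild the same graph used for training
    with tf.Session() as sess:
        saver = tf.train.Saver()
        saver.restore(sess, './logdir/model_epoch_0048_step_23899')
        img = image_arr.reshape(1, 28, 28, 1).astype(np.float32) / 255.
        # hypothetical tensor names; they must match the graph actually built
        scores = sess.run(capsNet.softmax_v, feed_dict={capsNet.X: img})
        print(np.argmax(scores))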

ValueError: Dimensions must be equal, but are 16 and 128 for 'sub_3' (op: 'Sub') with input shapes: [128,10,16,784], [128,784].

Hi, nice work! But I got an error on my local machine:

python train.py
Traceback (most recent call last):
  File "train.py", line 11, in <module>
    capsNet = CapsNet(is_training=cfg.is_training)
  File "/home/joffrey/projects/CapsNet-Tensorflow/capsNet.py", line 16, in __init__
    self.loss()
  File "/home/joffrey/projects/CapsNet-Tensorflow/capsNet.py", line 84, in loss
    squared = tf.square(self.decoded - orgin)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py", line 865, in binary_op_wrapper
    return func(x, y, name=name)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2629, in _sub
    result = _op_def_lib.apply_op("Sub", x=x, y=y, name=name)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2632, in create_op
    set_shapes_for_outputs(ret)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1911, in set_shapes_for_outputs
    shapes = shape_func(op)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1861, in call_with_requiring
    return call_cpp_shape_fn(op, require_shape_fn=True)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 595, in call_cpp_shape_fn
    require_shape_fn)
  File "/home/joffrey/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/common_shapes.py", line 659, in _call_cpp_shape_fn_impl
    raise ValueError(err.message)
ValueError: Dimensions must be equal, but are 16 and 128 for 'sub_3' (op: 'Sub') with input shapes: [128,10,16,784], [128,784].

My environment versions:

Python 2.7.13 |Anaconda custom (64-bit)| (default, Dec 20 2016, 23:09:15)

tensorflow==1.3.0rc2
tensorflow-gpu==1.3.0

Inconsistency with the Paper

I noticed when reading your code that there is an inconsistency between your code and the original paper by Hinton. When you run the decoder, the input is only the masked correct capsule. This does not follow what Hinton did in the paper: there, the remaining capsules are masked to 0 and all of the capsules are passed to the next layer, so that the decoder can use position to decide what it is trying to reconstruct. The specific error is in this line:

    self.masked_v = tf.matmul(tf.squeeze(self.caps2), tf.reshape(self.Y, (-1, 10, 1)), transpose_a=True)

Therefore the first layer of the decoder should take an input of size 160, not 16.
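
A minimal sketch of the masking described above, assuming self.caps2 has shape [batch_size, 1, 10, 16, 1] and self.Y holds one-hot labels (an illustration of the suggestion, not the repo's code):

    mask = tf.reshape(self.Y, (-1, 10, 1))                  # one-hot labels, [batch_size, 10, 1]
    masked_v = tf.squeeze(self.caps2, axis=[1, 4]) * mask   # [batch_size, 10, 16], non-target capsules zeroed
    decoder_input = tf.reshape(masked_v, (-1, 160))         # 10 * 16 = 160 inputs to the decoder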

Should we squash in the PrimaryCaps layer?

Hi,
Sorry to bother you again. I'm excited to see your work progressing. I notice that you apply the squashing operation in the PrimaryCaps layer, and I don't see the reason: the paper uses squashing during the routing process, but there is no routing between Conv1 and PrimaryCaps. So is it reasonable to put the squashing operation in the PrimaryCaps layer? Expecting your reply! Thanks in advance.

Will the norm in the squash function be a scalar?

Hi, good job!
I have a small question: in the squash function, you keep the dims of the norm ('vec_squared_norm') the same as those of 'vector'. I wonder why you don't collapse its dims to [batch_size, 1]? As I see it, the norm should be a scalar. For example:
x= [a,b,c,d]
||x||^2 = norm(x)^2 = (|a|^2+|b|^2+|c|^2+|d|^2)

thus x --> norm(x)^2 : [batch_size, 1, num_caps, vec_len, 1] --> [batch_size, 1] ?
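
For context, a minimal squash sketch assuming an input shape of [batch_size, 1, num_caps, vec_len, 1]; keeping the reduced axis (keep_dims=True) lets the per-capsule scalar norm broadcast back over vec_len when scaling the vector, which is presumably why the dims are not collapsed:

    def squash(vector):
        # squared norm per capsule: reduce over vec_len (axis 3) but keep the axis
        vec_squared_norm = tf.reduce_sum(tf.square(vector), axis=3, keep_dims=True)
        scalar_factor = vec_squared_norm / (1 + vec_squared_norm) / tf.sqrt(vec_squared_norm + 1e-9)
        return scalar_factor * vector   # Eq. 1: broadcasts over the vec_len axis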

Dear man

I think we should make a WeChat group for those who are interested in this kind of subject. My WeChat ID is bn31201. I hope you'll join so we can have some deep discussions.

Only 10% test accuracy on rotated images!!!!

CapsNet is said to perform better on rotated images, but I trained the network on the original images and tested the model on rotated images: the test accuracy was 10%, which is so depressing.

'apply' method is not defined in 'capsLayer.py'

In 'capsLayer.py', the 'fully_connected' function uses the 'CapsLayer' class to build a fully connected layer and returns 'layer.apply(inputs)'. However, I did not find the 'apply' method defined in the class. Are you going to define it, or did I simply miss the definition? Could you please tell me where it is defined?

confused about softmax(v_length)

I am confused about why you apply softmax to v_length here, since I have not found this operation in Hinton's paper or in Figure 1.
In contrast, it seems that CapsNet allows multi-label classification, which would make the softmax over v_length unnecessary, according to Section 3 of Hinton's paper ("To allow for multiple digits, we use a separate margin").

b_IJ update

Hi,
in capsLayer.py, consider the current code. One sees that if cfg.iter_routing == 1, b_IJ never gets updated. Surely that is not the intent? Shouldn't b_IJ be updated at every iteration of the routing? Thanks.

Gordon

    if r_iter == cfg.iter_routing - 1:
        # line 5:
        # weighting u_hat with c_IJ, element-wise in the last two dims
        # => [batch_size, 1152, 10, 16, 1]
        s_J = tf.multiply(c_IJ, u_hat)
        # then sum in the second dim, resulting in [batch_size, 1, 10, 16, 1]
        s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
        assert s_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]

        # line 6:
        # squash using Eq.1,
        v_J = squash(s_J)
        assert v_J.get_shape() == [cfg.batch_size, 1, 10, 16, 1]
    elif r_iter < cfg.iter_routing - 1:  # Inner iterations, do not apply backpropagation
        s_J = tf.multiply(c_IJ, u_hat_stopped)
        s_J = tf.reduce_sum(s_J, axis=1, keep_dims=True)
        v_J = squash(s_J)         # <<<<<<<< MISSING UPDATE of b_IJ?

        # line 7:
        # reshape & tile v_J from [batch_size, 1, 10, 16, 1] to [batch_size, 1152, 10, 16, 1],
        # then matmul in the last two dims: [16, 1].T x [16, 1] => [1, 1], reduce mean in the
        # batch_size dim, resulting in [1, 1152, 10, 1, 1]
        v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
        u_produce_v = tf.matmul(u_hat_stopped, v_J_tiled, transpose_a=True)
        assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]

        # b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)
        b_IJ += u_produce_v   # <<<<<< PERHAPS THIS LINE SHOULD BE OUTSIDE THE r_iter LOOP?
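
A minimal sketch of the variant Gordon seems to be suggesting, with b_IJ updated on every routing iteration; names and shapes follow the quoted code, so treat it as an illustration rather than a verified fix:

    for r_iter in range(cfg.iter_routing):
        last_iter = (r_iter == cfg.iter_routing - 1)
        u = u_hat if last_iter else u_hat_stopped   # gradients flow only on the last pass
        c_IJ = tf.nn.softmax(b_IJ, dim=2)
        s_J = tf.reduce_sum(tf.multiply(c_IJ, u), axis=1, keep_dims=True)
        v_J = squash(s_J)
        # line 7 on every iteration, not only the inner ones
        v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
        b_IJ += tf.matmul(u_hat_stopped, v_J_tiled, transpose_a=True)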

Getting test prediction labels per image

First of all, thank you for this wonderful implementation. Not only does it work like a charm, I am learning a lot about how to use Tensorflow effectively 👍

I trained the code with all default on the MNIST dataset, which returned an accuracy of 99.49 %. That great!

I am now trying to classify some of my own handwritten MNIST digits. I have created 15,000 samples, black and white digits, with the same dimensions as MNIST. I created a small function to feed my data into main.py, and eventually got things working.

My problem is that I get a test-accuracy of ~9%, which equates to random guessing on the 10 classes!

For this reason, I would like to get the predicted labels back for each of the images, so that I can try to debug. Is there an easy way to do this? Could you please provide any hints?

Any help would be much appreciated!

ReLU activation in PrimaryCaps?

tf.contrib.layers.conv2d applies a ReLU activation by default, but the PrimaryCaps convolution should not include a ReLU activation before the neurons are grouped into capsules and squashed, or did I miss something in the paper?

capsules = tf.contrib.layers.conv2d(input, self.num_outputs * self.vec_len,
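
If the ReLU is indeed unwanted there, tf.contrib.layers.conv2d accepts activation_fn=None to disable its default ReLU; a minimal sketch (the kernel/stride/padding arguments are placeholders for whatever the repo actually passes):

    capsules = tf.contrib.layers.conv2d(input, self.num_outputs * self.vec_len,
                                        self.kernel_size, self.stride,
                                        padding='VALID', activation_fn=None)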

something about the Summary

In your code "capsNet.py",you add "self.decoded" to the "tf.summary.image" as "recon_img ",but self.X= input_image/255,and in your code
"
orgin = tf.reshape(self.X, shape=(cfg.batch_size, -1))

    squared = tf.square(self.decoded - orgin)

    self.reconstruction_err = tf.reduce_mean(squared)

"
so self.decoded is not reconstructed image,you need to multiply it by 255,right?

Is your b_ij wrong?

Hi,
Thanks for your contribution. I think the b_ij defined in your code probably doesn't match the paper.
Your code is:

self.b_ij = tf.get_variable('b_ij', shape=(1, 1152, 1, 1))
...
c_i = tf.nn.softmax(self.b_ij, dim=1)

But in fact it should be

b_i = tf.get_variable('b_i', shape=(1, 1152, 16, 1))
...
c_i = tf.nn.softmax(b_i, dim=2)

If I have misunderstood your code, please ignore me. Thanks~

VALID padding in CapsNets

Hello sir,
I am following the Capsule Network paper and your implementation.
I have a quick question about the VALID padding in the conv2 you used to produce the PrimaryCaps output. As I understand it, after the 1st conv layer the output size is (batch_size, 20, 20, 256). So if conv2 has 256 9x9 kernels with stride 2, the output formula should be (20 - 9 + 2p)/2 + 1 = 6. However, that equation has no valid solution for p, so I would like to ask how exactly VALID padding works in this situation to produce an output of (batch_size, 6, 6, 256).
Thanks !
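
For reference, TensorFlow's VALID padding adds no padding at all and floors the division, so there is no p to solve for:

    # out = floor((in - k) / s) + 1 = floor((20 - 9) / 2) + 1 = 5 + 1 = 6
    out = (20 - 9) // 2 + 1   # -> 6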

Note to Huadong

Hi Huadong,

I've been running successful tests of CapsNets with PyTorch and would like to compare notes with you. Maybe we can take our discussion offline?
My email is: firstname.lastname[@]gmail.com

Let me know!

Tarry

why average b_ij a cross example?

https://github.com/naturomics/CapsNet-Tensorflow/blob/master/capsLayer.py#L151

            # then matmul in the last two dims: [16, 1].T x [16, 1] => [1, 1], reduce mean in the
            # batch_size dim, resulting in [1, 1152, 10, 1, 1]
            v_J_tiled = tf.tile(v_J, [1, 1152, 1, 1, 1])
            u_produce_v = tf.matmul(u_hat, v_J_tiled, transpose_a=True)
            assert u_produce_v.get_shape() == [cfg.batch_size, 1152, 10, 1, 1]
            b_IJ += tf.reduce_sum(u_produce_v, axis=0, keep_dims=True)

Why would you need to average b across the batch dimension? I don't see why that would be good, since it makes the model batch-size dependent. If there is any mention of this in the paper or another source, could you point out where and send a link? Appreciated.
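
A minimal sketch of the per-example alternative being asked about: drop the reduce over the batch axis, which requires b_IJ to carry a real batch dimension, e.g. shape [batch_size, 1152, 10, 1, 1] (an illustration, not the repo's code):

    b_IJ += u_produce_v   # [batch_size, 1152, 10, 1, 1]; each example keeps its own routing logits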

Adding layers

Thanks for the nice code. I have a question about the CapsNet: would it be possible to add layers (such as a conv-caps layer after the first primary layer, or a fully connected caps layer with 20 capsules before the digit caps layer)?
I tried it myself and am getting terrible results! I can't understand why this is happening. Do you have any idea?

Question about num_outputs

I'm trying to read through the definition of the CapsLayer class. Does num_outputs actually correspond to the number of capsules? From what I understand of the following code, the number of capsules seems to be stored in vec_len.

    capsules = []
    for i in range(self.vec_len):
        # each capsule i: [batch_size, 6, 6, 32]
        ...
        capsules.append(caps_i)

Sorry to bother you about the documentation, it's just to have a better understanding of how capsules work. Thanks for sharing your work by the way.

Extending the Capslayers

Hello! I'm currently working on a project where I'd like to experiment with capsules in lieu of CNNs for deep Q-learning. Great work on releasing this implementation! While working with this code I ran into issues when using more capsule layers than just the ones in the CapsNet architecture. For instance, I was wondering if it is possible to use multiple convolutional capsule layers with routing and to change their output sizes? I've tried to tweak the code to do this, but I keep running into size issues and fear I might break the logical implementation. Any tips greatly appreciated!

Why is CapsLayer version 2 equivalent to version 1?

For the input feature map (batch_size, 20, 20, 256), the conv in version 1 does 256x32x9x9 at each point of the feature map and then concatenates the eight output feature maps, while the conv in version 2 does 256x(32x8)x9x9 at each point. That is to say, in version 1 the result at each point of the input feature map is affected by only 32 kernels, but in version 2 it is affected by 32x8 kernels.

RGB dataset (224*224)

How can we use your code on another RGB dataset?
Suppose the dataset is structured as follows: it contains sub-folders, and each sub-folder represents one class.

Class A:
0001.jpg 1
0002.jpg 1
Class B:
0001.jpg 2
0002.jpg 2
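
For what it's worth, a rough sketch of turning such a folder layout into (path, label) pairs before feeding them to the network; the root directory name and the .jpg extension are assumptions, and the images would still need resizing to the network's input size:

    import os

    paths, labels = [], []
    for label, class_dir in enumerate(sorted(os.listdir('dataset'))):   # Class A -> 0, Class B -> 1, ...
        class_path = os.path.join('dataset', class_dir)
        for fname in sorted(os.listdir(class_path)):
            if fname.endswith('.jpg'):
                paths.append(os.path.join(class_path, fname))
                labels.append(label)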

Questions about the weight matrix W_ij between u_i and v_j

First, thanks for your answers on Zhihu as well as the implementation on GitHub; they helped me a lot in understanding the original paper.

I would like to share my doubt about the lines just below Figure 2 of the original paper, which say "each capsule in the [6, 6] grid is sharing their weights with each other". By my understanding, this means the capsule outputs (vectors u_i) within a [6, 6] grid share the same W_ij, so only 32 W matrices should be updated by Adam. But in your implementation I can't find any code handling this weight-sharing mechanism.

Besides, I think the shape of W_ij should be [16, 8], as u_i is a [1, 8] or [8, 1] vector; otherwise it conflicts with Eq. 2. Although it looks like an unimportant detail, I point it out so that I can be corrected if I have misunderstood the paper or your implementation.
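
A minimal sketch of what grid-level weight sharing could look like, assuming u_i has shape [batch_size, 1152, 1, 8, 1] with the 1152 capsules laid out position-major as 6x6x32, so that tiling 32 shared matrices across the 36 grid positions reproduces the full layout (an illustration of the idea, not the repo's code):

    W_shared = tf.get_variable('W_shared', shape=(1, 32, 10, 8, 16))  # one W per capsule type, 32 in total
    W = tf.tile(W_shared, [cfg.batch_size, 36, 1, 1, 1])              # reuse across the 6x6 grid -> 1152
    u_i_tiled = tf.tile(u_i, [1, 1, 10, 1, 1])                        # [batch_size, 1152, 10, 8, 1]
    u_hat = tf.matmul(W, u_i_tiled, transpose_a=True)                 # [batch_size, 1152, 10, 16, 1]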

a question

Hey everyone, I'm really wondering: is this architecture actually better than the original CNN? Have any impressive results been achieved with this architecture?

mistake on annotation

I think there are some mistakes in the annotations. I found two.

L61:

    Reshape the input into [batch_size, 1, 1152, 8, 1]
    --->
    Reshape the input into [batch_size, 1152, 1, 8, 1]

L77:

    input: A Tensor with [batch_size, 1, num_caps_l=1152, length(u_i)=8, 1]
    --->
    input: A Tensor with [batch_size, num_caps_l=1152, 1, length(u_i)=8, 1]

I think the order should be changed.

Thanks

Running on a GTX 960M gives an error [InternalError (see above for traceback): Dst tensor is not initialized.] (out of memory)

When running on Windows with a GTX 960M, I get this error:

InternalError (see above for traceback): Dst tensor is not initialized.

Some blogs say it is caused by a lack of GPU memory, but I cannot fix the problem. I wish someone could help me.


[Question] Could the CapsNet unit apply to other more complex architecture ?

Hi!

I'm a student interested in speech synthesis with neural networks.
I suppose CapsNet might improve the quality of synthesized speech,
so I'm trying to apply this great program to another program that generates artificial speech with a neural network.

I would like to ask whether CapsNet could replace other popular neural network architectures like CNNs.

Thank you for answering.

about tf.argmax() function

My tf version is 1.2.1. The following code in capsNet.py:

    argmax_idx = tf.argmax(self.softmax_v, axis=1, output_type=tf.int32)

should be changed to the following for version 1.2.1:

    argmax_idx = tf.to_int32(tf.argmax(self.softmax_v, axis=1))

Why is b_IJ shared between the examples of a single batch?

Forgive me if I got this wrong, but it seems that b_IJ is shared between all examples within a single batch (see the reduce_sum and the shape).

I didn't see any mention of batches in the paper, so I assumed there is a separate set of b_IJ weights for every batch. Why do you think it's better to share those variables?

Edit:
I've corrected the statement

    b_IJ are shared between all batches

to:

    b_IJ are shared between all examples within a single batch

which is what I originally meant.

share weights in 6x6x8 grids

In the paper, each capsule in the [6 × 6] grid shares its weights with the others. Does your code miss this point?

Only 10% accuracy for scaled images!!!!!

CapsNet is said to perform better on scaled images, but I trained the network on the original images and tested it on scaled images, only to find the test accuracy to be 10%... #CapsBoringNet
