Comments (7)
This is due to the grouped convolutions that are used in the original AlexNet (and Caffe's implementation of it). Notice the group: 2 parameter over here.
Grouping appears to be a historical artifact. I don't believe I've seen it outside of the original AlexNet implementation. Krizhevsky's own subsequent version in the "One Weird Trick" paper discards the grouping. The current implementation of caffe-tensorflow ignores it as well.
For reference, these are the output blob shapes for AlexNet produced by Caffe:
 #  Name     W    H     C
-------------------------
 1  data   227  227     3
 2  conv1   55   55    96
 3  norm1   55   55    96
 4  pool1   27   27    96
 5  conv2   27   27   256
 6  norm2   27   27   256
 7  pool2   13   13   256
 8  conv3   13   13   384
 9  conv4   13   13   384
10  conv5   13   13   256
11  pool5    6    6   256
12  fc6      1    1  4096
13  fc7      1    1  4096
14  fc8      1    1  1000
15  prob     1    1  1000
and the corresponding parameter shapes:
 #  Name    Out    In   W   H     Total
---------------------------------------
 1  conv1    96     3  11  11     34848
 2  conv2   256    48   5   5    307200
 3  conv3   384   256   3   3    884736
 4  conv4   384   192   3   3    663552
 5  conv5   256   192   3   3    442368
 6  fc6    4096  9216   -   -  37748736
 7  fc7    4096  4096   -   -  16777216
 8  fc8    1000  4096   -   -   4096000
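The "In" column above only lines up with the blob shapes if some layers are grouped. As a rough sketch (the helper names and the per-layer group values are my own assumptions, inferred from the two tables rather than read from the prototxt), the convolution parameter shapes can be reproduced like this:

```python
def conv_param_shape(out_channels, in_channels, kernel_h, kernel_w, groups=1):
    """Weight shape (Out, In, H, W) for a convolution split into `groups` groups.

    Each filter only sees in_channels / groups input channels.
    """
    assert in_channels % groups == 0, "input channels must divide evenly"
    return (out_channels, in_channels // groups, kernel_h, kernel_w)


def count(shape):
    """Number of weights in a parameter tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# (name, out channels, channels of the incoming blob, kernel h, kernel w, groups)
# The groups column is an assumption inferred from the tables above.
layers = [
    ("conv1", 96, 3, 11, 11, 1),
    ("conv2", 256, 96, 5, 5, 2),
    ("conv3", 384, 256, 3, 3, 1),
    ("conv4", 384, 384, 3, 3, 2),
    ("conv5", 256, 384, 3, 3, 2),
]

shapes = {name: conv_param_shape(o, i, kh, kw, g) for name, o, i, kh, kw, g in layers}
totals = {name: count(shape) for name, shape in shapes.items()}

for name, shape in shapes.items():
    print(name, shape, totals[name])
```

With groups set to 2 for conv2, conv4 and conv5, the "In" values come out as 48, 192 and 192, matching the table even though the incoming blobs have 96, 384 and 384 channels.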
from caffe-tensorflow.
I'm sorry, I don't understand how you can import the weights produced by Caffe if we can't use the group parameter. When I try to import them, I get the following error:
ValueError: Dimensions Dimension(96) and Dimension(48) are not compatible
This makes sense, since the weight sizes don't match: the output of the first convolution has 96 channels, while the second convolution expects 48 input channels.
Does this make the import impossible? If so, is there a relatively small pretrained equivalent?
As I mentioned, this is due to the group parameter. The sizes are actually consistent because the convolution is performed by first splitting the input tensor along the depth into 2 groups. For conv2, the input is the 27 x 27 x 96 output of pool1, so this gives you two 27 x 27 x 48 volumes. You also split the convolution filters into 2 groups, giving you two 5 x 5 x 48 x 128 filter banks. You then convolve each group separately and concatenate the results along the depth to obtain the 27 x 27 x 256 output.
Grouped convolutions are supported by Caffe, but (AFAIK) not by TensorFlow. You'd have to implement it yourself. The exported parameters, however, are correct.
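To make the split / convolve / merge steps concrete, here is a minimal NumPy sketch. It is only an illustration under stated assumptions, not the Caffe or caffe-tensorflow implementation: the function names are made up, the convolution is a naive stride-1 loop (a cross-correlation, as is usual in deep learning frameworks), and the padding of 2 is assumed because it keeps a 5 x 5 kernel at 27 x 27.

```python
import numpy as np


def conv2d(x, w, pad):
    """Naive stride-1 convolution.

    x: input of shape (H, W, C_in)
    w: filters of shape (kh, kw, C_in, C_out)
    """
    kh, kw, _, c_out = w.shape
    # Zero-pad the two spatial dimensions only.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="constant")
    out_h = xp.shape[0] - kh + 1
    out_w = xp.shape[1] - kw + 1
    out = np.zeros((out_h, out_w, c_out))
    for i in range(out_h):
        for j in range(out_w):
            patch = xp[i:i + kh, j:j + kw, :]
            # Contract over (kh, kw, C_in), leaving one value per filter.
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2]))
    return out


def grouped_conv2d(x, w, groups, pad):
    """Split input and filters into groups, convolve each, then merge."""
    x_groups = np.split(x, groups, axis=2)  # split input along depth
    w_groups = np.split(w, groups, axis=3)  # split filters along output channels
    outputs = [conv2d(xg, wg, pad) for xg, wg in zip(x_groups, w_groups)]
    return np.concatenate(outputs, axis=2)  # merge along depth


# conv2-sized example with random data: pool1 output and conv2 filters.
x = np.random.rand(27, 27, 96)
w = np.random.rand(5, 5, 48, 256)
y = grouped_conv2d(x, w, groups=2, pad=2)
print(y.shape)  # (27, 27, 256)
```

Note that the filter tensor has only 48 input channels, which is exactly why feeding it the full 96-channel volume in a single ungrouped convolution fails with a dimension mismatch.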
I think I can manage to make a simple pull request to implement this in the Network class. I will keep you posted when I succeed.
I had some free time today, so I went ahead and implemented support for grouped convolutions. Everything should work correctly now.
Tested on AlexNet and CaffeNet.
Out of curiosity, how does the performance of AlexNet change when you add / remove grouped convolutions? Which version has higher accuracy on ImageNet classification, for example?
I haven't evaluated AlexNet/CaffeNet without grouped convolutions, so I have no hard data.
I believe the original motivation for grouping in Krizhevsky's paper was to allow for splitting the training across two GPUs. Interestingly though, the split resulted in specialization across the groups (see section 6.1: one group was color-agnostic, the other was color-sensitive).
I recently came across this paper which claims to use grouped convolutions and achieve similar or higher accuracy with fewer parameters: http://arxiv.org/pdf/1605.06489.pdf