hujie-frank / senet
Squeeze-and-Excitation Networks
License: Apache License 2.0
I'm wondering how Label Smoothing Regularization works. Could you please release the code for that part? Thanks.
Thanks for sharing such excellent work.
But when I test the SE-ResNet-50 caffemodel, I have encountered some problems.
I add the data layer at the bottom and the accuracy layer at the top of the prototxt:
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false
crop_size: 224
mean_value: 103.939
mean_value: 116.779
mean_value: 123.68
}
data_param {
source: "examples/imagenet/ilsvrc12_val_lmdb"
batch_size: 10
backend: LMDB
}
}
layer {
name: "accuracy/top1"
type: "Accuracy"
bottom: "classifier"
bottom: "label"
top: "accuracy/top1"
include {
phase: TEST
}
}
layer {
name: "accuracy/top5"
type: "Accuracy"
bottom: "classifier"
bottom: "label"
top: "accuracy/top5"
include {
phase: TEST
}
accuracy_param {
top_k: 5
}
}
and run the test program on the ILSVRC12 ImageNet validation set:
./build/tools/caffe test --model=models/SE-resnet-50/SE-ResNet-50.prototxt --weights=models/SE-resnet-50/SE-ResNet-50.caffemodel --iterations 5000 -gpu=0
(test batch_size=10)
but I get the following result:
I0830 16:03:59.629042 37849 caffe.cpp:313] Batch 0, accuracy/top1 = 0
I0830 16:03:59.629132 37849 caffe.cpp:313] Batch 0, accuracy/top5 = 0
I0830 16:03:59.629142 37849 caffe.cpp:313] Batch 0, loss = 8.86858
Almost all of the accuracy values are 0.
I would be very grateful for any advice on this issue. Thanks very much.
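For reference, the top-1/top-5 numbers that the two Accuracy layers report can be reproduced on raw classifier scores with a short numpy sketch (the scores and labels below are toy values for illustration, not outputs of the model):

```python
import numpy as np

def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores,
    mirroring what Caffe's Accuracy layer reports."""
    topk = np.argsort(-scores, axis=1)[:, :k]       # top-k class indices per sample
    hits = np.any(topk == labels[:, None], axis=1)  # true label among them?
    return hits.mean()

# toy batch: 4 samples, 5 classes (illustrative values only)
scores = np.array([[0.10, 0.70, 0.10, 0.05, 0.05],
                   [0.60, 0.10, 0.10, 0.10, 0.10],
                   [0.20, 0.20, 0.20, 0.20, 0.20],
                   [0.00, 0.10, 0.10, 0.30, 0.50]])
labels = np.array([1, 0, 4, 3])

print(topk_accuracy(scores, labels, k=1))  # 0.5
print(topk_accuracy(scores, labels, k=5))  # 1.0
```

An all-zero accuracy on a correct model and dataset usually points at a mismatch somewhere in the input pipeline rather than at the evaluation math itself.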
Thank you for releasing your code.
Now, how can I convert SE-ResNet-152.prototxt into train_SE-ResNet-152.prototxt (the training version)?
I'm looking forward to your answer, thank you.
Hi Hujie:
First of all, thanks for your excellent work! I can't find the weight_decay parameter in your paper; would you please tell me what it is? ^_^
Best regards,
hungsing
Hi, I didn't find where you put your axpy_layer in the prototxt; is it replaced by the "Scale" layer?
Thanks for the great network. I found that the SE module is general and can be applied to any kind of network. You have tried it with state-of-the-art networks, but I did not find it applied to DenseNet. Could you try SE-DenseNet, and what is your performance with it?
According to your paper: single-crop error rates (%) on the ImageNet validation set.
However, I found that the accuracy for ResNet there is different from your accuracy.
And I am confused about the meaning of "original" vs. "re-implementation".
Thanks.
You mentioned the convs used are slightly different (3x3 vs. 1x1). However, comparing your model's prototxt with shicai's, I could not find the difference. Could you please point it out more specifically, e.g., the name of the differing layer? Thanks very much!
Why use a 1x1 convolution layer instead of a fully connected layer in SENet? Is it for fewer parameters?
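For what it's worth, after global average pooling the feature map is 1x1 spatially, and a 1x1 convolution on a 1x1 map computes exactly the same thing as a fully connected layer, so the choice mainly affects the parameter layout rather than the math. A quick numpy check of that equivalence (shapes and weights are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_out = 8, 4
x = rng.standard_normal(C_in)           # squeezed vector after global average pooling
W = rng.standard_normal((C_out, C_in))  # one weight matrix, used both ways

# interpretation 1: fully connected layer
fc_out = W @ x

# interpretation 2: C_out filters of shape (C_in, 1, 1) over a (C_in, 1, 1) map
x_map = x.reshape(C_in, 1, 1)
conv_out = np.array([(W[o].reshape(C_in, 1, 1) * x_map).sum() for o in range(C_out)])

print(np.allclose(fc_out, conv_out))  # True
```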
Hi,
I modified the deploy prototxt to fine-tune SENet. However, even with a batch size of 1, I still got an out-of-memory error. Please give some help.
Here are the steps that i used to generate train_val.prototxt:
Then I train it with "caffe train --solver=solver.prototxt --gpu=all --weights=SENet.caffemodel". But no matter how I modify the prototxt, I still get the memory error.
Caffe: master branch
CuDNN: v7.0
Hello. Thank you for sharing fantastic SENet model.
I tried to train with my 1080 Ti (11 GB of memory).
I succeeded in training SE-ResNet-101 (train batch size 5).
But I failed to train SENet or SE-ResNet-152 even when I set the train batch size to 1.
I used BVLC Caffe patched with this repo.
Is 11 GB of GPU memory not enough to train SENet?
Thank you.
Sir, would you please upload your SENet models to Baidu Cloud, since we cannot download them from Google Drive? Thank you very much in advance.
Hi,
Thanks for the good work and open source code!
In this repo you mention the augmentation methods you use, which include aspect ratio / rotation / jittering, methods that are not usually used when benchmarking models (e.g. ResNet / ResNeXt). Did you use these additional augmentation methods to get the results reported in the repo (and the SENet entry in Table 3 of the paper)? I am confused because you don't seem to mention them in the implementation section of the paper where you describe the data augmentation methods.
Thanks!
Best,
Hongyi
Does SENet work in other network architectures like VGG, AlexNet, or others besides GoogLeNet Inception and ResNet?
Besides, does the SE module work in detection frameworks or models like SSD, Faster R-CNN, etc.? I have added the squeeze-and-excitation module to SSD (aiming to strengthen the 6 feature maps), but it doesn't seem to work.
Looking forward to your suggestions, thanks so much.
Hi Hujie,
First of all, thank you for opening your code. In your introduction you use two FC layers in the SE block, but in the prototxt I find that you use two conv layers with 1x1 kernel size. Is this an improvement? If yes, does it improve performance compared with the FC layers?
Please help.
Thanks,
Totoro
Hi
I tested SE-BN-Inception with the pre-trained caffemodel, but the accuracy is 0.00018 on ImageNet 2012.
May I ask why this could happen? Is it a problem with my dataset?
Thanks.
When applying SENet in an FCN, how does the global average pooling layer behave?
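For context, the squeeze step is global average pooling, which collapses each channel to a single scalar regardless of spatial size, so it is well defined on the larger, variable-sized feature maps of an FCN. A small numpy sketch (shapes are arbitrary illustrations):

```python
import numpy as np

def squeeze(feature_map):
    """Global average pooling: one scalar per channel, for any H x W."""
    return feature_map.mean(axis=(1, 2))  # (C, H, W) -> (C,)

rng = np.random.default_rng(0)
small = rng.standard_normal((16, 7, 7))    # e.g. a classification-sized map
large = rng.standard_normal((16, 50, 80))  # e.g. an FCN-sized map

print(squeeze(small).shape, squeeze(large).shape)  # (16,) (16,)
```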
Hi, thanks for your shared code.
48 for (int i = blockDim.x / 2; i > 0; i >>= 1) {
49 if (tid < i) {
50 buffer[threadIdx.x] += buffer[threadIdx.x + i];
51 }
52 __syncthreads();
53 }
Sorry, can you explain the logic behind lines 48 to 53 in axpy_layer.cu? In my opinion, this piece of code could be commented out.
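The loop in question appears to be a standard parallel tree reduction: in each round, the active half of the threads adds the upper half of the buffer onto the lower half, so after log2(blockDim.x) rounds the total sum ends up in buffer[0]. A Python simulation of that loop (assuming a power-of-two block size, as CUDA reductions typically do):

```python
def tree_reduce(buffer):
    """Simulate the CUDA block reduction loop: each round, threads with
    tid < i add the element i positions ahead, then the stride halves."""
    buffer = list(buffer)
    i = len(buffer) // 2          # stands in for blockDim.x / 2
    while i > 0:
        for tid in range(i):      # the "if (tid < i)" guard
            buffer[tid] += buffer[tid + i]
        # the __syncthreads() barrier would sit here on the GPU
        i //= 2
    return buffer[0]              # the full sum ends up in buffer[0]

print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```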
Excuse me, what would be the difference in accuracy if I didn't add param { lr_mult, decay_mult } when training? What are the default values of lr_mult and decay_mult in Caffe?
Hi Jie,
Thanks a lot for sharing the models! Could you also share some details on preprocessing / augmentation so it's easier for others to reproduce the results?
Hi
There is a strange thing when I train the default SE-BN-Inception: the loss is always 6.9 when I don't use a pretrained model to train on ImageNet 2012 (even at iteration 0 the loss is 6.9). If I use the pretrained model, the loss starts at 9.6 and begins to fall.
And if I add
weight_filler{
type: "msra"
}
in each convolution layer (without using the pretrained model), the loss can then be reduced from 6.9 (still 6.9 at iteration 0).
May I ask why this happens?
I have added all the layers and the build succeeds, but:
Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Axpy (known types: AbsVal, Accuracy, ......WindowData)
*** Check failure stack trace: ***
So how should caffe.proto be set up?
Can you release the modified caffe.proto?
I tried to reproduce the training from scratch, but the accuracy is 5 points lower on the Inception network. Can you share the solver file, e.g., number of iterations, learning rate policy, or any more details of the training?
Is this a new custom architecture you came up with, or is it a variant of ResNeXt or Inception?
Thanks.
Hi @hujie-frank ,
Have you run SE-ResNeXt-101 (32 x 4d) in Caffe before? No matter how many GPUs I use, the memory problem is serious. SE-ResNeXt-50 is OK.
Thanks for the wonderful work! @hujie-frank
Could you please share the code that implements the augmentation, such as aspect ratio, random rotation, and pixel jitter?
Thank you in advance!
Hello, I really like this innovation of yours. Could you share the training train_val.prototxt?
Thanks!
Hi Frank,
Good work! I have two questions to check with you. :)
The BatchNorm layers have "use_global_stats: true", which means the prototxt is the deploy version rather than the training version? Because if my understanding is correct, use_global_stats is set to false during the training process of BN.
You comment out the mean values in the prototxt. Does that mean we need to subtract the mean ourselves, since it is probably the deploy.prototxt?
Thank you!
Best,
Jordan
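For reference, the behavioral difference behind the use_global_stats question can be sketched in numpy: with use_global_stats: false (training) BN normalizes with the current batch's statistics, while with use_global_stats: true (deploy) it uses the stored running statistics. This is a toy illustration, not Caffe's exact implementation:

```python
import numpy as np

def batchnorm(x, running_mean, running_var, use_global_stats, eps=1e-5):
    """Normalize with batch statistics (training, use_global_stats: false)
    or with stored running statistics (deploy, use_global_stats: true)."""
    if use_global_stats:
        mean, var = running_mean, running_var
    else:
        mean, var = x.mean(axis=0), x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 4)) * 3.0 + 5.0   # toy batch far from zero mean
run_mean, run_var = np.zeros(4), np.ones(4)    # pretend accumulated statistics

train_out  = batchnorm(x, run_mean, run_var, use_global_stats=False)
deploy_out = batchnorm(x, run_mean, run_var, use_global_stats=True)

print(np.allclose(train_out.mean(axis=0), 0.0, atol=1e-6))  # True: batch re-centered
print(np.allclose(deploy_out.mean(axis=0), 0.0, atol=0.5))  # False: stored stats used
```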
When I train SE-ResNeXt-101 (32 x 4d), everything is OK. However, when I train SENet, the log says "status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR". If I disable cuDNN in the Makefile and recompile Caffe, the error disappears, but without cuDNN training SENet is slower. Could you please tell me how to debug this error so I can use cuDNN to accelerate training?
Greetings!
I am training SE-ResNet-50, and I have enough GPU memory for training.
But when I try to do testing with the same config, I get an out-of-memory error.
I am confused: does SENet need more memory in the testing phase than in training?
Thanks for sharing your great work, it's amazing. By the way, do you have any optimization for group convolution in Caffe? SENet may otherwise suffer from memory or speed problems.
Solving...
F0616 13:59:44.988385 16022 math_functions.cu:79] Check failed: error == cudaSuccess (74 vs. 0) misaligned address
How can I solve this problem?
Hi, thanks for your shared code.
I have a question about pooling_layer.cu; part of the code is as follows:
......
......
case PoolingParameter_PoolMethod_AVE:
if (this->layer_param_.pooling_param().global_pooling()) {
// NOLINT_NEXT_LINE(whitespace/operators)
GlobalAvePoolForward<<<bottom[0]->count(0, 2), CAFFE_CUDA_NUM_THREADS>>>(
bottom[0]->count(2), bottom_data, top_data);
} else {
// NOLINT_NEXT_LINE(whitespace/operators)
AvePoolForward<<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
count, bottom_data, bottom[0]->num(), channels_,
height_, width_, pooled_height_, pooled_width_, kernel_h_,
kernel_w_, stride_h_, stride_w_, pad_h_, pad_w_, top_data);
}
break;
......
......
Sorry, is it necessary to go through the GlobalAvePoolForward branch?
And what difference do GlobalAvePoolForward and AvePoolForward make to the result?
Thank you!
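As far as the result goes, a global average pooling kernel should be numerically equivalent to ordinary average pooling whose window covers the whole plane; the dedicated kernel is presumably just a faster specialization. A numpy check under that assumption:

```python
import numpy as np

def global_ave_pool(x):
    """One scalar per channel: the mean over the entire H x W plane."""
    return x.mean(axis=(1, 2))

def ave_pool_full_window(x):
    """Ordinary average pooling with kernel == input size and no padding."""
    C, H, W = x.shape
    return np.array([x[c].mean() for c in range(C)])

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 7, 7))
print(np.allclose(global_ave_pool(x), ave_pool_full_window(x)))  # True
```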
I am trying to evaluate your pretrained models on ImageNet. I tested a few images and it seems that you are not following the standard Caffe "data/ilsvrc12/synset_words.txt" from http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz.
Where can I find your synset list?
As shown in Figure 8, how did you make the distribution of average activations?
Could you share the code that produces the distribution of average activations?
Thanks in advance.
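As a rough guess at the kind of computation behind such a figure, per-class average activations can be obtained by grouping SE gate outputs by label and averaging per channel. The sketch below uses made-up data and is not the authors' code:

```python
import numpy as np

def class_average_activations(gates, labels, num_classes):
    """gates: (N, C) SE excitation outputs; returns the (num_classes, C)
    mean activation of each channel per class."""
    out = np.zeros((num_classes, gates.shape[1]))
    for c in range(num_classes):
        out[c] = gates[labels == c].mean(axis=0)
    return out

rng = np.random.default_rng(0)
gates = rng.random((100, 16))   # made-up sigmoid outputs in [0, 1)
labels = np.arange(100) % 4     # 4 made-up classes, 25 samples each
avg = class_average_activations(gates, labels, 4)
print(avg.shape)  # (4, 16)
```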
I found that you use the pretrained scene model from https://github.com/lishen-shirley/Places2-CNNs.git. I also use this pre-trained model with the SE block, but the accuracy of my model cannot exceed that of the original models without the SE block. Can you tell me more details about training with the Places365 data? Do you add an auxiliary loss as mentioned in "Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks"? @hujie-frank
Hi,
first of all, congratulations on the great result on ImageNet!
I want to try your architecture in my Master's thesis,
where I try to distinguish action forces from regular pedestrians based on their appearance in my own dataset.
Here is what I did:
I added your provided files to my NVIDIA caffe flavour from
https://github.com/NVIDIA/caffe
in
src/caffe/layers
and
include/caffe/layers
respectively.
Then I ran "make clean".
But when I build Caffe with "make all -j16" I get the following build error:
In file included from src/caffe/layers/axpy_layer.cpp:8:0:
./include/caffe/layers/axpy_layer.hpp:25:36: error: wrong number of template arguments (1, should be 2)
What else do I need to change for a successful build?
I cannot find the arXiv paper from the given link.
Hi, when I run the training code, I have put the corresponding include and cpp files into the Caffe tree, but I still encounter the following error and don't know how to solve this problem. Can you help me? Thanks.
] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Axpy (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, LRN, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Pooling, Power, Python, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
I'm implementing your network in TensorFlow, but I do not know exactly what the "scale" is.
Can you explain?
Thank you.
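For what it's worth, in the SE design the "scale" step is the channel-wise multiplication of the input feature map by the sigmoid gate produced by the excitation. A minimal numpy sketch of the whole block (the shapes, the reduction ratio r, and the random weights are illustrative assumptions, not the released model's values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """x: (C, H, W) feature map; w1: (C//r, C); w2: (C, C//r)."""
    s = x.mean(axis=(1, 2))                    # squeeze: global average pooling
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excitation: FC-ReLU-FC-sigmoid
    return x * e[:, None, None]                # scale: per-channel reweighting

rng = np.random.default_rng(0)
C, r = 8, 2                                    # toy channel count and reduction ratio
x  = rng.standard_normal((C, 5, 5))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y  = se_block(x, w1, w2)
print(y.shape)  # (8, 5, 5)
```

Because the gate lies in (0, 1), every channel of the output is a damped copy of the input channel.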
Hi, how can I apply squeeze-and-excitation to VGG networks? Any suggestions?
Thank you for sharing!
Hi, I downloaded the code and compiled it successfully, but when I test SENet, the results are wrong.
I use the Python wrapper and haven't tried the command line.
import matplotlib; matplotlib.use('agg')
%matplotlib inline
import sys
import os
cafferoot = '/home/liaofangzhou/caffes/caffe'
sys.path.append(os.path.join(cafferoot,'python'))
import caffe
import numpy as np
from matplotlib import pyplot as plt
import pandas
net = caffe.Net('/home/liaofangzhou/adv_bak/liaofz/toolkit/sample_defenses/caffe_senet/model_def/SENet.prototxt',
'/home/liaofangzhou/adv_bak/liaofz/toolkit/sample_defenses/caffe_senet/pretrained_model/SENet.caffemodel',
caffe.TEST)
# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.array([104, 117, 123])
print 'mean-subtracted values:', zip('BGR', mu)
# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1)) # move image channels to outermost dimension
transformer.set_mean('data', mu) # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0)) # swap channels from RGB to BGR
image = caffe.io.load_image(os.path.join(cafferoot,'examples/images/cat.jpg'))
im_input = transformer.preprocess('data', image)
net.blobs['data'].data[:] = im_input
output = net.forward()
output_prob = output['prob'][0] # the output probability vector for the first image in the batch
print(np.argmax(output_prob))
And the result turned out to be 7 (cock), but it should be 281 (cat). The output probability also seems to be quite confident. Why is that?
Thank you in advance!
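For reference, the Transformer configuration above amounts to the following arithmetic, sketched here in plain numpy with toy data (the step order differs from caffe.io.Transformer's internals, but the result is equivalent); a wrong channel order or mean is a common cause of confident wrong predictions:

```python
import numpy as np

def preprocess(image_rgb01, mu_bgr):
    """Toy re-implementation of the Transformer settings above: take an
    H x W x 3 RGB float image in [0, 1], return a 3 x H x W BGR blob
    scaled to [0, 255] with the per-channel mean subtracted."""
    img = image_rgb01 * 255.0           # set_raw_scale(255)
    img = img[:, :, ::-1]               # set_channel_swap: RGB -> BGR
    img = img.transpose(2, 0, 1)        # set_transpose: HWC -> CHW
    return img - mu_bgr[:, None, None]  # set_mean(mu)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))           # toy RGB image in [0, 1]
mu = np.array([104.0, 117.0, 123.0])    # the BGR mean from the snippet above
blob = preprocess(image, mu)

print(blob.shape)  # (3, 4, 4)
```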
I0929 12:02:35.547410 5970 caffe.cpp:155] Finetuning from ../model/SE-ResNeXt-101.caffemodel
I0929 12:02:35.856974 5970 net.cpp:761] Ignoring source layer label_data_1_split
I0929 12:02:35.857090 5970 net.cpp:761] Ignoring source layer conv2_1_1x1_increase/bn_conv2_1_1x1_increase/bn_0_split
I0929 12:02:35.857216 5970 net.cpp:761] Ignoring source layer conv2_2_1x1_increase/bn_conv2_2_1x1_increase/bn_0_split
I0929 12:02:35.857317 5970 net.cpp:761] Ignoring source layer conv2_3_1x1_increase/bn_conv2_3_1x1_increase/bn_0_split
I0929 12:02:35.857544 5970 net.cpp:761] Ignoring source layer conv3_1_1x1_increase/bn_conv3_1_1x1_increase/bn_0_split
I0929 12:02:35.857978 5970 net.cpp:761] Ignoring source layer conv3_2_1x1_increase/bn_conv3_2_1x1_increase/bn_0_split
I0929 12:02:35.858289 5970 net.cpp:761] Ignoring source layer conv3_3_1x1_increase/bn_conv3_3_1x1_increase/bn_0_split
I0929 12:02:35.858595 5970 net.cpp:761] Ignoring source layer conv3_4_1x1_increase/bn_conv3_4_1x1_increase/bn_0_split
What is the "bn_0_split"?
Are there any pretrained models for MXNet?
I tried to convert the Caffe model to an MXNet model,
but there is a new layer. Do you have any pretrained model for MXNet?
Hi Hujie,
I wonder what the architectural difference is between SE-ResNeXt-101 and SENet. I have an OOM issue with SENet: it requires almost 7 GB just to initialize the network, while SE-ResNeXt-101 needs less than 3 GB.
Why is there such a huge memory usage difference?
Please help.
Thanks,
Ruxiao