hujie-frank / senet
Squeeze-and-Excitation Networks
License: Apache License 2.0
I'm wondering how Label Smoothing Regularization works. Could you please release the code for that part? Thanks.
Thanks for sharing such excellent work.
But when I test the SE-ResNet-50 caffemodel, I have encountered some problems.
I add the data layer at the bottom and the accuracy layer at the top of the prototxt:
layer {
name: "data"
type: "Data"
top: "data"
top: "label"
include {
phase: TEST
}
transform_param {
mirror: false
crop_size: 224
mean_value: 103.939
mean_value: 116.779
mean_value: 123.68
}
data_param {
source: "examples/imagenet/ilsvrc12_val_lmdb"
batch_size: 10
backend: LMDB
}
}
layer {
name: "accuracy/top1"
type: "Accuracy"
bottom: "classifier"
bottom: "label"
top: "accuracy/top1"
include {
phase: TEST
}
}
layer {
name: "accuracy/top5"
type: "Accuracy"
bottom: "classifier"
bottom: "label"
top: "accuracy/top5"
include {
phase: TEST
}
accuracy_param {
top_k: 5
}
}
and run the test program on the ILSVRC12 ImageNet validation set:
./build/tools/caffe test --model=models/SE-resnet-50/SE-ResNet-50.prototxt --weights=models/SE-resnet-50/SE-ResNet-50.caffemodel --iterations 5000 -gpu=0
(test batch_size=10)
but I get the following result:
I0830 16:03:59.629042 37849 caffe.cpp:313] Batch 0, accuracy/top1 = 0
I0830 16:03:59.629132 37849 caffe.cpp:313] Batch 0, accuracy/top5 = 0
I0830 16:03:59.629142 37849 caffe.cpp:313] Batch 0, loss = 8.86858
Almost all of the accuracy values are 0.
I would be very grateful for any advice on this issue. Thanks very much.
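For reference, the top-1/top-5 numbers that the two Accuracy layers report can be reproduced on raw classifier scores with a short numpy sketch (the scores and labels below are toy values for illustration, not outputs of the model):

```python
import numpy as np

def topk_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores,
    mirroring what Caffe's Accuracy layer reports."""
    topk = np.argsort(-scores, axis=1)[:, :k]       # top-k class indices per sample
    hits = np.any(topk == labels[:, None], axis=1)  # true label among them?
    return hits.mean()

# toy batch: 4 samples, 5 classes (illustrative values only)
scores = np.array([[0.10, 0.70, 0.10, 0.05, 0.05],
                   [0.60, 0.10, 0.10, 0.10, 0.10],
                   [0.20, 0.20, 0.20, 0.20, 0.20],
                   [0.00, 0.10, 0.10, 0.30, 0.50]])
labels = np.array([1, 0, 4, 3])

print(topk_accuracy(scores, labels, k=1))  # 0.5
print(topk_accuracy(scores, labels, k=5))  # 1.0
```

An all-zero accuracy on a correct model and dataset usually points at a mismatch somewhere in the input pipeline rather than at the evaluation math itself.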
Thank you for releasing your code.
Now, how can I convert SE-ResNet-152.prototxt into train_SE-ResNet-152.prototxt (the training version)?
I'm looking forward to your answer, thank you.
Hi Hujie:
First of all, thanks for your excellent work! I can't find the weight_decay parameter in your paper; would you please tell me what it is? ^_^
Best regards,
hungsing
Hi, I didn't find where you put your axpy_layer in the prototxt; is it replaced by the "Scale" layer?
Thanks for the great network. I found that the SE module is general and can be applied to any kind of network. You have tried it with state-of-the-art networks, but I did not find it applied to DenseNet. Could you try SE-DenseNet, and what is your performance with it?
According to your paper: single-crop error rates (%) on the ImageNet validation set.
However, I found that the accuracy for ResNet there is different from your accuracy.
And I am confused about the meaning of "original" vs. "re-implementation".
Thanks.
You mentioned the convs used are slightly different (3x3 vs. 1x1). However, comparing your model's prototxt with shicai's, I could not find the difference. Could you please point it out more specifically, e.g., the name of the differing layer? Thanks very much!
Why use a 1x1 convolution layer instead of a fully connected layer in SENet? Is it for fewer parameters?
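For what it's worth, after global average pooling the feature map is 1x1 spatially, and a 1x1 convolution on a 1x1 map computes exactly the same thing as a fully connected layer, so the choice mainly affects the parameter layout rather than the math. A quick numpy check of that equivalence (shapes and weights are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
C_in, C_out = 8, 4
x = rng.standard_normal(C_in)           # squeezed vector after global average pooling
W = rng.standard_normal((C_out, C_in))  # one weight matrix, used both ways

# interpretation 1: fully connected layer
fc_out = W @ x

# interpretation 2: C_out filters of shape (C_in, 1, 1) over a (C_in, 1, 1) map
x_map = x.reshape(C_in, 1, 1)
conv_out = np.array([(W[o].reshape(C_in, 1, 1) * x_map).sum() for o in range(C_out)])

print(np.allclose(fc_out, conv_out))  # True
```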
Hi,
I modified the deploy prototxt to fine-tune SENet. However, even with a batch size of 1, I still got an out-of-memory error. Please give some help.
Here are the steps that i used to generate train_val.prototxt:
Then I train it with "caffe train --solver=solver.prototxt --gpu=all --weights=SENet.caffemodel". But no matter how I modify the prototxt, I still get the memory error.
Caffe: master branch
CuDNN: v7.0
Hello. Thank you for sharing fantastic SENet model.
I tried to train with my 1080 Ti (11 GB of memory).
I succeeded in training SE-ResNet-101 (train batch size 5).
But I failed to train SENet or SE-ResNet-152 even when I set the train batch size to 1.
I used BVLC Caffe patched with this repo.
Is 11 GB of GPU memory not enough to train SENet?
Thank you.
Sir, would you please upload your SENet models to Baidu Cloud, since we cannot download them from Google Drive? Thank you very much in advance.
Hi,
Thanks for the good work and open source code!
In this repo you mention the augmentation methods you use, which include aspect ratio / rotation / jittering, methods that are not usually used when benchmarking models (e.g. ResNet / ResNeXt). Did you use these additional augmentation methods to get the results reported in the repo (and the SENet entry in Table 3 of the paper)? I am confused because you don't seem to mention them in the implementation section of the paper where you describe the data augmentation methods.
Thanks!
Best,
Hongyi
Does SENet work in other network architectures like VGG, AlexNet, or others besides GoogLeNet Inception and ResNet?
Besides, does the SE module work in detection frameworks or models like SSD, Faster R-CNN, etc.? I have added the squeeze-and-excitation module to SSD (aiming to strengthen the 6 feature maps), but it doesn't seem to work.
Looking forward to your suggestions, thanks so much.
Hi Hujie,
First of all, thank you for opening your code. In your introduction you use two FC layers in the SE block, but in the prototxt I find that you use two conv layers with 1x1 kernel size. Is this an improvement? If yes, does it improve performance compared with the FC layers?
Please help.
Thanks,
Totoro
Hi
I tested SE-BN-Inception with the pre-trained caffemodel, but the accuracy is 0.00018 on ImageNet 2012.
May I ask why this could happen? Is it a problem with my dataset?
Thanks.
When applying SENet in an FCN, how does the global average pooling layer behave?
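For context, the squeeze step is global average pooling, which collapses each channel to a single scalar regardless of spatial size, so it is well defined on the larger, variable-sized feature maps of an FCN. A small numpy sketch (shapes are arbitrary illustrations):

```python
import numpy as np

def squeeze(feature_map):
    """Global average pooling: one scalar per channel, for any H x W."""
    return feature_map.mean(axis=(1, 2))  # (C, H, W) -> (C,)

rng = np.random.default_rng(0)
small = rng.standard_normal((16, 7, 7))    # e.g. a classification-sized map
large = rng.standard_normal((16, 50, 80))  # e.g. an FCN-sized map

print(squeeze(small).shape, squeeze(large).shape)  # (16,) (16,)
```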
Hi, thanks for your shared code.
48 for (int i = blockDim.x / 2; i > 0; i >>= 1) {
49 if (tid < i) {
50 buffer[threadIdx.x] += buffer[threadIdx.x + i];
51 }
52 __syncthreads();
53 }
Sorry, can you explain the logic behind lines 48 to 53 in axpy_layer.cu? In my opinion, this piece of code could be commented out.
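The loop in question appears to be a standard parallel tree reduction: in each round, the active half of the threads adds the upper half of the buffer onto the lower half, so after log2(blockDim.x) rounds the total sum ends up in buffer[0]. A Python simulation of that loop (assuming a power-of-two block size, as CUDA reductions typically do):

```python
def tree_reduce(buffer):
    """Simulate the CUDA block reduction loop: each round, threads with
    tid < i add the element i positions ahead, then the stride halves."""
    buffer = list(buffer)
    i = len(buffer) // 2          # stands in for blockDim.x / 2
    while i > 0:
        for tid in range(i):      # the "if (tid < i)" guard
            buffer[tid] += buffer[tid + i]
        # the __syncthreads() barrier would sit here on the GPU
        i //= 2
    return buffer[0]              # the full sum ends up in buffer[0]

print(tree_reduce([1, 2, 3, 4, 5, 6, 7, 8]))  # 36
```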
Excuse me, what would be the difference in accuracy if I didn't add param { lr_mult, decay_mult } when training? What are the default values of lr_mult and decay_mult in Caffe?
Hi Jie,
Thanks a lot for sharing the models! Could you also share some details on preprocessing / augmentation so it's easier for others to reproduce the results?
Hi
There is a strange thing when I train the default SE-BN-Inception: the loss is always 6.9 when I don't use a pretrained model to train on ImageNet 2012 (even at iteration 0 the loss is 6.9). If I use the pretrained model, the loss starts at 9.6 and begins to fall.
And if I add
weight_filler{
type: "msra"
}
in each convolution layer (without using the pretrained model), the loss can then be reduced from 6.9 (still 6.9 at iteration 0).
May I ask why this happens?
I have added all the layers and the build succeeds, but:
Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Axpy (known types: AbsVal, Accuracy, ......WindowData)
*** Check failure stack trace: ***
So how should caffe.proto be set up?
Can you release the modified caffe.proto?
I tried to reproduce the training from scratch, but the accuracy is 5 points lower on the Inception network. Can you share the solver file, e.g., number of iterations, learning rate policy, or any more details of the training?
Is this a new custom architecture you came up with, or is it a variant of ResNeXt or Inception?
Thanks.
Hi @hujie-frank ,
Have you run SE-ResNeXt-101 (32 x 4d) in Caffe before? No matter how many GPUs I use, the memory problem is serious. SE-ResNeXt-50 is OK.
Thanks for the wonderful work! @hujie-frank
Could you please share the code that implements the augmentation, such as aspect ratio, random rotation, and pixel jitter?
Thank you in advance!
Hello, I really like this innovation of yours. Could you share the training train_val.prototxt?
Thanks!
Hi Frank,
Good work! I have two questions to check with you. :)
The BatchNorm layers have "use_global_stats: true", which means the prototxt is the deploy version rather than the training version? Because if my understanding is correct, use_global_stats is set to false during the training process of BN.
You comment out the mean values in the prototxt. Does that mean we need to subtract the mean ourselves, since it is probably the deploy.prototxt?
Thank you!
Best,
Jordan
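For reference, the behavioral difference behind the use_global_stats question can be sketched in numpy: with use_global_stats: false (training) BN normalizes with the current batch's statistics, while with use_global_stats: true (deploy) it uses the stored running statistics. This is a toy illustration, not Caffe's exact implementation:

```python
import numpy as np

def batchnorm(x, running_mean, running_var, use_global_stats, eps=1e-5):
    """Normalize with batch statistics (training, use_global_stats: false)
    or with stored running statistics (deploy, use_global_stats: true)."""
    if use_global_stats:
        mean, var = running_mean, running_var
    else:
        mean, var = x.mean(axis=0), x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 4)) * 3.0 + 5.0   # toy batch far from zero mean
run_mean, run_var = np.zeros(4), np.ones(4)    # pretend accumulated statistics

train_out  = batchnorm(x, run_mean, run_var, use_global_stats=False)
deploy_out = batchnorm(x, run_mean, run_var, use_global_stats=True)

print(np.allclose(train_out.mean(axis=0), 0.0, atol=1e-6))  # True: batch re-centered
print(np.allclose(deploy_out.mean(axis=0), 0.0, atol=0.5))  # False: stored stats used
```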
When I train SE-ResNeXt-101 (32 x 4d), everything is OK. However, when I train SENet, the log says "status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR". If I disable cuDNN in the Makefile and recompile Caffe, the error disappears, but without cuDNN training SENet is slower. Could you please tell me how to debug this error so I can use cuDNN to accelerate training?
Greetings!
I am training SE-ResNet-50, and I have enough GPU memory for training.
But when I try to do testing with the same config, I get an out-of-memory error.
I am confused: does SENet need more memory in the testing phase than in training?
Thanks for sharing your great work, it's amazing. By the way, do you have any optimization for group convolution in Caffe? SENet may otherwise suffer from memory or speed problems.
Solving...
F0616 13:59:44.988385 16022 math_functions.cu:79] Check failed: error == cudaSuccess (74 vs. 0) misaligned address
How can I solve this problem?
Hi, thanks for your shared code.
I have a question about pooling_layer.cu; part of the code is as follows:
......
......
case PoolingParameter_PoolMethod_AVE:
if (this->layer_param_.pooling_param().global_pooling()) {
// NOLINT_NEXT_LINE(whitespace/operators)
GlobalAvePoolForward<<<bottom[0]->count(0, 2), CAFFE_CUDA_NUM_THREADS>>>(
bottom[0]->count(2), bottom_data, top_data);
} else {
// NOLINT_NEXT_LINE(whitespace/operators)
AvePoolForward<<<CAFFE_GET_BLOCKS(count), CAFFE_CUDA_NUM_THREADS>>>(
count, bottom_data, bottom[0]->num(), channels_,
height_, width_, pooled_height_, pooled_width_, kernel_h_,
kernel_w_, stride_h_, stride_w_, pad_h_, pad_w_, top_data);
}
break;
......
......
Sorry, is it necessary to go through the GlobalAvePoolForward branch?
And what difference do GlobalAvePoolForward and AvePoolForward make to the result?
Thank you!
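As far as the result goes, a global average pooling kernel should be numerically equivalent to ordinary average pooling whose window covers the whole plane; the dedicated kernel is presumably just a faster specialization. A numpy check under that assumption:

```python
import numpy as np

def global_ave_pool(x):
    """One scalar per channel: the mean over the entire H x W plane."""
    return x.mean(axis=(1, 2))

def ave_pool_full_window(x):
    """Ordinary average pooling with kernel == input size and no padding."""
    C, H, W = x.shape
    return np.array([x[c].mean() for c in range(C)])

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 7, 7))
print(np.allclose(global_ave_pool(x), ave_pool_full_window(x)))  # True
```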
I am trying to evaluate your pretrained models on ImageNet. I tested a few images and it seems that you are not following the standard Caffe "data/ilsvrc12/synset_words.txt" from http://dl.caffe.berkeleyvision.org/caffe_ilsvrc12.tar.gz.
Where can I find your synset list?
As shown in Figure 8, how did you make the distribution of average activations?
Could you share the code that produces the distribution of average activations?
Thanks in advance.
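As a rough guess at the kind of computation behind such a figure, per-class average activations can be obtained by grouping SE gate outputs by label and averaging per channel. The sketch below uses made-up data and is not the authors' code:

```python
import numpy as np

def class_average_activations(gates, labels, num_classes):
    """gates: (N, C) SE excitation outputs; returns the (num_classes, C)
    mean activation of each channel per class."""
    out = np.zeros((num_classes, gates.shape[1]))
    for c in range(num_classes):
        out[c] = gates[labels == c].mean(axis=0)
    return out

rng = np.random.default_rng(0)
gates = rng.random((100, 16))   # made-up sigmoid outputs in [0, 1)
labels = np.arange(100) % 4     # 4 made-up classes, 25 samples each
avg = class_average_activations(gates, labels, 4)
print(avg.shape)  # (4, 16)
```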
I found that you use the pretrained scene model from https://github.com/lishen-shirley/Places2-CNNs.git. I also use this pre-trained model with the SE block, but the accuracy of my model cannot exceed that of the original models without the SE block. Can you tell me more details about training with the Places365 data? Do you add an auxiliary loss as mentioned in "Relay Backpropagation for Effective Learning of Deep Convolutional Neural Networks"? @hujie-frank
Hi,
first of all, congratulations on the great result on ImageNet!
I want to try your architecture in my Master's thesis,
where I try to distinguish action forces from regular pedestrians based on their appearance in my own dataset.
Here is what I did:
I added your provided files to my NVIDIA caffe flavour from
https://github.com/NVIDIA/caffe
in
src/caffe/layers
and
include/caffe/layers
respectively.
Then I ran "make clean".
But when I build Caffe with "make all -j16" I get the following build error:
In file included from src/caffe/layers/axpy_layer.cpp:8:0:
./include/caffe/layers/axpy_layer.hpp:25:36: error: wrong number of template arguments (1, should be 2)
What else do I need to change for a successful build?
I cannot find the arXiv paper from the given link.
Hi, when I run the training code, I have put the corresponding include and cpp files into the Caffe tree, but I still encounter the following error and don't know how to solve this problem. Can you help me? Thanks.
] Check failed: registry.count(type) == 1 (0 vs. 1) Unknown layer type: Axpy (known types: AbsVal, Accuracy, ArgMax, BNLL, BatchNorm, BatchReindex, Bias, Concat, ContrastiveLoss, Convolution, Data, Deconvolution, Dropout, DummyData, ELU, Eltwise, Embed, EuclideanLoss, Exp, Filter, Flatten, HDF5Data, HDF5Output, HingeLoss, Im2col, ImageData, InfogainLoss, InnerProduct, LRN, Log, MVN, MemoryData, MultinomialLogisticLoss, PReLU, Pooling, Power, Python, ROIPooling, ReLU, Reduction, Reshape, SPP, Scale, Sigmoid, SigmoidCrossEntropyLoss, Silence, Slice, SmoothL1Loss, Softmax, SoftmaxWithLoss, Split, TanH, Threshold, Tile, WindowData)
*** Check failure stack trace: ***
I'm implementing your network in TensorFlow, but I do not know exactly what the "scale" is.
Can you explain?
Thank you.
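For what it's worth, in the SE design the "scale" step is the channel-wise multiplication of the input feature map by the sigmoid gate produced by the excitation. A minimal numpy sketch of the whole block (the shapes, the reduction ratio r, and the random weights are illustrative assumptions, not the released model's values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """x: (C, H, W) feature map; w1: (C//r, C); w2: (C, C//r)."""
    s = x.mean(axis=(1, 2))                    # squeeze: global average pooling
    e = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excitation: FC-ReLU-FC-sigmoid
    return x * e[:, None, None]                # scale: per-channel reweighting

rng = np.random.default_rng(0)
C, r = 8, 2                                    # toy channel count and reduction ratio
x  = rng.standard_normal((C, 5, 5))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1
y  = se_block(x, w1, w2)
print(y.shape)  # (8, 5, 5)
```

Because the gate lies in (0, 1), every channel of the output is a damped copy of the input channel.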
Hi, how can I apply squeeze-and-excitation to VGG networks? Any suggestions?
Thank you for sharing!
Hi, I downloaded the code and compiled it successfully, but when I test SENet, the results are wrong.
I use the Python wrapper and haven't tried the command line.
import matplotlib; matplotlib.use('agg')
%matplotlib inline
import sys
import os
cafferoot = '/home/liaofangzhou/caffes/caffe'
sys.path.append(os.path.join(cafferoot,'python'))
import caffe
import numpy as np
from matplotlib import pyplot as plt
import pandas
net = caffe.Net('/home/liaofangzhou/adv_bak/liaofz/toolkit/sample_defenses/caffe_senet/model_def/SENet.prototxt',
'/home/liaofangzhou/adv_bak/liaofz/toolkit/sample_defenses/caffe_senet/pretrained_model/SENet.caffemodel',
caffe.TEST)
# load the mean ImageNet image (as distributed with Caffe) for subtraction
mu = np.array([104, 117, 123])
print 'mean-subtracted values:', zip('BGR', mu)
# create transformer for the input called 'data'
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2,0,1)) # move image channels to outermost dimension
transformer.set_mean('data', mu) # subtract the dataset-mean value in each channel
transformer.set_raw_scale('data', 255) # rescale from [0, 1] to [0, 255]
transformer.set_channel_swap('data', (2,1,0)) # swap channels from RGB to BGR
image = caffe.io.load_image(os.path.join(cafferoot,'examples/images/cat.jpg'))
im_input = transformer.preprocess('data', image)
net.blobs['data'].data[:] = im_input
output = net.forward()
output_prob = output['prob'][0] # the output probability vector for the first image in the batch
print(np.argmax(output_prob))
And the result turned out to be 7 (cock), but it should be 281 (cat). The output probability also seems to be quite confident. Why is that?
Thank you in advance!
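For reference, the Transformer configuration above amounts to the following arithmetic, sketched here in plain numpy with toy data (the step order differs from caffe.io.Transformer's internals, but the result is equivalent); a wrong channel order or mean is a common cause of confident wrong predictions:

```python
import numpy as np

def preprocess(image_rgb01, mu_bgr):
    """Toy re-implementation of the Transformer settings above: take an
    H x W x 3 RGB float image in [0, 1], return a 3 x H x W BGR blob
    scaled to [0, 255] with the per-channel mean subtracted."""
    img = image_rgb01 * 255.0           # set_raw_scale(255)
    img = img[:, :, ::-1]               # set_channel_swap: RGB -> BGR
    img = img.transpose(2, 0, 1)        # set_transpose: HWC -> CHW
    return img - mu_bgr[:, None, None]  # set_mean(mu)

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))           # toy RGB image in [0, 1]
mu = np.array([104.0, 117.0, 123.0])    # the BGR mean from the snippet above
blob = preprocess(image, mu)

print(blob.shape)  # (3, 4, 4)
```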
I0929 12:02:35.547410 5970 caffe.cpp:155] Finetuning from ../model/SE-ResNeXt-101.caffemodel
I0929 12:02:35.856974 5970 net.cpp:761] Ignoring source layer label_data_1_split
I0929 12:02:35.857090 5970 net.cpp:761] Ignoring source layer conv2_1_1x1_increase/bn_conv2_1_1x1_increase/bn_0_split
I0929 12:02:35.857216 5970 net.cpp:761] Ignoring source layer conv2_2_1x1_increase/bn_conv2_2_1x1_increase/bn_0_split
I0929 12:02:35.857317 5970 net.cpp:761] Ignoring source layer conv2_3_1x1_increase/bn_conv2_3_1x1_increase/bn_0_split
I0929 12:02:35.857544 5970 net.cpp:761] Ignoring source layer conv3_1_1x1_increase/bn_conv3_1_1x1_increase/bn_0_split
I0929 12:02:35.857978 5970 net.cpp:761] Ignoring source layer conv3_2_1x1_increase/bn_conv3_2_1x1_increase/bn_0_split
I0929 12:02:35.858289 5970 net.cpp:761] Ignoring source layer conv3_3_1x1_increase/bn_conv3_3_1x1_increase/bn_0_split
I0929 12:02:35.858595 5970 net.cpp:761] Ignoring source layer conv3_4_1x1_increase/bn_conv3_4_1x1_increase/bn_0_split
What is the "bn_0_split"?
Are there any pretrained models for MXNet?
I tried to convert the Caffe model to an MXNet model,
but there is a new layer. Do you have any pretrained model for MXNet?
Hi Hujie,
I wonder what the architectural difference is between SE-ResNeXt-101 and SENet. I have an OOM issue with SENet: it requires almost 7 GB just to initialize the network, while SE-ResNeXt-101 needs less than 3 GB.
Why is there such a huge memory usage difference?
Please help.
Thanks,
Ruxiao