szq0214 / dsod

DSOD: Learning Deeply Supervised Object Detectors from Scratch. In ICCV 2017.

License: Other

Python 100.00%

from-scratch object-detection


dsod's Issues

the last pooling size mismatch

The last pooling size is inconsistent with the paper.
in the paper:
the size of the pooling7 feature map = 1*1
in model_libs.py:
the size of the pooling7 feature map = 2*2

Also, the DSOD prediction layers differ from Figure 1 in the paper.

About GPU memory at test time?

@szq0214 @liuzhuang13 When I test on video, DSOD300 occupies about 1500 MB of GPU memory, about the same as SSD_300_ResNet101. I have tried the official memory-optimized version of DenseNet, and its GPU memory footprint is not particularly large. Is the difference down to that optimization?

What is your trick to overcome the GPU memory constraints?

On my single TITAN X (Pascal) I can only train with a batch size of 6 before hitting "out of memory". What is your trick for overcoming the GPU memory constraints mentioned in the paper?
Thank you~~

Cannot detect small objects?

Hi,
Thank you for your work. It is great.

Some questions:
1. The pretrained model cannot detect small objects.
2. Is it better than RON?

Thank you

Reproduced result is lower than the reported result

Hi,

I tried to train DSOD300 using this file on 8 TITAN Xp GPUs.

But the result is 76.94%, which is lower than the reported result (77.7%).

I couldn't find out why this happens.

Has anyone else faced this problem?

Does this model have an MXNet version?

Has anyone implemented it in another framework, e.g. MXNet?
Thanks

Does the DSOD model code have a TensorFlow version?

I want to implement DSOD in TensorFlow. I tried to write it myself but ran into some mistakes. Are there any TensorFlow versions?
Thanks

Question about pooling layer in your DSOD300

Thanks for sharing your code. I want to re-implement this net in another framework, but the definition of the pooling layer there is different from the one in Caffe.

In Caffe, I think the output size is computed with a ceil function, as most of your code assumes. But at the very end, I don't know why it becomes a floor function.

I mean that the sequence should be

300x300→150x150→75x75→38x38→19x19→10x10→5x5→3x3→2x2

But in your code,

model2 = add_bl_layer2(model1, 256, dropout, 1) # pooling4: 10x10
net.Third = model2
model3 = add_bl_layer2(model2, 128, dropout, 1) # pooling5: 5x5
net.Fourth = model3
model4 = add_bl_layer2(model3, 128, dropout, 1) # pooling6: 3x3
net.Fifth = model4
model5 = add_bl_layer2(model4, 128, dropout, 1) # pooling7: 1x1

I don't know why 3x3→1x1. Could you give me some suggestions?
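The size sequence in this question can be reproduced with a few lines of Python. This is a minimal sketch, assuming every stage is a pooling layer with kernel 2, stride 2 and no padding (the real network mixes convolutions and pooling), and that Caffe's pooling layer rounds up while a floor-based framework rounds down. The function names are hypothetical and not part of the repository.

```python
import math


def pooled_size(size, kernel=2, stride=2, pad=0, ceil_mode=True):
    """Output size of one pooling stage along one spatial dimension."""
    span = size + 2 * pad - kernel
    if ceil_mode:
        # Caffe-style pooling: round the number of strides up.
        out = int(math.ceil(span / float(stride))) + 1
    else:
        # Floor-based rounding, as used for convolutions.
        out = span // stride + 1
    # With padding, drop a last window that would start outside the image.
    if pad > 0 and (out - 1) * stride >= size + pad:
        out -= 1
    return out


def size_chain(size, stages, ceil_mode=True):
    """Feature-map sizes after each of `stages` successive poolings."""
    sizes = [size]
    for _ in range(stages):
        sizes.append(pooled_size(sizes[-1], ceil_mode=ceil_mode))
    return sizes


print(size_chain(300, 8))                 # ceil rounding at every stage
print(pooled_size(3, ceil_mode=False))    # floor rounding on a 3x3 map
```

With ceil rounding a 3x3 map pools to 2x2; only floor rounding gives 1x1, which is the discrepancy the question points at.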

Transition w/o Pooling Layer size mismatch

It seems your model graph is inconsistent with the paper (Table 1, Output Size) for the Transition w/o Pooling Layers (1) and (2).
In the paper:
Transition w/o Pooling Layer (1) channel = 1120
Transition w/o Pooling Layer (2) channel = 1568
In the model graph:
Convolution49 num output = 1184
Convolution66 num output = 256

Also, I don't quite understand the purpose of Transition w/o Pooling Layer (1): it neither compresses nor expands the number of filters (num input = num output), and it is not branched out for prediction. Removing it (Convolution49 + BN50 + ReLU50) would leave a compact Dense Block (3+4) with 8 x 2 = 16 dense layers. So what is the reason for explicitly injecting this extra (BN + ReLU + 1x1 Conv) block in between?

No such File or Directory - Core Dumped

I am getting the following message when running the training command:

python examples/dsod/DSOD300_pascal.py

I0312 18:13:06.186707 31109 layer_factory.hpp:77] Creating layer data
I0312 18:13:06.186813 31109 net.cpp:100] Creating Layer data
I0312 18:13:06.186830 31109 net.cpp:408] data -> data
I0312 18:13:06.186846 31109 net.cpp:408] data -> label
F0312 18:13:06.189031 31210 db_lmdb.hpp:15] Check failed: mdb_status == 0 (2 vs. 0) No such file or directory
*** Check failure stack trace: ***
@ 0x7f8d26f785cd google::LogMessage::Fail()
@ 0x7f8d26f7a433 google::LogMessage::SendToLog()
@ 0x7f8d26f7815b google::LogMessage::Flush()
@ 0x7f8d26f7ae1e google::LogMessageFatal::~LogMessageFatal()
@ 0x7f8d2784b770 caffe::db::LMDB::Open()
@ 0x7f8d2768a396 caffe::DataReader<>::Body::InternalThreadEntry()
@ 0x7f8d2767c465 caffe::InternalThread::entry()
@ 0x7f8d1cc4f5d5 (unknown)
@ 0x7f8d159ee6ba start_thread
@ 0x7f8d25fcf3dd clone
@ (nil) (unknown)
Aborted (core dumped)

Any help would be appreciated. All steps before running the training command were successful.
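The fatal line comes from `caffe::db::LMDB::Open()`, so the data layer is pointing at an LMDB directory that does not exist at the configured path. A minimal sketch for checking this before launching training is shown below. It assumes only that an LMDB database is a directory containing a `data.mdb` file; the helper name and the example path are hypothetical and should be replaced with the paths your training script actually uses.

```python
import os


def missing_lmdb_dirs(paths):
    """Return the paths that do not look like LMDB database directories."""
    missing = []
    for path in paths:
        # An LMDB database on disk is a directory holding a data.mdb file.
        if not os.path.isfile(os.path.join(path, "data.mdb")):
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Hypothetical path: substitute the train/test data paths from your script.
    candidates = ["examples/VOC0712/VOC0712_trainval_lmdb"]
    for path in missing_lmdb_dirs(candidates):
        print("LMDB not found:", os.path.abspath(path))
```

Relative paths are resolved against the directory you launch the script from, so running the command from a different working directory can be enough to trigger this error.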

what is the difference between DSOD300_pascal and DSOD300_pascal++?

@szq0214
what is the difference between DSOD300_pascal and DSOD300_pascal++?

Why not publish the Caffe train_val.prototxt?

Video Detection

Hey guys!

Is it also possible to do video detection with this model, as in the SSD implementation by Wei Liu?

Best Wishes

Have you tried large input images (600 * 1000)?

First of all, the idea of training from scratch is awesome. I have a question. Have you tried large input images (600 * 1000)?

I don't agree that Faster R-CNN cannot converge when training from scratch

Or do you mean that, when training from scratch, performance is much worse than with fine-tuning?

And thanks for sharing the code.

The net created by the Python script is different from the pre-trained models offered in the README.md?

Hi, I was trying to fine-tune a pre-trained model on my dataset, and I need to change the number of classes from 21 to 2. I planned to modify the Python script instead of editing the prototxt files. But I found that the model the Python script creates is about "28.6M", which is different from every model offered in this repository. If I want to train a model with 2 classes, should I train it without a pre-trained model?

Many thanks!

Any Dockerfile?

Could you provide a Dockerfile? Does anyone have a Dockerfile?

No module named model_libs

When I run DSOD300_pascal.py, I get an error: No module named model_libs. What can I do?
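This error generally means that the directory containing `model_libs.py` is not on Python's module search path. A minimal sketch for diagnosing it follows. The helper name is hypothetical, and the two module names tried at the end are assumptions about where the file might live, not a statement of how this repository lays it out.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported with the current sys.path."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Assumed candidates: a top-level module, or one inside the caffe package.
    for candidate in ("model_libs", "caffe.model_libs"):
        print(candidate, "importable:", module_available(candidate))
    print("search path:", sys.path)
```

If neither name is importable, adding the directory that holds the file to `PYTHONPATH` (or to `sys.path` at the top of the script) is the usual remedy.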

How to prepare the VOC12 test LMDB

I want to know how to prepare the VOC12 test LMDB so that I can run training on the VOC07++12 dataset. Can anyone help me? Thanks a lot.

Run a detection test with one input image

Is there a Python script to run a detection test on a single input image using one of your pretrained models?

Thanks,

About 1 Channel images

Hi,
I would like to train on my own dataset, which consists only of grayscale images. Could you tell me how to adapt DSOD to work with a single channel? Thanks!

training on new dataset

Hello,

First of all I want to say thank you for releasing the code.
Can you please tell me if I can train with my own custom dataset?
Because it is not clear to me.

Thank you

Check failed: error == cudaSuccess (2 vs. 0) out of memory. How to deal with it?

DSOD300_pascal.py can't resume from a checkpoint: after I stop training, it cannot find the latest snapshot

@szq0214 @liuzhuang13

Couldn't find any detections

When I run python DSOD300_pascal.py, I get many messages like: I0809 19:00:06.018213 8332 detection_output_layer.cu:113] Couldn't find any detections.
What should I do?

Have you tried DSOD512?

Hi,

Recently, I tried to train a DSOD512 version that follows the original SSD512 settings except for the DSOD backbone.

But the accuracy was not as good as DSOD300.

Have you tried to train DSOD512?

Thanks :)

Why is DSOD so slow during training?

@szq0214
Hi!
I trained DSOD on my own dataset, but training takes 10 times as long as SSD.
Why is DSOD so slow?

No advantage over VGG-SSD?

SSD300S† 07+12 ✗ VGGNet Plain 46 26.3M 300 ×300 69.6
SSD300S† 07+12 ✗ VGGNet Dense 37 26.0M 300 ×300 70.4

In Table 4 of your paper, the dense SSD seems to have no advantage over the plain VGG SSD: similar precision, but slower.

A small batch size gives a lower mAP.

Hi, @szq0214:
I only have two GTX 1080 GPUs. I want to reproduce your GRP-DSOD. When I change batch_size and accum_batch_size to 6 and 30, the mAP is just 63%. What should I do to get the results in your paper?
Thanks.
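For context, `batch_size` and `accum_batch_size` are usually tied together through the solver's `iter_size` (gradient accumulation). The sketch below shows that arithmetic, assuming the convention used by SSD-style Caffe training scripts, where `batch_size` is the total across GPUs; the function name is hypothetical, and whether DSOD300_pascal.py computes these values in exactly this way is an assumption.

```python
import math


def solver_batching(batch_size, accum_batch_size, num_gpus):
    """Return (batch_size_per_device, iter_size, effective_batch_size)."""
    # Split the per-iteration batch evenly over the available GPUs.
    batch_size_per_device = int(math.ceil(float(batch_size) / num_gpus))
    per_forward = batch_size_per_device * num_gpus
    # Number of forward/backward passes accumulated before one weight update.
    iter_size = int(math.ceil(float(accum_batch_size) / per_forward))
    return batch_size_per_device, iter_size, per_forward * iter_size


# The setting from this issue: batch_size=6, accum_batch_size=30, two GPUs.
print(solver_batching(6, 30, 2))
```

Accumulation only equalizes the gradient batch; batch normalization statistics are still computed over the small per-device batch, which is one plausible reason why results can differ from a large-batch run.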

NameError: name 'DSOD300_V3_Body' is not defined. How to deal with it?

When I'm training a DSOD model on VOC 07+12 with python examples/dsod/DSOD300_pascal.py, I encounter:

Traceback (most recent call last):
  File "examples/dsod/DSOD300_pascal.py", line 380, in <module>
    DSOD300_V3_Body(net, from_layer='data')
NameError: name 'DSOD300_V3_Body' is not defined

What should I do to deal with it? Thank you~

mean_values_.size() == 1 || mean_values_.size() == img_channels Specify either 1 mean_value or as many as channels: 1

I am using my own data, but it reports: Check failed: mean_values_.size() == 1 || mean_values_.size() == img_channels Specify either 1 mean_value or as many as channels: 1
Could you help me?
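The check in this message says that the number of `mean_value` entries must be either 1 or equal to the number of image channels; the trailing `1` is the channel count, so single-channel images are being fed with several mean values. Below is a minimal sketch of that rule. The helper names are hypothetical, the three BGR means are the ones that appear in this repository's prototxt excerpts, and collapsing them to their average for grayscale input is an assumption rather than a recommendation from the authors.

```python
def mean_values_ok(mean_values, img_channels):
    """Mirror of the Caffe check: one mean value, or one per channel."""
    return len(mean_values) == 1 or len(mean_values) == img_channels


def pick_mean_values(img_channels, bgr_means=(104, 117, 123)):
    """Choose mean values that pass the check for the given channel count."""
    if img_channels == len(bgr_means):
        return list(bgr_means)
    # Assumption: for other channel counts, fall back to a single averaged mean.
    return [sum(bgr_means) / float(len(bgr_means))]


print(mean_values_ok([104, 117, 123], img_channels=1))  # the failing case
print(pick_mean_values(1))                              # a single mean value
```

Computing the mean from your own grayscale dataset would be more faithful than averaging the BGR values.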

The training time

How long does training take on one Titan X GPU, or on 8 GPUs?

How to get VOC12 annotations?

I just wonder how you produced
VOC0712Plus_test_lmdb
We can download the images from the official website, but not the VOC12 annotations.
How did you compute the VOC12 mAP outside the evaluation platform without annotations?

DSOD Visualization Problem on Video Test

I have downloaded the DSOD_voc+coco model and modified the corresponding prototxt following the video test in the SSD project. While this works well in the SSD project, the test fails when setting up the DSOD network, throwing the following error:

F0922 17:37:27.110465 13992 bbox_util.cpp:2197] Check failed: label < colors.size() (2 vs. 0)
*** Check failure stack trace: ***
    @     0x7f6b48c805cd  google::LogMessage::Fail()
    @     0x7f6b48c82433  google::LogMessage::SendToLog()
    @     0x7f6b48c8015b  google::LogMessage::Flush()
    @     0x7f6b48c82e1e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f6b493cb414  caffe::VisualizeBBox<>()
    @     0x7f6b49765844  caffe::DetectionOutputLayer<>::Forward_gpu()
    @     0x7f6b494689e1  caffe::Net<>::ForwardFromTo()
    @     0x7f6b49468ad7  caffe::Net<>::Forward()
    @           0x4199a3  test()
    @           0x415aa5  main
    @     0x7f6b47688830  __libc_start_main
    @           0x416679  _start
    @              (nil)  (unknown)

And here is the modified part of the DSOD prototxt (I mainly modified the input layer and the detection output layer according to the SSD settings).
The input layer is:

layer {
  name: "data"
  type: "VideoData"
  top: "data"
  transform_param {
    mean_value: 104
    mean_value: 117
    mean_value: 123
    resize_param {
      prob: 1
      resize_mode: WARP
      height: 300
      width: 300
      interp_mode: LINEAR
    }
  }
  data_param {
    batch_size: 1
  }
  video_data_param {
    video_type: VIDEO
    video_file: "examples/videos/ILSVRC2015_train_00755001.mp4"
    skip_frames: 1
  }
}

And the detection layer is:

layer {
  name: "detection_out"
  type: "DetectionOutput"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  bottom: "data"
  top: "detection_out"
  include {
    phase: TEST
  }
  transform_param {
    mean_value: 104
    mean_value: 117
    mean_value: 123
    resize_param {
      prob: 1
      resize_mode: WARP
      height: 576
      width: 1024
      interp_mode: LINEAR
    }
  }
  detection_output_param {
    num_classes: 21
    share_location: true
    background_label_id: 0
    nms_param {
      nms_threshold: 0.449999988079
      top_k: 400
    }
    save_output_param {
      output_directory: "data/VOC0712/dsod_labelmap_voc.prototxt"
    }
    code_type: CENTER_SIZE
    keep_top_k: 200
    confidence_threshold: 0.00999999977648
    visualize: true
    visualize_threshold: 0.3
  }
}

Interestingly, when I turn off visualization by setting visualize: false, the network runs fine, but without the visualized video I can't tell whether the results are right. Has anyone met the same problem, and how did you deal with it?

How to measure the inference time?

Hi,

I want to know how you measured the inference time.

Did you use the caffe time tool? Or did you measure the total time taken to test all 4952 VOC test images?

Thanks in advance :)
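One common way to measure this is to time the forward pass over a set of images and report the average. The sketch below is framework-agnostic and is not the procedure used in the paper: `fn` stands for any hypothetical callable that runs one forward pass (for example a wrapper around a loaded network), and the helper name is made up for illustration.

```python
from timeit import default_timer


def average_seconds(fn, inputs, warmup=1):
    """Average wall-clock seconds per call of fn over the given inputs."""
    inputs = list(inputs)
    if not inputs:
        raise ValueError("need at least one input to time")
    # Warm-up calls are excluded so one-time setup cost is not counted.
    for item in inputs[:warmup]:
        fn(item)
    start = default_timer()
    for item in inputs:
        fn(item)
    return (default_timer() - start) / len(inputs)


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real forward pass.
    per_call = average_seconds(lambda n: sum(range(n)), [10000] * 50)
    print("seconds per call: %.6f" % per_call)
    print("calls per second: %.1f" % (1.0 / per_call))
```

Whether image loading and preprocessing are inside `fn` changes the number considerably, so it is worth stating which one is being reported.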

How to batch test many image files?

I have copied all my test files into one folder, and I want to run detection on all of these images in a batch and get the annotations for them. Could you tell me how?
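A batch run amounts to listing the image files in the folder and calling a detector on each one. The sketch below shows only that loop. `detect` is a hypothetical callable that takes an image path and returns detections (it is not a function provided by this repository), and the list of file extensions is an assumption.

```python
import os

# Assumed set of image extensions; extend as needed.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def list_images(folder):
    """Return sorted paths of the image files directly inside folder."""
    names = sorted(os.listdir(folder))
    return [os.path.join(folder, name) for name in names
            if name.lower().endswith(IMAGE_EXTENSIONS)]


def run_batch(folder, detect):
    """Call detect(path) on every image and collect results by file path."""
    return {path: detect(path) for path in list_images(folder)}


if __name__ == "__main__":
    # Placeholder detector that returns no boxes; swap in a real one.
    results = run_batch(".", detect=lambda path: [])
    for path, detections in sorted(results.items()):
        print(path, len(detections))
```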

GRP-DSOD pretrained models

Could you provide GRP-DSOD pretrained models, as you did for DSOD300?

Test on VOC2012

Hi, @szq0214. Sorry for bothering you again. Can you tell me what I should change to test on VOC2012? The default is 2007.

How long does the model take with 20,000 images of 1 class on a Quadro P4000?

Hi everyone,
How long does DSOD take when I have 20,000 images of 1 class?
My GPU is a Quadro P4000, compute capability = 6.1

Sixth_norm_mbox_priorbox step parameter is wrong

Hi, with your changes to the SSD model, the last layer has 2x2 spatial size, not 1x1 anymore. This stems from the fact that the last 3×3×128 conv layer has padding 1 and also the parallel pooling branch, having kernel size 2, will output a 2x2 feature, instead of 1x1. You can double check this by reading Caffe's code of conv_layer.cpp:

const int output_dim = (input_dim + 2 * pad_data[i] - kernel_extent) / stride_data[i] + 1;

output_dim = (3 + 2 * 1 - 3) / 2 + 1 = 2 / 2 + 1 = 1 + 1 = 2

Also, Caffe's output reflects this:

I1023 13:46:00.587738    43 net.cpp:100] Creating Layer Sixth
I1023 13:46:00.587746    43 net.cpp:434] Sixth <- Convolution77
I1023 13:46:00.587751    43 net.cpp:434] Sixth <- Convolution79
I1023 13:46:00.587757    43 net.cpp:408] Sixth -> Sixth
I1023 13:46:00.587786    43 net.cpp:150] Setting up Sixth
I1023 13:46:00.587792    43 net.cpp:157] Top shape: 2 256 2 2 (2048)

Given this, I think the step size in the Sixth_norm_mbox_priorbox should be 150 (= 300/2) instead of 300 (=300/1).

EDIT: I should also point out that I have made NO modification whatsoever to the source code.
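The arithmetic in this report can be checked with a short sketch. It assumes the convolution output formula quoted above from Caffe and that the prior-box step is the input image size divided by the feature-map size; both function names are hypothetical.

```python
def conv_output_dim(input_dim, kernel, stride, pad, dilation=1):
    """Convolution output size along one dimension (floor rounding)."""
    kernel_extent = dilation * (kernel - 1) + 1
    return (input_dim + 2 * pad - kernel_extent) // stride + 1


def prior_box_step(image_size, feature_size):
    """Pixels of the input image covered by one feature-map cell."""
    return image_size / float(feature_size)


# Last 3x3 conv on a 3x3 map, stride 2, with pad 1 as reported in this issue.
with_pad = conv_output_dim(3, kernel=3, stride=2, pad=1)
# The same layer without padding, for comparison.
without_pad = conv_output_dim(3, kernel=3, stride=2, pad=0)

print(with_pad, prior_box_step(300, with_pad))
print(without_pad, prior_box_step(300, without_pad))
```

A step of 300 is only consistent with a 1x1 map, which this layer would produce without padding; with pad 1 the map is 2x2 and the matching step is 150.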

Couldn't find any detections

I am getting the following message when running the training command:
python examples/dsod/DSOD300_pascal.py

32554 detection_output_layer.cu:113] Couldn't find any detections
32554 detection_output_layer.cu:113] Couldn't find any detections
32554 detection_output_layer.cu:113] Couldn't find any detections
32554 detection_output_layer.cu:113] Couldn't find any detections
32554 detection_output_layer.cu:113] Couldn't find any detections

The result of the ReLU activation isn't used in GRP-DSOD.

Thanks for sharing the GRP-DSOD code. I read it and found that the result of the ReLU function isn't used in this part.

def global_level(net, from_layer, relu_name):
    fc = L.InnerProduct(net[relu_name], num_output=1)
    sigmoid = L.Sigmoid(fc, in_place=True)
    att_name = "{}_att".format(from_layer)
    sigmoid = L.Reshape(sigmoid, reshape_param=dict(shape=dict(dim=[-1])))
    scale = L.Scale(net[att_name], sigmoid, axis=0, bias_term=False, bias_filler=dict(value=0))
    relu = L.ReLU(scale, in_place=True)
    residual = L.Eltwise(net[from_layer], scale)
    gatt_name = "{}_gate".format(from_layer)
    net[gatt_name] = residual
    return net

relu = L.ReLU(scale, in_place=True)
Is this a mistake? Or is the result intentionally discarded?

Will you release the training code?

Hi @szq0214,
will you release the training code?

thanks.