sfzhang15 / sfd



S³FD: Single Shot Scale-invariant Face Detector, ICCV, 2017

Python 47.71% MATLAB 52.29%

face-detector

sfd's People

kixiang s0302102 elegantgod runauto liuguoyou anguoyang anazou xiufranklin zgsxwsdxg marvis bygreencn hdjsjyl tzhang2014 walkoncross clcarwin laycoding vsooda zhangxujinsh shubhampachori12110095 yaokeepmoving zhaowwenzhong armstrongyang wang-mengjiao oylz chop2 10183308 liuxuvip rkshuai luciagan sinexue ubenz55555 airbernard starstylesky 1784266476 ctgushiwei jjdbear geekrick88 lynnw123 baucheng wolkedu mai00dou bhuwendongchao dreadlord1984 liyuming1978 trantorrepository satchelwu chomolungma sandmanup liubo0902 nirbenz junfengcao shaomang wywywy01 dengshuo lakehui hxl1990 liyuanyaun kaiyang0804 jwmneu yuechengyin pbdahzou afcarl alexliyang alexwinters-i songyaqi ethanyhzhang arasharchor xncaffe wuxiaolianggit sumsuddin banglh dihong blessinglrq yoyokitartora wuyx meitianjinbu cbsr-casia xuhuaze707313 1093842024 sputnikav tsyjwct ieyer amirunpri2018 sunnyln wangyangneu htjacky zavierhan tuanml vincentwei0919 blackabc changerzz xinxin12345 leipang0817 bdalal ryzejiang kk52099 peterzs simon5u yingmuying nehaparakh95

sfd's Issues

Question about data augmentation

Hi,
You mentioned modifying the data augmentation code of SSD to make sure that it does not change the image ratio. I have some questions about this point. First, during training each image is resized to 640x640; this by itself changes the aspect ratio, since, as far as I know, the input images are not all square. Are you doing any trick here to avoid this, or is it true that the aspect ratio changes during resizing?

Second question: which code should we change to make this modification to the augmentation? Do we just change the sampler part at the beginning of train.prototxt and specify:
min_aspect_ratio: 1.0
max_aspect_ratio: 1.0

or do we need to change some C++ functions in Caffe?

Thanks in advance.
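A likely reason the C++ code has to change: SSD's sampler applies `aspect_ratio` to normalized [0, 1] coordinates, so an aspect ratio of 1.0 is only square in pixels when the image itself is square. The minimal sketch below shows the idea in pixel space instead: crop a square patch whose side is between 0.3 and 1.0 of the short side (as the paper describes), then scale it to 640x640 with a single factor, so faces are never stretched. The function name, the `[xmin, ymin, xmax, ymax]` pixel box format and the center-in-patch rule are assumptions; this is not the repository's modified `sampler.cpp`.

```python
import numpy as np

def sample_square_crop(img_h, img_w, boxes, rng, out_size=640):
    """Pick a random square patch and map the boxes into an out_size x out_size frame.

    boxes: (K, 4) array of [xmin, ymin, xmax, ymax] in pixels (assumed format).
    Returns the crop window (x0, y0, side), the transformed boxes and the scale factor.
    """
    short = min(img_h, img_w)
    # Side length between 0.3 and 1.0 of the short side, following the paper's description.
    side = max(1, int(round(short * rng.uniform(0.3, 1.0))))
    x0 = int(rng.integers(0, img_w - side + 1))
    y0 = int(rng.integers(0, img_h - side + 1))
    scale = out_size / side  # one factor for both axes, so faces are not distorted
    shifted = np.asarray(boxes, dtype=float) - np.array([x0, y0, x0, y0], dtype=float)
    # Keep boxes whose center falls inside the patch (similar to SSD's CENTER emit rule).
    cx = (shifted[:, 0] + shifted[:, 2]) / 2
    cy = (shifted[:, 1] + shifted[:, 3]) / 2
    keep = (cx >= 0) & (cx < side) & (cy >= 0) & (cy < side)
    new_boxes = np.clip(shifted[keep], 0, side) * scale
    return (x0, y0, side), new_boxes, scale
```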

About training

I have trained the SFD code with and without modifying sampler.cpp, and with and without modifying bbox_util.cpp, but all attempts failed. The loss decreased to 1.5~2; however, the mAP on Pascal face is only about 0.55. What might cause this?

FDDB lower detection accuracy

Hi, Zhang,
I tested your Caffe model on the FDDB dataset and got a DiscROC detection rate of 95.0 at FalsePositive=400,
but the paper reports 98.3. Can you explain why?

Training code release

Hi, is it possible to release the training source code? Points 2 and 3 in the README are not very precise, and I am not sure how to implement the code exactly as it was done in your paper.
It would make it much easier to compare with and modify your SOTA model.

bounding box exceeds image boundary

Hi Shifeng, thank you for the great work on SFD. While creating the lmdb for the training images, I found that some bounding boxes exceed the image boundary.

W0416 18:42:23.386387 25345 io.cpp:330] /home/user/Annotations/500017.xml bounding box exceeds image boundary.
W0416 18:42:23.407441 25345 io.cpp:330] /home/user/Annotations/500019.xml bounding box exceeds image boundary.
W0416 18:42:23.433470 25345 io.cpp:330] /home/user/Annotations/500021.xml bounding box exceeds image boundary.
etc....

In these cases, do you remove the bounding boxes during lmdb creation, or just ignore the warning message? Thank you very much.
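One common option, which may or may not be what the authors did, is to clip out-of-bounds annotation boxes to the image before writing the lmdb and drop any box that becomes degenerate. A minimal sketch, assuming pixel boxes `[xmin, ymin, xmax, ymax]` already parsed from the XML files (the parsing step and the name `sanitize_boxes` are hypothetical):

```python
import numpy as np

def sanitize_boxes(boxes, img_w, img_h, min_size=1.0):
    """Clip boxes to the image and drop boxes that become too small.

    Returns (clean_boxes, n_out_of_bounds) so the caller can log how many
    annotations would have triggered the 'exceeds image boundary' warning.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    out = ((boxes[:, 0] < 0) | (boxes[:, 1] < 0) |
           (boxes[:, 2] > img_w) | (boxes[:, 3] > img_h))
    clipped = boxes.copy()
    clipped[:, [0, 2]] = clipped[:, [0, 2]].clip(0, img_w)
    clipped[:, [1, 3]] = clipped[:, [1, 3]].clip(0, img_h)
    w = clipped[:, 2] - clipped[:, 0]
    h = clipped[:, 3] - clipped[:, 1]
    keep = (w >= min_size) & (h >= min_size)
    return clipped[keep], int(out.sum())
```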

Is 8GB of memory too small?

I tried this model on my PC with 8GB of RAM, and RAM usage becomes full when it reaches
[det2, det3] = multi_scale_test(net, image, max_im_shrink)
How much memory do you use?

Pascal faces

@sfzhang15 Hi!
Could you give a specific download link for the Pascal face dataset? The link you gave is for the Pascal dataset!
Sorry for the trouble!

Trained model precision on WIDERface

Hello, Zhang,
I have trained the model according to the settings in the paper: apart from using the same prototxts, I also modified the C++ code, including sampler.cpp and the MatchBBox function. The only difference is the batch size: mine is 24, on a K80 GPU. After 150k training iterations, using your testing code, I get a score of 83 on the hard set.
I have thought about it for a long time but can't figure out why it is lower than yours, which is 85.9.
Could you give some guidance?
Thanks!

Changes necessary?

Hello everyone,
I'm running SFD inside a Docker container. To run wider_test.py I had to make the following changes:
L. 10,11

# import sys
# sys.path.insert(0, 'python')

L. 138

change:
model_weights = caffe_root + '/models/VGGNet/WIDER_FACE/SFD/SFD.caffemodel'
to:
model_weights = caffe_root + '/models/VGGNet/WIDER_FACE/SFD_trained/SFD.caffemodel'

L. 160

change:
Image_Path = Path + im_name[:] + '.jpg'
to:
Image_Path = Path + event[0][0] + '/' + im_name[:] + '.jpg'

Now my questions are:
Has anyone else faced this issue too? Or did I just get an old version of wider_test.py?
Why did I have to deactivate the part with "import sys"?

Any help is appreciated. Thanks in advance!
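On the `sys.path` question: `'python'` is resolved relative to the current working directory, so the line only points at pycaffe when the script is launched from the Caffe root; inside a container where `caffe` is already importable (for example via `PYTHONPATH`), it can add a wrong or non-existent directory. Whether that is what happened here is a guess. A slightly more robust sketch, assuming a hypothetical `CAFFE_ROOT` environment variable, together with building the WIDER image path from the event folder as in the change above:

```python
import os
import sys

def add_caffe_to_path(caffe_root=None):
    """Put <caffe_root>/python on sys.path only if it exists (hypothetical helper)."""
    caffe_root = caffe_root or os.environ.get('CAFFE_ROOT', os.getcwd())
    py_dir = os.path.join(caffe_root, 'python')
    if os.path.isdir(py_dir) and py_dir not in sys.path:
        sys.path.insert(0, py_dir)
    return py_dir

def wider_image_path(root, event_name, im_name):
    """WIDER FACE images are stored as <root>/<event>/<image>.jpg."""
    return os.path.join(root, event_name, im_name + '.jpg')
```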

Evaluation on FDDB

Hello. I am trying to evaluate the code on the proposed datasets, but the FDDB evaluation gives errors like "Line #33380 (got 6 columns instead of 1)", even after converting the format with the provided MATLAB script.

Do you have any idea why this is happening and how it can be solved?

Thank you!
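An error like "got 6 columns instead of 1" suggests a parser expected the one-column face-count line but found a 6-column ellipse line, i.e. the per-image blocks are out of sync. FDDB expects, for every image, the image name (without extension), then the number of detections, then one line per detection (`left_x top_y width height score` for rectangles, 6 values for ellipses). A minimal writer sketch for the rectangle variant; the function name and input structure are assumptions:

```python
def format_fddb_detections(results):
    """Serialize detections in the FDDB rectangle format.

    results: list of (image_name, dets), where image_name has no extension
    (e.g. '2002/08/11/big/img_591') and dets is a list of
    (left_x, top_y, width, height, score) tuples.
    Each image block is: name line, count line, then one line per face.
    """
    lines = []
    for name, dets in results:
        lines.append(name)
        lines.append(str(len(dets)))  # the count must match the lines that follow
        for x, y, w, h, s in dets:
            lines.append(f"{x:.1f} {y:.1f} {w:.1f} {h:.1f} {s:.6f}")
    return "\n".join(lines) + "\n"
```

Images with no detections still need their name and a `0` count line, otherwise every following block is shifted.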

Some questions about your "max-out background label"

I'm confused by the max-out background label in the paper. Do you mean randomly assigning a different label to each background pixel, or assigning labels according to what each pixel is, such as tree, bicycle, and so on?
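As far as the paper describes it, the max-out background label involves no semantic classes: for the conv3_3 detection layer, each anchor predicts several background scores (the paper uses N_m = 3) plus one face score, and only the highest background score is kept before the softmax. A minimal NumPy sketch of that reduction, assuming the channel layout `[bg_1, ..., bg_Nm, face]` per anchor (the layout is an assumption):

```python
import numpy as np

def maxout_background(scores, n_bg=3):
    """scores: (num_anchors, n_bg + 1) raw logits, background logits first.

    Returns (num_anchors, 2) logits [background, face] and their softmax.
    """
    bg = scores[:, :n_bg].max(axis=1)  # keep only the strongest background logit
    face = scores[:, n_bg]
    logits = np.stack([bg, face], axis=1)
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    return logits, probs
```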

about eval_tools for widerface

@sfzhang15
What does "gt_list" in "wider_easy_val.mat" mean?

Where is stage 2 of the anchor matching strategy, as described in the paper, reflected in the code?

Unable to merge SFD with official caffe

Hi, I am unable to merge SFD (SSD) with the original Caffe. Do you happen to know any way I can do that? Thanks

About FaceBoxes

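Hello, I have a few questions about FaceBoxes:

1. The paper uses BN layers in the RDCL block; are BN layers also added in the MSCL block?
2. For the 21 receptive fields after Inception3, is it enough to just set 3 anchors (32, 64, 128), or are there other operations?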

question about multi-scale testing

Hello,
when you apply multi-scale testing, how do you change the anchor settings? For example, when the input size is twice the training size, I double the size of the feature maps (used for generating anchors) to ensure that the number of anchors equals the number of channels of the network. However, I have tried both keeping the "min_size" setting and doubling the value of "min_size", and neither works well (mAP drops a lot).
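One way to see why `min_size` probably should not change: in SSD's PriorBox layer, `min_size` is expressed in input-image pixels and the grid follows the feature-map size (assuming the layer reads the image size from the data blob rather than a fixed `img_size`), so a larger input automatically yields more anchors of the same pixel size, which is the point of upscaling for small faces. The sketch counts anchors per detection layer using the strides (4 to 128) and anchor sizes (16 to 512) from the paper, one square anchor per location; rounding up is an approximation of Caffe's pooling arithmetic.

```python
import math

# (layer, stride, anchor size) as listed in the S3FD paper; one anchor per cell.
LAYERS = [("conv3_3", 4, 16), ("conv4_3", 8, 32), ("conv5_3", 16, 64),
          ("fc7", 32, 128), ("conv6_2", 64, 256), ("conv7_2", 128, 512)]

def anchor_counts(height, width):
    """Return {layer: number of anchors} for an input of the given size."""
    counts = {}
    for name, stride, _size in LAYERS:
        fh, fw = math.ceil(height / stride), math.ceil(width / stride)
        counts[name] = fh * fw  # anchor size stays fixed; only the grid grows
    return counts

print(sum(anchor_counts(640, 640).values()))    # 640x640 input
print(sum(anchor_counts(1280, 1280).values()))  # 4x as many anchors at 2x scale
```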

Can't find the codes that related to stage2 of anchor matching strategy

Hi @sfzhang15, thanks for your amazing contribution! In Section 3.2, stage 2, the paper says "firstly picking out anchors whose jaccard overlap with this face are higher than 0.1, then sorting them to select top-N as matched anchors of this face". But I couldn't find the corresponding code. Could you please tell me how to implement stage 2 in code?
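The repository does not include this code, so the following NumPy sketch is only one possible reading of Section 3.2. Stage 1 assigns every anchor whose best IoU is at least 0.35 to that face (SSD's per-prediction rule, without its bipartite step). Stage 2 takes each face that has fewer than N anchors, considers still-unmatched anchors with IoU > 0.1, and adds the highest-IoU ones until the face has N. Processing faces one at a time, skipping stage-1 anchors and topping up to N (rather than adding N more) are all assumptions; `n_top=6` follows the value the author gives in other issues.

```python
import numpy as np

def iou_matrix(faces, anchors):
    """Pairwise IoU between faces (F, 4) and anchors (A, 4), [x1, y1, x2, y2]."""
    x1 = np.maximum(faces[:, None, 0], anchors[None, :, 0])
    y1 = np.maximum(faces[:, None, 1], anchors[None, :, 1])
    x2 = np.minimum(faces[:, None, 2], anchors[None, :, 2])
    y2 = np.minimum(faces[:, None, 3], anchors[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_f = (faces[:, 2] - faces[:, 0]) * (faces[:, 3] - faces[:, 1])
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (area_f[:, None] + area_a[None, :] - inter)

def match_anchors(faces, anchors, pos_thresh=0.35, low_thresh=0.1, n_top=6):
    """Return an (A,) array holding the matched face index per anchor, or -1."""
    match = np.full(anchors.shape[0], -1)
    if faces.shape[0] == 0:
        return match
    iou = iou_matrix(faces, anchors)
    # Stage 1: each anchor goes to its best face if that IoU >= pos_thresh.
    best_face, best_iou = iou.argmax(axis=0), iou.max(axis=0)
    pos = best_iou >= pos_thresh
    match[pos] = best_face[pos]
    # Stage 2: top up faces that still have fewer than n_top anchors.
    for f in range(faces.shape[0]):
        need = n_top - int((match == f).sum())
        if need <= 0:
            continue
        cand = np.where((match == -1) & (iou[f] > low_thresh))[0]
        cand = cand[np.argsort(-iou[f, cand], kind="stable")][:need]
        match[cand] = f
    return match
```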

About max_im_shrink in wider test (Cuda memory error)

Hello, while trying to reproduce the results for WIDER FACE I got a CUDA memory error.
Exploring a little, I discovered that it is related to the maximum image size that Caffe supports.

Currently, the test script for wider sets the shrinking parameter like this:

 max_im_shrink = (0x7fffffff / 577.0 / (image.shape[0] * image.shape[1])) ** 0.5

I can't find anywhere in the Caffe documentation how to get the maximum image size; could you explain what those numbers mean? I could run the experiment on my GPU by dramatically reducing max_im_shrink, but I am not sure how that will affect the final scores if the images are that small.

As a side note, the experiment failed on both 8GB (GTX 1080) and 16GB (Tesla V100-SXM2) GPUs.

Kind regards
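`0x7fffffff` is 2^31 - 1, the largest signed 32-bit integer; recent Caffe versions store blob element counts as `int` and refuse to reshape a blob whose count would exceed `INT_MAX`. Dividing by `image.shape[0] * image.shape[1]` and taking the square root turns that element budget into a per-side scale factor. The script does not explain 577.0; reading it as an empirical "elements per input pixel" for the largest blob is a guess. This bound only prevents integer overflow and says nothing about GPU memory, which is why a smaller cap can still be necessary. A sketch with a hypothetical `gpu_cap` argument:

```python
def max_shrink(height, width, elems_per_pixel=577.0, gpu_cap=None):
    """Largest scale keeping blob sizes under INT_MAX, as computed in wider_test.py.

    gpu_cap is a hypothetical extra limit (e.g. 1.0 to never upscale).
    """
    int_max = 0x7fffffff  # 2**31 - 1
    shrink = (int_max / elems_per_pixel / (height * width)) ** 0.5
    if gpu_cap is not None:
        shrink = min(shrink, gpu_cap)
    return shrink

print(round(max_shrink(1024, 1024), 3))  # upscaling limit for a 1024x1024 image
```

Since the enlarged scales exist mainly to find tiny faces, a much lower cap would most likely cost recall on the hard subset more than on easy/medium, but that has to be measured.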

Paper code

error == cudaSuccess(2 vs 0) out of memory

When I run wider_test.py, I get an out-of-memory error, but when I run it on a single picture, it works.

Different results

Why are the results I get when running your model about 50 pixels smaller than yours? (I used the model parameters you provided.)

Train for Person Detection

Dear @sfzhang15,
Thank you for your fantastic work. Have you ever trained your model for person detection?

Is the performance of the pretrained model you provided somewhat different from the reported one?

Hi, I tested the pretrained model you provided and found that its performance is somewhat different from what is shown on the official website. Is this the best model of your algorithm?
The performance I got is 93.8/ 92.5/ 84.8 ...

How about the training speed of SFD?

Simply adding conv3_3 makes training significantly slower. Could you explain how to solve this issue?

Question about image size

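Thank you very much for sharing your work. I have run into some questions during my research and would like to ask you. I trained directly without modifying the code, and the test results look good. While studying the code, I noticed that it reshapes the network to the size of the input image, instead of resizing the image to the network's input size. I compared the two approaches and found the results to be about the same, but the latter is roughly 5 times faster. I would like to hear your explanation, thanks!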
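A rough way to account for the speed gap: the cost of a fully convolutional network grows roughly with the number of input pixels, so running at the image's own resolution costs about (H x W) / (640 x 640) times as much as resizing to 640x640. The sketch only does that arithmetic and ignores fixed overheads, so real timings will differ.

```python
def relative_cost(height, width, net_size=640):
    """Approximate compute ratio of full resolution vs. a net_size x net_size input."""
    return (height * width) / float(net_size * net_size)

# A 1024 x 1536 image is about 3.84x the work of a 640 x 640 input.
print(round(relative_cost(1024, 1536), 2))
```

Whether accuracy stays the same after resizing likely depends on how many very small faces the test images contain, since downscaling shrinks them further.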

The model_libs.py problem

Hello, Shifeng, sorry about disturbing you again.

When I checked train.prototxt again, I found that the dilation param of your fc6 conv layer is 1, i.e. dilation: 1.
But when I checked model_libs.py/VGGNetBody with your settings, I found that dilation = dilation * 3 == 3.
So my doubt is: have you changed the formula to dilation = dilation * 1?

Thanks very much.

Training instructions

In the README you state:

Modify the anchor match code of SSD to implement the 'scale compensation anchor matching strategy'.

Can you elaborate on how to do this?

FDDB test

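@sfzhang15 Hi,

Regarding the FDDB results, this is what I obtained with your model:
1: new annotations, not converted to ellipses: DiscROC=0.982252, ContROC=0.75709
2: old annotations, not converted to ellipses: DiscROC=0.979888, ContROC=0.754134
3: new annotations, converted to ellipses: DiscROC=0.982437, ContROC=0.855419
4: old annotations, converted to ellipses: DiscROC=0.979888, ContROC=0.848032
All of the above results are at FalsePositive=1000.

The FDDB results are also discussed in detail in #5.

I have one question:
The network generally outputs rectangular boxes. Do we always need to convert the network's rectangular boxes to ellipses before submitting the detection results?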

Inference phase

@sfzhang15 Hello,

At test time, do you use the parameters from the paper:

we first filter out most boxes by a confidence threshold of 0.05 and keep the top 400 boxes before applying NMS, then we perform NMS with jaccard overlap of 0.3 and keep the top 200 boxes.

or directly the parameters in deploy.prototxt:

layer {
  name: "detection_out"
  type: "DetectionOutput"
  bottom: "mbox_loc"
  bottom: "mbox_conf_flatten"
  bottom: "mbox_priorbox"
  top: "detection_out"
  include {
    phase: TEST
  }
  detection_output_param {
    num_classes: 2
    share_location: true
    background_label_id: 0
    nms_param {
      nms_threshold: 0.3
      top_k: 5000
    }
    code_type: CENTER_SIZE
    keep_top_k: 750
    confidence_threshold: 0.05
  }
}

It seems the parameters in the paper (set quite small, 400/200) are meant for speed.
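To compare the two settings offline, the post-processing described in the paper can be sketched in NumPy: confidence threshold 0.05, keep the top 400, NMS at IoU 0.3, keep the top 200. This is a plain greedy NMS written for illustration, not Caffe's `DetectionOutput` implementation; passing `pre_nms_top_k=5000, keep_top_k=750` mirrors the deploy.prototxt values.

```python
import numpy as np

def nms(boxes, scores, iou_thresh):
    """Greedy NMS; boxes (K, 4) as [x1, y1, x2, y2]. Returns kept indices."""
    order = np.argsort(-scores, kind="stable")
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping the kept one
    return keep

def postprocess(boxes, scores, conf_thresh=0.05, pre_nms_top_k=400,
                nms_thresh=0.3, keep_top_k=200):
    """Threshold, keep top-k, run NMS, keep top-k again. Returns original indices."""
    idx = np.where(scores > conf_thresh)[0]
    idx = idx[np.argsort(-scores[idx], kind="stable")][:pre_nms_top_k]
    kept = nms(boxes[idx], scores[idx], nms_thresh)[:keep_top_k]
    return idx[kept]
```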

max-out background label

Hi!
In the test phase, do you apply the max-out background label for the conv3_3 detection layer? Thank you!

When training the model

Hi Zhang,

I trained the model using your train.prototxt on the original SSD and there was no bug, but you said:

Modify the data augmentation code of SSD to make sure that it does not change the image ratio.

Modify the anchor match code of SSD to implement the 'scale compensation anchor matching strategy'.

How should the code be modified? Could you share your code modified from SSD?

Bug in SFD_trained/deploy.prototxt

There seems to be a small bug in the trained model's prototxt file.

At line 498 of SFD_trained/deploy.prototxt, convolution layer fc6 is configured to use pad: 3, which is different from SFD/deploy.prototxt:

layer {
  name: "fc6"
  ...
  convolution_param {
    num_output: 1024
    pad: 3    ### This seems wrong
    kernel_size: 3
  ...

and

$ diff SFD/deploy.prototxt SFD_trained/deploy.prototxt
498c498
<     pad: 1
---
>     pad: 3

This would also cause the feature map dimensions to expand after fc6, which differs from what the paper suggests.
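The effect of `pad: 3` can be checked with Caffe's convolution output-size formula, out = (in + 2*pad - (dilation*(kernel - 1) + 1)) / stride + 1 with integer division. With kernel 3 and dilation 1, pad 3 grows a 20x20 map to 24x24, while pad 3 is exactly what preserves the size when dilation is 3, matching the `dilation = dilation * 3` in model_libs.py mentioned in another issue. That is a plausible origin of the value, but it is an inference.

```python
def conv_out(size, kernel=3, pad=0, stride=1, dilation=1):
    """Caffe convolution output size along one spatial dimension."""
    effective = dilation * (kernel - 1) + 1  # receptive extent of the dilated kernel
    return (size + 2 * pad - effective) // stride + 1

print(conv_out(20, pad=1))              # SFD/deploy.prototxt: 20
print(conv_out(20, pad=3))              # SFD_trained/deploy.prototxt: 24
print(conv_out(20, pad=3, dilation=3))  # dilated fc6: 20
```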

balance the loss of classification and regression

  multibox_loss_param {
    loc_loss_type: SMOOTH_L1
    conf_loss_type: SOFTMAX
    loc_weight: 1.0
    num_classes: 2
    share_location: true
    match_type: PER_PREDICTION
    overlap_threshold: 0.35
    use_prior_for_matching: true
    background_label_id: 0
    use_difficult_gt: true
    neg_pos_ratio: 3.0
    neg_overlap: 0.35
    code_type: CENTER_SIZE
    ignore_cross_boundary_bbox: false
    mining_type: MAX_NEGATIVE
  }
}

In the paper, "lambda = 4 to balance the loss of classification and regression." May I know where you set lambda = 4? Do we change loc_weight from 1.0 to 0.25?
Or modify conf in multibox_loss_layer.cpp?
Thank you~

FaceBoxes

Hi @sfzhang15, sorry, but can I ask questions about your FaceBoxes work here? If not, where else can I contact you? Thank you!

questions about the anchor matching

Hi,
I am reproducing the paper and am confused by Section 3.2, Scale compensation anchor matching strategy.
The paper says that at stage two,

firstly picking out anchors whose jaccard overlap with this face are higher than 0.1, then sorting them to select top-N as matched anchors of this face. We set N as the average number from stage one.

I think it is possible that some anchors match more than one face bounding box; how do you deal with this?
The other question is that the anchors matched at stage two may include anchors already matched at stage one; I want to know how you select the top-N matched anchors at stage two.
Thanks

Some details about ur matching strategy

@sfzhang15 Hi, I have implemented the data augmentation and max-out parts of your training process, and with just these (changing some SSD code), our model achieves 93.6/ 92.3/ 84.0. But when I implemented anchor matching in the MatchBBox() function, the performance dropped. Can you advise me on how to solve this problem?
I just added step 2 after bipartite matching (step 1 is SSD's original source code; I did not change it). What surprised me is that after step 1, more than 10 anchors are chosen for each gt face (not fewer than 5, as you said). Maybe I should change some code in step 1?

My GPU is a Titan X with 12GB of memory, but when I test the code on WIDER, it shows 'out of memory'

My GPU is a Titan X with 12GB of memory, but when I test the code on WIDER, it shows 'out of memory'. Has anyone met the same problem?

Why is SSD needed?

Hello everyone,

I don't understand why SSD needs to be downloaded to run S3FD...
Can anyone help me out with an explanation?

Thanks in advance!

padding size of fc6

Hi @sfzhang15 ,

I just found that the padding size of fc6 is 3, which causes the fc7 feature map to be 24x24 instead of 20x20 (as described in the paper). I wonder whether this affects the matching between prior boxes and ground truth.

Best,
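Whether matching is affected depends on how the PriorBox layer for fc7 lays out its grid. In SSD's PriorBox, anchor centers are (i + offset) * step with offset 0.5 by default, and if `step` is not set it is derived as image size / feature-map size. The sketch compares both cases for a 640 input and a 24x24 map; which one applies depends on whether the prototxt sets `step` for that layer, which is an assumption to check.

```python
import numpy as np

def prior_centers(layer_size, img_size=640, step=None, offset=0.5):
    """1-D anchor centers in pixels along one feature-map axis."""
    if step is None:
        step = img_size / layer_size  # SSD's fallback when step is unset
    return (np.arange(layer_size) + offset) * step

fixed = prior_centers(24, step=32)  # explicit step: the last rows fall outside the image
derived = prior_centers(24)         # derived step ~26.67: grid squeezed into 640 px
print(fixed[-1], round(derived[-1], 2))
```

With an explicit step of 32, the last few anchor rows and columns are centered beyond 640 px and can rarely match any face; with a derived step, anchors no longer line up with the stride-32 receptive fields the paper assumes.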

Training on my own dataset

Hi, thanks for your code.

I am new to SSD, and I want to train a model on my own dataset, which contains many small objects, but the loss stays at 6.xx.
When I use the trained model to predict on test images, all scores are 0.2xx. I'd appreciate any help (I used your settings in 'train.prototxt').
Thanks

Logic of choosing N

Hi @sfzhang15 ,
In the top-N selection, is it safe to assume N = 6? In all the issues you have said to take it as 6. So does the VGG SFD model trained on WIDER FACE use N = 6, or is there some other logic behind it?
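The paper defines N as the average number of anchors matched per face in stage one, so one way to check the value 6 on your own data is to run stage 1 over the training set and average the per-face counts. A sketch, assuming a per-image `match` array (matched face index per anchor, -1 for none) from a stage-1 matcher; the helper names are hypothetical.

```python
import numpy as np

def per_face_counts(match, num_faces):
    """Number of anchors matched to each face; match holds face ids or -1."""
    valid = match[match >= 0]
    return np.bincount(valid, minlength=num_faces)

def average_n(matches_and_face_counts):
    """Average matched anchors per face over many images (stage-1 output)."""
    counts = [per_face_counts(m, f) for m, f in matches_and_face_counts]
    return float(np.concatenate(counts).mean())
```

The same per-face counts are also a quick way to inspect how many anchors SSD's default matching gives large faces.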

Bounding boxes with negative x-values

Hello everyone,
while testing SFD I noticed that some bounding boxes have negative x-values.
Am I the only one facing this issue?
Could the reason be the mapping of differently sized pictures during detection?

Any help appreciated! Thanks in advance. :)
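Negative coordinates are expected from an SSD-style decoder: box offsets are decoded relative to anchors and, unless clipping is enabled, nothing forces the result into [0, 1]; faces cut by the image border in particular produce boxes that extend past it. If the test script multiplies the normalized `detection_out` coordinates by the image width and height, clipping afterwards is a simple fix. A sketch, assuming each row is `[xmin, ymin, xmax, ymax]` in normalized coordinates (the function name is hypothetical):

```python
import numpy as np

def to_pixel_boxes(norm_boxes, img_w, img_h, clip=True):
    """Scale normalized [xmin, ymin, xmax, ymax] boxes to pixels."""
    boxes = np.asarray(norm_boxes, dtype=float) * np.array([img_w, img_h, img_w, img_h])
    if clip:
        # Clip to the last valid pixel index on each axis.
        boxes[:, [0, 2]] = boxes[:, [0, 2]].clip(0, img_w - 1)
        boxes[:, [1, 3]] = boxes[:, [1, 3]].clip(0, img_h - 1)
    return boxes
```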

How to get a square patch when testing

In the paper, "one is the biggest square patch, and the size of the other four square patches range between [0.3, 1] of the short size of the original image."

When training, you can get square patches, but how do you get a square patch when testing?
WARP? FIT_SMALL_SIZE? Or FIT_LARGE_SIZE_AND_PAD?

Very high loss

Hi,
I tried converting the code from Caffe to TensorFlow. I didn't do the 'subsampling of fc layers to conv'; everything else is the same as your paper's implementation.
The loss I am getting is very high and fluctuates widely, in the range of 1 to 2000. So I just wanted to ask whether you saw any high loss values during training.
The result is 0.2 mAP on the WIDER FACE val set after 12k iterations (batch size 1, as my GPU is pretty small). I will train more to get better results.

Cannot converge

I modified the network structure to match your train.prototxt and used the modified sampler.cpp and match_bbox.cpp.
When I started training with lr=0.001, it did not converge and always displayed NaN. Then I changed lr to 0.0001, and it is still NaN. Can you give some instructions? Thank you very much

On computing mAP

This might be a silly question, but could you provide details on how you computed mAP for WIDER (Table 3 of the paper)? AFAIK one approach could be the one proposed in the Pascal Challenge (interpolating precision values from recall values above certain thresholds), but I am not sure whether the same methodology was applied in the paper.

Or did you use the eval_tools provided by WIDER?

Kind regards
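For reference, the all-point interpolated AP used by PASCAL VOC (2010 onwards) makes the precision curve monotonically non-increasing and integrates it over recall. Whether the paper's Table 3 was produced this way or with the official WIDER `eval_tools` is not stated here; this sketch only shows the VOC-style computation, assuming `recall` and `precision` arrays ordered by descending detection score.

```python
import numpy as np

def voc_ap(recall, precision):
    """All-point interpolated average precision (PASCAL VOC 2010+ style)."""
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    # Precision envelope: make it non-increasing from right to left.
    for i in range(mpre.size - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas at the points where recall changes.
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

print(voc_ap(np.array([0.5, 1.0]), np.array([1.0, 0.5])))
```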

Change Anchor Boxes Aspect Ratio

Dear @sfzhang15,
As you mention in your paper, "Our anchors are 1:1 aspect ratio (i.e., square anchor)", so all anchor boxes are 1:1. However, if one wants to change the aspect ratio of the anchor boxes, for example to 1:1.4 (width:height), must the network structure be changed, or only the training code and decoding strategy? Would you please explain how one can change the aspect ratios of the anchor boxes?
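In SSD's PriorBox layer an aspect ratio `ar` (width / height) gives a box of width `min_size * sqrt(ar)` and height `min_size / sqrt(ar)`, so a 1:1.4 (width:height) anchor corresponds to `ar = 1/1.4` (note that the `flip` option, if enabled, also adds the reciprocal ratio). Replacing the square anchor with a single 1:1.4 anchor keeps one anchor per location, so only the PriorBox settings change; adding it alongside the square one makes k = 2, and the loc and conf prediction convolutions then need 4k and num_classes*k output channels. A sketch of that arithmetic, with hypothetical helper names:

```python
import math

def anchor_wh(min_size, ar):
    """Width and height of an SSD-style prior for aspect ratio ar = width / height."""
    return min_size * math.sqrt(ar), min_size / math.sqrt(ar)

def head_channels(anchors_per_location, num_classes=2):
    """Output channels of the loc and conf prediction convolutions."""
    return 4 * anchors_per_location, num_classes * anchors_per_location

w, h = anchor_wh(32, 1 / 1.4)
print(round(w, 2), round(h, 2), round(h / w, 2))  # height is 1.4x the width
print(head_channels(2))                           # square + 1:1.4 anchor per location
```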

About fddb_from_rectangle_to_ellipse

Can you explain the parameters in 'Fit.mat' in fddb_from_rectangle_to_ellipse? And how was w trained? Thank you very much

A clear explanation on matching strategy

Thank you, Shifeng, for your great work, and big congratulations on your new CVPR 2018 and IJCAI work. I just looked around your website and I'm truly impressed by what you have done so far. I believe that you are and will be an awesome scientist who has a great impact on our society. Definitely!

Back to SFD: it has been more than a year since your publication, but I still find it useful and one of the state-of-the-art methods for face detection. However, I'm somewhat confused by the matching strategy of SFD. I found your answer in a previous issue:

We just directly use IoU threshold = 0.35
choose them all
After step 1, for these GT whose matched anchors are less than 6, we use step 2 to choose its top6 anchor to match

My understanding:

1. One anchor_box can be assigned to only one ground-truth label (i.e., one face).
2. After step 1, some faces might have 6 or more anchors and some might have fewer than 6. But no two faces share the same anchor_box because of fact #1: at step 1, each anchor_box is assigned to its best match (the face with the highest IoU).
3. Now we move to step 2, where I have some confusion:
To make it simple, say we need N = 4. There is a list of faces with fewer than 4 anchor_boxes, list_faces = [face1, face2, face3], and some anchor boxes are still available because they weren't assigned to any face in step 1: available_anchor_boxes = [box1, box2, box3, box4, box5]. The numbers of matched anchor boxes for face1, face2, face3 are [2, 3, 1], all < topN (4).
      | box1 | box2 | box3 | box4 | box5
face1 | 0.12 | 0.21 | 0.06 | 0.13 | 0.24
face2 | 0.24 | 0.08 | 0.23 | 0.10 | 0.34
face3 | 0.33 | 0.22 | 0.01 | 0.20 | 0.02

My question is:
A. Do we have to find topN more anchor boxes for face1, face2, face3, so that their new numbers of anchor boxes become [2 + 4, 3 + 4, 1 + 4], or just find topK = topN - (number of already matched anchor boxes), i.e. [2 + 2, 3 + 1, 1 + 3]?

B. In the case of [2 + 2, 3 + 1, 1 + 3], do we iterate over list_faces sequentially, face1 -> face2 -> face3, to find topK?

C. If B is correct, for face1 we will find topK = 2, say [box5, box2]. This means we mark box5 and box2 as matched anchors, so they can no longer be assigned to any other face. However, the best match for box5 is face2, and for box2 it is face3. After assigning topK anchor boxes to face1, the available boxes are [box1, box3, box4]. Continuing this way, face2 will be assigned box1. The available boxes are now [box3, box4], and we can only assign box4 to face3, even though we need topK = 3.

I know that the probability of case C happening is very low when there is a huge number of anchor boxes, say 33125 here, rather than the 5 boxes in the example above. However, I just want to make sure whether your idea is implemented like this or differently.

I hope the above example is simple and intuitive enough to show my confusion and make your idea clearer to us. I would really appreciate it if you could review my understanding 1, 2, 3 and my questions A, B, C.
Many thanks and have a nice day.
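The order dependence raised in C can be reproduced with the example table. The sketch implements two hypothetical stage-2 policies, neither confirmed as the repository's: the sequential top-K fill described in B, and a global greedy variant that visits all (face, anchor) pairs in descending IoU. Both top each face up to N = 4 and only use anchors with IoU > 0.1 left free by stage 1; indices are 0-based (box1 is 0).

```python
import numpy as np

# Rows: face1..face3, columns: box1..box5 (IoU values from the table above).
IOU = np.array([[0.12, 0.21, 0.06, 0.13, 0.24],
                [0.24, 0.08, 0.23, 0.10, 0.34],
                [0.33, 0.22, 0.01, 0.20, 0.02]])
NEED = [4 - 2, 4 - 3, 4 - 1]  # top each face up to N = 4

def sequential_fill(iou, need, thresh=0.1):
    """Policy B/C: faces take their top-K free anchors one after another."""
    free, out = set(range(iou.shape[1])), {}
    for f, k in enumerate(need):
        order = np.argsort(-iou[f], kind="stable")
        cand = [int(a) for a in order if int(a) in free and iou[f, a] > thresh]
        out[f] = cand[:k]
        free -= set(out[f])
    return out

def global_greedy_fill(iou, need, thresh=0.1):
    """Alternative: visit all (face, anchor) pairs by descending IoU."""
    pairs = sorted(((iou[f, a], f, a)
                    for f in range(iou.shape[0])
                    for a in range(iou.shape[1])
                    if iou[f, a] > thresh), reverse=True)
    free, left = set(range(iou.shape[1])), list(need)
    out = {f: [] for f in range(len(need))}
    for _, f, a in pairs:
        if a in free and left[f] > 0:
            out[f].append(a)
            free.discard(a)
            left[f] -= 1
    return out

print(sequential_fill(IOU, NEED))
print(global_greedy_fill(IOU, NEED))
```

The sequential fill gives face1 box5 and box2, face2 box1 and face3 only box4, exactly the outcome described in C; the global variant fully serves face2 and face3 but leaves face1 with no extra anchors. Neither is clearly better, and with tens of thousands of anchors such conflicts should be rare.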

Will you share the training code?

Hi, will you share the training code?