sfzhang15 / sfd
S³FD: Single Shot Scale-invariant Face Detector, ICCV, 2017
Hi,
You mentioned modifying the data augmentation code of SSD so that it does not change the image ratio. I have some questions about this. First, during training each image is resized to 640x640, which by itself changes the aspect ratio, since as far as I know not all input images are square. Are you using some trick to avoid this, or does the image ratio really change during resizing?
Second, which code should we change to make this modification to the augmentation? Do we just change the sampler part at the beginning of train.prototxt and specify:
min_aspect_ratio: 1.0
max_aspect_ratio: 1.0
or do we need to change a C++ function in Caffe?
Thanks in advance.
I have trained the SFD code with and without modifying sample.cpp, and with and without modifying bbox_util.cpp, but all of these attempts failed. The loss dropped to 1.5~2, but the mAP on Pascal face is only about 0.55. What could cause this?
Hi, is it possible to release the training source code? Points 2 and 3 in the README are not very precise, and I am not sure I can implement the code exactly as was done in your paper.
Releasing it would make it much easier to compare against and modify your SOTA model.
Hi Shifeng, thank you for the great work on SFD. While creating the lmdb for the training images, I found that some bounding boxes exceed the image boundary.
W0416 18:42:23.386387 25345 io.cpp:330] /home/user/Annotations/500017.xml bounding box exceeds image boundary.
W0416 18:42:23.407441 25345 io.cpp:330] /home/user/Annotations/500019.xml bounding box exceeds image boundary.
W0416 18:42:23.433470 25345 io.cpp:330] /home/user/Annotations/500021.xml bounding box exceeds image boundary.
etc....
In these cases, do you remove the bounding boxes when creating the lmdb, or do you ignore the warning message? Thank you very much.
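For anyone hitting the same warning, one possible workaround is to clamp each annotation to the image before building the lmdb. The sketch below is only an illustration: the thread does not say whether the author clipped, dropped, or kept such boxes, and the helper name `clip_box` is hypothetical.

```python
def clip_box(xmin, ymin, xmax, ymax, width, height):
    """Clamp a box to the image bounds; return None if nothing usable is left.

    Assumption: pixel coordinates run from 0 to width - 1 / height - 1.
    Whether the original training pipeline clipped or dropped boxes is unknown.
    """
    xmin = max(0, min(xmin, width - 1))
    ymin = max(0, min(ymin, height - 1))
    xmax = max(0, min(xmax, width - 1))
    ymax = max(0, min(ymax, height - 1))
    # A box that collapses to zero width or height carries no face region.
    if xmax <= xmin or ymax <= ymin:
        return None
    return xmin, ymin, xmax, ymax
```

A box that lies entirely outside the image returns None and can be skipped when writing the annotation.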
I tried this model on my PC with 8 GB of RAM, and RAM usage fills up when it reaches
[det2, det3] = multi_scale_test(net, image, max_im_shrink)
How much memory do you use?
@sfzhang15 HI!
Could you give a specific download link for the Pascal face dataset? The link you provided is for the Pascal dataset!
Thank you for your trouble!
Hello, Zhang,
I have trained the model according to the settings in the paper. Apart from using the same prototxts, I also modified the C++ code, including sampler.cpp and the MatchBBox function. The only difference is the batch size: mine is 24, and the GPU is a K80. After 150k iterations of training, using your testing code, I get a score of 83 on the hard set.
I have thought about this for a long time, but I can't figure out why it is lower than yours, which is 85.9.
Could you give some advice?
Thanks!
Hello everyone,
I'm running SFD inside a Docker Container. To run wider_test.py I had to make the following changes:
L. 10,11
# import sys
# sys.path.insert(0, 'python')
L. 138
change:
model_weights = caffe_root + '/models/VGGNet/WIDER_FACE/SFD/SFD.caffemodel'
to:
model_weights = caffe_root + '/models/VGGNet/WIDER_FACE/SFD_trained/SFD.caffemodel'
L. 160
change:
Image_Path = Path + im_name[:] + '.jpg'
to:
Image_Path = Path + event[0][0] + '/' + im_name[:] + '.jpg'
Now my questions are:
Did anyone face this issue too? Or did I just catch an old version of wider_test.py?
Why did I have to deactivate the part with "import sys"?
Any help is appreciated. Thanks in advance!
Hello. I am trying to evaluate the code on the proposed datasets, but the evaluation on FDDB gives errors like "Line #33380 (got 6 columns instead of 1)", even after changing the format using the MATLAB script provided.
Do you have any idea why this is happening and how it can be solved?
Thank you!
I am confused about the max-out background label in your paper. Do you mean that each background pixel is randomly assigned a different label, or that labels are assigned according to the type of each pixel, such as tree, bicycle and so on?
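One possible reading, offered here only as an assumption since the thread contains no answer, is that the background labels are not semantic categories at all: the detection layer simply predicts several background scores per anchor and keeps the highest one before the softmax. A minimal NumPy sketch of that reading follows; the function name and the `num_bg` parameter are hypothetical.

```python
import numpy as np

def maxout_background(scores, num_bg):
    """Collapse several background logits into one, then apply softmax.

    scores: array of shape (num_anchors, num_bg + 1); the first num_bg columns
    are background logits and the last column is the face logit.
    Assumption: this mirrors one reading of the paper, not confirmed code.
    """
    bg = scores[:, :num_bg].max(axis=1, keepdims=True)  # strongest background vote
    face = scores[:, num_bg:]
    logits = np.concatenate([bg, face], axis=1)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)  # columns: [background, face]
```

Under this reading no per-pixel labels such as "tree" or "bicycle" are needed; the extra background channels only make it harder for a small anchor to be classified as a face by accident.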
@sfzhang15
what does "gt_list" in "wider_easy_val.mat" mean?
Hi, I am unable to merge SFD(SSD) with the original Caffe. Do you happen to know any way that I can do that? Thanks
Hello, I have a few questions about FaceBoxes:
hello,
When you apply multi-scale testing, how do you change the anchor settings? For example, when the input size is twice the training size, I double the size of the feature maps (used to generate anchors) to ensure that the number of anchors equals the number of channels of the network. However, I have tried both keeping the "min_size" setting and doubling the value of "min_size", and neither works well (the mAP drops a lot).
Hi @sfzhang15, thanks for your amazing contribution! In section 3.2, stage 2, the paper says "firstly picking out anchors whose jaccard overlap with this face are higher than 0.1, then sorting them to select top-N as matched anchors of this face". But I didn't find the corresponding code. Could you please tell me how to implement stage 2 in code?
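Since the matching code is not in the repository, here is a rough NumPy sketch of how the quoted stage-two rule could be written. It is not the author's implementation. Several choices are assumptions made only for illustration: faces are processed in order, only anchors left unmatched by stage one are considered, and each face is topped up to N rather than given N extra anchors. The names `iou_matrix` and `stage_two` are hypothetical.

```python
import numpy as np

def iou_matrix(gt, anchors):
    """Pairwise jaccard overlap; boxes are (xmin, ymin, xmax, ymax).

    Returns an array of shape (num_gt, num_anchors).
    """
    x1 = np.maximum(gt[:, None, 0], anchors[None, :, 0])
    y1 = np.maximum(gt[:, None, 1], anchors[None, :, 1])
    x2 = np.minimum(gt[:, None, 2], anchors[None, :, 2])
    y2 = np.minimum(gt[:, None, 3], anchors[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    area_an = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    return inter / (area_gt[:, None] + area_an[None, :] - inter)

def stage_two(iou, match, top_n, low_thresh=0.1):
    """Top up under-matched faces after stage one.

    match[j] is the face index assigned to anchor j in stage one, or -1.
    Assumptions (not confirmed by the author): sequential face order,
    unmatched anchors only, and top-up to top_n instead of top_n extra.
    """
    match = match.copy()
    for g in range(iou.shape[0]):
        have = int((match == g).sum())
        if have >= top_n:
            continue
        # Candidates: still-free anchors whose overlap exceeds the low threshold.
        cand = np.where((match == -1) & (iou[g] > low_thresh))[0]
        cand = cand[np.argsort(-iou[g, cand])]  # highest overlap first
        match[cand[: top_n - have]] = g
    return match
```

Stage one (thresholding the same IoU matrix at the usual overlap threshold) would produce the initial `match` array; `top_n` would then be set to the average number of matches from that stage, as the quoted sentence describes.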
Hello, while trying to reproduce the results for WIDER FACE I got a CUDA memory error.
Exploring a little, I discovered that it is related to the maximum size of the image that Caffe supports.
Currently, the test script for wider sets the shrinking parameter like this:
max_im_shrink = (0x7fffffff / 577.0 / (image.shape[0] * image.shape[1])) ** 0.5
I can't find anywhere in the Caffe documentation how to get the maximum image size. Could you explain what those numbers mean? I could run the experiment on my GPU by dramatically reducing max_im_shrink,
but I am not sure how it will affect the final scores if the images are that small.
As a side note, the experiment failed on both 8GB (GTX 1080) and 16 GB (Tesla V100-SXM2) GPUs
Kind regards
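For readers puzzling over the same line, the formula can at least be unpacked arithmetically. 0x7fffffff is the largest signed 32-bit integer (2**31 - 1), so the expression picks the scale s at which 577 * s**2 * height * width equals that limit. What the constant 577.0 stands for is not stated anywhere in this thread; the sketch below treats it as an opaque per-pixel budget, and the `cap` argument is a hypothetical addition for GPUs with less memory.

```python
INT32_MAX = 0x7fffffff  # 2**31 - 1, the largest signed 32-bit integer

def max_im_shrink(height, width, per_pixel_cost=577.0, cap=None):
    """Largest scale factor s with per_pixel_cost * s**2 * height * width <= INT32_MAX.

    per_pixel_cost=577.0 is copied from the test script; its meaning is
    not documented here and is treated as an unexplained constant.
    cap is a hypothetical extra limit for memory-constrained GPUs.
    """
    s = (INT32_MAX / per_pixel_cost / (height * width)) ** 0.5
    return s if cap is None else min(s, cap)
```

Lowering the scale through something like `cap` trades memory for the largest test scale, which may reduce recall on the smallest faces; how much it changes the final scores is not answered in this thread.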
When I run wider_test.py, I get an out-of-memory error, but when I run it on a single picture, it works.
Why are the results I get from running your model about 50 pixels smaller than your results? (I used the model parameters you provided.)
Dear @sfzhang15,
Thank you for your fantastic work. Have you ever trained your model for person detection?
Simply adding conv3_3 makes the training speed drop significantly. Could you explain how to solve this issue?
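Thank you very much for sharing your work. I ran into some questions while studying it. I trained directly without modifying the code, and judging from the test results it works quite well. While studying the code I noticed that it reshapes the network to the size of the input image, rather than resizing the image to the network's input size. I compared the two cases and found that the accuracy is about the same, but the latter is about 5 times faster than the former. I would like to hear the reasoning behind this. Thank you!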
Hello, Shifeng, sorry to disturb you again.
When I checked train.prototxt again, I found that the dilation parameter of your fc6 conv layer is 1, i.e. dilation: 1.
But when I checked model_libs.py/VGGNetBody with your settings, I found that dilation = dilation * 3 == 3.
So my question is: have you changed the formula to dilation = dilation * 1?
Thanks very much.
In the README you state:
Modify the anchor match code of SSD to implement the 'scale compensation anchor matching strategy'.
Can you elaborate on how to do this?
@sfzhang15 HI
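Regarding the FDDB results, the results I obtained with your model are as follows:
1: new labels, not converted to ellipses, DiscROC=0.982252, ContROC=0.75709
2: old labels, not converted to ellipses, DiscROC=0.979888, ContROC=0.754134
3: new labels, converted to ellipses, DiscROC=0.982437, ContROC=0.855419
4: old labels, converted to ellipses, DiscROC=0.979888, ContROC=0.848032
All of the above results are at FalsePositive=1000.
Issue #5 also gives a detailed discussion of the FDDB results.
I have one question:
Networks generally output rectangular boxes. Do the network's rectangular boxes always need to be converted to elliptical boxes before submitting the detection results?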
@sfzhang15 Hello,
At test time, do you use the parameters from the paper:
we first filter out most boxes by a confidence threshold of 0.05 and keep the top 400 boxes before applying NMS, then we perform NMS with jaccard overlap of 0.3 and keep the top 200 boxes.
or do you directly use the parameters in deploy.prototxt:
layer {
name: "detection_out"
type: "DetectionOutput"
bottom: "mbox_loc"
bottom: "mbox_conf_flatten"
bottom: "mbox_priorbox"
top: "detection_out"
include {
phase: TEST
}
detection_output_param {
num_classes: 2
share_location: true
background_label_id: 0
nms_param {
nms_threshold: 0.3
top_k: 5000
}
code_type: CENTER_SIZE
keep_top_k: 750
confidence_threshold: 0.05
}
}
It seems the parameters in the paper are set quite small (500/200) in order to speed things up.
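To make the two parameter sets easier to compare, the post-processing described in the quoted sentence can be sketched in NumPy as below. This is a plain greedy NMS written for illustration, not Caffe's DetectionOutput layer. The defaults follow the paper quote (0.05, 400, 0.3, 200), and the function name `filter_and_nms` is hypothetical.

```python
import numpy as np

def filter_and_nms(boxes, scores, conf_thresh=0.05, pre_nms_top_k=400,
                   nms_thresh=0.3, keep_top_k=200):
    """Confidence filter -> top-k -> greedy NMS -> keep top-k.

    boxes: (n, 4) array of (xmin, ymin, xmax, ymax); scores: (n,) array.
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    keep = np.where(scores > conf_thresh)[0]
    order = keep[np.argsort(-scores[keep])][:pre_nms_top_k]
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    selected = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        selected.append(int(i))
        # Overlap of the current best box with every remaining candidate.
        w = np.clip(np.minimum(boxes[i, 2], boxes[rest, 2])
                    - np.maximum(boxes[i, 0], boxes[rest, 0]), 0, None)
        h = np.clip(np.minimum(boxes[i, 3], boxes[rest, 3])
                    - np.maximum(boxes[i, 1], boxes[rest, 1]), 0, None)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= nms_thresh]  # drop candidates that overlap too much
    return selected[:keep_top_k]
```

Calling it with `pre_nms_top_k=5000, keep_top_k=750` would correspond to the deploy.prototxt values shown above; the two thresholds are the same in both settings.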
Hi!
In the test phase, do you apply the max-out background label for the conv3_3 detection layer? Thank you!
Hi Zhang,
I trained the model using your train.prototxt on the original SSD, and it runs without errors, but you said:
Modify the data augmentation code of SSD to make sure that it does not change the image ratio.
Modify the anchor match code of SSD to implement the 'scale compensation anchor matching strategy'.
How should the code be modified? Can you share your modified code based on SSD?
There seems to be a small bug in the trained model's prototxt file.
In SFD_trained/deploy.prototxt, line 498, the convolution layer fc6 is configured to use padding: 3, which differs from SFD/deploy.prototxt:
layer {
name: "fc6"
...
convolution_param {
num_output: 1024
pad: 3 ### This seems wrong
kernel_size: 3
...
and
$ diff SFD/deploy.prototxt SFD_trained/deploy.prototxt
498c498
< pad: 1
---
> pad: 3
This would also cause the feature map dimensions to expand after fc6, which also differs from what the paper suggests.
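The size change can be checked with the standard convolution output-size formula. The sketch below is only arithmetic. The 20x20 input size is an illustrative assumption, and `conv_out_size` is a hypothetical helper, not a Caffe function.

```python
def conv_out_size(size, kernel, pad, stride=1, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    effective_kernel = dilation * (kernel - 1) + 1
    return (size + 2 * pad - effective_kernel) // stride + 1

# Illustrative 20x20 input to fc6 with a 3x3 kernel:
print(conv_out_size(20, kernel=3, pad=1))              # pad 1, dilation 1
print(conv_out_size(20, kernel=3, pad=3))              # pad 3, dilation 1
print(conv_out_size(20, kernel=3, pad=3, dilation=3))  # pad 3, dilation 3
```

With a 3x3 kernel, pad: 3 preserves the spatial size only when it is paired with a dilation of 3; with dilation 1 it adds 4 pixels per dimension.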
multibox_loss_param {
loc_loss_type: SMOOTH_L1
conf_loss_type: SOFTMAX
loc_weight: 1.0
num_classes: 2
share_location: true
match_type: PER_PREDICTION
overlap_threshold: 0.35
use_prior_for_matching: true
background_label_id: 0
use_difficult_gt: true
neg_pos_ratio: 3.0
neg_overlap: 0.35
code_type: CENTER_SIZE
ignore_cross_boundary_bbox: false
mining_type: MAX_NEGATIVE
}
}
The paper says "lambda = 4 to balance the loss of classification and regression." Where do you set lambda = 4? Should we change loc_weight from 1.0 to 0.25?
Or should we modify conf in multibox_loss_layer.cpp?
Thank you~
Hi @sfzhang15, sorry, but can I ask questions about your FaceBoxes work here? If not, where else can I contact you? Thank you!
hi,
I am now reproducing the paper and am confused about Section 3.2, "Scale compensation anchor matching strategy".
The paper says that at stage two,
firstly picking out anchors whose jaccard overlap with this face are higher than 0.1, then sorting them to select top-N as matched anchors of this face. We set N as the average number from stage one.
I think it is likely that some anchors match more than one face bounding box. How do you deal with this?
My other question: the anchors matched at stage two may include anchors already matched at stage one, so I want to know how you select the top-N matched anchors at stage two.
thanks
@sfzhang15 Hi, I have implemented the data augmentation and max-out parts of your training process by changing some code in SSD, and with just these our model achieves 93.6 / 92.3 / 84.0. But when I implemented the anchor matching in the function matchbbox(), the performance dropped. Can you advise me on how to solve this problem?
I just added step 2 after the bipartite matching (step 1 is the original SSD source code, which I did not change). What surprised me is that after step 1, more than 10 anchors are chosen for each gt face (not fewer than 5, as you said). Maybe I should change some code in step 1?
My GPU is a Titan X with 12 GB of memory, but when I test the code on WIDER, it shows 'out of memory'. Has anyone else met the same problem?
Hello everyone,
I don't understand why SSD needs to be downloaded to run S3FD...
can anyone help me out with an explanation?
Thanks in advance!
Hi @sfzhang15 ,
I just found that the padding size of fc6 is 3, which will cause the feature map of fc7 to be 24x24 instead of 20x20 (as paper described). I am curious whether this will affect the matching between prior box and ground truth?
Best,
Hi, thanks for your code.
I am new to SSD, and I want to train a model on my own dataset, which contains many small objects, but the loss stays at 6.xx.
When I use the trained model to predict on the test images, the scores are all 0.2xx. I would appreciate it if you could give me some help (I used your settings in 'train.prototxt').
thanks
Hi @sfzhang15 ,
In the top-N selection, is it safe to take N as 6? In all the issues, you have mentioned taking it as 6. So does the VGG SFD trained on WIDER FACE use 6 for this parameter, or is there some other logic behind it?
Hello everyone,
while testing SFD I noticed that some bounding boxes' x-values are negative.
Am I the only one facing this issue?
Could the reason be the mapping between the differently sized pictures during detection?
Any help appreciated! Thanks in advance. :)
In the paper, "one is the biggest square patch, and the size of the other four square patches range between [0.3, 1] of the short size of the original image."
During training you can sample a square patch, but how do you get a square patch during testing?
WARP, FIT_SMALL_SIZE, or FIT_LARGE_SIZE_AND_PAD?
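For the training side, the sampling rule quoted from the paper can be sketched as follows. This illustrates the quoted sentence only and is not the modified sampler.cpp. The random placement of each patch and the names `sample_square_patches` and `rng` are assumptions.

```python
import random

def sample_square_patches(width, height, num_random=4, min_scale=0.3, rng=random):
    """Return (x, y, side) square crops: the biggest square plus num_random others.

    The other patches have a side in [min_scale, 1] times the short side.
    Assumption: each patch is placed uniformly at random inside the image.
    """
    short = min(width, height)
    sides = [short]  # the biggest square patch
    for _ in range(num_random):
        sides.append(max(1, int(round(rng.uniform(min_scale, 1.0) * short))))
    patches = []
    for side in sides:
        x = rng.randint(0, width - side)   # randint is inclusive on both ends
        y = rng.randint(0, height - side)
        patches.append((x, y, side))
    return patches
```

Because every patch is square, resizing one of them to 640x640 scales both axes by the same factor, so faces keep their aspect ratio. The testing-side question about WARP versus the FIT_* modes is not answered by this sketch.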
Hi,
I tried converting the code from Caffe to TensorFlow. I didn't do the 'subsampling of fc layers to conv'; everything else follows the implementation in your paper.
The loss I am getting is very high and varies widely, in the range of 1 to 2000. So I just wanted to ask whether you saw any high loss values during training.
The result is 0.2 mAP on widerface_val after 12k iterations (batch size 1, as the GPU is pretty small). I will train more to get better results.
I modified the network structure to match your train.prototxt, and modified sampler.cpp and match_bbox.cpp.
I then started training with lr=0.001, but it does not converge and always shows NaN. I then changed lr to 0.0001, and it is still NaN. Can you give some advice? Thank you very much.
This might be a silly question, but could you provide details on how you computed mAP for WIDER (Table 3 of the paper)? AFAIK one approach could be the one proposed in the Pascal Challenge (interpolating precision values from recall values above certain thresholds), but I am not sure whether the same methodology was applied in the paper.
Or did you use the eval_tools provided by WIDER?
Kind regards
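For reference, the Pascal-style interpolation mentioned in this question can be sketched as the classic 11-point average precision below. Nothing in this thread confirms that the paper used this protocol; the eval_tools shipped with WIDER may compute the score differently. The function name `voc_ap_11pt` is hypothetical.

```python
import numpy as np

def voc_ap_11pt(recall, precision):
    """11-point interpolated average precision.

    For each recall threshold t in {0, 0.1, ..., 1.0}, take the maximum
    precision among points with recall >= t, then average the 11 values.
    """
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    ap = 0.0
    for t in np.linspace(0.0, 1.0, 11):
        mask = recall >= t
        ap += (precision[mask].max() if mask.any() else 0.0) / 11.0
    return ap
```

The inputs are the recall and precision values obtained by sweeping the detection score threshold over all detections of a test set.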
Dear @sfzhang15,
As you mentioned in your paper, "Our anchors are 1:1 aspect ratio (i.e., square anchor)", so the aspect ratios of all anchor boxes are 1:1. However, if one wants to change the aspect ratio of the anchor boxes, for example to 1:1.4 (width:height), must the network structure be changed, or do only the training code and decoding strategy need to change? Would you please explain how one can change the aspect ratios of the anchor boxes?
Can you explain the parameters in 'Fit.mat' in fddb_from_rectangle_to_ellipse? and how to train the w? Thank you very much
Thank you Shifeng for your great work, and big congratulations on your new CVPR 2018 and IJCAI work. I just looked around your website and I'm truly impressed by what you have done so far. I believe that you are and will be an awesome scientist who has a great impact on our society. Definitely!
Back to SFD: it is now more than one year since your publication, but I still find it useful and one of the state-of-the-art methods for face detection. However, I'm somewhat confused by the matching strategy of SFD. I found your answers in a previous issue:
We just directly use IoU threshold = 0.35
choose them all
After step 1, for these GT whose matched anchors are less than 6, we use step 2 to choose its top6 anchor to match
My understanding is:
1. An anchor_box can be assigned to only one ground truth label (i.e., one face).
2. ... anchor_box because of fact #1. At step 1, each anchor_box is assigned to its best match (the face with the highest iou).
3. N = 4. There is a list of faces that have fewer than 4 anchor boxes, list_faces = [face1, face2, face3], and some anchor boxes, available_anchor_boxes = [box1, box2, box3, box4, box5], are still available (they weren't assigned to any face in step 1). The numbers of matched anchor boxes for face1, face2, face3 are [2, 3, 1], all of them < topN (4).
My question is:
A. Do we have to find topN more anchor boxes for face1, face2, face3, so that the new numbers of anchor boxes are [2 + 4, 3 + 4, 1 + 4], or do we just find topK = topN - (number of already matched anchor boxes), i.e. [2 + 2, 3 + 1, 1 + 3]?
B. In the case of [2 + 2, 3 + 1, 1 + 3], do we iterate over list_faces sequentially, face1 -> face2 -> face3, to find topK?
C. If B is correct, then for face1 we find the topK = 2, say [box5, box2]. This means we mark box5 and box2 as matched anchors (so they can no longer be assigned to any other face). However, the best match for box5 is face2, and the best match for box2 is face3. After assigning the topK anchor boxes to face1, the available boxes are [box1, box3, box4]. If this method is used, face2 will be assigned box1. The available boxes are then [box3, box4], and we can only assign box4 to face3 even though we need to assign topK = 3.
I know that the probability of the overlapping case C happening is very low when we have a huge number of anchor boxes, say 33125 here, not 5 boxes as in the above example. However, I just want to make sure whether your idea is implemented like this or differently.
I hope the above example is simple and intuitive enough to point out my confusion and make your idea clearer to us. I would really appreciate it if you could review my understanding 1, 2, 3 and my questions A, B, C.
Many thanks and have a nice day.
Hi, will you share the training code?