kpzhang93 / mtcnn_face_detection_alignment

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks

License: MIT License

Languages: HTML 13.81%, JavaScript 0.06%, CSS 18.56%, MATLAB 67.57%

mtcnn_face_detection_alignment's People

Contributors: kpzhang93

mtcnn_face_detection_alignment's Issues

Questions about training datasets

Hello,
thanks for sharing your code. I have some questions about training:

  1. Which dataset did you use for training?
  2. How did you select positive samples? For example, the WIDER FACE dataset is labeled with rectangles, but in MTCNN the bounding boxes should be square. How did you adjust the ground truth?

Thank you.

Can the model detect and align a whole batch at once?

Face detection and alignment are too slow on a large dataset when images are processed one by one. So I wonder if it would be possible to change your script slightly so that it works when the input is a batch?

About the caffemodels

There are 4 caffemodels in code/codes/MTCNNv1/model/; one of them seems to be based on ResNet-101. Where does that model come from?
If I want to use ResNet-50, what should I do?
Where can I download one, other than training it myself?

Understanding the cascade of sizes in MTCNN

Hi,
I'm trying to follow the code and understand how MTCNN works. I understand that for each image and each scale, detections come from each of the networks; in particular I am asking about PNet right now.

The image is rescaled according to the scales computed earlier, and the rescaled image goes into PNet as follows:

    % Code file: detect_face.m
    if fastresize
        im_data = imResample(im_data, [hs ws], 'bilinear');
    else
        im_data = (imResample(img, [hs ws], 'bilinear') - 127.5) * 0.0078125;
    end
    PNet.blobs('data').reshape([hs ws 3 1]);
    out = PNet.forward({im_data});

For reference I have printed out the original size and the rescaled size:
ORIGINAL Height: 340
ORIGINAL Width: 151
SCALE USED (were computed before): 0.107493555074
RESCALED Height: 37
RESCALED Width: 17

The net corresponds to PNet, and in det1.prototxt the input size is h=12 and w=12:

    % Code file: det1.prototxt
    input_dim: 1
    input_dim: 3
    input_dim: 12
    input_dim: 12

What I don't understand is: where does the size go from the image size down to 12x12?
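
A note on what seems to be happening (inferred from the code, not from author documentation): PNet is fully convolutional, so the 12x12 in det1.prototxt is only the training-time input size, which equals the network's receptive field. The PNet.blobs('data').reshape([hs ws 3 1]) call overrides it at run time, and the network effectively slides a 12x12 window with an overall stride of 2 over the rescaled image. A minimal sketch of the resulting output-map size:

    % Sketch: output map size of a fully-convolutional PNet, assuming a
    % 12x12 receptive field and overall stride 2 (from its one 2x2 max-pool).
    cellsize = 12; stride = 2;
    hs = 37; ws = 17;                            % rescaled size from above
    out_h = floor((hs - cellsize)/stride) + 1;   % -> 13
    out_w = floor((ws - cellsize)/stride) + 1;   % -> 3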

Poorer detection performance when using greyscale images

I have the impression that using greyscale images hurts detection performance. In an experiment, I used an RGB image and found a face; after converting the image to greyscale, the detector was no longer able to find the face. In my opinion this could happen because the detector was trained with more RGB than greyscale images (or images where R=G=B). Could this be the case?

PNet results are very bad. Why?

I have prepared 1 million images: 0.3 positive, 0.3 part, 0.4 negative.
I trained PNet, but the result is very bad.

Regarding license of the model files and dataset

Hi @kpzhang93,
I have gone through the repository and found that it uses the MIT license.
1. Are the model files also covered by the MIT license?
2. To your knowledge, how are the datasets you used licensed?

Does the paper have a typo?

Does Equation 1 in the paper have a typo? It looks like it should be the cross-entropy loss, -(y_true*log(y_pred) + (1-y_true)*log(1-y_pred)), and not -(y_true*log(y_pred) + (1-y_true)*(1-log(y_pred))) as printed in the paper.
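
For reference, the standard binary cross-entropy that Eq. 1 presumably intends, in the paper's notation ($p_i$ the predicted face probability, $y_i^{det} \in \{0,1\}$ the ground-truth label):

    L_i^{det} = -\left( y_i^{det} \log(p_i) + (1 - y_i^{det}) \log(1 - p_i) \right)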

In the generateBoundingBox function, why plus 1?

In the generateBoundingBox function of generateBoundingBox.m, there is

    boundingbox=[fix((stride*(boundingbox-1)+1)/scale) fix((stride*(boundingbox-1)+cellsize-1+1)/scale) score reg];

My question is: why the plus 1?

Thank you in advance.
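
A plausible reading (an inference from MATLAB's 1-based indexing, not an authoritative answer): boundingbox-1 converts the 1-based feature-map index to a 0-based pixel offset in the scaled image, and the trailing +1 converts that offset back to a 1-based pixel coordinate; the window end adds cellsize-1 before the same +1. A minimal sketch:

    % Sketch of the coordinate mapping, assuming 1-based MATLAB indexing;
    % the scale value is an arbitrary example.
    stride = 2; cellsize = 12; scale = 0.5;
    q  = 5;                                   % a 1-based feature-map index
    x0 = stride*(q-1);                        % 0-based offset in scaled image
    x1 = fix((x0 + 1)/scale);                 % +1 -> back to 1-based coords
    x2 = fix((x0 + cellsize - 1 + 1)/scale);  % window end (12 pixels wide)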

How to produce 68 face landmark points?

Hello, I am new to face detection and landmark detection. Based on this network, how can I produce 68 landmark points? This demo only produces 5 points. Do I need to retrain with 68 landmarks?
-Thank you-

Is face roll a weakness of MTCNN?

Yaw, pitch, and roll are the three most common problems in face detection.
According to my testing, roll is a weakness of MTCNN: at roughly more than 20 degrees of roll it no longer detects the face.
I tried using MTCNN's landmark detection to do face alignment on rolled faces, but it does not work.
Is there any way to improve this situation, such as adding rotated images to the training data?

[Figure: orientation of the head in terms of pitch, roll, and yaw movements]
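
One possible test-time workaround (an assumption, not something this repo provides): run the detector on a few in-plane rotations of the image and keep the rotation whose best detection score is highest.

    % Hypothetical roll workaround: try several in-plane rotations; assumes
    % the nets and detect_face parameters are already loaded as in demo.m.
    best_score = -inf; best_ang = 0;
    for ang = -60:30:60
        rimg = imrotate(img, ang, 'bilinear', 'loose');
        tb = detect_face(rimg, minsize, PNet, RNet, ONet, threshold, false, factor);
        if ~isempty(tb) && max(tb(:,5)) > best_score
            best_score = max(tb(:,5)); best_ang = ang;
        end
    end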

training code

Hi,

Thanks for sharing the code. I want to reproduce the results and perhaps make some modifications to the network architecture. Is it possible to share the training code with us, especially how to prepare the dataset? Thanks.

Detection Confidence

Hi, is there a variable in the detection function that can be used as the face-detection confidence? Please advise. Thanks in advance.
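
A minimal sketch, assuming the output layout used by demo.m (total_boxes is Nx5, with the fifth column holding the final-stage score):

    % Sketch: read per-face confidence from detect_face's output,
    % assuming total_boxes columns are [x1 y1 x2 y2 score].
    [total_boxes, points] = detect_face(img, minsize, PNet, RNet, ONet, ...
                                        threshold, false, factor);
    confidences = total_boxes(:, 5);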

Loss value for invalid samples when training

Hi, for this multi-task network: if a batch contains no valid samples for some task (for example, no -2 labels for the landmark task, or no -1 and 1 labels for the bbox task), what should the corresponding loss value be: NaN or 0?
Thanks in advance!
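
A common convention (an assumption, not taken from this repo) is to mask the loss so that a task with no valid samples in the batch contributes exactly 0, never NaN:

    % Sketch: masked per-task loss; per_sample_loss and valid are
    % hypothetical names for the task's losses and its validity mask.
    if any(valid)
        task_loss = sum(per_sample_loss(valid)) / sum(valid);
    else
        task_loss = 0;   % no valid samples: contribute zero, not NaN
    end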

Face Alignment

After running demo.m with MTCNNv1 or v2, how can I align the face with respect to the eye points? Is there code to do so?
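
A minimal alignment sketch (not part of this repo), assuming demo.m's landmark layout: points is 10xN with rows 1-5 the x coordinates and rows 6-10 the y coordinates, in the order left eye, right eye, nose, left mouth corner, right mouth corner.

    % Sketch: rotate the image so the eyes are level (first face only).
    k   = 1;
    le  = [points(1,k) points(6,k)];            % left-eye  (x, y)
    re  = [points(2,k) points(7,k)];            % right-eye (x, y)
    ang = atan2d(re(2) - le(2), re(1) - le(1)); % in-plane (roll) angle
    aligned = imrotate(img, ang, 'bilinear', 'crop');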

Some questions about generateBoundingBox() function.

Hey, thanks for your excellent work! I have one question about generateBoundingBox.m:

    [y x]=find(map>=t);
    a=find(map>=t); 
    if size(y,1)==1
        y=y';x=x';score=map(a)';dx1=dx1';dy1=dy1';dx2=dx2';dy2=dy2';
    else
        score=map(a);
    end

When size(y,1)==1, only one point is found in map. So why do you transpose those variables?
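
For what it's worth, the transpose seems to handle MATLAB's find() shape convention rather than the single-point case per se: find() returns row vectors when its input has a single row and column vectors otherwise, so the branch normalizes everything to columns for the later concatenation. A small demonstration:

    % find() output shape depends on the input's shape:
    [y1, x1] = find([0 1 1] >= 1);     % one-row map -> y1, x1 are 1x2 rows
    [y2, x2] = find([0 1; 1 0] >= 1);  % 2x2 map     -> y2, x2 are 2x1 columns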

BTW, I have implemented MTCNN in Python. Here is the repo.

Thanks.

How to deploy this project on Linux

I have successfully installed Caffe on Linux and would like to test this project in CPU mode, but I could not find a tutorial for it, and compilation keeps failing.

Model - param has no field named "transpose" and "predict_box_param"

Hello,

When I execute MTCNN on Linux, some fields are not recognized by the Caffe library, and I get the following errors:

[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 12:12: Message type "caffe.MemoryDataParameter" has no field named "transpose".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1112 10:53:38.052264 3463 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: det1-memory.prototxt
*** Check failure stack trace: ***

This error is related to the "memory_data_param" proto message, which has no "transpose" field.

When I remove this field from the model "det1-memory.prototxt", a new error appears:

[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 196:21: Message type "caffe.LayerParameter" has no field named "predict_box_param".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1112 10:55:39.076514 3523 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: det1-memory.prototxt
*** Check failure stack trace: ***

This time, Caffe doesn't recognize the layer type "PredictBox".

If I remove this layer from the file, the model loads successfully, but at runtime I get the following error:

F1112 10:37:54.543467 3165 data_transformer.cpp:290] Check failed: img_height == height (360 vs. 256)

I think that to resolve the problem I need to modify Caffe's proto definitions (these prototxts appear to require the author's customized Caffe build, which adds the transpose field and the PredictBox layer), but I don't know exactly how.

I use :

  • source code : MTCNN_face_detection_alignment/code/codes/vs/CascadeFaceDetection/
  • model : MTCNN_face_detection_alignment/code/codes/MTCNNv2/model/
    + det1-memory.prototxt and det1.caffemodel
    + det1-memory-stitch.prototxt and det1.caffemodel
    + det2-memory.prototxt and det2.caffemodel
    + det3-memory.prototxt and det3.caffemodel
    + det4-memory.prototxt and det4.caffemodel

Evaluation result is different from the paper

I evaluated the author's model on the WIDER FACE dataset. The result differs from the paper. I set minsize to 10, the scale factor to 0.79, and the thresholds to [0.5 0.5 0.3].

                  paper's result    evaluation with author's model
    easy set:          85.1                     83.3
    medium set:        82.0                     80.9
    hard set:          60.7                     62.2

Has anyone reproduced the author's performance on WIDER FACE?

Question about alignment in the third stage

Why are the windows fed into O-Net twice, with the two sets of windows identical? And I do not understand points2=[1-points([2,1,3,5,4],:);points([7,6,8,10,9],:)]. Could anybody explain? Thank you very much @kpzhang93
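
One reading of that line (an inference from the indexing, not confirmed by the author): it mirrors the landmark set for a horizontally flipped input, where the x coordinates are normalized to [0, 1].

    % Sketch: 1-x mirrors the normalized x coordinates, and the index
    % permutations swap the left/right eye and left/right mouth corner.
    flipped_x = 1 - points([2 1 3 5 4], :);   % rows 1-5: mirrored x, L/R swapped
    flipped_y =     points([7 6 8 10 9], :);  % rows 6-10: y, L/R swapped
    points2   = [flipped_x; flipped_y];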

WIDER FACE thresholds

Dear sir:
What thresholds did you use when testing your algorithm on the WIDER FACE validation set?
And did you use the same models that you released in this project?
Thank you!

Question about parameter settings

Hello, and thank you very much for open-sourcing the model. I would like to use it for face detection and landmark localization, but I have some questions, mainly about the following function:

[total_boxes points] = detect_face(img,minsize,PNet,RNet,ONet,threshold,fastresize,factor)

On what basis should the parameters minsize, threshold, and factor be set? You provide some defaults in the demo, but I don't know how they were chosen.
Thanks again.
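
For readers with the same question, a rough guide to what each parameter controls (rules of thumb, not official guidance; the values shown are commonly used defaults, not necessarily the author's):

    % Rough meanings of detect_face's tuning parameters:
    minsize   = 20;              % smallest face (in pixels) to search for;
                                 %   larger -> faster, but misses small faces
    threshold = [0.6 0.7 0.7];   % per-stage score cut-offs (PNet RNet ONet);
                                 %   lower -> higher recall, more false alarms
    factor    = 0.709;           % image-pyramid scale step; closer to 1 ->
                                 %   denser pyramid, slower but more thorough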

How to change the bounding-box size?

Hi, I'm new to MATLAB and don't know how to change the bounding-box size. I want to detect a face and crop it with a margin; what can I do to enlarge the bounding box by a margin? I use MTCNNv2. Any help will be appreciated!
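
A minimal sketch (a hypothetical post-processing step, not from this repo): enlarge each detected box by a fixed pixel margin, clamp it to the image bounds, then crop.

    % Sketch: crop each detected face with an extra margin on every side.
    margin = 16;                                   % pixels per side
    [h, w, ~] = size(img);
    for k = 1:size(total_boxes, 1)
        x1 = max(1, fix(total_boxes(k,1)) - margin);
        y1 = max(1, fix(total_boxes(k,2)) - margin);
        x2 = min(w, fix(total_boxes(k,3)) + margin);
        y2 = min(h, fix(total_boxes(k,4)) + margin);
        crop = img(y1:y2, x1:x2, :);
        imwrite(crop, sprintf('face_%d.png', k));
    end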

Questions about the MATLAB code

Hello, I have two questions:
1. Of the 10 values output by the third-stage network, the first 5 are the x offset ratios and the last 5 are the y offset ratios, correct?
2. In pad the boxes are slightly adjusted; shouldn't total_boxes be adjusted in the same way?
Thanks.

Loss layers when training?

I want to train the model myself.
My method:
Input data: four kinds of training samples (positive, negative, part, landmark).
For each kind of sample, back-propagate only the relevant loss layers (e.g., for positives update the regression loss and landmark loss; for negatives update the classification loss; for landmark faces update the landmark loss).
However, the paper uses det:box:landmark = 1:0.5:0.5 in P-Net and R-Net. How can I implement that?

  1. Change the loss-layer weights to 2:1:1?
  2. Make the training-data ratio pos:neg:part:landmark = 3:2:1:1?

Are either of the two methods above correct?
What is your method?
What data proportion (pos:neg:part:landmark = ?:?:?:?) did you use?

Thanks a lot
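
For reference, the paper's overall training objective, where $\alpha_j$ are the task weights ($\alpha_{det}=1$, $\alpha_{box}=0.5$, $\alpha_{landmark}=0.5$ for P-Net and R-Net) and $\beta_i^j \in \{0,1\}$ indicates whether sample $i$ participates in task $j$:

    \min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j \, \beta_i^{j} \, L_i^{j}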

Reproduce WIDER score

Hello,
I am trying to reproduce the MTCNNv1 WIDER score on the validation data, but I do not get exactly the same result as in the paper or in the .mat files (eval/plot/baselines/Val/setting_int/multitask-cascade-cnn/wider_pr_info_multitask-cascade-cnn_easy_val.mat).
I set the parameters:
threshold=[0.5 0.5 0.3];
factor=0.79;
and get (easy / medium / hard):
0.834 0.810 0.624
vs
0.848 0.825 0.598
Is a difference of more than 1% mAP significant?

Why is the landmark localization so accurate?

The paper's "Evaluation on AFLW for face alignment" section compares against other methods. What is the reason this method's landmark localization is better than the others?

Compared with TCDCN, this method's ONet-48 network architecture is simpler, so why does it perform better? Is it related to the bounding-box regression, and if so, what is the relationship?

Recovering window locations in generateBoundingBox()

I have been reimplementing this work, and I have a question about P-Net: we need to recover the original window locations from the output feature maps, but why is the stride set to 2? I don't clearly understand it. The code is:

    stride=2;
    boundingbox=[fix((stride*(boundingbox-1)+1)/scale) fix((stride*(boundingbox-1)+cellsize-1+1)/scale) score reg];

Could you explain it simply? Thanks.
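
A likely explanation (an inference from det1.prototxt, not an authoritative answer): PNet contains a single 2x2 max-pool, so neighbouring cells in its output map are 2 pixels apart in the scaled input image, and stride=2 encodes exactly that spacing. With a 1-based output-map index $q$, the mapping back to the original image is:

    x_{start} = \mathrm{fix}\!\left(\frac{2(q-1)+1}{scale}\right), \qquad
    x_{end} = \mathrm{fix}\!\left(\frac{2(q-1)+12}{scale}\right)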
