kpzhang93 / mtcnn_face_detection_alignment

Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Neural Networks

License: MIT License

Languages: HTML 13.81%, JavaScript 0.06%, CSS 18.56%, MATLAB 67.57%

mtcnn_face_detection_alignment's People

Contributors: kpzhang93

mtcnn_face_detection_alignment's Issues

Questions about training datasets

Hello,
thanks for sharing your code. I have some questions about training:

  1. Which dataset did you use for training?
  2. How did you select positive samples? For example, the WIDER FACE dataset is labeled with rectangles, but in MTCNN the bounding boxes should be square. How did you adjust the ground truth?

Thank you.

Can the model detect and align a whole batch at once?

Face detection and alignment are too slow on a large dataset when images are processed one by one. So I wonder if it would be possible to change your script slightly so that it works when the input is a batch?

About the caffemodels

There are 4 caffemodels in code/codes/MTCNNv1/model/; one of them seems to be based on ResNet-101. Where does that model come from?
If I want to use ResNet-50, what should I do?
Where can I download one, other than training it myself?

Understanding the cascade of sizes in MTCNN

Hi,
I'm trying to follow the code and understand how MTCNN works. I understand that for each image and each scale, detections come from each of the networks; in particular I am asking about PNet right now.

The image is rescaled according to the scales computed earlier, and the rescaled image goes into PNet as follows:

    % Code file: detect_face.m
    if fastresize
        im_data = imResample(im_data, [hs ws], 'bilinear');
    else
        im_data = (imResample(img, [hs ws], 'bilinear') - 127.5) * 0.0078125;
    end
    PNet.blobs('data').reshape([hs ws 3 1]);
    out = PNet.forward({im_data});

For reference I have printed out the original size and the rescaled size:
ORIGINAL Height: 340
ORIGINAL Width: 151
SCALE USED (were computed before): 0.107493555074
RESCALED Height: 37
RESCALED Width: 17

The net corresponds to PNet, and in det1.prototxt the input size is h=12 and w=12:

    % Code file: det1.prototxt
    input_dim: 1
    input_dim: 3
    input_dim: 12
    input_dim: 12

What I don't understand is: where does the size go from the image size down to 12x12?
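
A note on what seems to be happening (inferred from the code, not from author documentation): PNet is fully convolutional, so the 12x12 in det1.prototxt is only the training-time input size, which equals the network's receptive field. The PNet.blobs('data').reshape([hs ws 3 1]) call overrides it at run time, and the network effectively slides a 12x12 window with an overall stride of 2 over the rescaled image. A minimal sketch of the resulting output-map size:

    % Sketch: output map size of a fully-convolutional PNet, assuming a
    % 12x12 receptive field and overall stride 2 (from its one 2x2 max-pool).
    cellsize = 12; stride = 2;
    hs = 37; ws = 17;                            % rescaled size from above
    out_h = floor((hs - cellsize)/stride) + 1;   % -> 13
    out_w = floor((ws - cellsize)/stride) + 1;   % -> 3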

Poorer detection performance when using greyscale images

I have the impression that using greyscale images hurts detection performance. In an experiment, I used an RGB image and found a face; after converting the image to greyscale, the detector was no longer able to find the face. In my opinion this could happen because the detector was trained with more RGB than greyscale images (or images where R=G=B). Could this be the case?

PNet results are very bad. Why?

I have prepared 1 million images: 0.3 positive, 0.3 part, 0.4 negative.
I trained PNet, but the result is very bad.

Regarding license of the model files and dataset

Hi @kpzhang93,
I have gone through the repository and found that it uses the MIT license.
1. Are the model files also covered by the MIT license?
2. To your knowledge, how are the datasets you used licensed?

Does the paper have a typo?

Does Equation 1 in the paper have a typo? It looks like it should be the cross-entropy loss, -(y_true*log(y_pred) + (1-y_true)*log(1-y_pred)), and not -(y_true*log(y_pred) + (1-y_true)*(1-log(y_pred))) as printed in the paper.
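
For reference, the standard binary cross-entropy that Eq. 1 presumably intends, in the paper's notation ($p_i$ the predicted face probability, $y_i^{det} \in \{0,1\}$ the ground-truth label):

    L_i^{det} = -\left( y_i^{det} \log(p_i) + (1 - y_i^{det}) \log(1 - p_i) \right)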

In the generateBoundingBox function, why plus 1?

In the generateBoundingBox function of generateBoundingBox.m, there is

    boundingbox=[fix((stride*(boundingbox-1)+1)/scale) fix((stride*(boundingbox-1)+cellsize-1+1)/scale) score reg];

My question is: why the plus 1?

Thank you in advance.
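
A plausible reading (an inference from MATLAB's 1-based indexing, not an authoritative answer): boundingbox-1 converts the 1-based feature-map index to a 0-based pixel offset in the scaled image, and the trailing +1 converts that offset back to a 1-based pixel coordinate; the window end adds cellsize-1 before the same +1. A minimal sketch:

    % Sketch of the coordinate mapping, assuming 1-based MATLAB indexing;
    % the scale value is an arbitrary example.
    stride = 2; cellsize = 12; scale = 0.5;
    q  = 5;                                   % a 1-based feature-map index
    x0 = stride*(q-1);                        % 0-based offset in scaled image
    x1 = fix((x0 + 1)/scale);                 % +1 -> back to 1-based coords
    x2 = fix((x0 + cellsize - 1 + 1)/scale);  % window end (12 pixels wide)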

How to produce 68 face landmark points?

Hello, I am new to face detection and landmark detection. Based on this network, how can I produce 68 landmark points? This demo only produces 5 points. Do I need to retrain with 68 landmarks?
-Thank you-

Is face roll a weakness of MTCNN?

Yaw, pitch, and roll are the three most common problems in face detection.
According to my testing, roll is a weakness of MTCNN: at roughly more than 20 degrees of roll it no longer detects the face.
I tried using MTCNN's landmark detection to do face alignment on rolled faces, but it does not work.
Is there any way to improve this situation, such as adding rotated images to the training data?

[Figure: orientation of the head in terms of pitch, roll, and yaw movements]
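
One possible test-time workaround (an assumption, not something this repo provides): run the detector on a few in-plane rotations of the image and keep the rotation whose best detection score is highest.

    % Hypothetical roll workaround: try several in-plane rotations; assumes
    % the nets and detect_face parameters are already loaded as in demo.m.
    best_score = -inf; best_ang = 0;
    for ang = -60:30:60
        rimg = imrotate(img, ang, 'bilinear', 'loose');
        tb = detect_face(rimg, minsize, PNet, RNet, ONet, threshold, false, factor);
        if ~isempty(tb) && max(tb(:,5)) > best_score
            best_score = max(tb(:,5)); best_ang = ang;
        end
    end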

training code

Hi,

Thanks for sharing the code. I want to reproduce the results and perhaps make some modifications to the network architecture. Is it possible to share the training code with us, especially how to prepare the dataset? Thanks.

Detection Confidence

Hi, is there a variable in the detection function that can be used as the face-detection confidence? Please advise. Thanks in advance.
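
A minimal sketch, assuming the output layout used by demo.m (total_boxes is Nx5, with the fifth column holding the final-stage score):

    % Sketch: read per-face confidence from detect_face's output,
    % assuming total_boxes columns are [x1 y1 x2 y2 score].
    [total_boxes, points] = detect_face(img, minsize, PNet, RNet, ONet, ...
                                        threshold, false, factor);
    confidences = total_boxes(:, 5);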

Loss value for invalid samples when training

Hi, for this multi-task network: if a batch contains no valid samples for some task (for example, no -2 labels for the landmark task, or no -1 and 1 labels for the bbox task), what should the corresponding loss value be: NaN or 0?
Thanks in advance!
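
A common convention (an assumption, not taken from this repo) is to mask the loss so that a task with no valid samples in the batch contributes exactly 0, never NaN:

    % Sketch: masked per-task loss; per_sample_loss and valid are
    % hypothetical names for the task's losses and its validity mask.
    if any(valid)
        task_loss = sum(per_sample_loss(valid)) / sum(valid);
    else
        task_loss = 0;   % no valid samples: contribute zero, not NaN
    end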

Face Alignment

After running demo.m with MTCNNv1 or v2, how can I align the face with respect to the eye points? Is there code to do so?
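
A minimal alignment sketch (not part of this repo), assuming demo.m's landmark layout: points is 10xN with rows 1-5 the x coordinates and rows 6-10 the y coordinates, in the order left eye, right eye, nose, left mouth corner, right mouth corner.

    % Sketch: rotate the image so the eyes are level (first face only).
    k   = 1;
    le  = [points(1,k) points(6,k)];            % left-eye  (x, y)
    re  = [points(2,k) points(7,k)];            % right-eye (x, y)
    ang = atan2d(re(2) - le(2), re(1) - le(1)); % in-plane (roll) angle
    aligned = imrotate(img, ang, 'bilinear', 'crop');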

Some questions about generateBoundingBox() function.

Hey, thanks for your excellent work! I have one question about generateBoundingBox.m:

    [y x]=find(map>=t);
    a=find(map>=t); 
    if size(y,1)==1
        y=y';x=x';score=map(a)';dx1=dx1';dy1=dy1';dx2=dx2';dy2=dy2';
    else
        score=map(a);
    end

When size(y,1)==1, only one point is found in map. So why do you transpose those variables?
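
For what it's worth, the transpose seems to handle MATLAB's find() shape convention rather than the single-point case per se: find() returns row vectors when its input has a single row and column vectors otherwise, so the branch normalizes everything to columns for the later concatenation. A small demonstration:

    % find() output shape depends on the input's shape:
    [y1, x1] = find([0 1 1] >= 1);     % one-row map -> y1, x1 are 1x2 rows
    [y2, x2] = find([0 1; 1 0] >= 1);  % 2x2 map     -> y2, x2 are 2x1 columns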

BTW, I have implemented MTCNN in Python. Here is the repo.

Thanks.

How to deploy this project on Linux

I have successfully installed Caffe on Linux and would like to test this project in CPU mode, but I could not find a tutorial for it, and compilation keeps failing.

Model - param has no field named "transpose" and "predict_box_param"

Hello,

When I execute MTCNN on Linux, some fields are not recognized by the Caffe library, and I get the following errors:

[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 12:12: Message type "caffe.MemoryDataParameter" has no field named "transpose".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1112 10:53:38.052264 3463 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: det1-memory.prototxt
*** Check failure stack trace: ***

This error is related to the "memory_data_param" proto message, which has no "transpose" field.

When I remove this field from the model "det1-memory.prototxt", a new error appears:

[libprotobuf ERROR google/protobuf/text_format.cc:245] Error parsing text-format caffe.NetParameter: 196:21: Message type "caffe.LayerParameter" has no field named "predict_box_param".
WARNING: Logging before InitGoogleLogging() is written to STDERR
F1112 10:55:39.076514 3523 upgrade_proto.cpp:79] Check failed: ReadProtoFromTextFile(param_file, param) Failed to parse NetParameter file: det1-memory.prototxt
*** Check failure stack trace: ***

This time, Caffe doesn't recognize the layer type "PredictBox".

If I remove this layer from the file, the model loads successfully, but at runtime I get the following error:

F1112 10:37:54.543467 3165 data_transformer.cpp:290] Check failed: img_height == height (360 vs. 256)

I think that to resolve the problem I need to modify Caffe's proto definitions (these prototxts appear to require the author's customized Caffe build, which adds the transpose field and the PredictBox layer), but I don't know exactly how.

I use :

  • source code : MTCNN_face_detection_alignment/code/codes/vs/CascadeFaceDetection/
  • model : MTCNN_face_detection_alignment/code/codes/MTCNNv2/model/
    + det1-memory.prototxt and det1.caffemodel
    + det1-memory-stitch.prototxt and det1.caffemodel
    + det2-memory.prototxt and det2.caffemodel
    + det3-memory.prototxt and det3.caffemodel
    + det4-memory.prototxt and det4.caffemodel

Evaluation result is different from the paper

I evaluated the author's model on the WIDER FACE dataset. The result differs from the paper. I set minsize to 10, the scale factor to 0.79, and the thresholds to [0.5 0.5 0.3].

                  paper's result    evaluation with author's model
    easy set:          85.1                     83.3
    medium set:        82.0                     80.9
    hard set:          60.7                     62.2

Has anyone reproduced the author's performance on WIDER FACE?

Question about alignment in the third stage

Why are the windows fed into O-Net twice, with the two sets of windows identical? And I do not understand points2=[1-points([2,1,3,5,4],:);points([7,6,8,10,9],:)]. Could anybody explain? Thank you very much @kpzhang93
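
One reading of that line (an inference from the indexing, not confirmed by the author): it mirrors the landmark set for a horizontally flipped input, where the x coordinates are normalized to [0, 1].

    % Sketch: 1-x mirrors the normalized x coordinates, and the index
    % permutations swap the left/right eye and left/right mouth corner.
    flipped_x = 1 - points([2 1 3 5 4], :);   % rows 1-5: mirrored x, L/R swapped
    flipped_y =     points([7 6 8 10 9], :);  % rows 6-10: y, L/R swapped
    points2   = [flipped_x; flipped_y];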

WIDER FACE thresholds

Dear sir:
What thresholds did you use when testing your algorithm on the WIDER FACE validation set?
And did you use the same models that you released in this project?
Thank you!

Question about parameter settings

Hello, and thank you very much for open-sourcing the model. I would like to use it for face detection and landmark localization, but I have some questions, mainly about the following function:

[total_boxes points] = detect_face(img,minsize,PNet,RNet,ONet,threshold,fastresize,factor)

On what basis should the parameters minsize, threshold, and factor be set? You provide some defaults in the demo, but I don't know how they were chosen.
Thanks again.
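
For readers with the same question, a rough guide to what each parameter controls (rules of thumb, not official guidance; the values shown are commonly used defaults, not necessarily the author's):

    % Rough meanings of detect_face's tuning parameters:
    minsize   = 20;              % smallest face (in pixels) to search for;
                                 %   larger -> faster, but misses small faces
    threshold = [0.6 0.7 0.7];   % per-stage score cut-offs (PNet RNet ONet);
                                 %   lower -> higher recall, more false alarms
    factor    = 0.709;           % image-pyramid scale step; closer to 1 ->
                                 %   denser pyramid, slower but more thorough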

How to change the bounding-box size?

Hi, I'm new to MATLAB and don't know how to change the bounding-box size. I want to detect a face and crop it with a margin; what can I do to enlarge the bounding box by a margin? I use MTCNNv2. Any help will be appreciated!
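
A minimal sketch (a hypothetical post-processing step, not from this repo): enlarge each detected box by a fixed pixel margin, clamp it to the image bounds, then crop.

    % Sketch: crop each detected face with an extra margin on every side.
    margin = 16;                                   % pixels per side
    [h, w, ~] = size(img);
    for k = 1:size(total_boxes, 1)
        x1 = max(1, fix(total_boxes(k,1)) - margin);
        y1 = max(1, fix(total_boxes(k,2)) - margin);
        x2 = min(w, fix(total_boxes(k,3)) + margin);
        y2 = min(h, fix(total_boxes(k,4)) + margin);
        crop = img(y1:y2, x1:x2, :);
        imwrite(crop, sprintf('face_%d.png', k));
    end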

Questions about the MATLAB code

Hello, I have two questions:
1. Of the 10 values output by the third-stage network, the first 5 are the x offset ratios and the last 5 are the y offset ratios, correct?
2. In pad the boxes are slightly adjusted; shouldn't total_boxes be adjusted in the same way?
Thanks.

Loss layers when training?

I want to train the model myself.
My method:
Input data: four kinds of training samples (positive, negative, part, landmark).
For each kind of sample, back-propagate only the relevant loss layers (e.g., for positives update the regression loss and landmark loss; for negatives update the classification loss; for landmark faces update the landmark loss).
However, the paper uses det:box:landmark = 1:0.5:0.5 in P-Net and R-Net. How can I implement that?

  1. Change the loss-layer weights to 2:1:1?
  2. Make the training-data ratio pos:neg:part:landmark = 3:2:1:1?

Are either of the two methods above correct?
What is your method?
What data proportion (pos:neg:part:landmark = ?:?:?:?) did you use?

Thanks a lot
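
For reference, the paper's overall training objective, where $\alpha_j$ are the task weights ($\alpha_{det}=1$, $\alpha_{box}=0.5$, $\alpha_{landmark}=0.5$ for P-Net and R-Net) and $\beta_i^j \in \{0,1\}$ indicates whether sample $i$ participates in task $j$:

    \min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j \, \beta_i^{j} \, L_i^{j}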

Reproduce WIDER score

Hello,
I am trying to reproduce the MTCNNv1 WIDER score on the validation data, but I do not get exactly the same result as in the paper or in the .mat files (eval/plot/baselines/Val/setting_int/multitask-cascade-cnn/wider_pr_info_multitask-cascade-cnn_easy_val.mat).
I set the parameters:
threshold=[0.5 0.5 0.3];
factor=0.79;
and get (easy / medium / hard):
0.834 0.810 0.624
vs
0.848 0.825 0.598
Is a difference of more than 1% mAP significant?

Why is the landmark localization so accurate?

The paper's "Evaluation on AFLW for face alignment" section compares against other methods. What is the reason this method's landmark localization is better than the others?

Compared with TCDCN, this method's ONet-48 network architecture is simpler, so why does it perform better? Is it related to the bounding-box regression, and if so, what is the relationship?

Recovering window locations in generateBoundingBox()

I have been reimplementing this work, and I have a question about P-Net: we need to recover the original window locations from the output feature maps, but why is the stride set to 2? I don't clearly understand it. The code is:

    stride=2;
    boundingbox=[fix((stride*(boundingbox-1)+1)/scale) fix((stride*(boundingbox-1)+cellsize-1+1)/scale) score reg];

Could you explain it simply? Thanks.
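
A likely explanation (an inference from det1.prototxt, not an authoritative answer): PNet contains a single 2x2 max-pool, so neighbouring cells in its output map are 2 pixels apart in the scaled input image, and stride=2 encodes exactly that spacing. With a 1-based output-map index $q$, the mapping back to the original image is:

    x_{start} = \mathrm{fix}\!\left(\frac{2(q-1)+1}{scale}\right), \qquad
    x_{end} = \mathrm{fix}\!\left(\frac{2(q-1)+12}{scale}\right)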
