nenadmarkus / pico

A minimalistic framework for real-time object detection (with a pre-trained face detector)

Home Page: https://arxiv.org/abs/1305.4537

License: MIT License

Makefile 0.56% C 79.87% Python 11.36% Shell 0.95% HTML 7.26%
face detection decision trees

pico's Introduction

Pixel Intensity Comparison-based Object detection (pico)

This repository contains the code for real-time face detection. Check out a demo video at http://www.youtube.com/watch?v=1lXfm-PZz0Q to get a better idea.

The pico framework is a modification of the standard Viola-Jones method. The basic idea is to scan the image with a cascade of binary classifiers at all reasonable positions and scales. An image region is classified as an object of interest if it successfully passes all the members of the cascade. Each binary classifier consists of an ensemble of decision trees with pixel intensity comparisons as binary tests in their internal nodes. This enables the detector to process image regions at very high speed. The details are given in http://arxiv.org/abs/1305.4537.
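To make the internal-node test concrete, here is an illustrative C sketch of evaluating one such tree. The names and data layout are invented for exposition, and the scaling of offsets by the region size is omitted; see picornt.c for the real code.

```c
#include <stdint.h>

/* Illustrative sketch of one pico-style decision tree; names and layout are
   assumptions for exposition, not the library's actual data structures.
   Each internal node compares the intensities of two pixels at fixed offsets
   relative to the region being classified (the real detector also scales
   these offsets by the region size). */
typedef struct {
    int depth;             /* tree depth                                  */
    const int8_t* offsets; /* 4 offsets per internal node: r1, c1, r2, c2 */
    const float* lut;      /* one output value per leaf                   */
} tree_t;

/* Evaluate the tree on the region whose top-left corner is (r, c).
   ldim is the row stride of the grayscale image. */
static float eval_tree(const tree_t* t, const uint8_t* pixels,
                       int ldim, int r, int c)
{
    int idx = 1; /* root; children of node i are 2i and 2i+1 */
    for (int d = 0; d < t->depth; ++d) {
        const int8_t* o = &t->offsets[4 * (idx - 1)];
        int test = pixels[(r + o[0]) * ldim + (c + o[1])] <=
                   pixels[(r + o[2]) * ldim + (c + o[3])];
        idx = 2 * idx + test;
    }
    return t->lut[idx - (1 << t->depth)]; /* leaf output */
}
```

Because each test is just two memory reads and a comparison, a whole ensemble of such trees can be evaluated in a handful of nanoseconds per region, which is where pico's speed comes from.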

Some highlights of pico are:

  • high processing speed;
  • there is no need for image preprocessing prior to detection;
  • there is no need for the computation of integral images, image pyramid, HOG pyramid or any other similar data structure;
  • all binary tests in internal nodes of the trees are based on the same feature type (not the case in the V-J framework);
  • the method can easily be modified for fast detection of in-plane rotated objects.

It can be said that the main limitation of pico is also its simplicity: pico should be avoided when there is large variation in the appearance of the object class. This means, for example, that pico should not be used for detecting pedestrians. Large, modern convolutional neural networks are better suited for such cases.

However, pico can be used for simple object classes (e.g., faces or templates) when real-time performance is desired.

Detecting objects in images and videos

The folder rnt/ contains all the needed resources to perform object detection in images and video streams with pre-trained classification cascades. Specifically, sample applications that perform face detection can be found in the folder rnt/samples/.

Note that the library also enables the detection of rotated objects without the need of image resampling or classification cascade retraining. This is achieved by rotating the binary tests in internal tree nodes, as described in the paper.

Embedding the runtime within your application

To use the runtime in your own application, you have to:

  • Include a prototype of find_objects(...) in your code (for example, by adding #include "picornt.h")
  • Compile picornt.c with your code
  • Load/include a classification cascade generated with picolrn (e.g., rnt/cascades/facefinder)
  • Invoke find_objects(...) with appropriate parameters

Notice that there are no specific library dependencies, i.e., the code can be compiled out-of-the-box with a standard C compiler.
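The cascade-loading step simply reads the file into a heap buffer whose pointer is then handed to find_objects(...). A minimal sketch, with error handling trimmed and the helper name invented for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal sketch: read an entire cascade file (e.g. rnt/cascades/facefinder)
   into a heap buffer; find_objects(...) takes a pointer to these raw bytes.
   Error handling is kept minimal for brevity. */
static void* load_cascade(const char* path, long* size)
{
    FILE* f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    *size = ftell(f);
    fseek(f, 0, SEEK_SET);
    void* buf = malloc((size_t)*size);
    if (buf && fread(buf, 1, (size_t)*size, f) != (size_t)*size) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    return buf;
}
```

Alternatively, the cascade bytes can be compiled directly into the binary as a static array, which avoids any file I/O at startup.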

To get a feel for how the library works, we recommend that you look at rnt/samples/native/sample.c, as it was specifically written to serve as documentation.

Learning custom object detectors

The program picolrn.c (available in the folder gen/) enables you to learn your own (custom) object detectors. The training data has to be provided in a specific format. The details are printed to the standard output when picolrn is invoked without parameters. It is often convenient to pipe this information into a text file:

$ ./picolrn > howto.txt

A tutorial that guides you through the process of learning a face detector can be found in the folder gen/sample/.

Citation

If you use the provided code/binaries for your work, please cite the following paper:

N. Markus, M. Frljak, I. S. Pandzic, J. Ahlberg and R. Forchheimer, "Object Detection with Pixel Intensity Comparisons Organized in Decision Trees", http://arxiv.org/abs/1305.4537

Contact

For any additional information contact me at [email protected].

Copyright (c) 2013, Nenad Markus. All rights reserved.

pico's People

Contributors

nenadmarkus


pico's Issues

train data problem

Hi, I ran genki.py and background.py to obtain a training file, then ran picolrn to train on my data. However, the result seems unusable: it is only 2 KB and cannot detect anything, and the training process finished very quickly. Did I miss any steps, or which operations could cause this problem?

Thank you for your help!

Scanner logic should be from largest to smallest

I actually thought about opening a PR to suggest this improvement, but I could not reverse the logic without affecting the parameters passed to the methods.

Face detection proceeds from the smallest face to the largest, subject to a configurable limit on the number of detections. This can cause a larger face to go undetected (because the detection limit has already been reached), and I believe the correct order would be from the largest to the smallest.

Sorry for my English; I'm attaching a photo to show what happens.

(screenshot: Screen Shot 2019-06-24 at 17:35:02)

Serializing Models

Hi,

I'm part of a small team at Imperial College London called Menpo, and we are focusing on creating some tools in Python that make certain areas of computer vision easier. I've read your paper about object detection and obviously found the Pico repository via that.

I have wrapped Pico in Cython for use in Python over at cypico and I'm currently using a forked version of Pico that correctly compiles in Visual Studio. I was wondering if you would be interested in doing some collaboration so that we might make training and serializing Pico models more friendly from within Python.

Is this something you might be interested in?

Cheers,

Patrick and The Menpo Team

Advice on training custom detectors?

Hi there, I'm trying to use picolrn to train my own custom detectors (to detect heads, not just faces).

While it does train, I haven't been able to get it to converge - there are still far too many false positives (and it never reaches that 10E-6 threshold you set in the original training script).

With this in mind, I was wondering whether you had any advice as to how to facilitate training? I'm (trying) to use the WIDER FACE dataset (http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/) but to little avail.

Interestingly, using your genki.py and genki script seems to work - what's special about this dataset in particular? Can pico not handle more variation of valid detections?

I've looked at images in the 'export' method of both my preproc script and genki.py and they seem to output the same sort of images (obviously, my images tend to have slightly more variety).

I'm more than happy to post my preproc scripts here and would love to hear any suggestions from yourself (if you have any!)

I look forward to hearing from you soon - and many thanks in advance!

a question about the function sample_training_data()

Hi,
thank you for sharing the code. I have read the training code and have a question that confuses me.
In the function sample_training_data(), you first sample positive examples from the object class using classify_region(): a sample is kept as a positive if classify_region returns 1.
Second, you sample a false positive if classify_region returns 1 on the non-object class, and that is what confuses me: since the non-object class contains no true positives, why does classify_region return 1?
Thanks!

the method is fast!

The problem has been found, and I ran the code successfully.

It is surprising that the method can detect faces in a 1080x720 frame within 8 ms on a 3.2 GHz machine.

Thanks!

understanding the lut parameter

What does the lut parameter mean? It is used during face detection, but I don't understand what it actually represents in your algorithm, and I cannot find it explained in your paper.

model can't converge

Hi there, I'm trying to use picolrn to train my own face detector (using the AFLW dataset).
First I used genki.py to preprocess my positive samples (24384 faces), then I chose around 15 GB of background images.
When I train the model, I haven't been able to get it to converge. In the first stage there are 4 trees; in every later stage there are as many trees as the maximum picolrn allows.
With this in mind, I was wondering whether you had any advice on how to solve this problem?
Sorry for my poor English!
Thanks!

genki.py error and background dataset question

Hi
Question one:
I ran genki.py and got the following error:
python genki.py path/to/genki > trdata
Traceback (most recent call last):
File "genki.py", line 166, in
export(im, r, c, s)
File "genki.py", line 116, in export
write_rid(im)
File "genki.py", line 54, in write_rid
sys.stdout.buffer.write(hw)
AttributeError: 'file' object has no attribute 'buffer'

Question two:
I cannot download the background dataset from the following link:
https://googledrive.com/host/0Bw4IT5ZOzJj6NXlJUFh0UGZCWmc
Can you provide another link?

The details in Training

I've read your paper, and run your code. It's really great and fast.

But I have some questions here.

  1. It seems that you train the decision trees on the grey-level image. However, because of varying skin color, comparisons on grey level may not be ideal. Have you tried training on a gradient image (e.g., processed by a Sobel operator, ignoring the gradient angle)?
  2. In each internal node, you randomly try 256 binary tests. Why not just compute the weighted average face of the samples and compare the maximum to the minimum? If you want to introduce randomness, you could randomly pick a pair from the top N maxima and top N minima.

The idea is great.

what should I set ldim to?

(cross posting from nenadmarkus/picojs#33, since this has more to do with the underlying implementation, and less to do with the javascript framework)

Hi, I want to use pico to find faces in images of arbitrary width/height. I am hoping there is some general rule I can use for setting the ldim parameter based on image width and height. I have read over the explanation doc https://nenadmarkus.com/p/picojs-intro/

The parameter ldim tells us how to move from one row of the image to the next (known as stride in some other libraries, such as OpenCV).

From what I gather from the examples, it is usually set to the ncols parameter (a.k.a image width). However, I have an example image that is 400x400 pixels. Setting these params:

// where image.width === 400 and image.height === 400
  const image = {
    pixels: rgba_to_grayscale(image_data, image_data.height, image_data.width),
    nrows: image_data.height,
    ncols: image_data.width,
    ldim: image_data.width
  }

This results in zero detections. Setting ldim to the seemingly arbitrary value of 419 results in one detection (which is the desired result). Setting ldim to anything higher results in several detections all corresponding to the same face.

All the other parameters have values taken from examples/image.html
Screenshots attached for ldim = 400 (the image width), 419 (the desired result), 420, and 480.
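A note on what ldim means here: it is the row stride of the pixel buffer in array elements, so the pixel at row r, column c sits at index r*ldim + c. It equals ncols only for a tightly packed buffer; padded rows (or a sub-window of a wider image) make it larger. An illustrative sketch of the indexing, not pico's code:

```c
#include <stdint.h>

/* Illustrative: address a pixel in a grayscale buffer with row stride ldim.
   For a tightly packed image ldim == ncols; padded rows make ldim > ncols. */
static uint8_t pixel_at(const uint8_t* pixels, int ldim, int r, int c)
{
    return pixels[r * ldim + c];
}
```

If a nominally 400-pixel-wide image only detects correctly with ldim around 419, that suggests the grayscale buffer actually carries padded rows, so the stride should be taken from the buffer layout rather than assumed equal to the width.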

Problems about accelerating the speed of inference stage

Hi,
I'd like to run the inference model on an embedded device. Due to limited computing resources, I have some questions:

I've tried to convert the floating-point computation to integer computation. That is to say, after parsing the cascade binary file, the luts and thresholds matrices are converted to integers.

int i,j;
FILE* file;
file = fopen("jst_headcascade", "rb");
if(!file)
return 0;

fread(&version, sizeof(int32_t), 1, file);
fread(&bbox[0], sizeof(int8_t), 4, file);
fread(&tdepth, sizeof(int), 1, file);
fread(&ntrees, sizeof(int), 1, file);

for(i=0; i<ntrees; ++i)
{
	fread(&tcodes[i][0], sizeof(int32_t), (1<<tdepth)-1, file);
	fread(&luts[i][0], sizeof(float), 1<<tdepth, file);
	fread(&thresholds[i], sizeof(float), 1, file);
}
fclose(file);

//convert lut and thr to int data
for(i=0;i<ntrees;i++)
{
	int_thresholds[i] = (int)(thresholds[i]*PERCISON);
	for(j=0;j<(1<<tdepth);j++)
	{
		int_luts[i][j] = (int)(*(&luts[0][0]+i*1024+j)*PERCISON);
	}
}

However, I've run into problems in the run_cascade function: the index into the vppixels array goes out of range.
So I'd like to know the structure of the cascade file; the declared sizes of these matrices are not fully used during inference.

int32_t version = 3;

int tdepth;
int ntrees=0;

int8_t bbox[4]; // (r_min, r_max, c_min, c_max)

int32_t tcodes[4096][1024];
float luts[4096][1024];

float thresholds[4096];

I can't understand the parsing in the following code, especially tcodes and lut:
offset = ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float) + 1*sizeof(float);
ptree = (int8_t*)cascade + 2*sizeof(float) + 2*sizeof(int);

*o = 0.0f;

for(i=0; i<ntrees; ++i)
{
	//
	tcodes = ptree - 4;
	lut = (float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t));
	thr = *(float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float));

	//
	idx = 1;

	for(j=0; j<tdepth; ++j)
		idx = 2*idx + (pixels[(r+tcodes[4*idx+0]*s)/256*ldim+(c+tcodes[4*idx+1]*s)/256]<=pixels[(r+tcodes[4*idx+2]*s)/256*ldim+(c+tcodes[4*idx+3]*s)/256]);

	*o = *o + lut[idx-(1<<tdepth)];

	//
	if(*o<=thr)
		return -1;
	else
		ptree = ptree + offset;
}

//
*o = *o - thr;
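From the offset arithmetic in this code, each serialized tree occupies ((1<<tdepth)-1) int32 tcodes, (1<<tdepth) float lut entries, and one float threshold. A small sketch of that size computation (assuming 4-byte int32_t and float, as the code above does):

```c
#include <stdint.h>
#include <stddef.h>

/* From the quoted offset arithmetic: the serialized size of one tree is
   ((1<<tdepth)-1) tcodes (int32_t), (1<<tdepth) lut entries (float),
   and one float threshold. */
static size_t tree_bytes(int tdepth)
{
    return ((size_t)((1 << tdepth) - 1)) * sizeof(int32_t)
         + ((size_t)(1 << tdepth)) * sizeof(float)
         + sizeof(float);
}
```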

Any response is helpful.
Thanks

tree tdepth

1. Why is tdepth 5? Can I change it to another value?
In every training cycle we choose more background images, and when the training result has fpr > maxfpr we train another tree for that cycle, so a cycle may contain several trees. Could we also increase the tree depth for a cycle when fpr > maxfpr?
2. When the tree does not satisfy fpr > maxfpr, why is the threshold set to -1337?
Thank you~

Problem in training my own detectors

I've read your paper and run the detection demo using the facefinder you provided; it is really fast and performs well.
Now I'd like to train my own detector for pedestrians. As the instructions say, I should download the dataset (images and background), but I live in China and cannot download it. I have some images and backgrounds of my own; what should I do with them, and what is the format of the training datasets?

Thanks.

it's not faster than OpenCV 3.1

I just compared pico with the OpenCV 3.1 method (Haar).
The speeds of the two methods are similar; I strongly recommend doing something to improve the speed.

Can I improve the code for eye detection

Hi,
Since I'm using a Raspberry Pi 2, this project seems very interesting because it detects faces quickly. How can I adapt it for eye detection? Is there a way to do so?
Thanks 🚶

can not download face image data

Hi,
thank you for sharing your code!
I want to train a face classifier using it. As you said, I should download the GENKI dataset and the background dataset, but I cannot download the data.
Could you host the data on GitHub?
Thanks

Rectangular detection

Awesome library! I'm trying to recognize rectangular objects. Is there anything that I need to change in Pico to make this successful for training and recognition? Any guidance will be greatly appreciated.
