nenadmarkus / pico

A minimalistic framework for real-time object detection (with a pre-trained face detector)

Home Page: https://arxiv.org/abs/1305.4537

License: MIT License

Makefile 0.56% C 79.87% Python 11.36% Shell 0.95% HTML 7.26%
face detection decision trees

pico's Introduction

Pixel Intensity Comparison-based Object detection (pico)

This repository contains the code for real-time face detection. Check out a demo video at http://www.youtube.com/watch?v=1lXfm-PZz0Q to get a better idea.

The pico framework is a modification of the standard Viola-Jones method. The basic idea is to scan the image with a cascade of binary classifiers at all reasonable positions and scales. An image region is classified as an object of interest if it successfully passes all the members of the cascade. Each binary classifier consists of an ensemble of decision trees with pixel intensity comparisons as binary tests in their internal nodes. This enables the detector to process image regions at very high speed. The details are given in http://arxiv.org/abs/1305.4537.
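To make the internal-node test concrete, here is an illustrative C sketch of evaluating one such tree. The names and data layout are invented for exposition, and the scaling of offsets by the region size is omitted; see picornt.c for the real code.

```c
#include <stdint.h>

/* Illustrative sketch of one pico-style decision tree; names and layout are
   assumptions for exposition, not the library's actual data structures.
   Each internal node compares the intensities of two pixels at fixed offsets
   relative to the region being classified (the real detector also scales
   these offsets by the region size). */
typedef struct {
    int depth;             /* tree depth                                  */
    const int8_t* offsets; /* 4 offsets per internal node: r1, c1, r2, c2 */
    const float* lut;      /* one output value per leaf                   */
} tree_t;

/* Evaluate the tree on the region whose top-left corner is (r, c).
   ldim is the row stride of the grayscale image. */
static float eval_tree(const tree_t* t, const uint8_t* pixels,
                       int ldim, int r, int c)
{
    int idx = 1; /* root; children of node i are 2i and 2i+1 */
    for (int d = 0; d < t->depth; ++d) {
        const int8_t* o = &t->offsets[4 * (idx - 1)];
        int test = pixels[(r + o[0]) * ldim + (c + o[1])] <=
                   pixels[(r + o[2]) * ldim + (c + o[3])];
        idx = 2 * idx + test;
    }
    return t->lut[idx - (1 << t->depth)]; /* leaf output */
}
```

Because each test is just two memory reads and a comparison, a whole ensemble of such trees can be evaluated in a handful of nanoseconds per region, which is where pico's speed comes from.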

Some highlights of pico are:

  • high processing speed;
  • there is no need for image preprocessing prior to detection;
  • there is no need for the computation of integral images, image pyramid, HOG pyramid or any other similar data structure;
  • all binary tests in internal nodes of the trees are based on the same feature type (not the case in the V-J framework);
  • the method can easily be modified for fast detection of in-plane rotated objects.

It can be said that the main limitation of pico is also its simplicity: pico should be avoided when there is large variation in the appearance of the object class. This means, for example, that pico should not be used for detecting pedestrians. Large, modern convolutional neural networks are better suited for such cases.

However, pico can be used for simple object classes (e.g., faces or templates) when real-time performance is desired.

Detecting objects in images and videos

The folder rnt/ contains all the needed resources to perform object detection in images and video streams with pre-trained classification cascades. Specifically, sample applications that perform face detection can be found in the folder rnt/samples/.

Note that the library also enables the detection of rotated objects without the need of image resampling or classification cascade retraining. This is achieved by rotating the binary tests in internal tree nodes, as described in the paper.

Embedding the runtime within your application

To use the runtime in your own application, you have to:

  • Include a prototype of find_objects(...) in your code (for example, by adding #include "picornt.h")
  • Compile picornt.c with your code
  • Load/include a classification cascade generated with picolrn (e.g., rnt/cascades/facefinder)
  • Invoke find_objects(...) with appropriate parameters

Notice that there are no specific library dependencies, i.e., the code can be compiled out-of-the-box with a standard C compiler.
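The cascade-loading step simply reads the file into a heap buffer whose pointer is then handed to find_objects(...). A minimal sketch, with error handling trimmed and the helper name invented for illustration:

```c
#include <stdio.h>
#include <stdlib.h>

/* Minimal sketch: read an entire cascade file (e.g. rnt/cascades/facefinder)
   into a heap buffer; find_objects(...) takes a pointer to these raw bytes.
   Error handling is kept minimal for brevity. */
static void* load_cascade(const char* path, long* size)
{
    FILE* f = fopen(path, "rb");
    if (!f) return NULL;
    fseek(f, 0, SEEK_END);
    *size = ftell(f);
    fseek(f, 0, SEEK_SET);
    void* buf = malloc((size_t)*size);
    if (buf && fread(buf, 1, (size_t)*size, f) != (size_t)*size) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    return buf;
}
```

Alternatively, the cascade bytes can be compiled directly into the binary as a static array, which avoids any file I/O at startup.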

To get a feel for how the library works, we recommend that you look at rnt/samples/native/sample.c, as it was specifically written to serve as documentation.

Learning custom object detectors

The program picolrn.c (available in the folder gen/) enables you to learn your own (custom) object detectors. The training data has to be provided in a specific format. The details are printed to the standard output when picolrn is invoked without parameters. It is often convenient to pipe this information into a text file:

$ ./picolrn > howto.txt

A tutorial that guides you through the process of learning a face detector can be found in the folder gen/sample/.

Citation

If you use the provided code/binaries for your work, please cite the following paper:

N. Markus, M. Frljak, I. S. Pandzic, J. Ahlberg and R. Forchheimer, "Object Detection with Pixel Intensity Comparisons Organized in Decision Trees", http://arxiv.org/abs/1305.4537

Contact

For any additional information contact me at [email protected].

Copyright (c) 2013, Nenad Markus. All rights reserved.

pico's People

Contributors

nenadmarkus


pico's Issues

train data problem

Hi, I ran genki.py and background.py to obtain a training file, then ran picolrn to train on my data. However, the result seems unusable: it is only 2 KB and cannot detect anything, and the training process finished very quickly. Did I miss any steps, or which operations could cause this problem?

Thank you for your help!

Scanner logic should be from largest to smallest

I actually thought about opening a PR to suggest this improvement, but I could not reverse the logic without affecting the parameters passed to the methods.

Face detection proceeds from the smallest face to the largest, subject to a configurable limit on the number of detections. This can cause a larger face to go undetected (because the detection limit has already been reached), and I believe the correct order would be from the largest to the smallest.

Sorry for my English; I'm attaching a photo to show what happens.

(screenshot: Screen Shot 2019-06-24 at 17:35:02)

Serializing Models

Hi,

I'm part of a small team at Imperial College London called Menpo, and we are focusing on creating some tools in Python that make certain areas of computer vision easier. I've read your paper about object detection and obviously found the Pico repository via that.

I have wrapped Pico in Cython for use in Python over at cypico and I'm currently using a forked version of Pico that correctly compiles in Visual Studio. I was wondering if you would be interested in doing some collaboration so that we might make training and serializing Pico models more friendly from within Python.

Is this something you might be interested in?

Cheers,

Patrick and The Menpo Team

Advice on training custom detectors?

Hi there, I'm trying to use picolrn to train my own custom detectors (to detect heads, not just faces).

While it does train, I haven't been able to get it to converge - there are still far too many false positives (and it never reaches that 10E-6 threshold you set in the original training script).

With this in mind, I was wondering whether you had any advice as to how to facilitate training? I'm (trying) to use the WIDER FACE dataset (http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/) but to little avail.

Interestingly, using your genki.py and genki script seems to work - what's special about this dataset in particular? Can pico not handle more variation of valid detections?

I've looked at images in the 'export' method of both my preproc script and genki.py and they seem to output the same sort of images (obviously, my images tend to have slightly more variety).

I'm more than happy to post my preproc scripts here and would love to hear any suggestions from yourself (if you have any!)

I look forward to hearing from you soon - and many thanks in advance!

a question about the function sample_training_data()

Hi,
thank you for sharing the code. I have read the training code and have a question that confuses me.
In the function sample_training_data(), you first sample positive examples from the object class using classify_region(): a sample is kept as a positive if classify_region returns 1.
Second, you sample a false positive if classify_region returns 1 on the non-object class, and that is what confuses me: since the non-object class contains no true positives, why does classify_region return 1?
Thanks!

the method is fast!

The problem has been found, and I ran the code successfully.

It is surprising that the method can detect faces in a 1080x720 frame within 8 ms on a 3.2 GHz machine.

Thanks!

understanding the lut parameter

What does the lut parameter mean? It is used during face detection, but I don't understand what it actually represents in your algorithm, and I cannot find it explained in your paper.

model can't converge

Hi there, I'm trying to use picolrn to train my own face detector (using the AFLW dataset).
First I used genki.py to preprocess my positive samples (24384 faces), then I chose around 15 GB of background images.
When I train the model, I haven't been able to get it to converge. In the first stage there are 4 trees; in every later stage there are as many trees as the maximum picolrn allows.
With this in mind, I was wondering whether you had any advice on how to solve this problem?
Sorry for my poor English!
Thanks!

genki.py error and background dataset question

Hi
Question one:
I ran genki.py and got the following error:
python genki.py path/to/genki > trdata
Traceback (most recent call last):
File "genki.py", line 166, in
export(im, r, c, s)
File "genki.py", line 116, in export
write_rid(im)
File "genki.py", line 54, in write_rid
sys.stdout.buffer.write(hw)
AttributeError: 'file' object has no attribute 'buffer'

Question two:
I cannot download the background dataset from the following link:
https://googledrive.com/host/0Bw4IT5ZOzJj6NXlJUFh0UGZCWmc
Can you provide another link?

The details in Training

I've read your paper, and run your code. It's really great and fast.

But I have some questions here.

  1. It seems that you train the decision trees on the grey-level image. However, because of varying skin color, comparisons on grey level may not be ideal. Have you tried training on a gradient image (e.g., processed by a Sobel operator, ignoring the gradient angle)?
  2. In each internal node, you randomly try 256 binary tests. Why not just compute the weighted average face of the samples and compare the maximum to the minimum? If you want to introduce randomness, you could randomly pick a pair from the top N maxima and top N minima.

The idea is great.

what should I set ldim to?

(cross posting from nenadmarkus/picojs#33, since this has more to do with the underlying implementation, and less to do with the javascript framework)

Hi, I want to use pico to find faces in images of arbitrary width/height. I am hoping there is some general rule I can use for setting the ldim parameter based on image width and height. I have read over the explanation doc https://nenadmarkus.com/p/picojs-intro/

The parameter ldim tells us how to move from one row of the image to the next (known as stride in some other libraries, such as OpenCV).

From what I gather from the examples, it is usually set to the ncols parameter (a.k.a image width). However, I have an example image that is 400x400 pixels. Setting these params:

// where image.width === 400 and image.height === 400
  const image = {
    pixels: rgba_to_grayscale(image_data, image_data.height, image_data.width),
    nrows: image_data.height,
    ncols: image_data.width,
    ldim: image_data.width
  }

This results in zero detections. Setting ldim to the seemingly arbitrary value of 419 results in one detection (which is the desired result). Setting ldim to anything higher results in several detections all corresponding to the same face.

All the other parameters have values taken from examples/image.html
Screenshots attached for ldim = 400 (the image width), 419 (the desired result), 420, and 480.
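A note on what ldim means here: it is the row stride of the pixel buffer in array elements, so the pixel at row r, column c sits at index r*ldim + c. It equals ncols only for a tightly packed buffer; padded rows (or a sub-window of a wider image) make it larger. An illustrative sketch of the indexing, not pico's code:

```c
#include <stdint.h>

/* Illustrative: address a pixel in a grayscale buffer with row stride ldim.
   For a tightly packed image ldim == ncols; padded rows make ldim > ncols. */
static uint8_t pixel_at(const uint8_t* pixels, int ldim, int r, int c)
{
    return pixels[r * ldim + c];
}
```

If a nominally 400-pixel-wide image only detects correctly with ldim around 419, that suggests the grayscale buffer actually carries padded rows, so the stride should be taken from the buffer layout rather than assumed equal to the width.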

Problems about accelerating the speed of inference stage

Hi,
I'd like to run the inference model on an embedded device. Due to limited computing resources, I have some questions:

I've tried to convert the floating-point computation to integer computation. That is to say, after parsing the cascade binary file, the luts and thresholds matrices are converted to integers.

int i,j;
FILE* file;
file = fopen("jst_headcascade", "rb");
if(!file)
return 0;

fread(&version, sizeof(int32_t), 1, file);
fread(&bbox[0], sizeof(int8_t), 4, file);
fread(&tdepth, sizeof(int), 1, file);
fread(&ntrees, sizeof(int), 1, file);

for(i=0; i<ntrees; ++i)
{
	fread(&tcodes[i][0], sizeof(int32_t), (1<<tdepth)-1, file);
	fread(&luts[i][0], sizeof(float), 1<<tdepth, file);
	fread(&thresholds[i], sizeof(float), 1, file);
}
fclose(file);

//convert lut and thr to int data
for(i=0;i<ntrees;i++)
{
	int_thresholds[i] = (int)(thresholds[i]*PERCISON);
	for(j=0;j<(1<<tdepth);j++)
	{
		int_luts[i][j] = (int)(*(&luts[0][0]+i*1024+j)*PERCISON);
	}
}

However, I've run into problems in the run_cascade function: the index into the vppixels array goes out of range.
So I'd like to know the structure of the cascade file; the declared sizes of these matrices are not fully used during inference.

int32_t version = 3;

int tdepth;
int ntrees=0;

int8_t bbox[4]; // (r_min, r_max, c_min, c_max)

int32_t tcodes[4096][1024];
float luts[4096][1024];

float thresholds[4096];

I can't understand the parsing in the following code, especially tcodes and lut:
offset = ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float) + 1*sizeof(float);
ptree = (int8_t*)cascade + 2*sizeof(float) + 2*sizeof(int);

*o = 0.0f;

for(i=0; i<ntrees; ++i)
{
	//
	tcodes = ptree - 4;
	lut = (float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t));
	thr = *(float*)(ptree + ((1<<tdepth)-1)*sizeof(int32_t) + (1<<tdepth)*sizeof(float));

	//
	idx = 1;

	for(j=0; j<tdepth; ++j)
		idx = 2*idx + (pixels[(r+tcodes[4*idx+0]*s)/256*ldim+(c+tcodes[4*idx+1]*s)/256]<=pixels[(r+tcodes[4*idx+2]*s)/256*ldim+(c+tcodes[4*idx+3]*s)/256]);

	*o = *o + lut[idx-(1<<tdepth)];

	//
	if(*o<=thr)
		return -1;
	else
		ptree = ptree + offset;
}

//
*o = *o - thr;
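From the offset arithmetic in this code, each serialized tree occupies ((1<<tdepth)-1) int32 tcodes, (1<<tdepth) float lut entries, and one float threshold. A small sketch of that size computation (assuming 4-byte int32_t and float, as the code above does):

```c
#include <stdint.h>
#include <stddef.h>

/* From the quoted offset arithmetic: the serialized size of one tree is
   ((1<<tdepth)-1) tcodes (int32_t), (1<<tdepth) lut entries (float),
   and one float threshold. */
static size_t tree_bytes(int tdepth)
{
    return ((size_t)((1 << tdepth) - 1)) * sizeof(int32_t)
         + ((size_t)(1 << tdepth)) * sizeof(float)
         + sizeof(float);
}
```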

Any response is helpful.
Thanks

tree tdepth

1. Why is tdepth 5? Can I change it to another value?
In every training cycle we choose more background images, and when the training result has fpr > maxfpr we train another tree for that cycle, so a cycle may contain several trees. Could we also increase the tree depth for a cycle when fpr > maxfpr?
2. When the tree does not satisfy fpr > maxfpr, why is the threshold set to -1337?
Thank you~

Problem in training my own detectors

I've read your paper and run the detection demo using the facefinder you provided; it is really fast and performs well.
Now I'd like to train my own detector for pedestrians. As the instructions say, I should download the dataset (images and background), but I live in China and cannot download it. I have some images and backgrounds of my own; what should I do with them, and what is the format of the training datasets?

Thanks.

it's not faster than OpenCV 3.1

I just compared pico with the OpenCV 3.1 method (Haar).
The speeds of the two methods are similar; I strongly recommend doing something to improve the speed.

Can I improve the code for eye detection

Hi,
Since I'm using a Raspberry Pi 2, this project seems very interesting because it detects faces quickly. How can I adapt it for eye detection? Is there a way to do so?
Thanks 🚶

can not download face image data

Hi,
thank you for sharing your code!
I want to train a face classifier using it. As you said, I should download the GENKI dataset and the background dataset, but I cannot download the data.
Could you host the data on GitHub?
Thanks

Rectangular detection

Awesome library! I'm trying to recognize rectangular objects. Is there anything that I need to change in Pico to make this successful for training and recognition? Any guidance will be greatly appreciated.
