
dlib-models's People

Contributors

arritmic, arrufat, cydral, davisking, ddogfoodd, ksachdeva, matiasdelellis

dlib-models's Issues

Would you be interested in a pretrained ResNet50?

Over the last week, I trained a ResNet50 on ImageNet.
I saved the model with the bn_con layers instead of affine, so people could use it as a pretrained model to learn other tasks.
I could put it in my own repository, but here it will have more visibility :)

I used dlib's original training code, but replaced the ResNet34 implementation with my ResNet50 implementation.

Here are the results:

# crops    top-1 acc    top-5 acc
1          0.77308      0.93352
10         0.77426      0.93310

To put the results into perspective: this implementation gives better accuracy than both the PyTorch and Keras pretrained ResNet50 models.
If you're interested, I will make a PR with the resnet.h file that contains all ResNet{18,34,50,101,152}.
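
Below is a minimal, hedged sketch (not from the original post) of how such a bn_con checkpoint could be picked up for fine-tuning on another task with dlib's dnn_trainer. The file name "resnet50_imagenet_bn.dat" and the tiny stand-in net_type are placeholders; the real network type would come from the proposed resnet.h, and for a new task one would typically swap the final fc layer for one with the right number of outputs before training.

#include <dlib/dnn.h>
using namespace dlib;

// Stand-in bn_con-based network; replace with the ResNet50 definition from resnet.h.
using net_type = loss_multiclass_log<fc<1000, avg_pool_everything<
                 relu<bn_con<con<32, 3, 3, 1, 1, input_rgb_image_sized<224>>>>>>>;

void finetune_one_step(const std::vector<matrix<rgb_pixel>>& samples,
                       const std::vector<unsigned long>& labels)
{
    net_type net;
    deserialize("resnet50_imagenet_bn.dat") >> net;    // hypothetical pretrained checkpoint

    dnn_trainer<net_type> trainer(net, sgd(0.0001, 0.9));
    trainer.set_learning_rate(0.01);                    // smaller rate for fine-tuning
    trainer.be_verbose();
    trainer.train_one_step(samples, labels);            // normally called inside the usual mini-batch loop
}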

shape_predictor_68_face_landmarks parameters used for training?

Hello,
I have a few questions regarding shape_predictor_68_face_landmarks:

  1. What were the parameters used for training shape_predictor_68_face_landmarks? This link shows the default parameters; were those the values used, or something different? (A parameter sketch follows below.)
  2. How many images were used for training? I trained on 11k images, but the model's accuracy is poor in low-light conditions.

Any suggestions on handling low-light conditions?
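
A minimal, hedged sketch of training a 68-point shape predictor with dlib's shape_predictor_trainer. The values shown are roughly the trainer defaults, not the (unpublished) settings used for the shipped shape_predictor_68_face_landmarks.dat; the XML file name is the one used in dlib's train_shape_predictor_ex.cpp.

#include <dlib/image_processing.h>
#include <dlib/data_io.h>
using namespace dlib;

int main()
{
    std::vector<array2d<unsigned char>> images;
    std::vector<std::vector<full_object_detection>> landmarks;
    load_image_dataset(images, landmarks, "training_with_face_landmarks.xml");

    shape_predictor_trainer trainer;
    trainer.set_cascade_depth(10);                 // number of cascades
    trainer.set_tree_depth(4);                     // depth of each regression tree
    trainer.set_num_trees_per_cascade_level(500);
    trainer.set_nu(0.1);                           // regularization / learning rate
    trainer.set_oversampling_amount(20);           // random initializations per training sample
    trainer.set_feature_pool_size(400);
    trainer.be_verbose();

    shape_predictor sp = trainer.train(images, landmarks);
    serialize("my_shape_predictor_68.dat") << sp;
}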

How to train non-face models?

Hi! I've looked through the Python shape predictor example and noticed that it uses a separate face detector to get a bounding box around the face. But what if I want to train a model to predict not face landmarks but some other landmarks? How do I then use the shape predictor? Where do I get the necessary bounding box? And can it be predicted by the shape predictor itself?
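
A hedged sketch of the inference side: a shape_predictor is not face-specific, but it cannot produce the bounding box itself. It needs an image plus an object rectangle, which has to come from somewhere else (your own detector, a tracker, or simply the whole image if the object is already cropped). The model and image file names below are hypothetical.

#include <dlib/image_processing.h>
#include <dlib/image_io.h>
#include <iostream>
using namespace dlib;

int main()
{
    shape_predictor sp;
    deserialize("my_object_landmarks.dat") >> sp;    // a predictor trained on non-face landmarks

    array2d<rgb_pixel> img;
    load_image(img, "object.jpg");

    // If the object fills the frame, the whole image can serve as the bounding box.
    rectangle box(0, 0, img.nc() - 1, img.nr() - 1);
    full_object_detection shape = sp(img, box);

    for (unsigned long i = 0; i < shape.num_parts(); ++i)
        std::cout << "part " << i << ": " << shape.part(i) << std::endl;
}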

Size of Image used in dlib_face_recognition_resnet_model_v1

What input image size (resolution) does the model dlib_face_recognition_resnet_model_v1.dat.bz2 expect?

Also, you mentioned that the network was trained from scratch on a dataset derived from a number of different datasets. When preparing the training data, what resolution were the face images from those datasets cropped to, and which face detector was used to crop the faces from the background?
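
For reference, a hedged sketch of the face-chip preparation used at inference time in dlib's dnn_face_recognition_ex.cpp: faces are aligned with the 5-point shape predictor and cropped to 150x150 with 25% padding before being fed to the network. The exact training-time preprocessing is only known to the model's author.

#include <dlib/image_processing/frontal_face_detector.h>
#include <dlib/image_processing.h>
#include <dlib/image_transforms.h>
#include <dlib/image_io.h>
using namespace dlib;

int main()
{
    frontal_face_detector detector = get_frontal_face_detector();
    shape_predictor sp;
    deserialize("shape_predictor_5_face_landmarks.dat") >> sp;

    matrix<rgb_pixel> img;
    load_image(img, "photo.jpg");

    for (auto face : detector(img))
    {
        auto shape = sp(img, face);
        matrix<rgb_pixel> face_chip;
        extract_image_chip(img, get_face_chip_details(shape, 150, 0.25), face_chip);
        // face_chip is the aligned 150x150 crop given to dlib_face_recognition_resnet_model_v1.
    }
}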

Decrease in recognition rate

When recognizing 10 white middle-aged people with dlib's dnn_face_recognition_ex.cpp, all were identified correctly. However, even for white middle-aged people, the recognition rate gradually decreases as the number of people is increased to 100, 200, and 300.
Is my way of testing wrong?
Or is it a limitation of the resnet model?
Does anyone know the answer?
Thank you.
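
A hedged note on the matching rule that example uses: two 128-D descriptors are treated as the same person when their Euclidean distance is below 0.6. As a gallery grows, more near-threshold pairs appear, so tightening the threshold (and averaging several descriptors per person) is a common way to trade some recall for fewer false matches. A minimal sketch of that comparison:

#include <dlib/matrix.h>
using namespace dlib;

// Descriptors as produced by dlib_face_recognition_resnet_model_v1 (128-D column vectors).
// 0.6 is the threshold used in dnn_face_recognition_ex.cpp; lower values reduce false
// matches in large galleries at the cost of more misses.
bool same_person(const matrix<float,0,1>& a, const matrix<float,0,1>& b,
                 double threshold = 0.6)
{
    return length(a - b) < threshold;
}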

How to train the gender model?

I am trying to train the gender model myself (separately from the pretrained one). Following the network definitions in the gender example, I have prepared a training program, shown below, that combines the gender network definitions with the imagenet training example.

#include <dlib/dnn.h>
#include <iostream>
#include <dlib/data_io.h>
#include <dlib/image_transforms.h>
#include <dlib/dir_nav.h>
#include <iterator>
#include <thread>

using namespace std;
using namespace dlib;


template <int N, template <typename> class BN, int stride, typename SUBNET>
using block = BN<con<N, 3, 3, stride, stride, relu<BN<con<N, 3, 3, stride, stride, SUBNET>>>>>;

template <int N, typename SUBNET> using res_ = relu<block<N, bn_con, 1, SUBNET>>;
template <int N, typename SUBNET> using ares_ = relu<block<N, affine, 1, SUBNET>>;

template <typename SUBNET> using alevel1 = avg_pool<2, 2, 2, 2, ares_<64, SUBNET>>;
template <typename SUBNET> using alevel2 = avg_pool<2, 2, 2, 2, ares_<32, SUBNET>>;

using net_type = loss_multiclass_log<fc<2, multiply<relu<fc<16, multiply<alevel1<alevel2< input_rgb_image_sized<32>>>>>>>>>;

#define PBSTR "||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||"
#define PBWIDTH 40
 

rectangle make_random_cropping_rect_resnet(
    const matrix<rgb_pixel>& img,
    dlib::rand& rnd
)
{
    // figure out what rectangle we want to crop from the image
    double mins = 0.466666666, maxs = 0.875;
    auto scale = mins + rnd.get_random_double()*(maxs-mins);
    auto size = scale*std::min(img.nr(), img.nc());
    rectangle rect(size, size);
    // randomly shift the box around
    point offset(rnd.get_random_32bit_number()%(img.nc()-rect.width()),
                 rnd.get_random_32bit_number()%(img.nr()-rect.height()));
    return move_rect(rect, offset);
}

// ----------------------------------------------------------------------------------------

void randomly_crop_image (
    const matrix<rgb_pixel>& img,
    matrix<rgb_pixel>& crop,
    dlib::rand& rnd
)
{
    auto rect = make_random_cropping_rect_resnet(img, rnd);

    // now crop it out as a 227x227 image.
    extract_image_chip(img, chip_details(rect, chip_dims(32,32)), crop);

    // Also randomly flip the image
    if (rnd.get_random_double() > 0.5)
        crop = fliplr(crop);

    // And then randomly adjust the colors.
    apply_random_color_offset(crop, rnd);
}

void randomly_crop_images (
    const matrix<rgb_pixel>& img,
    dlib::array<matrix<rgb_pixel>>& crops,
    dlib::rand& rnd,
    long num_crops
)
{
    std::vector<chip_details> dets;
    for (long i = 0; i < num_crops; ++i)
    {
        auto rect = make_random_cropping_rect_resnet(img, rnd);
        dets.push_back(chip_details(rect, chip_dims(32,32)));
    }

    extract_image_chips(img, dets, crops);

    for (auto&& img : crops)
    {
        // Also randomly flip the image
        if (rnd.get_random_double() > 0.5)
            img = fliplr(img);

        // And then randomly adjust the colors.
        apply_random_color_offset(img, rnd);
    }
}

// ----------------------------------------------------------------------------------------

struct image_info
{
    string filename;
    string label;
    long numeric_label;
};

std::vector<image_info> get_imagenet_train_listing(
    const std::string& images_folder
)
{
    std::vector<image_info> results;
    image_info temp;
    temp.numeric_label = 0;
    // We will loop over all the label types in the dataset, each is contained in a subfolder.
    auto subdirs = directory(images_folder).get_dirs();
    // But first, sort the sub directories so the numeric labels will be assigned in sorted order.
    std::sort(subdirs.begin(), subdirs.end());
    for (auto subdir : subdirs)
    {
        // Now get all the images in this label type
        temp.label = subdir.name();
        for (auto image_file : subdir.get_files())
        {
            temp.filename = image_file;
            results.push_back(temp);
        }
        ++temp.numeric_label;
    }
    return results;
}

std::vector<image_info> get_imagenet_val_listing(
    const std::string& imagenet_root_dir,
    const std::string& validation_images_file 
)
{
    ifstream fin(validation_images_file);
    string label, filename;
    std::vector<image_info> results;
    image_info temp;
    temp.numeric_label = -1;
    while(fin >> label >> filename)
    {
        temp.filename = imagenet_root_dir+"/"+filename;
        if (!file_exists(temp.filename))
        {
            cerr << "file doesn't exist! " << temp.filename << endl;
            exit(1);
        }
        if (label != temp.label)
            ++temp.numeric_label;

        temp.label = label;
        results.push_back(temp);
    }

    return results;
}

void display_progressbar(float percentage)
{
    uint32_t val = (int)(percentage * 100);
    uint32_t lpad = (int)(percentage * PBWIDTH);
    uint32_t rpad = PBWIDTH - lpad;
    printf("\r%3d%% [%.*s%*s]", val, lpad, PBSTR, rpad, "");
    fflush(stdout);
}

// ----------------------------------------------------------------------------------------

int main(int argc, char** argv) try
{
    if (argc != 3)
    {
        cout << "To run this program you need a copy of the imagenet ILSVRC2015 dataset and" << endl;
        cout << "also the file http://dlib.net/files/imagenet2015_validation_images.txt.bz2" << endl;
        cout << endl;
        cout << "With those things, you call this program like this: " << endl;
        cout << "./dnn_imagenet_train_ex /path/to/ILSVRC2015 imagenet2015_validation_images.txt" << endl;
        return 1;
    }

    cout << "\nSCANNING IMAGENET DATASET\n" << endl;

    auto listing = get_imagenet_train_listing(string("./men-women-classification/data"));
    cout << "images in dataset: " << listing.size() << endl;
    const auto number_of_classes = listing.back().numeric_label+1;
    if (listing.size() == 0 || number_of_classes != 2)
    {
        cout << "Didn't find the imagenet dataset. " << endl;
        return 1;
    }
        
    set_dnn_prefer_smallest_algorithms();

    const double initial_learning_rate = 0.1;
    const double weight_decay = 0.1;
    const double momentum = 0.9;

    net_type net;
    dnn_trainer<net_type> trainer(net,sgd(weight_decay, momentum));
    trainer.be_verbose();
    trainer.set_learning_rate(initial_learning_rate);
    trainer.set_synchronization_file("genderNET_trainer_state_file.dat", std::chrono::minutes(5));

    // Stop training once this many mini-batch steps pass without progress on the loss.
    trainer.set_iterations_without_progress_threshold(20);
    // Window size used for the batch normalization running statistics.
    set_all_bn_running_stats_window_sizes(net, 2);

    std::vector<matrix<rgb_pixel>> samples;
    std::vector<unsigned long> labels;



    // Start a bunch of threads that read images from disk and pull out random crops.  It's
    // important to be sure to feed the GPU fast enough to keep it busy.  Using multiple
    // threads for this kind of data preparation helps us do that.  Each thread puts the
    // crops into the data queue.
    dlib::pipe<std::pair<image_info,matrix<rgb_pixel>>> data(200);
    auto f = [&data, &listing](time_t seed)
    {
        dlib::rand rnd(time(0)+seed);
        matrix<rgb_pixel> img;
        std::pair<image_info, matrix<rgb_pixel>> temp;
        while(data.is_enabled())
        {
            temp.first = listing[rnd.get_random_32bit_number()%listing.size()];
            load_image(img, temp.first.filename);
            randomly_crop_image(img, temp.second, rnd);
            data.enqueue(temp);
        }
    };


    std::thread data_loader1([f](){ f(1); });
    std::thread data_loader2([f](){ f(2); });
    std::thread data_loader3([f](){ f(3); });
    std::thread data_loader4([f](){ f(4); });

    // The main training loop.  Keep making mini-batches and giving them to the trainer.
    // We will run until the learning rate has dropped by a factor of 1e-3.
    while (trainer.get_learning_rate() >= initial_learning_rate*1e-3)
    {
        samples.clear();
        labels.clear();

        // make a 160 image mini-batch
        std::pair<image_info, matrix<rgb_pixel>> img;
        while (samples.size() < 160)
        {
            data.dequeue(img);
            samples.push_back(std::move(img.second));
            labels.push_back(img.first.numeric_label);
        }

        trainer.train_one_step(samples, labels);
    }

    cout << " dnn_prefer_smallest_algorithms: EXECUTED 03 Mini Batch" << endl;
    // Training done, tell threads to stop and make sure to wait for them to finish before
    // moving on.
    data.disable();
    data_loader1.join();
    data_loader2.join();
    data_loader3.join();
    data_loader4.join();

    // also wait for threaded processing to stop in the trainer.
    trainer.get_net();

    net.clean();
    cout << "saving network" << endl;
    serialize("resnet_genderNET32.dat") << net;
}
catch(std::exception& e)
{
    cout << e.what() << endl;
}

The training process runs fine, but the average loss stays roughly constant.
step#: 113 learning rate: 0.0001 average loss: 0.672652 steps without apparent progress: 12

I am not sure if I'm doing anything wrong. Please help.

GPU Acceleration for Age Estimation

Regarding the age estimation model, you mention that the "model is thus an age predictor leveraging a ResNet-10 architecture".

But I can't seem to get it running on my GPU.

I've added your example code for age estimation into the examples directory of DLIB, as per the documentation, and built using cmake.

CMake detects CUDA and cuDNN and enables them in the project.

-- Found cuDNN: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/lib/x64/cudnn.lib
-- Enabling CUDA support for dlib.  DLIB WILL USE CUDA

And I build with the standard command: cmake --build . --config Release

But my GPU sits at 0%

Are there some additional switches that need to be set?

I can call things like dlib::cuda::set_device(0); and such, and in the build I can see it targeting and building .cu files.

Perhaps it is an MSVC compatibility issue or something (I remember reading somewhere that NVIDIA recommends certain compiler versions).
Or am I just missing something here?
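
A hedged sketch for checking, at run time, whether the dlib you linked against was actually built with CUDA (and which device it will use); if the example was linked against a CPU-only dlib build, the GPU will stay at 0% no matter what:

#include <dlib/dnn.h>
#include <iostream>

int main()
{
#ifdef DLIB_USE_CUDA
    std::cout << "dlib was built with CUDA, "
              << dlib::cuda::get_num_devices() << " device(s) visible, using device "
              << dlib::cuda::get_device() << std::endl;
#else
    std::cout << "This dlib build has no CUDA support; the DNN code will run on the CPU."
              << std::endl;
#endif
}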

Automatic annotation tools for my own dataset

Hello Sir,
I have been working on real-time iris detection (in the human eye), and I want to train on my own data using a dataset annotated with specific landmarks.
I would like to know how I can get such annotations for my own dataset; in other words, how can I find an automatic annotation tool that can annotate my dataset for iris region detection?
Thanks

What is dlib_face_recognition_resnet_model_v1's format?

Hi,
I'd like to convert the model dlib_face_recognition_resnet_model_v1.dat to iOS' CoreML format.
Apple is kind enough to provide a pip package for converting many pre-built formats (caffe, keras, libsvm, sklearn, xgboost) as well as custom ones*.
Could you provide any pointers on how I'd describe the format of dlib_face_recognition_resnet_model_v1.dat?

Any help is much appreciated!

[*] https://apple.github.io/coremltools/coremltools.converters.html
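
A hedged note: the .dat file is written with dlib's own serialize() format rather than a caffe/keras/etc. container, so coremltools cannot read it directly. One practical route is to deserialize it in C++ using the exact anet_type defined in dlib's dnn_face_recognition_ex.cpp, dump the layer structure and weights, and rebuild the graph in CoreML by hand. The small stand-in network below only shows how a dlib net can be inspected once defined; for the real model you would substitute anet_type and deserialize the .dat into it.

#include <dlib/dnn.h>
#include <iostream>
using namespace dlib;

// Stand-in network; replace with anet_type from dnn_face_recognition_ex.cpp and use
// deserialize("dlib_face_recognition_resnet_model_v1.dat") >> net; for the real model.
using toy_net = loss_metric<fc_no_bias<128, avg_pool_everything<
                relu<con<32, 7, 7, 2, 2, input_rgb_image_sized<150>>>>>>;

int main()
{
    toy_net net;
    std::cout << net << std::endl;                        // prints every layer and its parameters
    std::cout << "layers: " << net.num_layers << std::endl;
}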

CNN Face Detection with tensor as input

I am generating some face images using Conv2DTranspose. I'd like to output the 68 landmarks as the final output rather than the generated image, so I used a Lambda layer to define my custom function. However, cnn_face_detection_model_v1 says it only accepts either a list or an array, not tensors. How can I bring cnn_face_detection_model_v1 into my custom function? Here's the full error description:

TypeError: call(): incompatible function arguments. The following argument types are supported:
1. (self: dlib.cnn_face_detection_model_v1, imgs: list, upsample_num_times: int=0, batch_size: int=128) -> std::vector<std::vector<dlib::mmod_rect, std::allocator<dlib::mmod_rect>>, std::allocator<std::vector<dlib::mmod_rect, std::allocator<dlib::mmod_rect>>>>
2. (self: dlib.cnn_face_detection_model_v1, img: array, upsample_num_times: int=0) -> std::vector<dlib::mmod_rect, std::allocator<dlib::mmod_rect>>

Invoked with: <dlib.cnn_face_detection_model_v1 object at 0x000001D00FC0FF48>, <tf.Tensor 'conv2d/BiasAdd:0' shape=(?, 160, 160, 3) dtype=float32>, 1

Did you forget to #include <pybind11/stl.h>? Or <pybind11/complex.h>,
<pybind11/functional.h>, <pybind11/chrono.h>, etc. Some automatic
conversions are optional and require extra headers to be included
when compiling your pybind11 module.

dlib_face_recognition_resnet_model_v1.dat dataset

Hi all,

Would it be possible to share the dataset used for dlib_face_recognition_resnet_model_v1.dat? I understand that some images came from VGG and others from FaceScrub, but others were scraped from the Internet.

The reason I am asking is that we need to evaluate and prove there is no gender, race, or age bias in the model, and having the dataset used for training would help a lot with this.

Thank you,
Joao

Android app crash

Hi, I would like to run your age predictor on Android. I am able to compile and run the app, but when I add this line
deserialize("/sdcard/dnn_age_predictor_v1.dat") >> age_net;
the app crashes. The file is available and opening it is fine, because when I comment out ">> age_net" it does not crash.
I also use all of your definitions for the apredictor_t type. It looks like this:
const unsigned long number_of_age_classes = 81;

// The resnet basic block.
template<
    int num_filters,
    template<typename> class BN,   // some kind of batch normalization or affine layer
    int stride,
    typename SUBNET
    >
using basicblock = BN<con<num_filters, 3, 3, 1, 1, relu<BN<con<num_filters, 3, 3, stride, stride, SUBNET>>>>>;

// A residual making use of the skip layer mechanism.
template<
    template<int, template<typename> class, int, typename> class BLOCK,   // a basic block defined before
    int num_filters,
    template<typename> class BN,   // some kind of batch normalization or affine layer
    typename SUBNET
    >
// adds the block to the result of tag1 (the subnet)
using residual = add_prev1<BLOCK<num_filters, BN, 1, tag1<SUBNET>>>;

// A residual that does subsampling (we need to subsample the output of the subnet, too).
template<
    template<int, template<typename> class, int, typename> class BLOCK,   // a basic block defined before
    int num_filters,
    template<typename> class BN,
    typename SUBNET
    >
using residual_down = add_prev2<avg_pool<2, 2, 2, 2, skip1<tag2<BLOCK<num_filters, BN, 2, tag1<SUBNET>>>>>>;

// Residual block with optional downsampling and batch normalization.
template<
    template<template<int, template<typename> class, int, typename> class, int, template<typename> class, typename> class RESIDUAL,
    template<int, template<typename> class, int, typename> class BLOCK,
    int num_filters,
    template<typename> class BN,
    typename SUBNET
    >
using residual_block = relu<RESIDUAL<BLOCK, num_filters, BN, SUBNET>>;

template<int num_filters, typename SUBNET>
using aresbasicblock_down = residual_block<residual_down, basicblock, num_filters, affine, SUBNET>;

// Some useful definitions to design the affine versions for inference.
template<typename SUBNET> using aresbasicblock256 = residual_block<residual, basicblock, 256, affine, SUBNET>;
template<typename SUBNET> using aresbasicblock128 = residual_block<residual, basicblock, 128, affine, SUBNET>;
template<typename SUBNET> using aresbasicblock64  = residual_block<residual, basicblock, 64, affine, SUBNET>;

// Common input for standard resnets.
template<typename INPUT>
using aresnet_input = max_pool<3, 3, 2, 2, relu<affine<con<64, 7, 7, 2, 2, INPUT>>>>;

// Resnet-10 architecture for estimating.
template<typename SUBNET>
using aresnet10_level1 = aresbasicblock256<aresbasicblock_down<256, SUBNET>>;
template<typename SUBNET>
using aresnet10_level2 = aresbasicblock128<aresbasicblock_down<128, SUBNET>>;
template<typename SUBNET>
using aresnet10_level3 = aresbasicblock64<SUBNET>;
// The resnet 10 backbone.
template<typename INPUT>
using aresnet10_backbone = avg_pool_everything<
    aresnet10_level1<
    aresnet10_level2<
    aresnet10_level3<
    aresnet_input<INPUT>>>>>;

using apredictor_t = loss_multiclass_log<fc<number_of_age_classes, aresnet10_backbone<input_rgb_image>>>;

apredictor_t age_net;

I don't have any idea. Do you?
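
A hedged debugging sketch (not from the original post, and a fragment meant to drop into the code above): dlib reports deserialization problems by throwing dlib::serialization_error, so wrapping the call makes it clear whether the crash comes from the file path/permissions on Android or from a mismatch between the .dat file and apredictor_t.

try
{
    apredictor_t age_net;
    dlib::deserialize("/sdcard/dnn_age_predictor_v1.dat") >> age_net;
}
catch (const dlib::serialization_error& e)
{
    // On Android this message would typically go to logcat rather than stdout.
    std::cout << "deserialize failed: " << e.what() << std::endl;
}
catch (const std::exception& e)
{
    std::cout << "error: " << e.what() << std::endl;
}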

Any trained model for ID card shape detector?

Hi @davisking,

Thank you for the wonderful work.
I would like to know if there is any trained model available for ID card shape detection. Basically, the model should identify a bank card, national ID card, or any similar card shape. Please also let me know how to train a new model if one is not available.
My apologies if it is a basic question.

Thanks in advance.
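
There is no pretrained card detector in dlib-models, but a hedged sketch of training a simple HOG+SVM detector for card-like rectangular objects, in the spirit of dlib's fhog_object_detector_ex.cpp, could look like this (the XML file, window size, and C value are placeholders):

#include <dlib/svm_threaded.h>
#include <dlib/image_processing.h>
#include <dlib/data_io.h>
using namespace dlib;

int main()
{
    std::vector<array2d<unsigned char>> images;
    std::vector<std::vector<rectangle>> boxes;
    load_image_dataset(images, boxes, "cards_train.xml");   // annotated with imglab

    typedef scan_fhog_pyramid<pyramid_down<6>> image_scanner_type;
    image_scanner_type scanner;
    scanner.set_detection_window_size(80, 80);

    structural_object_detection_trainer<image_scanner_type> trainer(scanner);
    trainer.set_num_threads(4);
    trainer.set_c(1);
    trainer.be_verbose();

    object_detector<image_scanner_type> detector = trainer.train(images, boxes);
    serialize("card_detector.svm") << detector;
}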

Details regarding mmod_human_face_detector

Hi!

I'm trying to find information about the mmod_human_face_detector model, but I'm having a hard time finding the information I need.
I'm looking for stuff like network structure and/or some papers describing the method it's based on.
I'm also wondering whether it is a region-based CNN with region proposals, or an implementation of YOLO (You Only Look Once).

Is this information available somewhere?

Thanks.

Need more tuning of mmod_face_detector.dat's parameters

Dear @davisking ,
My name is Kevin Patel, and I am a 3rd-year B.Tech student in a Computer Science program. My team and I are building a system for video-based dynamic human face recognition as part of our mid-term project. We have used the pre-trained CNN model (MMOD), which is faster and more accurate than the other models we have tried (like MTCNN, which is accurate but very slow for live applications). The problem is that this model can detect human faces with good accuracy only up to some threshold distance; if the person is standing beyond that threshold, his/her face is not detected. Moreover, we are trying to detect and recognize faces irrespective of angle, distance, and amount of light. To produce better optimized results, we are trying to tune the hyperparameters. While searching the web, we found your paper on Max-Margin Object Detection (MMOD), published in 2015, in which you describe the MMOD optimizer (Algorithm 2). But we don't know where to start; can you guide us? Could you provide Python code or a more detailed algorithm for parameter tuning that we could use? For your reference, I have attached the pre-trained model file mmod_human_face_detector.dat that we are currently using.

It would be very helpful if you could look into this parameter tuning with us. We are looking forward to a positive reply.

Two boxes, same points?

In dlib_faces_5points.tar there are two boxes for each picture in the XML files, each with the same 5 landmark points. Why are there two boxes? Did you run two different face detectors, maybe dlib's and OpenCV's?

An example of the two boxes from the third image in the train set is attached as an image.

Details about training dnn_age_predictor_v1.dat.bz2

Hi team, may I ask: about dnn_age_predictor_v1.dat.bz2 you said "This model is thus an age predictor leveraging a ResNet-10 architecture". Did you just use the ResNet-10 architecture (i.e., the model was trained from scratch, with no pretrained weights from other models), or did you use a pretrained ResNet-10 model (transfer learning)?

It would be very good if there were some documentation describing the detailed training process of this model and its metrics on a test dataset.

I ask because the ResNet-10 model (a description of which I found here: https://github.com/BVLC/caffe/wiki/Model-Zoo) was trained on ImageNet. As ImageNet does not allow commercial use, models fitted on that dataset arguably do not allow commercial use either (I am not a lawyer, I just think about it like this). But dnn_age_predictor_v1.dat.bz2 might use the pretrained ResNet-10 model and is released under CC0-1.0, which allows commercial use. So there might be a conflict.

need info

Other than the "shape_predictor_68_face_landmarks.dat.bz2" model, can I use the other models listed in dlib-models for commercial purposes?

mmod_human_face_detector.dat and WIDER dataset

Hi. I tested dlib facial detection using the CNN detector (zero upscaling) and mmod_human_face_detector.dat against WIDER-style video (a selfie mask filter from the TikTok app), and the results look pretty good.

I think the WIDER dataset definitely represents some of the new frontiers/challenges of facial detection/recognition. Do you think it's possible to perform facial recognition/identification given occlusion like in the WIDER dataset (http://mmlab.ie.cuhk.edu.hk/projects/WIDERFace/) using simple RGB monocular camera imagery, or do you think different sensors will be needed?


mmod_human_face_detector.dat:
"This is trained on this dataset: http://dlib.net/files/data/dlib_face_detection_dataset-2016-09-30.tar.gz.
I created the dataset by finding face images in many publicly available image datasets (excluding the FDDB dataset). In particular, there are images from ImageNet, AFLW, Pascal VOC, the VGG dataset, WIDER, and face scrub." - https://github.com/davisking/dlib-models


Computer specs required when training

Thanks for the wonderful library.
I am always grateful for dlib.

There is one thing I would like to ask today.
I want to know the computer specs you used when training "dlib_face_recognition_resnet_model_v1.dat".
I especially want to know about memory capacity.

I am trying "dnn_metric_learning_on_images_ex", but the processing is interrupted with "EXCEPTION IN LOADING DATA".
I have 800,000 face images of 2,000 people.
By the way, my computer has 16GB of memory and a GTX 1070 GPU.
I think it is obviously running out of memory.

I want to know the memory capacity used when you trained "dlib_face_recognition_resnet_model_v1.dat" before I go buy a new computer.
Thanks in advance.
Please give me a hint.

Free alternative to shape_predictor_68_face_landmarks.dat.bz2

I have noticed that the shape_predictor_68_face_landmarks.dat.bz2 model file is not available for commercial use. The dataset that was used to train the model is owned by Imperial College London.

Do you know of any decent free alternatives that can be used for commercial projects?

So far I have found no option other than creating my own collection of photographs plus annotations.

How to reduce size of the model?

Hello,
I have trained shape_predictor_68_face_landmarks with the parameters mentioned in the paper. The problem is that for my application I need the model size to be less than 10 MB. I was able to reduce the model size by using fewer parameters, but the accuracy became very low.
Is there any way to save the model with a smaller size? In TensorFlow, quantization is used to reduce model size; is there anything similar I can follow?

(Sorry for asking this kind of question.)

accuracy

Hey,
please, how can I measure the accuracy of shape_predictor_68_face_landmarks.dat.bz2?

Trained HOG detector, but variation (jitter and shake) in bounding box output

I trained a HOG detector on custom hand datasets with the configuration listed below.

epsilon: 0.05
detection window size: 80 100
c: 800

I ran the trained HOG detector on live video of a hand and observed the following issues:

The output bounding box jitters/shakes when the hand is stationary or held in a fixed position.

The output bounding box also jitters/shakes when the hand moves slightly sideways, up, or down.

(FYI: dlib's built-in HOG face detector works well and its output is stable when the face is stationary or moves sideways, up, or down.)

Please guide me on how to make a HOG (SVM) model trained on a custom dataset as stable as dlib's HOG face detector.

(Dataset's image is attached for your reference).

Fatal error when trying to detect faces in very small images

Hi,

I am using the net_type you proposed here, in conjunction with mmod_human_face_detector.dat.

Everything works perfectly, except for very small images. One real example is an image with dimensions 7 (width) x 9 (height) pixels, which throws the error described below:

Error detected at line 626
Error detected in file ../dlib/cuda/tensor.h
Error detected in function dlib::alias_tensor_instance dlib::alias_tensor::operator()(dlib::tensor&, size_t) const

Failing expression was offset+size() <= t.size()
offset: 18446744073709551395
size(): 189

Well... I compiled dlib using CMake without CUDA support (I think so), so I am inclined to believe the problem really is the size of the image and some size limitation of the model.

cmake .. -DDLIB_USE_CUDA=0

Is there any lower limit on the image dimensions in the context I've described?

OS: CentOS 7
Language: C++
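
A hedged workaround sketch (not a documented limit): the MMOD detector slides a fixed-size window over an image pyramid, so frames much smaller than that window can trigger this kind of failure. Upsampling tiny inputs before detection avoids the error, although a 7x9 crop contains too little information to detect anything meaningful anyway. The 80-pixel minimum below is a guess, not a documented value.

#include <dlib/image_transforms.h>
#include <dlib/matrix.h>
#include <dlib/pixel.h>
using namespace dlib;

// Repeatedly double the image size until both sides reach min_side.
void upsample_until_big_enough(matrix<rgb_pixel>& img, long min_side = 80)
{
    while (img.nr() < min_side || img.nc() < min_side)
        pyramid_up(img);   // doubles each dimension per call
}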

How to use Age/Gender model?

Hey,

There are two age & gender trained model files. I want to use them in Python, but I have searched the whole internet and there is no documentation or example showing how to use them.

body pose model?

Hi, maybe this doesn't belong here, but is it possible to have something similar to the 68 face landmarks but for the full body or upper body? I am developing in Unity and face detection is pretty much covered, but for body pose (my target is Android) I haven't had any luck. I have tried ARCore (not compatible), AR Foundation, OpenCV Python, Barracuda with an ONNX model (not compatible), OpenPose, PoseNet, etc., but I haven't found any pretrained body model similar to the face ones yet.

In this asset:
https://assetstore.unity.com/packages/tools/integration/opencv-plus-unity-85928

there is an example that uses face recognition: by combining the face and eye Haar cascades with the 68 face landmark annotations, it creates a real-time recognition demo. I want to replicate this for a full body or upper body, using body annotations with the full-body and upper-body Haar cascades. Is it possible? My goal is to create an AR fitting room for Android.

Sorry if it's not okay to post this here; I've been researching it for 2 weeks and it's driving me crazy.
It doesn't have to be that many landmarks; I'm okay with 13 or so.

thank you

98 face landmarks

Somehow I found training and testing images for 98 face landmarks. I am currently using 68 face landmarks in my project, but after some testing I found that the 98-landmark annotation is more stable than the 68-landmark one. However, I don't know how to use this as a ".dat" file, so can you please help me or guide me on how to create a .dat file from it? Or could you release this dataset as a ".dat" file, please?

Link to the 98 landmarks:
https://wywu.github.io/projects/LAB/WFLW.html

There are more stable 68-landmark models as well (linked below), but can you guide me on how to use these as ".dat" models?

https://github.com/braindotai/Facial-Landmarks-Detection-Pytorch

https://github.com/facebookresearch/supervision-by-registration

About training landmark model

Hi,
Thank you very much for supplying the trained landmark model.
I want to use the code supplied by the dlib library to train my own model using the iBUG dataset, but I don't know how to convert the .pts files to the XML format used by the original code. Would you please explain how to train a model like the one you published here?
I would appreciate any answers. Thank you.
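
A hedged sketch (untested, with hypothetical file names) of converting one iBUG-style .pts file into the XML layout that dlib's imglab and train_shape_predictor_ex.cpp expect. A real converter would loop over all image/.pts pairs and would normally take the bounding box from a face detector rather than from the landmarks themselves.

#include <algorithm>
#include <fstream>
#include <iomanip>
#include <string>
#include <vector>

struct pt { double x, y; };

// Parse an iBUG .pts file: "version: 1", "n_points: 68", "{", 68 "x y" lines, "}".
std::vector<pt> load_pts(const std::string& filename)
{
    std::ifstream fin(filename);
    std::string header;
    std::getline(fin, header);   // version: 1
    std::getline(fin, header);   // n_points: 68
    std::getline(fin, header);   // {
    std::vector<pt> pts;
    pt p;
    while (fin >> p.x >> p.y)    // extraction stops at the closing "}"
        pts.push_back(p);
    return pts;
}

int main()
{
    const std::string image = "image_0001.jpg";             // hypothetical image/.pts pair
    const std::vector<pt> pts = load_pts("image_0001.pts");

    // Use the landmark extent as the object box, for illustration only.
    double l = pts[0].x, t = pts[0].y, r = pts[0].x, b = pts[0].y;
    for (const pt& p : pts)
    {
        l = std::min(l, p.x); r = std::max(r, p.x);
        t = std::min(t, p.y); b = std::max(b, p.y);
    }

    std::ofstream xml("training.xml");
    xml << "<?xml version='1.0' encoding='ISO-8859-1'?>\n<dataset>\n<images>\n"
        << "  <image file='" << image << "'>\n"
        << "    <box top='" << (long)t << "' left='" << (long)l
        << "' width='" << (long)(r - l) << "' height='" << (long)(b - t) << "'>\n";
    for (std::size_t i = 0; i < pts.size(); ++i)
        xml << "      <part name='" << std::setw(2) << std::setfill('0') << i
            << "' x='" << (long)pts[i].x << "' y='" << (long)pts[i].y << "'/>\n";
    xml << "    </box>\n  </image>\n</images>\n</dataset>\n";
}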

Verification rate of face recognition model

I tested your model (dlib_face_recognition_resnet_model_v1) using LFW, but my result is 85%, while you report 99.38% accuracy. I wonder if you have any recommendations to boost the accuracy. I think the problem is in the face preparation before feature extraction. After we determine the five landmark points, what are the next steps (the alignment method, how the faces are centered)?
thanks in advance.
