
opencv_zoo's Introduction

OpenCV Zoo and Benchmark

A zoo for models tuned for OpenCV DNN with benchmarks on different platforms.

Guidelines:

  • Install latest opencv-python:
    python3 -m pip install opencv-python
    # Or upgrade to latest version
    python3 -m pip install --upgrade opencv-python
  • Clone this repo to download all models and demo scripts:
    # Install git-lfs from https://git-lfs.github.com/
    git clone https://github.com/opencv/opencv_zoo && cd opencv_zoo
    git lfs install
    git lfs pull
  • To run benchmarks on your hardware settings, please refer to benchmark/README.
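    For example, a typical run looks like this (mirroring the benchmark reproducer quoted in an issue further down this page; pick whichever config under benchmark/config matches the model you want to measure):
    cd benchmark
    python3 benchmark.py --cfg config/license_plate_detection_yunet.yaml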

Models & Benchmark Results

Hardware Setup:

x86-64:

  • Intel Core i7-12700K: 8 Performance-cores (3.60 GHz, turbo up to 4.90 GHz), 4 Efficient-cores (2.70 GHz, turbo up to 3.80 GHz), 20 threads.

ARM:

  • Khadas VIM3: Amlogic A311D SoC with a 2.2GHz Quad core ARM Cortex-A73 + 1.8GHz dual core Cortex-A53 ARM CPU, and a 5 TOPS NPU. Benchmarks are done using per-tensor quantized models. Follow this guide to build OpenCV with TIM-VX backend enabled.
  • Khadas VIM4: Amlogic A311D2 SoC with 2.2GHz Quad core ARM Cortex-A73 and 2.0GHz Quad core Cortex-A53 CPU, and a 3.2 TOPS built-in NPU.
  • Khadas Edge 2: Rockchip RK3588S SoC with a CPU of 2.25 GHz Quad Core ARM Cortex-A76 + 1.8 GHz Quad Core Cortex-A55, and a 6 TOPS NPU.
  • Atlas 200 DK: Ascend 310 NPU with 22 TOPS @ INT8. Follow this guide to build OpenCV with CANN backend enabled.
  • Atlas 200I DK A2: SoC with 1.0GHz Quad-core CPU and Ascend 310B NPU with 8 TOPS @ INT8.
  • NVIDIA Jetson Nano B01: a Quad-core ARM A57 @ 1.43 GHz CPU, and a 128-core NVIDIA Maxwell GPU.
  • NVIDIA Jetson Nano Orin: a 6-core Arm® Cortex®-A78AE v8.2 64-bit CPU, and a 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores (max freq 625MHz).
  • Raspberry Pi 4B: Broadcom BCM2711 SoC with a Quad core Cortex-A72 (ARM v8) 64-bit @ 1.5 GHz.
  • Horizon Sunrise X3: an SoC from Horizon Robotics with a quad-core ARM Cortex-A53 1.2 GHz CPU and a 5 TOPS BPU (a.k.a NPU).
  • MAIX-III AXera-Pi: Axera AX620A SoC with a quad-core ARM Cortex-A7 CPU and a 3.6 TOPS @ int8 NPU.
  • Toybrick RV1126: Rockchip RV1126 SoC with a quad-core ARM Cortex-A7 CPU and a 2.0 TOPS NPU.

RISC-V:

  • StarFive VisionFive 2: StarFive JH7110 SoC with a quad-core RISC-V CPU that can turbo up to 1.5GHz, and an Imagination IMG BXE-4-32 MC1 GPU with a work frequency up to 600MHz.
  • Allwinner Nezha D1: Allwinner D1 SoC with a 1.0 GHz single-core RISC-V Xuantie C906 CPU with RVV 0.7.1 support. YuNet is tested for now. Visit here for more details.

Important Notes:

  • The data under each hardware column in the benchmark table represents the elapsed time of one inference (preprocess, forward and postprocess).
  • The time data is the mean of 10 runs after some warmup runs. Different metrics may be applied to some specific models.
  • Batch size is 1 for all benchmark results.
  • --- means the model is not available to run on the device.
  • View benchmark/config for more details on benchmarking different models.

Some Examples

Some examples are listed below. You can find more in the directory of each model!

Face Detection with YuNet

largest selfie
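As a taste of the API, here is a minimal sketch of YuNet-style face detection with cv.FaceDetectorYN, the same class used in several issues below (model filename and image path are placeholders):

import cv2 as cv

# minimal YuNet face detection sketch; paths are placeholders
detector = cv.FaceDetectorYN.create("face_detection_yunet_2022mar.onnx", "", (320, 320))
img = cv.imread("selfie.jpg")
detector.setInputSize((img.shape[1], img.shape[0]))  # (width, height)
_, faces = detector.detect(img)                      # faces: Nx15 array, or None if no face
if faces is not None:
    x, y, w, h = map(int, faces[0][:4])              # each row: box, 5 landmark pairs, score
    cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)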

Face Recognition with SFace

sface demo

Facial Expression Recognition with Progressive Teacher

fer demo

Human Segmentation with PP-HumanSeg

messi

Image Segmentation with EfficientSAM

sam_present

License Plate Detection with LPD_YuNet

license plate detection

Object Detection with NanoDet & YOLOX

nanodet demo

yolox demo

Object Tracking with VitTrack

webcam demo

Palm Detection with MP-PalmDet

palm det

Hand Pose Estimation with MP-HandPose

handpose estimation

Person Detection with MP-PersonDet

person det

Pose Estimation with MP-Pose

pose_estimation

QR Code Detection and Parsing with WeChatQRCode

qrcode

Chinese Text Detection with PPOCR-Det

mask

English Text Detection with PPOCR-Det

gsoc

Text Recognition with CRNN

crnn_demo

License

OpenCV Zoo is licensed under the Apache 2.0 license. Please refer to licenses of different models.


opencv_zoo's Issues

The match function of SFace in C++ returns incorrect results

I tested the face comparison function of SFace on my Mac M1 machine, and when I tested it using face_recognition_sface/demo.py, it returned the correct results. However, when I compiled and tested it using C++ or Go, the match function returned incorrect results. After analysis, I suspect that the normalization result in the match function is incorrect. Is there a solution to this problem?

Here is the C++ code:

#include <cstdio>
#include <opencv2/opencv.hpp>

double SFace_Test(const char* detect_model, const char* recog_model);
cv::Mat getFaceFeature(cv::Ptr<cv::FaceDetectorYN> faceDetector, cv::Ptr<cv::FaceRecognizerSF> faceRecognizer, cv::Mat image);

int main(){
    double ret=SFace_Test("face_detection_yunet_2022mar.onnx","face_recognition_sface_2021dec.onnx");
    printf("ret=%.10lf\n",ret);

    // images of the same person: C++ ret=0.9999999593, while demo.py returns 0.9504736052325597
    // images of different persons: C++ ret=0.9999998840, while demo.py returns 0.026877811336817103
    // printf("ret=%.10f\n",ret);  using '%.10f' gives the same result
}

double SFace_Test(const char* detect_model,const char* recog_model){

    cv::Mat image1 = cv::imread("./img/1.jpg");
    cv::Mat image2 = cv::imread("./img/3.jpg");

    cv::Ptr<cv::FaceDetectorYN> faceDetector = cv::FaceDetectorYN::create(detect_model, "", cv::Size(150, 150));
    cv::Ptr<cv::FaceRecognizerSF> faceRecognizer = cv::FaceRecognizerSF::create(recog_model, "");

    cv::Mat feature1=getFaceFeature(faceDetector,faceRecognizer,image1);
    cv::Mat feature2=getFaceFeature(faceDetector,faceRecognizer,image2);
    return faceRecognizer->match(feature1, feature2, cv::FaceRecognizerSF::DisType::FR_COSINE);
}

cv::Mat getFaceFeature( cv::Ptr<cv::FaceDetectorYN> faceDetector , cv::Ptr<cv::FaceRecognizerSF> faceRecognizer,cv::Mat image){
     faceDetector->setInputSize(image.size());
     cv::Mat faces;
     faceDetector->detect(image, faces);
     cv::Mat aligned_face;
     faceRecognizer->alignCrop(image, faces.row(0), aligned_face);
     cv::Mat feature;
     faceRecognizer->feature(aligned_face, feature);
     return feature;
 }
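One way to narrow this down is to dump both features and recompute the cosine score outside OpenCV. A minimal NumPy sketch (assuming the two features are the 1x128 float vectors returned by FaceRecognizerSF::feature):

import numpy as np

def cosine_score(f1, f2):
    # cosine similarity as FR_COSINE should compute it:
    # dot product of the two L2-normalized feature vectors
    f1, f2 = f1.flatten(), f2.flatten()
    return float(np.dot(f1, f2) / (np.linalg.norm(f1) * np.linalg.norm(f2)))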

g++ -v
Apple clang version 14.0.0 (clang-1400.0.29.202)
Target: arm64-apple-darwin22.3.0
Thread model: posix

opencv version 4.7.0
Python 3.11.3

CXX = g++
CXXFLAGS += -std=c++11 -v -c -Wall $(shell pkg-config --cflags opencv4)
LDFLAGS += -lstdc++ $(shell pkg-config --libs opencv4)

Task tracker for GSOC'22 object detection project

This issue is related to Google Summer of Code 2022 and the proposal "Object detection models for OpenCV Zoo", carried out by Sri Siddarth Chakaravarthy.

google-summer-of-code

Adding support of Source Code Related Metrics to OpenCV Zoo for OPENCV Project

Project Abstract    Goals    Contributions    Weekly Summary    Would like to sync?    Acknowledgements    Links

Check out my GitHub Repo or follow me on LinkedIn


# Project Abstract

Contributor: 'Sri Siddarth Chakaravarthy'
Mentor: 'Yuantao Feng'
Organisation: 'OpenCV'
Project: 'Light-weight Object Detection Models for Resource-Restricted Usage'
Coding-Period: 'June 13th - September 12th'
  • OpenCV is an open-source library developed mainly for real-time computer vision operations such as object detection, object tracking, etc.
  • Currently, OpenCV supports trained models with benchmarked results on various datasets via its model_zoo. Some existing models include Yunet, Mobilenet, CRNN, etc.
  • The OpenCV Zoo model library is mainly focused on providing developers with trained model weights in ONNX format, plus quantized (light-weight) models that can be used on CPU-only machines. The library contains trained model weights that can be used for real-time inference on systems without high computation power (no GPU).
  • These models can also be directly deployed in applications; they are quantized to int8 versions using onnxruntime's static quantization module.

The aim of this project is to add object detection models such as NanoDet, EfficientDet, YOLOX, etc. to the list of existing models in the OpenCV model zoo, enabling model inference using the OpenCV Python package.


# Goals

🎯The goals of this GSoC project involve the following: 🎯

  • Addition of light-weight models such as NanoDet, EfficientDet and YOLOX to the OpenCV model zoo library.
  • Model inference using OpenCV tools and frameworks.
  • Quantise FP16 models to INT8.
  • Update the model zoo library.

Getting Started

I started my work by implementing some examples and experimenting with OpenCV DNN. One of the OpenCV DNN module's strengths is that it is highly optimized for Intel processors: we can get good FPS when running inference on real-time videos for object detection and image segmentation applications, and we often get higher FPS with the DNN module than when using the framework a model was pre-trained in. For example, consider image classification inference speed across different frameworks; the same holds for object detection inference performance compared to other frameworks.

Identifying problems

This tool only supports deep learning inference on images and videos; it does not support fine-tuning or training. Still, the OpenCV DNN module can act as a perfect starting point for any beginner to get into the field of deep-learning-based computer vision and play around. The model zoo library in OpenCV acts as the working directory for developers to experiment with the tool and see its use cases through examples. However, this library is in an inchoate state and requires the addition of more complex and light-weight models that can harness the OpenCV DNN module for faster performance.


# Work Product

Demonstration of object detection models added to the OpenCV Zoo models library:
The final deliverable of this GSOC program was to help opencv_zoo support more light-weight object detection models, so that developers are able to run models using the cv.dnn framework as an alternative to existing inference tools such as onnxruntime, openvino, tensorflow, etc. Toward the end of the project we finalized two models, NanoDet and YOLOX, and successfully added support for them to the opencv_zoo library.


Here are some of the cv.dnn inference results observed when testing ONNX-formatted models on a CPU-only machine.

# Contributions

# Repository: opencv_zoo /working-branches

Pull requests created:

Completed

  1. #87 : [opencv_zoo] Added NanodetPlus model to the OpenCV Zoo models stack /cp1

  2. #86 : [opencv_zoo] Added YOLOX model to the OpenCV Zoo models stack /cp2

In Progress
3. #91 : [opencv_zoo] Added COCO_Evaluation support in OpenCV Zoo tools /cp2

Issues opened:

  1. #62 : [opencv_zoo] This issue directs to this page which consists of the detailed information about this project and all the contributions made by myself during the course of GSOC'22 /cb

  2. Megvii-BaseDetection/YOLOX#1464: [YOLOX] This issue was raised to inform an issue related to adding CPU evaluation support for YOLOX so that it can be easier to infer models and run benchmarks on CPU only devices /cp2

Tags:

community bonding period : /cb

coding period x - /cpx


# Weekly Summary

Community Bonding - May 20th to June 12th, 2022


Coding Period 1 - June 13th to July 25th, 2022

Coding Period 2 - July 25th to September 4th, 2022

Final Evaluation Period - September 5th to September 12th, 2022

  • Week_12: Submission of Final Evaluation Summary

# Acknowledgements

Google Summer of Code (GSoC) 2022 has been an amazing experience. The journey has taught me so many things, not just technical skills but also how to work as an open-source contributor: working on challenging problems and interacting with other developers around the world. I had a wonderful experience with the OpenCV community. The community is welcoming, and everyone is eager to help newcomers, which I liked a lot. Special thanks to my mentors Yuantao Feng and Vadim, who saw me as a potential contributor. Without them, the work never would have been this joyful and rewarding; interacting with them and working on this project together made this a great learning experience for me. Finally, thanks to the GSoC program, without which I wouldn't have been part of this incredible journey and gained this memorable experience.

Future Work👨‍💻

  • I will write the COCO evaluation script for the opencv_zoo repository, which can be useful for running evaluations on the COCO val2017 dataset on CPU.
  • I will also be working on future API creation for models and tools.
  • I will always be available for resolving community feedback and addressing bugs that may surface.

# Would like to sync?

  • We have planned to keep all the communication open 🎉 so that everyone can sync and is free to participate and help us grow! So if you have suggestions/comments about anything, please do not hesitate to open an issue ticket.

# Links


Failed to load CRNN FP16 models with OpenCV 4.7

Reproducer:

import cv2 as cv

cv.dnn.readNet("text_recognition_CRNN_CH_2022oct_fp16.onnx")
cv.dnn.readNet("text_recognition_CRNN_EN_2022oct_fp16.onnx")

Error log:

[ WARN:...] global onnx_graph_simplifier.cpp:804 getMatFromTensor DNN: load FP16 model as FP32 model, and it takes twice the FP16 RAM requirement.
[ERROR:...] global onnx_importer.cpp:1054 handleNode DNN/ONNX: ERROR during processing node with 2 inputs and 1 outputs: [Conv]:(onnx_node!Conv_0) from domain='ai.onnx'
Traceback (most recent call last):
  File "/.../opencv_zoo/models/text_recognition_crnn/demo.py", line 63, in <module>
    recognizer = CRNN(modelPath=args.model)
  File "/.../opencv_zoo/models/text_recognition_crnn/crnn.py", line 16, in __init__
    self._model = cv.dnn.readNet(self._model_path)
cv2.error: OpenCV(4.7.0) /Users/opencv-cn/GHA-OCV-1/_work/opencv-python/opencv-python/opencv/modules/dnn/src/onnx/onnx_importer.cpp:1073: error: (-2:Unspecified error) in function 'handleNode'
> Node [Conv@ai.onnx]:(onnx_node!Conv_0) parse error: OpenCV(4.7.0) /Users/opencv-cn/GHA-OCV-1/_work/opencv-python/opencv-python/opencv/modules/dnn/src/layers/convolution_layer.cpp:398: error: (-215:Assertion failed) inputs.size() != 0 in function 'getMemoryShapes'
>

[ WARN:...] global onnx_graph_simplifier.cpp:804 getMatFromTensor DNN: load FP16 model as FP32 model, and it takes twice the FP16 RAM requirement.
[ERROR:...] global onnx_importer.cpp:1054 handleNode DNN/ONNX: ERROR during processing node with 3 inputs and 1 outputs: [Conv]:(onnx_node!Conv_0) from domain='ai.onnx'
Traceback (most recent call last):
  File "/.../opencv_zoo/models/text_recognition_crnn/demo.py", line 63, in <module>
    recognizer = CRNN(modelPath=args.model)
  File "/.../opencv_zoo/models/text_recognition_crnn/crnn.py", line 16, in __init__
    self._model = cv.dnn.readNet(self._model_path)
cv2.error: OpenCV(4.7.0) /Users/opencv-cn/GHA-OCV-1/_work/opencv-python/opencv-python/opencv/modules/dnn/src/onnx/onnx_importer.cpp:1073: error: (-2:Unspecified error) in function 'handleNode'
> Node [Conv@ai.onnx]:(onnx_node!Conv_0) parse error: OpenCV(4.7.0) /Users/opencv-cn/GHA-OCV-1/_work/opencv-python/opencv-python/opencv/modules/dnn/src/layers/convolution_layer.cpp:398: error: (-215:Assertion failed) inputs.size() != 0 in function 'getMemoryShapes'
>

Unable to load yunet facedetector using opencv

We are unable to create a face detector model in opencv using the onnx model. Attaching the sample code and error below. Code executed in google colab (CPU).

import cv2
detector = cv2.FaceDetectorYN.create("/content/yunet.onnx", "", (320, 320))

Onnx Model: https://github.com/ShiqiYu/libfacedetection.train/raw/a61a428929148171b488f024b5d6774f93cdbc13/tasks/task1/onnx/yunet.onnx

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-3-f97fee67c3d4> in <module>
      2 
      3 # Initialize detector
----> 4 detector = cv2.FaceDetectorYN.create("/content/yunet.onnx", "", (320, 320))
      5 # Read image
      6 #img = cv2.imread("res.png")

error: OpenCV(4.6.0) /io/opencv/modules/dnn/src/onnx/onnx_importer.cpp:1040: error: (-2:Unspecified error) in function 'handleNode'
> Node [Shape@ai.onnx]:(onnx_node!Shape_70) parse error: OpenCV(4.6.0) /io/opencv/modules/dnn/src/onnx/onnx_importer.cpp:2846: error: (-215:Assertion failed) !isDynamicShape in function 'parseShape'

Which dataset to train model Sface onnx

Hi there,

I'm trying SFace. When I went back to the source code to train SFace, I found several versions of the training dataset: CASIA-WebFace, VGGFace2, and MS1MV2. So which training dataset was used for the ONNX SFace model in the OpenCV model zoo?

face_detection_yunet c++ demo builds but coredumps.

I downloaded the latest OpenCV code yesterday, built and installed it on my Ubuntu 18.04 machine, then downloaded the opencv_zoo code and built the face_detection_yunet demo.cpp code, unchanged. When I run it, it fails with the following error:

$ ./demo
terminate called after throwing an instance of 'cv::Exception'
  what():  OpenCV(4.7.0-dev) /home/alamaral/src/opencv/modules/dnn/src/net_impl.cpp:275: error: (-204:Requested object was not found) Layer with requested id=-1 not found in function 'getLayerData'

Aborted (core dumped)

Anyone have any ideas as to what's going on? One strange thing is that if I run the Python demo with python2 it fails in exactly the same way:

python demo.py 
Traceback (most recent call last):
  File "demo.py", line 136, in <module>
    results = model.infer(frame) # results is a tuple
  File "/home/alamaral/src/opencv_zoo/models/face_detection_yunet/yunet.py", line 54, in infer
    faces = self._model.detect(image)
cv2.error: OpenCV(4.7.0-dev) /home/alamaral/src/opencv/modules/dnn/src/net_impl.cpp:275: error: (-204:Requested object was not found) Layer with requested id=-1 not found in function 'getLayerData'

but running it with python3 works fine.

No module named 'models'

I tried to execute the eval.py command below and got an error:

File "/data1/opencv_zoo/tools/eval/eval.py", line 15, in
from models import MODELS

python /data1/opencv_zoo/tools/eval/eval.py -m sface -d lfw -dr ../

text_detection_db/demo.py

There are two bugs.

In lines 64 to 72, line 69 needs a ',':

model = DB(modelPath=args.model,
               inputSize=[args.width, args.height],
               binaryThreshold=args.binary_threshold,
               polygonThreshold=args.polygon_threshold,
               maxCandidates=args.max_candidates,
               unclipRatio=args.unclip_ratio
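               # <- line 69: the missing ',' reported above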
               backendId=args.backend,
               targetId=args.target
    )

In lines 83 to 85, score is already a float in line 85, so indexing score[0] fails:

print('{} texts detected.'.format(len(results[0])))
for idx, (bbox, score) in enumerate(zip(results[0], results[1])):
    print('{}: {} {} {} {}, {:.2f}'.format(idx, bbox[0], bbox[1], bbox[2], bbox[3], score[0]))

extra key for `crnn` model in eval script

Running the evaluation script for the text_recognition CRNN model as described here gives the following error:

TypeError: CRNN.__init__() got an unexpected keyword argument 'charsetPath'

The CRNN constructor does not expect a charsetPath argument, so removing it fixes the issue.

Can I submit a PR for this @fengyuentau ?

A question about converting Python code into C++ code when calling an ONNX model in OpenCV DNN

Thanks to the administrator for helping to convert u2net into ONNX format; I successfully called it from Python using OpenCV (see https://github.com/opencv/opencv_zoo/issues/13). But I have a new problem when rewriting it in C++. Specifically, the following

# Norm
pred = normPred(d0[:, 0, :, :])
# Save
save_output('test_imgs/sky1.jpg', pred)

needs to be translated into C++ code, where


def normPred(d):
    ma = np.amax(d)
    mi = np.amin(d)
    return (d - mi)/(ma - mi)

def save_output(image_name, predict):
    img = cv.imread(image_name)
    h, w, _ = img.shape
    predict = np.squeeze(predict, axis=0)
    img_p = (predict * 255).astype(np.uint8)
    img_p = cv.resize(img_p, (w, h))
    print('{}-result-opencv_dnn.png-------------------------------------'.format(image_name))
    cv.imwrite('{}-result-opencv_dnn.png'.format(image_name), img_p)

Hope to give some help and advice, thank you!

Add a model file path collector in `models/__init__.py`

We now have lots of configs for benchmark, and each of them has a hard-coded model path like the following:

Model:
  name: "YuNet"
  modelPath: "models/face_detection_yunet/face_detection_yunet_2022mar.onnx"

This is inconvenient and bad for automation when it comes to upgrading a model or benchmarking an int8-quantized model. To address this, add a model file path collector in models/__init__.py, and also add an option in benchmark configs to allow loading int8-quantized models.
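A sketch of what the collector might look like (names and layout are illustrative, not the final implementation):

import os

MODELS = {
    'YuNet': 'face_detection_yunet/face_detection_yunet_2022mar.onnx',
    # ... one entry per model wrapper
}

def get_model_path(name, int8=False):
    path = MODELS[name]
    if int8:
        root, ext = os.path.splitext(path)
        path = root + '_int8' + ext  # assumes the zoo's *_int8.onnx naming convention
    return os.path.join(os.path.dirname(__file__), path)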

DNN model used in SFace feature extractor?

The original implementation of the SFace feature extractor integrated to OpenCV can be instantiated from three different DNN models — ResNet50, Resnet101 and MobileFaceNet. It would be useful for research purposes if the model actually used was documented somewhere.

face_detection_yunet failed to load pretrained model as default

System information (version)
Detailed description

The explanation of models/face_detection_yunet/README.md is as follows.

# detect on camera input
python demo.py
# detect on an image
python demo.py --input /path/to/image

face_detection_yunet/demo.py tries to load face_detection_yunet.onnx by default. But this pretrained model was renamed to face_detection_yunet_2021sep.onnx by #7.

As a result, face_detection_yunet/demo.py fails to load the pretrained model by default.

$ python3 demo.py
Traceback (most recent call last):
  File "demo.py", line 60, in <module>
    model = YuNet(modelPath=args.model,
  File "/home/opencv/opencv_zoo/models/face_detection_yunet/yunet.py", line 22, in __init__
    self._model = cv.FaceDetectorYN.create(
cv2.error: OpenCV(4.5.4-dev) /tmp/pip-req-build-tpkxoqhj/opencv/modules/dnn/src/onnx/onnx_importer.cpp:198: error: (-5:Bad argument) Can't read ONNX file: face_detection_yunet.onnx in function 'ONNXImporter'
Steps to reproduce
$ git lfs clone https://github.com/opencv/opencv_zoo.git
$ cd opencv_zoo/models/face_detection_yunet
$ python3 demo.py

Implementing multi-task CV models (GSoC-22)

Abstract

Multi-task learning (MTL) is a branch of machine learning in which multiple tasks are learned simultaneously through a shared model. It has the following advantages: improving data efficiency, reducing overfitting through shared representations, and using auxiliary information to learn quickly.
At present, no multi-task CV model implementation exists in OpenCV, so developers building on OpenCV cannot use multi-task models to reduce computation and improve accuracy.

Aim: One or more multi-task CV models trained (or borrowed, if the license is appropriate), and submitted to OpenCV model zoo.

SFace image input should be RGB instead of BGR?

Hello, I'm testing the SFace implementation, and in my tests I need to set a higher cosine distance than the default one (I'm using 0.463 instead of 0.363) because I was getting too many false positives. I started to explore why this could be happening, and I saw in the training script that the author used the mxnet library for reading the samples; according to the mxnet documentation, the default imdecode behavior is to load the image in RGB format.
In the demo example, the images are read in BGR format.
After feeding the SFace model RGB images, I started to get lower cosine distances that match the standard threshold.

So could it be a mistake in how the demo was written? Or am I missing some step?
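A quick way to test this hypothesis (a sketch, not part of the official demo): convert to RGB before feature extraction to match mxnet's imdecode default.

import cv2 as cv

recognizer = cv.FaceRecognizerSF.create("face_recognition_sface_2021dec.onnx", "")
img_bgr = cv.imread("aligned_face.jpg")           # OpenCV reads images as BGR
img_rgb = cv.cvtColor(img_bgr, cv.COLOR_BGR2RGB)  # feed RGB instead, as in training
feature = recognizer.feature(img_rgb)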

face_detection_yunet

os:windows
python: 3.6
opencv-python: 4.5.4
face_detection_yunet/demo.py
cmd: python demo.py -i 3.jpg
error:
onnx_importer.cpp:203: error: (-210:Unsupported format or combination of formats) Failed to parse ONNX model: face_detection_yunet_2021sep.onnx in function 'cv::dnn::dnn4_v20211004::ONNXImporter::ONNXImporter'

Integrate u2net into model zoo

U2net is a very typical model for image segmentation, implemented in PyTorch. I think converting it to ONNX so that OpenCV can call it directly would be valuable work. I wonder whether the maintainers have considered this. Thank you!


Add model for colorization

Model: Colorization
Topic: Colorization
Source:

Weight Size: 128.9MB
License: BSD-2-Clause License
Description: This model is used for colorization (e.g. gray -> auto-colorize) and can be ported to ONNX format.

face_recognition_sface_2021dec output is always

Bug

output matrix is always
rows=1 cols=128
[-0.13746102154254913, null, null, ... (the remaining 127 values are all null)]

Configuration

face_recognition_sface_2021dec.onnx
OpenCV arch: x86_64
OpenCV version: 4.3.0-1
Interface: OpenCV Java
OS: macOS 12.6 Monterey (but the same happens when I run this on Windows machines and other MacBooks).

Important: all my source faces are ALREADY ALIGNED and cropped, so using faceRecognizer.alignCrop(...) is NOT what I want to do. I don't know if that step is important; it's NOT documented that faceRecognizer.alignCrop(...) is required, so I assume it can be skipped in my case. All my images are already cropped and aligned.

Test Image

I am passing the image at
https://facemri.com/Sarah0_aligned.jpg
(note it's already aligned and the face extracted), but it happens with any image.

-- start java code
FaceRecognizerSF faceRecognizer = FaceRecognizerSF.create("face_recognition_sface_2021dec.onnx", "");

Mat loadedImage = Imgcodecs.imread("Sarah0_aligned.jpg");

Mat colorMat = new Mat();
Imgproc.cvtColor(loadedImage, colorMat, Imgproc.COLOR_BGRA2BGR);
Mat resizeMat = new Mat();
Imgproc.resize(colorMat, resizeMat, new Size(128,128));
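// note: other issues on this page state SFace crops are 112x112, so resizing to 128x128 here may itself be a problem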

Mat featureCanidate = new Mat();
log("resizeMat= " + resizeMat.size());
faceRecognizer.feature(resizeMat, featureCanidate);

The current output

The output matrix is the same as shown above: rows=1 cols=128, with only the first value populated and the remaining 127 values null.

The Expected Output

each value in the 128-d array should be filled with a number.

Add Handpose from MediaPipe for palm detection and hand-skeleton finger tracking

Model:

  • Hand detector: detects hands and returns the bbox of the palm & palm keypoints
  • Hand pose: detects keypoints from the cropped hand image

Resource:

License: Apache 2.0

Note:

SFace validation

Hello, I'm using the SFace implementation, and my problem is that I need to use a much higher threshold (cosine distance) for it to work properly. In my experiments I need to use something above 0.55 to get reasonable results.

I tried to make a very straight forward script for the validation. you can check it here: https://github.com/rodrigo2019/sface_validation/blob/master/validation.ipynb

First I thought the problem was my data, but after some tests, even with datasets like LFW and with or without tools like face alignment, I keep getting high scores for different persons.
Some models do not work with CANN backend

List of models not working with CANN backend:

  • lpd_yunet
  • nanodet
  • mobilenet v2
  • pphumanseg
  • dasiamrpn: cannot reproduce without the API; loading separately works, so not going to fix it.
  • handpose mp

Fixed by opencv/opencv#23319 except DaSiamRPN.

The following models need to be updated:

  • lpd_yunet, bump the opset version of Slice.
  • pphumanseg, replace with a newer and simpler one.

Reproducer of lpd_yunet:

$ python3 benchmark.py --cfg config/license_plate_detection_yunet.yaml --fp32
Benchmarking LPD_YuNet with ['license_plate_detection_lpd_yunet_2022may.onnx']
Traceback (most recent call last):
  File "benchmark.py", line 143, in <module>
    benchmark.run(model)
  File "benchmark.py", line 103, in run
    self._benchmark_results[filename][str(size)] = self._metric.forward(model, *data[1:])
  File "/home/test_user01/fytao/opencv_zoo/benchmark/utils/metrics/detection.py", line 21, in forward
    model.infer(img)
  File "/home/test_user01/fytao/opencv_zoo/models/license_plate_detection_yunet/lpd_yunet.py", line 56, in infer
    outputBlob = self.model.forward(self.output_names)
cv2.error: OpenCV(4.7.0-dev) /home/test_user01/fytao/opencv-opencv/modules/dnn/src/layers/slice_layer.cpp:640: error: (-215:Assertion failed) sliceSteps.size() == 1 in function 'initCann'

Looks like the issue is the out-of-date Slice in the model (starts and ends are attributes, and there are no steps). It is worth trying to upgrade the opset version of lpd_yunet.
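A sketch of the suggested opset bump using the ONNX version converter (untested assumption: the converter can rewrite the old attribute-style Slice into the newer input-style Slice; the output filename is chosen for illustration):

import onnx
from onnx import version_converter

model = onnx.load('license_plate_detection_lpd_yunet_2022may.onnx')
converted = version_converter.convert_version(model, 13)  # opset 13: Slice takes starts/ends/steps as inputs
onnx.save(converted, 'license_plate_detection_lpd_yunet_2022may_opset13.onnx')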

Add model youtu_base_reid for person ReID

Model: youtu_base_reid
Topic: Person ReID
Source:

Weight Size: 106.9MB
License: Apache-2.0 License
Description: This model is one of the state-of-the-art baseline models from Tencent Youtu Lab.

Add C++ demos (Updated on 2024-06-03)

We have provided at least one easy and clean Python demo for every model here in the zoo. As described in #132, Python demos can sometimes be too concise to be converted to other languages, such as C++. Hence, we decided to add C++ demos as well, but they should be clean and simple enough to show how to run inference and get the expected output with OpenCV.

We welcome contributions from the community. Please take a look at the list below and leave a comment for application or discussion before you start to dive into coding.

Status Task Models
✅ Done #138 Face Detection YuNet
✅ Done #259 Face Recognition SFace
✅ Done #177 Object Detection YOLOX
✅ Done #232 Object Detection NanoDet
✅ Done #175 Text Detection DB
✅ Done #176 Text Recognition CRNN (CN)
✅ Done #176 Text Recognition CRNN (EN)
✅ Done #176 Text Recognition CRNN (CH)
✅ Done #241 Image Classification PP-ResNet50
✅ Done #171 Image Classification MobileNet V1
✅ Done #171 Image Classification MobileNet V2
✅ Done #243 Human Segmentation PP-HumanSeg
❗️ Need Contribution QR Code Detection / Parsing WeChatQRCode
❗️ Need Contribution Person Re-Identification YoutuReID
❗️ Need Contribution Palm Detection MP-PalmDet
❗️ Need Contribution Hand Pose Estimation MP-HandPose
✅ Done #179 Person Detection MP-PersonDet
✅ Done #186 Pose Estimation MP-Pose
✅ Done #233 Facial Expression Recognition FER
✅ Done #240 Object Tracking VitTrack

Refactor benchmark configurations and the framework

Benchmark configuration files should be refactored as follows to improve robustness and readability:

Benchmark:
    name: str
    type: str 
    data:
      path: str # necessary
      files: List[str] # necessary and must be array of filenames
      sizes: List[Tuple[int, int]] # optional, but must be an array of sizes (w, h); omit to run without resizing
    metric:
      warmup: int
      repeat: int
      reduction: str # available reduction methods for now: median, gmean
    model:
      name: str # necessary and must match with the one in wrapper, such as 'YuNet' in face_detection_yunet/yunet.py
      backend: int
      target: int
      other_parameters: ...

Notes:

  • type should be case-insensitive and is used to initialize dataloader and metric. If current dataloaders or metrics do not support the selected type, BaseImageLoader and BaseMetric will be initialized.
  • data types are based on either _BaseImageLoader, or _BaseVideoLoader.
  • metric: warmup and repeat do not work for video-stream input data.
  • model is moved into the benchmark dictionary.
    • backend and target are moved into the model dictionary.

To do:

  • Make the initialization of dataloaders and metrics fall back to the base dataloader and metric when type is not supported (see the sketch after this list).
  • Set BaseImageLoader as the base dataloader.
  • Set BaseMetric as base metric.
  • Instantiate model inside Benchmark class.
  • Remove setBackend and setTarget from wrapper class.
  • Add parameter backend and target in the constructor of wrapper class.
  • Add an argument parser to override Benchmark[model][backend] and Benchmark[model][target].
  • Add an example configuration file like the one above to show how to correctly construct a yaml for benchmarking.
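A sketch of the fallback mentioned in the first to-do item (class names mirror the notes above; the registry contents are illustrative):

class BaseImageLoader: ...
class BaseMetric: ...

LOADERS = {}  # e.g. {'detection': DetectionImageLoader, ...}
METRICS = {}  # e.g. {'detection': DetectionMetric, ...}

def build(benchmark_type: str):
    # fall back to the base dataloader/metric when `type` is not supported
    loader_cls = LOADERS.get(benchmark_type.lower(), BaseImageLoader)
    metric_cls = METRICS.get(benchmark_type.lower(), BaseMetric)
    return loader_cls, metric_cls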

Replace default YuNet with the one of fixed input shape to avoid 'parseShape' error

Related bug report: opencv/opencv#21340 (comment)

OpenCV does not support ONNX models that have dynamic input shapes and the 'Shape' operator for now. So running the YuNet demo produces the following error messages, although the results are correct:

$ python demo.py
[ERROR:...] global /Users/xperience/actions-runner/_work/opencv-python/opencv-python/opencv/modules/dnn/src/onnx/onnx_importer.cpp (2516) parseShape DNN/ONNX(Shape): dynamic 'zero' shapes are not supported, input 243 [ 0 0 0 51 ]
[ERROR:...] global /Users/xperience/actions-runner/_work/opencv-python/opencv-python/opencv/modules/dnn/src/onnx/onnx_importer.cpp (2516) parseShape DNN/ONNX(Shape): dynamic 'zero' shapes are not supported, input 250 [ 0 0 0 34 ]
[ERROR:...] global /Users/xperience/actions-runner/_work/opencv-python/opencv-python/opencv/modules/dnn/src/onnx/onnx_importer.cpp (2516) parseShape DNN/ONNX(Shape): dynamic 'zero' shapes are not supported, input 257 [ 0 0 0 34 ]
[ERROR:...] global /Users/xperience/actions-runner/_work/opencv-python/opencv-python/opencv/modules/dnn/src/onnx/onnx_importer.cpp (2516) parseShape DNN/ONNX(Shape): dynamic 'zero' shapes are not supported, input 264 [ 0 0 0 51 ]
[ERROR:...] global /Users/xperience/actions-runner/_work/opencv-python/opencv-python/opencv/modules/dnn/src/onnx/onnx_importer.cpp (2516) parseShape DNN/ONNX(Shape): dynamic 'zero' shapes are not supported, input 297 [ 0 -4 ]

A patch was made to use a YuNet of fixed input shape to bypass the error: opencv/opencv#21607. Although the input shape of the ONNX model is fixed, OpenCV DNN infers on the actual input shape.

MPHandPose failed to infer with CUDA

Hi, I got an error when using the MPHandPose on Jetson Nano.

The error message:

---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-10-ce99b873db77> in <module>
      1 for palm in palms:
----> 2     handpose = handpose_detector.infer(image, palm)
      3     if handpose is not None:
      4         hands = np.vstack((hands, handpose))

~/Tutorials/hand/mp_handpose.py in infer(self, image, palm)
    109         # Forward
    110         self.model.setInput(input_blob)
--> 111         output_blob = self.model.forward(self.model.getUnconnectedOutLayersNames())
    112 
    113         # Postprocess

error: OpenCV(4.6.0) /home/user/repos/opencv/modules/dnn/src/layers/../cuda4dnn/primitives/scale_shift.hpp:135: error: (-215:Assertion failed) 0 in function 'operator()'

Here is the code I wrote to test:

import cv2 as cv
import numpy as np

from mp_handpose import MPHandPose
from mp_palmdet import MPPalmDet

palm_detector = MPPalmDet(modelPath='palm_detection_mediapipe_2022may.onnx',
                          nmsThreshold=0.3,
                          scoreThreshold=0.8,
                          backendId=cv.dnn.DNN_BACKEND_CUDA,
                          targetId=cv.dnn.DNN_TARGET_CUDA)
handpose_detector = MPHandPose(modelPath='handpose_estimation_mediapipe_2022may.onnx',
                               confThreshold=0.8,
                               backendId=cv.dnn.DNN_BACKEND_CUDA,
                               targetId=cv.dnn.DNN_TARGET_CUDA)

image = cv.imread('002-b22d7cc8.jpg')  # an image containing hands
palms = palm_detector.infer(image)
hands = np.empty(shape=(0, 47))

for palm in palms:
    handpose = handpose_detector.infer(image, palm)
    if handpose is not None:
        hands = np.vstack((hands, handpose))

Three problems found in the new object detection model YOLOX

  1. The wrapper class YoloX.py should be renamed to yolox.py for import.
  2. The multi-class NMS potentially produces duplicate boxes; the class for each box should be determined first (see the sketch after this list).
  3. Since the input size is fixed, the anchors should be generated only once in the initialization of the wrapper.
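A sketch of the fix for problem 2 (toy inputs; the point is to pick one class per box before NMS so a single box cannot survive under several classes):

import cv2 as cv
import numpy as np

rng = np.random.default_rng(0)
bboxes = rng.uniform(0, 100, size=(10, 4)).astype(np.float32)     # x, y, w, h per box
cls_scores = rng.uniform(0, 1, size=(10, 80)).astype(np.float32)  # per-class scores
obj_scores = rng.uniform(0, 1, size=(10,)).astype(np.float32)     # objectness

cls_ids = np.argmax(cls_scores, axis=1)                   # one class per box
scores = cls_scores[np.arange(10), cls_ids] * obj_scores  # class-aware confidence
keep = cv.dnn.NMSBoxes(bboxes.tolist(), scores.tolist(), 0.35, 0.5)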

bug in class PPHumanSeg

python:3.8.8
opencv:4.7.0.72

error: (-215:Assertion failed) _FX_WINO_IBLOCK == 3 && _FX_WINO_KBLOCK == 4 in function 'cv::dnn::_fx_winograd_accum_f32'
fix: add self._model.enableWinograd(False) in class PPHumanSeg (a sketch of the idea follows)
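A standalone sketch of the workaround (the model filename is a placeholder):

import cv2 as cv

net = cv.dnn.readNet('pphumanseg.onnx')  # placeholder path
net.enableWinograd(False)  # avoids the _fx_winograd_accum_f32 assertion in OpenCV 4.7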

Errors when we try to use the latest .onnx model provided in libfacedetection.train

Hi Feng,
Many thanks for your useful work! When we try to use the latest .onnx model provided in libfacedetection.train, we get the following error. Would you mind giving us advice on how to use the latest .onnx model provided in libfacedetection.train?

os: Linux
c++ tool chain: gcc-linaro-7.5.0-2019.12-x86_64_aarch64-linux-gnu
opencv:4.7.0
cmd: ./demo -m=yunet_s_320_320.onnx (from https://github.com/ShiqiYu/libfacedetection.train/tree/master/onnx) -i=test2.png -s=true
error:
terminate called after throwing an instance of 'cv::Exception'
what(): OpenCV(4.7.0) /usr1/opencv-4.7.0/modules/dnn/src/net_impl.cpp:275: error: (-204:Requested object was not found) Layer with requested id=-1 not found in function 'getLayerData'

Add evaluation scripts (Updated on 2023-11-07)

We now have over 15 models covering more than 10 tasks in the zoo. Although most of the models are converted to ONNX directly from their original format, such conversion may potentially lead to a drop in accuracy, especially for FP16 and int8-quantized models. To show the actual accuracy to our users, we already have some evaluation scripts, with the following conditions met, in https://github.com/opencv/opencv_zoo/tree/master/tools/eval:

  1. Reproduce the claimed accuracy with the converted FP32 ONNX model using OpenCV DNN as inference framework. The claimed accuracy is either from the source repository or paper on the same dataset, which needs to be specified in the first comment of the pull request.
  2. Once it is reproduced, apply the same evaluation script on FP16 and Int8-quantized models.

Take a look at the task list below for current status. Feel free to leave a comment for application or discussion before you start to contribute.

Status Task Dataset Models Notes
✅ Done in #70 Face Detection WIDERFace YuNet -
✅ Done in #72 Face Recognition LFW SFace -
❗️ Need Contribution License Plate Detection ? LPD-YuNet -
❗️ Need Contribution Object Detection COCO YOLOX & NanoDet Refer to #91
❗️ Need Contribution Text Detection ? DB -
✅ Done in #71 Text Recognition ICDAR2003 & IIIT5K CRNN (EN & CN) -
✅ Done in #69 Image Classification ImageNet PP-ResNet50 & MobileNet V1 / V2
✅ Done in #130 Human Segmentation Mini Supervisely Persons PP-HumanSeg -
❗️ Need Contribution QR Code Detection / Parsing ? WeChatQRCode -
❗️ Need Contribution Person Re-identification ? YoutuReID -
❗️ Need Contribution Palm Detection ? MP-PalmDet -
❗️ Need Contribution Hand Pose Estimation ? MP-HandPose -
❗️ Need Contribution Person Detection ? MP-PersonDet -
❗️ Need Contribution Pose Estimation ? MP-Pose -
❗️ Need Contribution Facial Expression Recognition RAF-DB FER -
❗️ Need Contribution Object Tracking ? VitTrack Could be done via #205

image_classification_mobilenet: cannot load models

Hi, I have been playing around with the model examples provided, and face_detection_yunet and human_segmentation_pphumanseg worked quite well.

I am using an up-to-date version of the opencv_zoo and OpenCV 4.5.5 installed via pip on Windows.

When running the demo.py example for image_classification_mobilenet, however, I get the following error for all models:

[ERROR:...] global D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\onnx\onnx_importer.cpp (909) cv::dnn::dnn4_v20211220::ONNXImporter::handleNode DNN/ONNX: ERROR during processing node with 3 inputs and 1 outputs: [Clip]:(317) from domain='ai.onnx'
Traceback (most recent call last):
  File "D:\Local\devel\Python\OpenCV\image_classification_mobilenet\demo.py", line 41, in <module>
    'v2': MobileNetV2(modelPath='./image_classification_mobilenetv2_2022apr.onnx', labelPath=args.label, backendId=args.backend, targetId=args.target),
  File "D:\Local\devel\Python\OpenCV\image_classification_mobilenet\mobilenet_v2.py", line 11, in __init__
    self.model = cv.dnn.readNet(self.model_path)
cv2.error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\onnx\onnx_importer.cpp:928: error: (-2:Unspecified error) in function 'cv::dnn::dnn4_v20211220::ONNXImporter::handleNode'

> Node [Clip@ai.onnx]:(317) parse error: OpenCV(4.5.5) D:\a\opencv-python\opencv-python\opencv\modules\dnn\src\onnx\onnx_importer.cpp:1613: error: (-2:Unspecified error) in function 'void __cdecl cv::dnn::dnn4_v20211220::ONNXImporter::parseClip(class cv::dnn::dnn4_v20211220::LayerParams &,const class opencv_onnx::NodeProto &)'

(expected: 'node_proto.input_size() == 1'), where
    'node_proto.input_size()' is 3
must be equal to
    '1' is 1

Can anybody point me to a solution, please?

Limit the combinations of targets and backends

Currently the combinations of targets and backends are too arbitrary, and some combinations, like DNN_BACKEND_CANN with DNN_TARGET_CPU, are not valid. So I propose the following changes (see the sketch after this list):

  • In demos, use a list of backendAndTargets in place of backends and targets.
  • In wrappers, drop setBackend and setTarget, use setBackendAndTarget instead.
  • In benchmark, use setBackendAndTarget in place of setBackend and setTarget.
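A sketch of the proposed backendAndTargets list; the pairs shown are the valid combinations mentioned across this page (OpenCV/CPU, CUDA, TIM-VX and CANN NPU):

import cv2 as cv

backend_target_pairs = [
    [cv.dnn.DNN_BACKEND_OPENCV, cv.dnn.DNN_TARGET_CPU],
    [cv.dnn.DNN_BACKEND_CUDA, cv.dnn.DNN_TARGET_CUDA],
    [cv.dnn.DNN_BACKEND_CUDA, cv.dnn.DNN_TARGET_CUDA_FP16],
    [cv.dnn.DNN_BACKEND_TIMVX, cv.dnn.DNN_TARGET_NPU],
    [cv.dnn.DNN_BACKEND_CANN, cv.dnn.DNN_TARGET_NPU],
]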

Add another demo for hand pose classification for MP-HandPose

As of now we already have a demo extracting hand poses with MP-HandPose, but it does not provide the functionality of classifying hand poses, such as the common 👌🏻, ✌🏻 and so on. By adding this functionality, we can also show users how to define and classify their own custom hand poses with MP-HandPose.

When using the SFace model in C++ to match two faces, the output is always 1.0000

I wrote a C++ demo using the SFace model, but I find the match output is always 1.0000.

The Python code has correct output (the face images have already been aligned and cropped from the original large image to 112*112):

sface_model = cv.FaceRecognizerSF.create(
    model="./face_recognition_sface_2021dec.onnx",
    config="",
    backend_id=0,
    target_id=0
)
img1 = cv.imread("./tmp/face_112x112_0.bmp")
feat1 = sface_model.feature(img1)
print(feat1)

img2 = cv.imread("./tmp/match/face_112x112_1.bmp")
feat2 = sface_model.feature(img2)

cosine_score = sface_model.match(feat1, feat2, 0)

print("cosine_score=", cosine_score)

The Python output's screenshot:
image

The C++ demo code is here:

    std::string sface_model_file = "./face_recognition_sface_2021dec.onnx";
    cv::Ptr<cv::FaceRecognizerSF> sface_model = nullptr;
    sface_model = cv::FaceRecognizerSF::create(sface_model_file,"",cv::dnn::DNN_BACKEND_OPENCV,cv::dnn::DNN_TARGET_CPU);
    printf("Init SFace model ok\n");

    cv::Mat face_mat1 = cv::imread("./tmp/face_112x112_0.bmp");
    cv::Mat face_feat1;
    //face_feat1.convertTo(face_feat1, CV_32FC3, 1.0 / 255, 0);
    sface_model->feature( face_mat1,face_feat1);

    // print feature values; they seem to be totally different from the values output in Python :-)
    for(int j=0;j<face_feat1.cols;j++)
    {
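        // note: FaceRecognizerSF::feature returns a CV_32F (float) Mat, so reading the
        // data as double here is likely why these values differ from the Python output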
        double *f = (double*)face_feat1.data + j;
        printf("%lf ",*f);
       
    }
    printf("\n");


    cv::Mat face_mat2 = cv::imread("./tmp/face_112x112_1.bmp");
    cv::Mat face_feat2;
    //face_feat2.convertTo(face_feat2, CV_32FC3, 1.0 / 255, 0);
    sface_model->feature( face_mat2,face_feat2);

    double  cosine_sim = sface_model->match( face_feat1,face_feat2,0);
    printf("sim=%.4lf\n",cosine_sim);

The C++ output's screenshot:
image

Env: centos on x64/gcc 8.3/python 3.6.8/opencv-4.5.5

Want help... :-) :-) Thanks!
