
NCNN Python


Overview

This is a collection of Python programs using the NCNN framework. NCNN is an open-source implementation of convolutional neural networks by Tencent. NCNN is optimized for mobile platforms; often it is faster to run a model on the CPU than on the GPU.

The code was converted to Python and accelerated with numpy and cython. A basic framework was developed to handle models, objects, bounding boxes, keypoints, and transformations. Python versions of Non Maximum Suppression and image manipulation were implemented. In general, reimplementing existing C programs in Python does not make them faster; the motivation for this work was to provide optimized examples for the Python platform.

There are several sites listing current implementations of CNN models that have been converted to NCNN; however, there is no single authoritative repository.

Requirements

To run the programs you will need to install the following packages:

  • NCNN: pip install ncnn
  • numpy
  • OpenCV: pip install opencv-contrib-python
  • camera: pip install camera-util
  • cython (suggested)

OpenCV on Raspi

sudo pip3 install opencv-contrib-python==4.5.3.56

As time progresses the version number might need to be increased, but many newer versions have installation issues.

Optimizations

To increase performance, the following rules were observed:

  • Use OpenCV-based algorithms whenever possible
  • OpenCV data manipulations are faster than Numpy's
  • OpenCV is slightly faster than NCNN when manipulating images (e.g. resize)
  • Avoid indexing over tensor/matrix dimensions
  • Use Numpy boolean indexing when possible; avoid np.where
  • Avoid np.append, np.vstack, and np.stack when adding small amounts of data; use a regular Python list and its append method instead (see the sketch after this list)
  • If np.concatenate is needed, apply it once to a list of all objects that need concatenation
  • If np.max and np.argmax are needed for a 3D array, use the examples shown below
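
A minimal sketch of the list-append and boolean-indexing patterns (the data here is illustrative):

import numpy as np

detections = []                                      # collect rows in a plain Python list
for frame in ([[0, 0, 10, 10]], [[5, 5, 20, 20]]):   # e.g. detections arriving per frame
    detections.extend(frame)                         # cheap list append/extend
boxes = np.array(detections, dtype=np.float32)       # convert to numpy once, at the end

scores = np.array([0.9, 0.4])
boxes  = boxes[scores > 0.5]                         # boolean mask instead of np.where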

Torch dependencies were removed. A useful guide is listed here: torch to numpy

Max Functions

Numpy does not provide a single function that returns both the maximum and its indices in a data matrix. It is necessary to rearrange the matrix, find the location of the maximum, and then convert it back to indices. Often the maximum is needed for thresholding and the indices are needed for further location calculations.

3D Max Function to find max and location in each plane

This is often needed to threshold scores and find keypoints or bounding boxes.

import numpy as np

out2D       = out3D.reshape(out3D.shape[0],-1)      # convert n,m,o array to n,m*o
idx         = out2D.argmax(1)                       # find max location in m*o range for each n
max_y,max_x = np.unravel_index(idx,out3D.shape[1:]) # unravel the location to m,o coordinates
max_out     = out2D[np.arange(len(idx)),idx]        # obtain max
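
For example, the per-plane maxima can then be thresholded to keep only confident keypoint locations (the threshold value is illustrative):

keep   = max_out > 0.3                          # threshold the per-plane maxima
points = np.stack((max_x, max_y), axis=1)[keep] # x,y locations of confident keypoints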

3D Max Function to find max and location along axis

The location of the maxima along an axis is needed when anchors are used in object detection.

k           = np.argmax(out3D,axis=0)               # max class score location along n axis
n,o         = out3D.shape[1:]                       # spatial dimensions of the remaining axes
I,J         = np.ogrid[:n,:o]                       # index grids; there is a max at each location of m,o
class_score = out3D[k, I, J]                        # max class score

Implementations

| Implementation | Author | Website | Article | Image Size | Notes | Extraction [ms] | Pipeline [ms] |
|----------------|--------|---------|---------|------------|-------|-----------------|---------------|
| Object Detection | | | | | | | |
| fastestdet | Xueha Ma | FastestDet | Article | 352x352 | anchor-free | 8.5 | 11.4 |
| yolo5s | QEngineering | | | 640x640 | | TBD | TBD |
| yolo7-tiny | Xiang Wu | Original, QEngineering | Article | 640, base=64 | variable input size | 40 | 51 |
| yolo8 | | | | | | TBD | TBD |
| yolox (nano) | Megvii-BaseDetection & FeiGeChuanShu | Original, QEngineering, Android | | 416x416 | | | |
| yolox (tiny) | | Original, QEngineering, Android | | 416x416 | | TBD | TBD |
| yolox (small) | | Original, QEngineering, Android | | 640x640 | | TBD | TBD |
| Hand Detector | | | | | | | |
| blaze (palm-lite/full) | Vidur Satija (blazepalm), FeiGeChuanShu (android) | Hand model, blazepalm, Android | | 192x192 | | 8.09 (full), 7.1 (light) | 8.9 (full) |
| nanodet (nanodet-hand) | FeiGeChuanShu | Hand model, QEngineering | | | | | |
| yolox (yolox_hand_relu/swish) | FeiGeChuanShu | Hand model, QEngineering | | | | | |
| pfld (handpose) | FeiGeChuanShu | QEngineering | | | | | |
| Hand Skeleton | | | | | | | |
| mediapipe (hand-lite/full-op) | Vidur Satija (blazepalm), FeiGeChuanShu (android) | Skeleton model, blazepalm, Android | | 224x224 | 3D | 6.8 (full), 5.1 (light) | 7.1 (full) |
| Face Detector | | | | | | | |
| blazeface | | | | | | | |
| retinaface (retinaface-R50) | Tencent | Tencent, QEngineering, Origin | | variable, 320x320 | no scaling or padding needed | 3.6 | 7 |
| scrfd | | | | | | | |
| ultraface | | | | | | | |
| Face Detector Support | | | | | | | |
| live | yuan hao / Minivision_AI | Website | | 80x80 | dual stage | 3.5 | 5.3 |
| mask | | | | | | | |
| Face Recognition | | | | | | | |
| arcface (mobilefacenet) | Xinghao Chen | QEngineering, Origin | | 112x112 | no NMS, anchor-free | 6 | 6.4 |
| Person Detector | | | | | | | |
| ultralightpose (person) | Xueha Ma | MobileNetV2-YOLOv3-Nano | | 320x320 | no NMS, anchor-free | 8.2 | 9.4 |
| blazeperson | Valentin Bazarevsky / Google | Android, Blazepose Google | Paper | 224x224 | | 9.4 | 11 |
| Person Pose | | | | | | | |
| ultralightpose (skeleton) | Xueha Ma | UltralightPose | | 192x256 | no NMS, anchor-free | 5.84 | 6.4 |
| blazepose | Valentin Bazarevsky / Google | Android, Blazepose, DepthAI Blazepose | Paper | 256x256 | 3D | 6.3 (light), 7.9 (full), 22 (heavy) | 6.6 (light) |

Notes

| Name | Anchors | Application of Anchors | NMS | Softmax | Sigmoid/Tanh |
|------|---------|------------------------|-----|---------|--------------|
| age | not available | | | | |
| race | not available | | | | |
| arcface | no anchors | none | no | none | none |
| blazeface | detectstride8,16 anchors | generate_proposals | NMS | none | sigmoid |
| blazehand hand | anchors 8,16,16,16 | none | NMS | warp | sigmoid |
| blazehand skeleton | uses mediapipe hand | | | | |
| blazepose person | anchors 8,16,32,32,32 | decode_boxes | NMS | warp | sigmoid |
| blazepose skeleton | no anchors | none | no | unwarp | none |
| blur | computes high-frequency content in image | | | | |
| fastestdet | no anchors | none | NMS | none | sigmoid, tanh |
| handpose hand | detectstride8,16,32 anchors | generate_proposals | NMS | softmax | none |
| handpose skeleton | no anchors | none | no | none | none |
| live | 2 models | average of the confidence of both models | | | |
| mask | not implemented yet | | | | |
| mediapipehandpose skeleton | no anchors | none | none | none | none |
| retinaface | anchors 8,16,32 | generateproposal | NMS | none | exp |
| scrfd (9 different models) | anchors 8,16,32 | generateproposal | NMS | none | none |
| ultraface | anchors 8,16,32,64 | generateBBOX | NMS | none | none |
| ultralightpose person | no anchors | none | no | none | none |
| ultralightpose skeleton | no anchors | none | no | none | none |
| yolo5 | not implemented yet | | | | |
| yolo7 | detectstride8,16,32 anchors | detect_stride | NMS | | sigmoid |
| yolo8 | grid_strides table | generate_proposal | NMS | softmax | sigmoid |
| yolox | not implemented yet | | | | |

Test Programs

  • test_arcface: runs retinaface to find faces, then extracts the ROI and applies arcface to compute embeddings
  • test_blazehandpose: uses blaze palm to find palms, then runs handpose to find the skeleton
  • test_blazeperson: finds people
  • test_blazepersonpose: finds people, then runs skeleton detection
  • test_blur: extracts the ROI and assesses whether the image is blurred
  • test_fastestdet
  • test_gestures: detects the palm, extracts the ROI, calculates the skeleton, then interprets the hand sign
  • test_handpose: finds palms, then computes the skeleton
  • test_live: determines whether the image of a face is live or fake
  • test_retinaface: detects faces
  • test_yolo7: detects objects

History

2023 - Initial Release
Urs Utzinger

Documentation

Documentation of utility functions:

utils_object.py

Object Types

objectTypes = {'rect':0, 'yolo80':1, 'hand':2, 'palm7':3, 'hand21':4, 'face':5, 
               'face5':6, 'person':7, 'person4':8, 'person17':9, 'person39':10 }

Simple objects with bounding box: rect, hand, face, person

Objects with keypoints: palm7, hand21, face5, person4, person17, person39, where the number indicates the number of keypoints

Objects with classes: yolo80 (80 classes)

Object Structure

object.type = objectTypes['rect']  # object type
object.bb   = np.array([           # bounding box, 4 or 2 points
                [[-1, -1]],
                [[-1, -1]],
                [[-1, -1]],
                [[-1, -1]]
              ], dtype=np.float32)
object.p    = -1.                  # probability
object.l    = -1                   # label number
object.k    = []                   # keypoints
object.v    = []                   # keypoints visibility

Object Methods

True or False:

  • hasKeypoints, hasVisibility, isRotated, is1D, is2D, is3D

Regular:

  • extent: max-min of bounding box

  • center: center of bounding box

  • width_height: on rotated bounding box

  • relative2absolute: scale from 0..1 to 0..width/height

  • transform: apply cv2.transform to bounding box and keypoints (see the sketch after this list)

  • intransform: inverse transform

  • resize: resize and shift bounding box

  • square: ensure square bounding box, takes largest dimension

  • angle: angle of keypoints for face5, palm7, person4, person17, person39

  • rotateBoundingBox: rotate rectangular bounding box by angle

  • draw: draw the bounding box and keypoints

  • drawRect: draw bounding box

  • printText: prints text at the top left corner of the bounding box

  • drawObjects: draw multiple objects

  • calculateBox: phased out
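
A minimal sketch of what the transform step amounts to, using OpenCV's cv2.transform on a 4-point bounding box (the matrix and box corners here are illustrative):

import cv2
import numpy as np

bb = np.array([[[0, 0]], [[100, 0]], [[100, 50]], [[0, 50]]], dtype=np.float32)  # 4-point box
M  = cv2.getRotationMatrix2D((50, 25), 30.0, 1.0)  # 2x3 affine: rotate 30 deg about box center
bb_rotated = cv2.transform(bb, M)                  # applies M to every point of the box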

LandMarksSmoothingFilter

  • get_object_scale
  • apply
  • get_alpha
  • reset

OneEuroFilter

  • get_alpha
  • apply

LowPassFilter

  • apply
  • apply_with_alpha
  • has_last_raw_value
  • last_raw_value
  • last_value
  • reset

utils_image.py

  • resizeImage: resize to a new width and height, padding optional
  • resizeImage2TargetSize: resize so that width or height matches the target size, and pad so that width or height is a multiple of base
  • extractObjectROI: extract the image region of any bounding box and scale it to the target size
  • extractRectROI: extract the image region of an un-rotated bounding box

utils_hand.py

  • gesture: from the hand skeleton, selects one of: none, point, swear, thumbs up/down/left/right, vulcan, oath, paper, rock, victory, finger, hook, pinky, one, two, three, four, ok

utils_face.py

  • overlaymatch: overlays the matched face on top of the detected face

utils_cnn.py

  • nms_cv: non maximum suppression; filters bounding boxes based on overlap using OpenCV's NMS, the fastest approach (see the sketch after this list)

  • nms: simple, fully Python-based

  • nms_combination: likely not needed

  • nms_weighted: uses a weighted approach for overlapping bounding boxes

  • nms_blaze: original Python code for blaze, includes mathematical explanations

  • matchEmbeddings: compares a given embedding to a list of embeddings and finds the closest match

  • Zscore: (data - mean(data)) / std(data)

  • CosineDistance: between two embeddings

  • EuclidianDistance: between two embeddings

  • l2normalize: x/sqrt(sum(x*x)); try cv2.norm instead

  • findThreshold: might be obsolete
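
A minimal sketch of the OpenCV-backed suppression used by nms_cv, together with the cosine distance from above (the boxes, scores, and thresholds are illustrative; the wrappers' exact signatures are in utils_cnn.py):

import cv2
import numpy as np

boxes  = [[10, 10, 50, 80], [12, 12, 50, 80], [200, 50, 40, 40]]  # [x, y, w, h]
scores = [0.9, 0.75, 0.6]
keep   = cv2.dnn.NMSBoxes(boxes, scores, score_threshold=0.5, nms_threshold=0.45)
kept_boxes = [boxes[i] for i in np.array(keep).flatten()]         # boxes surviving suppression

a, b = np.random.rand(128), np.random.rand(128)                   # two embeddings
cosine_distance = 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))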

utils_blaze.py

  • decode_boxes
  • Anchor object
  • Anchor Params
  • generate_anchors
  • generate_anchors_np
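
A minimal sketch of what SSD-style anchor generation looks like (generate_anchors_np follows MediaPipe's scheme; the input size, strides, and the one-anchor-per-cell simplification here are assumptions for illustration, not the exact parameters of utils_blaze.py):

import numpy as np

def generate_anchors_sketch(input_size=192, strides=(8, 16, 16, 16)):
    anchors = []
    for stride in strides:
        grid = input_size // stride            # feature map width/height at this stride
        for y in range(grid):
            for x in range(grid):
                anchors.append([(x + 0.5) / grid, (y + 0.5) / grid])  # normalized center
    return np.array(anchors, dtype=np.float32)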

utils_affine.py

  • composeAffine (T,R,Z,S): creates a transformation matrix from translation, rotation, zoom/scale, and shear (see the sketch below)
  • decomposeAffine23: converts a transformation matrix back to T,R,Z,S
  • decomposeAffine44: converts a transformation matrix to T,R,Z,S
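
A minimal sketch of the composition step (the argument names and conventions here are illustrative; the exact signatures live in utils_affine.py):

import numpy as np

def composeAffine_sketch(t=(0.0, 0.0), theta=0.0, zoom=(1.0, 1.0), shear=0.0):
    T = np.array([[1, 0, t[0]], [0, 1, t[1]], [0, 0, 1]])        # translation
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])             # rotation
    Z = np.array([[zoom[0], 0, 0], [0, zoom[1], 0], [0, 0, 1]])  # zoom/scale
    S = np.array([[1, shear, 0], [0, 1, 0], [0, 0, 1]])          # shear
    return (T @ R @ Z @ S)[:2, :]  # 2x3 affine matrix as used by cv2.warpAffine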

setup.py

Instructions to cythonize the subroutines.

Affine and Warp Transformations

Regions of interest can be extracted from images using affine or warp transformations.

  • An affine transformation includes scaling, rotation, translation, and shear. It uses a 2x3 matrix and preserves the parallelism of lines.

  • A warp transformation additionally includes perspective and distortion.

Affine transformation (scaling, rotation, translation, shear)

trans_mat = cv2.getAffineTransform(srcPts, dstPts)              # 2x3 matrix from 3 point pairs
img       = cv2.warpAffine(src=img, M=trans_mat, dsize=(w, h))  # dsize (output size) is required
src_pt    = np.array([[(x, y)]], dtype=np.float32)
dst_pt    = cv2.transform(src_pt, trans_mat)


R = [ [a11, a12, t13]
      [a21, a22, t23]
      [  0,   0,   1] ]

dst = [ A | t ] {src}
      [ 0 | 1 ] {1} 

dst = A * src + t
        
dst_x = a11*x + a12*y + t13 
dst_y = a21*x + a22*y + t23

Rotation around the origin

R = [ [cos(θ), -sin(θ), 0]
      [sin(θ),  cos(θ), 0]
      [0     ,  0     , 1] ]

Rotation around the center (cx,cy); this is the same as T(cx,cy)∗R(θ)∗T(−cx,−cy)

R = [ [cos(θ), -sin(θ), -cx⋅cos(θ)+cy⋅sin(θ)+cx]
      [sin(θ),  cos(θ), −cx⋅sin(θ)−cy⋅cos(θ)+cy]
      [0     ,  0     ,  1] ]
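
OpenCV builds this matrix directly; note that cv2.getRotationMatrix2D takes the angle in degrees and treats positive angles as counter-clockwise, so the signs may differ from the formula above (the center and angle here are illustrative):

import cv2

cx, cy = 100.0, 50.0
M = cv2.getRotationMatrix2D((cx, cy), 30.0, 1.0)  # center, angle [deg], scale -> 2x3 matrix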

Translation

T = [ [1, 0, dx],
      [0, 1, dy],
      [0, 0, 1]]

Scaling

S = [ [sx, 0, 0],
      [0, sy, 0],
      [0,  0, 1]]

Shear

H = [ [ 1, sx, 0],
      [sy,  1, 0],
      [ 0,  0, 1]]

There are also reflection and mirror-image transformations.

Warp transformation

R = [ [a11, a12, a13]
      [a21, a22, a23]
      [a31, a32, a33] ]

dst_x = (a11*x + a12*y + a13) / (a31*x + a32*y + a33)
dst_y = (a21*x + a22*y + a23) / (a31*x + a32*y + a33)
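
A minimal sketch with OpenCV's perspective functions (the image and the four point pairs are illustrative):

import cv2
import numpy as np

img    = np.zeros((240, 320, 3), dtype=np.uint8)        # placeholder image
srcPts = np.array([[0, 0], [319, 0], [319, 239], [0, 239]], dtype=np.float32)
dstPts = np.array([[10, 5], [300, 20], [310, 230], [5, 235]], dtype=np.float32)
warp_mat = cv2.getPerspectiveTransform(srcPts, dstPts)  # 3x3 matrix from 4 point pairs
warped   = cv2.warpPerspective(img, warp_mat, (320, 240))  # includes the perspective division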

Pip upload

Note to myself:

py -3 setup.py check
py -3 setup.py sdist
py -3 setup.py bdist_wheel
pip3 install dist/thenewpackage.whl
twine upload dist/*
