RealTime2DObjectRecognition

Description

I developed a real-time 2D object recognition system. Each color image (or video frame) is first thresholded into a binary image that separates the foreground (the object) from the background. Morphological filters are then applied to clean up the binary image, and image segmentation identifies the object regions. Features are computed for each region and compared with those extracted from training data, and the object is classified based on its K nearest neighbors.
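The thresholding and segmentation steps can be sketched without any library, as below. This is a minimal illustration, not the project's actual code: the `GrayImage` struct, the function names, the fixed `cutoff`, the dark-object-on-bright-background assumption, and the 4-connectivity are all hypothetical choices made for the example, and the morphological clean-up step is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Row-major grayscale image; a hypothetical stand-in for one video frame.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Mark pixels darker than `cutoff` as foreground (1), everything else as
// background (0). Assumes dark objects on a bright background.
std::vector<std::uint8_t> thresholdImage(const GrayImage& img, std::uint8_t cutoff) {
    std::vector<std::uint8_t> binary(img.pixels.size());
    for (std::size_t i = 0; i < img.pixels.size(); ++i) {
        binary[i] = img.pixels[i] < cutoff ? 1 : 0;
    }
    return binary;
}

// Label 4-connected foreground regions with ids 1, 2, ... (0 = background)
// using a breadth-first flood fill. regionSizes[k] receives the pixel count
// of the region with id k + 1.
std::vector<int> labelRegions(const std::vector<std::uint8_t>& binary,
                              int width, int height,
                              std::vector<int>& regionSizes) {
    std::vector<int> labels(binary.size(), 0);
    regionSizes.clear();
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    for (int start = 0; start < width * height; ++start) {
        if (binary[start] == 0 || labels[start] != 0) continue;
        const int id = static_cast<int>(regionSizes.size()) + 1;
        int count = 0;
        std::queue<int> frontier;
        frontier.push(start);
        labels[start] = id;
        while (!frontier.empty()) {
            const int cur = frontier.front();
            frontier.pop();
            ++count;
            const int x = cur % width;
            const int y = cur / width;
            for (int d = 0; d < 4; ++d) {
                const int nx = x + dx[d];
                const int ny = y + dy[d];
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                const int next = ny * width + nx;
                if (binary[next] == 1 && labels[next] == 0) {
                    labels[next] = id;
                    frontier.push(next);
                }
            }
        }
        regionSizes.push_back(count);
    }
    return labels;
}
```

Sorting `regionSizes` in descending order would then give the largest regions to keep for feature extraction.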

Demo

Threshold the input video

Segment the image into regions

Oriented Bounding Box

To compute the oriented bounding box, I first calculated the angle of the axis of least central moment. Then, I rotated each component's coordinates by the negative of that angle so that the axis lines up with the image axes. Next, I identified the four corners of the bounding box in that aligned space and rotated them back to the original image space.
Additionally, I added a number to each bounding box, starting from 0. These numbers help users identify which object or box they want to label or classify, particularly when multiple objects are present. During the prediction mode, the predicted class/label for each box is placed on top of the box. The text is scaled proportionally to the size of the bounding box.
Note that the text is oriented to match the bounding box, using the same rotation approach described above. OpenCV only draws text horizontally, so labels near the edge of an image can get cut off even though the same text would fit within the frame once rotated. Extra work was done to handle this edge case.
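The bounding-box construction described above can be sketched as follows. This is an illustrative version under stated assumptions, not the project's code: `Point`, `OrientedBox`, and `orientedBoundingBox` are hypothetical names, and the orientation uses the standard formula angle = 0.5 * atan2(2 * mu11, mu20 - mu02) for the axis of least second central moment.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <limits>
#include <vector>

struct Point {
    double x;
    double y;
};

struct OrientedBox {
    double angle;                  // orientation of the axis of least central moment, radians
    std::array<Point, 4> corners;  // corners in the original image coordinates
};

// Assumes `region` holds at least one pixel coordinate of a single component.
OrientedBox orientedBoundingBox(const std::vector<Point>& region) {
    const double n = static_cast<double>(region.size());
    double cx = 0.0, cy = 0.0;
    for (const Point& p : region) {
        cx += p.x;
        cy += p.y;
    }
    cx /= n;
    cy /= n;

    // Second-order central moments.
    double mu20 = 0.0, mu02 = 0.0, mu11 = 0.0;
    for (const Point& p : region) {
        const double dx = p.x - cx;
        const double dy = p.y - cy;
        mu20 += dx * dx;
        mu02 += dy * dy;
        mu11 += dx * dy;
    }
    const double angle = 0.5 * std::atan2(2.0 * mu11, mu20 - mu02);
    const double c = std::cos(angle);
    const double s = std::sin(angle);

    // Rotate every point by -angle and track the extent along both axes.
    double minU = std::numeric_limits<double>::max();
    double maxU = std::numeric_limits<double>::lowest();
    double minV = std::numeric_limits<double>::max();
    double maxV = std::numeric_limits<double>::lowest();
    for (const Point& p : region) {
        const double dx = p.x - cx;
        const double dy = p.y - cy;
        const double u = dx * c + dy * s;
        const double v = -dx * s + dy * c;
        minU = std::min(minU, u);
        maxU = std::max(maxU, u);
        minV = std::min(minV, v);
        maxV = std::max(maxV, v);
    }

    // Rotate the four aligned corners back by +angle into image space.
    const double us[4] = {minU, maxU, maxU, minU};
    const double vs[4] = {minV, minV, maxV, maxV};
    OrientedBox box;
    box.angle = angle;
    for (int i = 0; i < 4; ++i) {
        box.corners[i] = {cx + us[i] * c - vs[i] * s, cy + us[i] * s + vs[i] * c};
    }
    return box;
}
```

The same forward and inverse rotation can be reused to place a label along the box's axis instead of horizontally.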

Multi-object Classification Using KNN

The Mean Scaled Euclidean distance metric was employed to measure the similarity between objects. To distinguish between known and unknown objects, a threshold distance value of 2 was chosen. Objects with a distance greater than 2 were classified as unknown. The results indicate that all predictions were accurate, which may be attributed to the distinct shapes of each object.
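A rough sketch of this kind of classifier is shown below. Several details are assumptions rather than the project's actual implementation: the distance is taken to be the square root of the mean squared difference after dividing each feature by its standard deviation over the database, the "Unknown" decision uses the distance to the single nearest sample, and the K nearest samples vote by simple majority with ties going to the closer label. `Sample`, `featureStdDevs`, `scaledDistance`, and `classify` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Sample {
    std::vector<double> features;
    std::string label;
};

// Per-feature standard deviation over the database. A zero deviation is
// replaced by 1 so the scaling never divides by zero.
std::vector<double> featureStdDevs(const std::vector<Sample>& db) {
    const std::size_t dims = db.front().features.size();
    const double n = static_cast<double>(db.size());
    std::vector<double> mean(dims, 0.0), sd(dims, 0.0);
    for (const Sample& s : db) {
        for (std::size_t i = 0; i < dims; ++i) mean[i] += s.features[i] / n;
    }
    for (const Sample& s : db) {
        for (std::size_t i = 0; i < dims; ++i) {
            const double d = s.features[i] - mean[i];
            sd[i] += d * d / n;
        }
    }
    for (double& v : sd) v = v > 0.0 ? std::sqrt(v) : 1.0;
    return sd;
}

// Assumed form of the metric: sqrt(mean_i(((a_i - b_i) / sd_i)^2)).
double scaledDistance(const std::vector<double>& a, const std::vector<double>& b,
                      const std::vector<double>& sd) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = (a[i] - b[i]) / sd[i];
        sum += d * d;
    }
    return std::sqrt(sum / static_cast<double>(a.size()));
}

// Returns "Unknown" when even the nearest sample is farther than the threshold.
std::string classify(const std::vector<Sample>& db, const std::vector<double>& query,
                     std::size_t k, double unknownThreshold = 2.0) {
    if (db.empty()) return "Unknown";
    const std::vector<double> sd = featureStdDevs(db);
    std::vector<std::pair<double, std::string>> ranked;
    for (const Sample& s : db) {
        ranked.emplace_back(scaledDistance(query, s.features, sd), s.label);
    }
    std::sort(ranked.begin(), ranked.end());
    if (ranked.front().first > unknownThreshold) return "Unknown";

    k = std::max<std::size_t>(1, std::min(k, ranked.size()));
    std::map<std::string, int> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[ranked[i].second];

    // Walk neighbors from nearest to farthest so ties favor the closer label.
    std::string best;
    int bestVotes = 0;
    for (std::size_t i = 0; i < k; ++i) {
        const int v = votes[ranked[i].second];
        if (v > bestVotes) {
            bestVotes = v;
            best = ranked[i].second;
        }
    }
    return best;
}
```

With `k = 1` this reduces to nearest-neighbor classification, and with `k = 3` it corresponds to a 3-NN vote.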

Three features were used: the height-to-width ratio, the percentage of the oriented bounding box that is filled, and the first Hu moment. The first Hu moment is the sum of the normalized second-order central moments in x and y, that is, the central moments divided by the square of the object's area, which makes it analogous to a moment of inertia about the centroid. It measures how spread out the object is around its centroid, with smaller values indicating a more compact shape and larger values indicating a more spread-out shape.

Real Time Demo

  1. Multiple Objects Recognition: https://drive.google.com/file/d/10MzExfBppKaCkNESPtcJaIyz5V8hXSRi/view?usp=sharing
  2. Recognizing Unknown Objects: https://drive.google.com/file/d/1n3vxoAAsuEY3D5gY3iBGcrnnUgNtvobG/view?usp=sharing

Instructions

Run objectRecognition.cpp.

Some useful hotkeys:

  • s = Save Frame
  • q = Quit program
  • p = Nearest Neighbor classification
  • k = 3-NN classification
  • a = attach a label to a bounding box (press at any time after pressing k or p)
  • t = thresholded binary image
  • m = threshold + morphological filtered binary image
  • c = threshold + morphological filtered binary image, with the 5 largest regions segmented and colored

More notes: To gather training data, users first activate the prediction mode by pressing the 'p' key for the nearest-neighbor classifier or the 'k' key for the 3-NN classifier. The live video input is dynamically thresholded, cleaned, and segmented, and the 5 largest regions are identified and marked with oriented bounding boxes. The predicted label for each item is displayed alongside its bounding box. If an object is not recognized, it is labeled "Unknown".

Users can press 'a' to add a new label to the database. The bounding boxes are numbered sequentially starting from 0, and users are prompted to enter the bounding box number and the corresponding label or class. This approach can also be used when the system makes a wrong prediction: users can add more data to steer the system toward correct predictions in the future.

OS and IDE

OS: macOS Ventura 13.0.1 (22A400)

IDE: Xcode

Acknowledgement
