Object Detection with Simplified YOLO

  • Dewang Sultania
  • Tested on: Ubuntu 18.04, Intel Xeon E-2176M @ 2.70GHz 16GB, Quadro P2000 4GB (Personal Computer)
  • Dependencies: pytorch, tensorflow(tensorboard), numpy, scikit, matplotlib

Table of Contents

  1. Introduction
  2. Data Preprocessing
  3. Model Architecture
  4. Training Details
  5. Post Processing
  6. Results
  7. Run Instructions

Object detection is a fundamental task in computer vision. It consists of first localizing each object and then classifying it with a semantic label. Among recent deep-learning-based methods, YOLO is an extremely fast, real-time, multi-object detection algorithm.

This repository contains code for object detection using a simplified YOLO. The training data consists of 10K street-scene images with corresponding labels. Each image has dimensions 128X128X3, and the labels give the semantic class and bounding box of each object in the image.

The labels were provided in the format (class, x1, y1, x2, y2), where x1, y1 is the top-left corner of the bounding box and x2, y2 is the bottom-right corner. For each image, the provided labels were converted to an 8X8X8 ground truth matrix, which has the same dimensions as the output of the YOLO detection network. The conversion works as follows:

  • We consider a 16x16 image patch as a grid cell and thus divide the full image into 8x8 patches in the 2D spatial dimension. In the output activation space, one grid cell represents one 16x16 image patch with corresponding aligned locations.
  • For simplified YOLO, we use only one anchor box and assume the anchor size is the same as the grid cell size. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. This means that there is only one anchor for each object instance.
  • For each anchor, there are 8 channels, which encode Pr(Objectness), x, y, w, h, P(class=pedestrian), P(class=traffic light), and P(class=car).
  • Pr(Objectness) is the probability that this anchor contains an object rather than background. In the ground truth, "1" indicates object and "0" indicates background.
  • Channels 2-5 hold x, y, w, h: x and y give the offset to the center of the anchor box, and w and h give the box size relative to the image width and height.
  • Channels 6-8 hold the ground truth semantic label of the object, converted to a one-hot encoding for each anchor box.
  • If an anchor box does not contain any object (Pr=0), channels 2-8 do not need to be assigned, since they are not used during training.
Ground Truth Channels
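
The conversion described above can be sketched in NumPy as follows. This is a minimal illustration, not the repository's code: the function name `encode_labels`, the channels-last layout `gt[row, col, channel]`, and the class ids (0 = pedestrian, 1 = traffic light, 2 = car) are assumptions. It also assumes that x and y are measured from the top-left corner of the grid cell in units of the cell size, as in the original YOLO; if the offsets are measured from the anchor center instead, subtract 0.5 from both.

```python
import numpy as np

IMG_SIZE = 128            # image width and height in pixels
CELL = 16                 # grid cell (and anchor) size in pixels
GRID = IMG_SIZE // CELL   # 8 cells per side


def encode_labels(labels):
    """Convert rows of (class, x1, y1, x2, y2) into an (8, 8, 8) array.

    Assumed channel order per cell:
    [objectness, x, y, w, h, p_pedestrian, p_traffic_light, p_car]
    """
    gt = np.zeros((GRID, GRID, 8), dtype=np.float32)
    for cls, x1, y1, x2, y2 in labels:
        # Box center in pixels decides which cell is responsible.
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        col = min(int(cx // CELL), GRID - 1)
        row = min(int(cy // CELL), GRID - 1)

        gt[row, col, 0] = 1.0                     # objectness
        gt[row, col, 1] = cx / CELL - col         # x offset within the cell
        gt[row, col, 2] = cy / CELL - row         # y offset within the cell
        gt[row, col, 3] = (x2 - x1) / IMG_SIZE    # width relative to image
        gt[row, col, 4] = (y2 - y1) / IMG_SIZE    # height relative to image
        gt[row, col, 5:8] = 0.0                   # reset, then one-hot class
        gt[row, col, 5 + int(cls)] = 1.0
    return gt


# Example: one car whose center (56, 48) falls into cell row 3, col 3.
gt = encode_labels([(2, 40, 40, 72, 56)])
print(gt[3, 3])
```

With a single anchor per cell, two objects whose centers land in the same cell cannot both be represented; in this sketch the later label overwrites the earlier one.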

The model takes input with dimension of 128X128X3 and outputs an activation with dimension 8X8X8.

During training, the localization and classification errors are optimized jointly in a single loss function. In that loss, i indexes the grid cells and j indexes the anchor boxes at each grid cell. In our case, there is only one anchor box at each grid cell, so B = 1.
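
For reference, here is a minimal NumPy sketch of a YOLO-style sum-squared-error loss for the single-anchor case (B = 1), following the form of the loss in the original YOLO paper. It is not taken from this repository: the function name `yolo_loss`, the channels-last `(8, 8, 8)` layout, and the square root applied to w and h are assumptions, and the weights `lambda_coord = 5` and `lambda_noobj = 0.5` are the defaults from that paper rather than values confirmed here.

```python
import numpy as np


def yolo_loss(pred, target, lambda_coord=5.0, lambda_noobj=0.5):
    """YOLO-style loss for (8, 8, 8) arrays laid out as [row, col, channel].

    Assumed channel order: [objectness, x, y, w, h, 3 class scores].
    """
    obj = target[..., 0] == 1      # cells responsible for an object
    noobj = ~obj                   # background cells
    p, t = pred[obj], target[obj]  # shape (n_objects, 8)

    # Localization terms, counted only where an object exists.
    xy = np.sum((p[:, 1:3] - t[:, 1:3]) ** 2)
    # Clip so the square root is defined for negative predictions.
    wh = np.sum((np.sqrt(np.clip(p[:, 3:5], 0.0, None))
                 - np.sqrt(t[:, 3:5])) ** 2)

    # Objectness terms for object and background cells.
    conf_obj = np.sum((p[:, 0] - t[:, 0]) ** 2)
    conf_noobj = np.sum((pred[noobj][:, 0] - target[noobj][:, 0]) ** 2)

    # Classification term, counted only where an object exists.
    cls = np.sum((p[:, 5:8] - t[:, 5:8]) ** 2)

    return (lambda_coord * (xy + wh) + conf_obj
            + lambda_noobj * conf_noobj + cls)


# Example: a perfect prediction gives zero loss.
target = np.zeros((8, 8, 8))
target[2, 3] = [1, 0.5, 0.5, 0.25, 0.25, 0, 0, 1]
print(yolo_loss(target.copy(), target))
```

Because channels 2-8 of background cells are masked out by `obj`, their ground-truth values never enter the loss, which matches the note that they need not be assigned.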

During inference, the network predicts many overlapping, redundant bounding boxes. The redundant boxes are eliminated in two steps:

  • Get rid of predicted boxes with low objectness probability (Pc < 0.6).
  • After the first step, for each class, compute the IoU between all pairs of bounding boxes and cluster boxes with IoU > 0.5 into a group. For each group, keep the box with the highest Pc and suppress the others. This is referred to as non-max suppression.
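
The two steps above can be sketched as follows. This is an illustrative version, not the repository's implementation: the names `iou` and `postprocess` are hypothetical, boxes are assumed to be already decoded to pixel coordinates (x1, y1, x2, y2), and the grouping is done with the usual greedy procedure (repeatedly keep the highest-scoring box and drop everything that overlaps it by more than the threshold) rather than an explicit clustering pass.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def postprocess(boxes, scores, classes, score_thresh=0.6, iou_thresh=0.5):
    """Return indices of boxes kept after thresholding and per-class NMS."""
    keep = []
    for c in sorted(set(classes)):
        # Step 1: drop boxes of this class with low objectness.
        idx = [i for i in range(len(boxes))
               if classes[i] == c and scores[i] >= score_thresh]
        idx.sort(key=lambda i: scores[i], reverse=True)
        # Step 2: greedy non-max suppression within the class.
        while idx:
            best = idx.pop(0)
            keep.append(best)
            idx = [i for i in idx
                   if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


# Example: boxes 0 and 1 overlap heavily, box 3 has low objectness.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52),
         (80, 80, 120, 120), (10, 10, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.3]
classes = [2, 2, 2, 2]
print(postprocess(boxes, scores, classes))
```

Running suppression separately per class means that a pedestrian box and a car box covering the same region are both kept.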

Loss Curve
