subodh-malgonde / vehicle-detection

Detect vehicles in a video

Jupyter Notebook 99.95% Python 0.05%
deep-learning yolov1-keras yolov1 yolo convolutional-neural-networks vehicle-detection self-driving-car

vehicle-detection's Introduction

Vehicle Detection

This repository contains code for a project I did as part of Udacity's Self-Driving Car Nanodegree program. The goal is to write a software pipeline that detects vehicles in a video.

The code is available in Vehicle_Detection.ipynb.

Algorithm Used: You Only Look Once (YOLO) v1

Brief Intro

Traditional object detection systems, based on classical computer vision techniques, repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image.

Other approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.

YOLO reframes object detection as a single regression problem, straight from image pixels to bounding box coordinates and class probabilities. A single convolutional network simultaneously predicts multiple bounding boxes and class probabilities for those boxes. YOLO trains on full images and directly optimizes detection performance.

In this project we will implement tiny-YOLO v1. Full details of the network, training and implementation are available in the paper - http://arxiv.org/abs/1506.02640

YOLO Output

YOLO divides the input image into an SxS grid. If the center of an object falls into a grid cell, that grid cell is responsible for detecting that object. Each grid cell predicts B bounding boxes and confidence scores for those boxes.

Confidence is defined as (probability that the grid cell contains an object) multiplied by (intersection over union of the predicted bounding box with the ground truth), that is:

Confidence = Pr(Object) x IOU_truth_pred.                                                      (1)

If no object exists in that cell, the confidence scores should be zero. Otherwise we want the confidence score to equal the intersection over union (IOU) between the predicted box and the ground truth.
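Equation (1) can be made concrete with a small sketch. The function names below are hypothetical and not taken from the notebook; boxes are assumed to be (x_center, y_center, w, h) tuples expressed in one shared coordinate frame.

```python
def iou_center_format(box_a, box_b):
    """Intersection over union of two boxes given as (x_center, y_center, w, h)."""
    # Convert centre/size representation to corner coordinates.
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2

    # Overlap is clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0


def confidence_target(pr_object, predicted_box, truth_box):
    """Equation (1): Pr(Object) multiplied by the IOU with the ground truth."""
    return pr_object * iou_center_format(predicted_box, truth_box)
```

With pr_object set to 0 for an empty cell the target is 0, and with pr_object set to 1 it reduces to the IOU, matching the two cases described above.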

Each bounding box consists of 5 predictions:

  1. x
  2. y
  3. w
  4. h
  5. confidence

The (x, y) coordinates represent the center of the box relative to the bounds of the grid cell. The width and height are predicted relative to the whole image. Finally, the confidence prediction represents the IOU between the predicted box and any ground truth box.

Each grid cell also predicts C conditional class probabilities, Pr(Class_i | Object). These probabilities are conditioned on the grid cell containing an object. We only predict one set of class probabilities per grid cell, regardless of the number of boxes B.

At test time we multiply the conditional class probabilities and the individual box confidence predictions,

Pr(Class|Object) x Pr(Object) x IOU_truth_pred = Pr(Class) x IOU_truth_pred                    (2)

which gives us class-specific confidence scores for each box. These scores encode both the probability of that class appearing in the box and how well the predicted box fits the object.

So at test time, the final output for each image is a vector of length S x S x (B x 5 + C).

The Model

Architecture

The model architecture consists of 9 convolutional layers, followed by 3 fully connected layers. Each convolutional layer is followed by a Leaky ReLU activation function with an alpha of 0.1. Each of the first 6 convolutional layers is also followed by a 2x2 max pooling layer.
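As a quick sanity check on this layout, the sketch below traces the spatial size of the feature map through the 9 convolutional layers. It assumes stride-1 convolutions with "same" padding (so a convolution does not change the spatial size) and 2x2 pooling with stride 2. These are assumptions about the notebook's model, and the function name is hypothetical.

```python
def trace_spatial_size(input_size=448, num_conv=9, num_pooled=6):
    """Return the feature-map side length after each convolutional block."""
    size = input_size
    sizes = []
    for i in range(num_conv):
        # Assumed: a stride-1, "same"-padded convolution keeps the size unchanged.
        if i < num_pooled:
            # Assumed: 2x2 max pooling with stride 2 halves the side length.
            size //= 2
        sizes.append(size)
    return sizes


print(trace_spatial_size())
```

Under these assumptions six poolings reduce 448 to 448 / 2^6 = 7, which lines up with the 7 x 7 grid used for the output.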

[Figure: model architecture]

Implementation

Pre-processing

Area of interest, cropping and resizing

Input to the model is a batch of 448x448 images. So we first determine the area of interest for each image. We only consider this portion of the image for prediction, since cars won't be present all over the image, just on the roads in the lower portion of the image. Then this cropped image is resized to a 448x448 image.

Normalization

Each image pixel is normalized to have values between -1 and 1. We use simple min-max normalization to achieve this.
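The pre-processing steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: the crop window is a made-up placeholder, the nearest-neighbour resize is a NumPy stand-in for a proper image resize (for example one from OpenCV), and the min-max normalization assumes 8-bit pixels with a fixed range of 0 to 255.

```python
import numpy as np


def preprocess(image, crop=(300, 650, 500, 1280), size=448):
    """Crop the area of interest, resize to size x size, scale pixels to [-1, 1].

    image: H x W x 3 uint8 array.
    crop:  (top, bottom, left, right) in pixels; hypothetical values.
    """
    top, bottom, left, right = crop
    cropped = image[top:bottom, left:right]

    # Nearest-neighbour resize: pick one source row/column per output row/column.
    h, w = cropped.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = cropped[rows][:, cols]

    # Min-max normalization with assumed min 0 and max 255: 0 -> -1, 255 -> 1.
    return 2.0 * (resized.astype(np.float32) / 255.0) - 1.0
```

The crop window also has to be remembered for later, because predicted boxes are relative to the cropped region rather than the full frame.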

Training

I have used pre-trained weights for this project. Training is done in 2 parts.

Part 1: Training for classification

This model was trained on the ImageNet 1000-class classification dataset. For this we take the first 6 convolutional layers followed by a fully connected layer.

Part 2: Training for detection

The model is then converted for detection. This is done by adding 3 convolutional layers and 3 fully connected layers. The modified model is then trained on PASCAL VOC detection dataset.

The pre-trained weights for this model (180 MB) are available here.

[Figure: YOLO output for an input image]

Post Processing

The model was trained on PASCAL VOC dataset. We use S = 7, B = 2. PASCAL VOC has 20 labelled classes so C = 20. So our final prediction, for each input image, is:

output tensor length = S x S x (B x 5 + C)
output tensor length = 7 x 7 x (2x5 + 20)
output tensor length = 1470.

The structure of the 1470 length tensor is as follows:

  1. The first 980 values correspond to the probabilities for each of the 20 classes for each grid cell. These probabilities are conditioned on an object being present in that grid cell.
  2. The next 98 values are the confidence scores for the 2 bounding boxes predicted by each grid cell.
  3. The last 392 values are the coordinates (x, y, w, h) for the 2 bounding boxes per grid cell.
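The layout above can be unpacked with a few slices and reshapes. The sketch below assumes each segment is stored cell by cell (row-major over the 7 x 7 grid); that ordering is an assumption, and split_output is a hypothetical name.

```python
import numpy as np

S, B, C = 7, 2, 20


def split_output(net_out):
    """Split the flat 1470-vector into class probabilities, confidences, boxes."""
    assert net_out.shape == (S * S * (B * 5 + C),)
    n_probs = S * S * C  # 980 class probabilities
    n_confs = S * S * B  # 98 box confidences

    probs = net_out[:n_probs].reshape(S, S, C)
    confs = net_out[n_probs:n_probs + n_confs].reshape(S, S, B)
    boxes = net_out[n_probs + n_confs:].reshape(S, S, B, 4)  # 392 coordinates
    return probs, confs, boxes
```

After the split, probs[row, col] holds the 20 class probabilities of one grid cell, and boxes[row, col, b] holds (x, y, w, h) for its b-th box.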

As you can see in the above image, each input image is divided into an S x S grid, and for each grid cell our model predicts B bounding boxes (each with a confidence score) and C class probabilities. There is a fair amount of post-processing involved to arrive at the final bounding boxes based on the model's predictions.

Class score threshold

We reject boxes whose class score (equation 2), computed at test time, falls below a threshold of 0.2.
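A minimal sketch of this step, assuming the class probabilities are already shaped (S, S, C) and the box confidences (S, S, B). The function names are hypothetical.

```python
import numpy as np


def class_scores(probs, confs):
    """Equation (2): class probability times box confidence.

    probs: (S, S, C), confs: (S, S, B) -> scores: (S, S, B, C) via broadcasting.
    """
    return probs[:, :, None, :] * confs[:, :, :, None]


def select_detections(probs, confs, threshold=0.2):
    """Return (row, col, box_index, class_index, score) for scores above threshold."""
    scores = class_scores(probs, confs)
    hits = np.argwhere(scores > threshold)
    return [
        (int(r), int(c), int(b), int(k), float(scores[r, c, b, k]))
        for r, c, b, k in hits
    ]
```

Everything at or below the threshold is dropped here, so only a handful of candidate boxes usually reach the duplicate-removal step.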

Reject overlapping (duplicate) bounding boxes

If multiple bounding boxes for the same class overlap with an IOU of more than 0.4 (that is, the intersecting area is more than 40% of the union area of the boxes), then we keep the box with the highest class score and reject the other box(es).
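One common way to do this is greedy non-maximum suppression, sketched below. It assumes boxes have already been converted to corner format (x1, y1, x2, y2) in one shared coordinate frame and that each detection is a (class_id, score, box) tuple; both conventions and the function names are assumptions made for illustration.

```python
def iou_corners(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.4):
    """Keep the highest-scoring box among same-class boxes that overlap too much."""
    kept = []
    # Visit detections from highest to lowest score.
    for cls, score, box in sorted(detections, key=lambda d: d[1], reverse=True):
        # Reject the box if a kept box of the same class overlaps it too much.
        duplicate = any(
            k_cls == cls and iou_corners(box, k_box) > iou_threshold
            for k_cls, _, k_box in kept
        )
        if not duplicate:
            kept.append((cls, score, box))
    return kept
```

Boxes of different classes never suppress each other in this sketch, which follows the "for the same class" wording above.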

Drawing the bounding boxes

The predictions (x, y) for each bounding box are relative to the bounds of the grid cell and (w, h) are relative to the whole image. To compute the final bounding box coordinates, we shift (x, y) by the position of the grid cell and multiply w and h by the width and height of the portion of the image used as input for the network.
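The conversion can be sketched as below. It assumes (x, y) are offsets in [0, 1] within the cell and that w and h are direct fractions of the cropped region; if a given set of weights encodes the sizes differently, that transform would have to be undone first. The crop offsets and the function name are hypothetical.

```python
def to_pixel_box(row, col, x, y, w, h, crop_width, crop_height,
                 crop_left=0, crop_top=0, S=7):
    """Convert one prediction to (left, top, right, bottom) in full-image pixels."""
    # Centre: cell index plus offset within the cell, scaled to the cropped region,
    # then shifted by where the crop sits in the original frame.
    cx = (col + x) / S * crop_width + crop_left
    cy = (row + y) / S * crop_height + crop_top

    # Size: fractions of the cropped region that was fed to the network.
    bw = w * crop_width
    bh = h * crop_height

    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)
```

For example, a box centred in cell (3, 3) of a 700 x 350 crop with w = h = 0.5 comes out centred at (350, 175) in crop coordinates before the crop offset is added.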

Testing

The pipeline is applied to individual images. Here is the result.

[Figure: detection results on test images]

The Video

The pipeline is applied to a video. Click on the image to watch the video or click here. You will be redirected to YouTube.

Project Video

vehicle-detection's People

Contributors

kant, ryan-keenan, subodh-malgonde

vehicle-detection's Issues

cannot load the weight how can I fix it?

While trying to load the weights I always get one of these errors, and I could not find a solution. Did you have this problem, or do you know how I can fix it?
name 'null' is not defined
cannot import name 'load_weights'

TypeError: expected str, bytes or os.PathLike object, not Sequential

When I run:

import numpy as np
import cv2


def load_weights(model, yolo_weight_file):
    data = np.fromfile(yolo_weight_file, np.float32)
    data = data[4:]

    index = 0
    for layer in model.layers:
        shape = [w.shape for w in layer.get_weights()]
        if shape != []:
            kshape, bshape = shape
            bia = data[index:index + np.prod(bshape)].reshape(bshape)
            index += np.prod(bshape)
            ker = data[index:index + np.prod(kshape)].reshape(kshape)
            index += np.prod(kshape)
            layer.set_weights([ker, bia])

Followed by:

model = get_model()
model.load_weights(model, 'tiny-yolo-weights/yolo-tiny.weights')

I get the error TypeError: expected str, bytes or os.PathLike object, not Sequential

Any help? Thanks!
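A possible explanation, offered as a sketch rather than a confirmed fix: model.load_weights is Keras's own method and expects a file path as its first argument, so passing the model there would produce this TypeError. The helper defined above is a plain function and is called as load_weights(model, path). The example below demonstrates that call shape with stand-in FakeLayer and FakeModel classes (hypothetical, used only so the snippet runs without Keras) and a tiny generated weight file. It also assumes the first four values in the file are a header, which is why the helper skips them.

```python
import os
import tempfile

import numpy as np


def load_weights(model, yolo_weight_file):
    """Fill each weighted layer from a flat float32 file: biases first, then kernel."""
    data = np.fromfile(yolo_weight_file, np.float32)
    data = data[4:]  # skip the first four values (assumed to be a file header)
    index = 0
    for layer in model.layers:
        shapes = [w.shape for w in layer.get_weights()]
        if shapes:
            kshape, bshape = shapes
            n_bias, n_ker = int(np.prod(bshape)), int(np.prod(kshape))
            bia = data[index:index + n_bias].reshape(bshape)
            index += n_bias
            ker = data[index:index + n_ker].reshape(kshape)
            index += n_ker
            layer.set_weights([ker, bia])


class FakeLayer:
    """Stand-in for a Keras layer holding [kernel, bias]."""

    def __init__(self, kernel_shape, bias_shape):
        self._weights = [np.zeros(kernel_shape, np.float32),
                         np.zeros(bias_shape, np.float32)]

    def get_weights(self):
        return self._weights

    def set_weights(self, weights):
        self._weights = list(weights)


class FakeModel:
    """Stand-in for a Keras model exposing a list of layers."""

    def __init__(self, layers):
        self.layers = layers


def demo():
    # 4 header values + 2 biases + 6 kernel values.
    values = np.arange(12, dtype=np.float32)
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "fake.weights")
        values.tofile(path)
        model = FakeModel([FakeLayer((3, 2), (2,))])
        load_weights(model, path)  # function call, not model.load_weights(...)
    return model.layers[0].get_weights()
```

With the real model, the equivalent call would be load_weights(model, 'tiny-yolo-weights/yolo-tiny.weights'), using the path from the snippet above.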

about the moviepy

At the end of Vehicle_Detection.ipynb, you process the videos using moviepy. However, I can't find the documentation for moviepy. Can you tell me how to deal with this problem? Thank you so much.

File error

def load_weights(model, yolo_weight_file):
    data = np.fromfile(yolo_weight_file, np.float32)
    data = data[4:]

    index = 0
    for layer in model.layers:
        shape = [w.shape for w in layer.get_weights()]
        if shape != []:
            kshape, bshape = shape
            bia = data[index:index + np.prod(bshape)].reshape(bshape)
            index += np.prod(bshape)
            ker = data[index:index + np.prod(kshape)].reshape(kshape)
            index += np.prod(kshape)
            layer.set_weights([ker, bia])

This I pasted initials in site packages Under Utils
Then I downloaded tiny-yolo.weights
Then I ran
#2 Load weights
from utils import load_weights

model = get_model()
load_weights(model,'yolo-tiny.weights')
and there is a file error.
What shall I do?

Bounding boxes not showing

Hey, I ran the code, and the bounding boxes show up only for very small thresholds like 0.007, and even then they are randomly placed. Please help.

How to train for custom Dataset?

Hey, can you explain how to train using my custom images and annotations? Also, could you point me to a reference for building object detection from scratch?

Error while loading weights

Load weights

from utils import load_weights
model = get_model()
load_weights(model,'yolo-tiny.weights')

Error:

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:8: UserWarning: Update your Conv2D call to the Keras 2 API: Conv2D(16, (3, 3), input_shape=(3, 448, 4..., strides=(1, 1), padding="same")

/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:13: UserWarning: Update your Conv2D call to the Keras 2 API: Conv2D(32, (3, 3), padding="same")
del sys.path[0]
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:15: UserWarning: Update your MaxPooling2D call to the Keras 2 API: MaxPooling2D(pool_size=(2, 2), padding="valid")
from ipykernel import kernelapp as app

InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs, op_def)
1653 try:
-> 1654 c_op = pywrap_tf_session.TF_FinishOperation(op_desc)
1655 except errors.InvalidArgumentError as e:

InvalidArgumentError: Negative dimension size caused by subtracting 2 from 1 for '{{node max_pooling2d_38/MaxPool}} = MaxPoolT=DT_FLOAT, data_format="NHWC", ksize=[1, 2, 2, 1], padding="VALID", strides=[1, 2, 2, 1]' with input shapes: [?,1,224,32].

During handling of the above exception, another exception occurred:

ValueError Traceback (most recent call last)
14 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py in _create_c_op(graph, node_def, inputs, control_inputs, op_def)
1655 except errors.InvalidArgumentError as e:
1656 # Convert to ValueError for backwards compatibility.
-> 1657 raise ValueError(str(e))
1658
1659 return c_op

ValueError: Negative dimension size caused by subtracting 2 from 1 for '{{node max_pooling2d_38/MaxPool}} = MaxPoolT=DT_FLOAT, data_format="NHWC", ksize=[1, 2, 2, 1], padding="VALID", strides=[1, 2, 2, 1]' with input shapes: [?,1,224,32].

packages are not found

I tried to run conda env create -f environment.yml
but I got the following. Any solution?

ResolvePackageNotFound:

  • pexpect==4.0.1=py35_0
  • appnope==0.1.0=py35_0
  • sqlite==3.13.0=0
  • tk==8.5.18=0
  • icu==54.1=0
  • openssl==1.0.2j=0
  • tensorflow==0.11.0=py35_0
  • zlib==1.2.8=3
  • ptyprocess==0.5.1=py35_0
  • readline==6.2=2
  • mkl==11.3.3=0
  • tbb==4.3_20141023=0
  • qt==5.6.2=0
  • protobuf==3.0.0=py35_0
  • hdf5==1.8.17=1
  • terminado==0.6=py35_0
  • freetype==2.5.5=1
  • libpng==1.6.27=0
