
speeddetection's Introduction

Project Overview

Project Workflow

We begin with a video in MP4 format. The video contains dash cam footage of a vehicle driving in downtown San Francisco. Our goal is to predict the speed of the car at each frame (20,400 frames in total).

Our workflow will be as follows:

- Video -> 20,400 images -> Segmentation
- Segmentation -> optical flow matrix set
- Segmentation -> one hot encoding matrix set
- Combine one hot encoding with optical flow
- Long Short-Term Memory (LSTM) model

Preparing the dataset

Segmentation

After separating the video into 20,400 images, we run each image through a segmentation model.

Segmentation is a form of image classification. A typical image classifier has a fixed set of classes and assigns one of them to the whole image. For example, we can make a binary classifier that distinguishes between Adam Sandler and Brad Pitt. The classifier labels an image as Adam Sandler if it thinks the entire image resembles Adam Sandler, and likewise for Brad Pitt.

But what if the image contains both Brad Pitt and Adam Sandler? How should the image classifier categorize it? Segmentation is a way to work around this problem. It anticipates that both of our classes (Brad Pitt and Adam Sandler) may appear in an image at the same time and therefore approaches classification differently. Instead of classifying the entire image as Brad or Adam, segmentation classifies each pixel as either Adam or Brad.
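The difference between one label per image and one label per pixel can be sketched with NumPy. This is only an illustration: the scores are random numbers standing in for a model's output, and the names `scores`, `image_label`, and `pixel_labels` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, height, width = 2, 4, 6  # toy setup: 0 = "Adam", 1 = "Brad"

# Hypothetical per-pixel class scores, shape (classes, height, width).
scores = rng.normal(size=(num_classes, height, width))

# Whole-image classification: a single label for the entire image
# (here, the class with the highest average score).
image_label = int(scores.mean(axis=(1, 2)).argmax())

# Segmentation: one label per pixel (the highest-scoring class at each position).
pixel_labels = scores.argmax(axis=0)

print(image_label)         # a single class index
print(pixel_labels.shape)  # (4, 6): one class index per pixel
```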

Let's apply what we learned from the Brad and Adam example to our current image set. We have driving footage, so we need a way to categorize each object that appears in our images. fast.ai created a segmentation model with the following 31 classes:

classes = [ 'Animal', 'Archway', 'Bicyclist', 'Building', 'Car', 'CartLuggagePram', 'Child', 'Column_Pole', 'Fence', 'LaneMkgsDriv',
'LaneMkgsNonDriv', 'Misc_Text', 'MotorcycleScooter', 'OtherMoving', 'ParkingBlock', 'Pedestrian', 'Road', 'RoadShoulder', 'Sidewalk',
'SignSymbol', 'Sky', 'SUVPickupTruck', 'TrafficCone', 'TrafficLight', 'Train', 'Tree', 'Truck_bus', 'Tunnel', 'VegetationMisc', 'Void',
'Wall' ]

Once the image is segmented, each of the 31 classes is represented in one of two ways. In matrix format, a class is shown as a number (0 to 30). In image format, we differentiate classes by color (e.g., sidewalks are purple, buildings are orange).
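A minimal sketch of the two representations, assuming a small hand-written label matrix and a hypothetical color palette. The exact RGB values below are assumptions chosen to match the colors mentioned above; they are not taken from the model. Class indices follow the order of the class list (counting from 0).

```python
import numpy as np

NUM_CLASSES = 31
BUILDING, ROAD, SIDEWALK = 3, 16, 18  # positions in the class list (0-based)

# Matrix format: every pixel holds a class index.
label_map = np.array([
    [BUILDING, BUILDING, ROAD],
    [SIDEWALK, ROAD, ROAD],
], dtype=np.uint8)

# Hypothetical palette: one RGB color per class; only three are filled in here.
palette = np.zeros((NUM_CLASSES, 3), dtype=np.uint8)
palette[BUILDING] = (255, 165, 0)  # orange
palette[SIDEWALK] = (128, 0, 128)  # purple
palette[ROAD] = (128, 128, 128)    # gray (assumed)

# Image format: indexing the palette with the label map looks up
# a color for every pixel, turning (H, W) into (H, W, 3).
color_image = palette[label_map]

print(color_image.shape)  # (2, 3, 3)
```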

Segmenting all our images lets us apply other techniques that help us estimate the speed at each frame. From this point we make two copies of the segmented image set: one for optical flow processing and the other for one hot encoding.

Optical Flow

dash cam footage optical flow example

Optical flow is a computer vision technique for sequences of images. In our case we use it to track movement from one image to the next. This lets us see which of our classes (pedestrians, street signs) are moving through the sequence.

Let's again imagine our image as a matrix, this time with dimensions 20 x 30. Optical flow slides a small window (i.e., a smaller matrix, usually 3 x 3 or 5 x 5) over every part of the 20 x 30 matrix, picks out small clusters of similar pixels, and then looks for the same clusters in the next matrix in the sequence. How far each cluster has shifted is its motion.
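To make the window idea concrete, here is a sketch using block matching, a deliberately simple stand-in rather than the optical flow method used in this project. It cuts the first frame into 5 x 5 blocks and, for each block, searches nearby positions in the next frame for the best match. The function name `block_matching_flow` and its parameters are hypothetical.

```python
import numpy as np

def block_matching_flow(prev, curr, block=5, search=2):
    """Estimate one (dy, dx) shift per block by exhaustive local search."""
    prev = prev.astype(float)
    curr = curr.astype(float)
    h, w = prev.shape
    rows, cols = h // block, w // block
    flow = np.zeros((rows, cols, 2), dtype=int)
    for r in range(rows):
        for c in range(cols):
            y, x = r * block, c * block
            patch = prev[y:y + block, x:x + block]
            best_cost, best_shift = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    # Skip candidate windows that fall outside the frame.
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue
                    candidate = curr[yy:yy + block, xx:xx + block]
                    cost = np.abs(patch - candidate).sum()  # sum of absolute differences
                    if best_cost is None or cost < best_cost:
                        best_cost, best_shift = cost, (dy, dx)
            flow[r, c] = best_shift
    return flow

# Toy 20 x 30 frame, then the same content shifted 1 pixel down and 2 pixels right.
rng = np.random.default_rng(0)
frame_a = rng.random((20, 30))
frame_b = np.roll(frame_a, shift=(1, 2), axis=(0, 1))

flow = block_matching_flow(frame_a, frame_b)
print(flow.shape)  # (4, 6, 2): one (dy, dx) per 5 x 5 block
print(flow[1, 2])  # [1 2] for an interior block
```

Blocks along the bottom and right edges cannot see the full shift because the matching window would leave the frame, so only interior blocks recover it exactly. Dense optical flow methods refine this idea to give a motion vector per pixel.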

Visually this ends up looking like the image above on the right, where only movement is tracked. Comparing the optical flow gif to the video frame gif on the left, we can see that lane markings are being picked up by the optical flow model. That's great news for us, because we want to give our LSTM model an object to reference so that it can develop a speed estimate for each frame.

For more information and an implementation of optical flow check this out.

One Hot Encoding

Recall our segmented image example, where we have 31 classes in total. Our original video is 17 minutes of driving, and not all 31 classes are present in every frame. For example, sometimes there are buildings present and other times there are not.

There is another way of communicating that fact. In matrix form an image may be of size 1 x W x H, where W = width, H = height, and 1 = the number of layers in the matrix. Each pixel is labeled with its corresponding class, but how do we know if a class is not present in the image?

One hot encoding helps display that information. Instead of storing the class as a number at each pixel, each class gets its own layer. If the class is not present in the image, the entire layer is filled with 0's. If the class is present, there are 1's wherever the class object is located and 0's everywhere else. Since there are 31 classes, the one hot encoded matrix has dimensions 31 x W x H.

As an example, if we wanted to check whether there were any buildings in our image, we could look at layer 3 of our one hot encoded matrix (counting layers from 0, in the order of the class list).
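A minimal sketch of this encoding with NumPy, assuming a tiny hand-written label matrix. Note that NumPy arrays are laid out as (layers, height, width), so the shapes below read as 31 x H x W. The variable names are hypothetical.

```python
import numpy as np

NUM_CLASSES = 31
BUILDING, CAR, ROAD, SKY = 3, 4, 16, 20  # positions in the class list (0-based)

# Segmented image in matrix form: one class index per pixel, shape (H, W) = (3, 4).
label_map = np.array([
    [SKY, SKY, SKY, SKY],
    [BUILDING, BUILDING, SKY, SKY],
    [ROAD, ROAD, ROAD, ROAD],
])

# Compare every pixel against every class index.
# Broadcasting (H, W) against (31, 1, 1) gives one layer per class: (31, H, W).
class_ids = np.arange(NUM_CLASSES).reshape(-1, 1, 1)
one_hot = (label_map == class_ids).astype(np.uint8)

print(one_hot.shape)               # (31, 3, 4)
print(one_hot[BUILDING])           # 1's where buildings are, 0's elsewhere
print(bool(one_hot[CAR].any()))    # False: no car in this image, the layer is all 0's
```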

Jeremy Jordan has created a great illustration of this concept, shown below.

Combining One Hot with Optical Flow

Optical flow preserves the direction of each object's movement. If the optical flow shows that the car is moving towards a building, then multiplying the optical flow and one hot matrices element by element preserves that movement information. Since the one hot matrix tells us where a class is not present in an image, the multiplication sets any optical flow movement in those areas to 0 for that class, so it is treated as insignificant.
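One way to picture the combination is as a per-class mask over the flow field. The sketch below assumes the flow is stored as two layers (horizontal and vertical movement) and multiplies it element by element with each one hot layer. The array layout and names are assumptions for illustration, and the flow values are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, height, width = 31, 3, 4

# Hypothetical optical flow: (dx, dy) for every pixel, shape (2, H, W).
flow = rng.normal(size=(2, height, width))

# Hypothetical segmentation and its one hot encoding, shape (31, H, W).
label_map = rng.integers(0, num_classes, size=(height, width))
class_ids = np.arange(num_classes).reshape(-1, 1, 1)
one_hot = (label_map == class_ids).astype(flow.dtype)

# Element-wise product with broadcasting:
# (31, 1, H, W) * (1, 2, H, W) -> (31, 2, H, W).
# Each class layer keeps the flow where the class is present and 0 elsewhere.
per_class_flow = one_hot[:, None] * flow[None]

print(per_class_flow.shape)  # (31, 2, 3, 4)
```

Because every pixel belongs to exactly one class, summing `per_class_flow` over the class axis gives back the original flow; the multiplication only sorts the movement by class.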

In the next post we'll insert these matrices into our LSTM model.
