tasfiq-k / camvid-semantic-segmentation

Semantic Segmentation Using PSPnet model architecture on CamVid dataset

License: MIT License

Jupyter Notebook 100.00%
cnn computer-vision images-segmentation keras model-design numpy opencv python3 semantic-segmentation tensorflow

camvid-semantic-segmentation's Introduction

A segmentation model based on PSPNet trained on CamVid data

Camvid Semantic Segmentation

Data

The data used in this work is the Cambridge-driving Labeled Video Database, CamVid for short, which can be found here.

N.B. The owner of this Dataset is The University of Cambridge.

The Cambridge-driving Labeled Video Database (CamVid) provides ground truth labels that associate each pixel with one of 32 semantic classes. This dataset is often used in (real-time) semantic segmentation research.

The original dataset can be found here

Data Preparation

The data was already prepared and usable from the start, as it was available on Kaggle, so no data collection was necessary. To work with the images and train a model, some basic preparation was still needed, including:

  • Loading the data at a much smaller size, (256, 256, 3), to avoid memory issues. The original images were of size (720, 960, 3).
  • Getting the RGB mapping of the classes from the provided dictionary.
  • Creating a mapping to get the RGB values by class index.
  • Adjusting the semantic maps for all 32 classes.
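The preparation steps above can be sketched in plain NumPy. This is a minimal illustration, not the notebook's actual code: the three-entry `CLASS_COLORS` dictionary, the function names, and the nearest-neighbour resize are all assumptions made for the example, and the real class dictionary shipped with the dataset lists 32 classes.

```python
import numpy as np

# Hypothetical subset of the class dictionary; read the real 32 entries
# from the file provided with the dataset.
CLASS_COLORS = {
    "Sky": (128, 128, 128),
    "Building": (128, 0, 0),
    "Road": (128, 64, 128),
}
# Row i of PALETTE holds the RGB colour of class index i.
PALETTE = np.array(list(CLASS_COLORS.values()), dtype=np.uint8)


def resize_nearest(image, height, width):
    """Nearest-neighbour resize of an (H, W, C) array (safe for label masks)."""
    rows = np.arange(height) * image.shape[0] // height
    cols = np.arange(width) * image.shape[1] // width
    return image[rows[:, None], cols]


def rgb_to_index(mask_rgb, palette):
    """Map an (H, W, 3) colour mask to an (H, W) array of class indices.

    Pixels whose colour is not in the palette fall back to index 0.
    """
    matches = (mask_rgb[:, :, None, :] == palette).all(axis=-1)  # (H, W, C)
    return matches.argmax(axis=-1)


def index_to_rgb(mask_idx, palette):
    """Inverse mapping: class indices back to RGB colours."""
    return palette[mask_idx]


def to_one_hot(mask_idx, num_classes):
    """Turn (H, W) class indices into an (H, W, num_classes) semantic map."""
    return np.eye(num_classes, dtype=np.float32)[mask_idx]


# Synthetic full-size mask: sky in the top half, road in the bottom half.
full_mask = np.empty((720, 960, 3), dtype=np.uint8)
full_mask[:360] = PALETTE[0]
full_mask[360:] = PALETTE[2]

small_mask = resize_nearest(full_mask, 256, 256)
class_idx = rgb_to_index(small_mask, PALETTE)
semantic_map = to_one_hot(class_idx, len(PALETTE))
```

Nearest-neighbour sampling is used for the masks because interpolating between label colours would create RGB values that belong to no class. The input images themselves can be resized with any smoother interpolation.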

Model Training

Model Architecture

There are several model architectures for segmentation problems, which differ from typical classification problems: segmentation also has to localize the objects in the image, so different approaches are necessary. U-Net, FCN (Fully Convolutional Network), and PSPNet are some of the architectures commonly used for segmentation.

In this work, I've used the Pyramid Scene Parsing Network (PSPNet) architecture for the segmentation task. PSPNet uses a pyramid parsing module to harvest different sub-region representations, followed by upsampling and concatenation layers to form the final feature representation, which carries both local and global context information. Finally, the representation is fed into a convolution layer to get the final per-pixel prediction. [1]


Image source: PSPNet paper
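To make the pooling, upsampling, and concatenation steps concrete, here is a simplified NumPy sketch of a pyramid module. It is an illustration under stated assumptions, not the network used in this repository: the function names are hypothetical, nearest-neighbour upsampling stands in for the bilinear interpolation used in the paper, and the 1x1 convolution that the paper applies to each pooled branch is left out. The bin sizes (1, 2, 3, 6) follow the paper [1].

```python
import numpy as np


def adaptive_avg_pool(features, bins):
    """Average-pool an (H, W, C) feature map down to (bins, bins, C)."""
    h, w, c = features.shape
    pooled = np.empty((bins, bins, c), dtype=features.dtype)
    for i in range(bins):
        # Floor for the start and ceiling for the end, so every pixel is covered.
        r0, r1 = i * h // bins, ((i + 1) * h + bins - 1) // bins
        for j in range(bins):
            c0, c1 = j * w // bins, ((j + 1) * w + bins - 1) // bins
            pooled[i, j] = features[r0:r1, c0:c1].mean(axis=(0, 1))
    return pooled


def upsample_nearest(features, height, width):
    """Nearest-neighbour upsampling (the paper uses bilinear interpolation)."""
    rows = np.arange(height) * features.shape[0] // height
    cols = np.arange(width) * features.shape[1] // width
    return features[rows[:, None], cols]


def pyramid_pooling(features, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with one pooled-and-upsampled branch per bin size."""
    h, w, _ = features.shape
    branches = [features]  # local information
    for bins in bin_sizes:
        pooled = adaptive_avg_pool(features, bins)       # sub-region context
        branches.append(upsample_nearest(pooled, h, w))  # back to (H, W, C)
    return np.concatenate(branches, axis=-1)


feature_map = np.random.default_rng(0).random((32, 32, 8)).astype(np.float32)
fused = pyramid_pooling(feature_map)
print(fused.shape)  # (32, 32, 40): 8 input channels + 4 branches of 8 channels
```

The branch with a single bin is a global average of the feature map, which is where the global context comes from. The finer bins summarize progressively smaller sub-regions.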

In the paper, the authors used a pretrained ResNet [2] model with the dilated network strategy to extract the feature map. In this work, I implemented my own network to extract the feature maps instead, and achieved a 0.79 DICE score and a 0.76 IOU score with only 3.3M parameters.

The network used in this work is shown below:


Technical Details

  • The TensorFlow machine learning library was used
  • Total model parameters: 3.3M+
  • The model was trained for 50 epochs
  • Loss function: categorical cross-entropy
  • Metrics used: DICE, IOU, accuracy
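For reference, the loss and the two overlap metrics listed above can be written in a few lines of NumPy. This is a sketch of the common definitions, computed over all pixels and classes at once. The notebook's exact formulation (per-class averaging, smoothing constant) is not stated here, so the function names and the `eps` value are assumptions.

```python
import numpy as np


def dice_score(y_true, y_pred, eps=1e-7):
    """Dice = 2 * |A ∩ B| / (|A| + |B|) on one-hot arrays."""
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + eps) / (np.sum(y_true) + np.sum(y_pred) + eps)


def iou_score(y_true, y_pred, eps=1e-7):
    """IoU = |A ∩ B| / |A ∪ B| on one-hot arrays."""
    intersection = np.sum(y_true * y_pred)
    union = np.sum(y_true) + np.sum(y_pred) - intersection
    return (intersection + eps) / (union + eps)


def categorical_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean over pixels of -sum_c y_true[c] * log(y_prob[c])."""
    y_prob = np.clip(y_prob, eps, 1.0)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_prob), axis=-1)))


# Tiny 2x2 example with two classes; three of the four pixels agree.
truth = np.eye(2, dtype=np.float32)[np.array([[0, 1], [1, 1]])]
prediction = np.eye(2, dtype=np.float32)[np.array([[0, 1], [0, 1]])]

dice = dice_score(truth, prediction)
iou = iou_score(truth, prediction)
loss_if_perfect = categorical_cross_entropy(truth, truth)
```

In this toy example the intersection is 3 pixels and each mask has 4, so Dice is 6/8 and IoU is 3/5.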

Result

DICE score : 0.79

IOU score : 0.76

References

[1] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid Scene Parsing Network. https://arxiv.org/abs/1612.01105

[2] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. https://arxiv.org/abs/1512.03385
