
Bag chair detection model

Machine learning lives here 🤖

I'm going to train an object detector that finds free and occupied bag chairs in an image or video for the sandbags monitor project. For this purpose, I will use a deep learning technique called transfer learning, with the help of the TensorFlow Object Detection API.

What is Transfer Learning

In most cases, training a convolutional neural network from scratch is a difficult and time-consuming process that requires a lot of computing power and data. Both are obtainable nowadays - ImageNet (among others) provides the data, and Google Colab (among others) provides the compute. However, cloud computation can cost the user a pretty penny.

Therefore, to speed up the process and save money, people use transfer learning: taking an already trained (usually called pre-trained) convolutional network as the starting point for their own model, i.e. using the pre-trained weights as the initial weights for their own model.

The whole process can be divided into three large steps (this is, in fact, how most model training works):

  • Collect data - This may be data already collected by someone else (such as ImageNet) or data gathered manually (my case: I have not found any large collection of bag chair images beyond what can be found in Google Images).

  • Annotate the data - In short, this is the process of marking the locations of objects in the data and specifying their classes.

  • Fine-tune the net - Re-train the weights of the ConvNet using regular backpropagation.
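The third step can be sketched in a few lines of Keras. This is a minimal classification example for illustration only, not the detection pipeline used below; the two-class head mirrors the empty/occupied split:

```python
import tensorflow as tf

# Transfer learning in miniature: take a network pre-trained on ImageNet,
# freeze its convolutional weights, and train only a new two-class head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,      # drop the original 1000-class ImageNet head
    weights="imagenet",     # pre-trained weights become the initial weights
)
base.trainable = False      # keep the pre-trained features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # empty / occupied
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(...) would now update only the new head's weights
```

Because the base is frozen, backpropagation only touches the small dense head, which is what makes fine-tuning so much cheaper than training from scratch.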

TensorFlow Object Detection API

Not long ago, the TensorFlow developers released the Object Detection API, which simplifies the process of fine-tuning a pre-trained model. The API is provided as a set of scripts that, with minor modifications, can be used for your own purposes.

Next, I will describe my own experience and approach to using the above methods.

  1. Collect and annotate images. There are several annotation tools available on the Internet - I used MakeSense. It is important to note that there are several annotation formats - COCO, Pascal VOC and YOLO - and the code later on depends on the chosen format. I used Pascal VOC, which stores each annotation in an XML file. You should also create a label map file (.pbtxt) for future processing.
<annotation>
	<folder>images</folder>
	<filename>image0.jpg</filename>
	<path>download_data/downloads/images/image0.jpg</path>
	<source>
		<database>Unspecified</database>
	</source>
	<size>
		<width>522</width>
		<height>481</height>
		<depth>3</depth>
	</size>
	<object>
		<name>occupied_bagchair</name>
		<pose>Unspecified</pose>
		<truncated>Unspecified</truncated>
		<difficult>Unspecified</difficult>
		<bndbox>
			<xmin>4</xmin>
			<ymin>2</ymin>
			<xmax>521</xmax>
			<ymax>479</ymax>
		</bndbox>
	</object>
</annotation>

pascal_label_map.pbtxt file

item {
  id: 1
  name: 'empty_bagchair'
}

item {
  id: 2
  name: 'occupied_bagchair'
}

More info about annotation formats: Image data labeling and annotation
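As a quick sanity check, an annotation in this format can be parsed with nothing but the standard library. The class-to-id mapping below mirrors the pascal_label_map.pbtxt above:

```python
import xml.etree.ElementTree as ET

# Mirrors pascal_label_map.pbtxt: class name -> numeric id
LABEL_MAP = {"empty_bagchair": 1, "occupied_bagchair": 2}

def parse_voc_annotation(xml_text):
    """Return ((width, height), [(class_id, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        b = obj.find("bndbox")
        boxes.append((
            LABEL_MAP[name],
            int(b.find("xmin").text), int(b.find("ymin").text),
            int(b.find("xmax").text), int(b.find("ymax").text),
        ))
    return (width, height), boxes
```

Feeding the example annotation above through this function yields `((522, 481), [(2, 4, 2, 521, 479)])` - class id 2 for occupied_bagchair plus its pixel box.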

  2. Create TFRecords. I took a script from the API as a basis and changed it a little (rather, simplified it). It is worth mentioning why this format is needed: TFRecord is TensorFlow's own binary storage format, and using it to store the dataset can have a significant impact on the performance of the input pipeline and on training later. More info: Tensorflow Records? What they are and how to use them
python create_tfrecords_from_xml.py `
     --image_dir=data\images `
     --annotations_dir=data\annotations `
     --label_map_path=data\label_map\pascal_label_map.pbtxt `
     --output_path=tf_data\
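Under the hood, each image/annotation pair becomes a tf.train.Example serialized into the record file. A minimal sketch of that conversion (the field names follow the Object Detection API convention; a real script would also add fields such as the image format and class text):

```python
import tensorflow as tf

def make_tf_example(filename, image_bytes, width, height, boxes, classes):
    """boxes: list of (xmin, ymin, xmax, ymax) in pixels; classes: list of ids."""
    def _bytes(v): return tf.train.Feature(bytes_list=tf.train.BytesList(value=v))
    def _floats(v): return tf.train.Feature(float_list=tf.train.FloatList(value=v))
    def _ints(v): return tf.train.Feature(int64_list=tf.train.Int64List(value=v))
    return tf.train.Example(features=tf.train.Features(feature={
        "image/filename": _bytes([filename.encode()]),
        "image/encoded": _bytes([image_bytes]),
        "image/width": _ints([width]),
        "image/height": _ints([height]),
        # the Object Detection API expects coordinates normalized to [0, 1]
        "image/object/bbox/xmin": _floats([b[0] / width for b in boxes]),
        "image/object/bbox/ymin": _floats([b[1] / height for b in boxes]),
        "image/object/bbox/xmax": _floats([b[2] / width for b in boxes]),
        "image/object/bbox/ymax": _floats([b[3] / height for b in boxes]),
        "image/object/class/label": _ints(classes),
    }))

# Writing a record file:
#   with tf.io.TFRecordWriter(path) as w:
#       w.write(make_tf_example(...).SerializeToString())
```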
  3. Choose and download a pre-trained model. In our main project, we plan to use a single-board computer, the Raspberry Pi 4, for model inference. Therefore, models adapted for mobile devices were considered as the basis for training. MobileNet is a good example: its creators achieved high speed by using depthwise separable convolutions. As a result, my choice fell on a model called SSD MobileNet-v2, which is an improved version of MobileNet-v1. The pre-trained model can be downloaded from here.

  4. Fill in the required fields of the configuration file. Typically, this file is called pipeline.config. In it, you need to specify the paths to the train/test TFRecord files, the number of classes (in my case, 2), the path to the label map file, and the path to the checkpoint of the downloaded model.
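The fields to touch look roughly like this (a trimmed fragment; the paths are placeholders for your own layout, and `...` marks omitted sections of the full config):

```
model {
  ssd {
    num_classes: 2
    ...
  }
}
train_config {
  fine_tune_checkpoint: "pre_trained/ssd_mobilenet_v2/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  ...
}
train_input_reader {
  label_map_path: "data/label_map/pascal_label_map.pbtxt"
  tf_record_input_reader { input_path: "tf_data/train.record" }
}
eval_input_reader {
  label_map_path: "data/label_map/pascal_label_map.pbtxt"
  tf_record_input_reader { input_path: "tf_data/test.record" }
}
```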

  5. Train the model. I used Google Colab to speed up the training process. It provides the user with a powerful GPU for free (as I remember, for about 9 hours per session). I prepared this notebook for transfer learning using the TensorFlow Object Detection API. It is worth noting that even with a powerful graphics accelerator, training can take a fair amount of time.

  6. Export the frozen graph. This part is also included in the training notebook.

  7. Convert the model to TF Lite format (optional). I prepared this notebook for TFLite model conversion. You can use this repository to run your TFLite model on a Raspberry Pi or an Android device.
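The core of the conversion comes down to a few lines of tf.lite.TFLiteConverter. This is a sketch; the directory arguments are placeholders for the SavedModel exported in the previous step:

```python
import tensorflow as tf

def convert_to_tflite(saved_model_dir, output_path):
    """Convert an exported SavedModel to a .tflite flatbuffer on disk."""
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Optional: shrink the model for the Raspberry Pi with default quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()  # returns the model as bytes
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return tflite_model
```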

  8. Start using your model. I prepared this notebook with my results.
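Running the converted model comes down to the tf.lite.Interpreter API. A sketch, with the model path and preprocessed input left as placeholders:

```python
import numpy as np
import tensorflow as tf

def run_tflite(model_path, image):
    """Feed one preprocessed image batch through a .tflite model, return raw outputs."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], image.astype(inp["dtype"]))
    interpreter.invoke()
    return [interpreter.get_tensor(o["index"])
            for o in interpreter.get_output_details()]
```

For an SSD detection model, the returned tensors hold the boxes, class ids, scores and detection count, which you would then filter by a score threshold.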

Results

After successfully completing model training (honestly, my free Colab session time had expired 👽), I tested the model on a few photos:

case 1

case 2

case 3

However, there are still some small flaws...
case 4

At the moment I have a couple of ideas on how to improve the model's quality; they all relate to data preparation.

to be continued...
