
Bag chair detection model

Machine learning lives here 🤖

I'm going to train an object detector that finds free and occupied bag chairs in an image or video for the sandbags monitor project. For this purpose, I will use a deep learning technique called transfer learning, with the help of the TensorFlow Object Detection API.

What is Transfer Learning

In most cases, training a convolutional neural network from scratch is a difficult and time-consuming process that requires a lot of computing power and data. Both are obtainable nowadays - ImageNet (among others) provides the data, and Google Colab (among others) provides the compute. However, cloud computation can cost the user a pretty penny.

Therefore, to speed up the process and save money, people use transfer learning: taking an already trained (usually called pre-trained) convolutional network as the starting point for their own model, i.e. using the pre-trained weights as the initial weights for their own model.

The whole process can be divided into three large steps (this is, in fact, how most model training works):

  • Collect data - This may be data already collected by someone else (such as ImageNet) or data gathered manually (my case: I have not found any large collection of bag chair images beyond what can be found in Google Images).

  • Annotate the data - In short, this is the process of marking the locations of objects in the data and specifying their classes.

  • Fine-tune the net - Re-train the weights of the ConvNet using regular backpropagation.
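The third step can be sketched in a few lines of Keras. This is a minimal classification example for illustration only, not the detection pipeline used below; the two-class head mirrors the empty/occupied split:

```python
import tensorflow as tf

# Transfer learning in miniature: take a network pre-trained on ImageNet,
# freeze its convolutional weights, and train only a new two-class head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,      # drop the original 1000-class ImageNet head
    weights="imagenet",     # pre-trained weights become the initial weights
)
base.trainable = False      # keep the pre-trained features fixed

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # empty / occupied
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(...) would now update only the new head's weights
```

Because the base is frozen, backpropagation only touches the small dense head, which is what makes fine-tuning so much cheaper than training from scratch.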

TensorFlow Object Detection API

Not long ago, the TensorFlow developers released the Object Detection API, which simplifies the process of fine-tuning a pre-trained model. The API is provided as a set of scripts that, with minor modifications, can be used for your own purposes.

Next, I will describe my own experience and approach to using the above methods.

  1. Collect and annotate images. There are several annotation tools available on the Internet - I used MakeSense. It is important to note that there are several annotation formats - COCO, Pascal VOC and YOLO - and the code later on depends on the chosen format. I used Pascal VOC, which stores each annotation in an XML file. You should also create a label map file (.pbtxt) for future processing.
<annotation>
	<folder>images</folder>
	<filename>image0.jpg</filename>
	<path>download_data/downloads/images/image0.jpg</path>
	<source>
		<database>Unspecified</database>
	</source>
	<size>
		<width>522</width>
		<height>481</height>
		<depth>3</depth>
	</size>
	<object>
		<name>occupied_bagchair</name>
		<pose>Unspecified</pose>
		<truncated>Unspecified</truncated>
		<difficult>Unspecified</difficult>
		<bndbox>
			<xmin>4</xmin>
			<ymin>2</ymin>
			<xmax>521</xmax>
			<ymax>479</ymax>
		</bndbox>
	</object>
</annotation>

pascal_label_map.pbtxt file

item {
  id: 1
  name: 'empty_bagchair'
}

item {
  id: 2
  name: 'occupied_bagchair'
}

More info about annotation formats: Image data labeling and annotation
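As a quick sanity check, an annotation in this format can be parsed with nothing but the standard library. The class-to-id mapping below mirrors the pascal_label_map.pbtxt above:

```python
import xml.etree.ElementTree as ET

# Mirrors pascal_label_map.pbtxt: class name -> numeric id
LABEL_MAP = {"empty_bagchair": 1, "occupied_bagchair": 2}

def parse_voc_annotation(xml_text):
    """Return ((width, height), [(class_id, xmin, ymin, xmax, ymax), ...])."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = int(size.find("width").text)
    height = int(size.find("height").text)
    boxes = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        b = obj.find("bndbox")
        boxes.append((
            LABEL_MAP[name],
            int(b.find("xmin").text), int(b.find("ymin").text),
            int(b.find("xmax").text), int(b.find("ymax").text),
        ))
    return (width, height), boxes
```

Feeding the example annotation above through this function yields `((522, 481), [(2, 4, 2, 521, 479)])` - class id 2 for occupied_bagchair plus its pixel box.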

  2. Create TFRecords. I took a script from the API as a basis and changed it a little (rather, simplified it). It is worth mentioning why this format is needed: TFRecord is TensorFlow's own binary storage format, and using it to store the dataset can have a significant impact on the performance of the input pipeline and on training later. More info: Tensorflow Records? What they are and how to use them
python create_tfrecords_from_xml.py `
     --image_dir=data\images `
     --annotations_dir=data\annotations `
     --label_map_path=data\label_map\pascal_label_map.pbtxt `
     --output_path=tf_data\
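Under the hood, each image/annotation pair becomes a tf.train.Example serialized into the record file. A minimal sketch of that conversion (the field names follow the Object Detection API convention; a real script would also add fields such as the image format and class text):

```python
import tensorflow as tf

def make_tf_example(filename, image_bytes, width, height, boxes, classes):
    """boxes: list of (xmin, ymin, xmax, ymax) in pixels; classes: list of ids."""
    def _bytes(v): return tf.train.Feature(bytes_list=tf.train.BytesList(value=v))
    def _floats(v): return tf.train.Feature(float_list=tf.train.FloatList(value=v))
    def _ints(v): return tf.train.Feature(int64_list=tf.train.Int64List(value=v))
    return tf.train.Example(features=tf.train.Features(feature={
        "image/filename": _bytes([filename.encode()]),
        "image/encoded": _bytes([image_bytes]),
        "image/width": _ints([width]),
        "image/height": _ints([height]),
        # the Object Detection API expects coordinates normalized to [0, 1]
        "image/object/bbox/xmin": _floats([b[0] / width for b in boxes]),
        "image/object/bbox/ymin": _floats([b[1] / height for b in boxes]),
        "image/object/bbox/xmax": _floats([b[2] / width for b in boxes]),
        "image/object/bbox/ymax": _floats([b[3] / height for b in boxes]),
        "image/object/class/label": _ints(classes),
    }))

# Writing a record file:
#   with tf.io.TFRecordWriter(path) as w:
#       w.write(make_tf_example(...).SerializeToString())
```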
  3. Choose and download a pre-trained model. In our main project, we plan to use a single-board computer, the Raspberry Pi 4, for model inference. Therefore, models adapted for mobile devices were considered as the basis for training. MobileNet is a good example: its creators achieved high speed by using depthwise separable convolutions. As a result, my choice fell on a model called SSD MobileNet-v2, which is an improved version of MobileNet-v1. The pre-trained model can be downloaded from here.

  4. Fill in the required fields of the configuration file. Typically, this file is called pipeline.config. In it, you need to specify the paths to the train/test TFRecord files, the number of classes (in my case, 2), the path to the label map file, and the path to the checkpoint of the downloaded model.
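The fields to touch look roughly like this (a trimmed fragment; the paths are placeholders for your own layout, and `...` marks omitted sections of the full config):

```
model {
  ssd {
    num_classes: 2
    ...
  }
}
train_config {
  fine_tune_checkpoint: "pre_trained/ssd_mobilenet_v2/checkpoint/ckpt-0"
  fine_tune_checkpoint_type: "detection"
  ...
}
train_input_reader {
  label_map_path: "data/label_map/pascal_label_map.pbtxt"
  tf_record_input_reader { input_path: "tf_data/train.record" }
}
eval_input_reader {
  label_map_path: "data/label_map/pascal_label_map.pbtxt"
  tf_record_input_reader { input_path: "tf_data/test.record" }
}
```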

  5. Train the model. I used Google Colab to speed up the training process. It provides the user with a powerful GPU for free (as I remember, for about 9 hours per session). I prepared this notebook for transfer learning using the TensorFlow Object Detection API. It is worth noting that even with a powerful graphics accelerator, training can take a fair amount of time.

  6. Export the frozen graph. This part is also included in the training notebook.

  7. Convert the model to TF Lite format (optional). I prepared this notebook for TFLite model conversion. You can use this repository to run your TFLite model on a Raspberry Pi or an Android device.
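The core of the conversion comes down to a few lines of tf.lite.TFLiteConverter. This is a sketch; the directory arguments are placeholders for the SavedModel exported in the previous step:

```python
import tensorflow as tf

def convert_to_tflite(saved_model_dir, output_path):
    """Convert an exported SavedModel to a .tflite flatbuffer on disk."""
    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Optional: shrink the model for the Raspberry Pi with default quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()  # returns the model as bytes
    with open(output_path, "wb") as f:
        f.write(tflite_model)
    return tflite_model
```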

  8. Start using your model. I prepared this notebook with my results.
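Running the converted model comes down to the tf.lite.Interpreter API. A sketch, with the model path and preprocessed input left as placeholders:

```python
import numpy as np
import tensorflow as tf

def run_tflite(model_path, image):
    """Feed one preprocessed image batch through a .tflite model, return raw outputs."""
    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    interpreter.set_tensor(inp["index"], image.astype(inp["dtype"]))
    interpreter.invoke()
    return [interpreter.get_tensor(o["index"])
            for o in interpreter.get_output_details()]
```

For an SSD detection model, the returned tensors hold the boxes, class ids, scores and detection count, which you would then filter by a score threshold.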

Results

After successfully completing model training (honestly, my free Colab session time had expired 👽), I tested the model on a few photos:

case 1

case 2

case 3

However, there are still some small flaws...
case 4

At the moment I have a couple of ideas on how to improve the model's quality; they all relate to data preparation.

to be continued...
