Mask RCNN

A performance-focused implementation of Mask RCNN, based on the Tensorpack implementation. The original paper: Mask R-CNN

Overview

This implementation of Mask RCNN is focused on increasing training throughput without sacrificing any accuracy. We do this by training with a batch size > 1 per GPU using FP16 and two custom TF ops.

Status

Configurations are written as NxM: training on N GPUs (V100s in our experiments) with a per-GPU batch size of M.

Training converges to target accuracy for configurations from 8x1 up to 32x4. Training throughput is substantially improved over the original Tensorpack code.
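
To make the NxM notation concrete, here is a minimal sketch that splits a configuration label into its parts. The helper name `parse_config` is hypothetical and not part of this codebase, and the "images per step" figure assumes synchronous data-parallel training in which every GPU processes its own M images at each step.

```python
from typing import Tuple


def parse_config(config: str) -> Tuple[int, int, int]:
    """Split an "NxM" label into (num_gpus, images_per_gpu, images_per_step).

    Hypothetical helper for illustration only; images_per_step = N * M
    assumes every GPU processes M images in each synchronous step.
    """
    num_gpus_str, images_per_gpu_str = config.lower().split("x")
    num_gpus = int(num_gpus_str)
    images_per_gpu = int(images_per_gpu_str)
    return num_gpus, images_per_gpu, num_gpus * images_per_gpu


for label in ("8x1", "8x4", "32x4"):
    gpus, per_gpu, total = parse_config(label)
    print(f"{label}: {gpus} GPUs x {per_gpu} images/GPU = {total} images per step")
```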

A pre-built Docker image is available on DockerHub under fewu/mask-rcnn-tensorflow:master-latest. It is automatically built on each commit to master.

Notes

- Running this codebase requires a custom TF binary, available under GitHub releases (custom ops and a fix for a bug introduced in TF 1.13).
- We give some details about the codebase and optimizations in CODEBASE.md.

To launch training

Data preprocessing
- We use COCO 2017; you can download the data from COCO data.
- The pre-trained ResNet backbone can be downloaded from ImageNet-R50-AlignPadding.npz
- The data folder needs to have the following directory structure:
```
data/
  annotations/
    instances_train2017.json
    instances_val2017.json
  pretrained-models/
    ImageNet-R50-AlignPadding.npz
  train2017/
    # image files that are mentioned in the corresponding json
  val2017/
    # image files that are mentioned in the corresponding json
```
- If you want to use COCO 2014, please refer to here
- If you want to use EKS or SageMaker, you need to create your own S3 bucket that contains the data in the same directory structure, and change the S3 bucket name in the following files:
  - EKS: stage-data
  - SageMaker: S3 download
- If you want to use EKS, you also need to create an FSx filesystem
  - You don't need to link your S3 bucket if you have followed the previous steps
  - You need to change the FSx filesystem id in the pv-fsx file.
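
Before launching a run, it can help to confirm that the local data folder matches the directory structure listed above. The following is a minimal sketch rather than a tool shipped with this repository: the function name `missing_paths` and the default root `data` are assumptions made for illustration, and the check only looks at whether each entry exists, not at its contents.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory listing above.
REQUIRED_PATHS = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "pretrained-models/ImageNet-R50-AlignPadding.npz",
    "train2017",
    "val2017",
]


def missing_paths(data_root: str) -> List[str]:
    """Return the required entries that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # "data" is the top-level folder name used in the listing; adjust as needed.
    missing = missing_paths("data")
    if missing:
        print("Missing entries:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data directory layout looks complete.")
```

The same list of relative paths can be used to check the layout of an S3 bucket or FSx mount, since those are expected to follow the same structure.
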

Using a container is recommended for training:
- To train with docker, refer to Docker
- To train with Amazon EKS, refer to EKS
- To train with Amazon SageMaker, refer to SageMaker

Training results

The results below were obtained on P3dn.24xl instances using EKS. 12 epochs of training:

Num_GPUs x Images_Per_GPU	Training time	Box mAP	Mask mAP
8x4	5.09h	37.47%	34.45%
16x4	3.11h	37.41%	34.47%
32x4	1.94h	37.20%	34.25%

24 epochs of training:

Num_GPUs x Images_Per_GPU	Training time	Box mAP	Mask mAP
8x4	9.78h	38.25%	35.08%
16x4	5.60h	38.44%	35.18%
32x4	3.33h	38.33%	35.12%
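
One way to read the training times in the two tables is as speedup and scaling efficiency relative to the 8x4 run. The sketch below does that arithmetic using only the hours reported above. The efficiency definition (speedup divided by the increase in GPU count) is a common convention chosen here for illustration, not a metric reported by the authors, and the function name `scaling` is hypothetical.

```python
# Training times in hours, copied from the two tables above.
# Outer key: number of epochs; inner key: number of GPUs (all at 4 images per GPU).
TRAINING_HOURS = {
    12: {8: 5.09, 16: 3.11, 32: 1.94},
    24: {8: 9.78, 16: 5.60, 32: 3.33},
}


def scaling(hours_by_gpus, baseline_gpus=8):
    """Return {num_gpus: (speedup, efficiency)} relative to the baseline run."""
    base_hours = hours_by_gpus[baseline_gpus]
    result = {}
    for gpus, hours in sorted(hours_by_gpus.items()):
        speedup = base_hours / hours
        # Efficiency of 1.0 would mean time falls in proportion to GPU count.
        efficiency = speedup / (gpus / baseline_gpus)
        result[gpus] = (speedup, efficiency)
    return result


for epochs, hours_by_gpus in TRAINING_HOURS.items():
    print(f"{epochs} epochs:")
    for gpus, (speedup, efficiency) in scaling(hours_by_gpus).items():
        print(f"  {gpus} GPUs: {speedup:.2f}x speedup, {efficiency:.0%} scaling efficiency")
```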

Tensorpack fork point

Forked from the excellent Tensorpack repo at commit a9dce5b220dca34b15122a9329ba9ff055e8edc6
