
deeplabv3-tensorflow's Introduction

DeepLabV3 Semantic Segmentation

Reimplementation of DeepLabV3 Semantic Segmentation

This is a (re-)implementation of DeepLabv3 -- Rethinking Atrous Convolution for Semantic Image Segmentation -- in TensorFlow for semantic image segmentation on the PASCAL VOC dataset. The implementation is based on DrSleep's DeepLab v2 implementation and CharlesShang's tfrecord implementation.

Features

  • TensorFlow support
  • Multi-GPU on a single machine (synchronous updates)
  • Multi-GPU across multiple servers (asynchronous updates)
  • ImageNet pre-trained weights
  • Pre-training on MS COCO
  • Evaluation on VOC 2012
  • Multi-scale evaluation on VOC 2012

Requirements

Tensorflow 1.4

python 3.5
tensorflow 1.4
CUDA  8.0
cuDNN 6.0

Tensorflow 1.2

python 3.5
tensorflow 1.2
CUDA  8.0
cuDNN 5.1

The code written for TensorFlow 1.4 is compatible with TensorFlow 1.2, tested on a single-GPU machine.

Installation

sh setup.sh

Train

  1. Configure config.py.
  2. Run python3 convert_voc12.py --split-name=SPLIT_NAME; this will generate a tfrecord file in $DATA_DIRECTORY/records.
  3. Single GPU: run python3 train_voc12.py (validation mIOU is computed every SAVE_PRED_EVERY steps).

Performance

This repository only implements MG(1, 2, 4), ASPP and image pooling. Training starts from scratch. (Training took me almost 2 days on a single GTX 1080 Ti. I changed the learning rate policy from the paper: instead of the 'poly' learning rate policy, I started the learning rate at 0.01, then dropped it to fixed values of 0.005 and 0.001 when the seg_loss stopped decreasing, and used 0.001 for the rest of training.)
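For illustration, the step-wise schedule described above can be expressed as a piecewise-constant learning rate. This is a minimal sketch with hypothetical step boundaries (the actual drops were made manually when seg_loss plateaued), not the repository's exact code:

import tensorflow as tf

global_step = tf.train.get_or_create_global_step()
boundaries = [30000, 60000]           # hypothetical steps at which the rate is dropped
values = [0.01, 0.005, 0.001]         # 0.01 -> 0.005 -> 0.001 as described above
learning_rate = tf.train.piecewise_constant(global_step, boundaries, values)
optimizer = tf.train.MomentumOptimizer(learning_rate, momentum=0.9)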

Updated 1/11/2018

I continued training with a learning rate of 0.0001, and there was a large increase in validation mIOU.

Updated 2/05/2018

The implementation of Multi-grid has been improved, thanks to @howard-mahe. The new validation results will be updated soon.

Updated 2/11/2018

The new validation result was trained from scratch. I did not implement the two-stage training policy (freezing BN and changing output stride 16 -> 8). I may try a few more runs to see if performance improves, but I consider that fine-tuning work.

       Validation mIOU
paper  77.21%
repo   70.63%

The validation mIOU for this repo is achieved without multi-scale inference and left-right flipping.

Further improvement can likely be achieved by tuning hyperparameters such as the learning rate, batch size, optimizer, initializer and batch normalization. I did not spend much time on training, so these results are preliminary.

You are welcome to try it and report your numbers.

deeplabv3-tensorflow's People

Contributors

ndong-petuum, zl1446


deeplabv3-tensorflow's Issues

Memory leak bug

Hi, I get a memory leak when I run this code. I tried adding sess.graph.finalize() to freeze the graph, but the leak still exists. Can anyone give some advice? Thanks.

Training Batch Normalization

Hi, training the batch normalization parameters is the most important contribution in the paper:

we change the training protocol in [10, 11] with three main differences: (1) larger crop size, (2) upsampling logits during training, and (3) fine-tuning batch normalization

For training the batch normalization,

Our added modules on top of ResNet all include batch normalization parameters, which we found important to be trained as well... we employ output stride = 16 and compute the batch normalization statistics with a batch size of 16... After training on the trainaug set with 30K iterations and initial learning rate = 0.007, we then freeze batch normalization parameters, employ output stride = 8, and train on the official PASCAL VOC 2012 trainval set for another 30K iterations and smaller base learning rate = 0.001.

And

convolutions with rates = (6, 12, 18) when output stride = 16 (all with 256 filters and batch normalization),...Note that the rates are doubled when output stride = 8.

From these notes, I want to mention:

  1. The batch normalization parameters in ResNet (blocks 1->4) and in ASPP need to be trained with a batch size of 16 and output stride = 16 on the augmented set.
  2. After 30k iterations, these batch normalization parameters have to be frozen, and training continues on the original PASCAL set (which has finer segmentation annotations than the augmented set) with output stride = 8.

Currently, I could not find these two points in your implementation. I hope it can achieve similar results once you add them; my implementation achieves 73.4% using only the first point.
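For reference, a minimal sketch of the two-stage setup (assuming slim's batch_norm arg_scope; deeplabv3 below is a hypothetical stand-in for this repository's network builder):

import tensorflow as tf
slim = tf.contrib.slim

def build_model(images, num_classes, train_bn, output_stride):
    # Stage 1: train_bn=True, output_stride=16, batch size 16 on the augmented set.
    # Stage 2: restore stage-1 weights, train_bn=False, output_stride=8 on trainval.
    with slim.arg_scope([slim.batch_norm],
                        is_training=train_bn,   # stop updating moving mean/variance when False
                        trainable=train_bn,     # also freeze gamma/beta when False
                        decay=0.9997):
        return deeplabv3(images, num_classes, output_stride=output_stride)  # hypothetical builder

# Stage 1:
# logits = build_model(images, 21, train_bn=True, output_stride=16)
# Stage 2 (after restoring the stage-1 checkpoint):
# logits = build_model(images, 21, train_bn=False, output_stride=8)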

Error when fine-tuning from a model trained on VOC 2012

Hello, this is a very nice project. I ran into a problem: a model I trained on VOC 2012 cannot be fine-tuned on a new dataset. Even with the --not-restore-last flag I still get an error saying that 21 classes cannot be matched to 18 classes (18 is my number of classes). What could be the cause?
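For reference, a hedged sketch of the usual slim pattern for this situation: restore every variable except the final, class-dependent layer. The scope name 'logits' and the checkpoint path below are hypothetical; use whatever scope this repository gives its final classification layer.

import tensorflow as tf
slim = tf.contrib.slim

# Build the 18-class model first, then restore everything except the final
# classification layer, whose shape depends on the number of classes.
variables_to_restore = slim.get_variables_to_restore(exclude=['logits'])  # hypothetical scope name
saver = tf.train.Saver(variables_to_restore)

# with tf.Session() as sess:
#     sess.run(tf.global_variables_initializer())
#     saver.restore(sess, 'voc2012_model.ckpt')   # hypothetical checkpoint path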

ASPP model without extra blocks

Hello, let's look at the original text first:

ASPP: In Tab. 5, we experiment with the effect of incorporating multi-grid in block4 and image-level features to the improved ASPP module. We first fix ASPP = (6, 12, 18) (i.e., employ rates = (6, 12, 18) for the three parallel 3 × 3 convolution branches), and vary the multi-grid value. Employing Multi Grid = (1, 2, 1) is better than Multi Grid = (1, 1, 1), while further improvement is attained by adopting Multi Grid = (1, 2, 4) in the context of ASPP = (6, 12, 18) (cf. the 'block4' column in Tab. 3). If we additionally employ another parallel branch with rate = 24 for longer range context, the performance drops slightly by 0.12%. On the other hand, augmenting the ASPP module with image-level feature is effective, reaching the final performance of 77.21%

  1. So the ASPP model shown in Fig. 5 uses only 4 blocks, not 7. The 7-block backbone is only used for the cascaded model (Tab. 4), which achieved 76.66%. Thus, the paper uses ASPP on top of 4 blocks as the final model. Looking at your implementation, I found that you apply ASPP on top of 7 blocks, which may not be what the paper describes. Please correct me if I am wrong.
  2. The batch norm was trained with a decay of 0.9997, instead of 0.997. Please check this. The authors apply a batch norm layer after each 3x3 conv in the ASPP module. Do you think a ReLU also follows the batch norm layer, i.e. conv(3x3) + batchnorm + relu? (See the ASPP sketch after this list.)
    Thanks
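For reference, a minimal sketch of the ASPP head (assuming slim's conv2d/batch_norm API, not this repository's exact code). With normalizer_fn set, slim applies conv -> batch norm -> ReLU, which matches the ordering asked about in point 2, and the 0.9997 decay follows point 2 above:

import tensorflow as tf
slim = tf.contrib.slim

def aspp(net, depth=256):
    with slim.arg_scope([slim.conv2d],
                        normalizer_fn=slim.batch_norm,
                        normalizer_params={'decay': 0.9997}):
        b0 = slim.conv2d(net, depth, 1)               # 1x1 branch
        b1 = slim.conv2d(net, depth, 3, rate=6)       # 3x3 atrous branches,
        b2 = slim.conv2d(net, depth, 3, rate=12)      # rates (6, 12, 18) at output stride 16
        b3 = slim.conv2d(net, depth, 3, rate=18)
        # The image-level feature branch (global pooling + 1x1 conv + upsampling)
        # is concatenated here as well; see the sketch in the image-level feature issue below.
        net = tf.concat([b0, b1, b2, b3], axis=3)
        return slim.conv2d(net, depth, 1)             # fuse with a final 1x1 conv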

No result in training

Excuse me, why does training report no program errors, yet produce no results? No weights are trained, there is no accuracy, loss or IoU output, and the final snapshot directory is empty. Why is this? Thank you for your reply.

streaming mean iou

First, thanks for your work.
I found you use streaming mean IoU as the evaluation method, but when I print the result, the first value is zero, like:
step 0, mean_iou: 0.000000 (4.134 sec/step)  step 1, mean_iou: 0.324524 (0.286 sec/step)

I don't know why the result is zero even when my pred_img and gt_img are identical. I searched a lot, and streaming_mean_iou in TensorFlow is computed as tp / (tp + fp + fn); I really want to know the reason. Thanks for any answer.
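For reference, tf.metrics.mean_iou (the streaming metric) returns two ops: the mean_iou value tensor is computed from the confusion matrix accumulated so far, while update_op adds the current batch to it. When both are fetched in the same sess.run, the value can be read before the batch is accumulated, which is why step 0 prints 0.000000 even for a perfect prediction. A minimal sketch of the recommended usage:

import tensorflow as tf

labels = tf.placeholder(tf.int64, [None])
preds = tf.placeholder(tf.int64, [None])
miou, update_op = tf.metrics.mean_iou(labels, preds, num_classes=21)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())   # the metric's accumulators are local variables
    # Run update_op for every evaluation batch...
    sess.run(update_op, {labels: [1, 2, 3], preds: [1, 2, 3]})
    sess.run(update_op, {labels: [4, 5, 6], preds: [4, 5, 6]})
    # ...and read the accumulated mean IoU once at the end.
    print(sess.run(miou))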

Multigrid block misunderstanding.

Hi Nanqing,

First, thanks a lot for your implementation, this is a great piece of work!

I feel like you misunderstood the Multigrid block of the DeepLabV3 network. You create a bottleneck_hdc unit that does:

conv1: conv 1x1, stride=1 rate=rate*multi_grid[0]
conv2: conv 3x3, stride=stride, rate=rate*multi_grid[1]
conv3: conv 1x1, stride=1, rate=rate*multi_grid[2]

and then you repeat the bottleneck_hdc unit 3 times.
In ResNet bottleneck, conv1 is a decreasing projection, conv2 is a 3x3 convolution where dilation is supposed to happen, conv3 is an increasing projection. Please note that projections are 1x1 convolutions, for which dilation_rate doesn't have any effect.

What is described in DeepLabV3 for block n, n={4,5,6,7}, is a succession of 3 standard bottleneck_v1 units whose dilation rates follow multi_grid=(1,2,1) for each of the 3 units, respectively. The corrected code should be:

import tensorflow as tf
# `net` is the output feature map of block3; `bottleneck` is the standard
# slim resnet_v1 bottleneck unit (1x1 reduce, 3x3 atrous conv, 1x1 expand).
multi_grid = (1, 2, 1)
D = 512
for r in range(4):                                     # blocks 4 to 7
  with tf.variable_scope('block%d' % (r + 4), values=[net]):
    rate = 2 ** (r + 1)                                # base rate doubles at each block
    for i in range(3):                                 # 3 bottleneck units per block
      with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
        dilation_rate = rate * multi_grid[i]
        net = bottleneck(net, D * 4, D, stride=1, rate=dilation_rate)

How about performance in comparison with paper?

Great to know someone has implemented DeepLabv3. This is not an issue; I just want to know what performance you have achieved. Could you tell me what mIoU you reached? Thanks so much.

This is my log, but it is much worse. One more thing: your code does not provide or use a ResNet pre-trained model, so I guess the log below is the result of training from scratch.

step 8902, tot_loss = 1.370319, seg_loss = 1.045083, reg_loss = 0.325237, mean_iou: 0.031608, lr: 0.009866(0.552 sec/step)
step 8903, tot_loss = 1.761859, seg_loss = 1.436624, reg_loss = 0.325235, mean_iou: 0.031609, lr: 0.009866(0.535 sec/step)

runs error

import libs.preprocess.voc as preprocess
ImportError: No module named preprocess.voc

shuffle batch out of range error

I prepared the data, then got this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_4_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 4, current size 0)
	 [[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]
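This error usually means the input queue received no examples, for instance because the tfrecord is empty or not where the reader expects it. A quick sanity check (the record filename below is hypothetical; use the file produced by convert_voc12.py under $DATA_DIRECTORY/records):

import tensorflow as tf

path = '/path/to/records/train.tfrecord'   # hypothetical path and filename
count = sum(1 for _ in tf.python_io.tf_record_iterator(path))
print('examples in record:', count)        # 0 would explain the "insufficient elements" error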

ImportError: No module named 'tensorflow.contrib'

(dompamine-env) test@test:~/DeepLabV3-Tensorflow$ python3 train_voc12.py
Traceback (most recent call last):
File "train_voc12.py", line 17, in
from libs.datasets.dataset_factory import read_data
File "/home/test/DeepLabV3-Tensorflow/libs/datasets/dataset_factory.py", line 4, in
import tensorflow.contrib.slim as slim
ImportError: No module named 'tensorflow.contrib'

Number of feature maps after block 3

Hello, I checked your implementation of the ASPP model with ResNet-101. I found that the number of feature maps after block 3 (which starts using dilation with rate 2) and in the ASPP may not be correct. If I understand correctly, it should be 1024 for block 3, 2048 for block 4, and 2048×5 for the ASPP, instead of 256 as in your code. Am I right?

Checkpoint Error

Hi, Nanqing,
First, thanks a lot for your implementation, this is a great piece of work! I wonder whether the pretrained model you provided corresponds to your code, since the training has taken so much time. Could you share the snapshot that you completed? My email address is [email protected]. I'm waiting for your reply.

TRAIN on voc2012

Hello,

I really appreciate your project. Today I trained on VOC 2012 from scratch with learning rate = 0.01, and convergence feels quite slow.

Is this normal? Following the tutorial, can this metric reach above 85%?

training error

Hello,
Thanks a lot for your project, but I ran into a problem during training. In seg_logits = tf.boolean_mask(raw_output_up, mask), raw_output_up has shape (1, 321, 321, 21) and mask has shape (321, 321), and the error says "Shapes (1, 321) and (321, 321) are incompatible". How should I solve this?
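For reference, tf.boolean_mask requires the mask's shape to match the leading dimensions of the tensor, so a (321, 321) mask cannot index a (1, 321, 321, 21) tensor directly. A hedged sketch of two ways to make the shapes compatible (not necessarily how the repository intends it to be fixed):

import tensorflow as tf

raw_output_up = tf.zeros([1, 321, 321, 21])   # stand-in for the upsampled logits
mask = tf.ones([321, 321]) > 0                # stand-in for the boolean label mask

# Option 1: drop the batch dimension of the logits before masking.
seg_logits = tf.boolean_mask(raw_output_up[0], mask)                   # -> [num_true, 21]

# Option 2: give the mask a matching batch dimension instead.
seg_logits = tf.boolean_mask(raw_output_up, tf.expand_dims(mask, 0))   # -> [num_true, 21]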

Image-level feature

Is the image-level feature implemented in the wrong way?
It should be a global feature computed with global average pooling and then unpooled (upsampled) back to the feature-map size.
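For reference, a minimal sketch (not this repository's code) of the image-level feature as described in the paper: global average pooling, a 1x1 convolution with 256 filters, then bilinear upsampling back to the feature-map resolution.

import tensorflow as tf
slim = tf.contrib.slim

def image_level_feature(net, depth=256):
    size = tf.shape(net)[1:3]                             # spatial size of the input feature map
    pooled = tf.reduce_mean(net, [1, 2], keep_dims=True)  # global average pooling -> 1x1
    pooled = slim.conv2d(pooled, depth, 1)                # 1x1 conv with 256 filters (+ BN in the paper)
    return tf.image.resize_bilinear(pooled, size)         # "unpool" back to the original size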

ImageNet pre-trained weights for ResNet101

This is a discussion about using pre-trained weights, not a bug report.

As you know, ResNet-101 has 4 blocks, and the ImageNet pre-trained checkpoint stores the weights of those 4 blocks. The sizes of these blocks are:

[image: ResNet-101 block sizes]

However, DeepLabv3 has 7 blocks (extra block 5, block 6, and block 7 as replicas of block 4). So if I understand correctly, the layout will be:

block3 (1x1,256; 3x3,256;1x1,1024)-> block 4 (1x1,512; 3x3,512;1x1,2048)-> block 5 (1x1,512; 3x3,512;1x1,2048)-> block 6 (1x1,512; 3x3,512;1x1,2048)-> block 7 (1x1,512; 3x3,512;1x1,2048)

We can copy the weights of block 4 to the three extra blocks, except for the first convolution of each extra block, due to a shape mismatch. The reason is that the first convolution of block 4 (conv1) is 1x1x1024 (where 1024 is the output depth of block 3), while the first convolution of each extra block is 1x1x2048 (where 2048 is the output depth of block 4). Do you have the same problem? Or does the author just use pre-trained weights for blocks 1-4, while blocks 5-7 are trained from scratch? Currently, I use checkpoint_utils.init_from_checkpoint to copy the weights from block 4 to the extra blocks.
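For what it's worth, a minimal sketch of such a copy using tf.train.init_from_checkpoint (the function referred to above as checkpoint_utils.init_from_checkpoint). The scope and unit names here are hypothetical and assume the 7-block graph has already been built with matching variable scopes:

import tensorflow as tf

assignment_map = {}
for extra in ['block5', 'block6', 'block7']:
    for unit in ['unit_2', 'unit_3']:                     # skip unit_1: its input depth differs (1024 vs 2048)
        src = 'resnet_v1_101/block4/%s/' % unit           # scope in the checkpoint (hypothetical naming)
        dst = 'resnet_v1_101/%s/%s/' % (extra, unit)      # scope in the current graph
        assignment_map[src] = dst

# Overrides the variables' initializers; call before running the init op.
tf.train.init_from_checkpoint('resnet_v1_101.ckpt', assignment_map)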
