deeplabv3-tensorflow's Issues

training error

Hello,
Thanks a lot for this project, but I ran into a problem during training. In seg_logits = tf.boolean_mask(raw_output_up, mask), raw_output_up is (1, 321, 321, 21) and mask is (321, 321), and I get the error "Shapes (1, 321) and (321, 321) are incompatible". How should I solve this?
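
A minimal sketch of one way around this (variable names follow the issue, not the repo's actual fix): tf.boolean_mask matches the mask against the leading dimensions of the tensor, so a (321, 321) mask cannot index a (1, 321, 321, 21) tensor; dropping the batch dimension first makes the shapes line up.

import tensorflow as tf

raw_output_up = tf.zeros([1, 321, 321, 21])   # placeholder logits for illustration
mask = tf.ones([321, 321], dtype=tf.bool)     # per-pixel validity mask

logits = tf.squeeze(raw_output_up, axis=0)    # (321, 321, 21): drop the batch dim
seg_logits = tf.boolean_mask(logits, mask)    # (num_valid_pixels, 21)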

No result in training

Excuse me, why does training report no errors yet produce no results? There are no trained weights; even the accuracy, loss value, and IoU have no output, and the final snapshot contains nothing. Why is this? Thank you for your reply.

'''image-level feature'''

Is the '''image-level feature''' implemented in the wrong way?
It should be a global feature: global average pooling followed by upsampling ("unpooling") back to the feature-map size.
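
For reference, a minimal sketch of the image-level feature as the DeepLabV3 paper describes it (global average pooling, a 1x1 convolution, then bilinear upsampling back to the feature size); the slim usage, scope name, and output depth of 256 are assumptions, not this repo's code.

import tensorflow as tf
slim = tf.contrib.slim

def image_level_feature(net):
    size = tf.shape(net)[1:3]                            # spatial size to upsample back to
    pooled = tf.reduce_mean(net, [1, 2], keepdims=True)  # global average pool to 1x1
    pooled = slim.conv2d(pooled, 256, [1, 1], scope='image_pool_conv')
    return tf.image.resize_bilinear(pooled, size)        # "unpool" back to H x W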

How does the performance compare with the paper?

Great to know someone has implemented DeepLabV3. This is not an issue; I just want to know what performance you have achieved. Could you tell me what mIoU you reached? Thanks so much.

This is my log, but it is far too poor. One more thing: your code neither provides nor uses a ResNet pre-trained model, so I guess the log below is the result of training from scratch (a sketch of loading pre-trained weights follows after the log).

step 8902, tot_loss = 1.370319, seg_loss = 1.045083, reg_loss = 0.325237, mean_iou: 0.031608, lr: 0.009866(0.552 sec/step)
step 8903, tot_loss = 1.761859, seg_loss = 1.436624, reg_loss = 0.325235, mean_iou: 0.031609, lr: 0.009866(0.535 sec/step)
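
For what it's worth, a hedged sketch of initializing the backbone from an ImageNet ResNet-101 checkpoint before training; the checkpoint path and the 'resnet_v1_101' scope names are assumptions about the graph, not this repo's code.

import tensorflow as tf
slim = tf.contrib.slim

# restore everything under the backbone scope except the ImageNet classifier head
restore_vars = slim.get_variables_to_restore(include=['resnet_v1_101'],
                                             exclude=['resnet_v1_101/logits'])
saver = tf.train.Saver(var_list=restore_vars)
# inside a tf.Session, before the training loop:
# saver.restore(sess, 'resnet_v1_101.ckpt')   # hypothetical checkpoint path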

Error when fine-tuning a model trained on VOC2012

Hello, very nice project. I ran into a problem: the model I trained on VOC2012 cannot be fine-tuned on a new dataset. Even with the --not-restore-last flag I still get an error saying that 21 classes cannot be matched to 18 classes (my number of classes). What could be the reason?
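
In case it helps, a hedged sketch of the usual workaround: exclude the classifier variables from the restore list so the 21-class head is re-initialized for 18 classes. The scope name 'fc1_voc12' is a guess at which scope holds the final classifier in this repo.

import tensorflow as tf
slim = tf.contrib.slim

restore_vars = slim.get_variables_to_restore(exclude=['fc1_voc12'])  # skip the 21-class head
saver = tf.train.Saver(var_list=restore_vars)
# saver.restore(sess, 'voc2012_trained.ckpt')   # hypothetical checkpoint path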

ImportError: No module named 'tensorflow.contrib'

(dompamine-env) test@test:~/DeepLabV3-Tensorflow$ python3 train_voc12.py
Traceback (most recent call last):
File "train_voc12.py", line 17, in
from libs.datasets.dataset_factory import read_data
File "/home/test/DeepLabV3-Tensorflow/libs/datasets/dataset_factory.py", line 4, in
import tensorflow.contrib.slim as slim
ImportError: No module named 'tensorflow.contrib'
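
One likely cause (a fact about TensorFlow, though not verified against this environment): tensorflow.contrib was removed in TensorFlow 2.0, so this traceback suggests the script is running under TF 2.x while the repo targets the 1.x API. A quick check:

import tensorflow as tf
print(tf.__version__)   # any 2.x release has no tf.contrib; this repo needs 1.x
# e.g. pip install "tensorflow==1.15" in a fresh environment (suggestion, untested here)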

ImageNet pre-trained weights for ResNet101

This is a discussion about using pre-trained weights, not a bug report.

As you know, ResNet-101 has 4 blocks, and the ImageNet pre-trained checkpoint stores the weights of those 4 blocks. The sizes of these blocks are:

[screenshot: weight shapes of the four pre-trained ResNet blocks]

However, DeepLabV3 has 7 blocks (the extra block5, block6, and block7 are replicas of block4). So if I understand correctly, it will be:

block3 (1x1,256; 3x3,256;1x1,1024)-> block 4 (1x1,512; 3x3,512;1x1,2048)-> block 5 (1x1,512; 3x3,512;1x1,2048)-> block 6 (1x1,512; 3x3,512;1x1,2048)-> block 7 (1x1,512; 3x3,512;1x1,2048)

We can copy the weights of block4 to the three extra blocks, except the first convolution of each extra block, due to a shape mismatch: the first convolution of block4 (conv1) is 1x1x1024 (1024 being the output depth of block3), while the first convolution of each extra block is 1x1x2048 (2048 being the output depth of block4). Do you have the same problem? Or did the author just use pre-trained weights for blocks 1-4, while blocks 5-7 are trained from scratch? Currently, I use checkpoint_utils.init_from_checkpoint to copy the weights from block4 to the extra blocks.
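
For comparison, a hedged sketch of that copy with tf.train.init_from_checkpoint. Scope names such as 'resnet_v1_101/block4' are assumptions about the graph; in the assignment map, keys are checkpoint scopes and values are scopes in the current graph.

import tensorflow as tf

CKPT = 'resnet_v1_101.ckpt'   # hypothetical ImageNet checkpoint path

assignment_map = {}
for replica in ['block5', 'block6', 'block7']:
    for unit in range(1, 4):
        # block4/unit_1/conv1 expects 1024 input channels (from block3), while the
        # replicas see 2048, so that one layer is left randomly initialized
        convs = ['conv2', 'conv3'] if unit == 1 else ['conv1', 'conv2', 'conv3']
        for conv in convs:
            src = 'resnet_v1_101/block4/unit_%d/bottleneck_v1/%s/' % (unit, conv)
            dst = 'resnet_v1_101/%s/unit_%d/bottleneck_v1/%s/' % (replica, unit, conv)
            assignment_map[src] = dst
tf.train.init_from_checkpoint(CKPT, assignment_map)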

Number of features after block 3

Hello, I checked your ASPP implementation with ResNet-101. I found that the number of feature maps after block 3 (where dilation with rate 2 starts) and in the ASPP may not be correct. If I understand correctly, it should be 1024 for block 3, 2048 for block 4, and 2048×5 for the ASPP, instead of the 256 in your code. Am I right?

Multigrid block misunderstanding.

Hi Nanqing,

First, thanks a lot for your implementation, this is a great piece of work!

I feel like you misunderstood the Multigrid block of the DeepLabV3 network. You create a bottleneck_hdc unit that does:

conv1: conv 1x1, stride=1 rate=rate*multi_grid[0]
conv2: conv 3x3, stride=stride, rate=rate*multi_grid[1]
conv3: conv 1x1, stride=1, rate=rate*multi_grid[2]

and then you repeat the bottleneck_hdc unit 3 times.
In a ResNet bottleneck, conv1 is a decreasing projection, conv2 is the 3x3 convolution where dilation is supposed to happen, and conv3 is an increasing projection. Please note that the projections are 1x1 convolutions, for which dilation_rate has no effect.

What DeepLabV3 describes for block n, n in {4, 5, 6, 7}, is a succession of 3 standard bottleneck_v1 units whose 3x3 convolutions use dilation rates rate * multi_grid, with multi_grid = (1, 2, 1) applied to the 3 units respectively. The corrected code should be:

import tensorflow as tf
from tensorflow.contrib.slim.nets import resnet_v1  # standard slim bottleneck_v1

multi_grid = (1, 2, 1)
D = 512
for r in range(4):                      # blocks 4, 5, 6, 7
  with tf.variable_scope('block%d' % (r + 4), values=[net]):
    rate = 2 ** (r + 1)                 # 2, 4, 8, 16 at output stride 16
    for i in range(3):                  # 3 standard bottleneck units per block
      with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
        dilation_rate = rate * multi_grid[i]
        net = resnet_v1.bottleneck(net, D * 4, D, stride=1, rate=dilation_rate)

ASPP model without extra blocks

Hello, let's look at the original sentence first:

ASPP: In Tab. 5, we experiment with the effect of incorporating multi-grid in block4 and image-level features to the improved ASPP module. We first fix ASPP = (6, 12, 18) (i.e., employ rates = (6, 12, 18) for the three parallel 3 × 3 convolution branches), and vary the multi-grid value. Employing Multi Grid = (1, 2, 1) is better than Multi Grid = (1, 1, 1), while further improvement is attained by adopting Multi Grid = (1, 2, 4) in the context of ASPP = (6, 12, 18) (cf., the 'block4' column in Tab. 3). If we additionally employ another parallel branch with rate = 24 for longer range context, the performance drops slightly by 0.12%. On the other hand, augmenting the ASPP module with image-level feature is effective, reaching the final performance of 77.21%.

  1. So the ASPP model shown in Fig. 5 uses only 4 blocks, not 7. The 7 blocks are used only for the cascaded model (Tab. 4), which achieves 76.66%; thus the paper's final model is ASPP with 4 blocks. Looking at your implementation, I found that you use ASPP on top of 7 blocks, which may not be what the paper describes. Please correct me if I am wrong.
  2. The batch norm was trained with a decay of 0.9997, not 0.997. Please check it. The authors use a batch norm layer after the 3x3 convolutions in the ASPP module. Do you think a ReLU also follows the batch norm layer, i.e. conv(3x3) + batchnorm + relu? (See the sketch after this list.)
    Thanks
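
A minimal sketch of one ASPP branch in that conv(3x3) + batch norm + ReLU ordering, with the 0.9997 decay quoted from the paper; the slim usage and scope name are illustrative assumptions, not this repo's code. Note that slim applies normalizer_fn before activation_fn, which gives exactly conv -> batch norm -> relu.

import tensorflow as tf
slim = tf.contrib.slim

def aspp_branch(net, rate):
    # one parallel atrous branch of the ASPP module
    return slim.conv2d(net, 256, [3, 3], rate=rate,
                       activation_fn=tf.nn.relu,
                       normalizer_fn=slim.batch_norm,
                       normalizer_params={'decay': 0.9997},
                       scope='aspp_rate%d' % rate)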

TRAIN on voc2012

Hello,

I really appreciate your project. I trained on VOC2012 today, from scratch with learning rate = 0.01, and convergence still feels very slow.

Is this normal? Following the tutorial, can the metric reach 85% or above?

Error when running

import libs.preprocess.voc as preprocess
ImportError: No module named preprocess.voc

streaming mean iou

First, thanks for your work. I see that you use streaming mean IoU as the evaluation method, but when I output the result, the first value is zero, like:
step 0, mean_iou: 0.000000 (4.134 sec/step)
step 1, mean_iou: 0.324524 (0.286 sec/step)

I don't know why the result is zero even though my pred_img and gt_img are the same. I searched a lot, and streaming_mean_iou in TensorFlow is calculated as tp/(tp+fp+fn). I really want to know the reason; thanks for your answer.
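
If it helps, a minimal sketch of the likely explanation, assuming tf.metrics.mean_iou is what's being used: the metric is streaming, so the returned mean_iou tensor only reflects counts that update_op has already accumulated. Fetching it before (or in the same run as) the first update reads an empty confusion matrix, which yields 0.

import tensorflow as tf

labels = tf.constant([0, 1, 1])
preds = tf.constant([0, 1, 1])
miou, update_op = tf.metrics.mean_iou(labels, preds, num_classes=2)

with tf.Session() as sess:
    sess.run(tf.local_variables_initializer())  # the metric's counters are local vars
    print(sess.run(miou))    # 0.0: nothing accumulated yet (your step 0)
    sess.run(update_op)      # accumulate the first batch
    print(sess.run(miou))    # 1.0 here, since pred == label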

shuffle batch out of range error

I prepared the data, but got this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: RandomShuffleQueue '_4_shuffle_batch/random_shuffle_queue' is closed and has insufficient elements (requested 4, current size 0)
	 [[Node: shuffle_batch = QueueDequeueManyV2[component_types=[DT_FLOAT, DT_UINT8], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](shuffle_batch/random_shuffle_queue, shuffle_batch/n)]]
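
One common cause (an assumption, not confirmed for this repo): the input file pattern matched nothing, so the shuffle queue was closed while still empty. A quick sanity check before blaming the queue:

import tensorflow as tf

pattern = './records/voc12_train*.tfrecord'   # hypothetical path; substitute yours
files = tf.gfile.Glob(pattern)
print('%d files matched' % len(files))        # 0 matches reproduces this OutOfRangeError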

Training Batch Normalization

Hi, training the batch normalization is the most important contribution of the paper:

we change the training protocol in [10, 11] with three main differences: (1) larger crop size, (2) upsampling logits during training, and (3) fine-tuning batch normalization

For training the batch normalization,

Our added modules on top of ResNet all include batch normalization parameters, which we found important to be trained as well... we employ output stride = 16 and compute the batch normalization statistics with a batch size of 16... After training on the trainaug set with 30K iterations and initial learning rate = 0.007, we then freeze batch normalization parameters, employ output stride = 8, and train on the official PASCAL VOC 2012 trainval set for another 30K iterations and smaller base learning rate = 0.001.

And

convolutions with rates = (6, 12, 18) when output stride = 16 (all with 256 filters and batch normalization),...Note that the rates are doubled when output stride = 8.

From these notes, I want to mention:

  1. The batch normalization parameters in the ResNet (blocks 1-4) and in the ASPP need to be trained with a batch size of 16 and output stride = 16 on the augmented set.
  2. After 30K iterations, these batch normalization layers have to be frozen, and training continues on the original PASCAL set (which has finer segmentation labels than the augmented one) with output stride = 8.

Currently, I did not find either point in your implementation. I hope it can achieve similar results once you add them; my implementation achieves only 73.4% using just the first point. A sketch of the two-stage protocol follows below.
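
A minimal sketch of those two points, assuming slim-style batch norm (layer names and shapes are illustrative, not this repo's code): stage 1 trains the BN statistics at output stride 16; stage 2 rebuilds with is_training=False so BN is frozen while the rest fine-tunes at output stride 8.

import tensorflow as tf
slim = tf.contrib.slim

def head(net, num_classes, train_bn):
    # Stage 1 (first 30K iters): train_bn=True, batch size 16, output stride 16.
    # Stage 2 (next 30K iters): train_bn=False, so BN uses frozen moving statistics
    # while the convolutions keep fine-tuning at output stride 8.
    with slim.arg_scope([slim.batch_norm], is_training=train_bn, decay=0.9997):
        net = slim.conv2d(net, 256, [3, 3], rate=6,
                          normalizer_fn=slim.batch_norm, scope='aspp_rate6')
        return slim.conv2d(net, num_classes, [1, 1], activation_fn=None,
                           normalizer_fn=None, scope='logits')

# When train_bn=True, remember to run the BN moving-average updates with the train op:
# update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)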

Checkpoint Error

Hi, Nanqing,
First, thanks a lot for your implementation, this is a great piece of work! I wonder whether the pretrained model you provide corresponds to your code, as training would take so much time. Could you share the snapshot that you completed? My email address is [email protected]. I'm waiting for your reply.

Memory leaks when running the code

Hi, I get memory leaks when I run this code. I tried adding sess.graph.finalize() to freeze the graph, but the leak still exists. Can anyone give some advice? Thanks.
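
One way to localize it (a suggestion, not a confirmed diagnosis of this repo): graph growth from ops created inside the training loop is the usual culprit, and finalizing the graph before the loop makes TensorFlow raise at exactly the line that adds a new op.

import tensorflow as tf

x = tf.Variable(0.0)
train_op = tf.assign_add(x, 1.0)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.graph.finalize()          # finalize once, *before* the loop starts
    for step in range(3):
        sess.run(train_op)         # fine: no new ops are built per step
        # tf.constant(step)        # uncommenting this would raise, pinpointing a leak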
