Hi, I am a bit confused with the output shape of text-detection-0001.


text-detection-0001 output shape is different from paper and description (open_model_zoo issue, closed)

openvinotoolkit commented on June 17, 2024

text-detection-0001 output shape is different from paper and description


Comments (6)

snosov1 commented on June 17, 2024

@Ilya-Krylov can you comment?


Ilya-Krylov commented on June 17, 2024

@banderlog Thank you for your finding. Actually, this is a mistake. The outputs should be:

[1x2x192x320] - logits related to text/no-text classification for each pixel.
[1x16x192x320] - logits related to linkage between pixels and their neighbors.

With these shapes, the output corresponds to the description in the paper. The output blob format is [BATCH_SIZE, CHANNELS_NUMBER, H, W].
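As a rough illustration of how the [1x2x192x320] classification logits could be consumed, the sketch below applies a softmax over the channel axis to get a per-pixel text probability map. The random array only stands in for a real network output, the helper name `text_probability` is hypothetical, and treating channel 1 as the "text" class is an assumption that is not confirmed in this thread.

```python
import numpy as np


def text_probability(cls_logits):
    """Softmax over the channel axis of a [N, 2, H, W] logits blob.

    Returns an [N, H, W] map. Channel 1 is ASSUMED to be the "text"
    class; check the model documentation before relying on this.
    """
    # Subtract the per-pixel maximum so the exponentials stay stable.
    shifted = cls_logits - cls_logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)
    return probs[:, 1]


# Random stand-in for the real output blob, in [BATCH_SIZE, CHANNELS_NUMBER, H, W] layout.
rng = np.random.default_rng(0)
cls_logits = rng.standard_normal((1, 2, 192, 320)).astype(np.float32)

text_prob = text_probability(cls_logits)
print(text_prob.shape)  # (1, 192, 320)
```

The 16-channel linkage blob would need a similar per-neighbor treatment, but its channel layout is not described here, so it is left out of the sketch.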


banderlog commented on June 17, 2024

Thank you for the fast answer. I'm glad that I was able to help and that the second output's shape is now the same in all 3 sources. But I still have a problem: text-detection-0001 returns only one tensor.

Please look at my example below:

import cv2

img = cv2.imread('test.jpg')

td = cv2.dnn.readNet('./text-detection-0001.xml','./text-detection-0001.bin')
blob = cv2.dnn.blobFromImage(img, 1, (768, 1280))
td.setInput(blob)
a, b = td.forward()
>>> ValueError: not enough values to unpack (expected 2, got 1)

And if I check the output's shape:

a = td.forward()
a.shape
>>> (1, 16, 192, 320)

As far as I understand, I still need the logits related to text/no-text classification for each pixel, with shape [1x2x192x320]. Could you please comment on this? Maybe you need some additional info, like the cv2.getBuildInformation() output?

Currently I am using a custom build of OpenCV 4.0.1 with the Inference Engine built from the dldt git repository.


dkurt commented on June 17, 2024

@banderlog,

This method returns a single output:

 |  forward(...)
 |      forward([, outputName]) -> retval
 |      .   @brief Runs forward pass to compute output of layer with name @p outputName.
 |      .   *  @param outputName name for layer which output is needed to get
 |      .   *  @return blob for first output of specified layer.
 |      .   *  @details By default runs forward pass for the whole network.

source: https://docs.opencv.org/master/db/d30/classcv_1_1dnn_1_1Net.html#a98ed94cb6ef7063d3697259566da310b

You need to use td.forward(td.getUnconnectedOutLayersNames())


banderlog commented on June 17, 2024

Thank you, @dkurt , it worked 😄

a, b = td.forward(td.getUnconnectedOutLayersNames())

a.shape
>>>  (1, 2, 192, 320)
b.shape
>>> (1, 16, 192, 320)
td.getUnconnectedOutLayersNames()
>>> ['pixel_cls/add_2', 'pixel_link/add_2']

May I suggest adding td.forward(td.getUnconnectedOutLayersNames()) to text-detection-0001.md?
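Since the outputs come back in the same order as the requested names, one possible refinement is to key them by layer name instead of relying on unpacking order. The sketch below is only an illustration: `forward_named` is a hypothetical helper, and `FakeNet` is a made-up stand-in for cv2.dnn.Net (returning zero arrays with the shapes reported above) so that the example runs without the model files.

```python
import numpy as np


def forward_named(net):
    """Run a forward pass and return {output layer name: blob}."""
    names = list(net.getUnconnectedOutLayersNames())
    outputs = net.forward(names)  # one blob per requested name, in the same order
    return dict(zip(names, outputs))


class FakeNet:
    """Hypothetical stand-in for cv2.dnn.Net, used only to make the sketch runnable."""

    _shapes = {
        'pixel_cls/add_2': (1, 2, 192, 320),
        'pixel_link/add_2': (1, 16, 192, 320),
    }

    def getUnconnectedOutLayersNames(self):
        return list(self._shapes)

    def forward(self, names):
        return [np.zeros(self._shapes[name], dtype=np.float32) for name in names]


outs = forward_named(FakeNet())
for name, blob in outs.items():
    print(name, blob.shape)
```

With a real network, `forward_named(td)` would replace `forward_named(FakeNet())`.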


banderlog commented on June 17, 2024

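Blob creation should be done this way: blob = cv2.dnn.blobFromImage(img3, 1, (1280,768)). The size argument of blobFromImage is (width, height), so the (768, 1280) used in my example above swaps the two dimensions. If you do it as above, you will get no errors, but also no comprehensible output.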
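To make the ordering less error-prone, the size can be derived from the network's input shape rather than typed by hand. This is a minimal sketch: `blob_size_for` is a hypothetical helper, and the [1x3x768x1280] input shape is an assumption inferred from the sizes discussed in this thread.

```python
def blob_size_for(input_shape):
    """Return the (width, height) size for cv2.dnn.blobFromImage.

    input_shape is the network input in [BATCH_SIZE, CHANNELS_NUMBER, H, W]
    order, while blobFromImage expects (width, height).
    """
    _, _, height, width = input_shape
    return (width, height)


# Assumed input shape of the model: 768 rows by 1280 columns.
size = blob_size_for((1, 3, 768, 1280))
print(size)  # (1280, 768)
```

The result can then be passed as the size argument, for example cv2.dnn.blobFromImage(img, 1, size).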
