
attentionocr's Issues

Recognition model batch predict

For the recognition part, I noticed that images are processed one at a time in a simple for loop. I want to improve performance with batch prediction, so I made a small change to test it:

pads = [image_padded, image_padded]
image_padded = np.array(pads)
print("Batch images: ", image_padded.shape)
# Batch images:  (2, 299, 299, 3)

texts, probs = self.model.predict(image_padded, self.label_dict)

Then I got the following error:

ValueError: Cannot feed value of shape (2, 299, 299, 3) for Tensor 'image:0', which has shape '(1, 299, 299, 3)'

Why does 'image:0' have shape '(1, 299, 299, 3)' rather than '(?, 299, 299, 3)'? Is the batch size fixed during training? I'd really appreciate any suggestions on how to fix this.
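The error suggests the exported graph defines the 'image:0' placeholder with a static batch size of 1, so a (2, 299, 299, 3) array cannot be fed however it is built. Real batching would most likely require re-exporting the graph with an unknown (None) batch dimension; that is an assumption, not something the repository confirms. In the meantime, a minimal sketch like the one below keeps the calling code batch-oriented while feeding the graph exactly the batch size it accepts. `predict_fn` is a hypothetical wrapper around `self.model.predict` that returns one text and one probability entry per image.

```python
import numpy as np


def predict_in_chunks(images, predict_fn, fixed_batch=1):
    """Run predict_fn over a batch whose size may not match the graph.

    Each call receives exactly `fixed_batch` images (the placeholder's static
    batch size, 1 in the error above). A short final chunk is zero-padded and
    the outputs for the padding are dropped. predict_fn is hypothetical:
    (chunk) -> (texts, probs), each with one entry per image in the chunk.
    """
    images = np.asarray(images)
    texts, probs = [], []
    for start in range(0, len(images), fixed_batch):
        chunk = images[start:start + fixed_batch]
        n_real = len(chunk)
        if n_real < fixed_batch:
            # Pad with blank images so the chunk matches the static shape.
            pad = np.zeros((fixed_batch - n_real,) + chunk.shape[1:], dtype=chunk.dtype)
            chunk = np.concatenate([chunk, pad], axis=0)
        chunk_texts, chunk_probs = predict_fn(chunk)
        # Keep only the results that belong to real (non-padding) images.
        texts.extend(list(chunk_texts)[:n_real])
        probs.extend(list(chunk_probs)[:n_real])
    return texts, probs
```

With `fixed_batch=1` this is equivalent to the existing loop; once a graph with a larger or unknown batch dimension is available, only the `fixed_batch` argument needs to change.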

About Docker

First of all, thank you very much for sharing the code. I'd like to ask: can it be run without Docker?

About data augmentation

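When you worked on text recognition, did you use other datasets or datasets you created yourself? If so, would you be willing to share them? If not, could you describe your approach?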

About the model's receptive field

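As shown in the image above, when the text is small relative to the image, the model outputs a string of # characters. What can be done to handle this case?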
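One common mitigation, offered here as an assumption rather than the project's method, is to crop tightly around the detected text and scale that crop up to the recognizer's input size, so the characters occupy more of the 299 x 299 input instead of a few pixels. The sketch below does that with numpy only; the box format `(x0, y0, x1, y1)`, the margin, and the zero padding are all hypothetical choices, and whether this removes the # outputs for this model is untested.

```python
import numpy as np


def crop_and_letterbox(image, box, size=299, margin=4, pad_value=0):
    """Crop a text box (plus a margin), then fit it into a size x size canvas.

    box = (x0, y0, x1, y1) in pixel coordinates is a hypothetical input,
    e.g. from the detector. The crop is scaled so its longer side equals
    `size` (aspect ratio preserved, nearest-neighbour sampling to stay
    numpy-only) and the rest of the canvas is filled with pad_value.
    Returns the canvas and the scale factor applied to the crop.
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    x0, y0 = max(0, x0 - margin), max(0, y0 - margin)
    x1, y1 = min(w, x1 + margin), min(h, y1 + margin)
    crop = image[y0:y1, x0:x1]

    ch, cw = crop.shape[:2]
    scale = size / max(ch, cw)
    new_h = min(size, max(1, int(round(ch * scale))))
    new_w = min(size, max(1, int(round(cw * scale))))
    # For each output pixel, pick the nearest source row/column.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), ch - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), cw - 1)
    resized = crop[rows[:, None], cols[None, :]]

    canvas = np.full((size, size) + image.shape[2:], pad_value, dtype=image.dtype)
    canvas[:new_h, :new_w] = resized
    return canvas, scale
```

A production pipeline would more likely use `cv2.resize` with interpolation instead of nearest-neighbour sampling; the returned `scale` can be used to map results back to the crop's coordinates.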

Error when uploading an image

After uploading an image on the Flask page, I got this error:

File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2309, in __call__
return self.wsgi_app(environ, start_response)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2295, in wsgi_app
response = self.handle_exception(e)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1741, in handle_exception
reraise(exc_type, exc_value, tb)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
response = self.full_dispatch_request()
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
rv = self.handle_user_exception(e)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
reraise(exc_type, exc_value, tb)
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
raise value
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
rv = self.dispatch_request()
File "/home/an/anaconda3/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
return self.view_functions[rule.endpoint](**req.view_args)
File "/ocr/ocr/flaskapp.py", line 134, in predict_ocr_image
image = detection(img_path, ocr_detection_model, ocr_recognition_model, ocr_label_dict)
File "/ocr/ocr/flaskapp.py", line 242, in detection
r_boxes, polygons, scores = detection_model.predict(bgr_image)
File "/ocr/ocr/text_detection.py", line 60, in predict
r_box, polygon = generate_polygon(mask, box)
File "/ocr/ocr/util.py", line 559, in generate_polygon
contours, hierarchy = cv2.findContours(mask_int,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
ValueError: too many values to unpack (expected 2)
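This error means `cv2.findContours` returned three values instead of two. OpenCV 3.x returns `(image, contours, hierarchy)`, while OpenCV 2.4 and 4.x return `(contours, hierarchy)`, so the code in util.py appears to have been written for OpenCV 4 and is running under OpenCV 3. Either install an OpenCV 4 build, or unpack the result in a version-agnostic way. The helper name below is hypothetical; the "take the last two items" idiom itself is the standard workaround.

```python
def unpack_find_contours(result):
    """Return (contours, hierarchy) from the output of cv2.findContours.

    OpenCV 3.x returns (image, contours, hierarchy); OpenCV 2.4 and 4.x
    return (contours, hierarchy). Taking the last two items works for both.
    """
    if len(result) not in (2, 3):
        raise ValueError("unexpected cv2.findContours result of length %d" % len(result))
    contours, hierarchy = result[-2:]
    return contours, hierarchy
```

In util.py the failing line would then become `contours, hierarchy = unpack_find_contours(cv2.findContours(mask_int, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE))`, or, without a helper, `cv2.findContours(...)[-2:]`.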

Hello, could you share the exact versions you used?

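  1. Are you using CUDA 10?
  2. Also, do I need to install keras-gpu separately with conda install keras-gpu? Without specifying a version number, it upgrades TensorFlow straight to 2.0.
    At first I got this error: No OpKernel was registered to support Op 'NcclAllReduce' with these attrs.
    I followed this: https://ask.csdn.net/questions/931786
    Then I set up a fresh environment and got: tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version
    Looking forward to your reply, thanks.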
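The second error, "CUDA driver version is insufficient for CUDA runtime version", means the installed NVIDIA driver is older than the minimum required by the CUDA runtime that this TensorFlow build uses. The sketch below encodes the minimum Linux (x86_64) driver versions listed in NVIDIA's CUDA Toolkit release notes for a few toolkit versions; the function names are hypothetical. The installed driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
# Minimum Linux x86_64 driver version per CUDA toolkit version,
# as listed in NVIDIA's CUDA Toolkit release notes.
MIN_LINUX_DRIVER = {
    "9.0": (384, 81),
    "10.0": (410, 48),
    "10.1": (418, 39),
    "10.2": (440, 33),
}


def parse_driver_version(text):
    """Turn a driver string such as '418.87.01' into (418, 87)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


def driver_supports(driver_version, cuda_version):
    """True if the driver meets the minimum for the given CUDA runtime."""
    return parse_driver_version(driver_version) >= MIN_LINUX_DRIVER[cuda_version]
```

For example, a machine with driver 384.130 can run CUDA 9.0 binaries but not a TensorFlow build linked against CUDA 10.0, which would produce exactly this error.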

I tried running test.py but ran into a problem. Could you please take a look?

python test.py
2019-11-13 16:33:41.908568: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-11-13 16:33:41.950705: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 16:33:41.951266: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
2019-11-13 16:33:41.951360: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951422: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951477: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951534: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951589: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:41.951644: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda-9.0/lib64
2019-11-13 16:33:42.539829: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-11-13 16:33:42.539906: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-11-13 16:33:42.540637: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-13 16:33:42.575270: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192000000 Hz
2019-11-13 16:33:42.576026: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557c16973a10 executing computations on platform Host. Devices:
2019-11-13 16:33:42.576040: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): ,
2019-11-13 16:33:42.576103: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-13 16:33:42.576110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]
2019-11-13 16:33:42.656045: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-13 16:33:42.656412: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x557c18c65640 executing computations on platform CUDA. Devices:
2019-11-13 16:33:42.656424: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1080 Ti, Compute Capability 6.1
Traceback (most recent call last):
File "test.py", line 121, in <module>
test(args)
File "test.py", line 91, in test
model = TextRecognition(args.pb_path, cfg.seq_len+1)
File "test.py", line 23, in __init__
self.init_model()
File "test.py", line 37, in init_model
self.label_ph = self.sess.graph.get_tensor_by_name('label:0')
File "/home/quh/.conda/envs/pytorch/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3972, in get_tensor_by_name
return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
File "/home/quh/.conda/envs/pytorch/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3796, in as_graph_element
return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
File "/home/quh/.conda/envs/pytorch/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3838, in _as_graph_element_locked
"graph." % (repr(name), repr(op_name)))
KeyError: "The name 'label:0' refers to a Tensor which does not exist. The operation, 'label', does not exist in the graph."
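Two separate things show up in this output. Before the traceback, TensorFlow tries to dlopen the CUDA 10.0 libraries (libcudart.so.10.0 and others) while LD_LIBRARY_PATH only contains /usr/local/cuda-9.0/lib64, so it logs "Cannot dlopen some GPU libraries. Skipping registering GPU devices..." and falls back to the CPU. The crash itself is the KeyError: the graph loaded from args.pb_path has no 'label' operation, which possibly means the .pb file is not the one test.py expects. For the first point, a small sketch like the following (function name hypothetical) reports which of the libraries from the log are missing from a search path:

```python
import os

# Library names TensorFlow tried to dlopen in the log above.
CUDA10_LIBS = [
    "libcudart.so.10.0", "libcublas.so.10.0", "libcufft.so.10.0",
    "libcurand.so.10.0", "libcusolver.so.10.0", "libcusparse.so.10.0",
]


def missing_libraries(libs, search_path=None):
    """Return the libraries not found in any directory of search_path.

    search_path defaults to the LD_LIBRARY_PATH environment variable.
    A library reported missing here may still be found through
    ld.so.cache or the default system directories, so treat the result
    as a hint rather than proof.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    dirs = [d for d in search_path.split(os.pathsep) if d]
    return [lib for lib in libs
            if not any(os.path.isfile(os.path.join(d, lib)) for d in dirs)]
```

Running `missing_libraries(CUDA10_LIBS)` in the same shell as test.py would, with the environment shown in the log, list all six libraries, consistent with TensorFlow skipping the GPU.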

How should I configure things when the demo runs out of memory in Docker?

W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 358.89MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.

My GPU is an ordinary consumer card:
tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce GT 1030 major: 6 minor: 1 memoryClockRate(GHz): 1.5185 pciBusID: 0000:01:00.0 totalMemory: 1.95GiB freeMemory: 1.63GiB

Error when running training

from utils.np_box_ops import iou as np_iou
ImportError: No module named 'utils'
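This error usually means the directory that contains the `utils` package is not on Python's module search path, for example because the training script is launched from a different working directory. Setting PYTHONPATH to that directory before running is the usual fix; the equivalent in code is sketched below. The helper name and the directory you pass in are assumptions: use whichever directory in your checkout actually contains `utils/`.

```python
import importlib
import importlib.util
import sys


def ensure_importable(package, candidate_dir):
    """Put candidate_dir on sys.path if `package` cannot be found yet.

    candidate_dir is an assumption: the directory containing the package
    (here, the one that holds utils/). Returns True if the package is
    importable afterwards.
    """
    if importlib.util.find_spec(package) is None:
        sys.path.insert(0, candidate_dir)
        importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None
```

Note that a generic top-level name like `utils` can collide with an unrelated installed package, in which case `find_spec` will find that one instead; renaming or using an explicit package path avoids the ambiguity.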

speed

Would you mind sharing the inference speed?
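Speed numbers are only comparable if they are measured the same way. A minimal timing harness, assuming a hypothetical `fn` that runs one prediction per input, could look like this; warm-up calls are excluded because the first runs of a TensorFlow session are often slower (memory allocation, kernel selection).

```python
import statistics
import time


def benchmark(fn, inputs, warmup=2, repeats=10):
    """Time fn on each input and report per-call latency in milliseconds."""
    # Warm-up calls are run but not timed.
    for x in inputs[:warmup]:
        fn(x)
    timings = []
    for _ in range(repeats):
        for x in inputs:
            start = time.perf_counter()
            fn(x)
            timings.append((time.perf_counter() - start) * 1000.0)
    return {
        "mean_ms": statistics.mean(timings),
        "median_ms": statistics.median(timings),
        "calls": len(timings),
    }
```

Reporting the GPU model and input size alongside the median makes such numbers easier to compare across setups.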

Changing the maximum recognizable text length to 16

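seq_len is the maximum number of characters that can be recognized in a single text line, right?
I now want to train a model for short text lines with the maximum seq_len set to 16. What else do I need to change?
Changing it directly to 16 raises a ValueError about mismatched dimensions. The exact error is: cannot feed value of shape (16, 33) for Tensor 'label:0', which has shape (?, 17)
Could you give me some pointers? Thanks a lot.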
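In test.py the recognition model is built with `cfg.seq_len+1`, which matches the graph's 'label:0' width of 17 for seq_len = 16. The fed labels are 33 columns wide, which would correspond to labels encoded with a seq_len of 32, so the label data apparently still uses the old length and needs to be re-encoded to the new width. A minimal sketch, assuming each label is a variable-length list of character indices (including any end token) and that index 0 is padding (a hypothetical choice; use whatever your label_dict reserves):

```python
import numpy as np


def fit_labels(sequences, seq_len, pad_id=0):
    """Pad encoded labels to seq_len + 1 columns, dropping any that are too long.

    The +1 mirrors `cfg.seq_len+1` in test.py. Over-long samples are
    dropped rather than truncated, because truncation would silently cut
    characters (and any end token). Returns the padded array and the
    indices of the dropped samples.
    """
    width = seq_len + 1
    kept = [i for i, seq in enumerate(sequences) if len(seq) <= width]
    dropped = [i for i, seq in enumerate(sequences) if len(seq) > width]
    out = np.full((len(kept), width), pad_id, dtype=np.int64)
    for row, i in enumerate(kept):
        out[row, :len(sequences[i])] = sequences[i]
    return out, dropped
```

The same seq_len must then be used both when preparing the training data and when building the graph, otherwise the feed shapes will disagree again.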

docker run failed

Hi,
When I run nvidia-docker run --runtime=nvidia -p 5000:5000 -it zhang0jhon/demo:ocr bash

I get this error:
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/1dd9d7b67a05f6c1b95ad52e6ada9b2ff3e9f249c85d214f405feb610c19b569/log.json: no such file or directory): fork/exec /usr/bin/nvidia-container-runtime: no such file or directory: : unknown.

Thank you!

Question about training

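Is your AttentionOCR model trained on its own, or jointly with Mask R-CNN? If AttentionOCR is trained on its own, is the data format FSNS tfrecord? I plan to train on my own data rather than download the official ICDAR data.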

Default MaxPoolingOp only supports NHWC on device type CPU

Hello,
After running python flaskapp.py and uploading an image, I get the error below during prediction. What is causing it?
tensorflow.python.framework.errors_impl.InvalidArgumentError: Default MaxPoolingOp only supports NHWC on device type CPU
[[node pool0/MaxPool (defined at /tensor_flow/OCRSpace/ocr/ocr/text_detection.py:29) ]]

Original stack trace for 'pool0/MaxPool':

How should I fix this? Thanks a lot.

About the coordinates of detected text

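Hello, where are the coordinates of the detected text in the Docker version, and what exactly do they represent? Also, what does resize mean here: are the coordinates relative to the resized image? How can I output the coordinates of the detected text in the original image? @zhang0jhon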
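If the detector outputs coordinates in the resized image, mapping them back is a per-axis rescale. The sketch below assumes the image was resized directly from its original size to the model input size, without padding or cropping; that is an assumption about the pipeline, not confirmed by the code. If padding was added, subtract the padding offset before scaling. The function name is hypothetical.

```python
import numpy as np


def to_original_coords(points, resized_shape, original_shape):
    """Map (x, y) points from a resized image back to the original image.

    Assumes a direct resize (no padding or cropping) from original_shape
    to resized_shape, both given as (height, width).
    """
    points = np.asarray(points, dtype=np.float64)
    rh, rw = resized_shape[:2]
    oh, ow = original_shape[:2]
    # x is scaled by the width ratio, y by the height ratio.
    scale = np.array([ow / rw, oh / rh])
    return points * scale
```

The same function works for boxes and polygons alike, since each vertex is just an (x, y) pair.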

Hello, how well does it work on long text lines?

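I see that the training input size is 256*256. If I change it, how would it perform on long lines? Have you tested this? Thanks.
From the code it looks like this can be changed. If it's feasible, I plan to convert my own data and give it a try. Thanks.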

About recognition

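Assume the text region has already been located (setting the localization method aside for now). When AttentionOCR is used for recognition, does it recognize the text in the image as a whole, or character by character? The CRNN-CTC model I used before recognizes all the text in the image at once, but the images in your images folder show a recognition probability for each Chinese character. I hope I've explained this clearly ^~^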
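In general (not verified against this repository), an attention decoder reads the whole image but emits the text one character per decoding step, and each step produces its own probability distribution over the character set. Per-character probabilities like those in the images could therefore come from the top score at each step. Given the per-step distributions the decoder has already produced, greedy decoding looks roughly like this; `id_to_char` and `eos_id` are assumptions about the label dictionary, and the actual code may use beam search instead.

```python
import numpy as np


def greedy_decode(step_probs, id_to_char, eos_id):
    """Turn per-step softmax outputs into text plus per-character confidences.

    step_probs has shape (num_steps, num_classes): one distribution per
    decoding step. Decoding stops at the first end-of-sequence token.
    """
    chars, confs = [], []
    for dist in np.asarray(step_probs):
        idx = int(np.argmax(dist))
        if idx == eos_id:
            break
        chars.append(id_to_char[idx])
        # The probability of the chosen character is its confidence.
        confs.append(float(dist[idx]))
    return "".join(chars), confs
```

So the line is decoded as one sequence, as with CRNN-CTC, but unlike CTC each output character corresponds to exactly one decoding step, which makes a per-character confidence straightforward to report.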

icdar_datasets.npy

Hello, what does the file icdar_datasets.npy contain, and what is its format?
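Without documentation, the quickest way to find out is to load the file and inspect it. The sketch below (function name hypothetical) makes no assumption about the layout. `allow_pickle=True` is required if the array stores Python objects such as dicts or lists, which is common for dataset index files, but it should only be used on files you trust, because unpickling can execute code.

```python
import numpy as np


def describe_npy(path, max_items=3):
    """Summarize a .npy file: dtype, shape, and a few items if it holds objects."""
    data = np.load(path, allow_pickle=True)
    summary = {"dtype": str(data.dtype), "shape": data.shape}
    if data.dtype == object:
        # Object arrays usually wrap dicts/lists; show the first few entries.
        summary["first_items"] = [repr(x) for x in data.ravel()[:max_items]]
    return summary
```

For an object array, the `first_items` entries typically reveal the per-sample structure (for example, which keys each record has).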
