bgshih / aster

Recognizing cropped text in natural images.

License: MIT License

Python 97.16% CMake 0.24% Shell 0.03% C++ 2.57%

ocr computer-vision scene-text recognition

aster's Issues

CuDNN error

Hi @bgshih: when I run python aster/demo.py I encounter this error:
Loaded runtime CuDNN library: 7103 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
I would be grateful if you could give some advice.

Error occurs when executing "python3 aster/demo.py"

Traceback (most recent call last):
File "aster/demo.py", line 92, in <module>
tf.app.run()
File "/home/abc/Envs/tf-py3/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 126, in run
_sys.exit(main(argv))
File "aster/demo.py", line 55, in main
predictions_dict = model.predict(tf.expand_dims(resized_image_tensor, 0))
File "/home/abc/Projects/aster/meta_architectures/multi_predictors_recognition_model.py", line 51, in predict
predictor_outputs = predictor.predict(feature_maps, scope='{}/Predictor'.format(name))
File "/home/abc/Projects/aster/predictors/attention_predictor.py", line 74, in predict
maximum_iterations=self._max_num_steps
File "/home/abc/Envs/tf-py3/lib/python3.5/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 309, in dynamic_decode
swap_memory=swap_memory)
File "/home/abc/Envs/tf-py3/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3202, in while_loop
result = loop_context.BuildLoop(cond, body, loop_vars, shape_invariants)
File "/home/abc/Envs/tf-py3/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2940, in BuildLoop
pred, body, original_loop_vars, loop_vars, shape_invariants)
File "/home/abc/Envs/tf-py3/lib/python3.5/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2877, in _BuildLoop
body_result = body(*packed_vars_for_body)
File "/home/abc/Envs/tf-py3/lib/python3.5/site-packages/tensorflow/contrib/seq2seq/python/ops/decoder.py", line 254, in body
decoder_finished) = decoder.step(time, inputs, state)
File "/home/abc/Envs/tf-py3/lib/python3.5/site-packages/tensorflow/contrib/seq2seq/python/ops/beam_search_decoder.py", line 490, in step
cell_outputs, next_cell_state = self._cell(inputs, cell_state)
File "/home/abc/Envs/tf-py3/lib/python3.5/site-packages/tensorflow/python/ops/rnn_cell_impl.py", line 191, in __call__
return super(RNNCell, self).__call__(inputs, state)
File "/home/abc/Envs/tf-py3/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 714, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/abc/Projects/aster/core/sync_attention_wrapper.py", line 54, in call
self._attention_layers[i] if self._attention_layers else None)
ValueError: too many values to unpack (expected 2)

reshaping function error in demo.py

ValueError: Unexpected behavior when reshaping between beam width and batch size. The reshaped tensor has shape: (1, 5, 256). We expected it to have shape (batch_size, beam_width, depth) == (1, 5, 25). Perhaps you forgot to create a zero_state with batch_size=encoder_batch_size * beam_width?

Is this because of the TensorFlow version or something else?
Currently I am using TensorFlow 1.12.

Chinese text issue

Hello, I want to train a Chinese OCR model, but I don't know how to change the lexicon. Could you explain how to do this?

build error

/usr/bin/ld: cannot find -ltensorflow_framework
collect2: error: ld returned 1 exit status
CMakeFiles/aster.dir/build.make:146: recipe for target 'libaster.so' failed
make[2]: *** [libaster.so] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/aster.dir/all' failed
make[1]: *** [CMakeFiles/aster.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

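Chinese training question

Does ASTER training data for Chinese characters necessarily require keypoints? When I previously trained with CRNN, I only had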

Confidence score for each recognized word

Hi,

How do we map the prediction score (which is a negative value in most cases) to a confidence value between 0 and 1? I need to know the probability with which the model recognized each word.

Thanks.
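
One possible reading, offered as a sketch rather than the project's documented behavior: if the prediction score is a sum of per-character log-probabilities (an assumption; beam-search scores are usually of this kind), then exponentiating it gives a value in (0, 1]. Dividing by the word length first gives a per-character geometric mean, which penalizes long words less. The function names below are hypothetical and not part of ASTER.

```python
import math

def sequence_confidence(log_prob_score):
    """Probability of the whole word, assuming the score is a summed log-probability."""
    return math.exp(log_prob_score)

def per_char_confidence(log_prob_score, num_chars):
    """Geometric-mean per-character probability for a word of num_chars characters."""
    if num_chars <= 0:
        raise ValueError("num_chars must be positive")
    return math.exp(log_prob_score / num_chars)

# Made-up example: a 5-character word with a score of -0.35.
score = -0.35
print(round(sequence_confidence(score), 3))
print(round(per_char_confidence(score, 5), 3))
```

If the score turns out to be something other than a log-probability (for example an unnormalized logit sum), this mapping does not apply and a softmax over the candidate scores would be needed instead.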

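Changing the number of character classes

Where can I change the number of character classes so that I can train the network on my own data?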

Bug in aster/tools/create_synthtext_tfrecord.py?

I am sorry, but on my side things worked well until I tried to create synthtext_crop_all.tfrecord by executing python aster/tools/create_synthtext_tfrecord.py.

The error is as follows:

  File "aster/tools/create_synthtext_tfrecord.py", line 150, in main
    _create_samples_of_an_image(writer, image_rel_path, text, word_polygons, char_polygons)
NameError: name '_create_samples_of_an_image' is not defined

Walking through the source code, I found that even if the method _create_samples_of_an_image of class SynthTextCreator is invoked instead, it takes 4 positional arguments but 5 are given, so things are still broken.

I have already successfully run the demo, generated synth90k_all.tfrecord and iiit5k_test_1k.tfrecord, and trained and evaluated on both datasets, so my runtime environment can be considered OK.

Sorry to bother you, but I am sincerely waiting for a response :)

predictions_dict = model.predict(tf.expand_dims(resized_image_tensor, 0)) recognitions = model.postprocess(predictions_dict)

Can the concrete implementations of predict() and postprocess() be found in the code? I don't quite understand them.

@abstractmethod
def predict(self, preprocessed_inputs, scope=None):
  pass

@abstractmethod
def loss(self, predictions_dict, scope=None):
  pass

@abstractmethod
def postprocess(self, predictions_dict, scope=None):
  pass

How are these functions meant to be used once they are defined this way?

Does Chinese recognition work well?

Has anyone tried it?

how to run batch inference

Amount of parameters of this model

Does anyone have a rough idea how many parameters this model has?
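
A rough way to answer this, sketched under assumptions: the parameter count is the sum, over all trainable variables, of the product of each variable's shape. In TensorFlow 1.x the shapes can be collected with `v.get_shape().as_list()` for each `v` in `tf.trainable_variables()` once the model graph is built. The helper below is hypothetical and works on plain shape tuples so it runs without TensorFlow; the example shapes are made up and are not ASTER's.

```python
from functools import reduce
from operator import mul

def count_parameters(shapes):
    """Sum the number of elements over a list of variable shapes."""
    return sum(reduce(mul, shape, 1) for shape in shapes)

# Made-up shapes: a 3x3 convolution from 3 to 64 channels plus its bias.
example_shapes = [(3, 3, 3, 64), (64,)]
print(count_parameters(example_shapes))
```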

newbie asking for help : how to initialize ASTER in a class constructor

I'm a complete newbie in TensorFlow. I've been able to run the script aster/demo.py, but I'd like to refactor it so that the algorithm is initialized in a class constructor. The class would have only one other method, 'recognize':

class ASTER(object):
  def __init__(self):
    ...

  def recognize(self, image_path):
    ...

I would then use an instance of ASTER with :

recognizer = ASTER()
print("res1:", recognizer.recognize("./image1.jpg"))
print("res2:", recognizer.recognize("./image2.jpg"))

Is it possible to do so? I've tried a couple of things but failed miserably... Thanks in advance.
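
A minimal sketch of the pattern, not the project's own API: do the expensive setup once in the constructor and keep a callable around for later use. In TensorFlow 1.x the setup would typically build the graph with an image placeholder, open one `tf.Session`, restore the checkpoint, and return a closure that calls `sess.run` with a `feed_dict`. Here `build_fn` and `fake_build` are hypothetical stand-ins so the example runs without TensorFlow.

```python
class Recognizer(object):
    """Builds the model once and reuses it for every image."""

    def __init__(self, build_fn):
        # build_fn is a hypothetical callable that performs the one-time setup
        # and returns a function mapping an image path to recognized text.
        self._run = build_fn()

    def recognize(self, image_path):
        return self._run(image_path)


def fake_build():
    # Stand-in for the real setup; counts calls to show it runs only once.
    fake_build.calls += 1
    return lambda path: "text-for-" + path

fake_build.calls = 0

recognizer = Recognizer(fake_build)
print("res1:", recognizer.recognize("./image1.jpg"))
print("res2:", recognizer.recognize("./image2.jpg"))
print("setup calls:", fake_build.calls)
```

The point of the design is that graph construction and checkpoint loading happen in `__init__`, so repeated calls to `recognize` only pay for inference.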

Has anyone trained with TF >= 1.4?

Has anyone trained with TF >= 1.4? Our company server does not allow switching to CUDA 8.

An open PDF version

http://cloud.eic.hust.edu.cn:8071/UpLoadFiles/Papers/ASTER_PAMI18.pdf

I cannot find pipeline_pb2

Evaluation in your paper

demo.py error

chase@zlq:~$ python3 aster/demo.py
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
/usr/lib/python3.5/importlib/_bootstrap.py:222: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
return f(*args, **kwds)
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:root:Number of classes is 94
INFO:root:UNK label is 2
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:root:Number of classes is 94
INFO:root:UNK label is 2
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 460, in make_tensor_proto
str_values = [compat.as_bytes(x) for x in proto_values]
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 460, in <listcomp>
str_values = [compat.as_bytes(x) for x in proto_values]
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/compat.py", line 65, in as_bytes
(bytes_or_text,))
TypeError: Expected binary or unicode string, got Dimension(1)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "aster/demo.py", line 92, in <module>
tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "aster/demo.py", line 55, in main
predictions_dict = model.predict(tf.expand_dims(resized_image_tensor, 0))
File "/home/chase/aster/meta_architectures/multi_predictors_recognition_model.py", line 39, in predict
transform_output_dict = self._spatial_transformer.batch_transform(stn_inputs)
File "/home/chase/aster/core/spatial_transformer.py", line 45, in batch_transform
sampling_grid = self._batch_generate_grid(input_control_points)
File "/home/chase/aster/core/spatial_transformer.py", line 99, in _batch_generate_grid
[batch_size, 1, 1]) # => [B, k+3, k+3]
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_array_ops.py", line 3847, in tile
name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 493, in apply_op
raise err
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 490, in apply_op
preferred_dtype=default_dtype)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 676, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 121, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/tensor_util.py", line 464, in make_tensor_proto
"supported type." % (type(values), values))
TypeError: Failed to convert object of type <class 'list'> to Tensor. Contents: [Dimension(1), 1, 1]. Consider casting elements to a supported type.

how to use custom data?

ERROR: protoc aster/protos/*.proto --python_out=.

aster/protos/rnn_cell.proto:7:3: Expected "required", "optional", or "repeated".
aster/protos/rnn_cell.proto:7:24: Missing field number.
aster/protos/hyperparams.proto:44:3: Expected "required", "optional", or "repeated".
aster/protos/hyperparams.proto:44:27: Missing field number.
aster/protos/hyperparams.proto:64:3: Expected "required", "optional", or "repeated".
aster/protos/hyperparams.proto:64:27: Missing field number.
aster/protos/bidirectional_rnn.proto: Import "aster/protos/rnn_cell.proto" was not found or had errors.
aster/protos/bidirectional_rnn.proto: Import "aster/protos/hyperparams.proto" was not found or had errors.
aster/protos/bidirectional_rnn.proto:9:12: "RnnCell" is not defined.
aster/protos/bidirectional_rnn.proto:10:12: "Regularizer" is not defined.
aster/protos/bidirectional_rnn.proto:12:12: "Hyperparams" is not defined.

How about python2?

Hi,

I much appreciate this project, thanks.

One of my questions is about compatibility with Python 2: did you run any tests on Python 2?

Br.

Does it support only English letters, or digits too? Where is the recognized character set configured?

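Training with part of the weights from a trained model

Because the lexicon has changed, I cannot continue training from the previous model. How can I train using only part of the weights from the previously trained model?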
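
One common approach, sketched here under assumptions rather than taken from the repository: restore only those variables whose name and shape match between the old checkpoint and the new model, and let the rest (typically the layers whose size depends on the lexicon) initialize from scratch. In TensorFlow 1.x the checkpoint shapes can be read with `tf.train.NewCheckpointReader(path).get_variable_to_shape_map()`, and the selected variables passed to `tf.train.Saver(var_list=...)`. The helper and the variable names below are hypothetical.

```python
def select_restorable(model_shapes, checkpoint_shapes):
    """Return names of variables present in both mappings with identical shapes."""
    restorable = []
    for name, shape in model_shapes.items():
        if name in checkpoint_shapes and list(checkpoint_shapes[name]) == list(shape):
            restorable.append(name)
    return sorted(restorable)

# Made-up names and shapes: only the output layer depends on the lexicon size.
checkpoint = {"conv1/weights": [3, 3, 3, 64], "decoder/output/weights": [256, 97]}
model = {"conv1/weights": [3, 3, 3, 64], "decoder/output/weights": [256, 5000]}
print(select_restorable(model, checkpoint))
```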

detection_model.restore_map() in trainer.py does not seem to exist

Error when running train.py

After the CheckpointSaverHook was created, the process was killed.

Chinese TXT

Hi,

I want to use this model to detect Chinese text images.
When I create tfrecords, the groundtruth_text for Chinese characters after 'utf-8' encoding is different from English and numbers and might cause problems. Would you please give me some advice?

Thanks.

In demo.py, when predicting in reverse, is the output forward or reversed?

error ‘input_reader_config not of type input_reader_pb2.InputReader’

When I run python train.py --exp_dir experiments/demo --num_clones 2,
it raises the error ‘input_reader_config not of type input_reader_pb2.InputReader’.

File "train.py", line 217, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "train.py", line 213, in main
worker_job_name, is_chief, train_dir)
File "aster/trainer.py", line 129, in train
data_augmentation_options
File "aster/trainer.py", line 19, in _create_input_queue
tensor_dict = create_tensor_dict_fn()
File "aster/builders/input_reader_builder.py", line 33, in build
raise ValueError('input_reader_config not of type '
ValueError: input_reader_config not of type input_reader_pb2.InputReader.

Error occurs when running "python aster/demo.py"

I get the error:
Traceback (most recent call last):
File "aster/demo.py", line 9, in <module>
from aster.builders import model_builder
File "/home/seven/Downloads/aster/builders/model_builder.py", line 5, in <module>
from aster.builders import predictor_builder
File "/home/seven/Downloads/aster/builders/predictor_builder.py", line 9, in <module>
from aster.predictors import attention_predictor
File "/home/seven/Downloads/aster/predictors/attention_predictor.py", line 14, in <module>
from aster.c_ops import ops
File "/home/seven/Downloads/aster/c_ops/ops.py", line 26, in <module>
_oplib = _load_oplib(FLAGS.oplib_name)
File "/home/seven/Downloads/aster/c_ops/ops.py", line 23, in _load_oplib
oplib = tf.load_op_library(lib_copy_path)
File "/home/seven/anaconda2/envs/aster/lib/python3.5/site-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename, status)
File "/home/seven/anaconda2/envs/aster/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: /tmp/libaster_ac83631a.so: undefined symbol: _ZNK10tensorflow14TensorShapeRep11DebugStringEv

What is the dataset format?

Does this work on Mac

I have been trying to install and run this framework on a Mac. However, it is not working. Is it only for Ubuntu?

Raw RNN string output?

First, thank you for sharing this project.

I want to be able to see the raw RNN string output that isn't checked against a lexicon. For example, "L-EE-A-RR-N-I-NN-G" instead of "LEARNING".

What part of the architecture would I change to allow this to happen?

Thank you.
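
For context, the raw string in the question looks like frame-wise CTC output, where "-" is the blank symbol and repeated characters come from adjacent frames. The sketch below shows the standard CTC collapse rule (merge consecutive repeats, then drop blanks) that turns such a string into the final word. It is an illustration of that rule only: the tracebacks in this thread suggest the demo decodes with an attention predictor, which emits one character per step, so an equivalent raw string may not exist in ASTER. The function name is hypothetical.

```python
from itertools import groupby

def collapse_raw_output(raw, blank="-"):
    """Merge consecutive repeated symbols, then drop the blank symbol."""
    merged = [symbol for symbol, _ in groupby(raw)]
    return "".join(symbol for symbol in merged if symbol != blank)

print(collapse_raw_output("L-EE-A-RR-N-I-NN-G"))  # prints LEARNING
```

Note that a blank between two identical characters keeps both of them, which is how CTC represents genuine double letters.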

eval error

2019-03-31 16:47:03.634158: W tensorflow/core/framework/op_kernel.cc:1192] Invalid argument: ValueError: attempt to get argmin of an empty sequence
[[Node: PyFuncStateless = PyFuncStateless[Tin=[DT_STRING, DT_STRING], Tout=[DT_STRING], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](prefetch_queue_Dequeue:4, strided_slice_5)]]
INFO:root:Skipping image

About the training set

Where can I download the training set? It is too big to generate myself.

create_synthtext_tfrecord.py error

NameError: global name '_create_samples_of_an_image' is not defined

Is the function correct?

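How to evaluate

Hi,
Hello, thank you very much for open-sourcing the code! Could you describe in detail how to evaluate on the full datasets to reproduce the results in the paper's tables? When I run eval.py, it raises an error.

Best wishes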

fail to train the model

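Has anyone been able to train ASTER? When I train with python3 train.py, it prints the initialization lines below, then hangs at the step that restores from the pretrained model I downloaded, and even Ctrl+C cannot stop it. The output is as follows: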
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:root:Number of classes is 94
INFO:root:UNK label is 2
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:root:Number of classes is 94
INFO:root:UNK label is 2
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Scale of 0 disables regularizer.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Restoring parameters from /aster/experiments/demo/log/model.ckpt
How did you train it? I would greatly appreciate it if you could share your experience. Thank you very much.

No module named 'aster'

/home/zhaoyangze/anaconda3/envs/tf/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
return f(*args, **kwds)
Traceback (most recent call last):
File "aster/demo.py", line 8, in <module>
from aster.protos import pipeline_pb2
ModuleNotFoundError: No module named 'aster'
This happens when I execute python3 demo.py. Can anyone help?
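
A likely cause, stated as an assumption: the traceback imports `aster.protos`, so Python needs the directory that contains the `aster` folder on its module search path, and it is not there. Setting `PYTHONPATH` to that directory is the usual fix; the sketch below does the same thing from inside Python. The helper name and the checkout path are hypothetical.

```python
import os
import sys

def add_parent_to_path(repo_dir):
    """Put the directory containing repo_dir on sys.path so `import aster` can resolve."""
    parent = os.path.dirname(os.path.abspath(repo_dir))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent

# Made-up checkout location; adjust to where the aster folder actually lives.
print(add_parent_to_path("/home/user/projects/aster"))
```

This only addresses the import path. If `pipeline_pb2` is still missing afterwards, the .proto files probably have not been compiled yet.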

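Special character recognition problem

Newly added:
elif config.built_in_set == label_map_pb2.CharacterSet.ALLCASES_SYMBOLS1:
  character_set = list(string.digits+chr(937)+chr(948)+'.R=%m')
However, the symbols Ω and δ do not show up at test time. The output looks like 00=0.56%, when it should be δ=0.56%.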
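
One possible explanation, offered as an unconfirmed guess: if the label text is split byte by byte rather than character by character, each of these symbols becomes two UTF-8 bytes, and neither byte is in the character set. That would be consistent with one expected symbol turning into two output symbols. The sketch below only demonstrates the length mismatch in plain Python 3; it does not reproduce ASTER's label pipeline.

```python
import string

# Same character set as in the snippet above: digits, Omega, delta and a few symbols.
character_set = list(string.digits + chr(937) + chr(948) + ".R=%m")

text = chr(948) + "=0.56%"
chars = list(text)                                         # character-level split
byte_symbols = [bytes([b]) for b in text.encode("utf-8")]  # byte-level split

print(len(chars), len(byte_symbols))
print(len(chr(937).encode("utf-8")), len(chr(948).encode("utf-8")))
```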

Are learning rate and num_steps too large?

Sorry to bother you. In your trainval.prototxt, I noticed that the initial learning rate is 1 and that it drops as training progresses. During my training, however, the accuracy stays at about 0.15 and does not grow over steps 0-600000. Is that because of the high learning rate?
Also, num_steps is 1200000, so training takes about a week, which makes it difficult for me to tune parameters. Is there any way to quickly verify whether my network can eventually be trained well?

Dataset:

http://cvit.iiit.ac.in/projects/SceneTextUnderstanding/IIIT5K.html
It contains 5000 cropped word images from Scene Texts and born-digital images.

Computer:

GPU: Tesla-V100-PCIE-16GB

Thanks a lot~

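Some modifications for training a Chinese model

Modify the config file under experiment/<model name>/config/: change built_in_set to text_file and specify the path to the Chinese character set.
In the config file, reverse under attention_predictor must be set to false.
Insert a delimiter between the Chinese characters in the groundtruth, and change the arguments of the tf.string_split call in the text_to_labels function in core/label_map.py accordingly.
Because the groundtruth now contains delimiters, the relevant parts of utils/recognition_evaluation.py must also be modified so that evaluation is correct.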
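
The delimiter step can be sketched in plain Python as below. This is an illustration under assumptions, not code from the repository: the tab delimiter and both helper names are hypothetical, and any delimiter works as long as it never appears in the character set. The same delimiter would then be passed to `tf.string_split`, and stripped again before comparing predictions with the groundtruth during evaluation.

```python
SEPARATOR = "\t"  # assumed delimiter; must not occur in the lexicon

def add_separators(text, sep=SEPARATOR):
    """Insert the delimiter between every pair of adjacent characters."""
    return sep.join(text)

def remove_separators(text, sep=SEPARATOR):
    """Undo add_separators, for example before computing accuracy."""
    return text.replace(sep, "")

groundtruth = "中文识别"
separated = add_separators(groundtruth)
print(separated.split(SEPARATOR))
print(remove_separators(separated) == groundtruth)
```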

However, it seems that in the first residual unit in every residual block,
the residual unit of [3x3 3x3] is used.
To be specific, this is shown in lines 38-43 of https://github.com/bgshih/aster/blob/master/convnets/resnet.py .

What is the reason?
And is my understanding correct?

Thank you very much.

Sincerely,
Kim.

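Is the ICDAR15 test set used in the paper the full set of 2077 images, or the 1811 images left after removing those with non-alphanumeric characters?

Is the ICDAR15 test set you used in the paper the full set of 2077 images, or the 1811 images that remain after removing images containing non-alphanumeric characters?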