The model is mainly based on the method described in the article:
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
Baoguang Shi, Xiang Bai, Cong Yao
https://arxiv.org/abs/1507.05717
This is a modification version based on https://github.com/weinman/cnn_lstm_ctc_ocr