gitycc / crnn-pytorch

Convolutional Recurrent Neural Network (CRNN) for image-based sequence recognition using PyTorch

License: MIT License

Languages: Python 99.61%, Shell 0.39%
Topics: pytorch, ocr-recognition, crnn-ctc

crnn-pytorch's Introduction

CRNN PyTorch

Requires Python 3.6.

Quick Demo

$ pip install -r requirements.txt
$ python src/predict.py -h

If the help message prints without errors, the setup works. Now predict the demo images:

$ python src/predict.py demo/*.jpg
device: cpu
Predict: 100% [00:00<00:00,  4.89it/s]

===== result =====
demo/170_READING_62745.jpg > reading
demo/178_Showtime_70541.jpg > showtime
demo/78_Novel_52433.jpg > novel


CRNN + CTC

This is a PyTorch implementation of a deep neural network for scene text recognition. It is based on the paper "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition" (Baoguang Shi et al., 2016).

Blog article with more info: https://ycc.idv.tw/crnn-ctc.html

[CRNN structure diagram]

Download Synth90k dataset

$ cd data
$ bash download_synth90k.sh

If you use the Synth90k dataset, please cite:

@InProceedings{Jaderberg14c,
  author       = "Max Jaderberg and Karen Simonyan and Andrea Vedaldi and Andrew Zisserman",
  title        = "Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition",
  booktitle    = "Workshop on Deep Learning, NIPS",
  year         = "2014",
}

@Article{Jaderberg16,
  author       = "Max Jaderberg and Karen Simonyan and Andrea Vedaldi and Andrew Zisserman",
  title        = "Reading Text in the Wild with Convolutional Neural Networks",
  journal      = "International Journal of Computer Vision",
  number       = "1",
  volume       = "116",
  pages        = "1--20",
  month        = "jan",
  year         = "2016",
}

Pretrained Model

We pretrained the CRNN model on the Synth90k dataset. The weights are saved at checkpoints/crnn_synth90k.pt.

Evaluate the model on the Synth90k dataset

$ python src/evaluate.py

Evaluate on 891927 Synth90k test images:

  • Test Loss: 0.53042
Decode Method                       Sequence Accuracy   Prediction Time
greedy                              0.93873             0.44398 ms/image
beam_search (beam_size=10)          0.93892             6.9120 ms/image
prefix_beam_search (beam_size=10)   0.93900             42.598 ms/image
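As a rough illustration of the cheapest decoder above: greedy (best-path) CTC decoding takes the argmax label at each time step, collapses consecutive repeats, and drops blanks. This is a generic sketch, not the repo's actual src code, and the blank index of 0 is an assumption:

```python
def ctc_greedy_decode(argmax_indices, blank=0):
    """Standard CTC best-path decoding: collapse repeats, then drop blanks."""
    decoded = []
    prev = None
    for idx in argmax_indices:
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded

# Per-frame argmax over the softmax output; labels 1..26 map to 'a'..'z' here.
frames = [0, 3, 3, 0, 1, 1, 20, 0]
labels = ctc_greedy_decode(frames)                      # -> [3, 1, 20]
print(''.join(chr(ord('a') + i - 1) for i in labels))   # -> "cat"
```

Beam search and prefix beam search explore multiple label hypotheses per step instead of the single argmax, which explains the higher accuracy and the much higher per-image latency in the table.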

Train your model

You can adjust the hyperparameters in ./src/config.py.

Then train the CRNN model:

$ python src/train.py

Acknowledgement

Please cite this repo (crnn-pytorch) if you use it.

crnn-pytorch's People

Contributors

gitycc, t4rf9


crnn-pytorch's Issues

Getting loss as NaN during training for 7k crops

Desktop\ptr_torch6\src>python train.py
device: cpu
epoch: 1
train_batch_loss[ 10 ]: nan
train_batch_loss[ 20 ]: nan
train_batch_loss[ 30 ]: nan
train_batch_loss[ 40 ]: nan
train_batch_loss[ 50 ]: nan
train_batch_loss[ 60 ]: nan
train_batch_loss[ 70 ]: nan
Corrupted image for 19
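A common cause of NaN CTC loss (not confirmed for this exact issue) is a label that is longer than the downsampled feature sequence the CNN produces for its image; the loss for such a sample is undefined. Assuming the usual CRNN downsampling, where a width-W input yields roughly W // 4 + 1 time steps (an assumption about this repo's architecture, not verified from its code), a quick pre-flight check over the dataset can flag such samples. Setting zero_infinity=True on torch.nn.CTCLoss also zeroes out these infinite losses:

```python
def crnn_seq_len(img_width, downsample=4):
    # Assumed CRNN downsampling: the CNN shrinks width ~4x before the RNN.
    return img_width // downsample + 1

def find_bad_samples(samples, img_width=100):
    """Return labels too long for CTC given the feature sequence length."""
    max_len = crnn_seq_len(img_width)
    return [text for _, text in samples if len(text) > max_len]

samples = [("a.jpg", "kindest"), ("b.jpg", "x" * 40)]
print(find_bad_samples(samples))  # flags the 40-char label (26 time steps at width 100)
```

Corrupted or unreadable images (like the "Corrupted image for 19" message above) are another candidate, since a garbage tensor can push the network's outputs to NaN within a few batches.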

Does it only do lowercase predictions?

Hey, thanks for the work. Is there any parameter I can change to get predictions for capital letters? Is it case sensitive, or lowercase only?
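The answer depends on the character set baked into the model: Synth90k labels are conventionally lowercased, so a model trained on them can only emit lowercase. Case-sensitive output requires retraining with an extended alphabet defined in the config (the exact variable name in this repo's src/config.py is an assumption). A sketch of how such a label mapping might look:

```python
import string

# Hypothetical label alphabets; label 0 is reserved for the CTC blank.
LOWERCASE_CHARS = string.digits + string.ascii_lowercase
CASE_SENSITIVE_CHARS = string.digits + string.ascii_lowercase + string.ascii_uppercase

def build_maps(chars):
    """Map each character to a CTC label and back (0 = blank)."""
    char2label = {c: i + 1 for i, c in enumerate(chars)}
    label2char = {v: k for k, v in char2label.items()}
    return char2label, label2char

c2l, l2c = build_maps(CASE_SENSITIVE_CHARS)
print(c2l['a'], c2l['A'])  # distinct labels once uppercase is in the alphabet
```

Note that changing the alphabet changes the size of the network's output layer, so pretrained weights trained on the lowercase alphabet cannot be reused directly.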

Data format issue

Hi there,

Many thanks for the clean implementation...

I have an issue here.

annotation_train.txt (example annotations)

  • ./1391/4/361_Kindest_42517.jpg kindest
  • ./1211/6/401_Grizzlies_33809.jpg grizzlies
  • ./1833/4/74_BALDY_5595.jpg baldy

annotation_val.txt (example annotations)

  • ./2271/2/390_PATRICIA_55938.jpg patricia
  • ./929/4/397_Percival_56583.jpg percival
  • ./1837/5/377_REPULSIVE_64992.jpg repulsive

lexicon.txt (example annotations)

  • 0
  • 00
  • 002101
  • 01
  • 01206
  • 01206368166

When I ran python src/train.py, it gave me this error:

  • index = int(index_str)
  • ValueError: invalid literal for int() with base 10: 'kindest'

Then I changed:

    def _load_from_raw_files(self, root_dir, mode):
        mapping = {}
        with open(os.path.join(root_dir, 'lexicon.txt'), 'r') as fr:
            for i, line in enumerate(fr.readlines()):
                mapping[i] = line.strip()
        paths_file = None
        if mode == 'train':
            paths_file = 'annotation_train.txt'
        elif mode == 'dev':
            paths_file = 'annotation_val.txt'
        elif mode == 'test':
            paths_file = 'annotation_test.txt'
        paths = []
        texts = []
        with open(os.path.join(root_dir, paths_file), 'r') as fr:
            for line in fr.readlines():
                path, index_str = line.strip().split(' ')
                path = os.path.join(root_dir, path)
                paths.append(path)
                texts.append(index_str)
        return paths, texts

Instead of converting the string to an int, I passed index_str directly to texts.append. The training runs, but the accuracy stays at 0.0. I suspect the way I created the data. Would you mind giving me some advice here?

Note: I've only trained for about 20 epochs. Is it too early to say the accuracy is 0.0, or is there a chance of seeing results if I train it more?
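For reference: in the original Synth90k annotation format, the second field of each line is an integer index into lexicon.txt, not the word itself, which is why int() fails on 'kindest' above. If custom annotations store the word directly, a loader could accept both forms (a hedged sketch, not the repo's actual src/dataset.py):

```python
def parse_annotation_line(line, lexicon):
    """Resolve a 'path label' line, where label is a lexicon index or a word."""
    path, label = line.strip().split(' ', 1)
    try:
        text = lexicon[int(label)]   # Synth90k style: index into lexicon.txt
    except ValueError:
        text = label                 # custom style: the word itself
    return path, text

lexicon = ["0", "00", "kindest"]
print(parse_annotation_line("./a.jpg 2", lexicon))        # ('./a.jpg', 'kindest')
print(parse_annotation_line("./b.jpg kindest", lexicon))  # ('./b.jpg', 'kindest')
```

If accuracy is stuck at 0.0 despite the loss decreasing, it is also worth checking that every character in the labels actually appears in the configured alphabet; out-of-alphabet characters make exact sequence matches impossible.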

need help with input data preparation(training from custom images)

I have an excel of 10k crops containing two
columns:

Column 1: Image_path (<path/abc.png>)

Column 2: Ground_Truth ()

data.csv looks like:

path | gt

C:/Users/1234/crop/ABC 07 07 2020_page1.png | 8 05 75 824.46Cr
C:/Users/1234/crop/PQW 07 10 2020_page1.png | Time 11 42 23
C:/Users/1234/crop/XRE 08 10 2020_page1.png | Account No. 200000592
C:/Users/1234/crop/JKL 07 10 2020_page1.png | 1 00 00 00 000.00

Now I need to use this input for training. Shall I feed it into dataset.py first and then start training?

Or could you walk me through the steps?
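One possible preparation step (a sketch under the assumption that the loader expects one 'path label' line per sample, as in the Synth90k annotations) is to convert the CSV into annotation files the existing dataset code can read. Because the ground truth here contains spaces, a space-delimited format would break; this example uses a tab separator instead, which would require a matching split in the loader — an assumption, not repo code:

```python
import csv
import io

def csv_to_annotations(csv_text, delimiter='|'):
    """Turn 'path | gt' rows into tab-separated 'path<TAB>gt' annotation lines."""
    lines = []
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    next(reader)  # skip the header row (path | gt)
    for row in reader:
        path, gt = (col.strip() for col in row)
        lines.append(f"{path}\t{gt}")
    return lines

csv_text = "path|gt\nC:/Users/1234/crop/a.png|Time 11 42 23\n"
print(csv_to_annotations(csv_text))
# -> ['C:/Users/1234/crop/a.png\tTime 11 42 23']
```

The file paths and delimiter above are illustrative; the real data.csv columns from the question would be substituted in. After writing the lines to annotation_train.txt / annotation_val.txt, the dataset class would still need to accept literal text labels rather than lexicon indices.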

Hello, a question about the model

Hello, how does your code differ from the original implementation? What adjustments were made inside the CRNN model?

Libtorch Error

I want to use my own trained model in LibTorch, but I'm unable to do so. I'm encountering the following error.

terminate called after throwing an instance of 'c10::Error'
what(): open file failed, file path: src/crnn_synth90k.pt
Exception raised from FileAdapter at /home/asis/pytorch1.8/pytorch/caffe2/serialize/file_adapter.cc:11 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits, std::allocator >) + 0x6b (0x7f75c1f6695b in /home/asis/pytorch1.8/pytorch/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0xce (0x7f75c1f631be in /home/asis/pytorch1.8/pytorch/torch/lib/libc10.so)
frame #2: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) + 0x258 (0x7f75b7452568 in /home/asis/pytorch1.8/pytorch/torch/lib/libtorch_cpu.so)
frame #3: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::optionalc10::Device, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::hash<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits, std::allocator > const, std::__cxx11::basic_string<char, std::char_traits, std::allocator > > > >&) + 0x40 (0x7f75b89ef750 in /home/asis/pytorch1.8/pytorch/torch/lib/libtorch_cpu.so)
frame #4: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, c10::optionalc10::Device) + 0x6f (0x7f75b89ef8df in /home/asis/pytorch1.8/pytorch/torch/lib/libtorch_cpu.so)
frame #5: Crnn::Crnn(std::__cxx11::basic_string<char, std::char_traits, std::allocator >&, std::__cxx11::basic_string<char, std::char_traits, std::allocator >&) + 0x68 (0x55c6271749a8 in ./CrnnDeploy)
frame #6: main + 0xbe (0x55c62717391e in ./CrnnDeploy)
frame #7: __libc_start_main + 0xe7 (0x7f75a8e4dc87 in /lib/x86_64-linux-gnu/libc.so.6)
frame #8: _start + 0x2a (0x55c627173afa in ./CrnnDeploy)
