hsiehyichia / scene-text-recognition

Scene text detection and recognition based on Extremal Region(ER)

License: MIT License

C++ 84.05% C 8.44% Python 2.36% Jupyter Notebook 3.78% CMake 1.37%
mser adaboost svm opencv ocr image-processing computer-vision scene-text-recognition scene-text-detection non-maximum-suppression

scene-text-recognition's Introduction

Scene text recognition

A real-time scene text recognition algorithm. Our system is able to recognize text against unconstrained backgrounds.
This algorithm is based on several papers and is implemented in C/C++.

Environment and dependencies

  1. OpenCV 3.1 or above
  2. CMake 3.10 or above
  3. Visual Studio 2017 Community or above (Windows-only)

How to build?

Windows

  1. Install OpenCV; put the opencv directory into C:\tools
    • You can install it manually from its GitHub repo, or
    • You can install it via Chocolatey: choco install opencv, or
    • If you already have OpenCV, edit CMakeLists.txt and change WIN_OPENCV_CONFIG_PATH to where you have it
  2. Use CMake to generate the project files
    cd Scene-text-recognition
    mkdir build-win
    cd build-win
    cmake .. -G "Visual Studio 15 2017 Win64"
  3. Use CMake to build the project
    cmake --build . --config Release
  4. Find the binaries in the root directory
    cd ..
    dir | findstr scene
  5. To execute the scene_text_recognition.exe binary, use its wrapper script; for example:
    .\scene_text_recognition.bat -i res\ICDAR2015_test\img_6.jpg

Linux

  1. Install OpenCV; refer to OpenCV Installation in Linux
  2. Use CMake to generate the project files
    cd Scene-text-recognition
    mkdir build-linux
    cd build-linux
    cmake ..
  3. Use CMake to build the project
    cmake --build .
  4. Find the binaries in the root directory
    cd ..
    ls | grep scene
  5. To execute the binaries, run them as-is; for example:
    ./scene_text_recognition -i res/ICDAR2015_test/img_6.jpg

Usage

The executable file scene_text_recognition must ultimately exist in the project root directory (i.e., next to classifier/, dictionary/ etc.)

./scene_text_recognition -v:            take default webcam as input  
./scene_text_recognition -v [video]:    take a video as input  
./scene_text_recognition -i [image]:    take an image as input  
./scene_text_recognition -i [path]:     take a folder of images as input  
./scene_text_recognition -l [image]:    demonstrate "Linear Time MSER" Algorithm  
./scene_text_recognition -t detection:  train text detection classifier  
./scene_text_recognition -t ocr:        train text recognition(OCR) classifier 

Train your own classifier

Text detection

  1. Put your text samples in res/pos and your non-text samples in res/neg
  2. Name the files numerically, e.g. 1.jpg, 2.jpg, 3.jpg, and so on
  3. Make sure the training folder exists
  4. Run ./scene_text_recognition -t detection
    mkdir training
    ./scene_text_recognition -t detection
  5. The text detection classifier will be written to the training folder

Text recognition(OCR)

  1. Put your training data in res/ocr_training_data/
  2. Arrange the data as [Font Name]/[Font Type]/[Category]/[Character.jpg], for instance Time_New_Roman/Bold/lower/a.jpg. You can refer to res/ocr_training_data.zip
  3. Make sure the training folder exists, and put svm-train in the root folder (svm-train is built along with the project and should be found in build/)
  4. Run ./scene_text_recognition -t ocr
    mkdir training
    mv svm-train scene-text-recognition/
    ./scene_text_recognition -t ocr
  5. The text recognition (OCR) classifier will be written to the training folder

How it works

The algorithm is based on a region detector called Extremal Region (ER), which is essentially a superset of the well-known MSER region detector. We use ERs to find text candidates, and extract them with the linear-time MSER algorithm. The drawback of ERs is repeated detection, so we remove most of the repeated ERs with non-maximum suppression: we estimate the overlap between ERs from the component tree and calculate the stability of every ER; within each group of overlapping ERs, only the one with maximum stability is kept. After that we apply a 2-stage Real AdaBoost to filter out non-text regions. We chose Mean-LBP as the feature because it is faster than other features. The surviving ERs are then grouped together to move the result from character level to word level, which is more intuitive for humans.

Our next step is to apply OCR to the detected text. The chain code of the ER is used as the feature, and the classifier is trained with an SVM. We also introduce several post-processing steps, such as optimal-path selection and spell checking, to improve the recognition result.
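The non-maximum suppression step described above can be illustrated with a minimal C++ sketch. This is not the repository's code: the `Region` struct, the `suppress_chain` function, the area-ratio overlap measure, and the `min_overlap` threshold are all assumptions made for illustration, and the stability score is assumed to be precomputed. The sketch walks one leaf-to-root chain of the component tree, treats consecutive regions with a high area ratio as repeated detections of the same object, and keeps only the most stable region of each group.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical region record; field names are illustrative only.
struct Region {
    int area;          // pixel count of the extremal region (must be > 0)
    double stability;  // assumed precomputed; higher means more stable
};

// `chain` lists nested regions from a leaf towards the root of the
// component tree, so areas are non-decreasing along the chain.
// Consecutive regions whose area ratio (smaller / larger) is at least
// `min_overlap` form one group; the index of the most stable region
// in each group is returned.
std::vector<std::size_t> suppress_chain(const std::vector<Region>& chain,
                                        double min_overlap) {
    std::vector<std::size_t> kept;
    if (chain.empty()) return kept;

    std::size_t best = 0;  // most stable region of the current group
    for (std::size_t i = 1; i < chain.size(); ++i) {
        assert(chain[i].area > 0);
        double overlap =
            static_cast<double>(chain[i - 1].area) / chain[i].area;
        if (overlap >= min_overlap) {
            // Same group: remember the more stable region.
            if (chain[i].stability > chain[best].stability) best = i;
        } else {
            // Overlap too small: close the group and start a new one.
            kept.push_back(best);
            best = i;
        }
    }
    kept.push_back(best);
    return kept;
}
```

For a chain with areas 100, 105, 110, 400, 410 and a `min_overlap` of 0.7, the first three regions form one group and the last two another, so two indices are returned. A full implementation would apply the same idea across every branch of the tree rather than a single chain.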

overview

Notes

For text classification, the training data contains 12,000 positive samples, mostly extracted from the ICDAR 2003 and ICDAR 2015 datasets. The negative samples are extracted from random images with a bootstrap process. As for OCR classification, the training data consists purely of synthetic letters in 28 different fonts.

The system is able to detect text in real time (30 FPS) and recognize text in nearly real time (8~15 FPS, depending on the amount of text) for a 640x480 image on an Intel Core i7 desktop computer. The algorithm's end-to-end text detection accuracy on the ICDAR 2015 dataset is roughly 70% with fine-tuning, and its end-to-end recognition accuracy is about 30%.

Result

Detection result on ICDAR 2015

result1 result2 result3

Recognition result on random image

result4 result5

Linear Time MSER Demo

The green pixels are the so-called boundary pixels, which are pushed onto stacks. Each stack stands for a gray level, and each pixel is pushed onto the stack for its gray level.
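The per-gray-level stacks can be sketched as a small C++ data structure. This is an illustrative sketch rather than the repository's implementation: the `BoundaryStacks` class, the `BoundaryPixel` struct, and the assumption of 256 gray levels (8-bit images) are all hypothetical. It keeps one stack per gray level and always pops from the lowest non-empty level, which is the order in which a flooding-style MSER extraction visits boundary pixels.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical record for a popped boundary pixel.
struct BoundaryPixel {
    int index;  // linear pixel index in the image
    int level;  // gray level of the pixel (0..255)
};

// One stack per gray level; assumes 8-bit grayscale input.
class BoundaryStacks {
public:
    // Push a pixel onto the stack that belongs to its gray level.
    void push(int pixel_index, int level) {
        assert(level >= 0 && level < 256);
        stacks_[level].push_back(pixel_index);
        if (level < lowest_) lowest_ = level;
    }

    // Pop the most recently pushed pixel from the lowest non-empty
    // gray level. Returns false when every stack is empty.
    bool pop_lowest(BoundaryPixel& out) {
        while (lowest_ < 256 && stacks_[lowest_].empty()) ++lowest_;
        if (lowest_ == 256) return false;
        out.index = stacks_[lowest_].back();
        out.level = lowest_;
        stacks_[lowest_].pop_back();
        return true;
    }

private:
    std::array<std::vector<int>, 256> stacks_;
    int lowest_ = 256;  // lower bound on the lowest non-empty level
};
```

Tracking a lower bound on the lowest occupied level keeps the search for the next pixel cheap; implementations often use a bitmask for the same purpose.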

References

  1. D. Nister and H. Stewenius, “Linear time maximally stable extremal regions,” European Conference on Computer Vision, pages 183–196, 2008.
  2. L. Neumann and J. Matas, “A method for text localization and recognition in real-world images,” Asian Conference on Computer Vision, pages 770–783, 2010.
  3. L. Neumann and J. Matas, “Real-time scene text localization and recognition,” Computer Vision and Pattern Recognition, pages 3538–3545, 2012.
  4. L. Neumann and J. Matas, “On combining multiple segmentations in scene text recognition,” International Conference on Document Analysis and Recognition, pages 523–527, 2013.
  5. H. Cho, M. Sung and B. Jun, “Canny Text Detector: Fast and robust scene text localization algorithm,” Computer Vision and Pattern Recognition, pages 3566–3573, 2016.
  6. B. Epshtein, E. Ofek, and Y. Wexler, “Detecting text in natural scenes with stroke width transform,” Computer Vision and Pattern Recognition, pages 2963–2970, 2010.
  7. P. Viola and M. J. Jones, “Rapid object detection using a boosted cascade of simple features,” Computer Vision and Pattern Recognition, pages 511–518, 2001.

scene-text-recognition's People

Contributors

flriancu, hsiehyichia

scene-text-recognition's Issues

Error executing with a video argument

Hello,

I've successfully compiled the solution and I've been able to test it with some images. However, when I pass a video file as an argument, it seems to process only one frame, the first one. I am using it as written in the docs: "scene_text_recognition.exe -v videofile.name". Am I doing something wrong?

Thanks!
Ana

Documentation on Training

@HsiehYiChia Thank you for your hard work.

After reading your replies on issue #6, it is still unclear how to train our own model.
Can you simplify the training process for us?

What does it mean to be “strong” and “weak”?

I know it uses a machine-learned classifier to reject non-text regions,
but I don't know what "strong" and "weak" mean.
What are the "strong" and "weak" classifications based on?
I want to learn how it works and improve the classification.

My English is not good, so this may be hard to read.
Hope to get your reply.

// Chinese (translated)
What are the strong and weak classifiers determined by?
I would like to understand this in depth and improve on it.

The steps of model training

Hi, thank you for your work. Can you describe the steps of model training in more detail, including which functions are used? Thank you.

error while making the scene text recognition

os system: ubuntu 16.04
cmake version 3.5.1
opencv : 3.4

g++ (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609

make -j8
[ 9%] Building CXX object CMakeFiles/svm-train.dir/src/svm-train.cpp.o
[ 18%] Building CXX object CMakeFiles/scene_text_recognition.dir/src/adaboost.cpp.o
[ 27%] Building CXX object CMakeFiles/scene_text_recognition.dir/src/ER.cpp.o
[ 36%] Building CXX object CMakeFiles/svm-train.dir/src/svm.cpp.o
[ 54%] Building CXX object CMakeFiles/scene_text_recognition.dir/src/main.cpp.o
[ 63%] Building CXX object CMakeFiles/scene_text_recognition.dir/src/SpellingCorrector.cpp.o
[ 72%] Building CXX object CMakeFiles/scene_text_recognition.dir/src/svm.cpp.o
[ 54%] Building CXX object CMakeFiles/scene_text_recognition.dir/src/OCR.cpp.o
[ 81%] Building CXX object CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o
[ 90%] Linking CXX executable svm-train
[ 90%] Built target svm-train
/home/giuser/Scene-text-recognition/src/utils.cpp: In function ‘void get_lbp_data()’:
/home/giuser/Scene-text-recognition/src/utils.cpp:1416:24: warning: ISO C++ forbids converting a string constant to ‘char*’ [-Wwrite-strings]
char *data_filename = "training/detection_training_data.txt";
^
[100%] Linking CXX executable scene_text_recognition
CMakeFiles/scene_text_recognition.dir/src/ER.cpp.o: In function ERFilter::er_tree_extract(cv::Mat)': ER.cpp:(.text+0x165d): undefined reference to cv::error(int, cv::String const&, char const*, char const*, int)'
CMakeFiles/scene_text_recognition.dir/src/ER.cpp.o: In function cv::String::String(char const*)': ER.cpp:(.text._ZN2cv6StringC2EPKc[_ZN2cv6StringC5EPKc]+0x54): undefined reference to cv::String::allocate(unsigned long)'
CMakeFiles/scene_text_recognition.dir/src/ER.cpp.o: In function cv::String::~String()': ER.cpp:(.text._ZN2cv6StringD2Ev[_ZN2cv6StringD5Ev]+0x14): undefined reference to cv::String::deallocate()'
CMakeFiles/scene_text_recognition.dir/src/ER.cpp.o: In function cv::String::operator=(cv::String const&)': ER.cpp:(.text._ZN2cv6StringaSERKS0_[_ZN2cv6StringaSERKS0_]+0x28): undefined reference to cv::String::deallocate()'
CMakeFiles/scene_text_recognition.dir/src/OCR.cpp.o: In function OCR::extract_feature(cv::Mat&, svm_node*)': OCR.cpp:(.text+0xce1): undefined reference to cv::findContours(cv::_InputOutputArray const&, cv::OutputArray const&, int, int, cv::Point)'
OCR.cpp:(.text+0x10b8): undefined reference to cv::normalize(cv::_InputArray const&, cv::_InputOutputArray const&, double, double, int, int, cv::_InputArray const&)' CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function image_mode(ERFilter*, char*)':
utils.cpp:(.text+0x44e): undefined reference to cv::imread(cv::String const&, int)' CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function video_mode(ERFilter*, char*)':
utils.cpp:(.text+0x977): undefined reference to cv::VideoCapture::VideoCapture(cv::String const&)' utils.cpp:(.text+0xa99): undefined reference to cv::VideoWriter::fourcc(char, char, char, char)'
utils.cpp:(.text+0xaed): undefined reference to cv::VideoWriter::open(cv::String const&, int, double, cv::Size_<int>, bool)' utils.cpp:(.text+0xb32): undefined reference to cv::VideoWriter::fourcc(char, char, char, char)'
utils.cpp:(.text+0xb86): undefined reference to cv::VideoWriter::open(cv::String const&, int, double, cv::Size_<int>, bool)' utils.cpp:(.text+0x1800): undefined reference to cv::imwrite(cv::String const&, cv::_InputArray const&, std::vector<int, std::allocator > const&)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function load_challenge2_test_file(cv::Mat&, int)': utils.cpp:(.text+0x23f8): undefined reference to cv::imread(cv::String const&, int)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function load_challenge2_training_file(cv::Mat&, int)': utils.cpp:(.text+0x274b): undefined reference to cv::imread(cv::String const&, int)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function show_result(cv::Mat&, cv::Mat&, std::vector<Text, std::allocator<Text> >&, std::vector<double, std::allocator<double> >, std::vector<ER*, std::allocator<ER*> >, std::vector<std::vector<ER*, std::allocator<ER*> >, std::allocator<std::vector<ER*, std::allocator<ER*> > > >, std::vector<std::vector<ER*, std::allocator<ER*> >, std::allocator<std::vector<ER*, std::allocator<ER*> > > >, std::vector<std::vector<ER*, std::allocator<ER*> >, std::allocator<std::vector<ER*, std::allocator<ER*> > > >, std::vector<std::vector<ER*, std::allocator<ER*> >, std::allocator<std::vector<ER*, std::allocator<ER*> > > >)': utils.cpp:(.text+0x3526): undefined reference to cv::getTextSize(cv::String const&, int, double, int, int*)'
utils.cpp:(.text+0x372c): undefined reference to cv::putText(cv::_InputOutputArray const&, cv::String const&, cv::Point_<int>, int, double, cv::Scalar_<double>, int, int, bool)' utils.cpp:(.text+0x3b2b): undefined reference to cv::imshow(cv::String const&, cv::_InputArray const&)'
utils.cpp:(.text+0x3ba5): undefined reference to cv::imshow(cv::String const&, cv::_InputArray const&)' utils.cpp:(.text+0x3c1f): undefined reference to cv::imshow(cv::String const&, cv::_InputArray const&)'
utils.cpp:(.text+0x3c99): undefined reference to cv::imshow(cv::String const&, cv::_InputArray const&)' utils.cpp:(.text+0x3d13): undefined reference to cv::imshow(cv::String const&, cv::_InputArray const&)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o:utils.cpp:(.text+0x3d77): more undefined references to cv::imshow(cv::String const&, cv::_InputArray const&)' follow CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function draw_FPS(cv::Mat&, double)':
utils.cpp:(.text+0x41a1): undefined reference to cv::putText(cv::_InputOutputArray const&, cv::String const&, cv::Point_<int>, int, double, cv::Scalar_<double>, int, int, bool)' CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function draw_linear_time_MSER(std::__cxx11::basic_string<char, std::char_traits, std::allocator >)':
utils.cpp:(.text+0x425d): undefined reference to cv::imread(cv::String const&, int)' utils.cpp:(.text+0x42c5): undefined reference to cv::VideoWriter::fourcc(char, char, char, char)'
utils.cpp:(.text+0x4319): undefined reference to cv::VideoWriter::open(cv::String const&, int, double, cv::Size_<int>, bool)' utils.cpp:(.text+0x4ca3): undefined reference to cv::imshow(cv::String const&, cv::_InputArray const&)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function draw_multiple_channel(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)': utils.cpp:(.text+0x5707): undefined reference to cv::imread(cv::String const&, int)'
utils.cpp:(.text+0x5aaf): undefined reference to cv::imshow(cv::String const&, cv::_InputArray const&)' utils.cpp:(.text+0x5b23): undefined reference to cv::imshow(cv::String const&, cv::_InputArray const&)'
utils.cpp:(.text+0x5b97): undefined reference to cv::imshow(cv::String const&, cv::_InputArray const&)' utils.cpp:(.text+0x5c21): undefined reference to cv::imwrite(cv::String const&, cv::_InputArray const&, std::vector<int, std::allocator > const&)'
utils.cpp:(.text+0x5cba): undefined reference to cv::imwrite(cv::String const&, cv::_InputArray const&, std::vector<int, std::allocator<int> > const&)' utils.cpp:(.text+0x5d53): undefined reference to cv::imwrite(cv::String const&, cv::_InputArray const&, std::vector<int, std::allocator > const&)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function output_MSER_time(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)': utils.cpp:(.text+0x60c9): undefined reference to cv::imread(cv::String const&, int)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function output_optimal_path(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)': utils.cpp:(.text+0x6763): undefined reference to cv::imread(cv::String const&, int)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function load_gt(int)': utils.cpp:(.text+0x6cf3): undefined reference to cv::imread(cv::String const&, int)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function calc_recall_rate()': utils.cpp:(.text+0x7b2e): undefined reference to cv::MSER::create(int, int, int, double, double, int, double, double, int)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function bootstrap()': utils.cpp:(.text+0xd10f): undefined reference to cv::imread(cv::String const&, int)'
utils.cpp:(.text+0xd34d): undefined reference to cv::imwrite(cv::String const&, cv::_InputArray const&, std::vector<int, std::allocator<int> > const&)' utils.cpp:(.text+0xd4c2): undefined reference to cv::imwrite(cv::String const&, cv::_InputArray const&, std::vector<int, std::allocator > const&)'
CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function get_lbp_data()': utils.cpp:(.text+0xd9df): undefined reference to cv::imread(cv::String const&, int)'
utils.cpp:(.text+0xdc31): undefined reference to cv::imread(cv::String const&, int)' CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function get_ocr_data()':
utils.cpp:(.text+0xee47): undefined reference to cv::imread(cv::String const&, int)' CMakeFiles/scene_text_recognition.dir/src/utils.cpp.o: In function cv::String::String(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)':
utils.cpp:(.text._ZN2cv6StringC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE[_ZN2cv6StringC5ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE]+0x5d): undefined reference to `cv::String::allocate(unsigned long)'
collect2: error: ld returned 1 exit status
CMakeFiles/scene_text_recognition.dir/build.make:268: recipe for target 'scene_text_recognition' failed
make[2]: *** [scene_text_recognition] Error 1
CMakeFiles/Makefile2:104: recipe for target 'CMakeFiles/scene_text_recognition.dir/all' failed
make[1]: *** [CMakeFiles/scene_text_recognition.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

Can anyone help me out?
Thanks

Poor Results

Dear HsiehYiChia,

I ran the model with the ".\scene_text_recognition.bat -i res\ICDAR2015_test\img_6.jpg" command and noticed that the result was not the same as the one shown. For example, I was expecting to see result1.jpg, but the output was the attached image. Can you tell me how to fix it? Thank you in advance.
