MobileOCR

Keywords: OCR, Page Dewarping, Deep Learning, Tesseract-OCR, DAS2018, ICDAR2017

Conference Paper

1. Introduction: Camera-based document analysis

This repository is part of the master's thesis "Camera-based Document Analysis based on Deep Learning and OCR".

Capturing document images with a smartphone provides a convenient way to digitize physical documents and facilitates the automation of document processing and information retrieval. In contrast to flatbed scans, camera-captured documents require a more sophisticated preprocessing pipeline because of perspective distortions, suboptimal lighting and physically deformed documents. The main goals of this work were to:

  • build an end-to-end OCR pipeline (input: document image, output: full-text transcription) based on the best open-source solution currently available.
  • analyze deep learning techniques for one of the major challenges in camera-based document analysis discussed at the DAS2018 workshop: page dewarping (in particular perspective distortions and folded or curved documents).

A high-level overview is illustrated in the following figure:

This repo contains a demo application for the OCR pipeline (Tesseract 4.0) extended by the page dewarping component.

2. Proposed Method

Different neural network architectures were investigated on a large-scale synthetic dataset to estimate the document's corner points from a single input image, without prior assumptions. The distorted image is then mapped to its canonical position using the 4-point homography parameterization. The best result is achieved by a modified Xception network, with a mean displacement error of 3.38 px. Finally, the correction component is integrated into Tesseract 4.0 and evaluated on the SmartDoc 2015 challenge 2 test set. Experiments show that the correction component improves character accuracy by more than 15 percentage points, to 93.11%, compared with 77.27% for Tesseract alone.
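To illustrate the 4-point homography parameterization, here is a minimal sketch in plain Python. It is not the repository's implementation. It assumes the four corner points have already been predicted, and the function names, the sample coordinates and the 384x512 canonical page size are hypothetical. It solves the standard 8x8 linear system for the homography with the last entry fixed to 1, and it computes a mean displacement error as the average Euclidean distance between predicted and reference corners, which is one plausible reading of the metric named above. In practice a library routine such as OpenCV's `cv2.getPerspectiveTransform` would typically replace the hand-written solver.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs stay untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def homography_from_corners(src, dst):
    """Return the 3x3 homography mapping four src corners onto four dst corners."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear_system(a, b) + [1.0]  # fix the last entry to 1
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, point):
    """Map a single (x, y) point through the homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def mean_displacement_error(predicted, target):
    """Average Euclidean distance between corresponding points."""
    distances = [math.hypot(p[0] - t[0], p[1] - t[1])
                 for p, t in zip(predicted, target)]
    return sum(distances) / len(distances)


# Hypothetical predicted corners (top-left, top-right, bottom-right, bottom-left)
# and the canonical rectangle they should be mapped onto.
predicted_corners = [(30, 40), (410, 20), (430, 590), (10, 560)]
canonical_corners = [(0, 0), (384, 0), (384, 512), (0, 512)]

H = homography_from_corners(predicted_corners, canonical_corners)
print([apply_homography(H, c) for c in predicted_corners])
```

Applying the resulting matrix to every pixel (for example with an image-warping routine) yields the dewarped page that is then passed to the OCR engine.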

3. Demo Examples

Page Dewarping results:

Text lines recognized by Tesseract before and after dewarping:

4. Setup

  1. Install Tesseract OCR; at the time of writing, Tesseract 4.0.0-beta.1 was used as the OCR engine.

  2. Download homography_model into /res/homographyModel/

  3. Install dependencies (using a conda environment)

    conda env create -f environment.yml
    # note: for GPU support, replace tensorflow with tensorflow-gpu in environment.yml
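After the setup steps, it can help to confirm that the Tesseract binary is reachable before running the pipeline. The following is a small sketch, not part of the repository; the helper name `tesseract_version` is hypothetical, and it assumes the executable is called `tesseract` and is on the PATH.

```python
import shutil
import subprocess


def tesseract_version(binary="tesseract"):
    """Return the first line of `<binary> --version`, or None if it is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        return None
    # Some Tesseract builds print the version to stderr, so merge both streams.
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = tesseract_version()
    if version is None:
        print("Tesseract was not found on PATH; install it first (step 1).")
    else:
        print("Found:", version)
```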

5. Usage

To try the different pipeline modes, use ocrMaster.py; to test the page dewarping performance, use test_homographyDL.py.

6. File Structure

doc
    ├── ...                                                          # README resources
res                               
    ├── homographyModel/                                             # dir to trained homography model 
    ├── smartDocSamples/                                             # smartDoc challenge 2 test set samples
    ├── smartDocSamplesOutput/                                       # mobileOCR results are stored here   
src
    ├── pipeline/                                                    # mobileOCR pipeline
        ├── dl_homograhpy_tf/                                        # deep learning: homography; dewarping 
        ├── modes/                                                   # different OCR pipeline modes
        ├── textline_recognition/                                    # Tesseract 4.0 python wrapper 
        ├── ocrMaster.py                                             # interface to OCR pipeline
environment.yml                                                      # dependencies (easy setup)

License

mobileOCR_License
