
tesstrain-win

Train Tesseract LSTM with make on Windows

About tesstrain-win

tesstrain-win is derived from tesseract-ocr/tesstrain. To make it run on Windows, the makefile and the overall file structure have been changed.

The ocrd-train (OCR-D/ocrd-train) copy in tesstrain-win is the predecessor of tesseract-ocr/tesstrain. It can help in understanding the makefile of tesseract-ocr/tesstrain.

The file structure of tesstrain-win: [image]

Recommendations for choosing a training method

[image]

Requirements

tesseract

You will need a recent version (>= 4.0.0beta1) of tesseract built with the training tools and matching leptonica bindings.

Build instructions and more can be found in the Tesseract project wiki.

Instructions for building Tesseract on Windows can be found in Tesseract4.0+VS2017+win10.

Python

You need a recent version of Python 3.x. For image processing the Python library Pillow is used.

Cygwin

To run the makefile on Windows, you need Cygwin. For installation instructions, see Install Cygwin on Win10 for makefile.
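
The requirements above can be checked with a short script before any training is attempted. The following is a minimal sketch, not part of tesstrain-win: the function name check_environment and the list of executables it looks for (tesseract, lstmtraining, make) are assumptions chosen for illustration, and the script should be run from the same shell in which make will be run.

```python
import importlib.util
import shutil
import sys


def check_environment(tools=("tesseract", "lstmtraining", "make")):
    """Return a dict mapping each requirement name to True or False."""
    report = {"python3": sys.version_info >= (3, 0)}
    # Pillow is imported under the module name "PIL".
    report["pillow"] = importlib.util.find_spec("PIL") is not None
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:14s} {'found' if ok else 'MISSING'}")
```

The script only checks that the tools can be found. It does not verify the Tesseract version or that Tesseract was built with the training tools.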

How to Use tesstrain-win

Before training on your own data, it is recommended to train on ocrd-testset.zip first.

If ocrd-testset.zip trains without errors, the training environment on the current computer is set up correctly.

How to train ocrd-testset.zip

  1. Extract ./data/foo-ground-truth/ocrd-testset.zip to ./data/foo-ground-truth.

  2. Run the command prompt as an administrator and go to the tesstrain-win directory, e.g.:

cd %USERPROFILE%/tesstrain-win

  3. Run make training:

make training
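
If no archive tool is at hand, the extraction in step 1 can also be done with Python's standard zipfile module. This is a minimal sketch under the assumption that it is run from the tesstrain-win directory; the helper name extract_testset is hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_testset(archive, target_dir):
    """Extract archive into target_dir and return the archive member names."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return zf.namelist()


if __name__ == "__main__":
    # Paths taken from step 1, relative to the tesstrain-win directory.
    archive = Path("data/foo-ground-truth/ocrd-testset.zip")
    if archive.is_file():
        names = extract_testset(archive, "data/foo-ground-truth")
        print(f"extracted {len(names)} entries")
    else:
        print(f"{archive} not found; run this from the tesstrain-win directory")
```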

How to train on your own data

  1. Give your model a name

    You can set the name by changing line 11 of the makefile:

MODEL_NAME = New_Name

    Or you can pass the name when you run make training:

make training MODEL_NAME=New_Name

  2. Prepare the base traineddata

    If you train from scratch, skip this step. If you fine-tune, download the base traineddata from tessdata_best and place it in ./data/tessdata.

  3. Update foo.numbers, foo.punc and foo.wordlist in the data folder

    The three files should be consistent with the base traineddata or with the language you are training.

    For example, if your base traineddata is eng, you can download them from langdata_lstm/eng. After downloading, rename them to New_Name.numbers, New_Name.punc and New_Name.wordlist respectively.

  4. Prepare the ground truth

    Place ground truth consisting of line images and transcriptions in the folder data/MODEL_NAME-ground-truth. This list of files will be split into training and evaluation data; the ratio is defined by the RATIO_TRAIN variable.

    Images must be TIFF with the extension .tif, or PNG with the extension .png, .bin.png or .nrm.png.

    Transcriptions must be single-line plain text and have the same name as the line image but with the image extension replaced by .gt.txt.

  5. Run the command prompt as an administrator and go to the tesstrain-win directory, e.g.:

cd %USERPROFILE%/tesstrain-win

  6. Run make training:

make training
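
The naming rules for the ground truth are easy to get wrong, so it can help to check the folder before running make training. The sketch below is an illustration only and is not taken from the makefile. The function names, the default ratio of 0.9 and the seeded shuffle are assumptions; the makefile may order and split the files differently. The code pairs every line image with its expected .gt.txt file, reports missing or multi-line transcriptions, and shows how a ratio such as RATIO_TRAIN divides a list into training and evaluation parts.

```python
import random
from pathlib import Path

# Image extensions named in the text. Longer suffixes come first so that
# "x.bin.png" is not treated as a plain ".png" file with the stem "x.bin".
IMAGE_SUFFIXES = (".bin.png", ".nrm.png", ".png", ".tif")


def transcription_for(image):
    """Return the expected .gt.txt path for a line image, or None."""
    name = image.name
    for suffix in IMAGE_SUFFIXES:
        if name.endswith(suffix):
            return image.with_name(name[: -len(suffix)] + ".gt.txt")
    return None


def check_ground_truth(folder):
    """Return (valid_images, problems) for a ground-truth folder."""
    valid, problems = [], []
    for image in sorted(Path(folder).iterdir()):
        gt = transcription_for(image)
        if gt is None:
            continue  # not a line image, e.g. a .gt.txt file
        if not gt.is_file():
            problems.append(f"{image.name}: missing {gt.name}")
            continue
        # Ignore trailing line breaks, then require a single non-empty line.
        text = gt.read_text(encoding="utf-8").strip("\r\n")
        if not text or "\n" in text:
            problems.append(f"{gt.name}: must contain exactly one line of text")
            continue
        valid.append(image)
    return valid, problems


def split_by_ratio(items, ratio_train=0.9, seed=0):
    """Shuffle items and split them into (training, evaluation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratio_train)
    return shuffled[:n_train], shuffled[n_train:]
```

For example, calling check_ground_truth("data/New_Name-ground-truth") and printing the returned problems gives a quick list of files to fix, and split_by_ratio(valid, ratio_train=0.9) shows roughly how many lines would end up in each part.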

More Information About Training Tesseract LSTM

For more information about training Tesseract LSTM, see:

Train Tesseract LSTM methods Comparison

Train Tesseract LSTM with make on Windows

How the makefile in tesstrain-win work

Train Tesseract LSTM with tesstrain.sh on Windows

Win10 Tesseract4.1 LSTM training

Contributors

livezingy
