ocrd_kraken

OCR-D wrapper for the Kraken OCR engine

Introduction

This package offers OCR-D compliant workspace processors for (some of) the functionality of Kraken.

(Each processor is a parameterizable step in a configurable workflow of the OCR-D functional model. There are usually various alternative processor implementations for each step. Data is represented with METS and PAGE.)

It includes image preprocessing (binarization), layout analysis (region and line+baseline segmentation), and text recognition.

Installation

With Docker

This is the best option if you want to run the software in a container.

You need to have Docker installed. Then pull the image:

docker pull ocrd/kraken

To run with Docker:

docker run --rm \
-v path/to/workspaces:/data \
-v path/to/models:/usr/local/share/ocrd-resources \
ocrd/kraken ocrd-kraken-recognize ...
# or ocrd-kraken-segment or ocrd-kraken-binarize

Native, from PyPI

This is the best option if you want to use the stable, released version.

pip install ocrd_kraken

Native, from git

Use this option if you want to change the source code or install the latest, unpublished changes.

We strongly recommend using a virtual environment (venv).

git clone https://github.com/OCR-D/ocrd_kraken
cd ocrd_kraken
sudo make deps-ubuntu # or manually from git or via ocrd_all
make deps        # or pip install -r requirements.txt
make install     # or pip install .

Models

Kraken uses data-driven (neural) models for segmentation and recognition, but comes with no pretrained "official" models. There is a public repository of community-provided models, which can also be queried and downloaded from via the standalone kraken CLI. (See the Kraken docs for details.)

For the OCR-D wrapper, since all OCR-D processors must resolve file/data resources in a standardized way, there is a general mechanism for managing models, i.e. installing and using them by name. We currently manage our own list of recommended models (without delegating to the above repo).

Models always use the filename suffix .mlmodel, but are just loaded by their basename.

See the OCR-D model guide and

ocrd resmgr --help

Usage

For details, see docstrings in the individual processors and ocrd-tool.json descriptions, or simply --help.

Available OCR-D processors are:

  • ocrd-kraken-binarize (nlbin – not recommended)
    • adds AlternativeImage files (per page, region or line) to the output fileGrp
  • ocrd-kraken-segment (all-in-one segmentation – recommended for handwriting and prints with simple layouts)
    • adds TextRegions, TableRegions, ImageRegions, MathsRegions, NoiseRegions, ReadingOrder and AlternativeImage to Page (depending on model training)
    • adds TextLines to TextRegions, including their Baseline
  • ocrd-kraken-recognize (benefits from annotated Baselines, falls back to center-normalized bboxes)
    • adds Words to TextLines
    • adds Glyphs to Words
    • adds TextEquiv

Testing

make test

This downloads test data from https://github.com/OCR-D/assets under repo/assets, and runs some basic tests of the Python API.

Set PYTEST_ARGS="-s --verbose" to see log output (-s) and individual test results (--verbose).

ocrd_kraken's Issues

use multi-model recognition

Kraken offers "multi-script" (actually multi-model) prediction in one pass, so instead of a fixed model, we could run with multiple models and use the annotated language and script mappings to select per-segment (as in ocrd-tesserocr-recognize with xpath_model).

IIUC, that would entail using mm_rpred (instead of rpred) and passing lang/script to bounds['boxes'][...]['tags'] (or bounds['lines'][...]['tags'] with baseline segmentation) and a dict from lang/script to model names as the first arg.
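
The per-segment selection step could be sketched as follows. This is plain Python illustrating only the lookup from annotated language/script to a model name; it does not call Kraken. The function `select_model`, the mapping and all model names are hypothetical.

```python
from typing import Mapping, Optional


def select_model(language: Optional[str],
                 script: Optional[str],
                 models: Mapping[str, str],
                 default: str) -> str:
    """Pick a model name for one segment from its annotated language/script.

    The language annotation takes precedence over the script annotation;
    if neither is mapped, the default model is used.
    """
    for key in (language, script):
        if key is not None and key in models:
            return models[key]
    return default


# Hypothetical mapping from language/script annotations to model names
MODELS = {"Latn": "latin_print", "Grek": "greek_print", "deu": "german_fraktur"}

print(select_model("deu", "Latn", MODELS, "latin_print"))
print(select_model(None, "Grek", MODELS, "latin_print"))
print(select_model(None, None, MODELS, "latin_print"))
```

The resulting names would then have to be resolved to loaded models and passed to the recognizer together with the per-line tags, as described above.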

fallback to CPU if no GPU

Unfortunately, Kraken itself requires selecting in advance the computing device that PyTorch should use.

For practical purposes, workflows should try to use CUDA if available, and otherwise run on the CPU. That's why ocrd_detectron2 falls back to CPU.

This should be implemented (and then documented) here as well.
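
A minimal sketch of such a fallback, assuming the device is given as a string like "cuda:0" or "cpu". The helper name `resolve_device` is hypothetical and not part of ocrd_kraken or Kraken; the only library call used is `torch.cuda.is_available()`.

```python
def resolve_device(requested: str = "cuda:0") -> str:
    """Return the requested device, or "cpu" if CUDA was requested but is unusable."""
    if not requested.startswith("cuda"):
        # Nothing to fall back from (e.g. "cpu" was requested explicitly)
        return requested
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return requested if torch.cuda.is_available() else "cpu"


print(resolve_device("cuda:0"))  # "cuda:0" on a CUDA machine, otherwise "cpu"
```

The resolved string could then be passed wherever the processor currently hands the device parameter to Kraken.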

Binarization creates 2 source files in target workspace

When binarizing an image into a new workspace, two TIFF files are created.
The filename of the image is not the filename given in the mets.xml!
It seems to be the filename of the METS file in the cache directory!?

The (source) files are stored in the root directory of the workspace and look like this:
file.path.to.old.workspace.filename
and
file.path.to.new.workspace.file.path.to.old.workspace.filename

The original file (OCR-D-IMG/filename) is missing in the new workspace!
(The METS contains a reference to the first file mentioned above!)
Steps:

ocrd workspace validate
ocrd workspace clone -a -m mets.xml
cd /tmp/pyocrd-'xyz'
ocrd-kraken-binarize -w /new/target/dir

ocrd-kraken-ocr process call seems to be broken

Traceback (most recent call last):
  File "/home/j23d/.local/share/virtualenvs/ocrd_butler-o_KhKE38/lib/python3.6/site-packages/celery/app/trace.py", line 385, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/j23d/projects/ocrd_butler/ocrd_butler/celery_utils.py", line 21, in __call__
    return TaskBase.__call__(self, *args, **kwargs)
  File "/home/j23d/.local/share/virtualenvs/ocrd_butler-o_KhKE38/lib/python3.6/site-packages/celery/app/trace.py", line 648, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/j23d/projects/ocrd_butler/ocrd_butler/execution/tasks.py", line 82, in create_task
    **kwargs)
  File "/home/j23d/.local/share/virtualenvs/ocrd_butler-o_KhKE38/lib/python3.6/site-packages/ocrd/processor/base.py", line 56, in run_processor
    processor.process()
  File "/home/j23d/.local/share/virtualenvs/ocrd_butler-o_KhKE38/lib/python3.6/site-packages/ocrd_kraken/ocr.py", line 44, in process
    content=bin_image_bytes.getvalue())
  File "/home/j23d/.local/share/virtualenvs/ocrd_butler-o_KhKE38/lib/python3.6/site-packages/ocrd/workspace.py", line 162, in add_file
    raise Exception("'content' was set but no 'local_filename'")
Exception: 'content' was set but no 'local_filename'

I suspect that the call is not up to date with the current version of OCR-D core.

documentation: README completeness, debug ocrd-tool.json

Please debug your ocrd-tool.json file.
I found some errors:

<report valid="false">
  <error>[tools.ocrd-kraken-binarize.input_file_grp] 'OCR-D-IMG' is not of type 'array'</error>
  <error>[tools.ocrd-kraken-binarize.output_file_grp] 'OCR-D-IMG-BIN' is not of type 'array'</error>
  <error>[tools.ocrd-kraken-binarize.parameters.level-of-operation] 'description' is a required property</error>
  <error>[tools.ocrd-kraken-segment] 'input_file_grp' is a required property</error>
  <error>[tools.ocrd-kraken-segment] 'output_file_grp' is a required property</error>
  <error>[tools.ocrd-kraken-segment.parameters.maxcolseps] 'description' is a required property</error>
  <error>[tools.ocrd-kraken-segment.parameters.scale] 'description' is a required property</error>
  <error>[tools.ocrd-kraken-segment.parameters.black_colseps] 'description' is a required property</error>
  <error>[tools.ocrd-kraken-segment.parameters.white_colseps] 'description' is a required property</error>
  <error>[tools.ocrd-kraken-ocr] 'input_file_grp' is a required property</error>
  <error>[tools.ocrd-kraken-ocr] 'output_file_grp' is a required property</error>
  <error>[tools.ocrd-kraken-ocr.parameters.lines-json.required] 'true' is not of type 'boolean'</error>
</report>

You can find the ocrd-tool.json documentation: https://ocr-d.github.io/ocrd_tool
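
To illustrate what the errors above ask for, here is a hand-written sketch of three of the checks (file groups must be arrays, every parameter needs a description, `required` must be a boolean). It is not the OCR-D validator; the function `check_tool` and the placeholder description are assumptions made for this example.

```python
from typing import Any, Dict, List


def check_tool(name: str, tool: Dict[str, Any]) -> List[str]:
    """Report a few of the problems listed above for one tool entry."""
    errors = []
    for key in ("input_file_grp", "output_file_grp"):
        if key not in tool:
            errors.append(f"[tools.{name}] '{key}' is a required property")
        elif not isinstance(tool[key], list):
            errors.append(f"[tools.{name}.{key}] {tool[key]!r} is not of type 'array'")
    for pname, param in tool.get("parameters", {}).items():
        if "description" not in param:
            errors.append(
                f"[tools.{name}.parameters.{pname}] 'description' is a required property")
        if "required" in param and not isinstance(param["required"], bool):
            errors.append(
                f"[tools.{name}.parameters.{pname}.required] "
                f"{param['required']!r} is not of type 'boolean'")
    return errors


# An entry with the kinds of mistakes reported above ...
broken = {"input_file_grp": "OCR-D-IMG",
          "parameters": {"lines-json": {"required": "true"}}}
# ... and the same entry with those mistakes fixed
fixed = {"input_file_grp": ["OCR-D-IMG"],
         "output_file_grp": ["OCR-D-IMG-BIN"],
         "parameters": {"lines-json": {"description": "placeholder description",
                                       "required": True}}}

for message in check_tool("ocrd-kraken-ocr", broken):
    print(message)
print(check_tool("ocrd-kraken-ocr", fixed))
```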

Please check your README file and complete it. An ideal README file looks like this:

# Name of application


## Introduction
...

## Installation
...

## Usage
...

## Testing
...

Thank you very much.

recognize: word coordinates are often invalid

Currently, the _make_word approach creates polygons which are semantically unsound (IIUC). It creates a Word with dummy coordinates first, then adds points glyph by glyph,

current_word.get_Coords().points += ' ' + points_from_polygon(poly)

and finally reduces them to a bbox when the next word starts:

current_word.get_Coords().points = points_from_bbox(*bbox_from_polygon(polygon_from_points(current_word.get_Coords().points.strip())))

This yields e.g. 141,1263 141,1343 141,1343 141,1263 (notice the repeated points, so we actually have only 2 distinct points instead of 4 here).
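
The problem can be reproduced without any OCR-D code. The following sketch uses plain tuples instead of the ocrd_utils helpers; `bbox_polygon` and `is_sound` are hypothetical names introduced only for this illustration.

```python
from typing import List, Tuple

Point = Tuple[int, int]


def bbox_polygon(glyph_polygons: List[List[Point]]) -> List[Point]:
    """Return the 4-point bounding rectangle around all glyph polygons."""
    xs = [x for poly in glyph_polygons for x, _ in poly]
    ys = [y for poly in glyph_polygons for _, y in poly]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


def is_sound(polygon: List[Point]) -> bool:
    """Necessary condition only: fewer than 3 distinct points cannot enclose an area."""
    return len(set(polygon)) >= 3


# The polygon quoted above has only 2 distinct points
reported = [(141, 1263), (141, 1343), (141, 1343), (141, 1263)]
print(is_sound(reported))

# A bbox over glyphs that all share one x coordinate degenerates the same way
print(is_sound(bbox_polygon([[(141, 1263), (141, 1343)]])))

# Glyphs with real horizontal extent give a proper rectangle
print(bbox_polygon([[(0, 0), (2, 1)], [(3, 0), (5, 4)]]))
```

A check like `is_sound` before writing the Word would at least make such cases visible, whatever the eventual fix is.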

Support --version

It would be nice if all tools supported printing their version.
ocrd-kraken-binarize --version is not supported.

ocrd-kraken-segment creates negative coordinates (=invalid PAGE)

Hi,

I have an example where ocrd-kraken-segment creates negative coordinates (=invalid PAGE).
I just used:

ocrd resmgr download ocrd-kraken-segment blla.mlmodel
ocrd-kraken-segment -I <inputFileGrp> -O <outputFileGrp>

example.zip
As a result, I see:

<pc:TextRegion id="region_line_36">
            <pc:Coords points="3040,382 3040,-2 3219,-2 3219,382 3216,575 3037,569"/>
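
One conceivable workaround is to clamp every point to the page before writing the Coords. The sketch below works on the PAGE points string directly; the helper `clip_points`, the page size used in the example, and the choice of 0..width and 0..height as valid range are assumptions, not what ocrd_kraken does.

```python
def clip_points(points: str, width: int, height: int) -> str:
    """Clamp each "x,y" pair of a PAGE points string to the page area."""
    clipped = []
    for pair in points.split():
        x, y = (int(value) for value in pair.split(","))
        x = min(max(x, 0), width)   # assumed valid range: 0..width
        y = min(max(y, 0), height)  # assumed valid range: 0..height
        clipped.append(f"{x},{y}")
    return " ".join(clipped)


# Points from the example above, with a hypothetical page size
points = "3040,382 3040,-2 3219,-2 3219,382 3216,575 3037,569"
print(clip_points(points, width=4000, height=5000))
```

Clamping point by point can collapse a polygon that lies mostly outside the page; intersecting the polygon with the page rectangle would be more robust.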

Binarization: Created files have no file:GROUPID

ocrd-kraken-binarize --version
Version 0.0.1, ocrd/core 0.3.1
ocrd workspace find --file-grp OCR-D-KRAKEN-BIN --output-field ID
OCR-D-KRAKEN-BIN_0001
OCR-D-KRAKEN-BIN_0002

ocrd workspace find --file-grp OCR-D-KRAKEN-BIN --output-field groupId
None
None
