
ocrd_anybaseocr's Introduction

Document Preprocessing and Segmentation


Tools to preprocess and segment scanned images for OCR-D

Installing

Requires Python >= 3.6.

  1. Create a new venv unless you already have one

     python3 -m venv venv
    
  2. Activate the venv

     source venv/bin/activate
    
  3. To install from source, get GNU make and do:

     make install
    

    There are also prebuilt packages available on PyPI:

     pip install ocrd_anybaseocr
    

(This will install both PyTorch and TensorFlow, along with their dependencies.)

Tools

All tools, also called processors, follow the OCR-D CLI specification, which roughly looks like:

ocrd-<processor-name> [-m <path to METS input file>] -I <input group> -O <output group> [-p <path to parameter file>]* [-P <param name> <param value>]*

Binarizer

Method Behaviour

For each page (or sub-segment), this processor takes a scanned color or grayscale document image as input and computes a binarized (black-and-white) image.

Implemented via rule-based methods (percentile-based adaptive background estimation in Ocrolib).
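
The percentile-based idea can be sketched as follows. This is a simplified illustration only, not the processor's actual implementation; the parameter names (`black`, `white`) are illustrative, though `threshold` mirrors the processor parameter shown in the example below.

```python
import numpy as np

def binarize(gray, black=5, white=90, threshold=0.5):
    """Simplified percentile-based binarization sketch.

    Estimate ink and background intensity levels from low/high
    percentiles, normalize the image between them, then threshold.
    """
    lo = np.percentile(gray, black)    # estimated ink level
    hi = np.percentile(gray, white)    # estimated background level
    norm = np.clip((gray - lo) / max(hi - lo, 1e-6), 0, 1)
    return (norm > threshold).astype(np.uint8) * 255
```

The real Ocrolib method additionally estimates the background locally (adaptively) rather than with global percentiles.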

Example

ocrd-anybaseocr-binarize -I OCR-D-IMG -O OCR-D-BIN -P operation_level line -P threshold 0.3

Deskewer

Method Behaviour

For each page (or sub-segment), this processor takes a document image as input and computes its skew angle. It also annotates a deskewed image.

The input images have to be binarized for this module to work.

Implemented via rule-based methods (binary projection profile entropy maximization in Ocrolib).
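
The projection-profile idea can be sketched as follows. This is an assumption-laden simplification: the score below uses the variance of the profile gradient rather than entropy, and the rotation routine is injected as a callable (e.g. `scipy.ndimage.rotate`) to keep the sketch dependency-free. The `maxskew` and `steps` names mirror the processor parameters in the example below.

```python
import numpy as np

def profile_score(binary):
    """Score a binary image by the sharpness of its row projection profile.

    A well-deskewed page has sharp peaks (text lines) and valleys
    (line gaps), which maximizes the variance of the profile gradient.
    """
    profile = binary.sum(axis=1).astype(float)
    return np.var(np.diff(profile))

def estimate_skew(binary, maxskew=5.0, steps=20, rotate=None):
    """Try candidate angles and keep the one with the best profile score."""
    angles = np.linspace(-maxskew, maxskew, steps)
    return max(angles, key=lambda a: profile_score(rotate(binary, a)))
```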

Example

ocrd-anybaseocr-deskew -I OCR-D-BIN -O OCR-D-DESKEW -P maxskew 5.0 -P skewsteps 20 -P operation_level page

Cropper

Method Behaviour

For each page, this processor takes a document image as input and computes the border around the page content area (i.e. removes textual noise as well as any other noise around the page frame). It also annotates a cropped image.

The input image does not need to be binarized, but should be deskewed for the module to work optimally.

Implemented via rule-based methods (gradient-based line segment detection and morphology based textline detection).
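
The core idea of deriving a crop border from detected content can be sketched as follows, assuming a binarized ink mask is available. This is a deliberately crude illustration; the actual processor additionally detects rulers and page-frame line segments, and the `margin` parameter here is purely hypothetical.

```python
import numpy as np

def content_bbox(binary, margin=0.01):
    """Rough content-frame estimate: bounding box of all ink pixels,
    padded by a relative margin, clamped to the image bounds."""
    ys, xs = np.nonzero(binary)
    h, w = binary.shape
    pad_y, pad_x = int(margin * h), int(margin * w)
    return (max(xs.min() - pad_x, 0), max(ys.min() - pad_y, 0),
            min(xs.max() + pad_x, w - 1), min(ys.max() + pad_y, h - 1))
```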

Example:

ocrd-anybaseocr-crop -I OCR-D-DESKEW -O OCR-D-CROP -P rulerAreaMax 0 -P marginLeft 0.1

Dewarper

Method Behaviour

For each page, this processor takes a document image as input and computes a dewarped image in which curved text lines are straightened.

The input image has to be binarized for the module to work, and should be cropped and deskewed for optimal quality.

Implemented via data-driven methods (neural conditional GAN image model trained with pix2pixHD/PyTorch).

Models

ocrd resmgr download ocrd-anybaseocr-dewarp '*'

Example

ocrd-anybaseocr-dewarp -I OCR-D-CROP -O OCR-D-DEWARP -P resize_mode none -P gpu_id -1

Text/Non-Text Segmenter

Method Behaviour

For each page, this processor takes a document image as input and computes two images, separating the text and non-text parts.

The input image has to be binarized for the module to work, and should be cropped and deskewed for optimal quality.

Implemented via data-driven methods (neural pixel classifier model trained with Tensorflow/Keras).

Models

ocrd resmgr download ocrd-anybaseocr-tiseg '*'

Example

ocrd-anybaseocr-tiseg -I OCR-D-DEWARP -O OCR-D-TISEG -P use_deeplr true

Block Segmenter

Method Behaviour

For each page, this processor takes the raw document image as input and computes a text region segmentation for it (distinguishing various types of text blocks).

The input image need not be binarized, but should be deskewed for the module to work optimally.

Implemented via data-driven methods (neural Mask-RCNN instance segmentation model trained with Tensorflow/Keras).

Models

ocrd resmgr download ocrd-anybaseocr-block-segmentation '*'

Example

ocrd-anybaseocr-block-segmentation -I OCR-D-TISEG -O OCR-D-BLOCK -P active_classes '["page-number", "paragraph", "heading", "drop-capital", "marginalia", "caption"]' -P min_confidence 0.8 -P post_process true

Textline Segmenter

Method Behaviour

For each page (or region), this processor takes a cropped document image as input and computes a textline segmentation for it.

The input image should be binarized and deskewed for the module to work.

Implemented via rule-based methods (gradient and morphology based line estimation in Ocrolib).
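
The rough idea of projection-based line estimation can be sketched as follows. This is an illustration only, not the Ocrolib algorithm (which uses gradient filters and morphology); the `min_ink` parameter is hypothetical.

```python
import numpy as np

def find_textlines(binary, min_ink=0.05):
    """Very rough textline estimation on a binarized region.

    Rows whose ink density exceeds a threshold are considered "inside"
    a line; contiguous runs of such rows form textline candidates,
    returned as (top, bottom) row intervals.
    """
    density = binary.mean(axis=1)
    inside = density > min_ink
    lines, start = [], None
    for y, on in enumerate(inside):
        if on and start is None:
            start = y
        elif not on and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(inside)))
    return lines
```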

Example

ocrd-anybaseocr-textline -I OCR-D-BLOCK -O OCR-D-LINE -P operation_level region

Document Analyser

Method Behaviour

For the whole document, this processor takes all the cropped page images and their corresponding text regions as input and computes the logical structure (page types and sections).

The input image should be binarized and segmented for this module to work.

Implemented via data-driven methods (neural Inception-V3 image classification model trained with Tensorflow/Keras).

Models

ocrd resmgr download ocrd-anybaseocr-layout-analysis '*'

Example

ocrd-anybaseocr-layout-analysis -I OCR-D-LINE -O OCR-D-STRUCT

Testing

To test the tools under realistic conditions (on OCR-D workspaces), download OCR-D/assets. In particular, the code is tested with the dfki-testdata dataset.

To download the data:

make assets

To run module tests:

make test

To run processor/workflow tests:

make cli-test

License

 Licensed under the Apache License, Version 2.0 (the "License");
 you may not use this file except in compliance with the License.
 You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.

ocrd_anybaseocr's People

Contributors

bertsky, kba, khurramhashmi, mahmed1995, mjenckel, n00blet, stweil


ocrd_anybaseocr's Issues

Issue with de-warp - strange result

Hi Martin,
we met shortly in Bonn.
You explained the de-warping, which was very interesting to me.
I tried it out a bit, and after some environment issues, I could run it.
Unfortunately, the results (visible as images in OCR-D-IMG-DEWARP) look very strange.
Maybe I have used the wrong parameters.
I have used the model which is described in anybaseocr/models/readme.md:
https://cloud.dfki.de/owncloud/index.php/s/3zKza5sRfQB3ygy
I have used the following parameter sets:
{
"gpu_id":0,
"pix2pixHD":"/home/stefan/ocrd_all/pix2pixHD",
"model_name":"/home/stefan/ocrd_all/pix2pixHD/models/"
}
You can download the image examples here:
ftp://ftp.ccs-gmbh.net (directory example1)
User: OCRDExamples
Password: OcRd%123!

Improve documentation for dewarp

Requirements:

- Installation of pix2pixHD
- pip3 install torch torchvision
- pip install dominate
- parameter model_name seems to be ignored
- set parameter for pix2pixHD

Use of AlternativeImage and Regions

With 44247ab we tried to add AlternativeImage functionality to "binarize", "deskew", "cropping" and "dewarp". However, there were some questions whether we used AlternativeImage correctly. Currently, each module expects the PAGE-XML output of the previous module as input, adds a new alternative image plus eventual XML output (orientation, border coords) to the PAGE-XML, and saves it as a new output. For this it expects two output folders, one for the AlternativeImage output and one for the output of the new PAGE-XML. If they are the same, there will be an error about already existing files (the IMG and PAGE file will have the same fileID).
Alongside this, there was also a question about AlternativeImage regions. While deskewing and cropping have only very limited application to regions, binarization and dewarping on regions might be useful. Due to the limitations of pix2pixHD, dewarping requires files on disk rather than just images in PIL format. This works for AlternativeImages, but the question remains whether AlternativeImage regions also exist on disk or are created from the original image as required.

ocrd/all:maximum not detecting GPUs

Inside the Docker container ocrd/all:maximum, the processor ocrd-anybaseocr-dewarp fails to run, because no GPUs can be detected.

$ docker run --gpus all --rm -it -v "${PWD}/data/:/data" ocrd/all:maximum
root@18f58a93eeae:/data# cd ocrd_test/
root@18f58a93eeae:/data/ocrd_test# ocrd-anybaseocr-dewarp -I img-BINPAGE -O img-DEWARP
12:34:26.607 INFO matplotlib.font_manager - generated new fontManager
Using TensorFlow backend.
12:34:27.988 INFO ocrd.workspace_validator - input_file_grp=['img-BINPAGE'] output_file_grp=['img-DEWARP']
12:34:27.989 INFO OcrdAnybaseocrDewarper - No output file group for images specified, falling back to 'OCR-D-IMG-DEWARP'
12:34:27.990 ERROR OcrdAnybaseocrDewarper - Your system has no CUDA installed. No GPU detected.

However, running nvidia-smi reveals that a GPU is available inside the Docker container.

root@18f58a93eeae:/data/ocrd_test# nvidia-smi
Tue Mar  3 12:34:38 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P2000        On   | 00000000:01:00.0 Off |                  N/A |
| 44%   30C    P8     4W /  75W |      1MiB /  5049MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@18f58a93eeae:/data/ocrd_test# 

Does the system (inside the Docker container) require additional packages?

tiseg: weights model not found even though it exists

(d2) home@home-lnx:~/programs/ocrd_anybaseocr$ ocrd-anybaseocr-tiseg -m ./data/mets.xml -I OCR-D-PAGE-CROP -O OCR-D-PAGE-TISEG -p ./models/seg_model.hdf5
Using TensorFlow backend.
2020-08-04 16:20:51.720748: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory
2020-08-04 16:20:51.720804: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory
2020-08-04 16:20:51.720811: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/d2/bin/ocrd-anybaseocr-tiseg", line 8, in <module>
    sys.exit(cli())
  File "/home/home/anaconda3/envs/d2/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/home/home/anaconda3/envs/d2/lib/python3.6/site-packages/click/core.py", line 781, in main
    with self.make_context(prog_name, args, **extra) as ctx:
  File "/home/home/anaconda3/envs/d2/lib/python3.6/site-packages/click/core.py", line 700, in make_context
    self.parse_args(ctx, args)
  File "/home/home/anaconda3/envs/d2/lib/python3.6/site-packages/click/core.py", line 1048, in parse_args
    value, args = param.handle_parse_result(ctx, opts, args)
  File "/home/home/anaconda3/envs/d2/lib/python3.6/site-packages/click/core.py", line 1630, in handle_parse_result
    value = invoke_param_callback(self.callback, ctx, self, value)
  File "/home/home/anaconda3/envs/d2/lib/python3.6/site-packages/click/core.py", line 123, in invoke_param_callback
    return callback(ctx, param, value)
  File "/home/home/anaconda3/envs/d2/lib/python3.6/site-packages/ocrd/decorators.py", line 28, in _handle_param_option
    return parse_json_string_or_file(*list(value))
  File "/home/home/anaconda3/envs/d2/lib/python3.6/site-packages/ocrd_utils/str.py", line 151, in parse_json_string_or_file
    value_parsed = parse_json_string_with_comments(f.read())
  File "/home/home/anaconda3/envs/d2/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

Duplicate heading in README files

Both tool-specific README files have the same heading although they refer to different processing steps. Maybe a single README.md file would be more useful for users?

Stricter cropping

A DFG requirement when scanning is to show a part of the opposite page. On some pages this tends to be a problem, since anybaseocr-crop does not crop the text and later tools detect text/characters where they shouldn't.

Here are two examples (attached images: cropping_1 and cropping_2).

What would be a strategy to tackle this?

"UnboundLocalError" in block segmentation

Please help:

$ ocrd-anybaseocr-block-segmentation -I ORIGINAL -O DFKIBS,DFKIBS-IMG -m mets.xml -p '{"block_segmentation_model" : "/home/kmw/Documents/Work/OCR-D/models/DFKI", "block_segmentation_weights" : "/home/kmw/Documents/Work/OCR-D/models/DFKI/block_segmentation_weights.h5"}'
2020-01-22 13:06:40.432457: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-01-22 13:06:40.435024: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
Using TensorFlow backend.
13:06:41.043 WARNING tensorflow - From /home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/tensorflow_core/python/compat/v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
13:06:41.172 INFO ocrd.workspace_validator - input_file_grp=['ORIGINAL'] output_file_grp=['DFKIBS', 'DFKIBS-IMG']
['ORIGINAL']
13:06:41.181 WARNING tensorflow - From /home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py:68: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
13:06:44.158 WARNING tensorflow - From /home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1635: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
13:06:46.405 WARNING tensorflow - From /home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/ocrd_anybaseocr/mrcnn/model.py:427: calling crop_and_resize_v1 (from tensorflow.python.ops.image_ops_impl) with box_ind is deprecated and will be removed in a future version.
Instructions for updating:
box_ind is deprecated, use box_indices instead
13:06:46.671 WARNING tensorflow - From /home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/ocrd_anybaseocr/mrcnn/model.py:776: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.cast` instead.
Processing 1 images
image                    shape: (2754, 2044, 3)       min:    0.00000  max:  255.00000  uint8
molded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  136.20000  float64
image_metas              shape: (1, 27)               min:    0.00000  max: 2754.00000  float64
anchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32
Traceback (most recent call last):
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/bin/ocrd-anybaseocr-block-segmentation", line 11, in <module>
    load_entry_point('ocrd-anybaseocr==0.0.1', 'console_scripts', 'ocrd-anybaseocr-block-segmentation')()
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 54, in ocrd_anybaseocr_block_segmentation
    return ocrd_cli_wrap_processor(OcrdAnybaseocrBlockSegmenter, *args, **kwargs)
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/ocrd/decorators.py", line 60, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/ocrd/processor/base.py", line 57, in run_processor
    processor.process()
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py", line 110, in process
    self._process_segment(page_image, page, page_xywh, page_id, input_file, n, mrcnn_model, class_names)
  File "/home/kmw/Documents/Work/OCR-D/dbg-env/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py", line 193, in _process_segment
    if border:
UnboundLocalError: local variable 'border' referenced before assignment

ocrd-anybaseocr-crop: Illegal instruction

Hello,
currently, on a plain Ubuntu 18.04 LTS Server VM, running ocrd-anybaseocr-crop via ocrd/all:maximum fails with the following (last output lines; for full output see 1000.ulb-ocrd-vd18-01.log):

08:15:43.587 INFO ocrd-olena-binarize - processing image/png input file IMG_OCR-D-IMG-PNG_1246738 ()
Warning: integral_browsing - Adjusting window width since it was larger than image height.
Warning: integral_browsing - Adjusting window height since it was larger than image width.
Warning: integral_browsing - Adjusting window width since it was larger than image height.
Warning: integral_browsing - Adjusting window width since it was larger than image height.
Warning: integral_browsing - Adjusting window height since it was larger than image width.
Warning: integral_browsing - Adjusting window width since it was larger than image height.
Warning: integral_browsing - Adjusting window width since it was larger than image height.
Warning: integral_browsing - Adjusting window height since it was larger than image width.
Warning: integral_browsing - Adjusting window width since it was larger than image height.
2020-04-27 08:15:45,221.221 INFO ocrd.workspace - Saving mets '/data/1000/mets.xml'
2020-04-27 08:15:46,668.668 INFO ocrd.workspace - Saving mets '/data/1000/mets.xml'
building OCR-D-SEG-PAGE-anyocr from OCR-D-BINPAGE-sauvola with pattern rule for ocrd-anybaseocr-crop
ocrd workspace remove-group -r OCR-D-SEG-PAGE-anyocr 2>/dev/null || true
ocrd-anybaseocr-crop -I OCR-D-BINPAGE-sauvola -O OCR-D-SEG-PAGE-anyocr -p OCR-D-SEG-PAGE-anyocr.json 2>&1 | tee OCR-D-SEG-PAGE-anyocr.log && touch -c OCR-D-SEG-PAGE-anyocr || { rm -fr OCR-D-SEG-PAGE-anyocr.json OCR-D-SEG-PAGE-anyocr; exit 1; }
2020-04-27 08:15:49,086.086 INFO matplotlib.font_manager - generated new fontManager
bash: line 1:  2230 Illegal instruction     (core dumped) ocrd-anybaseocr-crop -I OCR-D-BINPAGE-sauvola -O OCR-D-SEG-PAGE-anyocr -p OCR-D-SEG-PAGE-anyocr.json 2>&1
      2231 Done                    | tee OCR-D-SEG-PAGE-anyocr.log
Makefile:304: recipe for target 'OCR-D-SEG-PAGE-anyocr' failed
make[1]: *** [OCR-D-SEG-PAGE-anyocr] Error 1
make[1]: Leaving directory '/data/1000'

Anybody got an idea what's wrong?

block-segmentation:

I get ValueError: tile cannot extend outside image

Images (850 MB): https://digi.ub.uni-heidelberg.de/diglitData/v/testset-5-zeitschr-ca-1870.zip

  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.6/site-packages/ocrd/cli/process.py", line 27, in process_cli
    run_tasks(mets, log_level, page_id, tasks, overwrite)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/lib/python3.6/site-packages/ocrd/task_sequence.py", line 153, in run_tasks
    raise Exception("%s exited with non-zero return value %s. STDOUT:\n%s\nSTDERR:\n%s" % (task.executable, returncode, out, err))
Exception: ocrd-anybaseocr-block-segmentation exited with non-zero return value 1. STDOUT:
b'Processing 1 images\nimage                    shape: (3497, 2481, 3)       min:    0.00000  max:  255.00000  uint8\nmolded_images            shape: (1, 1024, 1024, 3)    min: -123.70000  max:  134.10000  float64\nimage_metas              shape: (1, 27)               min:    0.00000  max: 3497.00000  float64\nanchors                  shape: (1, 261888, 4)        min:   -0.35390  max:    1.29134  float32\n'
STDERR:
2020-08-28 12:54:45.245448: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:
2020-08-28 12:54:45.246576: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/lib:
2020-08-28 12:54:45.246640: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
12:55:05.253 WARNING tensorflow - From /dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/tensorflow_core/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Using TensorFlow backend.
12:55:10.732 WARNING OcrdAnybaseocrBlockSegmenter - Tensorflow cannot detect CUDA installation. Running without GPU will be slow.
12:55:28.139 INFO ocrd.workspace - created file ID: OCR-D-N6_00001_0, file_grp: OCR-D-N6, path: OCR-D-N6/OCR-D-N6_00001_0.png
12:55:28.719 INFO ocrd.workspace - created file ID: OCR-D-N6_00001_1, file_grp: OCR-D-N6, path: OCR-D-N6/OCR-D-N6_00001_1.png
12:55:28.731 INFO ocrd.workspace - created file ID: OCR-D-N6_00001_2, file_grp: OCR-D-N6, path: OCR-D-N6/OCR-D-N6_00001_2.png
12:55:29.562 INFO ocrd.workspace - created file ID: OCR-D-N6_00001_3, file_grp: OCR-D-N6, path: OCR-D-N6/OCR-D-N6_00001_3.png
Traceback (most recent call last):
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/bin/ocrd-anybaseocr-block-segmentation", line 8, in <module>
    sys.exit(cli())
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py", line 391, in cli
    return ocrd_cli_wrap_processor(OcrdAnybaseocrBlockSegmenter, *args, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/ocrd/decorators.py", line 102, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/ocrd/processor/helpers.py", line 69, in run_processor
    processor.process()
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py", line 124, in process
    self._process_segment(page_image, page, page_xywh, page_id, input_file, n, mrcnn_model, class_names, mask_image)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py", line 349, in _process_segment
    region_img = ocrolib.array2pil(region_img)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/ocrolib/common.py", line 144, in array2pil
    return PIL.Image.frombytes("RGB",(a.shape[1],a.shape[0]),a.tostring())
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/PIL/Image.py", line 2650, in frombytes
    im.frombytes(data, decoder_name, args)
  File "/dwork/ocrd-schroot-ubuntu-eoan/usr/local/ocrd_all/venv/local/sub-venv/headless-tf22/lib/python3.6/site-packages/PIL/Image.py", line 797, in frombytes
    d.setimage(self.im)
ValueError: tile cannot extend outside image


Command exited with non-zero status 1

workflow:

. /usr/local/ocrd_all/venv/bin/activate
export TMPDIR=/dwork/tmp
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
ocrd-create-mets.xml
( /usr/bin/time ocrd process \
"olena-binarize -I OCR-D-IMG -O OCR-D-N1 -P impl wolf" \
"anybaseocr-crop -I OCR-D-N1 -O OCR-D-N2" \
"olena-binarize -I OCR-D-N2 -O OCR-D-N3 -P impl wolf" \
"cis-ocropy-denoise -I OCR-D-N3 -O OCR-D-N4 -P level-of-operation page" \
"cis-ocropy-deskew -I OCR-D-N4 -O OCR-D-N5 -P level-of-operation page" \
"anybaseocr-block-segmentation -I OCR-D-N5 -O OCR-D-N6 -P block_segmentation_weights /usr/local/ocrd_models/anybaseocr/block-segmentation/block_segmentation_weights.h5" \
"cis-ocropy-deskew -I OCR-D-N6 -O OCR-D-N7 -P level-of-operation region" \
"cis-ocropy-clip -I OCR-D-N7 -O OCR-D-N8 -P level-of-operation region" \
"cis-ocropy-segment -I OCR-D-N8 -O OCR-D-N9 -P level-of-operation region" \
"segment-repair -I OCR-D-N9 -O OCR-D-N10 -P sanitize true" \
"cis-ocropy-dewarp -I OCR-D-N10 -O OCR-D-N11" \
"calamari-recognize -I OCR-D-N11 -O OCR-D-OCR -P checkpoint /usr/local/ocrd_models/calamari/calamari_models-0.3/fraktur_19th_century/*.ckpt.json"

) >cmd.log 2>&1

ocrd-tool.json validation issues

ocrd ocrd-tool ocrd-tool.json validate
<report valid="false">
  <error>[tools.ocrd-anybaseocr-tiseg] 'input_file_grp' is a required property</error>
  <error>[tools.ocrd-anybaseocr-tiseg] 'output_file_grp' is a required property</error>
  <error>[tools.ocrd-anybaseocr-tiseg.categories.0] 'text non-text segment' is not one of ['Image preprocessing', 'Layout analysis', 'Text recognition and optimization', 'Model training', 'Long-term preservation', 'Quality assurance']</error>
  <error>[tools.ocrd-anybaseocr-tiseg.steps.0] 'text/non-text/segment' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>
  <error>[tools.ocrd-anybaseocr-textline] 'input_file_grp' is a required property</error>
  <error>[tools.ocrd-anybaseocr-textline] 'output_file_grp' is a required property</error>
  <error>[tools.ocrd-anybaseocr-textline.parameters.usegauss] Additional properties are not allowed ('action' was unexpected)</error>
  <error>[tools.ocrd-anybaseocr-textline.parameters.blackseps] Additional properties are not allowed ('action' was unexpected)</error>
  <error>[tools.ocrd-anybaseocr-textline.categories.0] 'text line segment' is not one of ['Image preprocessing', 'Layout analysis', 'Text recognition and optimization', 'Model training', 'Long-term preservation', 'Quality assurance']</error>
  <error>[tools.ocrd-anybaseocr-textline.steps.0] 'text/line/segment' is not one of ['preprocessing/characterization', 'preprocessing/optimization', 'preprocessing/optimization/cropping', 'preprocessing/optimization/deskewing', 'preprocessing/optimization/despeckling', 'preprocessing/optimization/dewarping', 'preprocessing/optimization/binarization', 'preprocessing/optimization/grayscale_normalization', 'recognition/text-recognition', 'recognition/font-identification', 'recognition/post-correction', 'layout/segmentation', 'layout/segmentation/region', 'layout/segmentation/line', 'layout/segmentation/word', 'layout/segmentation/classification', 'layout/analysis']</error>
</report>

Add tests for all processors

Currently, the invocation of make test only runs a test for binarization. Please add tests for all other processors you are providing as well. Please also consider including make test in your continuous integration. Right now, it is rather trivial and does not fulfill its purpose.

Module "mrcnn.utils" missing

Please help! With current master, I receive:

$ ocrd-anybaseocr-block-segmentation -J
Using TensorFlow backend.
WARNING: Logging before flag parsing goes to stderr.
W0120 13:29:43.066344 140353417594688 deprecation.py:323] From /home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/tensorflow_core/python/compat/v2_compat.py:65: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Traceback (most recent call last):
  File "/home/kmw/Documents/Work/OCR-D/env/bin/ocrd-anybaseocr-block-segmentation", line 5, in <module>
    from ocrd_anybaseocr.cli.cli import ocrd_anybaseocr_block_segmentation
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 11, in <module>
    from ocrd_anybaseocr.cli.ocrd_anybaseocr_block_segmentation import OcrdAnybaseocrBlockSegmenter
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py", line 30, in <module>
    from ocrd_anybaseocr.mrcnn import model
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd_anybaseocr/mrcnn/model.py", line 29, in <module>
    from ocrd_anybaseocr.mrcnn import utils
ImportError: cannot import name 'utils'

How can I get utils?

Block segmentation produces strange results

OCR-D-SEG-BLOCK_0001.txt

I used this dataset: https://ocr-d-repo.scc.kit.edu/api/v1/dataresources/16568d42-57be-4367-a335-4687bc84953e/data/weigel_gnothi02_1618.ocrd.zip
And the following workflow:
ocrd-anybaseocr-binarize
ocrd-anybaseocr-deskew
ocrd-anybaseocr-crop
ocrd-anybaseocr-block-segmentation with model from https://cloud.dfki.de/owncloud/index.php/s/dgACCYzytxnb7Ey/download
For the first page, I got the attached result with a wrong reference.
The metadata entry is also missing.

Overwrite existing files in OCR-D-IMG-BIN

Binarization overwrites existing files in OCR-D-IMG-BIN without any warning.
There should at least be a warning in the documentation. Is there any way to define a
different output group for the images?

No module named 'ocrd_anybaseocr.pix2pixhd'

(d2) home@home-lnx:~/programs/ocrd_anybaseocr$ ocrd-anybaseocr-dewarp -m ./data/mets.xml -I OCR-D-PAGE-CROP -O OCR-D-PAGE-DEWARP -p ./dewarp/latest_net_G.pth
Traceback (most recent call last):
  File "/home/home/anaconda3/envs/d2/bin/ocrd-anybaseocr-dewarp", line 5, in <module>
    from ocrd_anybaseocr.cli.ocrd_anybaseocr_dewarp import cli
  File "/home/home/anaconda3/envs/d2/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 29, in <module>
    from ..pix2pixhd.options.test_options import TestOptions
ModuleNotFoundError: No module named 'ocrd_anybaseocr.pix2pixhd'

The commands I used to install ocrd_anybaseocr:

git clone https://github.com/OCR-D/ocrd_anybaseocr.git
git submodule update --init
make install

Using other than the last AlternativeImage

In our BlockSegmentation code we use the raw input rather than the processed AlternativeImages. We achieved that by using "feature_filter" and filtering out all other processing steps. This works quite nicely; however, after adding the resulting text_regions to the page file like this:

        <pc:TextRegion type="paragraph">
            <pc:AlternativeImage filename="OCR-D-IMG-BLOCK-SEGMENT/OCR-D-IMG-BLOCK-SEGMENT_0001_0.png" comments=",blksegmented"/>
            <pc:Coords points="277,0 989,0 989,2022 277,2022"/>
        </pc:TextRegion>

we can't use them in any of the following processes. We get the following error:

16:42:50.204 WARNING ocrd_utils - crop coordinates ((1953, -222, 2454, 1690)) exceed image (1845x2324)

The problem seems to be that any subsequent process assumes the coordinates added by BlockSegmentation must be transformed according to every previous step (e.g. cropping), even though the comments do not mention any previous computation for this region.
@kba Is it also correct that, even though the mentioned AlternativeImages exist as files in the workspace, the processor prefers to recalculate the regions from the image?

For now, we will change it so that BlockSegmentation uses the latest AlternativeImage rather than the raw image as input.
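For illustration, converting region coordinates from a cropped-image frame back to absolute page coordinates only needs the crop offset; a minimal sketch (function name and signature are mine, not the ocrd_utils API):

```python
def to_absolute_coords(points, offset_x, offset_y):
    """Shift (x, y) points from a cropped-image frame back into the
    original page frame by adding the crop's top-left offset."""
    return [(x + offset_x, y + offset_y) for (x, y) in points]

# Example: a region found at (100, 50) on an image cropped at (277, 0)
# lies at (377, 50) on the original page.
```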

Problem with torch

ocrd-anybaseocr-crop --help
Traceback (most recent call last):
  File "/data/monorepo/venv3.6/bin/ocrd-anybaseocr-crop", line 11, in <module>
    load_entry_point('ocrd-anybaseocr', 'console_scripts', 'ocrd-anybaseocr-crop')()
  File "/data/monorepo/venv3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/data/monorepo/venv3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2852, in load_entry_point
    return ep.load()
  File "/data/monorepo/venv3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2443, in load
    return self.resolve()
  File "/data/monorepo/venv3.6/lib/python3.6/site-packages/pkg_resources/__init__.py", line 2449, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "/data/monorepo/LAYoutERkennung/ocrd_anybaseocr/cli/cli.py", line 7, in <module>
    from ocrd_anybaseocr.cli.ocrd_anybaseocr_dewarp import OcrdAnybaseocrDewarper
  File "/data/monorepo/LAYoutERkennung/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 1, in <module>
    import torch
  File "/data/monorepo/venv3.6/lib/python3.6/site-packages/torch/__init__.py", line 81, in <module>
    from torch._C import *
ImportError: dlopen: cannot load any more object with static TLS

Have you encountered this, and do you have any idea how to fix it?

$ cat /etc/*release
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian

ocrd-anybaseocr-binarize does not handle multiple output groups correctly

ocrd-anybaseocr-binarize -m weigel/data/mets.xml -I OCR-D-IMG -O OCR-D-IMG-BIN-IMG,OCR-D-IMG-BIN -w weigel/data
creates the following files:
weigel/data/OCR-D-IMG-BIN:
OCR-D-IMG-BIN_0001.png OCR-D-IMG-BIN_0002.png OCR-D-IMG-BIN_0003.png OCR-D-IMG-BIN_0004.png

weigel/data/OCR-D-IMG-BIN-IMG,OCR-D-IMG-BIN:
OCR-D-IMG-BIN-IMG,OCR-D-IMG-BIN_0001.xml OCR-D-IMG-BIN-IMG,OCR-D-IMG-BIN_0002.xml OCR-D-IMG-BIN-IMG,OCR-D-IMG-BIN_0003.xml OCR-D-IMG-BIN-IMG,OCR-D-IMG-BIN_0004.xml

Block segmentation almost always produces empty pages

I am running the following workflow on https://digital.slub-dresden.de/werkansicht/dlf/87237/1/ (with https://digital.slub-dresden.de/data/kitodo/adrefudio_20253082Z_1907/adrefudio_20253082Z_1907_mets.xml):

  1. Cropping (ocrd-anybaseocr-crop)
  2. Binarization (ocrd-anybaseocr-binarize)
  3. Segmentation (ocrd-anybaseocr-block-segmentation)

For most pages, the block segmentation finds only a few blocks, and very often none at all. The blocks which are found do not correspond to a comprehensible segmentation; often it is only the page number or some non-block. Consider for example
FILE_0039_BIN-IMG

The only “block” which is found by the block segmentation is:
FILE_0039_DFKIBS-IMG,DFKIBS-IMG-IMG_0

I would be very grateful if you could give me some hints on how to improve this result. Maybe you could even try to process this book in your own environment to make sure that nothing is amiss with my setup.

Cropping Output

With 44247ab we added the AlternativeImage functionality to Cropping. Since we expect all modules to be used in a pipeline (see README.md), we were wondering whether simply storing the border coordinates is correct. If we use Deskewing prior to Cropping and add a deskewed image to the list of AlternativeImages, shouldn't we apply the inverse rotation to the found border coordinates, so that they apply to the original image rather than the deskewed image?
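For illustration, the inverse rotation we have in mind could look like this; it is only a sketch, and the center of rotation and angle convention are assumptions, not taken from the Cropping code:

```python
import math

def rotate_point(x, y, angle_deg, cx, cy):
    """Rotate (x, y) by angle_deg (counter-clockwise) around (cx, cy)."""
    a = math.radians(angle_deg)
    dx, dy = x - cx, y - cy
    return (cx + dx * math.cos(a) - dy * math.sin(a),
            cy + dx * math.sin(a) + dy * math.cos(a))

def border_to_original(points, skew_angle_deg, cx, cy):
    """Map border points found on the deskewed image back onto the
    original image by applying the inverse rotation (-skew_angle)."""
    return [rotate_point(x, y, -skew_angle_deg, cx, cy) for (x, y) in points]
```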

ocrd-anybaseocr-deskew fails on multiple output groups

The user guide states:

Note: For processors using multiple input-, or output groups you have to use a comma separated list.

E.g.:

ocrd-anybaseocr-crop -I OCR-D-IMG -O OCR-D-BIN,OCR-D-IMG-BIN

Deskew fails when specifying multiple output groups, because it does not split the specified output groups at the comma (",").

root@d9e81adad635:/data/ocrd_workspace# ocrd-anybaseocr-deskew -m mets.xml -I OCR-D-IMG-BIN -O OCR-D-DESKEW-PAGE,OCR-D-DESKEW
Using TensorFlow backend.
09:23:03.986 INFO ocrd.workspace_validator - input_file_grp=['OCR-D-IMG-BIN'] output_file_grp=['OCR-D-DESKEW-PAGE', 'OCR-D-DESKEW']
09:23:09.585 INFO OcrdAnybaseocrDeskewer - INPUT FILE 0 / P_00001-PRE0001-NSJUSTIZ-BAND3-Teil_1-3_1_Vorwort_Einleitung_Inhalt-20190129T160242r
09:23:09.646 INFO OcrdAnybaseocrDeskewer - Estimating Skew Angle
09:23:15.556 INFO OcrdAnybaseocrDeskewer - Estimating Thresholds
09:23:19.107 INFO OcrdAnybaseocrDeskewer - Rescaling
09:23:19.483 INFO ocrd.workspace - created file ID: OCR-D-DESKEW_0001, file_grp: OCR-D-DESKEW, path: OCR-D-DESKEW/OCR-D-DESKEW_0001.png
Traceback (most recent call last):
  File "/usr/bin/ocrd-anybaseocr-deskew", line 8, in <module>
    sys.exit(ocrd_anybaseocr_deskew())
  File "/usr/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 22, in ocrd_anybaseocr_deskew
    return ocrd_cli_wrap_processor(OcrdAnybaseocrDeskewer, *args, **kwargs)    
  File "/usr/lib/python3.6/site-packages/ocrd/decorators.py", line 54, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd/processor/base.py", line 57, in run_processor
    processor.process()
  File "/usr/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_deskew.py", line 137, in process
    content=to_xml(pcgts).encode('utf-8')
  File "/usr/lib/python3.6/site-packages/ocrd/workspace.py", line 192, in add_file
    ret = self.mets.add_file(file_grp, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd_models/ocrd_mets.py", line 224, in add_file
    el_fileGrp = self.add_file_group(fileGrp)
  File "/usr/lib/python3.6/site-packages/ocrd_models/ocrd_mets.py", line 174, in add_file_group
    raise Exception('fileGrp must not contain commas')
Exception: fileGrp must not contain commas

Deskewing runs successfully when specifying a single output group. However, it complains about a missing output file group for images.

root@d9e81adad635:/data/ocrd_workspace# ocrd-anybaseocr-deskew -m mets.xml -I OCR-D-IMG-BIN -O OCR-D-DESKEW
Using TensorFlow backend.
09:27:39.620 INFO ocrd.workspace_validator - input_file_grp=['OCR-D-IMG-BIN'] output_file_grp=['OCR-D-DESKEW']
09:27:39.624 INFO OcrdAnybaseocrDeskewer - No output file group for images specified, falling back to 'OCR-D-IMG-DESKEW'
09:27:45.275 INFO OcrdAnybaseocrDeskewer - INPUT FILE 0 / P_00001-PRE0001-NSJUSTIZ-BAND3-Teil_1-3_1_Vorwort_Einleitung_Inhalt-20190129T160242r
09:27:45.338 INFO OcrdAnybaseocrDeskewer - Estimating Skew Angle
09:27:51.250 INFO OcrdAnybaseocrDeskewer - Estimating Thresholds
09:27:54.802 INFO OcrdAnybaseocrDeskewer - Rescaling
09:27:55.177 INFO ocrd.workspace - created file ID: OCR-D-IMG-DESKEW_0001, file_grp: OCR-D-IMG-DESKEW, path: OCR-D-IMG-DESKEW/OCR-D-IMG-DESKEW_0001.png

I'm running inside:

docker run --gpus all --rm -it -v "${PWD}/Data/:/data" ocrd/all:maximum
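The expected behavior is presumably to split the -O argument at commas before adding any fileGrp; a minimal sketch (the helper name is mine):

```python
def parse_file_grps(arg):
    """Split a comma-separated -I/-O argument into individual fileGrp
    names; a single fileGrp itself must not contain commas."""
    return [grp.strip() for grp in arg.split(",") if grp.strip()]
```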

Name of the project

Since the package is called ocrd-anybaseocr and the executables are prefixed with ocrd-anybaseocr, I propose renaming the repository to ocrd_anybaseocr, for consistency and to accommodate non-German speakers.

parameter "force" does not work

Hi,
when using the parameter "force" (this way:
"force": true
)
I would expect that existing data in the METS will be overwritten.
But I get an error like this:
"Output fileGrp[@use='OCR-D-DEWARP'] already in METS!"
Could you please check this?
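For reference, these are the semantics I would expect from force, as a sketch (names and structure are mine, not the actual implementation):

```python
def ensure_output_grp(existing_grps, out_grp, force=False):
    """Raise unless the output fileGrp is new or force is set; with
    force=True an existing group is reused (i.e. overwritten)."""
    if out_grp in existing_grps and not force:
        raise Exception(
            "Output fileGrp[@USE='%s'] already in METS!" % out_grp)
    return out_grp
```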

CUDA out of memory / cannot disable CUDA

On a CUDA-enabled system with more than 3GB of GPU memory currently free, I get this from dewarp:

INFO OcrdAnybaseocrDewarper - INPUT FILE 105_02_abbr
CustomDatasetDataLoader
dataset [AlignedDataset] was created
lib/python3.6/site-packages/torchvision/transforms/transforms.py:188: UserWarning: The use of the transforms.Scale transform is deprecated, please use transforms.Resize instead.
  "please use transforms.Resize instead.")
pix2pixHD/models/pix2pixHD_model.py:128: UserWarning: volatile was removed and now has no effect. Use `with torch.no_grad():` instead.
  input_label = Variable(input_label, volatile=infer)
Traceback (most recent call last):
  File "bin/ocrd-anybaseocr-dewarp", line 8, in <module>
    sys.exit(ocrd_anybaseocr_dewarp())
  File "lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 32, in ocrd_anybaseocr_dewarp
    return ocrd_cli_wrap_processor(OcrdAnybaseocrDewarper, *args, **kwargs)
  File "lib/python3.6/site-packages/ocrd/decorators.py", line 82, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "lib/python3.6/site-packages/ocrd/processor/base.py", line 60, in run_processor
    processor.process()
  File "lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 130, in process
    self._process_segment(model, dataset, page, page_xywh, page_id, input_file, orig_img_size, n)
  File "lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_dewarp.py", line 164, in _process_segment
    generated = model.inference(data['label'], data['inst'], data['image'])
  File "pix2pixHD/models/pix2pixHD_model.py", line 216, in inference
    fake_image = self.netG.forward(input_concat)
  File "pix2pixHD/models/networks.py", line 211, in forward
    return self.model(input)             
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "pix2pixHD/models/networks.py", line 252, in forward
    out = x + self.conv_block(x)
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "lib/python3.6/site-packages/torch/nn/modules/padding.py", line 163, in forward
    return F.pad(input, self.padding, 'reflect')
  File "lib/python3.6/site-packages/torch/nn/functional.py", line 2865, in pad
    return torch._C._nn.reflection_pad2d(input, pad)
RuntimeError: CUDA out of memory. Tried to allocate 18.00 MiB (GPU 0; 3.93 GiB total capacity; 2.37 GiB already allocated; 18.94 MiB free; 35.58 MiB cached)

Frankly, this does not make any sense to me.

However, I thought I should at least be able to disable GPU computation. The only parameter that can influence the PyTorch setup in dewarp is gpu_id, which would need to be set to 'cpu'. But the tool JSON requires this to be a number!

    raise Exception("Invalid parameters %s" % report.errors)
Exception: Invalid parameters ["[gpu_id] 'cpu' is not of type 'number'"]
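As a sketch of what I would expect instead: gpu_id could keep its numeric type if a negative value were interpreted as a CPU request (the -1 convention here is my assumption, not current behavior):

```python
def resolve_device(gpu_id):
    """Map a numeric gpu_id to a torch-style device string,
    treating any negative value as a request for CPU-only mode."""
    if gpu_id < 0:
        return "cpu"
    return "cuda:%d" % gpu_id
```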

tiseg: improve documentation

The name tiseg and its description in the README and tool JSON suggest that this is a segmentation processor. But it does not add regions with coordinates; it only adds a page image with the images suppressed.

This should be documented more clearly. Also, consider using the image feature "clipped" for your AlternativeImage.

cropping: `colSeparator` keeps growing

Currently, ocrd-anybaseocr-crop overwrites the colSeparator parameter it is passed by multiplying it each time it processes an input file, as can be seen here.

I don't know if that actually does any harm, but I think it's not correct. And it creates annotations like this after 100 or so files:

<pc:MetadataItem type="processingStep" name="preprocessing/optimization/cropping" value="ocrd-anybaseocr-crop">
  <pc:Labels>
    <pc:Label value="1304301559452385791083860667921945049512670652676283396137822601561251067109877615124435106720588263854021939843988814105709049577629659822293447714861872273775414540198337591612107674459312338931170881599710958847590103069139491522272792245308087798808372170911863890515942476730728360803222081593683116980145571758080000000000000000000000000000000" type="colSeparator"/>
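A sketch of the fix I would expect: derive the per-page value into a local variable and leave the configured parameter untouched (names are mine, and the actual scaling formula is assumed):

```python
def scaled_col_separator(col_separator, page_width):
    """Derive the per-page separator width from the *configured*
    parameter without mutating it, so repeated calls stay stable."""
    return col_separator * page_width  # local result, parameter untouched

params = {"colSeparator": 0.04}
for width in (1000, 2000, 1500):
    sep = scaled_col_separator(params["colSeparator"], width)
# params["colSeparator"] is still 0.04 after any number of pages
```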

ocrd-anybaseocr-crop fails if expected EXIF information in PNG image is missing

I tried an OCR-D workflow and failed:

ocrd-anybaseocr-crop -I OCR-D-IMG-PNG -O OCR-D-SEG-PAGE-anyocr -p OCR-D-SEG-PAGE-anyocr.json 2>&1 | tee OCR-D-SEG-PAGE-anyocr.log && touch -c OCR-D-SEG-PAGE-anyocr || { rm -fr OCR-D-SEG-PAGE-anyocr.json OCR-D-SEG-PAGE-anyocr; exit 1; }
22:45:58.672 INFO matplotlib.font_manager - generated new fontManager
Using TensorFlow backend.
22:46:02.518 INFO OcrdAnybaseocrCropper - No output file group for images specified, falling back to 'OCR-D-IMG-CROP'
22:46:02.530 INFO OcrdAnybaseocrCropper - INPUT FILE 0 / PHYS_0001
OUTPUT FILE  OCR-D-SEG-PAGE-anyocr
Traceback (most recent call last):
  File "/venv/bin/ocrd-anybaseocr-crop", line 8, in <module>
    sys.exit(ocrd_anybaseocr_cropping())
  File "/venv/lib/python3.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/venv/lib/python3.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/venv/lib/python3.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/venv/lib/python3.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/venv/lib/python3.7/site-packages/ocrd_anybaseocr/cli/cli.py", line 27, in ocrd_anybaseocr_cropping
    return ocrd_cli_wrap_processor(OcrdAnybaseocrCropper, *args, **kwargs)
  File "/venv/lib/python3.7/site-packages/ocrd/decorators.py", line 54, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/venv/lib/python3.7/site-packages/ocrd/processor/base.py", line 56, in run_processor
    processor.process()
  File "/venv/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_cropping.py", line 434, in process
    pcgts = page_from_file(self.workspace.download_file(input_file))
  File "/venv/lib/python3.7/site-packages/ocrd_modelfactory/__init__.py", line 75, in page_from_file
    return page_from_image(input_file)
  File "/venv/lib/python3.7/site-packages/ocrd_modelfactory/__init__.py", line 47, in page_from_image
    exif = exif_from_filename(input_file.local_filename)
  File "/venv/lib/python3.7/site-packages/ocrd_modelfactory/__init__.py", line 32, in exif_from_filename
    return OcrdExif(Image.open(image_filename))
  File "/venv/lib/python3.7/site-packages/ocrd_models/ocrd_exif.py", line 42, in __init__
    self.resolutionUnit = 'cm' if img.tag.get(296) == 3 else 'inches'
AttributeError: 'PngImageFile' object has no attribute 'tag'
make: *** [Makefile:298: OCR-D-SEG-PAGE-anyocr] Error 1
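A defensive sketch of the failing check: img.tag only exists for TIFF files in Pillow, so PNGs need a fallback (the fallback to 'inches' mirrors the else branch of the existing code; the helper name is mine):

```python
RESOLUTION_UNIT_CM = 3  # value of TIFF tag 296 meaning "centimeter"

def resolution_unit(img):
    """Return 'cm' or 'inches' for an image object; PNG files have no
    TIFF tag directory, so fall back to 'inches' instead of crashing."""
    tag = getattr(img, "tag", None)  # only TIFF images expose .tag
    if tag is not None and tag.get(296) == RESOLUTION_UNIT_CM:
        return "cm"
    return "inches"
```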

torchvision: Update if used

torchvision is listed as a requirement. Versions < 0.5.0 won't work with Pillow 7.0.0. I noticed you're depending on it but didn't see it in use.

CLIs do not work anymore

This is what I get on the current version, after make install, when running ocrd-anybaseocr-crop --help:

Traceback (most recent call last):
  File "env3/bin/ocrd-anybaseocr-crop", line 6, in <module>
    from ocrd_anybaseocr.cli.cli import ocrd_anybaseocr_cropping
  File "env3/lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 4, in <module>
    from ocrd_anybaseocr.cli.ocrd_anybaseocr_binarize import OcrdAnybaseocrBinarizer
  File "env3/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_binarize.py", line 49, in <module>
    from ..utils import print_info, print_error
ModuleNotFoundError: No module named 'ocrd_anybaseocr.utils'

`ocrd-anybaseocr-block-segmentation --help` fails

Command output:

% ocrd-anybaseocr-block-segmentation --help
Using TensorFlow backend.
17:59:25.728 WARNING tensorflow - From /OCR-D/venv-20200509/lib/python3.7/site-packages/tensorflow/python/compat/v2_compat.py:96: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
Traceback (most recent call last):
  File "/OCR-D/venv-20200509/bin/ocrd-anybaseocr-block-segmentation", line 8, in <module>
    sys.exit(ocrd_anybaseocr_block_segmentation())
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/ocrd_anybaseocr/cli/cli.py", line 54, in ocrd_anybaseocr_block_segmentation
    return ocrd_cli_wrap_processor(OcrdAnybaseocrBlockSegmenter, *args, **kwargs)
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/ocrd/decorators.py", line 35, in ocrd_cli_wrap_processor
    processorClass(workspace=None, show_help=True)
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py", line 71, in __init__
    super(OcrdAnybaseocrBlockSegmenter, self).__init__(*args, **kwargs) 
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/ocrd/processor/base.py", line 185, in __init__
    self.show_help()
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/ocrd/processor/base.py", line 207, in show_help
    print(generate_processor_help(self.ocrd_tool))
  File "/OCR-D/venv-20200509/lib/python3.7/site-packages/ocrd/processor/base.py", line 121, in generate_processor_help
    param['description'],
KeyError: 'description'

ocrd-anybaseocr-tiseg not applying default wiring

The --help of ocrd-anybaseocr-tiseg states a default wiring of ['OCR-D-IMG-CROP'] -> ['OCR-D-SEG-TISEG'].

root@38fa7aad0b43:/data/ocrd_workspace# ocrd-anybaseocr-tiseg --help
Using TensorFlow backend.

Usage: ocrd-anybaseocr-tiseg [OPTIONS]
  
  separate text and non-text part with anyBaseOCR

Options:
  -V, --version                   Show version
  -l, --log-level [OFF|ERROR|WARN|INFO|DEBUG|TRACE]
                                  Log level
  -J, --dump-json                 Dump tool description as JSON and exit
  -p, --parameter TEXT            Parameters, either JSON string or path to
                                  JSON file
  -g, --page-id TEXT              ID(s) of the pages to process
  -O, --output-file-grp TEXT      File group(s) used as output.
  -I, --input-file-grp TEXT       File group(s) used as input.
  -w, --working-dir TEXT          Working Directory
  -m, --mets TEXT                 METS to process
  -h, --help                      This help message

Parameters:
  "operation_level" [string - page] PAGE XML hierarchy level to operate
      on. Possible values: ["page", "region", "line"]

Default Wiring:
  ['OCR-D-IMG-CROP'] -> ['OCR-D-SEG-TISEG']

The workspace contains a file group named OCR-D-IMG-CROP, a corresponding folder exists.

root@38fa7aad0b43:/data/ocrd_workspace# ls -1
OCR-D-BINPAGE
OCR-D-CROP
OCR-D-DESKEW
OCR-D-IMG
OCR-D-IMG-BIN
OCR-D-IMG-CROP
OCR-D-IMG-DESKEW
mets.xml

I would expect that running ocrd-anybaseocr-tiseg without any arguments would default to using OCR-D-IMG-CROP as input and OCR-D-SEG-TISEG as output. However, the program fails with the following error, because it is using the non-existent INPUT as input and OUTPUT as output file group.

root@38fa7aad0b43:/data/ocrd_workspace# ocrd-anybaseocr-tiseg -m mets.xml 
Using TensorFlow backend.
09:22:34.382 INFO ocrd.workspace_validator - input_file_grp=['INPUT'] output_file_grp=['OUTPUT']
Traceback (most recent call last):
  File "/usr/bin/ocrd-anybaseocr-tiseg", line 8, in <module>
    sys.exit(ocrd_anybaseocr_tiseg())
  File "/usr/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 37, in ocrd_anybaseocr_tiseg
    return ocrd_cli_wrap_processor(OcrdAnybaseocrTiseg, *args, **kwargs)
  File "/usr/lib/python3.6/site-packages/ocrd/decorators.py", line 53, in ocrd_cli_wrap_processor
    raise Exception("Invalid input/output file grps:\n\t%s" % '\n\t'.join(report.errors))
Exception: Invalid input/output file grps:
        Input fileGrp[@USE='INPUT'] not in METS!

From what I can tell, this is due to class OcrdAnybaseocrTiseg(Processor) not overriding input_file_grp and output_file_grp in __init__, along the lines of:

kwargs['input_file_grp'] = 'OCR-D-IMG-CROP'
kwargs['output_file_grp'] = 'OCR-D-SEG-TISEG'
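As a sketch, such defaults could be applied before the base class validates the file groups; the key names are assumed to match the Processor kwargs, and setdefault ensures explicit -I/-O arguments still win:

```python
def apply_default_wiring(kwargs,
                         default_in="OCR-D-IMG-CROP",
                         default_out="OCR-D-SEG-TISEG"):
    """Fill in the advertised default fileGrps only when the caller
    did not pass any, so explicit -I/-O options take precedence."""
    kwargs.setdefault("input_file_grp", default_in)
    kwargs.setdefault("output_file_grp", default_out)
    return kwargs
```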

"AttributeError" when running block segmentation

Please help! I receive an AttributeError when trying to run the block segmentation with the recently provided models:

$ ocrd-anybaseocr-block-segmentation -I ORIGINAL -O DFKIBS,DFKIBS-IMG -m mets.xml -p '{"block_segmentation_model" : "/home/kmw/Documents/Work/OCR-D/models/DFKI", "block_segmentation_weights" : "/home/kmw/Documents/Work/OCR-D/models/DFKI/block_segmentation_weights.h5"}'
2020-01-22 12:02:41.627680: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer.so.6
2020-01-22 12:02:41.629185: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libnvinfer_plugin.so.6
Using TensorFlow backend.
WARNING: Logging before flag parsing goes to stderr.
W0122 12:02:42.263696 139929988376384 deprecation.py:323] From /home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/tensorflow_core/python/compat/v2_compat.py:88: disable_resource_variables (from tensorflow.python.ops.variable_scope) is deprecated and will be removed in a future version.
Instructions for updating:
non-resource variables are not supported in the long term
W0122 12:02:42.310657 139929988376384 deprecation.py:323] From /home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py:68: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
Traceback (most recent call last):
  File "/home/kmw/Documents/Work/OCR-D/env/bin/ocrd-anybaseocr-block-segmentation", line 8, in <module>
    sys.exit(ocrd_anybaseocr_block_segmentation())
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd_anybaseocr/cli/cli.py", line 54, in ocrd_anybaseocr_block_segmentation
    return ocrd_cli_wrap_processor(OcrdAnybaseocrBlockSegmenter, *args, **kwargs)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd/decorators.py", line 54, in ocrd_cli_wrap_processor
    run_processor(processorClass, ocrd_tool, mets, workspace=workspace, **kwargs)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd/processor/base.py", line 56, in run_processor
    processor.process()
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd_anybaseocr/cli/ocrd_anybaseocr_block_segmentation.py", line 93, in process
    mrcnn_model = model.MaskRCNN(mode="inference", model_dir=str(model_path), config=config)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd_anybaseocr/mrcnn/model.py", line 1841, in __init__
    self.keras_model = self.build(mode=mode, config=config)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/ocrd_anybaseocr/mrcnn/model.py", line 1860, in build
    shape=[None, None, config.IMAGE_SHAPE[2]], name="input_image")
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/keras/engine/input_layer.py", line 178, in Input
    input_tensor=tensor)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/keras/engine/input_layer.py", line 87, in __init__
    name=self.name)
  File "/home/kmw/Documents/Work/OCR-D/env/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 541, in placeholder
    x = tf.placeholder(dtype, shape=shape, name=name)
AttributeError: module 'tensorflow' has no attribute 'placeholder'
free(): invalid pointer
Aborted (core dumped)

anybaseocr-deskew wrong coordinates

When deskewing with ocrd-anybaseocr-deskew, the coordinates from the AlternativeImage produced in this process seem to be used for all further steps. In the end, the coordinates of all regions, lines, words etc. are wrong, even though the text is fully detected.
kit3.zip
