
OCR-D/ocrd_all

Built on CircleCI. MIT licensed.

This repository controls the installation of all OCR-D modules from source (as git submodules).

It includes a Makefile for their installation into a virtual environment (venv) or Docker container.

(A venv is a local user directory with shell scripts that load/unload it into the current shell environment via PATH and PYTHONHOME.)
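For illustration, a venv can be created and entered manually like this (a minimal sketch using a throwaway directory; `make all` does the equivalent for `./venv`, and this assumes the `python3-venv` package is installed):

```shell
# Create a throwaway venv and verify that activation rewires PATH.
python3 -m venv /tmp/demo-venv
. /tmp/demo-venv/bin/activate
command -v python   # now resolves inside /tmp/demo-venv/bin
deactivate
```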

Note: Before installing ocrd_all, you may want to consult the OCR-D setup guide on the OCR-D website. This is especially recommended for non-IT users.

Prerequisites

Space

Make sure that there is enough free disk space. A full installation including executables from all modules needs around 22 GiB (mostly on the same filesystem as the ocrd_all checkout). The same goes for the maximum-cuda variant of the prebuilt Docker images (there on the filesystem hosting Docker, typically /var/lib/docker).

Also, during the build, an additional 5 GiB may be needed for temporary files, typically in the /tmp directory. To use a different path with more free space, set the TMPDIR variable when calling make:

TMPDIR=/path/to/my/tempdir make all

Locale

The (shell) environment must use a Unicode-based localization. (Otherwise Python code based on click will not work, i.e. most OCR-D CLIs.) This is true for most installations today, and can be verified with:

locale | fgrep .UTF-8

This should show several LC_* variables. Otherwise, either select another localization globally...

sudo dpkg-reconfigure locales

... or use the Unicode-based POSIX locale temporarily:

export LC_ALL=C.UTF-8
export LANG=C.UTF-8

System packages

  • Install git, GNU make and GNU parallel.

      # on Debian / Ubuntu:
      sudo apt install make git parallel
    
  • Install wget or curl if you want to download Tesseract models.

      # on Debian / Ubuntu:
      sudo apt install wget
    
  • Install the packages for Python3 development and Python3 virtual environments for your operating system / distribution.

      # on Debian / Ubuntu:
      sudo apt install python3-dev python3-venv
    
  • Some modules require Tesseract. If your operating system / distribution already provides Tesseract 4.1 or newer, then just install its development package:

      # on Debian / Ubuntu:
      sudo apt install libtesseract-dev
    

Otherwise, recent Tesseract packages for Ubuntu are available via the alex-p PPA.

    If no Tesseract is installed, a recent version will be downloaded and built as part of the ocrd_tesserocr module rules.

  • Other modules will have additional system dependencies.

Note: System dependencies for all modules on Ubuntu 20.04 (or similar) can also be installed automatically by running:

    # on Debian / Ubuntu:
    make modules
    sudo apt install make
    sudo make deps-ubuntu

(And you can define the scope of all modules by setting the OCRD_MODULES variable as described below. If unsure, consider doing a dry-run first, by using make -n.)

GPU support

Many executables can optionally utilize an Nvidia GPU for much faster computation, if one is available.

For that, as a further prerequisite you need an installation of the CUDA Toolkit and additional optimised libraries such as cuDNN for your system.

The CUDA version currently supported is 11.8 (but others may work as well).

Note: CUDA toolkit and libraries (in a development version with CUDA compiler) can also be installed automatically by running:

    make ocrd
    sudo make deps-cuda

This will deploy Micromamba non-intrusively (without system packages or Conda environments), but also share some of the CUDA libraries installed as Python packages system-wide via ld.so.conf rules. (If unsure, consider doing a dry-run first, by using make -n.)

Usage

Run make with optional parameters for variables and targets like so:

make [PYTHON=python3] [VIRTUAL_ENV=./venv] [OCRD_MODULES="..."] [TARGET...]

Targets

deps-ubuntu

Install system packages for all modules. (Depends on modules.)

See system package prerequisites above.

deps-cuda

Install CUDA toolkit and libraries. (Depends on ocrd.)

See (optional) GPU support prerequisites above.

modules

Checkout/update all modules, but do not install anything.

all

Install executables from all modules into the venv. (Depends on modules and ocrd.)

ocrd

Install only the core module and its CLI ocrd into the venv.

docker

(Re-)build a Docker image for all modules/executables. (Depends on modules.)

dockers

(Re-)build Docker images for some pre-selected subsets of modules/executables. (Depends on modules.)

(These are the very same variants published as prebuilt images on Docker Hub, cf. CI configuration.)

Note: The image will contain all refs and branches of all checked out modules, which may not actually be needed. If you are planning on building and distributing Docker images with minimal size, consider using GIT_DEPTH=--single-branch before modules, or running make tidy later on.

clean

Remove the venv and the modules' build directories.

show

Print the venv directory, the module directories, and the executable names – as configured by the current variables.

check

Verify that all executables are runnable and the venv is consistent.

help (default goal)

Print available targets and variables.


Further targets:

[any module name]

Download/update that module, but do not install anything.

[any executable name]

Install that CLI into the venv. (Depends on that module and on ocrd.)

Variables

OCRD_MODULES

Override the list of git submodules to include. Targets affected by this include:

  • deps-ubuntu (reducing the list of system packages to install)
  • modules (reducing the list of modules to checkout/update)
  • all (reducing the list of executables to install)
  • docker (reducing the list of executables and modules to install)
  • show (reducing the list of OCRD_MODULES and of OCRD_EXECUTABLES to print)

NO_UPDATE

If set to 1, then when installing executables, does not attempt to git submodule update any currently checked out modules. (Useful for development, when testing a different module version prior to a commit.)

PYTHON

Name of the Python binary to use (at least python3 required).

If set to just python, then for the target deps-ubuntu it is assumed that Python is already installed.

VIRTUAL_ENV

Directory prefix to use for local installation.

(This is set automatically when activating a virtual environment on the shell. The build system will re-use the venv if one already exists here, or create one otherwise.)

TMPDIR

Override the default path (/tmp on Unix) where temporary files during build are stored.

PIP_OPTIONS

Add extra options to the pip install command like -q or -v or -e.

Note: The latter option will install Python modules in editable mode, i.e. any update to the source would directly affect the executables.

GIT_RECURSIVE

Set to --recursive to checkout/update all modules recursively. (This usually installs additional tests and models.)

Examples

To build the latest Tesseract locally, run this command first:

# Get code, build and install Tesseract with the default English model.
make install-tesseract
make ocrd-tesserocr-recognize

Optionally install additional Tesseract models.

# Download models from tessdata_fast into the venv's tessdata directory.
ocrd resmgr download ocrd-tesserocr-recognize frk.traineddata
ocrd resmgr download ocrd-tesserocr-recognize Latin.traineddata
ocrd resmgr download ocrd-tesserocr-recognize Fraktur.traineddata

Optionally install Tesseract training tools.

make install-tesseract-training

Running make ocrd or just make downloads/updates and installs the core module, including the ocrd CLI in a virtual Python 3 environment under ./venv.

Running make ocrd-tesserocr-recognize downloads/updates the ocrd_tesserocr module and installs its CLIs, including ocrd-tesserocr-recognize in the venv.

Running make modules downloads/updates all modules.

Running make all additionally installs the executables from all modules.

Running make all OCRD_MODULES="core ocrd_tesserocr ocrd_cis" installs only the executables from these modules.

Results

To use the built executables, simply activate the virtual environment:

. ${VIRTUAL_ENV:-venv}/bin/activate
ocrd --help
ocrd-...

For the Docker image, run it with your data path mounted as a user, and the processor resources as named volume (for model persistency):

docker run -it -u $(id -u):$(id -g) -v $PWD:/data -v ocrd-models:/models ocrd/all
ocrd --help
ocrd-...

Persistent configuration

In order to make choices permanent, you can put your variable preferences (or any custom rules) into local.mk. This file is always included if it exists, so you don't have to type (and memorise) the settings on the command line or in the shell environment.

For example, its content could be:

# restrict everything to a subset of modules
OCRD_MODULES = core ocrd_im6convert ocrd_cis ocrd_tesserocr

# use a non-default path for the virtual environment
VIRTUAL_ENV = $(CURDIR)/.venv

# install in editable mode (i.e. referencing the git sources)
PIP_OPTIONS = -e

# use non-default temporary storage
TMPDIR = $(CURDIR)/.tmp

# avoid automatic submodule updates
NO_UPDATE = 1

Note: When local.mk exists, variables can still be overridden on the command line (i.e. make all OCRD_MODULES= will build all executables for all modules again), but not from the shell environment (i.e. OCRD_MODULES= make all will still use the value from local.mk).
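The precedence described in this note is standard make behavior and can be verified with a tiny stand-alone makefile (hypothetical file /tmp/prec.mk, not part of ocrd_all):

```shell
# A makefile assignment wins over the environment, but loses to the
# command line, matching the local.mk behavior described above.
printf 'OCRD_MODULES = from-makefile\nshow:\n\t@echo $(OCRD_MODULES)\n' > /tmp/prec.mk
OCRD_MODULES=from-env make -s -f /tmp/prec.mk show       # prints: from-makefile
make -s -f /tmp/prec.mk show OCRD_MODULES=from-cmdline   # prints: from-cmdline
```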

Docker Hub

Besides native installation, ocrd_all is also available as prebuilt Docker images from Docker Hub as ocrd/all, backed by CI/CD. You can choose from three tags, minimum, medium and maximum. These differ w.r.t. which modules are included, with maximum being the equivalent of doing make all with the default (unset) value for OCRD_MODULES.

To download the images on the command line:

docker pull ocrd/all:minimum
# or
docker pull ocrd/all:medium
# or
docker pull ocrd/all:maximum

In addition to these base variants, there are minimum-cuda, medium-cuda and maximum-cuda with GPU support. (These also need nvidia-docker runtime, which will add the docker --gpus option.)

The maximum-cuda variant will be aliased to latest as well.

These tags will be overwritten with every new release of ocrd_all (i.e. rolling release). (You can still differentiate and reference them by their sha256 digest if necessary.)

However, the maximum-cuda variant of each release will also be aliased to a permanent tag by ISO date, e.g. 2023-04-02.

Usage of the prebuilt Docker image is the same as if you had built the image yourself.

This table lists which tag contains which module:

Module                        minimum  medium  maximum
core                          ✓        ✓       ✓
ocrd_cis                      ✓        ✓       ✓
ocrd_fileformat               ✓        ✓       ✓
ocrd_olahd_client             ✓        ✓       ✓
ocrd_im6convert               ✓        ✓       ✓
ocrd_pagetopdf                ✓        ✓       ✓
ocrd_repair_inconsistencies   ✓        ✓       ✓
ocrd_tesserocr                ✓        ✓       ✓
ocrd_wrap                     ✓        ✓       ✓
workflow-configuration        ✓        ✓       ✓
cor-asv-ann                   -        ✓       ✓
dinglehopper                  -        ✓       ✓
docstruct                     -        ✓       ✓
format-converters             -        ✓       ✓
nmalign                       -        ✓       ✓
ocrd_calamari                 -        ✓       ✓
ocrd_keraslm                  -        ✓       ✓
ocrd_neat                     -        ✓       ✓
ocrd_olena                    -        ✓       ✓
ocrd_segment                  -        ✓       ✓
ocrd_anybaseocr               -        -       ✓
ocrd_detectron2               -        -       ✓
ocrd_doxa                     -        -       ✓
ocrd_kraken                   -        -       ✓
ocrd_froc                     -        -       ✓
sbb_binarization              -        -       ✓
cor-asv-fst                   -        -       -
ocrd_ocropy                   -        -       -
ocrd_pc_segmentation          -        -       -

Note: The following modules have been disabled by default and can only be enabled by explicitly setting OCRD_MODULES or DISABLED_MODULES:

  • cor-asv-fst (runtime issues)
  • ocrd_ocropy (better implementation in ocrd_cis available)
  • ocrd_pc_segmentation (dependency and quality issues)

Uninstall

If you have installed ocrd_all natively and wish to uninstall, first deactivate the virtual environment and remove the ocrd_all directory:

rm -rf ocrd_all

Next, remove all contents under ~/.parallel/semaphores:

rm -rf ~/.parallel/semaphores

Challenges

This repo offers solutions to the following problems with OCR-D integration.

No published/recent version on PyPI

Python modules which are not available on PyPI:

(Solved by installation from source.)

Conflicting requirements

Merging all packages into one venv does not always work. Modules may require mutually exclusive sets of dependent packages.

pip does not even stop or resolve conflicts – it merely warns!

  • Tensorflow:

    • version 2 (required by ocrd_calamari, ocrd_anybaseocr and eynollah)
    • version 1 (required by cor-asv-ann, ocrd_segment and ocrd_keraslm)

    The temporary solution is to require different package names:

    • tensorflow>=2
    • tensorflow-gpu==1.15.*

    Both cannot be installed in parallel in different versions, and usually also depend on different versions of CUDA toolkit.

  • OpenCV:

    • opencv-python-headless (required by core and others, avoids pulling in X11 libraries)
    • opencv-python (probably dragged in by third party packages)

    As long as we keep reinstalling the headless variant and no such package attempts GUI, we should be fine. Custom build (as needed for ARM) under the module opencv-python already creates the headless variant.

  • PyTorch:

    • torch<1.0
    • torch>=1.0
  • ...

(Solved by managing and delegating to different subsets of venvs.)
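This delegation mechanism can be sketched as follows (a simplified, hypothetical simulation using plain directories and a dummy executable; in the real installation the stubs activate actual sub-venvs such as local/sub-venv/headless-tf1):

```shell
# The top-level venv only contains a thin stub that puts the matching
# sub-venv first on PATH and delegates to the real executable there.
mkdir -p /tmp/venv/bin /tmp/venv/sub-venv/headless-tf1/bin

# the "real" executable lives in the sub-venv
cat > /tmp/venv/sub-venv/headless-tf1/bin/ocrd-dummy <<'EOF'
#!/bin/sh
echo "running from sub-venv"
EOF
chmod +x /tmp/venv/sub-venv/headless-tf1/bin/ocrd-dummy

# the stub in the main venv prepends the sub-venv bin and re-executes
cat > /tmp/venv/bin/ocrd-dummy <<'EOF'
#!/bin/sh
PATH=/tmp/venv/sub-venv/headless-tf1/bin:$PATH
exec ocrd-dummy "$@"
EOF
chmod +x /tmp/venv/bin/ocrd-dummy

/tmp/venv/bin/ocrd-dummy   # prints: running from sub-venv
```

This way, mutually exclusive dependency sets (e.g. TF1 vs. TF2) never share one site-packages directory, while all executables remain reachable from the main venv's bin.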

System requirements

Modules which do not advertise their system package requirements via make deps-ubuntu:

(Solved by maintaining these requirements under deps-ubuntu here.)

Contributing

Please see our contributing guide to learn how you can support the project.

Acknowledgments

This software uses GNU parallel. GNU Parallel is a general parallelizer to run multiple serial command line programs in parallel without changing them.

Reference

Tange, Ole. (2020). GNU Parallel 20200722 ('Privacy Shield'). Zenodo. https://doi.org/10.5281/zenodo.3956817

ocrd_all's People

Contributors

bertsky, cneud, kba, m3ssman, mikegerber, sb2020-eye, stweil, sulzbals, witiko


ocrd_all's Issues

Builds with missing TF create unusable "executables"

make all with Python 3.8 fails because of missing TF modules.

ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.15.* (from ocrd-cor-asv-ann==0.1.2) (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0)
ERROR: No matching distribution found for tensorflow-gpu==1.15.* (from ocrd-cor-asv-ann==0.1.2)
make[1]: *** [Makefile:213: /home/stweil/src/github/OCR-D/venv-20200813/local/sub-venv/headless-tf1/bin/ocrd-cor-asv-ann-evaluate] Error 1
make[1]: Leaving directory '/home/stweil/src/github/OCR-D/ocrd_all'
make: *** [Makefile:209: /home/stweil/src/github/OCR-D/venv-20200813/bin/ocrd-cor-asv-ann-evaluate] Error 2
(venv-20200813) stweil@bss11:~/src/github/OCR-D/ocrd_all$ ls -l $VIRTUAL_ENV/bin/ocrd-cor*
-rw-r--r-- 1 stweil stweil   209 Aug 17 19:59 /home/stweil/src/github/OCR-D/venv-20200813/bin/ocrd-cor-asv-ann-evaluate
-rw-r--r-- 1 stweil stweil   208 Aug 17 19:59 /home/stweil/src/github/OCR-D/venv-20200813/bin/ocrd-cor-asv-ann-process

Repeating make all sufficiently often (or running make all -k) "builds" all processors, and make claims that there remains nothing to do, although it only created non-executable stubs in venv/bin.

Related issues: #147, #150.

Parallel build terminates with unexpected message when wget and curl are missing

Running make all -j4 on a fresh clone of ocrd_all terminated with these build messages:

configure.ac:382: installing 'config/compile'
configure.ac:86: installing 'config/config.guess'
configure.ac:86: installing 'config/config.sub'
configure.ac:27: installing 'config/install-sh'
configure.ac:27: installing 'config/missing'
Makefile.am: installing 'config/depcomp'
parallel-tests: installing 'config/test-driver'

All done.
To build the software now, do something like:

$ ./configure [--enable-debug] [...other options]

The error occurred because neither curl nor wget was installed. The related error message was hidden somewhere in the middle of the build log:

Makefile:641: *** found no cmdline downloader (wget/curl).  Stop.
make: *** Waiting for unfinished jobs....
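The check that produced this message can be reproduced with a generic shell idiom (a hypothetical reimplementation for illustration, not the actual Makefile rule):

```shell
# Pick the first available command-line downloader, or fail loudly.
if command -v wget >/dev/null 2>&1; then
  echo "using wget"
elif command -v curl >/dev/null 2>&1; then
  echo "using curl"
else
  echo "found no cmdline downloader (wget/curl)" >&2
  exit 1
fi
```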

RFC: Debian/Ubuntu packaging of ocrd_all components and OCR models

Now that a solution to the conflicting dependency problem is imminent, we should discuss how we can reduce build times and simplify management of OCR models by supporting OS package management.

I see three areas where package management can improve ocrd_all:

  1. Providing packages for processors with full dependencies, e.g. with AppImage as @stweil proposed.
  2. Providing packages for compile-intensive modules, i.e. tesseract and olena
  3. Packaging models, like the GT4HistOCR-based ones, for tesseract, calamari, ocropy and kraken

Ad 1.: The only way this can work without creating system-wide dependency conflicts would basically be a repackaging of the maximum Docker image. This is also of interest, and AppImage is probably a good solution.

Ad 2.: Since the scope is limited (tesseract and olena), @mikegerber has already built debian/ubuntu packages for olena and @AlexanderP builds tesseract for Launchpad's PPA, this would be relatively straightforward

Ad 3.: For tesseract models we can take the official tesseract-ocr-* models as a blueprint. ocropy and kraken models can also be packaged relatively easily. For calamari models, we should probably agree on a convention for where and how models should be stored (ping @maxnth @andbue @chreul if you already have ideas/plans in that regard)

The model packaging in particular would be of benefit also outside the OCR-D "ecosphere".

My questions for the ocrd_all users/developers:

  1. Which of the three approaches are worth exploring in your opinion?
  2. Who has experience in Debian/Ubuntu packaging and can help with setting up the tooling necessary?
  3. How should we distribute the models? PPA seems like a straightforward choice but only supports Ubuntu (?) not Debian. Another proposal was https://packagecloud.io. Or could we build a repository as a GitHub pages static site or use GitHub releases as a pseudo-repository?

Feedback and pointers to solutions are very welcome.

Docker containers install only one python module

Pretty bizarre behavior; we have an error in our setup somewhere. Apparently only one Python module per Docker image is installed. This only concerns Python modules; projects like im6convert, olena and tesseract are installed fine.

In minimum, only ocrd_tesserocr is installed.

In medium, only cor-asv-ann is installed.

In maximum, only ocrd_kraken (of all modules) is installed.

Why is GIT_RECURSIVE disabled by default?

GIT_RECURSIVE = # --recursive
[...]
git submodule sync $(GIT_RECURSIVE) $@
if git submodule status $(GIT_RECURSIVE) $@ | grep -qv '^ '; then \
    git submodule update --init $(GIT_RECURSIVE) $@ && \
    touch $@; fi

This will fail for modules that themselves have submodules, such as ocrd_olena. Why is this not the default?

Document Docker Hub installation

Also I have this neat table for what is available in {minim,medi,maxim}um that we could add to the README:

Module                        minimum  medium  maximum
core                          ✓        ✓       ✓
ocrd_cis                      ✓        ✓       ✓
ocrd_im6convert               ✓        ✓       ✓
ocrd_repair_inconsistencies   ✓        ✓       ✓
ocrd_tesserocr                ✓        ✓       ✓
tesserocr                     ✓        ✓       ✓
workflow-configuration        ✓        ✓       ✓
cor-asv-ann                   -        ✓       ✓
dinglehopper                  -        ✓       ✓
format-converters             -        ✓       ✓
ocrd_calamari                 -        ✓       ✓
ocrd_keraslm                  -        ✓       ✓
ocrd_olena                    -        ✓       ✓
ocrd_segment                  -        ✓       ✓
tesseract                     -        ✓       ✓
ocrd_anybaseocr               -        -       ✓
ocrd_kraken                   -        -       ✓
ocrd_ocropy                   -        -       ✓
ocrd_pc_segmentation          -        -       ✓
ocrd_typegroups_classifier    -        -       ✓
sbb_textline_detector         -        -       ✓
cor-asv-fst                   -        -       ✓

Basic release management

As a user of OCR-D tools, the easiest approach is to run the tools in containerized form.

For a production environment, release builds of ocrd_all containers are a key requirement for stable workflows.

By now, the situation is as follows:
If an OCR-D container is pulled one week later on a different system, it must be possible to pull exactly the same version as on the first system. Without tagging the container images, this cannot be guaranteed, since they are implicitly tagged as latest.

Any common CI/CD platform has the ability to run post-commit actions, and so does CircleCI.

Add rules for training tools

For best accuracy of text recognition (the primary goal of OCR-D), additional training ("fine tuning") is needed. Therefore training tools should be part of an advanced OCR-D installation.

Building training tools requires additional dependencies and targets.

  • Tesseract: dependencies for training, make training-install
  • Calamari?
  • Kraken? It is currently not in the default set of tools which are built, but might be useful for training, for example to create line images from text.
  • Others?

Should we add rules? How to name the Makefile targets? Perhaps existing target names could be used with an added -training, like make all-training.

Tensorflow CPU vs GPU

  1. https://github.com/OCR-D/ocrd_all#conflicting-requirements states that ocrd_calamari would depend on tensorflow-gpu 1.14.x, but it has recently depended on 1.15.2.

  2. There is also still some solvable(!) problem/confusion about the different TensorFlow flavours. For tensorflow 1.15.*, one can simply depend on tensorflow-gpu == 1.15.* for CPU and GPU support. I am not aware of any issues using tensorflow-gpu's CPU fallback on CPU, I use it every day. (There was some source of additional confusion because TF changed their recommendation for 1.15 only.)

  3. I just recently discovered that one can depend on an approximate version, e.g. tensorflow-gpu ~= 1.15.2 or tensorflow == 1.15.*

TL;DR: My recommendation would be that our TF1 projects just use tensorflow-gpu == 1.15.* for CPU and GPU support and be done with this problem.

[RFC] AppImage for OCR-D

An AppImage for OCR-D might be an interesting alternative to Docker containers.

AppImages contain all code and data for one or several applications in a single file (technically a compressed filesystem embedded into a standalone Linux executable). They only work on Linux, but don't require a special distribution or pre-installed packages. Their only requirement is working FUSE (user space filesystem).

Building an AppImage with similar contents as the current Docker containers has some challenges, but should be possible.

`make clean` removes venv directory in source path unconditionally

Ideally it would only remove files created by the build process.

If the user has created the venv directory manually and added own content, removing that directory might be bad.

The Makefile could check whether the virtual environment was activated outside of make, and skip the removal in that case.

make CI config more robust to network

Currently CircleCI fails in the deploy job due to network connectivity problems during pip install. The Dockerfile already has --timeout=3000 among the options for pip. Still, we get:

Connection broken: ConnectionResetError(104, 'Connection reset by peer')

Should we increase the timeout? Or is this just a bad day?

Building tesserocr fails in parallel builds

tesserocr depends on an installed tesseract. This dependency is currently missing in the Makefile. A fresh parallel build tries to build tesserocr before the tesseract build is finished, and therefore fails:

tesserocr.cpp:658:10: fatal error: tesseract/publictypes.h: No such file or directory
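With a prerequisite in place, make serializes the two targets even under -j. A minimal stand-alone sketch of such a fix, with dummy recipes and a hypothetical file /tmp/dep.mk:

```shell
# Declaring tesseract as a prerequisite of tesserocr guarantees the
# build order even in parallel mode.
printf 'tesserocr: tesseract\n\t@echo build tesserocr\ntesseract:\n\t@echo build tesseract\n' > /tmp/dep.mk
make -s -f /tmp/dep.mk -j4 tesserocr
# prints:
#   build tesseract
#   build tesserocr
```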

Installation issues

With v2020-08-04, I encountered several installation issues after following the native install guide by cloning, running sudo make deps-ubuntu followed by make all:

  • sudo make deps-ubuntu creates .../ocrd_all/.git and ~/.parallel with root permissions, requiring the correct file ownership to be restored manually via chown -R user:user

  • When I then run parallel --citation followed by will cite and re-trigger the build via make all, it hangs immediately at the first line with sem --fg --id ocrd_all git submodule sync cor-asv-ann

  • After deactivating cor-asv-ann and continuing with make all, I discovered that all modules are empty (*** No rule to make target 'install'.) and need to be individually cloned via git submodule update --init, followed by make all again, to arrive at executable modules.

Drop support for ocrd_kraken?

It's currently de-facto unmaintained and depends on an older version of kraken which depends on clstm which is a PITA to build.

We could re-enable it if/once ocrd_kraken has been updated.

[Docker] anybaseocr wants Tensorflow 2.0

Distribution

"Id": "sha256:00ceb6e2c3cd28b7d79d779be5020f1df82966d793f9a92d5768d5abaa005310",
"RepoTags": [
    "ocrd/all:maximum"
],
"RepoDigests": ["ocrd/all@sha256:47c02733c490ff34640b44713a887c731eae07d4ec69107d3cd99031247a1d26"
],
"Created": "2020-06-18T13:45:21.648347181Z",

Platform

Ubuntu 18.04 LTS

Log

building OCR-D-SEG-PAGE-anyocr from OCR-D-BINPAGE-sauvola with pattern rule for ocrd-anybaseocr-crop
STAMP=`test -e OCR-D-SEG-PAGE-anyocr && date -Ins -r OCR-D-SEG-PAGE-anyocr`; ocrd-anybaseocr-crop   -I OCR-D-BINPAGE-sauvola -p OCR-D-SEG-PAGE-anyocr.json -O OCR-D-SEG-PAGE-anyocr --overwrite 2>&1 | tee OCR-D-SEG-PAGE-anyocr.log && touch -c OCR-D-SEG-PAGE-anyocr || { if test -z "$STAMP"; then rm -fr OCR-D-SEG-PAGE-anyocr; else touch -c -d "$STAMP" OCR-D-SEG-PAGE-anyocr; fi; false; }
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (tensorflow 1.15.3 (/usr/lib/python3.6/site-packages), Requirement.parse('tensorflow>=2.0'), {'ocrd-anybaseocr'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/ocrd-anybaseocr-crop", line 33, in <module>
    sys.exit(load_entry_point('ocrd-anybaseocr', 'console_scripts', 'ocrd-anybaseocr-crop')())
  File "/usr/bin/ocrd-anybaseocr-crop", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.6/site-packages/importlib_metadata/__init__.py", line 96, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/build/ocrd_anybaseocr/ocrd_anybaseocr/cli/cli.py", line 3, in <module>
    from ocrd.decorators import ocrd_cli_options, ocrd_cli_wrap_processor
  File "/build/core/ocrd/ocrd/__init__.py", line 17, in <module>
    from ocrd.processor.base import run_processor, run_cli, Processor
  File "/build/core/ocrd/ocrd/processor/__init__.py", line 1, in <module>
    from .base import (
  File "/build/core/ocrd/ocrd/processor/base.py", line 6, in <module>
    from ocrd_utils import getLogger, VERSION as OCRD_VERSION
  File "/build/core/ocrd_utils/ocrd_utils/__init__.py", line 135, in <module>
    from .logging import * # pylint: disable=wildcard-import
  File "/build/core/ocrd_utils/ocrd_utils/logging.py", line 22, in <module>
    from .constants import LOG_FORMAT, LOG_TIMEFMT
  File "/build/core/ocrd_utils/ocrd_utils/constants.py", line 4, in <module>
    from pkg_resources import get_distribution
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3260, in <module>
    @_call_aside
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3244, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3273, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 585, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 598, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'tensorflow>=2.0' distribution was not found and is required by ocrd-anybaseocr
Makefile:313: recipe for target 'OCR-D-SEG-PAGE-anyocr' failed
make[1]: *** [OCR-D-SEG-PAGE-anyocr] Error 1
make[1]: Leaving directory '/data/1000'
make: *** [1000] Error 2
Makefile:204: recipe for target '1000' failed
make: Leaving directory '/data'

real	1m36,408s

Upgrade pip to 20.2+

https://pip.pypa.io/en/stable/user_guide/#changes-to-the-pip-dependency-resolver-in-20-2-2020

The most significant changes to the resolver are:

It will reduce inconsistency: it will no longer install a combination of packages that is mutually inconsistent. In older versions of pip, it is possible for pip to install a package which does not satisfy the declared requirements of another installed package. For example, in pip 20.0, pip install "six<1.12" "virtualenv==20.0.2" does the wrong thing, “successfully” installing six==1.11, even though virtualenv==20.0.2 requires six>=1.12.0,<2 (defined here). The new resolver, instead, outright rejects installing anything if it gets that input.

It will be stricter - if you ask pip to install two packages with incompatible requirements, it will refuse (rather than installing a broken combination, like it did in previous versions).

So, if you have been using workarounds to force pip to deal with incompatible or inconsistent requirements combinations, now’s a good time to fix the underlying problem in the packages, because pip will be stricter from here on out.

We should explore that new dependency algorithm and consider upgrading.

Installation of models only works when tesseract is in module list

Test case:

$ make DISABLED_MODULES="cor-asv-fst opencv-python ocrd_kraken clstm ocrd_ocropy tesseract" install-models-ocropus
make: *** No rule to make target 'install-models-ocropus'.  Stop.

The rules for the install-models-* targets must be moved out of the ifneq ($(findstring tesseract, $(OCRD_MODULES)),) ... endif conditional.

leptonica dependency for tesseract - compile and/or deps-ubuntu?

I'm installing ocrd_all in a fresh WSL Ubuntu 18.04 instance. I notice that missing libleptonica-dev prevents tesseract compilation and is not installed with sudo make deps-ubuntu. Running sudo apt-get install libleptonica-dev remedies this for me.

tesseract is in OCRD_MODULES according to make show but CUSTOM_DEPS does not contain libleptonica-dev when running make deps-ubuntu.

Since it does work for the docker builds as it is supposed to, what am I doing wrong here?

why is Tesseract built with --disable-openmp

This goes back to the very first commit bringing Tesseract build rules by @stweil:

cd $(VIRTUAL_ENV)/build/tesseract && $(CURDIR)/tesseract/configure --disable-openmp --disable-shared --prefix="$(VIRTUAL_ENV)" CXXFLAGS="-g -O2 -fPIC"

Disabling OpenMP means losing implicit CPU parallelization, which can speed up single-job workflows significantly.

Native system packages are usually built with OpenMP enabled.

Can we please drop --disable-openmp?
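For comparison, the same configure call without the flag; anyone who still wants to avoid implicit parallelism can cap it at runtime via the standard OpenMP environment variable (e.g. OMP_THREAD_LIMIT=1, which Tesseract honors):

```makefile
cd $(VIRTUAL_ENV)/build/tesseract && $(CURDIR)/tesseract/configure --disable-shared --prefix="$(VIRTUAL_ENV)" CXXFLAGS="-g -O2 -fPIC"
```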

Let's not publish `latest` at all on DockerHub

I'd argue that it's easier to support users if they are forced to use an explicit tag of the Docker image, instead of falling back on latest, which is not built automatically anymore and is out of date.

help target

Two proposals:

  1. Make help the default target.
  2. Simplify help to a series of echo commands.

Regarding 1.: this is a convention I really like, because when it is not completely obvious what the default should be (and I argue it is not for this repo), it's better not to do anything by default and just print usage info.

Regarding 2.: evaluating a variable that cats a heredoc string seems convoluted. Also, why export HELP?
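Both proposals combined might look like this (a sketch; target names and descriptions are illustrative):

```makefile
# Print usage info when make is called without a target.
.DEFAULT_GOAL := help

help:
	@echo "Usage: make [target]"
	@echo ""
	@echo "Targets (selection):"
	@echo "  all      build and install all modules into the venv"
	@echo "  modules  check out / update all module submodules"
```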

Allow building with thin module Docker containers

In #68, @bertsky wrote:

But the real problem is that TF2 dependencies are lurking everywhere, so we will very soon have the unacceptable state that no catch-all venv (satisfying both TF1 and TF2 modules) is possible anymore. By then, a new solution needs to be in place, which (at least partially) isolates venvs from each other again.

Latest code fails to build all

There are several issues with the current git master:

  • make all tries to compile opencv-python although that is only needed for some architectures like ARM. Fix: PR #39.
  • make all fails to build cor-asv-fst.
  • make all -k does not continue with other modules after the failure with cor-asv-fst. Fix: PR #38.
  • make all with a restricted list of modules (no cor-asv-fst) does not build ocrd-tesserocr-*. Fix: PR #38.

offer more development options

I see the need for at least two more elaborate all-in-one tasks we could encapsulate as well:

  1. automatic tests (e.g. make tests): This would delegate to the modules' respective make test or whatever is available. We would sometimes have to install additional dependencies like make deps-test or clone more subrepos.
  2. download models (e.g. make models): This would delegate to the modules' respective model files or other data. In some cases this means cloning more subrepos, in others running web downloads.

Both would have to be encapsulated locally in each module section. When used in Dockerfiles at build time, these would produce larger development images.
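Proposal 1 might be sketched as follows (hypothetical top-level target name; modules without a test target would additionally need a guard or a stub):

```makefile
tests:
	for mod in $(OCRD_MODULES); do \
		$(MAKE) -C $$mod test || exit 1; \
	done
```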

Many OCR-D processors are not executable

After checkout of current master 18547e4 and make modules / make all my venv/bin directory looks like this:

(venv) jk@jk-XPS-13:~/Projekte/ocrd_all_2020$ ll venv/bin/ocrd*
-rwxrwxr-x 1 jk jk   237 Aug 14 20:03 venv/bin/ocrd*
-rw-rw-r-- 1 jk jk   196 Aug 15 11:48 venv/bin/ocrd-anybaseocr-binarize
-rw-rw-r-- 1 jk jk   206 Aug 15 11:48 venv/bin/ocrd-anybaseocr-block-segmentation
-rw-rw-r-- 1 jk jk   192 Aug 15 11:48 venv/bin/ocrd-anybaseocr-crop
-rw-rw-r-- 1 jk jk   194 Aug 15 11:48 venv/bin/ocrd-anybaseocr-deskew
-rw-rw-r-- 1 jk jk   194 Aug 15 11:48 venv/bin/ocrd-anybaseocr-dewarp
-rw-rw-r-- 1 jk jk   203 Aug 15 11:48 venv/bin/ocrd-anybaseocr-layout-analysis
-rw-rw-r-- 1 jk jk   196 Aug 15 11:48 venv/bin/ocrd-anybaseocr-textline
-rw-rw-r-- 1 jk jk   193 Aug 15 11:48 venv/bin/ocrd-anybaseocr-tiseg
-rw-rw-r-- 1 jk jk   193 Aug 14 20:19 venv/bin/ocrd-calamari-recognize
-rwxrwxr-x 1 jk jk   986 Aug 15 23:26 venv/bin/ocrd-cis-align*
-rwxrwxr-x 1 jk jk   984 Aug 15 23:26 venv/bin/ocrd-cis-data*
-rwxrwxr-x 1 jk jk  1006 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-binarize*
-rwxrwxr-x 1 jk jk   998 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-clip*
-rwxrwxr-x 1 jk jk  1004 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-denoise*
-rwxrwxr-x 1 jk jk  1002 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-deskew*
-rwxrwxr-x 1 jk jk  1002 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-dewarp*
-rwxrwxr-x 1 jk jk   996 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-rec*
-rwxrwxr-x 1 jk jk  1008 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-recognize*
-rwxrwxr-x 1 jk jk  1008 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-resegment*
-rwxrwxr-x 1 jk jk  1004 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-segment*
-rwxrwxr-x 1 jk jk  1000 Aug 15 23:26 venv/bin/ocrd-cis-ocropy-train*
-rwxrwxr-x 1 jk jk   998 Aug 15 23:26 venv/bin/ocrd-cis-postcorrect*
-rw-rw-r-- 1 jk jk   195 Aug 14 16:28 venv/bin/ocrd-cor-asv-ann-evaluate
-rw-rw-r-- 1 jk jk   194 Aug 14 16:28 venv/bin/ocrd-cor-asv-ann-process
-rwxrwxr-x 1 jk jk  1004 Aug 14 20:15 venv/bin/ocrd-dinglehopper*
-rwxrwxr-x 1 jk jk   267 Aug 14 20:03 venv/bin/ocrd-dummy*
-rwxrwxr-x 1 jk jk  6494 Aug 15 23:03 venv/bin/ocrd-export-larex*
-rwxrwxr-x 1 jk jk  2879 Aug 15 23:26 venv/bin/ocrd-fileformat-transform*
-rwxrwxr-x 1 jk jk  2976 Aug 14 19:51 venv/bin/ocrd-im6convert*
-rwxrwxr-x 1 jk jk  6285 Aug 15 23:03 venv/bin/ocrd-import*
-rw-rw-r-- 1 jk jk   187 Aug 14 17:25 venv/bin/ocrd-keraslm-rate
-rwxrwxr-x 1 jk jk  1690 Aug 15 23:03 venv/bin/ocrd-make*
-rwxrwxr-x 1 jk jk 17455 Aug 14 20:15 venv/bin/ocrd-olena-binarize*
-rwxrwxr-x 1 jk jk  5281 Aug 14 20:19 venv/bin/ocrd-pagetopdf*
-rw-rw-r-- 1 jk jk   192 Aug 15 02:43 venv/bin/ocrd-pc-segmentation
-rwxrwxr-x 1 jk jk  1003 Aug 14 20:00 venv/bin/ocrd-preprocess-image*
-rwxrwxr-x 1 jk jk  1069 Aug 15 23:03 venv/bin/ocrd-repair-inconsistencies*
-rw-rw-r-- 1 jk jk   196 Aug 15 12:22 venv/bin/ocrd-sbb-textline-detector
-rwxrwxr-x 1 jk jk   191 Aug 15 02:43 venv/bin/ocrd-segment-evaluate*
-rwxrwxr-x 1 jk jk   196 Aug 15 02:43 venv/bin/ocrd-segment-extract-lines*
-rwxrwxr-x 1 jk jk   196 Aug 15 02:43 venv/bin/ocrd-segment-extract-pages*
-rwxrwxr-x 1 jk jk   198 Aug 15 02:43 venv/bin/ocrd-segment-extract-regions*
-rwxrwxr-x 1 jk jk   192 Aug 15 02:43 venv/bin/ocrd-segment-from-coco*
-rwxrwxr-x 1 jk jk   193 Aug 15 02:43 venv/bin/ocrd-segment-from-masks*
-rwxrwxr-x 1 jk jk   189 Aug 15 02:43 venv/bin/ocrd-segment-repair*
-rwxrwxr-x 1 jk jk   199 Aug 15 02:43 venv/bin/ocrd-segment-replace-original*
-rwxrwxr-x 1 jk jk  1003 Aug 14 20:00 venv/bin/ocrd-skimage-binarize*
-rwxrwxr-x 1 jk jk  1001 Aug 14 20:00 venv/bin/ocrd-skimage-denoise*
-rwxrwxr-x 1 jk jk  1009 Aug 14 20:00 venv/bin/ocrd-skimage-denoise-raw*
-rwxrwxr-x 1 jk jk  1005 Aug 14 20:00 venv/bin/ocrd-skimage-normalize*
-rwxrwxr-x 1 jk jk  1022 Aug 14 20:18 venv/bin/ocrd-tesserocr-binarize*
-rwxrwxr-x 1 jk jk  1014 Aug 14 20:18 venv/bin/ocrd-tesserocr-crop*
-rwxrwxr-x 1 jk jk  1018 Aug 14 20:18 venv/bin/ocrd-tesserocr-deskew*
-rwxrwxr-x 1 jk jk  1024 Aug 14 20:18 venv/bin/ocrd-tesserocr-recognize*
-rwxrwxr-x 1 jk jk  1030 Aug 14 20:18 venv/bin/ocrd-tesserocr-segment-line*
-rwxrwxr-x 1 jk jk  1034 Aug 14 20:18 venv/bin/ocrd-tesserocr-segment-region*
-rwxrwxr-x 1 jk jk  1032 Aug 14 20:18 venv/bin/ocrd-tesserocr-segment-table*
-rwxrwxr-x 1 jk jk  1030 Aug 14 20:18 venv/bin/ocrd-tesserocr-segment-word*
-rw-rw-r-- 1 jk jk   204 Aug 15 12:21 venv/bin/ocrd-typegroups-classifier

I had several errors during make all, but as far as I remember these were all due to a bad internet connection.

check for system dependencies in a user-friendly way

Might be relevant to the spec itself: We now have deps-ubuntu as a means to encapsulate system dependencies of modules, but no way to check that these are met – not even for Ubuntu itself.

For Ubuntu itself, it would be easy to introduce an extra target check-deps-ubuntu which fails (with a good error message) when one of the packages (or even PPAs) is missing. Of course, that would also have to be implemented in all the modules which already have/need deps-ubuntu.

But I don't know of any OS-independent checks besides things like pkg-config. And then the (package) names of those dependencies might be different again, hence the need to lay hands on the individual modules' dependencies – which we wanted to encapsulate with deps-ubuntu (alone).

So for the weaker goal (Ubuntu only), I don't think it is worth the effort (as one could instead always install just in case). And for the stronger goal (OS-independent), I'm afraid we have to postpone that in the same way we postponed OCR-D/spec#131.
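That said, a weaker but OS-independent pre-flight check is possible at the executable level rather than the package level; a sketch (the command list is purely illustrative):

```shell
# Report which of the given commands are missing from PATH.
check_cmds() {
    missing=
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || missing="$missing $cmd"
    done
    echo "$missing"
}

check_cmds git make parallel wget
```

This cannot verify library-level dependencies (like libleptonica-dev above), but it gives a readable error message for the most common gaps before a long build starts.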

make all fails on clean minimal ubuntu eoan

Processing /home/jb/ocrd_all/clstm
Requirement already satisfied: wheel>=0.33 in /home/jb/ocrd_all/venv/lib/python3.7/site-packages (from clstm==0.1) (0.33.6)
Building wheels for collected packages: clstm
  Building wheel for clstm (setup.py) ... error
  ERROR: Command errored out with exit status 1:
...
  /bin/sh: 1: swig: not found
...

ok...

> apt-get install swig
> make all

then

...
    sh: 1: protoc: not found
...

now...

> apt install protobuf-compiler
> make all

then...

...
  Warning: Can't read registry to find the necessary compiler setting
...

make all fails to build on ubuntu 20.04 due to missing tensorflow-gpu version for Python 3.8

make all (after sudo make deps-ubuntu) using c3255b on Ubuntu 20.04/Python 3.8 fails to build due to missing tensorflow version 1.15.* - PyPI has 1.15.0 and 1.15.2 available though (edit: only for older Python versions)

Successfully built ocrd
Installing collected packages: ocrd
Successfully installed ocrd-2.5.3
make[1]: Leaving directory '/home/cnd/ocrd_all/core'
[...]
. /home/cnd/ocrd/bin/activate && cd cor-asv-ann && pip3 install  .
Processing /home/cnd/ocrd_all/cor-asv-ann
Requirement already satisfied: ocrd>=2.0 in /home/cnd/ocrd/lib/python3.8/site-packages (from ocrd-cor-asv-ann==0.1.2) (2.5.3)
Requirement already satisfied: click in /home/cnd/ocrd/lib/python3.8/site-packages (from ocrd-cor-asv-ann==0.1.2) (7.1.2)
Collecting keras==2.3.*
  Downloading Keras-2.3.1-py2.py3-none-any.whl (377 kB)
     |████████████████████████████████| 377 kB 443 kB/s 
Requirement already satisfied: numpy in /home/cnd/ocrd/lib/python3.8/site-packages (from ocrd-cor-asv-ann==0.1.2) (1.18.4)
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.15.* (from ocrd-cor-asv-ann==0.1.2) (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4)
ERROR: No matching distribution found for tensorflow-gpu==1.15.* (from ocrd-cor-asv-ann==0.1.2)
make: *** [Makefile:185: /home/cnd/ocrd/bin/ocrd-cor-asv-ann-evaluate] Error 1

Builds without TensorFlow

Prebuilt packages of the different versions of TF don't exist for every platform (for example, not for Python 3.8 and not for non-Intel hosts). A typical error message which terminates make all looks like this:

ERROR: No matching distribution found for tensorflow-gpu==1.15.* (from ocrd-keraslm==0.3.2)

I think it would be good to support builds on such platforms, too, by skipping problematic modules, without requiring hacker methods like make all -k. Possible solutions:

  • make all could detect whether required packages are available and skip modules which need unavailable packages.
  • make all could support a macro to disable groups of modules (like for example all modules which depend on TF1). It is already possible to define the desired modules, but that requires much knowledge about the different modules.

The first of these solutions could be really user-friendly, especially if it also showed an informative message.
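A sketch of the detection idea for the TF1 case (the Python 3.8 cut-off is the known fact from the issue above; the actual module skipping is left out):

```shell
# tensorflow-gpu 1.15 wheels only exist up to Python 3.7, so modules pinned
# to TF1 cannot be installed on Python 3.8+.
pyver=$(python3 -c 'import sys; print("%d%02d" % sys.version_info[:2])')
if [ "$pyver" -ge 308 ]; then
    echo "Python 3.8+ detected: TF1 modules would be skipped"
fi
```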

sem timing out

@mnoelte reports that make modules fails with timeouts from sem:

parallel: Warning: Semaphore stuck for 30 seconds. Consider using --semaphoretimeout.

I'm not able to reproduce the issue, open to any suggestions. Might be a permission issue.

However, since the semaphore file locks are only required when make is called with -j n and n > 1, a simple workaround would be to check whether more than a single job is to be run, and to remove the sem locks if n == 1.
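GNU parallel keeps its semaphore locks under ~/.parallel/semaphores, in directories named after the --id used (here ocrd_all_git; the id- prefix is my understanding of parallel's layout). For a single-job run, the workaround could be as simple as:

```shell
# Remove stale GNU parallel semaphore locks for the given --id.
sem_id=ocrd_all_git
rm -rf "$HOME/.parallel/semaphores/id-$sem_id"
```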

DockerHub deployment

We have DockerHub integration set up and it kinda works, but it's too inflexible (it can only build directly from a Dockerfile, so no make docker(s)), it takes forever (hours in the queue, ~90 minutes to build), and it is a PITA to debug.

Hence I think it would be best to use CircleCI workflows to build the images and push to DockerHub from there. Article on how to implement this exact use case: https://circleci.com/blog/using-circleci-workflows-to-replicate-docker-hub-automated-builds/

If nobody objects (e.g. with a better idea :D) I will try to set this up ASAP.

Document `parallel --citation`

Users should run parallel --will-cite or they will be flooded with spam about citing the paper that GNU parallel is based on.
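Besides passing --will-cite on every invocation, the notice can be silenced once per user: GNU parallel records the acknowledgement in a file and stays quiet afterwards.

```shell
# Equivalent to answering `parallel --citation` once interactively.
mkdir -p "$HOME/.parallel"
touch "$HOME/.parallel/will-cite"
```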

tessdata "fast" vs "best"

The current value for TESSDATA_URL is https://github.com/tesseract-ocr/tessdata_fast, in the context of OCR-D I propose to change the default to https://github.com/tesseract-ocr/tessdata_best instead.

add CI test with full realistic workflow on sample data

Most submodules are CI-tested individually. So when we make a new release here, as long as we don't negligently integrate failed versions, we can feel better with each update.

However, there are still two classes of errors which will evade this scheme:

  1. within-module regressions not covered by their automatic unit tests
  2. cross-module regressions (i.e. unmet implicit interdependencies between versions)

For the latter, we could reduce our risk by introducing workflow tests that run many different modules on a small set of sample data. Since the CI job for PRs is already set to use make docker-maximum, one could use any processor(s) after that. And we already have enough data in core/repo/assets/ and workflows in workflow-configuration/. So the test would run that workflow on (say) data/kant_aufklaerung_1784, check it did not crash, check the target file group exists for all pages, and perhaps validate the workspace.

(We don't include any models in the standard distribution though. So one would either have to use a very simple workflow, or need an extra step to install segmentation and recognition models.)

avoid race for git lockfile when syncing modules in parallel

There's a race condition for the .git/config.lock file between parallel jobs when they want to synchronize / update some submodules. It must have been around for some time already, and only surfaces when using make -j N (i.e. with some fixed number of jobs), but not make -j (i.e. with an unlimited number of jobs).

Originally posted by @bertsky in #121

git submodule logic changes remotes

I was wondering why the URLs of remotes of submodules were being changed until I realized the issue was with make modules.

To reproduce:

$ cd core
$ git remote add kba https://github.com/kba/ocrd-core
$ git checkout -t kba/logging-test
$ git remote -v
kba     https://github.com/kba/ocrd-core (fetch)
kba     https://github.com/kba/ocrd-core (push)
origin  https://github.com/OCR-D/core.git (fetch)
origin  https://github.com/OCR-D/core.git (push)

Now, if I run make modules and change back into core:

$ make modules OCRD_MODULES=core
sem --fg --id ocrd_all_git git submodule sync  core
Synchronizing submodule url for 'core'
if git submodule status  core | grep -qv '^ '; then \
        sem --fg --id ocrd_all_git git submodule update --init  core && \
        touch core; fi
Submodule path 'core': checked out '320a2fdeda6836bc4da522620bf169c22d471cbd'
$ cd core
$ git remote -v
kba     https://github.com/OCR-D/core.git (fetch)
kba     https://github.com/OCR-D/core.git (push)
origin  https://github.com/OCR-D/core.git (fetch)
origin  https://github.com/OCR-D/core.git (push)

I understand that the submodule update will change to the recorded submodule commit.

I do not understand how and why the remote URL is changed.

RFC: Add submodules and rules for model training?

I have now used a combination of OCR-D/ocrd_all and tesseract-ocr/tesstrain several times to train new models for Tesseract. Maybe this is also interesting for others, so integrating training support into OCR-D/ocrd_all could help. For my use case this would involve these steps:

  1. Add rule(s) to build and install the Tesseract training tools. This is rather simple. I suggest a new Makefile target install-tesseract-training.
  2. Add tesseract-ocr/tesstrain as a submodule of OCR-D/ocrd_all. The submodule could be fetched with make install-tesseract-training. Then everything is ready for training; only GT data is needed.
  3. Add a wrapper for tesstrain/Makefile and install it in $(BIN), similar to ocrd_make. Maybe @bertsky has a good idea how to do that.

Suggestions are welcome. And of course supporting training for other tools would also be nice. I am not sure how Doreenruirui/okralact fits into this RFC.
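Step 1 could delegate to Tesseract's own training and training-install targets (a sketch, assuming the out-of-tree build directory used by the existing Tesseract rules):

```makefile
install-tesseract-training:
	$(MAKE) -C $(VIRTUAL_ENV)/build/tesseract training
	$(MAKE) -C $(VIRTUAL_ENV)/build/tesseract training-install
```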

sub-venv logic ineffective when main venv was already active

When the user already has a virtual environment set up for herself and activated, ocrd_all picks it up (does not create a new one). But under this circumstance, creating sub-venvs does not work as expected: python -m venv DIR does not create new pip / pip3 entry points then, which causes subsequent pip calls to use the main venv again.

This does not happen when using virtualenv proper to create the sub-venv – which makes me wonder why we moved back from python3-virtualenv to python3-venv in the first place. From virtualenv's manual:

The venv module does not offer all features of this library, to name just a few more prominent:
...

  • is not upgrade-able via pip,

This also does not happen when the main venv is deactivated before the sub-venv is created, even with the venv module.

So which of these two options is preferable as a fix?
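The second option (deactivating first) can be sketched as follows; `deactivate` is a shell function that only exists while a venv is active, and the sub-venv path here is illustrative:

```shell
# If an outer venv is active, drop out of it first, so the sub-venv is
# created from the base interpreter and gets its own entry points.
if type deactivate >/dev/null 2>&1; then deactivate; fi

# Illustrative location; ocrd_all would use its own sub-venv paths.
subvenv=$(mktemp -d)/sub-venv
python3 -m venv --without-pip "$subvenv"   # --without-pip only to keep the sketch light
```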

[Docker] Tensorflow conflict

Distribution

ocrd/all:maximum-git
Id: sha256:f4a95c7ce9533446f91de9bd076911e2ebbad4e265fa97bc05e5b84f7ccaa2aa
Created: 2020-06-17T13:50:54.535025462Z

Platform

Ubuntu 18.04 LTS

Log

2020-06-18 10:21:01,534.534 INFO ocrd.workspace - Saving mets '/data/1000/mets.xml'
building OCR-D-SEG-PAGE-anyocr from OCR-D-BINPAGE-sauvola with pattern rule for ocrd-anybaseocr-crop
STAMP=`test -e OCR-D-SEG-PAGE-anyocr && date -Ins -r OCR-D-SEG-PAGE-anyocr`; ocrd-anybaseocr-crop   -I OCR-D-BINPAGE-sauvola -p OCR-D-SEG-PAGE-anyocr.json -O OCR-D-SEG-PAGE-anyocr --overwrite 2>&1 | tee OCR-D-SEG-PAGE-anyocr.log && touch -c OCR-D-SEG-PAGE-anyocr || { if test -z "$STAMP"; then rm -fr OCR-D-SEG-PAGE-anyocr; else touch -c -d "$STAMP" OCR-D-SEG-PAGE-anyocr; fi; false; }
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 583, in _build_master
    ws.require(__requires__)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 900, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 791, in resolve
    raise VersionConflict(dist, req).with_context(dependent_req)
pkg_resources.ContextualVersionConflict: (tensorflow 1.15.3 (/usr/lib/python3.6/site-packages), Requirement.parse('tensorflow>=2.0'), {'ocrd-anybaseocr'})

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/ocrd-anybaseocr-crop", line 33, in <module>
    sys.exit(load_entry_point('ocrd-anybaseocr', 'console_scripts', 'ocrd-anybaseocr-crop')())
  File "/usr/bin/ocrd-anybaseocr-crop", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/usr/lib/python3.6/site-packages/importlib_metadata/__init__.py", line 96, in load
    module = import_module(match.group('module'))
  File "/usr/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 994, in _gcd_import
  File "<frozen importlib._bootstrap>", line 971, in _find_and_load
  File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 678, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/build/ocrd_anybaseocr/ocrd_anybaseocr/cli/cli.py", line 3, in <module>
    from ocrd.decorators import ocrd_cli_options, ocrd_cli_wrap_processor
  File "/build/core/ocrd/ocrd/__init__.py", line 17, in <module>
    from ocrd.processor.base import run_processor, run_cli, Processor
  File "/build/core/ocrd/ocrd/processor/__init__.py", line 1, in <module>
    from .base import (
  File "/build/core/ocrd/ocrd/processor/base.py", line 6, in <module>
    from ocrd_utils import getLogger, VERSION as OCRD_VERSION
  File "/build/core/ocrd_utils/ocrd_utils/__init__.py", line 135, in <module>
    from .logging import * # pylint: disable=wildcard-import
  File "/build/core/ocrd_utils/ocrd_utils/logging.py", line 22, in <module>
    from .constants import LOG_FORMAT, LOG_TIMEFMT
  File "/build/core/ocrd_utils/ocrd_utils/constants.py", line 4, in <module>
    from pkg_resources import get_distribution
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3260, in <module>
    @_call_aside
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3244, in _call_aside
    f(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 3273, in _initialize_master_working_set
    working_set = WorkingSet._build_master()
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 585, in _build_master
    return cls._build_from_requirements(__requires__)
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 598, in _build_from_requirements
    dists = ws.resolve(reqs, Environment())
  File "/usr/lib/python3.6/site-packages/pkg_resources/__init__.py", line 786, in resolve
    raise DistributionNotFound(req, requirers)
pkg_resources.DistributionNotFound: The 'tensorflow>=2.0' distribution was not found and is required by ocrd-anybaseocr
Makefile:313: recipe for target 'OCR-D-SEG-PAGE-anyocr' failed
make[1]: *** [OCR-D-SEG-PAGE-anyocr] Error 1
make[1]: Leaving directory '/data/1000'
make: *** [1000] Error 2
Makefile:204: recipe for target '1000' failed
make: Leaving directory '/data'
