Comments (12)

bertsky commented on June 26, 2024

Thanks @jbarth-ubhd for the detailed report and analysis.

Simple reason: osd.traineddata is missing. It used to get installed; checking why it no longer is.

from ocrd_all.

bertsky commented on June 26, 2024

Got it!

TESSDATA = $(VIRTUAL_ENV)/share/tessdata/

… must now be $(VIRTUAL_ENV)/share/ocrd-resources/ocrd-tesserocr-recognize.

So we have a mismatch between the install-time location and the runtime/resmgr location.

bertsky commented on June 26, 2024

> must now be $(VIRTUAL_ENV)/share/ocrd-resources/ocrd-tesserocr-recognize.

No, that would not work either: we run configure with --prefix=$(VIRTUAL_ENV), so Tesseract is compiled to look in share/tessdata.

Rather, there was a superfluous environment variable override:

ENV TESSDATA_PREFIX $XDG_DATA_HOME/ocrd-resources/ocrd-tesserocr-recognize
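
To see which of the two locations discussed here actually holds osd.traineddata, a small check along the following lines may help. This is only a sketch: the helper name `find_traineddata` is hypothetical (it is not part of OCR-D or Tesseract), the two candidate directories are simply the paths named in this thread, and the caller is assumed to pass the installation prefix (for example the value of VIRTUAL_ENV).

```python
from pathlib import Path


def find_traineddata(prefix, model="osd"):
    """Return the candidate directories under `prefix` that contain the model file.

    The candidates mirror the two locations named in this thread;
    the helper itself is hypothetical and purely illustrative.
    """
    candidates = [
        Path(prefix, "share", "tessdata"),
        Path(prefix, "share", "ocrd-resources", "ocrd-tesserocr-recognize"),
    ]
    return [d for d in candidates if (d / f"{model}.traineddata").is_file()]
```

Calling `find_traineddata(os.environ["VIRTUAL_ENV"])` (after `import os`) would return an empty list if the model was installed to neither place, which is the symptom reported above.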

jbarth-ubhd commented on June 26, 2024

Just wanted to check ocrd resmgr list-available on my workstation (Ubuntu 20.04, Docker; docker pulled a lot of files for ocrd/all):

jb@pers16:~> alias docker_ocrd
alias docker_ocrd='sudo docker run --user $(id -u) --workdir /data --volume $PWD/data:/data --volume $PWD/models:/usr/local/share/ocrd-resources ocrd/all'

jb@pers16:~> docker_ocrd ocrd resmgr list-available
Traceback (most recent call last):
  File "/usr/local/bin/ocrd", line 33, in <module>
    sys.exit(load_entry_point('ocrd', 'console_scripts', 'ocrd')())
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/build/core/ocrd/ocrd/cli/resmgr.py", line 47, in list_available
    resmgr = OcrdResourceManager()
  File "/build/core/ocrd/ocrd/resource_manager.py", line 34, in __init__
    self.user_list.parent.mkdir(parents=True)
  File "/usr/lib/python3.6/pathlib.py", line 1248, in mkdir
    self._accessor.mkdir(self, mode)
  File "/usr/lib/python3.6/pathlib.py", line 387, in wrapped
    return strfunc(str(pathobj), *args)
PermissionError: [Errno 13] Permission denied: '/.config/ocrd'

jbarth-ubhd commented on June 26, 2024

ah... with --volume $PWD/.config:/.config it works

jb@pers16:~> sudo docker run --user $(id -u) --workdir /data --volume $PWD/data:/data --volume $PWD/models:/usr/local/share/ocrd-resources --volume $PWD/.config:/.config ocrd/all ocrd resmgr list-available
ocrd-tesserocr-recognize
- Fraktur_GT4HistOCR.traineddata  (https://ub-backup.bib.uni-mannheim.de/~stweil/ocrd-train/data/Fraktur_5000000/tessdata_fast/Fraktur_50000000.334_450937.traineddata)
  Tesseract LSTM model trained on GT4HistOCR
- ONB.traineddata  (https://ub-backup.bib.uni-mannheim.de/~stweil/ocrd-train/data/ONB/tessdata_best/ONB_1.195_300718_989100.traineddata)
  Tesseract LSTM model based on Austrian National Library newspaper data
- equ.traineddata  (https://github.com/tesseract-ocr/tessdata_fast/raw/main/equ.traineddata)
  Tesseract equ model
...
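
The PermissionError in the earlier traceback is consistent with the config directory being derived from the home directory, which is presumably / for a UID that has no passwd entry inside the container. The sketch below illustrates such a lookup under the assumption that it follows the XDG convention (XDG_CONFIG_HOME, falling back to $HOME/.config). It is not the actual OCR-D code, and the names `ocrd_config_dir` and `is_creatable` are hypothetical.

```python
import os
from pathlib import Path


def ocrd_config_dir(environ=None):
    """Guess the config directory the way an XDG-style lookup would (assumption)."""
    env = os.environ if environ is None else environ
    xdg = env.get("XDG_CONFIG_HOME")
    if xdg:
        base = Path(xdg)
    else:
        base = Path(env.get("HOME", "/"), ".config")
    return base / "ocrd"


def is_creatable(path):
    """Return True if the nearest existing ancestor of `path` is writable."""
    probe = Path(path)
    while not probe.exists():
        probe = probe.parent
    return os.access(probe, os.W_OK | os.X_OK)
```

With HOME=/ this yields /.config/ocrd, matching the path in the traceback, and `is_creatable` would report False for an unprivileged UID. Mounting a writable volume at /.config, as done above, makes the directory creatable.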

jbarth-ubhd commented on June 26, 2024

... almost

jb@pers16:~> docker_ocrd ocrd resmgr download ocrd-tesserocr-recognize configs
12:30:17.190 INFO ocrd.cli.resmgr - Downloading resource {'url': 'https://github.com/tesseract-ocr/tesseract/archive/main.tar.gz', 'name': 'configs', 'description': 'Tesseract configs (parameter sets) for use with the standalone tesseract CLI', 'size': 1915529, 'type': 'tarball', 'path_in_archive': 'tesseract-main/tessdata/configs', 'parameter_usage': 'as-is', 'version_range': '>= 0.0.1'}
12:30:17.193 INFO ocrd.resource_manager._download_impl - Downloading https://github.com/tesseract-ocr/tesseract/archive/main.tar.gz to download.tar.xx
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connection.py", line 175, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/usr/local/lib/python3.6/dist-packages/urllib3/util/connection.py", line 72, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
...

Could this be caused by my Ubuntu 20.04 setup, which uses dnsmasq in NetworkManager.conf?

root@pers16:/home/jb# cat /etc/NetworkManager/NetworkManager.conf
[main]
plugins=ifupdown,keyfile,ofono
dns=dnsmasq

no-auto-default=00:01:02:12:40:C5,00:21:9B:5E:BE:17,90:1B:0E:42:7D:AE,

[ifupdown]
managed=false

jbarth-ubhd commented on June 26, 2024

sudo docker run --dns A.B.C.D ... helped.
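
To tell a DNS problem inside the container apart from other download failures, a quick resolution check can be run with the container's Python. The following is a minimal sketch using only the standard library; `can_resolve` is a hypothetical helper name, and port 443 is an assumption based on the https URLs in the log above.

```python
import socket


def can_resolve(host, port=443):
    """Return True if `host` resolves, False on a name-resolution error."""
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False
    return True
```

If `can_resolve("github.com")` returns False inside the container but True on the host, the container's resolver configuration is the likely culprit, which fits the observation that passing --dns helped.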

jbarth-ubhd commented on June 26, 2024

BTW, there is no osd.traineddata in ~/models/ocrd-tesserocr-recognize/

bertsky commented on June 26, 2024

> ah... with --volume $PWD/.config:/.config it works

Yes, sorry, we forgot to document this at https://ocr-d.de/en/models#models-and-docker

Now tracked under OCR-D/ocrd-website#318

bertsky commented on June 26, 2024

> BTW no osd.traineddata in ~/models/ocrd-tesserocr-recognize/

As I said above (see the PR with the fix), TESSDATA_PREFIX must not be set at install time (make all or make install-tesseract).
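
For a native (non-Docker) build, one way to make sure the variable is absent at install time is to launch make with a scrubbed environment. This is a sketch under assumptions, not the fix from the PR: `env_without` and `run_install` are hypothetical helpers, and `run_install` assumes the current directory is an ocrd_all checkout.

```python
import os
import subprocess


def env_without(*names, environ=None):
    """Return a copy of the environment with the given variables removed."""
    env = dict(os.environ if environ is None else environ)
    for name in names:
        env.pop(name, None)
    return env


def run_install(target="install-tesseract"):
    """Run `make <target>` with TESSDATA_PREFIX unset (assumes an ocrd_all checkout)."""
    return subprocess.run(
        ["make", target],
        env=env_without("TESSDATA_PREFIX"),
        check=True,
    )
```

The same effect can be had in a shell by unsetting the variable before calling make; the Python form is only meant to make the intent explicit.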

bertsky commented on June 26, 2024

> sudo docker run --dns A.B.C.D ... helped.

I remember seeing this problem before. It also happens at build time (docker build). You can also try with --network=host or --network=bridge.

jbarth-ubhd commented on June 26, 2024

*sniff* ("schnief" in German)
