Sorry, I am new to Docker. I just pulled the latest and want to use the language chi_sim in

How to add languages for tesseract-ocr in the image?

ocrmypdf commented on May 18, 2024

How to add languages for tesseract-ocr in the image?

Comments (2)

jbarlow83 commented on May 18, 2024

I'll add more languages next time I update ocrmypdf.

The Dockerfile specifies how the image was built. The container provides its own copy of Tesseract and will not use the one on your machine, or anything else about your machine. It's like a lightweight virtual machine.

You can jump inside an ocrmypdf container, modify it, and save the changes as your own private image. (A container is an instance of an image.)

In your case it would go something like this (not tested, made up on the spot):

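$ docker run -t -i ocrmypdf /bin/bash
root@container:/# apt-get update
root@container:/# apt-get install -y tesseract-ocr-chi-sim
root@container:/# exit
$ docker ps -l -q    # prints the ID of the container you just exited
# <container-id> is the ID printed above; ocrmypdf-chi-sim is only an example name for the new image
$ docker commit -m "Added Chinese simplified" -a "Your Name" <container-id> ocrmypdf-chi-sim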

See here:
https://docs.docker.com/engine/userguide/dockerimages/
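
If you would rather not commit a hand-modified container, the same change can probably be expressed as a small derived image. This is only a sketch, not tested against the real image: it assumes the base image is tagged ocrmypdf locally, is Debian/Ubuntu based, and runs apt-get as root. The function names and the tag ocrmypdf-chi-sim are made up for illustration.

```shell
#!/bin/sh
# Sketch: build a derived image that adds one Tesseract language pack.
# Assumptions (not confirmed by the thread): the base image has apt-get
# and runs commands as root; all names below are hypothetical.

write_dockerfile() {
  # $1 = base image, $2 = apt package to add, $3 = output directory
  mkdir -p "$3"
  cat > "$3/Dockerfile" <<EOF
FROM $1
RUN apt-get update && apt-get install -y $2 && rm -rf /var/lib/apt/lists/*
EOF
}

build_language_image() {
  # $1 = base image, $2 = apt package to add, $3 = tag for the new image
  workdir=$(mktemp -d)
  write_dockerfile "$1" "$2" "$workdir"
  docker build -t "$3" "$workdir"
}
```

For example, build_language_image ocrmypdf tesseract-ocr-chi-sim ocrmypdf-chi-sim would build the new image, which you then run in place of the original. Unlike docker commit, the Dockerfile records exactly what was changed, so the image can be rebuilt after the base image is updated.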

jbarlow83 commented on May 18, 2024

I decided to produce a second version of the container which provides all of Tesseract's languages.

You can use this command to download it. Then Chinese (Simplified and Traditional) will be available.

docker pull jbarlow83/ocrmypdf-polyglot
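
To check that a language pack is really present after pulling, something like the following might work. It is a sketch under assumptions: it assumes the tesseract binary is on the PATH inside the image and that overriding the entrypoint is acceptable. The helper names are made up; tesseract --list-langs and docker run --entrypoint are standard options.

```shell
#!/bin/sh
# Sketch: list the Tesseract languages inside an image and test for one.
# Assumption: `tesseract` is on the PATH inside the image.

list_langs_in_image() {
  # $1 = image name. Bypasses the image's entrypoint to call tesseract.
  # Older Tesseract versions print the list on stderr, hence 2>&1.
  docker run --rm --entrypoint tesseract "$1" --list-langs 2>&1
}

has_lang() {
  # $1 = language code, e.g. chi_sim. Reads a language listing on stdin
  # and succeeds only if one line matches the code exactly.
  grep -qx "$1"
}
```

Usage would be along the lines of: list_langs_in_image jbarlow83/ocrmypdf-polyglot | has_lang chi_sim && echo "chi_sim is available"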
