OpenOCR makes it simple to host your own OCR REST API.
The heavy lifting of the OCR work is handled by Tesseract OCR.
Docker is used to containerize the various components of the service.
See the Installing Docker on Ubuntu instructions.
Once Docker is installed, find the IP address of your host:
$ ifconfig
eth0 Link encap:Ethernet HWaddr 08:00:27:43:40:c7
inet addr:10.0.2.15 Bcast:10.0.2.255 Mask:255.255.255.0
...
The IP address 10.0.2.15 will be used as the DOCKER_HOST environment variable below.
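If you would rather not copy the address by hand, the lookup can be scripted. The following is a minimal sketch, not part of OpenOCR: `extract_inet_addr` is a hypothetical helper, and it assumes the older `ifconfig` output format shown above, where the address appears as `inet addr:<ip>`. Newer versions of `ifconfig` print `inet <ip>` instead and would need a different pattern.

```shell
# Hypothetical helper (not part of OpenOCR): read ifconfig output on stdin
# and print the first IPv4 address found in an "inet addr:<ip>" field.
# Assumes the legacy ifconfig format shown in this README.
extract_inet_addr() {
  sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p' | head -n 1
}

# Possible usage, assuming the interface is named eth0 as in the output above:
#   export DOCKER_HOST=$(ifconfig eth0 | extract_inet_addr)
```

Passing the interface name to `ifconfig` avoids picking up the loopback address when several interfaces are listed.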
Here's how to launch the docker images needed for OpenOCR.
$ curl -O https://raw.githubusercontent.com/tleyden/open-ocr/master/launcher/launcher.sh
$ export DOCKER_HOST=10.0.2.15 RABBITMQ_PASS=supersecret2 HTTP_PORT=8080
$ chmod +x launcher.sh
$ ./launcher.sh
This will start three docker instances: RabbitMQ, the OpenOCR worker, and the OpenOCR HTTP API server.
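Before running the launcher, it can help to confirm that the three variables exported above are actually set in the current shell. This is a small sketch under that assumption; `check_launch_env` is a hypothetical helper, not something shipped with OpenOCR, and it checks only that the variables are non-empty, not that their values are valid.

```shell
# Hypothetical pre-flight check (not part of launcher.sh): report any of the
# variables used in this README that are unset or empty.
check_launch_env() {
  missing=0
  for name in DOCKER_HOST RABBITMQ_PASS HTTP_PORT; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage:
#   check_launch_env && ./launcher.sh
```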
You are now ready to decode images → text via your REST API.
Request
$ curl -X POST -H "Content-Type: application/json" -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract"}' http://$DOCKER_HOST:$HTTP_PORT/ocr
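The request above can also be assembled from small reusable pieces. The sketch below is illustrative only: `ocr_request_body` and `ocr_endpoint` are hypothetical helper names, the default engine of `tesseract` is a convenience chosen here, and no JSON escaping is done, so it assumes the image URL contains no double quotes or backslashes.

```shell
# Hypothetical helpers (not part of OpenOCR).

# Print the JSON body for an OCR request.
#   $1: image URL (assumed to contain no double quotes or backslashes)
#   $2: engine name, defaults to "tesseract"
ocr_request_body() {
  img_url=$1
  engine=${2:-tesseract}
  printf '{"img_url":"%s","engine":"%s"}' "$img_url" "$engine"
}

# Print the endpoint URL built from the variables exported earlier.
ocr_endpoint() {
  printf 'http://%s:%s/ocr' "$DOCKER_HOST" "$HTTP_PORT"
}

# Possible usage:
#   curl -X POST -H "Content-Type: application/json" \
#     -d "$(ocr_request_body http://bit.ly/ocrimage)" "$(ocr_endpoint)"
```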
Response
It will return the decoded text for the test image:
< HTTP/1.1 200 OK
< Date: Tue, 13 May 2014 16:18:50 GMT
< Content-Length: 283
< Content-Type: text/plain; charset=utf-8
<
You can create local variables for the pipelines within the template by
prefixing the variable name with a “$" sign. Variable names have to be
composed of alphanumeric characters and the underscore. In the example
below I have used a few variations that work for variable names.
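When scripting against the API, it is often useful to separate the HTTP status from the decoded text. One way to do this, sketched below, relies on curl's `-w '\n%{http_code}'` option, which appends the status code on its own line after the body. The helpers `response_status` and `response_body` are hypothetical names introduced for this example.

```shell
# Hypothetical helpers (not part of OpenOCR). Both take the combined output
# of a curl call made with: -s -w '\n%{http_code}'

# Print the status code, which is the last line of the combined output.
response_status() {
  printf '%s\n' "$1" | tail -n 1
}

# Print the body, which is everything except the last line.
response_body() {
  printf '%s\n' "$1" | sed '$d'
}

# Possible usage:
#   out=$(curl -s -w '\n%{http_code}' -X POST \
#     -H "Content-Type: application/json" \
#     -d '{"img_url":"http://bit.ly/ocrimage","engine":"tesseract"}' \
#     "http://$DOCKER_HOST:$HTTP_PORT/ocr")
#   [ "$(response_status "$out")" = "200" ] && response_body "$out"
```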
More coming soon.
OpenOCR is Open Source and available under the Apache 2 License.