sirfz / tesserocr
A Python wrapper for the tesseract-ocr API
License: MIT License
Hi, I want to install tesserocr with pip install -r requirements.txt
(actually while running tests with tox), but this scenario doesn't work. See:
$ virtualenv test
$ cd test/
$ source bin/activate
(test) $ echo Cython >> requirements.txt
(test) $ echo tesserocr >> requirements.txt
(test) $ pip install -r requirements.txt
I got this error:
Collecting Cython (from -r requirements.txt (line 1))
Using cached Cython-0.25.2-cp35-cp35m-manylinux1_x86_64.whl
Collecting tesserocr (from -r requirements.txt (line 2))
Using cached tesserocr-2.1.3.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-ag5vqu0y/tesserocr/setup.py", line 11, in <module>
from Cython.Distutils import build_ext
ImportError: No module named 'Cython'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-ag5vqu0y/tesserocr/
I'm a Python newbie, but I think this issue may be due to a missing setup_requires
or install_requires
in https://github.com/sirfz/tesserocr/blob/master/setup.py?
How can I add a whitelist for recognition? For example, if I know my image contains only digits, how can I limit the result to [0-9]?
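One way this is commonly handled is Tesseract's tessedit_char_whitelist variable, set through SetVariable before recognition. The following is a minimal sketch, not a confirmed answer from the maintainer; the helper names restrict_to_digits and read_digits are made up for illustration, and how strictly the whitelist is honored may depend on the Tesseract engine in use.

```python
def restrict_to_digits(api, allowed="0123456789"):
    """Limit recognition to the characters in `allowed`.

    `api` is expected to be a tesserocr.PyTessBaseAPI instance (or any
    object exposing the same SetVariable method).
    """
    # SetVariable returns False if the variable name lookup failed.
    return api.SetVariable("tessedit_char_whitelist", allowed)


def read_digits(path):
    # Imported lazily so the helper above can be used without tesserocr.
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI() as api:
        restrict_to_digits(api)
        api.SetImageFile(path)
        return api.GetUTF8Text()
```

Calling read_digits with the path of a digits-only image should then return text restricted to 0-9, assuming tesserocr and the language data are installed.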
$ pip install ./tesserocr --upgrade
Processing ./tesserocr
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/var/folders/1f/8hp6xg6j5wn8pzb94y56x4f00000gn/T/pip-kjLPF8-build/setup.py", line 6, in <module>
from Cython.Distutils import build_ext
ImportError: No module named Cython.Distutils
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /var/folders/1f/8hp6xg6j5wn8pzb94y56x4f00000gn/T/pip-kjLPF8-build/
git bisect
ff1c3c764269864c731c305a59e1d03a9a6ac821 is the first bad commit
commit ff1c3c764269864c731c305a59e1d03a9a6ac821
Author: FZ <[email protected]>
Date: Sat May 21 20:10:04 2016 +0300
setup now requires Cython and passes tesseract version as cython_compile_time_env
:100644 100644 3edde89bbaec3781147180d476778fb51ff15f6f 1153e1d0f3f30149cb0c98a419a4355e0b15e10a M setup.py
:000000 100644 0000000000000000000000000000000000000000 0ffc837bfff77266e17c0f48bb0fe52bb1733df7 A tesseractversion.pyx
:100644 000000 32375e79fcba5805a9654f036f32f0b224cdded6 0000000000000000000000000000000000000000 D tesserocr.cpp
Platform: Ubuntu 15.1 / Python 2.7. This is the demo I used, from https://github.com/sirfz/tesserocr:
import tesserocr
from PIL import Image
print tesserocr.tesseract_version() # print tesseract-ocr version
print tesserocr.get_languages() # prints tessdata path and list of available languages
image = Image.open('03.jpg') # I verified that the file and directory are correct
print tesserocr.image_to_text(image) # print ocr text from image
# or
print tesserocr.file_to_text('03.jpg')
The code above produces this output:
tesseract 3.04.00
leptonica-1.73
zlib 1.2.8
(u'/usr/share/tesseract-ocr/tessdata/', [u'eng', u'osd', u'equ'])
Traceback (most recent call last):
File "testImage.py", line 8, in <module>
print tesserocr.image_to_text(image) # print ocr text from image
File "tesserocr.pyx", line 2281, in tesserocr.image_to_text (tesserocr.cpp:20529)
RuntimeError: Failed to read picture
When I pass an image with letters and numbers to tesserocr, the result shows that numbers and uppercase letters are not recognized.
How can I fix this problem?
Compilation on a slightly older system (Ubuntu Precise, used on Travis CI) fails:
Supporting tesseract v3.04.01
Configs from pkg-config: {'libraries': ['lept', 'tesseract'], 'cython_compile_time_env': {'TESSERACT_VERSION': 197633}, 'library_dirs': ['/home/travis/build/WeblateOrg/weblate/.tesseract/lib'], 'include_dirs': ['/home/travis/build/WeblateOrg/weblate/.tesseract/include']}
running bdist_wheel
running build
running build_ext
building 'tesserocr' extension
creating build
creating build/temp.linux-x86_64-2.7
gcc -pthread -fno-strict-aliasing -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/travis/build/WeblateOrg/weblate/.tesseract/include -I/opt/python/2.7.9/include/python2.7 -c tesserocr.cpp -o build/temp.linux-x86_64-2.7/tesserocr.o -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
cc1plus: error: unrecognized command line option ‘-std=c++11’
error: command 'gcc' failed with exit status 1
The problem seems to be in setup.py line 130, which compares TESSERACT_VERSION (197633 at this point) with 4, while it should probably compare it with 0x40000.
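The arithmetic behind this can be checked quickly. The value 197633 reported in the log for v3.04.01 is consistent with packing the version one byte per component (0x030401). The sketch below is only an illustration: pack_version and needs_cpp11 are hypothetical names, and the assumption that the -std=c++11 flag is meant for Tesseract 4 and later is inferred from the report, not taken from setup.py.

```python
def pack_version(major, minor=0, patch=0):
    """Pack a version triple into one integer, one byte per component."""
    return (major << 16) | (minor << 8) | patch


def needs_cpp11(tesseract_version):
    # Compare against the packed form of 4.0.0, not the bare integer 4.
    return tesseract_version >= pack_version(4)


print(pack_version(3, 4, 1))   # 197633
print(hex(pack_version(4)))    # 0x40000
print(needs_cpp11(197633))     # False
print(197633 >= 4)             # True: the comparison that misfires
```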
Hello,
I'm iterating over RIL.BLOCK and want to get the text of each BLOCK.
In tesserocr.pyx line 390 the following is written:
>>> for e in iterate_level(api.AnalyseLayout(), RIL.WORD):
... word = e.GetUTF8Text()
Unfortunately this does not work. I get the following error:
AttributeError: 'tesserocr.PyPageIterator' object has no attribute 'GetUTF8Text'
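Judging by the error, AnalyseLayout() yields a layout-only PyPageIterator, which has no text to return. A minimal sketch of the usual alternative follows, assuming recognition is run first and the result iterator from GetIterator() is walked instead; block_texts and demo are hypothetical helper names, and iterate_level is passed in as an argument only to keep the helper free of the tesserocr import.

```python
def block_texts(api, level, iterate_level):
    """Return the recognized text of every element at `level`."""
    # Recognize() has to run before GetIterator() yields a result iterator.
    api.Recognize()
    result_iterator = api.GetIterator()
    return [item.GetUTF8Text(level)
            for item in iterate_level(result_iterator, level)]


def demo(path):
    # Lazy import: requires tesserocr and a working Tesseract install.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    with PyTessBaseAPI() as api:
        api.SetImageFile(path)
        return block_texts(api, RIL.BLOCK, iterate_level)
```

Note that GetUTF8Text on the result iterator takes the level as an argument, unlike the docstring line quoted above.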
I found some code on the forum and a blog that may help:
OSResults *orientationStruct = new OSResults();
bool gotOrientation = myTess->DetectOS(orientationStruct);
int bestOrientation = -1;
float bestOrientationScore = 0;
if ((gotOrientation) && (orientationStruct->orientations != NULL)) {
for (int i=0; i<4; i++) {
if (orientationStruct->orientations[i] > bestOrientationScore) {
bestOrientation = i;
bestOrientationScore = orientationStruct->orientations[i];
}
}
}
// This is the result we were asked for
results.textOrientation = bestOrientation;
#include <tesseract/baseapi.h>
#include <tesseract/osdetect.h>
#include <leptonica/allheaders.h>
int main(int argc, char **argv) {
OSResults os_results;
tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
if (api->Init(NULL, "eng")) {
fprintf(stderr, "Could not initialize tesseract.\n");
exit(1);
}
Pix *image = pixRead(filename);
api->SetImage(image);
// To detect correct OS and flip images
api->DetectOS(&os_results);
OrientationDetector os_detector = OrientationDetector(&os_results);
int correct_orientation = os_detector.get_orientation();
// Had to add this condition because get_orientation result and
// pixRotateOrth were not in sync.
if (correct_orientation == 1) {
image = pixRotate90(image, -1);
}
else if (correct_orientation == 3) {
image = pixRotate90(image, 1);
}
else if (correct_orientation == 2) {
pixRotate180(image, image);
}
api->SetImage(image);
char* ocrResult = api->GetUTF8Text();
fprintf(stdout, "Recognized Text: %s\n", ocrResult);
api->End();
pixDestroy(&image);
delete [] ocrResult;
return 0;
}
Never mind. Issue fixed by switching to tesseract4 branch
I tried to install tesserocr on Ubuntu and got the following error. I have already installed tesseract, and I do not know why it cannot be found.
Can someone help me out?
$tesseract -v
tesseract 3.03
leptonica-1.70
libgif 4.1.6(?) : libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : webp 0.4.0
$ CPPFLAGS=-I/usr/lib pip install tesserocr
Using cached tesserocr-2.1.2.tar.gz
Complete output from command python setup.py egg_info:
running egg_info
creating pip-egg-info/tesserocr.egg-info
writing pip-egg-info/tesserocr.egg-info/PKG-INFO
writing top-level names to pip-egg-info/tesserocr.egg-info/top_level.txt
writing dependency_links to pip-egg-info/tesserocr.egg-info/dependency_links.txt
writing manifest file 'pip-egg-info/tesserocr.egg-info/SOURCES.txt'
warning: manifest_maker: standard file '-c' not found
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-DEBtw3/tesserocr/setup.py", line 166, in <module>
test_suite='tests'
File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
dist.run_commands()
File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 195, in run
self.find_sources()
File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 222, in find_sources
mm.run()
File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 306, in run
self.add_defaults()
File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/egg_info.py", line 335, in add_defaults
sdist.add_defaults(self)
File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/command/sdist.py", line 160, in add_defaults
build_ext = self.get_finalized_command('build_ext')
File "/usr/lib/python2.7/distutils/cmd.py", line 311, in get_finalized_command
cmd_obj = self.distribution.get_command_obj(command, create)
File "/usr/lib/python2.7/distutils/dist.py", line 846, in get_command_obj
cmd_obj = self.command_obj[command] = klass(self)
File "/home/eijmmmp/BCReader/.virtEnv/local/lib/python2.7/site-packages/setuptools/__init__.py", line 137, in __init__
_Command.__init__(self, dist)
File "/usr/lib/python2.7/distutils/cmd.py", line 64, in __init__
self.initialize_options()
File "/tmp/pip-build-DEBtw3/tesserocr/setup.py", line 120, in initialize_options
build_args = package_config()
File "/tmp/pip-build-DEBtw3/tesserocr/setup.py", line 59, in package_config
raise Exception(error)
Exception: Package tesseract was not found in the pkg-config search path.
Perhaps you should add the directory containing `tesseract.pc'
to the PKG_CONFIG_PATH environment variable
No package 'tesseract' found
Is there a way to set OpenCV images directly, like in Python Tesseract? OpenCV always comes in handy for preprocessing, and it would be a waste of resources to save the image to a file and read it again.
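A possible approach, sketched under assumptions: OpenCV images are NumPy arrays, so their raw bytes can be handed over without a temporary file if the installed tesserocr exposes SetImageBytes(imagedata, width, height, bytes_per_pixel, bytes_per_line). That method may not exist in the releases discussed here, and set_image_from_array is a hypothetical helper name. Converting the array to a PIL image and calling SetImage is another route.

```python
def set_image_from_array(api, frame):
    """Hand an OpenCV-style uint8 array to Tesseract without a temp file.

    Assumes `frame` is a contiguous uint8 array of shape (h, w) or
    (h, w, channels), and that `api` exposes SetImageBytes.
    """
    height, width = frame.shape[:2]
    # Grayscale arrays have no channel axis: one byte per pixel.
    bytes_per_pixel = frame.shape[2] if len(frame.shape) == 3 else 1
    bytes_per_line = width * bytes_per_pixel
    api.SetImageBytes(frame.tobytes(), width, height,
                      bytes_per_pixel, bytes_per_line)
```

OpenCV loads color images in BGR order, so converting to grayscale or RGB with cv2.cvtColor before the call is probably the safer choice.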
Is there any plan to port to Tesseract 4.0 alpha? Version 4.0 adds a new OCR engine based on LSTM neural networks that is more powerful and faster (hardware acceleration).
I got an error when I ran this code:
from tesserocr import PyTessBaseAPI
print(tesserocr.tesseract_version()) # print tesseract-ocr version
print(tesserocr.get_languages() )
TypeError Traceback (most recent call last)
in ()
----> 1 from tesserocr import PyTessBaseAPI
2
3
4 print(tesserocr.tesseract_version()) # print tesseract-ocr version
/home/parallels/tesserocr/tesserocr/tesserocr.pyx in init tesserocr (tesserocr.cpp:22946)()
42 cdef unicode _abs_path = abspath(join(_api.GetDatapath(), os.pardir)) + os.sep
43 cdef unicode _lang_s = _api.GetInitLanguagesAsString()
---> 44 cdef cchar_t *_DEFAULT_PATH = _abs_path
45 cdef cchar_t *_DEFAULT_LANG = _lang_s
46 _api.End()
TypeError: expected bytes, str found
If I use an image with several characters, it works. However, it doesn't work if I want to recognize a single character. What should I do?
Thanks!
I have been working on making some bots for programs that only run on Windows, and I was wondering if you had any pointers on compiling on Windows.
I was actually able to build tesserocr.lib but I cannot get past that step.
I used https://github.com/peirick/VS2015_Tesseract to build libtesseract and used that to satisfy all of the imports.
Thank you.
I ran the sample code from the README with one of my images and, interestingly, it gives two different results when using PIL with image_to_text rather than going directly through file_to_text. The PIL version seems to perform better, and the images are just regular JPEGs.
The sample code and its output are below.
CODE
import tesserocr
from PIL import Image
print tesserocr.tesseract_version() # print tesseract-ocr version
print tesserocr.get_languages() # prints tessdata path and list of available languages
image = Image.open(l.blobname)
print tesserocr.image_to_text(image) # print ocr text from image
print "==================================================================="
# or
print tesserocr.file_to_text(l.blobname)
OUTPUT
tesseract 3.04.01
leptonica-1.73
libgif 5.1.2 : libjpeg 8d (libjpeg-turbo 1.4.2) : libpng 1.2.54 : libtiff 4.0.6 : zlib 1.2.8 : libwebp 0.4.4 : libopenjp2 2.1.0
(u'/usr/share/tesseract-ocr/tessdata/', [u'equ', u'eng', u'osd'])
Everyday
Low
Price
Hem n 796577
um pm 5»
murAflWuNNEHSM
===================================================================
1m mm umvtnm
s 1 7885333252;
Everyday
Low
PVICE
Apparently it does not find the Leptonica headers, which are in /usr/local/include on BSD systems, so I guess this might be a problem on OS X too.
With pip, the issue can be worked around by setting CPPFLAGS:
CPPFLAGS=-I/usr/local/include pip install git+https://github.com/sirfz/tesserocr.git
setup.py takes an -I parameter:
python setup.py build build_ext -I/usr/local/include
The failing pip output:
Collecting git+https://github.com/sirfz/tesserocr.git
Cloning https://github.com/sirfz/tesserocr.git to /tmp/pip-uq2glx-build
Installing collected packages: tesserocr
Running setup.py install for tesserocr ... error
Complete output from command /usr/home/[...]/venv/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/tmp/pip-uq2glx-build/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-Z6nb_e-record/install-record.txt --single-version-externally-managed --compile --install-headers /usr/home[...]/venv/include/site/python2.7/tesserocr:
/usr/home/ub/work/artfacts-scanner/venv/lib/python2.7/site-packages/setuptools/dist.py:285: UserWarning: Normalizing '2.0.2-beta' to '2.0.2b0'
normalized_version,
running install
running build
running build_ext
building 'tesserocr' extension
creating build
creating build/temp.freebsd-10.3-RELEASE-p3-amd64-2.7
cc -fno-strict-aliasing -O2 -pipe -fstack-protector -fno-strict-aliasing -DNDEBUG -fPIC -I/usr/local/include/python2.7 -c tesserocr.cpp -o build/temp.freebsd-10.3-RELEASE-p3-amd64-2.7/tesserocr.o
tesserocr.cpp:248:10: fatal error: 'leptonica/allheaders.h' file not found
#include "leptonica/allheaders.h"
^
1 error generated.
error: command 'cc' failed with exit status 1
----------------------------------------
Command "/usr/home/[...]/venv/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/tmp/pip-uq2glx-build/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-Z6nb_e-record/install-record.txt --single-version-externally-managed --compile --install-headers /usr/home/ub/[...]/venv/include/site/python2.7/tesserocr" failed with error code 1 in /tmp/pip-uq2glx-build
This is kind of weird. The part of the document I am trying to OCR is the following:
image_to_text produces the following:
Test Results
Latest Result
, SINGAPORE
Parameter Unit Outcome
08-Jun-2017
17-021396-01
Viscosity @ 50°C cSt 367.8
Density @ 15°C kg/m® 989.0
Sulphur % (m/m) 2.69
Flash Point °C >67.0
Acid Number mg KOH /g 0.09
Total Sediment Ace. % (m/m) 0.05
Micro Carbon Residue % (m/m) 15.34
Pour Point °C <21
Water content % (V/V) 0.39
Ash % (m/m) 0.039
Vanadium mg/kg 112
Sodium mg/kg 21
Calcium mg/kg 9
Zinc mg/kg <1
Phosphorus mg/kg <1
Iron mg/kg 31
Nickel mg/kg 35
Magnesium mg/kg <1
Potassium mg/kg <1
Silicon mg/kg 11
Aluminium mg/kg 7
Aluminium + Silicon mg/kg 18
Quantity MT
Quantity loss/gain MT
CCAl 850
Net Specific Energy MJ/kg 40.18
When I perform the same operation using PyTessBaseAPI, I get the following bounding boxes:
The problem is that I need the relative positions of the values in order to infer key-to-value pairs, and I perform the same action for multiple documents, so I cannot (and don't want to) intervene manually. I cannot understand why 7 and 9 are not recognized. In addition, I got far fewer results when I set the segmentation mode to AUTO (this result was obtained using SPARSE_TEXT). Is there a solution to this absurd problem, or is it a matter of luck?
In Python 3.5.2 (in an ipython console) I've copied the file eurotext.tif from this repository to my working directory. I get an error trying to work with that image:
In [50]: from tesserocr import PyTessBaseAPI
In [51]: with PyTessBaseAPI as api:
...: api.SetImageFile('eurotext.tif')
...:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-51-252118aec8ba> in <module>()
----> 1 with PyTessBaseAPI as api:
2 api.SetImageFile('eurotext.tif')
3
AttributeError: __exit__
Also trying to use it directly:
In [52]: tesseract = PyTessBaseAPI()
In [53]: tesseract.SetImage('eurotext.tif')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-53-661b17ef8a1f> in <module>()
----> 1 tesseract.SetImage('eurotext.tif')
tesserocr.pyx in tesserocr.PyTessBaseAPI.SetImage (tesserocr.cpp:13256)()
tesserocr.pyx in tesserocr._image_buffer (tesserocr.cpp:2916)()
tesserocr.pyx in tesserocr._image_buffer (tesserocr.cpp:2780)()
AttributeError: 'str' object has no attribute 'save'
I'm able to open the image in PIL so it's a valid image:
In [54]: from PIL import Image
In [55]: im = Image.open('eurotext.tif')
In [56]: im
Out[56]: <PIL.TiffImagePlugin.TiffImageFile image mode=1 size=1024x800 at 0x7F7E100BF5F8>
What's going on here? Thanks in advance.
Here's what I have installed if that's helpful.
Cython==0.25.1
dask==0.12.0
decorator==4.0.10
ipython==5.1.0
ipython-genutils==0.1.0
networkx==1.11
numpy==1.11.2
pexpect==4.2.1
pickleshare==0.7.4
Pillow==3.4.2
prompt-toolkit==1.0.9
ptyprocess==0.5.1
Pygments==2.1.3
scikit-image==0.12.3
scipy==0.18.1
simplegeneric==0.8.1
six==1.10.0
tesserocr==2.1.3
toolz==0.8.1
traitlets==4.3.1
wcwidth==0.1.7
Also, I am able to run tesserocr's tests (python3 setup.py test) without any errors, so I think tesserocr is installed OK.
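The first traceback matches what Python does when a class object, rather than an instance, is used in a with statement: the session has "with PyTessBaseAPI as api:" without the call parentheses. The second traceback suggests that SetImage expects a PIL image object, while SetImageFile is the call that takes a path. The stand-alone sketch below reproduces only the first effect with a stand-in class (Resource is hypothetical and unrelated to tesserocr).

```python
class Resource:
    """Stand-in for a class that implements the context manager protocol."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


def enter_without_call():
    try:
        with Resource as r:  # class object, not an instance
            return "entered"
    except (AttributeError, TypeError) as err:
        # AttributeError on older Pythons, TypeError on 3.11 and later.
        return type(err).__name__


def enter_with_call():
    with Resource() as r:  # instance: __enter__/__exit__ are found
        return "entered"


print(enter_without_call())
print(enter_with_call())  # entered
```

Applied to the session above, that would mean writing "with PyTessBaseAPI() as api:" and passing the opened PIL image (im) to SetImage.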
Cython installed via conda.
My compiler version -
g++ (Ubuntu 4.8.4-2ubuntu1~14.04.1) 4.8.4
The full error message:
running install
running bdist_egg
running egg_info
creating tesserocr.egg-info
writing tesserocr.egg-info/PKG-INFO
writing top-level names to tesserocr.egg-info/top_level.txt
writing dependency_links to tesserocr.egg-info/dependency_links.txt
writing manifest file 'tesserocr.egg-info/SOURCES.txt'
reading manifest file 'tesserocr.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'tesserocr.egg-info/SOURCES.txt'
installing library code to build/bdist.linux-x86_64/egg
running install_lib
running build_ext
skipping 'tesserocr.cpp' Cython extension (up-to-date)
building 'tesserocr' extension
creating build
creating build/temp.linux-x86_64-2.7
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/yonatan/anaconda/envs/scraper/include/python2.7 -c tesserocr.cpp -o build/temp.linux-x86_64-2.7/tesserocr.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
tesserocr.cpp: In function ‘PyObject* pyx_pf_9tesserocr_14PyPageIterator_20SetBoundingBoxComponents(_pyx_obj_9tesserocr_PyPageIterator, bool, bool)’:
tesserocr.cpp:3933:25: error: ‘class tesseract::PageIterator’ has no member named ‘SetBoundingBoxComponents’
__pyx_v_self->_piter->SetBoundingBoxComponents(__pyx_v_include_upper_dots, pyx_v_include_lower_dots);
^
tesserocr.cpp: In function ‘PyObject pyx_pf_9tesserocr_14PyPageIterator_34GetImage(pyx_obj_9tesserocr_PyPageIterator, tesseract::PageIteratorLevel, int, PyObject)’:
tesserocr.cpp:5195:125: error: no matching function for call to ‘tesseract::PageIterator::GetImage(tesseract::PageIteratorLevel&, int&, Pix&, int, int)’
__pyx_v_pix = __pyx_v_self->_piter->GetImage(__pyx_v_level, __pyx_v_padding, __pyx_v_opix, (&_pyx_v_left), (&pyx_v_top));
^
tesserocr.cpp:5195:125: note: candidate is:
In file included from tesserocr.cpp:258:0:
/usr/include/tesseract/pageiterator.h:239:8: note: Pix tesseract::PageIterator::GetImage(tesseract::PageIteratorLevel, int, int, int) const
Pix* GetImage(PageIteratorLevel level, int padding,
^
/usr/include/tesseract/pageiterator.h:239:8: note: candidate expects 4 arguments, 5 provided
tesserocr.cpp: In function ‘PyObject* __pyx_pf_9tesserocr_13PyTessBaseAPI_74AnalyseLayout(_pyx_obj_9tesserocr_PyTessBaseAPI, bool)’:
tesserocr.cpp:15239:83: error: no matching function for call to ‘tesseract::TessBaseAPI::AnalyseLayout(bool&)’
__pyx_v_piter = __pyx_v_self->_baseapi.AnalyseLayout(_pyx_v_merge_similar_words);
^
tesserocr.cpp:15239:83: note: candidate is:
In file included from tesserocr.cpp:262:0:
/usr/include/tesseract/baseapi.h:489:17: note: tesseract::PageIterator tesseract::TessBaseAPI::AnalyseLayout()
PageIterator* AnalyseLayout();
^
/usr/include/tesseract/baseapi.h:489:17: note: candidate expects 0 arguments, 1 provided
tesserocr.cpp: In function ‘tesseract::TessResultRenderer* __pyx_f_9tesserocr_13PyTessBaseAPI__get_renderer(_pyx_obj_9tesserocr_PyTessBaseAPI, _pyx_t_9tesseract_cchar_t)’:
tesserocr.cpp:15592:88: error: no matching function for call to ‘tesseract::TessHOcrRenderer::TessHOcrRenderer(_pyx_t_9tesseract_cchar_t&, bool&)’
__pyx_t_2 = new tesseract::TessHOcrRenderer(__pyx_v_outputbase, __pyx_v_font_info);
^
tesserocr.cpp:15592:88: note: candidates are:
In file included from tesserocr.cpp:261:0:
/usr/include/tesseract/renderer.h:175:3: note: tesseract::TessHOcrRenderer::TessHOcrRenderer()
TessHOcrRenderer();
^
/usr/include/tesseract/renderer.h:175:3: note: candidate expects 0 arguments, 2 provided
/usr/include/tesseract/renderer.h:173:16: note: tesseract::TessHOcrRenderer::TessHOcrRenderer(const tesseract::TessHOcrRenderer&)
class TESS_API TessHOcrRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:173:16: note: candidate expects 1 argument, 2 provided
tesserocr.cpp:15635:106: error: no matching function for call to ‘tesseract::TessPDFRenderer::TessPDFRenderer(pyx_t_9tesseract_cchar_t&, const char)’
__pyx_t_3 = new tesseract::TessPDFRenderer(__pyx_v_outputbase, __pyx_v_self->baseapi.GetDatapath());
^
tesserocr.cpp:15635:106: note: candidates are:
In file included from tesserocr.cpp:261:0:
/usr/include/tesseract/renderer.h:188:3: note: tesseract::TessPDFRenderer::TessPDFRenderer(const char)
TessPDFRenderer(const char _datadir);
^
/usr/include/tesseract/renderer.h:188:3: note: candidate expects 1 argument, 2 provided
/usr/include/tesseract/renderer.h:186:16: note: tesseract::TessPDFRenderer::TessPDFRenderer(const tesseract::TessPDFRenderer&)
class TESS_API TessPDFRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:186:16: note: candidate expects 1 argument, 2 provided
tesserocr.cpp:15719:69: error: no matching function for call to ‘tesseract::TessUnlvRenderer::TessUnlvRenderer(_pyx_t_9tesseract_cchar_t&)’
__pyx_t_4 = new tesseract::TessUnlvRenderer(__pyx_v_outputbase);
^
tesserocr.cpp:15719:69: note: candidates are:
In file included from tesserocr.cpp:261:0:
/usr/include/tesseract/renderer.h:227:3: note: tesseract::TessUnlvRenderer::TessUnlvRenderer()
TessUnlvRenderer();
^
/usr/include/tesseract/renderer.h:227:3: note: candidate expects 0 arguments, 1 provided
/usr/include/tesseract/renderer.h:225:16: note: tesseract::TessUnlvRenderer::TessUnlvRenderer(const tesseract::TessUnlvRenderer&)
class TESS_API TessUnlvRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:225:16: note: no known conversion for argument 1 from ‘_pyx_t_9tesseract_cchar_t* {aka const char}’ to ‘const tesseract::TessUnlvRenderer&’
tesserocr.cpp:15803:72: error: no matching function for call to ‘tesseract::TessBoxTextRenderer::TessBoxTextRenderer(_pyx_t_9tesseract_cchar_t&)’
__pyx_t_5 = new tesseract::TessBoxTextRenderer(__pyx_v_outputbase);
^
tesserocr.cpp:15803:72: note: candidates are:
In file included from tesserocr.cpp:261:0:
/usr/include/tesseract/renderer.h:238:3: note: tesseract::TessBoxTextRenderer::TessBoxTextRenderer()
TessBoxTextRenderer();
^
/usr/include/tesseract/renderer.h:238:3: note: candidate expects 0 arguments, 1 provided
/usr/include/tesseract/renderer.h:236:16: note: tesseract::TessBoxTextRenderer::TessBoxTextRenderer(const tesseract::TessBoxTextRenderer&)
class TESS_API TessBoxTextRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:236:16: note: no known conversion for argument 1 from ‘_pyx_t_9tesseract_cchar_t* {aka const char}’ to ‘const tesseract::TessBoxTextRenderer&’
tesserocr.cpp:15887:69: error: no matching function for call to ‘tesseract::TessTextRenderer::TessTextRenderer(_pyx_t_9tesseract_cchar_t&)’
__pyx_t_6 = new tesseract::TessTextRenderer(__pyx_v_outputbase);
^
tesserocr.cpp:15887:69: note: candidates are:
In file included from tesserocr.cpp:261:0:
/usr/include/tesseract/renderer.h:164:3: note: tesseract::TessTextRenderer::TessTextRenderer()
TessTextRenderer();
^
/usr/include/tesseract/renderer.h:164:3: note: candidate expects 0 arguments, 1 provided
/usr/include/tesseract/renderer.h:162:16: note: tesseract::TessTextRenderer::TessTextRenderer(const tesseract::TessTextRenderer&)
class TESS_API TessTextRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:162:16: note: no known conversion for argument 1 from ‘pyx_t_9tesseract_cchar_t* {aka const char}’ to ‘const tesseract::TessTextRenderer&’
tesserocr.cpp: In function ‘PyObject __pyx_pf_9tesserocr_13PyTessBaseAPI_106IsValidCharacter(_pyx_obj_9tesserocr_PyTessBaseAPI, _pyx_t_9tesseract_cchar_t)’:
tesserocr.cpp:18170:60: error: ‘class tesseract::TessBaseAPI’ has no member named ‘IsValidCharacter’
__pyx_t_1 = __Pyx_PyBool_FromLong(__pyx_v_self->_baseapi.IsValidCharacter(__pyx_v_character)); if (unlikely(!__pyx_t_1)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 2045; __pyx_clineno = __LINE; goto pyx_L1_error;}
^
tesserocr.cpp:365:36: note: in definition of macro ‘__Pyx_PyBool_FromLong’
#define __Pyx_PyBool_FromLong(b) ((b) ? __Pyx_NewRef(Py_True) : __Pyx_NewRef(Py_False))
^
tesserocr.cpp: In function ‘void inittesserocr()’:
tesserocr.cpp:23205:67: error: ‘PSM_RAW_LINE’ is not a member of ‘tesseract’
__pyx_t_2 = __Pyx_PyInt_From_enum__tesseract_3a__3a_PageSegMode(tesseract::PSM_RAW_LINE); if (unlikely(!__pyx_t_2)) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 124; __pyx_clineno = __LINE; goto __pyx_L1_error;}
^
error: command 'gcc' failed with exit status 1
Hi guys,
I was just experimenting with the API and found something that I think is an issue.
Instead of executing the script, the API brings up a weird cursor (something shaped like a plus symbol) and freezes.
Screenshot:
https://s13.postimg.org/ennhe0e1j/IMG_20170213_113348.jpg
Steps to replicate:
This is the script I wrote http://pastebin.com/JbFR7MaG
Instead of doing the normal python script.py to execute the script, I first made the script executable by doing chmod +x script.py .
I then executed the script by doing ./script.py image.png
The script doesn't execute after the import statement and stops with the + shaped cursor.
Is this an issue? Or am I doing something wrong?
Thanks,
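A likely explanation, assuming the script has no interpreter line (the pastebin content is not shown here): when a file is run as ./script.py without a "#!" first line, the shell interprets it itself, and "import" is then ImageMagick's screen-capture command, which shows a crosshair cursor and waits for a click. The sketch below checks for the interpreter line; has_shebang and write_script are hypothetical helpers.

```python
import os
import tempfile


def has_shebang(path):
    """Return True if the file's first two bytes are '#!'."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"#!"


def write_script(directory, name, first_line):
    # Create a tiny script whose first line is `first_line`.
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write(first_line + "\nimport sys\n")
    return path


with tempfile.TemporaryDirectory() as tmp:
    plain = write_script(tmp, "plain.py", "# no interpreter line")
    fixed = write_script(tmp, "fixed.py", "#!/usr/bin/env python")
    print(has_shebang(plain))  # False
    print(has_shebang(fixed))  # True
```

If that is the cause, adding "#!/usr/bin/env python" as the first line of script.py, or running it as python script.py image.png, should avoid the cursor.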
Hi, I am trying to install tesserocr on my Ubuntu x64 14.04 system.
Unfortunately, the installation failed.
Here is the output:
ubuntu@ubuntu-MS:~/tmp/tesserocr$ sudo python setup.py build_ext -I/usr/local/include
running build_ext
Traceback (most recent call last):
File "setup.py", line 166, in <module>
test_suite='tests'
File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
dist.run_commands()
File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib/python2.7/distutils/dist.py", line 970, in run_command
cmd_obj = self.get_command_obj(command)
File "/usr/lib/python2.7/distutils/dist.py", line 846, in get_command_obj
cmd_obj = self.command_obj[command] = klass(self)
File "/usr/lib/python2.7/dist-packages/setuptools/__init__.py", line 82, in __init__
_Command.__init__(self,dist)
File "/usr/lib/python2.7/distutils/cmd.py", line 64, in __init__
self.initialize_options()
File "setup.py", line 120, in initialize_options
build_args = package_config()
File "setup.py", line 59, in package_config
raise Exception(error)
Exception: Package tesseract was not found in the pkg-config search path.
Perhaps you should add the directory containing `tesseract.pc'
to the PKG_CONFIG_PATH environment variable
No package 'tesseract' found
Any idea?
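The exception is raised because pkg-config cannot locate tesseract.pc. That usually means the Tesseract development files are missing or were installed under a prefix that is not on PKG_CONFIG_PATH; on Debian and Ubuntu they typically come from packages such as libtesseract-dev and libleptonica-dev. A small diagnostic sketch, where pkg_config_has is a hypothetical helper and "lept" is assumed to be Leptonica's pkg-config name:

```python
import subprocess


def pkg_config_has(package):
    """Return True if pkg-config can locate `package`, else False."""
    try:
        result = subprocess.run(
            ["pkg-config", "--exists", package],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except FileNotFoundError:
        # pkg-config itself is not installed.
        return False
    return result.returncode == 0


for name in ("tesseract", "lept"):
    print(name, pkg_config_has(name))
```

If tesseract.pc exists in a custom location (for example under /usr/local/lib/pkgconfig), exporting PKG_CONFIG_PATH to include that directory before running setup.py may be enough.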
Hi, I've encountered a segfault in pixa_to_list and I can reproduce it consistently. I don't have any idea how to fix this though.
This image here will always make tesserocr segfault:
This image on the other hand works fine:
The code I'm using for testing is simple:
import tesserocr
from PIL import Image
import sys
print(tesserocr.tesseract_version())
print(tesserocr.get_languages())
png = Image.open(sys.argv[1]).convert('L')
# print(tesserocr.image_to_text(png))
with tesserocr.PyTessBaseAPI() as api:
api.SetImage(png)
boxes = api.GetComponentImages(tesserocr.RIL.WORD, True)
for _, box, _, _ in boxes:
pad = box['h'] * 0.2
api.SetRectangle(box['x']-pad, box['y']-pad, box['w']+pad, box['h']+pad)
text = api.GetUTF8Text().strip()
confidence=api.MeanTextConf()
print(text, confidence)
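Separately from the segfault, the padding arithmetic in the loop above produces float coordinates that can fall outside the image and grows the width by only one pad. The sketch below computes an integer rectangle padded on every side and clamped to the image; padded_rect is a hypothetical helper, and nothing here is known to address the crash itself.

```python
def padded_rect(box, image_width, image_height, ratio=0.2):
    """Grow `box` by ratio * height on every side, clamped to the image."""
    pad = int(round(box["h"] * ratio))
    left = max(box["x"] - pad, 0)
    top = max(box["y"] - pad, 0)
    right = min(box["x"] + box["w"] + pad, image_width)
    bottom = min(box["y"] + box["h"] + pad, image_height)
    # Return (left, top, width, height), the order SetRectangle takes.
    return left, top, right - left, bottom - top


print(padded_rect({"x": 2, "y": 3, "w": 50, "h": 10}, 100, 40))  # (0, 1, 54, 14)
```

In the loop this could be used as api.SetRectangle(*padded_rect(box, *png.size)), since a PIL image's size attribute is a (width, height) tuple.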
Here is a crash report from OS X: crashreport.txt
Here is the output from a successful run (including version numbers and so on): success.txt
I'm using tesseract 3.05.00 which I compiled myself as I had this problem with the 3.04 also and I thought maybe the new version would fix the issue.
Here are the relevant environment variables I used when I executed python setup.py install for tesserocr:
declare -x CFLAGS="-g -fno-omit-frame-pointer -UNDEBUG -O0"
declare -x CPPFLAGS="-I/Users/otimpe/dev/tesseract-3.05.00/dist/include"
declare -x DYLD_LIBRARY_PATH="/Users/otimpe/dev/tesseract-3.05.00/dist/lib"
declare -x LDFLAGS="-L/Users/otimpe/dev/tesseract-3.05.00/dist/lib -g"
declare -x TESSDATA_PREFIX="/usr/local/share"
I successfully installed tesserocr using the tesseract4 branch.
This is the output of tesseract -v on my system (Ubuntu 14.04)
tesseract 4.00.00alpha
leptonica-1.74.1
libgif 4.1.6(?) : libjpeg 8d (libjpeg-turbo 1.3.0) : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8 : libwebp 0.4.0
Found AVX
Found SSE
tesserocr, tesseract and leptonica have been built from source.
I get a segmentation fault when I import tesserocr in python. Here is the entire core dump I obtained using gdb if it helps. Please let me know the steps I should take to fix this.
buralako@puck:~/git/tesserocr$ echo "import tesserocr" > trial.py
buralako@puck:~/git/tesserocr$ gdb python
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.2) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...(no debugging symbols found)...done.
(gdb) run trial.py
Starting program: /usr/bin/python trial.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
GenericVector<int>::clear (this=this@entry=0x5d31) at ../ccutil/genericvector.h:857
857 if (size_reserved_ > 0) {
(gdb) backtrace
#0 GenericVector<int>::clear (this=this@entry=0x5d31) at ../ccutil/genericvector.h:857
#1 0x00007ffff5be5941 in ~GenericVector (this=0x5d31, __in_chrg=<optimized out>) at genericvector.h:666
#2 ~GenericVectorEqEq (this=0x5d31, __in_chrg=<optimized out>) at genericvector.h:642
#3 tesseract::UnicharCompress::Cleanup (this=this@entry=0xbc0f08) at unicharcompress.cpp:432
#4 0x00007ffff5be5f76 in tesseract::UnicharCompress::SetupDecoder (this=this@entry=0xbc0f08) at unicharcompress.cpp:387
#5 0x00007ffff5be639f in tesseract::UnicharCompress::DeSerialize (this=this@entry=0xbc0f08, fp=fp@entry=0x7fffffffce30)
at unicharcompress.cpp:321
#6 0x00007ffff5b779ff in tesseract::LSTMRecognizer::DeSerialize (this=this@entry=0xbc0cb0, fp=fp@entry=0x7fffffffce30)
at lstmrecognizer.cpp:111
#7 0x00007ffff5a69ccb in tesseract::Tesseract::init_tesseract_lang_data (this=this@entry=0xb765f0, arg0=arg0@entry=0xb6d0b8 "",
textbase=textbase@entry=0x0, language=language@entry=0xb4c928 "eng", oem=oem@entry=tesseract::OEM_DEFAULT, configs=configs@entry=0x0,
configs_size=configs_size@entry=0, vars_vec=vars_vec@entry=0x0, vars_values=vars_values@entry=0x0,
set_only_non_debug_params=set_only_non_debug_params@entry=false, mgr=mgr@entry=0x7fffffffd090) at tessedit.cpp:193
#8 0x00007ffff5a6a216 in tesseract::Tesseract::init_tesseract_internal (this=this@entry=0xb765f0, arg0=arg0@entry=0xb6d0b8 "",
textbase=textbase@entry=0x0, language=language@entry=0xb4c928 "eng", oem=oem@entry=tesseract::OEM_DEFAULT, configs=configs@entry=0x0,
configs_size=configs_size@entry=0, vars_vec=vars_vec@entry=0x0, vars_values=vars_values@entry=0x0,
set_only_non_debug_params=set_only_non_debug_params@entry=false, mgr=mgr@entry=0x7fffffffd090) at tessedit.cpp:402
#9 0x00007ffff5a6abb4 in tesseract::Tesseract::init_tesseract (this=0xb765f0, arg0=0xb6d0b8 "", textbase=textbase@entry=0x0,
language=language@entry=0x7ffff5bf050f "eng", oem=oem@entry=tesseract::OEM_DEFAULT, configs=configs@entry=0x0,
configs_size=configs_size@entry=0, vars_vec=vars_vec@entry=0x0, vars_values=vars_values@entry=0x0,
set_only_non_debug_params=set_only_non_debug_params@entry=false, mgr=mgr@entry=0x7fffffffd090) at tessedit.cpp:324
#10 0x00007ffff5a1da46 in tesseract::TessBaseAPI::Init (this=this@entry=0x7ffff67eb720 <__pyx_v_9tesserocr__api>, data=data@entry=0x0,
data_size=data_size@entry=0, language=0x7ffff5bf050f "eng", language@entry=0x0, oem=oem@entry=tesseract::OEM_DEFAULT,
configs=configs@entry=0x0, configs_size=configs_size@entry=0, vars_vec=vars_vec@entry=0x0, vars_values=vars_values@entry=0x0,
set_only_non_debug_params=set_only_non_debug_params@entry=false, reader=reader@entry=0x0) at baseapi.cpp:330
#11 0x00007ffff5a1ddde in tesseract::TessBaseAPI::Init (this=this@entry=0x7ffff67eb720 <__pyx_v_9tesserocr__api>,
datapath=datapath@entry=0x0, language=language@entry=0x0, oem=oem@entry=tesseract::OEM_DEFAULT, configs=configs@entry=0x0,
configs_size=configs_size@entry=0, vars_vec=vars_vec@entry=0x0, vars_values=vars_values@entry=0x0,
set_only_non_debug_params=set_only_non_debug_params@entry=false) at baseapi.cpp:284
#12 0x00007ffff65ca23d in Init (language=0x0, datapath=0x0, this=0x7ffff67eb720 <__pyx_v_9tesserocr__api>)
at /usr/local/include/tesseract/baseapi.h:239
#13 inittesserocr () at tesserocr.cpp:25141
#14 0x000000000042266c in _PyImport_LoadDynamicModule ()
#15 0x0000000000540948 in ?? ()
#16 0x0000000000540d08 in ?? ()
#17 0x000000000054111b in ?? ()
#18 0x000000000051dc50 in ?? ()
#19 0x00000000004dc9cb in PyEval_CallObjectWithKeywords ()
#20 0x000000000049b87e in PyEval_EvalFrameEx ()
#21 0x00000000004a1634 in ?? ()
#22 0x000000000044e4a5 in PyRun_FileExFlags ()
#23 0x000000000044ec9f in PyRun_SimpleFileExFlags ()
#24 0x000000000044f904 in Py_Main ()
#25 0x00007ffff7815f45 in __libc_start_main (main=0x44f9c2 <main>, argc=2, argv=0x7fffffffde18, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7fffffffde08) at libc-start.c:287
#26 0x0000000000578c4e in _start ()
I checked out this project and ran test_api.py.
All tests named test_image* fail:
tesserocr/tests/test_api.py", line 70, in test_image_file
self._api.SetImageFile(self._image_file)
File "tesserocr.pyx", line 1545, in tesserocr.PyTessBaseAPI.SetImageFile (tesserocr.cpp:13568)
raise RuntimeError('Error reading image')
RuntimeError: Error reading image
Misspelling error, from the lines below:
import tesserocr
tesserocr.PSM()
I am trying to do a pip3 and pip install of tesserocr on Debian stretch and getting the following error:
# pip3 install tesserocr
Collecting tesserocr
Using cached tesserocr-2.1.3.tar.gz
Building wheels for collected packages: tesserocr
Running setup.py bdist_wheel for tesserocr ... error
Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-3coz3mk5/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" bdist_wheel -d /tmp/tmpmrdg6puopip-wheel- --python-tag cp35:
running bdist_wheel
running build
running build_ext
Supporting tesseract v3.04.01
Configs from pkg-config: {'libraries': ['tesseract', 'lept'], 'cython_compile_time_env': {'TESSERACT_VERSION': 197633}, 'include_dirs': ['/usr/include']}
cythoning tesserocr.pyx to tesserocr.cpp
building 'tesserocr' extension
creating build
creating build/temp.linux-x86_64-3.5
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.5-MLq5fN/python3.5-3.5.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include -I/usr/include/python3.5m -c tesserocr.cpp -o build/temp.linux-x86_64-3.5/tesserocr.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
tesserocr.cpp: In function ‘PyObject* PyInit_tesserocr()’:
tesserocr.cpp:24651:18: error: ‘L_SEVERITY_NONE’ was not declared in this scope
setMsgSeverity(L_SEVERITY_NONE);
^~~~~~~~~~~~~~~
tesserocr.cpp:24651:33: error: ‘setMsgSeverity’ was not declared in this scope
setMsgSeverity(L_SEVERITY_NONE);
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Failed building wheel for tesserocr
Running setup.py clean for tesserocr
Failed to build tesserocr
Installing collected packages: tesserocr
Running setup.py install for tesserocr ... error
Complete output from command /usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-3coz3mk5/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-j2y3y173-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
Supporting tesseract v3.04.01
Configs from pkg-config: {'cython_compile_time_env': {'TESSERACT_VERSION': 197633}, 'include_dirs': ['/usr/include'], 'libraries': ['lept', 'tesseract']}
skipping 'tesserocr.cpp' Cython extension (up-to-date)
building 'tesserocr' extension
creating build
creating build/temp.linux-x86_64-3.5
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fdebug-prefix-map=/build/python3.5-MLq5fN/python3.5-3.5.3=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include -I/usr/include/python3.5m -c tesserocr.cpp -o build/temp.linux-x86_64-3.5/tesserocr.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
tesserocr.cpp: In function ‘PyObject* PyInit_tesserocr()’:
tesserocr.cpp:24651:18: error: ‘L_SEVERITY_NONE’ was not declared in this scope
setMsgSeverity(L_SEVERITY_NONE);
^~~~~~~~~~~~~~~
tesserocr.cpp:24651:33: error: ‘setMsgSeverity’ was not declared in this scope
setMsgSeverity(L_SEVERITY_NONE);
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Command "/usr/bin/python3 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-3coz3mk5/tesserocr/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-j2y3y173-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-3coz3mk5/tesserocr/
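A note for anyone puzzled by the TESSERACT_VERSION numbers in these build logs: the values shown on this page (197633 next to "Supporting tesseract v3.04.01", 771 next to "v3.03") are consistent with each version component being packed into two hex digits. The sketch below only reproduces that arithmetic. `pack_version` is a hypothetical helper written for illustration and is not taken from setup.py.

```python
def pack_version(version):
    """Pack a dotted version string such as '3.04.01' into one integer."""
    parts = [int(p) for p in version.split('.')]
    # Two hex digits per component: '3.04.01' -> '030401' -> 0x030401
    return int(''.join('{:02x}'.format(p) for p in parts), 16)


if __name__ == '__main__':
    print(pack_version('3.04.01'))  # value logged for v3.04.01
    print(pack_version('3.03'))     # value logged for v3.03
```

Reading the number this way makes it easy to check which Tesseract headers a build was configured against before looking at the compiler errors.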
It seems that only Python 2.7 is supported
https://pypi.python.org/pypi/tesserocr/2.0.0
Any plan to support Python 3.x in the near future?
I suggest adding a paragraph to the README that states which Python versions are supported.
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-lqe1qy/tesserocr/
Collecting tesserocr
Downloading tesserocr-2.1.1.tar.gz (47kB)
100% |████████████████████████████████| 51kB 3.1MB/s
Complete output from command python setup.py egg_info:
running egg_info
creating pip-egg-info/tesserocr.egg-info
writing pip-egg-info/tesserocr.egg-info/PKG-INFO
writing dependency_links to pip-egg-info/tesserocr.egg-info/dependency_links.txt
writing top-level names to pip-egg-info/tesserocr.egg-info/top_level.txt
writing manifest file 'pip-egg-info/tesserocr.egg-info/SOURCES.txt'
warning: manifest_maker: standard file '-c' not found
Package lept was not found in the pkg-config search path.
Perhaps you should add the directory containing `lept.pc'
to the PKG_CONFIG_PATH environment variable
No package 'lept' found
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-lqe1qy/tesserocr/setup.py", line 163, in <module>
test_suite='tests'
File "/opt/rh/python33/root/usr/lib64/python3.3/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/rh/python33/root/usr/lib64/python3.3/distutils/dist.py", line 929, in run_commands
self.run_command(cmd)
File "/opt/rh/python33/root/usr/lib64/python3.3/distutils/dist.py", line 948, in run_command
cmd_obj.run()
File "/var/lib/openshift/573147277628e1669600012d/python/virtenv/venv/lib/python3.3/site-packages/setuptools-22.0.5-py3.3.egg/setuptools/command/egg_info.py", line 193, in run
File "/var/lib/openshift/573147277628e1669600012d/python/virtenv/venv/lib/python3.3/site-packages/setuptools-22.0.5-py3.3.egg/setuptools/command/egg_info.py", line 216, in find_sources
File "/var/lib/openshift/573147277628e1669600012d/python/virtenv/venv/lib/python3.3/site-packages/setuptools-22.0.5-py3.3.egg/setuptools/command/egg_info.py", line 300, in run
File "/var/lib/openshift/573147277628e1669600012d/python/virtenv/venv/lib/python3.3/site-packages/setuptools-22.0.5-py3.3.egg/setuptools/command/egg_info.py", line 329, in add_defaults
File "/var/lib/openshift/573147277628e1669600012d/python/virtenv/venv/lib/python3.3/site-packages/setuptools-22.0.5-py3.3.egg/setuptools/command/sdist.py", line 132, in add_defaults
File "/opt/rh/python33/root/usr/lib64/python3.3/distutils/cmd.py", line 298, in get_finalized_command
cmd_obj = self.distribution.get_command_obj(command, create)
File "/opt/rh/python33/root/usr/lib64/python3.3/distutils/dist.py", line 821, in get_command_obj
cmd_obj = self.command_obj[command] = klass(self)
File "/var/lib/openshift/573147277628e1669600012d/python/virtenv/venv/lib/python3.3/site-packages/setuptools-22.0.5-py3.3.egg/setuptools/__init__.py", line 132, in __init__
File "/opt/rh/python33/root/usr/lib64/python3.3/distutils/cmd.py", line 62, in __init__
self.initialize_options()
File "/tmp/pip-build-lqe1qy/tesserocr/setup.py", line 117, in initialize_options
build_args = package_config()
File "/tmp/pip-build-lqe1qy/tesserocr/setup.py", line 72, in package_config
opt = options[f[:2]]
KeyError: '-g'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-lqe1qy/tesserocr/
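The traceback above ends in `opt = options[f[:2]]` raising KeyError: '-g', which suggests the flag parser only knows a fixed set of two-character prefixes. A minimal sketch of a more tolerant parser is shown below. The function name and the prefix table are assumptions made for illustration (the key names are borrowed from the "Configs from pkg-config" lines elsewhere on this page), and the sketch does not address the separate "No package 'lept' found" message.

```python
def parse_pkg_config_flags(output):
    """Group pkg-config style flags by kind, skipping unknown prefixes."""
    # Hypothetical prefix table; the real setup.py may differ.
    options = {'-L': 'library_dirs', '-I': 'include_dirs', '-l': 'libraries'}
    config = {}
    for flag in output.split():
        key = options.get(flag[:2])
        if key is None:
            # Flags such as '-g' or '-pthread' are ignored instead of
            # raising KeyError.
            continue
        config.setdefault(key, []).append(flag[2:])
    return config


if __name__ == '__main__':
    print(parse_pkg_config_flags('-I/usr/local/include -g -ltesseract -llept'))
```

Using `dict.get` keeps the loop going when pkg-config emits a flag the build does not care about.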
WordFontAttributes can return a NULL pointer. This crashes the Python string conversion.
Hello,
I am using the following code in order to perform OCR on an image (attached):
from tesserocr import PyTessBaseAPI
from PIL import Image

DEFAULT_LANGUAGE = "spa"
filePath = "/home/jorge/Desktop/prueba_tess/c1.png"

if __name__ == '__main__':
    img = Image.open(filePath)
    tesseract = PyTessBaseAPI(lang=DEFAULT_LANGUAGE)
    tesseract.SetImage(img)
    tesseract.Recognize()
    print(tesseract.GetUTF8Text())
    tesseract.End()
With this particular image, however, the console shows the following:
start >= 0 && start + num <= length_:Error:Assert failed:in file ratngs.cpp, line 321
Here is what I am using
tesserocr 2.2.1
tesseract 3.04.01
leptonica-1.71
libjpeg 8d : libpng 1.2.50 : libtiff 4.0.3 : zlib 1.2.8
I think everything is set up correctly, because I have extracted text from other images, but this one triggers that "error".
Any help is appreciated.
Thanks in advance!
Here is the relevant part of my code:
import sys
from PIL import Image
from tesserocr import PyTessBaseAPI, RIL, iterate_level
import cv2
import numpy as np

def processText(image):
    image = cv2.imread(image)
    iimage = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    kernel = np.ones((1, 1), np.uint8)
    #iimage = cv2.dilate(iimage, kernel, iterations=1)
    #iimage = cv2.erode(iimage, kernel, iterations=1)
    cv2.imwrite("7.png", iimage)
    img = cv2.adaptiveThreshold(iimage, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 11, 2)
    cv2.imwrite("8.png", img)
    image = Image.open('8.png')
    im1 = cv2.imread("midResult/init3.jpeg")
    wordList = []
    with PyTessBaseAPI(psm=1) as api:
        api.SetImage(image)
        api.Recognize()
        boxes = api.GetComponentImages(RIL.TEXTLINE, False)
        print('Found {} textline image components.'.format(len(boxes)))
        for i, (im, box, _, _) in enumerate(boxes):
            radioSize = 0.08
            api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
            ocrResult = api.GetUTF8Text()
            conf = api.MeanTextConf()
            print(u"Box[{0}]:x={x},y={y},w={w},h={h},confidence:{1},text:{2}".format(i, conf, ocrResult, **box))
            cv2.rectangle(im1, (box['x'], box['y']),
                          (box['x'] + box['w'],
                           box['y'] + box['h']), (0, 255, 0), 2)
    cv2.imwrite("11.png", im1)

processText("midResult/init3.jpeg")
While debugging, I found that the error in the title occurs when execution reaches api.Recognize(). When I change the image parameter to 3.jpeg, the error disappears, even though the two pictures are nearly identical.
What is wrong here?
In my tesseract installation (version 4.00), tesseract -v prints its result to stdout, so the version read by get_tesseract_version() in setup.py is always an empty string.
A quick fix could be:
p = subprocess.Popen(['tesseract', '-v'], stderr=subprocess.PIPE, stdout=subprocess.PIPE)
stdout_version, version = p.communicate()
# Fall back to stdout when stderr is empty (works for both str and bytes)
if not version:
    version = stdout_version
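The same idea can be written as a small self-contained function. This is only a sketch: `read_version_output` is a hypothetical name, it is not the actual get_tesseract_version() from setup.py, and the demo runs a stand-in Python command so it does not need tesseract installed.

```python
import subprocess
import sys


def read_version_output(command):
    """Run command and return text from stderr, or from stdout if stderr is empty."""
    p = subprocess.Popen(command, stderr=subprocess.PIPE, stdout=subprocess.PIPE)
    stdout_data, stderr_data = p.communicate()
    # Prefer stderr (the stream the quick fix reads first), fall back to stdout.
    data = stderr_data or stdout_data
    return data.decode('utf-8', errors='replace').strip()


if __name__ == '__main__':
    # Stand-in for ['tesseract', '-v'] so the demo runs anywhere.
    demo = [sys.executable, '-c', "print('tesseract 4.00.00alpha')"]
    print(read_version_output(demo))
```

Decoding the bytes explicitly avoids the Python 3 pitfall where comparing the captured output with an empty str is never true.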
Could anybody provide an example of how to OCR a PDF?
Thanks guys!
Is it possible to restrict the character set which is recognized?
The tesseract project already has configuration files for such things (see e.g. how-to-recognize-only-digits), but I wasn't able to figure out how to do the same with this project.
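One approach that may work is to set Tesseract's `tessedit_char_whitelist` variable through `PyTessBaseAPI.SetVariable`. The sketch below assumes tesserocr is installed and that the engine in use honours the whitelist (some 4.0 builds using the LSTM engine are reported to ignore it). The helper names are made up for this example.

```python
def digits_only_variables():
    """Tesseract variables that restrict recognition to the digits 0-9."""
    return {'tessedit_char_whitelist': '0123456789'}


def recognize_digits(image_path):
    """OCR image_path with the digit whitelist applied (needs tesserocr)."""
    # Imported lazily so the helper above can be used without tesserocr.
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI() as api:
        for name, value in digits_only_variables().items():
            api.SetVariable(name, value)
        api.SetImageFile(image_path)
        return api.GetUTF8Text()
```

A call such as `recognize_digits('digits.png')` (with a hypothetical file name) would then return only characters from the whitelist, if the engine applies it.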
I compiled tesseract from source so my tessdata path is not "/usr/local/share/tessdata". How could I change the tessdata path so that a function like tesserocr.get_languages() will look in the right place?
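A possible answer, as a sketch: both `tesserocr.get_languages()` and `PyTessBaseAPI` accept a `path` argument. Whether that path should be the tessdata directory itself or its parent appears to depend on the Tesseract version, so treat the directory used below as a placeholder and check the docstring of your build. The helper names are hypothetical.

```python
def with_trailing_slash(path):
    """Return path with exactly one trailing '/' appended if it is missing."""
    return path if path.endswith('/') else path + '/'


def list_languages(data_path):
    """Return (path, languages) for data_path (needs tesserocr installed)."""
    import tesserocr  # lazy import so the helper above works without it

    return tesserocr.get_languages(with_trailing_slash(data_path))


if __name__ == '__main__':
    # '/opt/tesseract/share' is a made-up location for illustration only.
    print(with_trailing_slash('/opt/tesseract/share'))
```

The same directory can be passed when creating the API object, for example `PyTessBaseAPI(path=..., lang='eng')`.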
reubano@tokpro [~]⚡ convert eurotext.tif -rotate 3 +repage eurotext_ang.tif
reubano@tokpro [~]⚡ tesseract eurotext_ang.tif - -psm 0
Orientation: 0
Orientation in degrees: 0
Orientation confidence: 20.66
Script: 1
Script confidence: 39.58
image = Image.open('eurotext_ang.tif')
with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
    api.SetImage(image)
    api.Recognize()
    it = api.AnalyseLayout()
    it.Orientation()
output
AttributeError: 'NoneType' object has no attribute 'Orientation'
After successful compilation, the example code in the README fails while importing PyTessBaseAPI:
from tesserocr import PyTessBaseAPI
File "build/bdist.linux-x86_64/egg/tesserocr.py", line 7, in <module>
File "build/bdist.linux-x86_64/egg/tesserocr.py", line 6, in __bootstrap__
ImportError: /home/leonardo/.python-eggs/tesserocr-2.0.2b0-py2.7-linux-x86_64.egg-tmp/tesserocr.so: undefined symbol: _ZN9tesseract11TessBaseAPI13AnalyseLayoutEb
I've noticed the orientation example doesn't distinguish between upside down/rightside up and clockwise/counter clockwise orientations.
reubano@tokpro [~]⚡ tesseract -psm 0 up.jpg -
Orientation: 0
Orientation in degrees: 0
Orientation confidence: 0.23
Script: 1
Script confidence: 0.98
reubano@tokpro [~]⚡ tesseract -psm 0 down.jpg -
Orientation: 2
Orientation in degrees: 180
Orientation confidence: 0.21
Script: 1
Script confidence: 0.61
with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
    for path in ['up.jpg', 'down.jpg']:
        image = Image.open(path)
        api.SetImage(image)
        api.Recognize()
        it = api.AnalyseLayout()
        print(it.Orientation())
(0, 0, 2, 0.0)
(0, 0, 2, 0.0)
I submitted this repo to Hacker News, and there's a question there you might want to answer.
Ideally with static libraries included, so that all that is required for installation is pip.
Hi !
I'm trying to use tesserocr with the French language, but I keep getting Unicode decoding errors:
api = PyTessBaseAPI(lang='fra')
api.SetImage(Image.open("20170509_182040.jpg"))
api.SetSourceResolution(300)
api.GetUTF8Text()
Returns:
Traceback (most recent call last):
File "", line 1, in
File "tesserocr.pyx", line 2033, in tesserocr.PyTessBaseAPI.GetUTF8Text (tesserocr.cpp:18137)
File "tesserocr.pyx", line 294, in tesserocr._free_str (tesserocr.cpp:2567)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 341: invalid continuation byte
The English version works, though:
api = PyTessBaseAPI()
api.SetImage(Image.open("20170509_182040.jpg"))
api.SetSourceResolution(300)
api.GetUTF8Text()
Returns :
'The text that I want'
This is my installation :
tesserocr.__version__
'2.1.3'
tesserocr.tesseract_version()
'tesseract 3.05.00\n leptonica-1.74.1\n libjpeg 8d : libpng 1.6.29 : libtiff 4.0.7 : zlib 1.2.8\n'
MacOS Sierra
Is it a known issue or do I need to change something to get it to work ?
Thanks for your help !
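To illustrate what the error message itself means (this does not fix anything in tesserocr): 0xc3 is the first byte of a two-byte UTF-8 sequence, and the decoder fails when the next byte is not a valid continuation byte. The sketch below reproduces that with made-up bytes and shows a lossy decode that substitutes U+FFFD instead of raising. `decode_lossy` is a hypothetical helper, and whether the text returned for this image is truncated or mis-encoded is not known from the report.

```python
def decode_lossy(raw):
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw.decode('utf-8', errors='replace')


if __name__ == '__main__':
    # 0xc3 opens a 2-byte sequence, but a plain space follows it.
    raw = b'caf\xc3 au lait'
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError as exc:
        print(exc.reason)
    print(repr(decode_lossy(raw)))
```

A lossy decode like this can help locate the offending position in the output while the underlying cause is investigated.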
Hi, I tried to install Tesseract 4.0 on Ubuntu as described on the website.
Everything worked fine except for the last command, pip install tesserocr, which fails even though I already have Python 2.7 installed.
I'm attaching a snapshot of the error I get when I run that command.
Any ideas on how to solve this issue?
Thanks
Hi,
I have a problem installing tesserocr in CentOS.
It gives me this error
Configs from pkg-config: {'libraries': ['lept', 'tesseract'], 'cython_compile_time_env': {'TESSERACT_VERSION': 262144}, 'library_dirs': ['/usr/local/lib'], 'include_dirs': ['/usr/local/include']}
cythoning tesserocr.pyx to tesserocr.cpp
building 'tesserocr' extension
creating build
creating build/temp.linux-x86_64-2.7
gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include -I/usr/local/include/python2.7 -c tesserocr.cpp -o build/temp.linux-x86_64-2.7/tesserocr.o
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from /usr/local/include/tesseract/genericvector.h:29:0,
from tesserocr.cpp:417:
/usr/local/include/tesseract/helpers.h: In member function ‘void tesseract::TRand::set_seed(const string&)’:
/usr/local/include/tesseract/helpers.h:50:5: error: ‘hash’ is not a member of ‘std’
std::hash<std::string> hasher;
^
/usr/local/include/tesseract/helpers.h:50:26: error: expected primary-expression before ‘>’ token
std::hash<std::string> hasher;
^
/usr/local/include/tesseract/helpers.h:50:28: error: ‘hasher’ was not declared in this scope
std::hash<std::string> hasher;
^
error: command 'gcc' failed with exit status 1
It worked just fine in Ubuntu. Any idea what's wrong?
I thought it might be a problem with the gcc version, so I updated it to 5.4.0, but that didn't help.
Hi, I'm trying to install tesserocr in a Docker container that's set up as follows:
FROM ubuntu:14.04
RUN apt-get -y update && apt-get install -y tesseract-ocr python3-imaging python3-pip python3-skimage libtesseract-dev libleptonica-dev
RUN pip3 install pytesseract ipython Cython
Then inside the container I manually run the command:
pip3 install tesserocr
It fails with the following. Any tips for how to get past this? Thanks.
Downloading/unpacking tesserocr
Downloading tesserocr-2.1.3.tar.gz (49kB): 49kB downloaded
Running setup.py (path:/tmp/pip_build_root/tesserocr/setup.py) egg_info for package tesserocr
/usr/local/lib/python3.4/dist-packages/Cython/Distutils/old_build_ext.py:30: UserWarning: Cython.Distutils.old_build_ext does not properly handle dependencies and is deprecated.
"Cython.Distutils.old_build_ext does not properly handle dependencies "
Supporting tesseract v3.03
Building with configs: {'libraries': ['tesseract', 'lept'], 'cython_compile_time_env': {'TESSERACT_VERSION': 771}}
Installing collected packages: tesserocr
Running setup.py install for tesserocr
/usr/local/lib/python3.4/dist-packages/Cython/Distutils/old_build_ext.py:30: UserWarning: Cython.Distutils.old_build_ext does not properly handle dependencies and is deprecated.
"Cython.Distutils.old_build_ext does not properly handle dependencies "
Supporting tesseract v3.03
Building with configs: {'cython_compile_time_env': {'TESSERACT_VERSION': 771}, 'libraries': ['tesseract', 'lept']}
cythoning tesserocr.pyx to tesserocr.cpp
building 'tesserocr' extension
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -c tesserocr.cpp -o build/temp.linux-x86_64-3.4/tesserocr.o
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++ [enabled by default]
tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_14PyPageIterator_20SetBoundingBoxComponents(__pyx_obj_9tesserocr_PyPageIterator*, bool, bool)':
tesserocr.cpp:4610:25: error: 'class tesseract::PageIterator' has no member named 'SetBoundingBoxComponents'
__pyx_v_self->_piter->SetBoundingBoxComponents(__pyx_v_include_upper_dots, __pyx_v_include_lower_dots);
^
tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_14PyPageIterator_34GetImage(__pyx_obj_9tesserocr_PyPageIterator*, tesseract::PageIteratorLevel, int, PyObject*)':
tesserocr.cpp:5842:125: error: no matching function for call to 'tesseract::PageIterator::GetImage(tesseract::PageIteratorLevel&, int&, Pix*&, int*, int*)'
__pyx_v_pix = __pyx_v_self->_piter->GetImage(__pyx_v_level, __pyx_v_padding, __pyx_v_opix, (&__pyx_v_left), (&__pyx_v_top));
^
tesserocr.cpp:5842:125: note: candidate is:
In file included from tesserocr.cpp:424:0:
/usr/include/tesseract/pageiterator.h:239:8: note: Pix* tesseract::PageIterator::GetImage(tesseract::PageIteratorLevel, int, int*, int*) const
Pix* GetImage(PageIteratorLevel level, int padding,
^
/usr/include/tesseract/pageiterator.h:239:8: note: candidate expects 4 arguments, 5 provided
tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_13PyTessBaseAPI_74AnalyseLayout(__pyx_obj_9tesserocr_PyTessBaseAPI*, bool)':
tesserocr.cpp:16244:83: error: no matching function for call to 'tesseract::TessBaseAPI::AnalyseLayout(bool&)'
__pyx_v_piter = __pyx_v_self->_baseapi.AnalyseLayout(__pyx_v_merge_similar_words);
^
tesserocr.cpp:16244:83: note: candidate is:
In file included from tesserocr.cpp:429:0:
/usr/include/tesseract/baseapi.h:489:17: note: tesseract::PageIterator* tesseract::TessBaseAPI::AnalyseLayout()
PageIterator* AnalyseLayout();
^
/usr/include/tesseract/baseapi.h:489:17: note: candidate expects 0 arguments, 1 provided
tesserocr.cpp: In function 'tesseract::TessResultRenderer* __pyx_f_9tesserocr_13PyTessBaseAPI__get_renderer(__pyx_obj_9tesserocr_PyTessBaseAPI*, __pyx_t_9tesseract_cchar_t*)':
tesserocr.cpp:16588:88: error: no matching function for call to 'tesseract::TessHOcrRenderer::TessHOcrRenderer(__pyx_t_9tesseract_cchar_t*&, bool&)'
__pyx_t_2 = new tesseract::TessHOcrRenderer(__pyx_v_outputbase, __pyx_v_font_info);
^
tesserocr.cpp:16588:88: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:175:3: note: tesseract::TessHOcrRenderer::TessHOcrRenderer()
TessHOcrRenderer();
^
/usr/include/tesseract/renderer.h:175:3: note: candidate expects 0 arguments, 2 provided
/usr/include/tesseract/renderer.h:173:16: note: tesseract::TessHOcrRenderer::TessHOcrRenderer(const tesseract::TessHOcrRenderer&)
class TESS_API TessHOcrRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:173:16: note: candidate expects 1 argument, 2 provided
tesserocr.cpp:16631:106: error: no matching function for call to 'tesseract::TessPDFRenderer::TessPDFRenderer(__pyx_t_9tesseract_cchar_t*&, const char*)'
__pyx_t_3 = new tesseract::TessPDFRenderer(__pyx_v_outputbase, __pyx_v_self->_baseapi.GetDatapath());
^
tesserocr.cpp:16631:106: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:188:3: note: tesseract::TessPDFRenderer::TessPDFRenderer(const char*)
TessPDFRenderer(const char *datadir);
^
/usr/include/tesseract/renderer.h:188:3: note: candidate expects 1 argument, 2 provided
/usr/include/tesseract/renderer.h:186:16: note: tesseract::TessPDFRenderer::TessPDFRenderer(const tesseract::TessPDFRenderer&)
class TESS_API TessPDFRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:186:16: note: candidate expects 1 argument, 2 provided
tesserocr.cpp:16715:69: error: no matching function for call to 'tesseract::TessUnlvRenderer::TessUnlvRenderer(__pyx_t_9tesseract_cchar_t*&)'
__pyx_t_4 = new tesseract::TessUnlvRenderer(__pyx_v_outputbase);
^
tesserocr.cpp:16715:69: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:227:3: note: tesseract::TessUnlvRenderer::TessUnlvRenderer()
TessUnlvRenderer();
^
/usr/include/tesseract/renderer.h:227:3: note: candidate expects 0 arguments, 1 provided
/usr/include/tesseract/renderer.h:225:16: note: tesseract::TessUnlvRenderer::TessUnlvRenderer(const tesseract::TessUnlvRenderer&)
class TESS_API TessUnlvRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:225:16: note: no known conversion for argument 1 from '__pyx_t_9tesseract_cchar_t* {aka const char*}' to 'const tesseract::TessUnlvRenderer&'
tesserocr.cpp:16799:72: error: no matching function for call to 'tesseract::TessBoxTextRenderer::TessBoxTextRenderer(__pyx_t_9tesseract_cchar_t*&)'
__pyx_t_5 = new tesseract::TessBoxTextRenderer(__pyx_v_outputbase);
^
tesserocr.cpp:16799:72: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:238:3: note: tesseract::TessBoxTextRenderer::TessBoxTextRenderer()
TessBoxTextRenderer();
^
/usr/include/tesseract/renderer.h:238:3: note: candidate expects 0 arguments, 1 provided
/usr/include/tesseract/renderer.h:236:16: note: tesseract::TessBoxTextRenderer::TessBoxTextRenderer(const tesseract::TessBoxTextRenderer&)
class TESS_API TessBoxTextRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:236:16: note: no known conversion for argument 1 from '__pyx_t_9tesseract_cchar_t* {aka const char*}' to 'const tesseract::TessBoxTextRenderer&'
tesserocr.cpp:16883:69: error: no matching function for call to 'tesseract::TessTextRenderer::TessTextRenderer(__pyx_t_9tesseract_cchar_t*&)'
__pyx_t_6 = new tesseract::TessTextRenderer(__pyx_v_outputbase);
^
tesserocr.cpp:16883:69: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:164:3: note: tesseract::TessTextRenderer::TessTextRenderer()
TessTextRenderer();
^
/usr/include/tesseract/renderer.h:164:3: note: candidate expects 0 arguments, 1 provided
/usr/include/tesseract/renderer.h:162:16: note: tesseract::TessTextRenderer::TessTextRenderer(const tesseract::TessTextRenderer&)
class TESS_API TessTextRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:162:16: note: no known conversion for argument 1 from '__pyx_t_9tesseract_cchar_t* {aka const char*}' to 'const tesseract::TessTextRenderer&'
tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_13PyTessBaseAPI_108IsValidCharacter(__pyx_obj_9tesserocr_PyTessBaseAPI*, PyObject*)':
tesserocr.cpp:19649:60: error: 'class tesseract::TessBaseAPI' has no member named 'IsValidCharacter'
__pyx_t_1 = __Pyx_PyBool_FromLong(__pyx_v_self->_baseapi.IsValidCharacter(__pyx_t_2)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2161, __pyx_L1_error)
^
tesserocr.cpp:532:36: note: in definition of macro '__Pyx_PyBool_FromLong'
#define __Pyx_PyBool_FromLong(b) ((b) ? __Pyx_NewRef(Py_True) : __Pyx_NewRef(Py_False))
^
tesserocr.cpp: In function 'PyObject* PyInit_tesserocr()':
tesserocr.cpp:25002:67: error: 'PSM_RAW_LINE' is not a member of 'tesseract'
__pyx_t_2 = __Pyx_PyInt_From_enum__tesseract_3a__3a_PageSegMode(tesseract::PSM_RAW_LINE); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 132, __pyx_L1_error)
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
Complete output from command /usr/bin/python3 -c "import setuptools, tokenize;__file__='/tmp/pip_build_root/tesserocr/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-cwirkbsc-record/install-record.txt --single-version-externally-managed --compile:
running install
running build
running build_ext
/usr/local/lib/python3.4/dist-packages/Cython/Distutils/old_build_ext.py:30: UserWarning: Cython.Distutils.old_build_ext does not properly handle dependencies and is deprecated.
"Cython.Distutils.old_build_ext does not properly handle dependencies "
Supporting tesseract v3.03
Building with configs: {'cython_compile_time_env': {'TESSERACT_VERSION': 771}, 'libraries': ['tesseract', 'lept']}
cythoning tesserocr.pyx to tesserocr.cpp
building 'tesserocr' extension
creating build
creating build/temp.linux-x86_64-3.4
x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -fPIC -I/usr/include/python3.4m -c tesserocr.cpp -o build/temp.linux-x86_64-3.4/tesserocr.o
cc1plus: warning: command line option '-Wstrict-prototypes' is valid for C/ObjC but not for C++ [enabled by default]
tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_14PyPageIterator_20SetBoundingBoxComponents(__pyx_obj_9tesserocr_PyPageIterator*, bool, bool)':
tesserocr.cpp:4610:25: error: 'class tesseract::PageIterator' has no member named 'SetBoundingBoxComponents'
__pyx_v_self->_piter->SetBoundingBoxComponents(__pyx_v_include_upper_dots, __pyx_v_include_lower_dots);
^
tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_14PyPageIterator_34GetImage(__pyx_obj_9tesserocr_PyPageIterator*, tesseract::PageIteratorLevel, int, PyObject*)':
tesserocr.cpp:5842:125: error: no matching function for call to 'tesseract::PageIterator::GetImage(tesseract::PageIteratorLevel&, int&, Pix*&, int*, int*)'
__pyx_v_pix = __pyx_v_self->_piter->GetImage(__pyx_v_level, __pyx_v_padding, __pyx_v_opix, (&__pyx_v_left), (&__pyx_v_top));
^
tesserocr.cpp:5842:125: note: candidate is:
In file included from tesserocr.cpp:424:0:
/usr/include/tesseract/pageiterator.h:239:8: note: Pix* tesseract::PageIterator::GetImage(tesseract::PageIteratorLevel, int, int*, int*) const
Pix* GetImage(PageIteratorLevel level, int padding,
^
/usr/include/tesseract/pageiterator.h:239:8: note: candidate expects 4 arguments, 5 provided
tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_13PyTessBaseAPI_74AnalyseLayout(__pyx_obj_9tesserocr_PyTessBaseAPI*, bool)':
tesserocr.cpp:16244:83: error: no matching function for call to 'tesseract::TessBaseAPI::AnalyseLayout(bool&)'
__pyx_v_piter = __pyx_v_self->_baseapi.AnalyseLayout(__pyx_v_merge_similar_words);
^
tesserocr.cpp:16244:83: note: candidate is:
In file included from tesserocr.cpp:429:0:
/usr/include/tesseract/baseapi.h:489:17: note: tesseract::PageIterator* tesseract::TessBaseAPI::AnalyseLayout()
PageIterator* AnalyseLayout();
^
/usr/include/tesseract/baseapi.h:489:17: note: candidate expects 0 arguments, 1 provided
tesserocr.cpp: In function 'tesseract::TessResultRenderer* __pyx_f_9tesserocr_13PyTessBaseAPI__get_renderer(__pyx_obj_9tesserocr_PyTessBaseAPI*, __pyx_t_9tesseract_cchar_t*)':
tesserocr.cpp:16588:88: error: no matching function for call to 'tesseract::TessHOcrRenderer::TessHOcrRenderer(__pyx_t_9tesseract_cchar_t*&, bool&)'
__pyx_t_2 = new tesseract::TessHOcrRenderer(__pyx_v_outputbase, __pyx_v_font_info);
^
tesserocr.cpp:16588:88: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:175:3: note: tesseract::TessHOcrRenderer::TessHOcrRenderer()
TessHOcrRenderer();
^
/usr/include/tesseract/renderer.h:175:3: note: candidate expects 0 arguments, 2 provided
/usr/include/tesseract/renderer.h:173:16: note: tesseract::TessHOcrRenderer::TessHOcrRenderer(const tesseract::TessHOcrRenderer&)
class TESS_API TessHOcrRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:173:16: note: candidate expects 1 argument, 2 provided
tesserocr.cpp:16631:106: error: no matching function for call to 'tesseract::TessPDFRenderer::TessPDFRenderer(__pyx_t_9tesseract_cchar_t*&, const char*)'
__pyx_t_3 = new tesseract::TessPDFRenderer(__pyx_v_outputbase, __pyx_v_self->_baseapi.GetDatapath());
^
tesserocr.cpp:16631:106: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:188:3: note: tesseract::TessPDFRenderer::TessPDFRenderer(const char*)
TessPDFRenderer(const char *datadir);
^
/usr/include/tesseract/renderer.h:188:3: note: candidate expects 1 argument, 2 provided
/usr/include/tesseract/renderer.h:186:16: note: tesseract::TessPDFRenderer::TessPDFRenderer(const tesseract::TessPDFRenderer&)
class TESS_API TessPDFRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:186:16: note: candidate expects 1 argument, 2 provided
tesserocr.cpp:16715:69: error: no matching function for call to 'tesseract::TessUnlvRenderer::TessUnlvRenderer(__pyx_t_9tesseract_cchar_t*&)'
__pyx_t_4 = new tesseract::TessUnlvRenderer(__pyx_v_outputbase);
^
tesserocr.cpp:16715:69: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:227:3: note: tesseract::TessUnlvRenderer::TessUnlvRenderer()
TessUnlvRenderer();
^
/usr/include/tesseract/renderer.h:227:3: note: candidate expects 0 arguments, 1 provided
/usr/include/tesseract/renderer.h:225:16: note: tesseract::TessUnlvRenderer::TessUnlvRenderer(const tesseract::TessUnlvRenderer&)
class TESS_API TessUnlvRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:225:16: note: no known conversion for argument 1 from '__pyx_t_9tesseract_cchar_t* {aka const char*}' to 'const tesseract::TessUnlvRenderer&'
tesserocr.cpp:16799:72: error: no matching function for call to 'tesseract::TessBoxTextRenderer::TessBoxTextRenderer(__pyx_t_9tesseract_cchar_t*&)'
__pyx_t_5 = new tesseract::TessBoxTextRenderer(__pyx_v_outputbase);
^
tesserocr.cpp:16799:72: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:238:3: note: tesseract::TessBoxTextRenderer::TessBoxTextRenderer()
TessBoxTextRenderer();
^
/usr/include/tesseract/renderer.h:238:3: note: candidate expects 0 arguments, 1 provided
/usr/include/tesseract/renderer.h:236:16: note: tesseract::TessBoxTextRenderer::TessBoxTextRenderer(const tesseract::TessBoxTextRenderer&)
class TESS_API TessBoxTextRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:236:16: note: no known conversion for argument 1 from '__pyx_t_9tesseract_cchar_t* {aka const char*}' to 'const tesseract::TessBoxTextRenderer&'
tesserocr.cpp:16883:69: error: no matching function for call to 'tesseract::TessTextRenderer::TessTextRenderer(__pyx_t_9tesseract_cchar_t*&)'
__pyx_t_6 = new tesseract::TessTextRenderer(__pyx_v_outputbase);
^
tesserocr.cpp:16883:69: note: candidates are:
In file included from tesserocr.cpp:427:0:
/usr/include/tesseract/renderer.h:164:3: note: tesseract::TessTextRenderer::TessTextRenderer()
TessTextRenderer();
^
/usr/include/tesseract/renderer.h:164:3: note: candidate expects 0 arguments, 1 provided
/usr/include/tesseract/renderer.h:162:16: note: tesseract::TessTextRenderer::TessTextRenderer(const tesseract::TessTextRenderer&)
class TESS_API TessTextRenderer : public TessResultRenderer {
^
/usr/include/tesseract/renderer.h:162:16: note: no known conversion for argument 1 from '__pyx_t_9tesseract_cchar_t* {aka const char*}' to 'const tesseract::TessTextRenderer&'
tesserocr.cpp: In function 'PyObject* __pyx_pf_9tesserocr_13PyTessBaseAPI_108IsValidCharacter(__pyx_obj_9tesserocr_PyTessBaseAPI*, PyObject*)':
tesserocr.cpp:19649:60: error: 'class tesseract::TessBaseAPI' has no member named 'IsValidCharacter'
__pyx_t_1 = __Pyx_PyBool_FromLong(__pyx_v_self->_baseapi.IsValidCharacter(__pyx_t_2)); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 2161, __pyx_L1_error)
^
tesserocr.cpp:532:36: note: in definition of macro '__Pyx_PyBool_FromLong'
#define __Pyx_PyBool_FromLong(b) ((b) ? __Pyx_NewRef(Py_True) : __Pyx_NewRef(Py_False))
^
tesserocr.cpp: In function 'PyObject* PyInit_tesserocr()':
tesserocr.cpp:25002:67: error: 'PSM_RAW_LINE' is not a member of 'tesseract'
__pyx_t_2 = __Pyx_PyInt_From_enum__tesseract_3a__3a_PageSegMode(tesseract::PSM_RAW_LINE); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 132, __pyx_L1_error)
^
error: command 'x86_64-linux-gnu-gcc' failed with exit status 1
----------------------------------------
Cleaning up...
Command /usr/bin/python3 -c "import setuptools, tokenize;__file__='/tmp/pip_build_root/tesserocr/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-cwirkbsc-record/install-record.txt --single-version-externally-managed --compile failed with error code 1 in /tmp/pip_build_root/tesserocr
Storing debug log for failure in /root/.pip/pip.log
Hi.
Thanks for your previous messages.
I have just tested with your image file again.
Unfortunately, it reports "Segmentation fault (core dumped)" after finding 12 text lines.
The error occurred at the line ocrResult = api.GetUTF8Text().
It seems that I have not installed tesserocr properly.
I look forward to hearing from you soon.
Thanks.
Traceback (most recent call last):
File "setup.py", line 147, in <module>
test_suite='tests'
File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
dist.run_commands()
File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib/python2.7/distutils/command/build.py", line 128, in run
self.run_command(cmd_name)
File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib/python2.7/distutils/dist.py", line 970, in run_command
cmd_obj = self.get_command_obj(command)
File "/usr/lib/python2.7/distutils/dist.py", line 846, in get_command_obj
cmd_obj = self.command_obj[command] = klass(self)
File "/home/leonardo/.virtualenvs/pbot/local/lib/python2.7/site-packages/setuptools/__init__.py", line 132, in __init__
_Command.__init__(self, dist)
File "/usr/lib/python2.7/distutils/cmd.py", line 64, in __init__
self.initialize_options()
File "setup.py", line 107, in initialize_options
build_args = package_config()
File "setup.py", line 74, in package_config
config['cython_compile_time_env'] = {'TESSERACT_VERSION': version_to_int(version.strip())}
File "setup.py", line 43, in version_to_int
return int(''.join(version.split('.')), 16)
ValueError: invalid literal for int() with base 16: '30500dev'
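The traceback above shows `version_to_int` failing because the version string carries a non-numeric suffix (`dev`). The following is a minimal sketch of one way to tolerate such suffixes, assuming that only the leading dotted numeric part of the version matters. It keeps the base-16 conversion seen in the traceback; the regular expression and the error message are illustrative and not taken from the project's `setup.py`.

```python
import re


def version_to_int(version):
    """Convert a version string such as '3.05.00dev' to an int.

    Only the leading dotted numeric part is used; any trailing
    suffix (e.g. 'dev') is ignored. This is an assumption about
    the intended behaviour, not the project's actual code.
    """
    match = re.match(r'\s*(\d+(?:\.\d+)*)', version)
    if match is None:
        raise ValueError('unrecognised version string: %r' % version)
    # Join the numeric components and interpret them as hexadecimal,
    # mirroring the int(..., 16) call shown in the traceback.
    digits = ''.join(match.group(1).split('.'))
    return int(digits, 16)
```

With this sketch, `version_to_int('3.05.00dev')` and `version_to_int('3.05.00')` yield the same integer, so a development build would be treated like the matching release.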
Hello!
I have an image which consists of one pixel (1x1). When I try to call:
api.SetImage(image)
all_lines = api.GetComponentImages(RIL.TEXTLINE, True)
The Python process crashes:
*** Error in `/home/peter/Projects/ContentTagging/env/bin/python': munmap_chunk(): invalid pointer: 0x00007f1d296a54b0 ***
======= Backtrace: =========
/usr/lib/libc.so.6(+0x722ab)[0x7f1d322a92ab]
/usr/lib/libc.so.6(+0x7890e)[0x7f1d322af90e]
/home/peter/Projects/ContentTagging/env/lib/python3.6/site-packages/tesserocr.cpython-36m-x86_64-linux-gnu.so(+0x1bb3d)[0x7f1d28448b3d]
/usr/lib/libpython3.6m.so.1.0(_PyCFunction_FastCallDict+0x12c)[0x7f1d31e309bc]
/usr/lib/libpython3.6m.so.1.0(+0x168bdd)[0x7f1d31e3fbdd]
/usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x317)[0x7f1d31dfbd77]
/usr/lib/libpython3.6m.so.1.0(+0x16853a)[0x7f1d31e3f53a]
/usr/lib/libpython3.6m.so.1.0(+0x168af3)[0x7f1d31e3faf3]
/usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x317)[0x7f1d31dfbd77]
/usr/lib/libpython3.6m.so.1.0(+0x16853a)[0x7f1d31e3f53a]
/usr/lib/libpython3.6m.so.1.0(+0x168af3)[0x7f1d31e3faf3]
/usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x317)[0x7f1d31dfbd77]
/usr/lib/libpython3.6m.so.1.0(+0x16853a)[0x7f1d31e3f53a]
/usr/lib/libpython3.6m.so.1.0(+0x168af3)[0x7f1d31e3faf3]
/usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x317)[0x7f1d31dfbd77]
/usr/lib/libpython3.6m.so.1.0(PyEval_EvalCodeEx+0x277)[0x7f1d31e3ff47]
/usr/lib/libpython3.6m.so.1.0(PyEval_EvalCode+0x1b)[0x7f1d31dfba5b]
/usr/lib/libpython3.6m.so.1.0(+0x11c871)[0x7f1d31df3871]
/usr/lib/libpython3.6m.so.1.0(_PyCFunction_FastCallDict+0x8f)[0x7f1d31e3091f]
/usr/lib/libpython3.6m.so.1.0(+0x168bdd)[0x7f1d31e3fbdd]
/usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x317)[0x7f1d31dfbd77]
/usr/lib/libpython3.6m.so.1.0(+0x167291)[0x7f1d31e3e291]
/usr/lib/libpython3.6m.so.1.0(+0x16878a)[0x7f1d31e3f78a]
/usr/lib/libpython3.6m.so.1.0(+0x168af3)[0x7f1d31e3faf3]
/usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x317)[0x7f1d31dfbd77]
/usr/lib/libpython3.6m.so.1.0(+0x167291)[0x7f1d31e3e291]
/usr/lib/libpython3.6m.so.1.0(+0x16878a)[0x7f1d31e3f78a]
/usr/lib/libpython3.6m.so.1.0(+0x168af3)[0x7f1d31e3faf3]
/usr/lib/libpython3.6m.so.1.0(_PyEval_EvalFrameDefault+0x317)[0x7f1d31dfbd77]
/usr/lib/libpython3.6m.so.1.0(PyEval_EvalCodeEx+0x277)[0x7f1d31e3ff47]
/usr/lib/libpython3.6m.so.1.0(PyEval_EvalCode+0x1b)[0x7f1d31dfba5b]
/usr/lib/libpython3.6m.so.1.0(+0x1eddc2)[0x7f1d31ec4dc2]
/usr/lib/libpython3.6m.so.1.0(PyRun_FileExFlags+0x9d)[0x7f1d31ec762d]
/usr/lib/libpython3.6m.so.1.0(PyRun_SimpleFileExFlags+0x1a7)[0x7f1d31ec7817]
/usr/lib/libpython3.6m.so.1.0(Py_Main+0x6b1)[0x7f1d31ebc6f1]
/home/peter/Projects/ContentTagging/env/bin/python(main+0xfd)[0x400a5d]
/usr/lib/libc.so.6(__libc_start_main+0xf1)[0x7f1d32257511]
/home/peter/Projects/ContentTagging/env/bin/python(_start+0x2a)[0x400b9a]
I am using version 2.1.3 with Python 3.6.0.
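Until the crash itself is fixed, a caller could skip degenerate images before handing them to the API. The sketch below is only a workaround under stated assumptions: the `MIN_SIDE` threshold is hypothetical (it is not a documented Tesseract limit), and `FakeImage` is a stand-in for any object with a PIL-style `size` attribute of `(width, height)`.

```python
from collections import namedtuple

# Hypothetical threshold: not a documented Tesseract limit.
MIN_SIDE = 3

# Stand-in for a PIL image, used here only for demonstration.
FakeImage = namedtuple('FakeImage', 'size')


def large_enough_for_ocr(image, min_side=MIN_SIDE):
    """Return True if both dimensions of `image` are at least `min_side`.

    `image` is any object exposing a PIL-style `size` attribute,
    i.e. a (width, height) tuple.
    """
    width, height = image.size
    return width >= min_side and height >= min_side
```

A caller would check `large_enough_for_ocr(image)` before `api.SetImage(image)` and skip the image when the check returns False.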
If tessdata is not installed or TESSDATA_PREFIX is wrong, Python with tesserocr segfaults. I think it should fail with an error instead.
The tesseract command-line tool handles this gracefully with an error:
Error opening data file /invalid/tessdata/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Could not initialize tesseract.
To reproduce this, just set the environment variable TESSDATA_PREFIX to a non-existent directory.
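As a stopgap on the Python side, the language data could be checked before the API is initialized, so that a missing file raises an exception instead of crashing the interpreter. This is a rough sketch under several assumptions: the function names are hypothetical, it looks both in `<prefix>/tessdata` and directly in `<prefix>` because the expected layout may differ between Tesseract versions, and it does not know Tesseract's compiled-in default path, so an unset TESSDATA_PREFIX is reported as missing data.

```python
import os


def traineddata_path(lang='eng', prefix=None):
    """Return the path to <lang>.traineddata, or None if it is not found.

    `prefix` defaults to the TESSDATA_PREFIX environment variable.
    """
    if prefix is None:
        prefix = os.environ.get('TESSDATA_PREFIX', '')
    if not prefix:
        # The compiled-in default location is unknown to this sketch.
        return None
    filename = lang + '.traineddata'
    candidates = [
        os.path.join(prefix, 'tessdata', filename),  # prefix is the parent directory
        os.path.join(prefix, filename),              # prefix is tessdata itself
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def require_traineddata(lang='eng', prefix=None):
    """Raise RuntimeError with a readable message if the data is missing."""
    path = traineddata_path(lang, prefix)
    if path is None:
        raise RuntimeError(
            "Could not find %s.traineddata; check TESSDATA_PREFIX" % lang)
    return path
```

Calling `require_traineddata('eng')` before constructing the API object would turn the reproduction case above into an ordinary Python exception.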