pero-ocr's Issues

training model

Hello again, just wondering where I can find the code that can be used to train a handwritten text recognition model.
In this repository I can only find code for running an existing model on an image, not for training one.

Music pull request feedback

@vlachvojta :

  • Layout_parser: Line categories (LINE_CATEGORIES)
  • Layout_parser: Filter output categories (CATEGORIES)
  • decoder filter categories
  • music_dictionary -> output_substitution_table (change order of key, value)
  • Add minimalistic CLI for export music to user_scripts
  • Render with categories box (just name of category)
  • Normalize category names in render using code from pero/unicode_normalization
  • OCR Engine get line confidence
  • Test new YOLO model
  • Check all changed logging statements
  • Test OCR with old configs
  • check backward compatibility of custom tag in page-xml
  • Add Atomic option to output substitution + add setting options to config

@vlachvojta with @ikiss-fit :

  • Check API and web compatibility (after adding line confidence in the OCR engine)

Failed line cropping in page_parser

Line crop fails. Job saved at /mnt/matylda1/hradis/PERO/BUGS/a9ccd42b-9b26-40ae-9c3b-6e4d26c21ee0

Processing 4/24 (16.67 %) [id: b0a89e97-5c8a-4511-94db-7fed583bcba9]
Traceback (most recent call last):
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/user_scripts/parse_folder.py", line 172, in <module>
main()
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/user_scripts/parse_folder.py", line 150, in main
page_layout = page_parser.process_page(image, page_layout)
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/pero_ocr/document_ocr/page_parser.py", line 256, in process_page
page_layout = self.line_cropper.process_page(image, page_layout)
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/pero_ocr/document_ocr/page_parser.py", line 201, in process_page
line.crop = self.crop_engine.crop(img, line.baseline, line.heights)
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/pero_ocr/document_ocr/crop_engine.py", line 70, in crop
line_crop = cv2.remap(img_crop, coords[:, :, 0], coords[:, :, 1], interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_TRANSPARENT)
cv2.error: OpenCV(4.0.0) /io/opencv/modules/imgproc/src/imgwarp.cpp:666: error: (-215:Assertion failed) !ssize.empty() in function 'remapBilinear'

ALTO export BUG

Export fails when text line has no points?

For example, document c1951833-8440-4851-93b5-6dfc6c3663bf, second page fe55b56c-341e-48d3-82ac-e3a971a0a124.

Error:
Aug 31 07:59:00 pero-ocr gunicorn[12175]: Traceback (most recent call last):
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/env/pero-ocr/lib/python3.6/site-packages/flask/app.py", line 2447, in wsgi_app
Aug 31 07:59:00 pero-ocr gunicorn[12175]: response = self.full_dispatch_request()
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/env/pero-ocr/lib/python3.6/site-packages/flask/app.py", line 1952, in full_dispatch_request
Aug 31 07:59:00 pero-ocr gunicorn[12175]: rv = self.handle_user_exception(e)
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/env/pero-ocr/lib/python3.6/site-packages/flask/app.py", line 1821, in handle_user_exception
Aug 31 07:59:00 pero-ocr gunicorn[12175]: reraise(exc_type, exc_value, tb)
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/env/pero-ocr/lib/python3.6/site-packages/flask/_compat.py", line 39, in reraise
Aug 31 07:59:00 pero-ocr gunicorn[12175]: raise value
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/env/pero-ocr/lib/python3.6/site-packages/flask/app.py", line 1950, in full_dispatch_request
Aug 31 07:59:00 pero-ocr gunicorn[12175]: rv = self.dispatch_request()
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/env/pero-ocr/lib/python3.6/site-packages/flask/app.py", line 1936, in dispatch_request
Aug 31 07:59:00 pero-ocr gunicorn[12175]: return self.view_functions[rule.endpoint](**req.view_args)
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/env/pero-ocr/lib/python3.6/site-packages/flask_login/utils.py", line 272, in decorated_view
Aug 31 07:59:00 pero-ocr gunicorn[12175]: return func(*args, **kwargs)
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/pero/pero_ocr_web/app/document/routes.py", line 185, in get_alto_xml
Aug 31 07:59:00 pero-ocr gunicorn[12175]: return create_string_response(filename, page_layout.to_altoxml_string(), minetype='text/xml')
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/pero/pero-ocr/pero_ocr/document_ocr/layout.py", line 335, in to_altoxml_string
Aug 31 07:59:00 pero-ocr gunicorn[12175]: string.set("HEIGHT", str(int((np.max(all_y) - np.min(all_y)))))
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "<__array_function__ internals>", line 6, in amax
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/env/pero-ocr/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 2668, in amax
Aug 31 07:59:00 pero-ocr gunicorn[12175]: keepdims=keepdims, initial=initial, where=where)
Aug 31 07:59:00 pero-ocr gunicorn[12175]: File "/home/pero/env/pero-ocr/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 90, in _wrapreduction
Aug 31 07:59:00 pero-ocr gunicorn[12175]: return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
Aug 31 07:59:00 pero-ocr gunicorn[12175]: ValueError: zero-size array to reduction operation maximum which has no identity
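
One possible guard for the failing HEIGHT computation, as a minimal sketch only. The helper name `extent` is hypothetical and not part of pero-ocr, and returning 0 for a line without points is an assumption about what the ALTO output should contain in that case.

```python
import numpy as np


def extent(values):
    """Return int(max - min) of the coordinates, or 0 when there are none."""
    values = np.asarray(values, dtype=float)
    if values.size == 0:
        # np.max raises ValueError on a zero-size array, so handle it first.
        return 0
    return int(values.max() - values.min())


print(extent([10.0, 25.7, 12.0]))  # a line with points
print(extent([]))                  # a line without points, no exception
```

Whether such a line should be exported with zero size or skipped entirely is a separate design decision.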

Page processing fail in line detection

Processing 20/25 (80.00 %) [id: 371eaaf3-a3e7-45c9-8410-0e0f9ac872da]
Traceback (most recent call last):
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/user_scripts/parse_folder.py", line 172, in <module>
main()
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/user_scripts/parse_folder.py", line 150, in main
page_layout = page_parser.process_page(image, page_layout)
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/pero_ocr/document_ocr/page_parser.py", line 246, in process_page
page_layout = self.line_parser.process_page(image, page_layout)
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/pero_ocr/document_ocr/page_parser.py", line 129, in process_page
region = self.assign_lines_to_region(baseline_list, heights_list, textline_list, region)
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/pero_ocr/document_ocr/page_parser.py", line 115, in assign_lines_to_region
baseline_intersection, textline_intersection = linepp.mask_textline_by_region(baseline, textline, region.polygon)
File "/home/ihradis/projects/2018-01-15_PERO/pero-ocr-live/pero_ocr/line_engine/line_postprocessing.py", line 179, in mask_textline_by_region
baseline_is = region_shpl.intersection(baseline_shpl)
File "/home/ihradis/env/tf/lib/python3.6/site-packages/shapely/geometry/base.py", line 620, in intersection
return geom_factory(self.impl['intersection'](self, other))
File "/home/ihradis/env/tf/lib/python3.6/site-packages/shapely/topology.py", line 70, in __call__
self._check_topology(err, this, other)
File "/home/ihradis/env/tf/lib/python3.6/site-packages/shapely/topology.py", line 38, in _check_topology
self.fn.__name__, repr(geom)))
shapely.errors.TopologicalError: The operation 'GEOSIntersection_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x7f57dc052be0>

Transcription

For transcribing old Latin text, which model should I select to OCR the image below?
[image]

Problem with the pretrained model not available

File "/usr/local/lib/python3.9/dist-packages/torch/jit/_serialization.py", line 149, in load
raise ValueError(f"The provided filename {f} does not exist") # type: ignore[str-bytes-safe]
ValueError: The provided filename /opt/pero/pero-ocr/ocr_model/checkpoint_646000.ckpt does not exist

Can't install through pip

Hi, I'm trying to use this repository in a college project, but I can't seem to run pip install pero-ocr.

I'm getting the following error:

The conflict is caused by:
    pero-ocr 0.5 depends on tensorflow-gpu==1.15
    pero-ocr 0.4 depends on tensorflow-gpu==1.15
    pero-ocr 0.3 depends on tensorflow-gpu==1.15
    pero-ocr 0.2 depends on tensorflow-gpu==1.14
    pero-ocr 0.1.1 depends on tensorflow-gpu==1.14

But when I try to install that version of tensorflow-gpu, I can't find a valid release.

Thank you.

Website typo Layout Analysis

I suppose website related issues can also be mentioned here.

I noticed a typo for selecting the layout analysis.
Shouldn't "Select baseline detector" be "Select layout detector"?

[screenshot: Capture]

Add region categories

Internal export: (pseudo PageXML)

  • All regions are RegionLayout with category attribute (saved to XML as TextRegion element with category in custom attribute)
  • Set OCR/OMR Engines to work only with some types of lines
  • Set Layout Engines to work only with some types of regions
    Merging overlapping regions (a text layout engine that detects a region/line inside another region adds its lines to that region, using geometry and coordinates to decide whether a region/line lies inside a region): not a useful feature

Line crop fails, probably due to an empty mapping

Error log:
line_coords = self.get_crop_inputs(baseline, height, self.line_height)
Traceback (most recent call last):
File "/home/pero/PERO/pero-ocr/user_scripts/parse_folder.py", line 176, in main
page_layout = page_parser.process_page(image, page_layout)
File "/home/pero/PERO/pero-ocr/pero_ocr/document_ocr/page_parser.py", line 408, in process_page
page_layout = self.line_cropper.process_page(image, page_layout)
File "/home/pero/PERO/pero-ocr/pero_ocr/document_ocr/page_parser.py", line 348, in process_page
line.crop = self.crop_engine.crop(img, line.baseline, line.heights)
File "/home/pero/PERO/pero-ocr/pero_ocr/document_ocr/crop_engine.py", line 78, in crop
interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)
cv2.error: OpenCV(4.2.0) /io/opencv/modules/imgproc/src/imgwarp.cpp:1703: error: (-215:Assertion failed) !_map1.empty() in function 'remap'
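
Both remap assertions (`!ssize.empty()` in the earlier cropping report and `!_map1.empty()` here) mean that cv2.remap was handed an empty array. A minimal sketch of a pre-check follows. The helper `can_remap` is hypothetical, it is assumed to run just before the cv2.remap call in crop_engine.py, and what to do with a rejected line (skip it or give it a blank crop) is left open.

```python
import numpy as np


def can_remap(img, coords):
    """True only if neither the source image nor the coordinate map is empty."""
    img = np.asarray(img)
    coords = np.asarray(coords)
    return img.size > 0 and coords.size > 0


image = np.zeros((100, 200, 3), dtype=np.uint8)
good_map = np.zeros((32, 64, 2), dtype=np.float32)
empty_map = np.zeros((32, 0, 2), dtype=np.float32)  # zero-length line

print(can_remap(image, good_map))
print(can_remap(image, empty_map))
print(can_remap(image[0:0], good_map))  # empty source crop
```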

XML headers

As mentioned in issue #49, Pero generates ALTO files without proper XML headers (<?xml version='1.0' encoding='utf-8'?>). Was that intended, or could that be fixed?
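
For reference, a minimal sketch of how the declaration can be emitted on serialization. It uses the standard-library ElementTree with a made-up two-element tree as a stand-in for the real ALTO document; lxml's `etree.tostring` accepts the same `encoding` and `xml_declaration` keyword arguments.

```python
import xml.etree.ElementTree as ET

# Stand-in for an ALTO tree; the real document has many more elements.
root = ET.Element("alto")
ET.SubElement(root, "Layout")

# xml_declaration=True prepends the header (Python 3.8+ for tostring()).
data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
print(data.decode("utf-8"))
```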

problem of numpy version

Hello, when running the code from "Integration of the pero-ocr python module", I encountered a problem with the numpy version. The error was:

AttributeError: module 'numpy' has no attribute 'float'.
np.float was a deprecated alias for the builtin float. To avoid this error in existing code, use float by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use np.float64 here.
The aliases was originally deprecated in NumPy 1.20; for more details and guidance see the original release note at:
https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations

If I downgrade numpy, then scipy, numba, etc. also need to be downgraded for compatibility, but many of the older versions cannot be installed on my computer. What would you suggest? Thanks in advance!
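
The np.float alias was removed in NumPy 1.24, so the two usual workarounds are pinning numpy below 1.24 or restoring the alias before importing the code that still uses it. A minimal sketch of the second option follows, meant as a stopgap rather than a fix: patching the numpy namespace can hide other incompatibilities, and the proper change is to replace np.float with float in the affected code.

```python
import numpy as np

# Restore the alias removed in NumPy 1.24; it was always just the builtin float.
if not hasattr(np, "float"):
    np.float = float

# Import the code that still references np.float only after this point.
print(np.float("1.5"))
```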

Getting KeyError

I was trying the pero-ocr on a png image with table and text but got the error below. Please, how do I resolve this?

[screenshot: Screenshot 2023-09-07 at 2 42 19 PM]


Clustering layout probably fails on pages/regions with no lines?

Data in BUGS/a69eb9c4-ae17-4429-aa70-c636ee0051b0
log:
ERROR: Failed to process file 9d24471a-280b-4e2b-a175-d65910c7c548.
need at least one array to concatenate
Traceback (most recent call last):
File "/home/pero/PERO/pero-ocr/user_scripts/parse_folder.py", line 176, in main
page_layout = page_parser.process_page(image, page_layout)
File "/home/pero/PERO/pero-ocr/pero_ocr/document_ocr/page_parser.py", line 404, in process_page
page_layout = self.layout_parser.process_page(image, page_layout)
File "/home/pero/PERO/pero-ocr/pero_ocr/document_ocr/page_parser.py", line 141, in process_page
polygons_list, baselines_list, heights_list, textlines_list = self.region_engine.detect(img)
File "/home/pero/PERO/pero-ocr/pero_ocr/region_engine/region_engine_splic.py", line 65, in detect
region_poly_points = np.concatenate(region_textlines, axis=0)
File "<__array_function__ internals>", line 6, in concatenate
ValueError: need at least one array to concatenate
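
A possible guard for the concatenate call, as a minimal sketch. The helper name is hypothetical, and it assumes each text line is an (N, 2) array of points and that a region without lines should yield an empty (0, 2) array instead of an exception.

```python
import numpy as np


def concat_textline_points(region_textlines):
    """Stack the (N, 2) point arrays of a region; empty input gives a (0, 2) array."""
    if len(region_textlines) == 0:
        # np.concatenate raises "need at least one array to concatenate" here.
        return np.empty((0, 2), dtype=float)
    return np.concatenate(region_textlines, axis=0)


lines = [np.array([[0.0, 0.0], [10.0, 0.0]]), np.array([[0.0, 5.0]])]
print(concat_textline_points(lines).shape)
print(concat_textline_points([]).shape)
```

Code downstream of this call would still need to cope with a region that has zero points.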

Where should the model for the region detector be placed?

I ran the script with layout detection.
The class EngineRegionDetector fails at line 75 with this error:
Cannot interpret feed_dict key as Tensor: The name 'inference_input:0' refers to a Tensor which does not exist. The operation, 'inference_input', does not exist in the graph.

OMR transformers produce nonsense transcriptions

  • Could be due to different input size
  1. Test if OCR Transformers work.
  2. Train OCR Transformer with different input size and test it.
  3. Re-check network input.
  4. If 2 works and 3 is not conclusive, re-train OMR models.

Layout analysis crashes

Crashed on two files in my new collection. Problem in live system.

Job ID: fb48773658124afab23ac9854ea5e56d
Document ID: 1e4d33dc189c4a2bb93eaebf722432e4
Image: 9823218f-12c1-4ede-ba68-897e055e5580
Errors:
Processing 9823218f-12c1-4ede-ba68-897e055e5580
ERROR: Failed to process file 9823218f-12c1-4ede-ba68-897e055e5580.
The operation 'GEOSUnion_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x7f249c0cd050>

Traceback (most recent call last):
File "/home/pero/pero/pero-ocr/user_scripts/parse_folder.py", line 205, in main
page_layout = page_parser.process_page(image, page_layout)
File "/home/pero/pero/pero-ocr/pero_ocr/document_ocr/page_parser.py", line 372, in process_page
page_layout = layout_parser.process_page(image, page_layout)
File "/home/pero/pero/pero-ocr/pero_ocr/document_ocr/page_parser.py", line 169, in process_page
p_list, b_list, h_list, t_list = self.engine.detect(img, rot=rot)
File "/home/pero/pero/pero-ocr/pero_ocr/layout_engines/cnn_layout_engine.py", line 127, in detect
region_poly = helpers.region_from_textlines(region_textlines)
File "/home/pero/pero/pero-ocr/pero_ocr/layout_engines/layout_helpers.py", line 100, in region_from_textlines
region_poly = region_poly.union(textline_poly)
File "/home/pero/python_environment/pero_ocr_web_clients/lib/python3.7/site-packages/shapely/geometry/base.py", line 658, in union
return geom_factory(self.impl['union'](self, other))
File "/home/pero/python_environment/pero_ocr_web_clients/lib/python3.7/site-packages/shapely/topology.py", line 70, in __call__
self._check_topology(err, this, other)
File "/home/pero/python_environment/pero_ocr_web_clients/lib/python3.7/site-packages/shapely/topology.py", line 38, in _check_topology
self.fn.__name__, repr(geom)))
shapely.errors.TopologicalError: The operation 'GEOSUnion_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.polygon.Polygon object at 0x7f249c0cd050>
TopologyException: Input geom 1 is invalid: Self-intersection at or near point 2347.0777238895662 -44.069123013668701 at 2347.0777238895662 -44.069123013668701
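
The self-intersection reported by GEOS can be reproduced and worked around with a small example. This is a minimal sketch, not the project's fix: the helper `repair` is hypothetical, and it relies on the common `buffer(0)` trick, which always returns a valid geometry but may drop part of a self-crossing shape. Newer Shapely releases also offer `shapely.validation.make_valid` as an alternative.

```python
from shapely.geometry import Polygon


def repair(polygon):
    """Return the polygon unchanged if valid, otherwise a zero-width-buffered copy."""
    if polygon.is_valid:
        return polygon
    return polygon.buffer(0)


# A "bow-tie" ring crosses itself, like the self-intersection GEOS reports above.
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
fixed = repair(bowtie)
print(bowtie.is_valid, fixed.is_valid)
```

Applying such a repair to each text line polygon before calling `union` or `intersection` would avoid the TopologicalError, at the cost of slightly altered geometry.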

Website: correct textlines

We can correct the layout model (text regions) and the OCR.
Shouldn't it also be possible to correct the text lines?

I understand that this is difficult because text line detection is done together with the OCR. For now I will use Transkribus to correct the text lines in post-correction.
