
Comments (7)

jbarlow83 commented on June 3, 2024

@Mark-Joy Thanks, it looks like the fix is actually just changing 1 to -1:

-text.text_transform(Matrix(1, 0, 0, 1, box.llx, 0))
+text.text_transform(Matrix(1, 0, 0, -1, box.llx, 0))

Translation: the invisible font is upside down, and some PDF viewers freak out. 🙃
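For anyone wondering what that one-character change does, here is a minimal sketch of how a six-number PDF matrix (a, b, c, d, e, f) acts on a point. The fourth entry scales the y axis, so flipping its sign mirrors the text vertically. The helper `apply_matrix` and the value of `llx` are made up for illustration; this is plain arithmetic, not the renderer's code or the pikepdf API.

```python
def apply_matrix(m, point):
    """Apply a PDF-style matrix (a, b, c, d, e, f) to an (x, y) point."""
    a, b, c, d, e, f = m
    x, y = point
    # PDF convention: x' = a*x + c*y + e, y' = b*x + d*y + f
    return (a * x + c * y + e, b * x + d * y + f)


llx = 72.0  # hypothetical left edge of a text box, in PDF units

before = (1, 0, 0, 1, llx, 0)   # matrix from the "-" line of the diff
after = (1, 0, 0, -1, llx, 0)   # matrix from the "+" line of the diff

top_of_glyph = (0.0, 1.0)  # a point one unit above the baseline

print(apply_matrix(before, top_of_glyph))  # (72.0, 1.0)
print(apply_matrix(after, top_of_glyph))   # (72.0, -1.0)
```

Both matrices shift the point horizontally by `llx`; only the sign of the y coordinate differs, which is the vertical flip.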

But I need to test it everywhere first, since the first one slipped through.

v16.0.2 temporarily makes sandwich the main renderer again.

from ocrmypdf.

Mark-Joy commented on June 3, 2024

A quick fix would be:

Change:

bottom_left_corner = line_box.llx, line_box.ury

to:

bottom_left_corner = line_box.llx, line_box.lly

And change:

fontsize = line_box_height + intercept

to:

fontsize = line_box_height - intercept


jbarlow83 commented on June 3, 2024

16.0.2 is a temporary fix - I'll close this issue when there's a full solution and the new renderer can be reinstated as default.

The sandwich renderer (the default in 16.0.2 and in versions before 16) has a number of issues, such as word segmentation problems in some cases and registration (aligning the selectable text with the actual text on the page). The new renderer, which is mostly a reimplementation of sandwich to fix those issues, doesn't work universally yet, but does bring significant improvements.


Rosti2022 commented on June 3, 2024

Thanks for 16.0.2 - I can confirm that I am witnessing the same issue with my PDFs, as described in #1214. The Persian text issues persist in 16.0.2 unfortunately.


advert665 commented on June 3, 2024

Thanks all -- v16.0.2 fixes the issue for me, very pleased!


Pugio commented on June 3, 2024

Thanks for the fix. Can confirm that this bug caused several hours of head scratching today before I saw this issue. Really appreciate all the great work!

EDIT: I was also seeing a huge number of \ufeff characters in the output. Moving from 16.0.0 -> 16.0.2 also fixed this.
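If you want to check your own output for the same symptom, a rough sketch along these lines counts and strips U+FEFF from text you have already extracted. How you extract the text from the PDF is up to you and not shown here; `count_and_strip_bom` and the sample string are made up for illustration.

```python
BOM = "\ufeff"  # U+FEFF, the character reported in the output


def count_and_strip_bom(text):
    """Return (number of U+FEFF characters, text with them removed)."""
    return text.count(BOM), text.replace(BOM, "")


# Hypothetical extracted text containing stray U+FEFF characters
sample = "\ufeffHello\ufeff world"
count, cleaned = count_and_strip_bom(sample)
print(count, repr(cleaned))  # 2 'Hello world'
```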


jbarlow83 commented on June 3, 2024

v16.0.4 should contain significant fixes when running in --pdf-renderer hocr mode. It has not been made the default.
