
Comments (29)

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024 1

This problem should soon be resolved by the way, as I have moved archive-hocr-tools away from lxml entirely. So it simply won't be required anymore. See internetarchive/archive-hocr-tools#5

With the next release of archive-pdf-tools, I'll require that specific version of archive-hocr-tools (or higher), and then we can close this issue.

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

A few things:

  1. I don't think you need to get leptonica, openjpeg, libxml, libxslt, jbig2enc for basic functionality (Pillow can compress JPEG2000 and the wheel comes with it I guess, leptonica is only for jbig2, libxml/libxslt I think will just come with pip for python)
  2. The pip3 install ... line seems to attempt to build archive-pdf-tools rather than just install the binary, I think it might be because you're on arm64. I don't know if we already build a binary for that.

For completeness' sake, can you share your OS and Python version with me? (It seems to be Python 3.11, but I would like to check.)
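A minimal sketch of how those details could be collected in one go, using only the standard library. The function name `environment_report` and the choice of fields are illustrative assumptions, not something the project asks for.

```python
import platform
import sys


def environment_report():
    """Collect the details typically requested in an install bug report."""
    # platform.mac_ver() returns an empty release string on non-macOS systems.
    mac_release = platform.mac_ver()[0]
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "machine": platform.machine(),  # e.g. arm64 on Apple Silicon
        "system": platform.system(),
        "macos_release": mac_release or "n/a",
        "executable": sys.executable,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Running it inside the same environment where `pip install` failed matters, since a machine can have several Python installations.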

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

Also, searching Google for cython #include "longintrepr.h" clang suggests that this error happens for many Python packages on macOS/clang, so there might be some hints there. Let me see what I can get done in the next few days.

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

It looks like just upgrading to a newer Cython version will solve the problem, but I will still need to see if I can make the CI build these releases.

from archive-pdf-tools.

jrochkind avatar jrochkind commented on June 11, 2024

I will see if I can take care of it on my end with upgrades etc, I'll try a bit harder, and get back to you. Thanks for the attention!

from archive-pdf-tools.

jrochkind avatar jrochkind commented on June 11, 2024

I may have just gotten this to install on a different Mac that has the latest OS -- it turns out my Mac that I was having trouble on is still on MacOS 12 instead of 13.

I'm not sure if other things that are significant may differ between them too. I need to spend more time with it. But it may be a common python issue and not something special to this code, indeed, I'm not really sure.

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

I made a test branch for arm wheels for mac. Can you download the artifact.zip from here and try it?

https://github.com/internetarchive/archive-pdf-tools/actions/runs/4602441350

$ ls | grep arm | grep mac
archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl
archive_pdf_tools-1.5.3-cp38-cp38-macosx_11_0_arm64.whl
archive_pdf_tools-1.5.3-cp39-cp39-macosx_11_0_arm64.whl

I can also build for other mac os x versions if 11 is not the right one. I was building for macOS-10.15 before.

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

https://github.com/internetarchive/archive-pdf-tools/actions/runs/4608630138/jobs/8144663900

This one contains wheels for macOS 10.15 and 12 as well.

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

(Doesn't look like the macos 12 wheels actually made it)

from archive-pdf-tools.

jrochkind avatar jrochkind commented on June 11, 2024

Thank you! I'm sorry if I'm sending you on a distraction here, this may not be a priority. Some things:

  • MacOS 13 is the latest MacOS
  • I have two laptops, one with MacOS 12 and one with MacOS 13
  • pip install archive-pdf-tools on my MacOS 13 laptop did appear to succeed, it turns out. When I filed this ticket I hadn't tried yet, and didn't realize my main laptop wasn't the latest MacOS 13.
  • pip install archive-pdf-tools on my MacOS 12 laptop did not, as above (This is not the newest OS)
  • I am not very experienced at python, I may not have set up either/both of these machines properly for python, or there may be other differences between them than just MacOS version

Since I have demonstrated it installing successfully on one MacOS laptop, I'm inclined to think the problem might be mine, not yours. Although there may be things you can do to make it install more reliably, I'm no expert here.

Yeah, I am not able to find the artifact.zip on that Github Actions build page -- I'm not sure I'd know what to do with it even if I did. Not very python-comfortable here. If you'd like me to test a build artifact, and it's not totally obvious how, please provide instructions -- but I'm wondering if this is actually my problem, not yours.

I need to find more time to update my laptop to MacOS 13 (it's not old on purpose), and re-install dependencies etc, and maybe make sure I am setting up python in a best-practices way on MacOS, and see what happens.

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

It looks like the zip from the action is not visible for others, please find it attached in this message.

artifact.zip

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

Other than that, there are a few things to mention:

  1. You can install the wheel files in the zip like this: pip install --force-reinstall -U /path/to/wheel
  2. pip install pkgnamehere will typically try to fetch an online binary package based on your OS and architecture, and if no binary package exists, it will fall back to trying to build from source. Before you made this issue, I was not building any arm64 macOS wheels (aka Apple Silicon), and I still haven't uploaded these wheels to the place that pip gets them from.

If you can verify that these wheels work on macOS, then I can upload them to PyPI (where pip gets them from).
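To see whether a binary wheel exists for a given setup without triggering the source-build fallback described above, pip's `--only-binary :all:` option can be used. The sketch below only builds the command line; the helper name `binary_only_install_command` is hypothetical, and note that the option also applies to dependencies, so it fails if any dependency lacks a wheel.

```python
import shlex
import sys


def binary_only_install_command(package):
    """Build a pip command that refuses to compile anything from source."""
    # Using sys.executable ties pip to the interpreter that runs this script.
    return [sys.executable, "-m", "pip", "install", "--only-binary", ":all:", package]


if __name__ == "__main__":
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(binary_only_install_command("archive-pdf-tools")))
```

If pip reports that no matching distribution was found with this option, that suggests no wheel was published for the current Python version, OS, and architecture.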

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

The problem you were encountering before was definitely caused by an older Cython version, which I have raised in a separate branch where I am trying to build these wheels. When I know that the wheels work, I can merge those changes to the master branch, and include them in a new release.

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

For testing various versions, you could also consider setting up a virtualenv: https://docs.python.org/3/library/venv.html

It might be easier.
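For reference, a minimal sketch of creating such an environment from Python with the standard-library `venv` module (the command-line equivalent is `python -m venv <dir>`). The helper name `create_env` and the directory name are assumptions for illustration.

```python
import venv
from pathlib import Path


def create_env(target, with_pip=True):
    """Create a virtual environment at `target` and return its path."""
    # with_pip=True bootstraps pip into the new environment via ensurepip.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(target)
    return Path(target)


if __name__ == "__main__":
    env_dir = create_env("env-py-test")
    print(f"Created {env_dir}; activate it with: source {env_dir}/bin/activate")
```

Each environment is tied to the Python version that created it, so testing another Python version means creating a separate environment with that interpreter.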

from archive-pdf-tools.

jrochkind avatar jrochkind commented on June 11, 2024

Hi! Trying to spend more time on this to give you feedback!

OK, I am now using venv.

I'm sorry I'm new to python, so not totally sure how to test what you'd like me to test. Thank you for your tips earlier.

You can install the wheel files in the zip like this: pip install --force-reinstall -U /path/to/wheel

I have unzipped artifact.zip... I get a bunch of .whl files. I am supposed to manually identify which one is appropriate to my system?

I have an M1 Pro MacBook, I am running MacOS 12.6.3 (note this is still not the latest MacOS, the latest is MacOS 13).

If I understand the conventional naming right, it looks like you have wheels in the artifact.zip for macosx_10_9 and macosx_11_0 -- if those numbers are version numbers, neither of those are me, but maybe I'll try the most recent one, so _11_0?

I believe my M1 Pro MacBook is arm64 rather than x86_64. I still see three candidates and don't know how to choose among them. What's the difference between cp39, cp38, and cp310?

  • archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl
  • archive_pdf_tools-1.5.3-cp38-cp38-macosx_11_0_arm64.whl
  • archive_pdf_tools-1.5.3-cp39-cp39-macosx_11_0_arm64.whl

On the theory that bigger is better, maybe I'll try 310. So in an activated venv:

pip install --force-reinstall -U wheel_artifact/archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl
ERROR: archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl is not a supported wheel on this platform.

OK, not that one. Try the other two? Nope, same result.

Maybe the problem is that I'm on MacOS 12? Sorry I'm really flying by touch here, I don't know what I'm doing. If you want me to choose a different .whl file, just let me know which one and I can try it!

I'm still not convinced there is necessarily anything wrong you had to fix, the problem might have been my system from the start?

Now that I am more intentional about exactly what version of python I am using (3.11.2) and I'm using a venv, let me try an official release again:

pip install archive-pdf-tools

Hm, alas that one still failed, on I think the same error, #include "longintrepr.h".

I did get the install to work on my personal MacBook though -- which was on MacOS 13. I wonder if I upgraded this laptop to MacOS 13 if it would just work. (Sorry I will not be downgrading to MacOS 11 or 10!). Or if there's something else that differs between this laptop and my personal one where I think pip install archive-pdf-tools worked. I'm sorry, I don't have time this week to try every possible combination of everything (or to upgrade my laptop this week), but I can try a few more things if you'd like!

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

Can you tell me which Python version you are using on MacOS 12? python --version will tell you. The cp3x part of the wheel name corresponds to the CPython version.
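As a rough illustration of how the wheel name relates to the interpreter, here is a simplified sketch that splits a wheel filename into its dash-separated parts and compares the Python tag with a CPython version. The function names are hypothetical, and this is not how pip itself decides compatibility (pip also handles tags such as `abi3` and `py3-none-any`).

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into name, version, and its three tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # An optional build tag sits between the version and the Python tag.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel filename layout: {filename}")
    name, version, python_tag, abi_tag, platform_tag = parts
    return {
        "name": name,
        "version": version,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }


def matches_interpreter(filename, version_info=None):
    """Return True if the wheel's Python tag names this CPython version."""
    major, minor = (version_info or sys.version_info)[:2]
    return parse_wheel_filename(filename)["python_tag"] == f"cp{major}{minor}"


if __name__ == "__main__":
    wheel = "archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl"
    print(parse_wheel_filename(wheel))
    print("matches running interpreter:", matches_interpreter(wheel))
```

With this reading, a `cp310` wheel is meant for CPython 3.10, which is consistent with the "not a supported wheel on this platform" error when it is installed under Python 3.11.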

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

Ah, sorry, I just saw that you told me what version you are using. I don't think I build wheels for 3.11 yet, let me see if I can do that.

from archive-pdf-tools.

jrochkind avatar jrochkind commented on June 11, 2024

Thanks! This is all just me trying out demos, please know that whatever python version I am using is just what I happen to be using right now to try things out, it's not a commitment to using it forever or what have you!

That was just me installing the "latest" python because I had to pick one and it seemed like a good idea?

If you really need to build a wheel for every possible version of python (combined with OS etc!), that seems pretty untenable!

I can also go back to python 3.10 for the purpose of testing if it's easier for you! I don't totally understand what we are testing! I didn't pick 3.11 with intention, I just installed the latest version thinking that was the thing to do!

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

Yeah, it gets a little tedious, but 3.7 - 3.11 is not too bad. I'm almost ready to get a build for 3.11, but unfortunately a bug in lxml has had me pin specific versions on lxml for archive-hocr-tools and these are not available for 3.11, so I need to figure out how to make this work.

If you could give it a try on 3.10 if that is not too much work, that would be great. You would use archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl with that.

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

(btw, I am running on the assumption that macosx_11_0 would work on 11.0+)
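That assumption can be written down as a small check: treat the version in the platform tag as a minimum macOS version and require the architecture to match. This is a simplified sketch with hypothetical function names; it ignores cases such as `universal2` wheels, and the host values are passed in explicitly rather than detected.

```python
import re


def parse_macos_platform_tag(tag):
    """Split a tag like macosx_11_0_arm64 into ((11, 0), 'arm64')."""
    match = re.fullmatch(r"macosx_(\d+)_(\d+)_(\w+)", tag)
    if match is None:
        raise ValueError(f"not a macOS platform tag: {tag}")
    major, minor, arch = match.groups()
    return (int(major), int(minor)), arch


def platform_tag_fits(tag, host_version, host_arch):
    """Return True if the host is at least the tagged macOS version and the same arch."""
    min_version, arch = parse_macos_platform_tag(tag)
    return arch == host_arch and tuple(host_version[:2]) >= min_version


if __name__ == "__main__":
    # Host values taken from the report in this thread: macOS 12.6.3 on arm64.
    print(platform_tag_fits("macosx_11_0_arm64", (12, 6, 3), "arm64"))
```

Under this reading, a `macosx_11_0_arm64` wheel would be acceptable on macOS 12.6.3 on Apple Silicon but not on an Intel Mac or on macOS 10.15.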

from archive-pdf-tools.

jrochkind avatar jrochkind commented on June 11, 2024

OK, thanks!

I had to start over in a new directory with a new venv, cause I didn't know how else to do it (not sure if there is another way to do it!)

Then... it seems to have installed!

It did warn:

DEPRECATION: lxml is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at pypa/pip#8559

It installed enough to run recode_pdf --version and get 1.5.3 anyway! (Took almost 3 seconds for it to be able to print the version number, I guess it just had to load a lot of code first, and this is expected).

MacOS 12.6.3, Python 3.10.10, Apple M1 Pro chip, archive_pdf_tools-1.5.3-cp310-cp310-macosx_11_0_arm64.whl

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

Great news, thanks for testing the MacOS ARM64 version on Python 3.10

I didn't raise the version further, so getting 1.5.3 makes sense for this test. I have tried to build a version for Python 3.11 here, but it doesn't depend on archive-hocr-tools, so you will have to install that with pip manually, if you'd be up for another test.

artifact.zip

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

(From the above archive, you'd need archive_pdf_tools-1.5.3-cp311-cp311-macosx_11_0_arm64.whl)

from archive-pdf-tools.

jrochkind avatar jrochkind commented on June 11, 2024

Okay! in a venv using python 3.11.2. Still on a M1 Pro MacBook running MacOS 12.6.3.

pip install archive-hocr-tools
pip install --force-reinstall -U wheel_artifacts/archive_pdf_tools-1.5.3-cp311-cp311-macosx_11_0_arm64.whl

I'm afraid that did not install.

console output
      clang -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.sdk -DCYTHON_CLINE_IN_TRACEBACK=0 -Isrc -Isrc/lxml/includes -I/Users/jrochkind/code/archive-pdf-tools-311/env/include -I/opt/homebrew/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/include/python3.11 -c src/lxml/etree.c -o build/temp.macosx-12-arm64-cpython-311/src/lxml/etree.o -w -flat_namespace
      src/lxml/etree.c:261877:23: error: no member named 'exc_type' in 'struct _err_stackitem'
          while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
                  ~~~~~~~~  ^
      src/lxml/etree.c:261877:53: error: no member named 'exc_type' in 'struct _err_stackitem'
          while ((exc_info->exc_type == NULL || exc_info->exc_type == Py_None) &&
                                                ~~~~~~~~  ^
      src/lxml/etree.c:261891:23: error: no member named 'exc_type' in 'struct _err_stackitem'
          *type = exc_info->exc_type;
                  ~~~~~~~~  ^
      src/lxml/etree.c:261893:21: error: no member named 'exc_traceback' in 'struct _err_stackitem'
          *tb = exc_info->exc_traceback;
                ~~~~~~~~  ^
      src/lxml/etree.c:261907:26: error: no member named 'exc_type' in 'struct _err_stackitem'
          tmp_type = exc_info->exc_type;
                     ~~~~~~~~  ^
      src/lxml/etree.c:261909:24: error: no member named 'exc_traceback' in 'struct _err_stackitem'
          tmp_tb = exc_info->exc_traceback;
                   ~~~~~~~~  ^
      src/lxml/etree.c:261910:15: error: no member named 'exc_type' in 'struct _err_stackitem'
          exc_info->exc_type = type;
          ~~~~~~~~  ^
      src/lxml/etree.c:261912:15: error: no member named 'exc_traceback' in 'struct _err_stackitem'
          exc_info->exc_traceback = tb;
          ~~~~~~~~  ^
      src/lxml/etree.c:261994:30: error: no member named 'exc_type' in 'struct _err_stackitem'
              tmp_type = exc_info->exc_type;
                         ~~~~~~~~  ^
      src/lxml/etree.c:261996:28: error: no member named 'exc_traceback' in 'struct _err_stackitem'
              tmp_tb = exc_info->exc_traceback;
                       ~~~~~~~~  ^
      src/lxml/etree.c:261997:19: error: no member named 'exc_type' in 'struct _err_stackitem'
              exc_info->exc_type = local_type;
              ~~~~~~~~  ^
      src/lxml/etree.c:261999:19: error: no member named 'exc_traceback' in 'struct _err_stackitem'
              exc_info->exc_traceback = local_tb;
              ~~~~~~~~  ^
      src/lxml/etree.c:262185:26: error: no member named 'exc_type' in 'struct _err_stackitem'
          tmp_type = exc_info->exc_type;
                     ~~~~~~~~  ^
      src/lxml/etree.c:262187:24: error: no member named 'exc_traceback' in 'struct _err_stackitem'
          tmp_tb = exc_info->exc_traceback;
                   ~~~~~~~~  ^
      src/lxml/etree.c:262188:15: error: no member named 'exc_type' in 'struct _err_stackitem'
          exc_info->exc_type = *type;
          ~~~~~~~~  ^
      src/lxml/etree.c:262190:15: error: no member named 'exc_traceback' in 'struct _err_stackitem'
          exc_info->exc_traceback = *tb;
          ~~~~~~~~  ^
      src/lxml/etree.c:264391:20: error: no member named 'exc_type' in 'struct _err_stackitem'
          t = exc_state->exc_type;
              ~~~~~~~~~  ^
      src/lxml/etree.c:264393:21: error: no member named 'exc_traceback' in 'struct _err_stackitem'
          tb = exc_state->exc_traceback;
               ~~~~~~~~~  ^
      src/lxml/etree.c:264394:16: error: no member named 'exc_type' in 'struct _err_stackitem'
          exc_state->exc_type = NULL;
          ~~~~~~~~~  ^
      fatal error: too many errors emitted, stopping now [-ferror-limit=]
      20 errors generated.
      Compile failed: command '/usr/bin/clang' failed with exit code 1
      creating var
      creating var/folders
      creating var/folders/_1
      creating var/folders/_1/89lqv5550mx2tggl22z27_p18516fz
      creating var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T
      cc -I/usr/include/libxml2 -c /var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/xmlXPathInit6_cfwx5a.c -o var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/xmlXPathInit6_cfwx5a.o
      cc var/folders/_1/89lqv5550mx2tggl22z27_p18516fz/T/xmlXPathInit6_cfwx5a.o -lxml2 -o a.out
      error: command '/usr/bin/clang' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> lxml

(I think what I'm learning is it's best not to use the very latest python release maybe?)

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

Thanks for testing. I will get Python 3.11.x installed on my laptop and see if with the latest lxml the grave bugs I was seeing are gone. If that is the case, then we increase the requirement for hocr tools and then we should be all set for 3.11.x.

Support wise, it's probably also a matter of this project not having that many users on Python 3.11. :)

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

I just checked, with lxml 4.9.2 the bug is still there: https://bugs.launchpad.net/lxml/+bug/1970741 - I'll see what I can do, but meanwhile, yeah, probably better to use 3.10.

from archive-pdf-tools.

jrochkind avatar jrochkind commented on June 11, 2024

Hm, bug reported to lxml a year ago, there doesn't seem to be anyone in a hurry to fix it.

The bug on lxml doesn't mention python 3.11 specifically... is the issue that in order to use lxml on python 3.11, you need to use a newer version of lxml that exhibits the bug, while on 3.10 you can use an older version of lxml that does not?

This is a bit irritating indeed!

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

That's right, there does not seem to be an lxml 4.6.5 build for Python 3.11, and all the newer versions are currently broken.

from archive-pdf-tools.

MerlijnWajer avatar MerlijnWajer commented on June 11, 2024

The latest 1.4.x branch and master now ought to work with Python 3.11 as well. Please give it a go if you can.

from archive-pdf-tools.
