Coder Social home page Coder Social logo

Cannot infer top-level imports from packages installed by PDM w/`install.cache=true` (aka. symlinked packages) about importlib_metadata HOT 10 CLOSED

jherland avatar jherland commented on June 16, 2024
Cannot infer top-level imports from packages installed by PDM w/`install.cache=true` (aka. symlinked packages)

from importlib_metadata.

Comments (10)

jaraco avatar jaraco commented on June 16, 2024 1

Same with pip install of colorama - no top_level.txt.

Now I understand better. colorama isn't built with the top_level.txt metadata, so importlib metadata falls back to inferring from the RECORD file, which doesn't work in a install.cache install by PDM.

from importlib_metadata.

jaraco avatar jaraco commented on June 16, 2024 1

I have confirmed that by adding a top_level.txt does allow importlib metadata to resolve the package:

<user>@7be185f19e0f / # cat > .venv/lib/python3.11/site-packages/colorama-0.4.6.dist-info/top_level.txt
colorama
<user>@7be185f19e0f / # py -c "import importlib.metadata as md; print(md.packages_distributions()['colorama'])"
['colorama']

So, pdm could support this case by generating that top_level.txt from the linked package's RECORD (similar to what importlib metadata does.

I've also observed that pdm does in fact include the top_level.txt in the installed package's metadata if it was present in the wheel.

I see three options:

  • importlib metadata doesn't support this form of installation at all and it would be up to PDM to generate the top_level.txt for packages that don't supply it.
  • importlib metadata could infer that any symlink in RECORD is also importable and thus infer those top-level names as it does for regular files and directories.
  • importlib metadata could add first-class support for PDM's technique of installation and use the RECORD from the linked package for resolving files (and thus top-level names).

from importlib_metadata.

jaraco avatar jaraco commented on June 16, 2024 1

Fixed in #453 and released as v6.7.0.

from importlib_metadata.

jaraco avatar jaraco commented on June 16, 2024

The packaging spec is weak about RECORD and top-level.txt. For that matter, it's probably also weak about the relationship between installed assets and linked assets.

My first instinct is that PDM should supply a top-level.txt, so that importlib metadata doesn't have to infer it. Is that feasible?

I also wonder if the inference shouldn't be updated to honor just a directory. That is, the name colorama in the RECORD does suggest a package named colorama, even if it doesn't contain any additional files, but only if it's a directory. Unfortunately, RECORD doesn't indicate if it's a directory.

I'm reluctant to go further into the inference by inspecting the environment because that process doesn't generalize well. Although PackagePath.locate and thus Distribution.locate_file return paths on the file system, they're not generally the canonical location of the path (which could be virtual, such as in-memory or in the cloud). I want to be cautious about adding more logic that relies on the assumption that a package is found on the local disk.

from importlib_metadata.

jherland avatar jherland commented on June 16, 2024

My first instinct is that PDM should supply a top-level.txt, so that importlib metadata doesn't have to infer it. Is that feasible?

I don't disagree, but since I'm developing a tool that is run in the context of a user's project that happens to be PDM-managed, I'm not really in a position to change how the user manages their project. Also, Isn't the top_level.txt generated at wheel build time? As such, I'm not sure PDM can be expected to manufacture a top_level.txt when installing a wheel that was built elsewhere and downloaded via PyPI?

Unfortunately, RECORD doesn't indicate if it's a directory.

True, although if it were a regular file, it should have an associated hash (and size). Quoting https://packaging.python.org/en/latest/specifications/binary-distribution-format/#the-dist-info-directory:
"every file except RECORD, which cannot contain a hash of itself, must include its hash".

Is it too far-fetched to deduce from this that a RECORD entry without a hash (and size) (except RECORD itself) therefore must point to a directory (or more generally: something that might turn out to be importable)?

I want to be cautious about adding more logic that relies on the assumption that a package is found on the local disk.

Understood.

Do you consider PDM's behavior here a bug, and would it be appropriate to raise an issue with PDM about this? I suspect it may be hard to argue this given that Python itself apparently has no problem importing packages that are installed this way.

from importlib_metadata.

jaraco avatar jaraco commented on June 16, 2024

I tried replicating the environment, but I'm running up against an issue with pdm. It seems to get confused if there's more than one Python interpreter on the system:

 draft $ docker build --no-cache .
[+] Building 5.6s (7/7) FINISHED                                                                                                                
 => [internal] load .dockerignore                                                                                                          0.0s
 => => transferring context: 2B                                                                                                            0.0s
 => [internal] load build definition from Dockerfile                                                                                       0.0s
 => => transferring dockerfile: 134B                                                                                                       0.0s
 => [internal] load metadata for docker.io/jaraco/multipy-tox:latest                                                                       0.0s
 => CACHED [1/4] FROM docker.io/jaraco/multipy-tox                                                                                         0.0s
 => [2/4] RUN pipx install pdm                                                                                                             3.4s
 => [3/4] RUN pdm init --non-interactive                                                                                                   1.5s 
 => ERROR [4/4] RUN pdm add colorama                                                                                                       0.6s 
------                                                                                                                                          
 > [4/4] RUN pdm add colorama:                                                                                                                  
#0 0.370 Adding group default to lockfile                                                                                                       
#0 0.376 Adding packages to default dependencies: colorama                                                                                      
#0 0.392 The saved Python interpreter doesn't match the project's requirement. Trying to find another one.
#0 0.407 python.use_venv is on, creating a virtualenv for this project...
#0 0.600 [VirtualenvCreateError]: Can't resolve python interpreter
------
Dockerfile:4
--------------------
   2 |     run pipx install pdm
   3 |     run pdm init --non-interactive
   4 | >>> run pdm add colorama
   5 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c pdm add colorama" did not complete successfully: exit code: 1

Although pdm is installed in a virtualenv (using pipx) on Python 3.11, the virtual env it creates in ./.venv is for Python 3.12, even though py and python and python3 all refer to Python 3.11.

Aha. It does appear that if I pass --python 3.11 to pdm init, the errors go away.

from jaraco/multipy-tox
run pipx install pdm
run pdm init --non-interactive --python 3.11
run pdm add colorama

from importlib_metadata.

jaraco avatar jaraco commented on June 16, 2024

Unfortunately, even after attempting to replicate the indicated failure, I cannot. The symlinks and record are not being created:

 draft $ docker run -it @$(docker build -q .) cat .venv/lib/python3.11/site-packages/colorama-0.4.6.dist-info/RECORD
colorama/__init__.py,sha256=wePQA4U20tKgYARySLEC047ucNX-g8pRLpYBuiHlLb8,266
colorama/ansi.py,sha256=Top4EeEuaQdBWdteKMEcGOTeKeF19Q-Wo_6_Cj5kOzQ,2522
colorama/ansitowin32.py,sha256=vPNYa3OZbxjbuFyaVo0Tmhmy1FZ1lKMWCnT7odXpItk,11128
colorama/initialise.py,sha256=-hIny86ClXo39ixh5iSCfUIa2f_h_bgKRDW7gqs-KLU,3325
colorama/win32.py,sha256=YQOKwMTwtGBbsY4dL5HYTvwTeP9wIQra5MvPNddpxZs,6181
colorama/winterm.py,sha256=XCQFDHjPi6AHYNdZwy0tA02H-Jh48Jp-HvCjeLeLp3U,7134
colorama/tests/__init__.py,sha256=MkgPAEzGQd-Rq0w0PZXSX2LadRWhUECcisJY8lSrm4Q,75
colorama/tests/ansi_test.py,sha256=FeViDrUINIZcr505PAxvU4AjXz1asEiALs9GXMhwRaE,2839
colorama/tests/ansitowin32_test.py,sha256=RN7AIhMJ5EqDsYaCjVo-o4u8JzDD4ukJbmevWKS70rY,10678
colorama/tests/initialise_test.py,sha256=BbPy-XfyHwJ6zKozuQOvNvQZzsx9vdb_0bYXn7hsBTc,6741
colorama/tests/isatty_test.py,sha256=Pg26LRpv0yQDB5Ac-sxgVXG7hsA1NYvapFgApZfYzZg,1866
colorama/tests/utils.py,sha256=1IIRylG39z5-dzq09R_ngufxyPZxgldNbrxKxUGwGKE,1079
colorama/tests/winterm_test.py,sha256=qoWFPEjym5gm2RuMwpf3pOis3a5r_PJZFCzK254JL8A,3709
colorama-0.4.6.dist-info/METADATA,sha256=e67SnrUMOym9sz_4TjF3vxvAV4T3aF7NyqRHHH3YEMw,17158
colorama-0.4.6.dist-info/WHEEL,sha256=cdcF4Fbd0FPtw2EMIOwH-3rSOTUdTCeOSXRMD1iLUb8,105
colorama-0.4.6.dist-info/licenses/LICENSE.txt,sha256=ysNcAmhuXQSlpxQL-zs25zrtSWZW6JEQLkKIhteTAxg,1491
colorama-0.4.6.dist-info/RECORD,,

from importlib_metadata.

jaraco avatar jaraco commented on June 16, 2024

Found my mistake. Through all my troubleshooting, I'd missed the config setting. After adding the setting to the Dockerfile, I'm able to replicate the issue:

from jaraco/multipy-tox
run pipx install pdm
run pdm config install.cache true
run pdm init --non-interactive --python 3.11
run pdm add colorama
 draft $ docker run -it @$(docker build -q .) cat .venv/lib/python3.11/site-packages/colorama-0.4.6.dist-info/RECORD
colorama-0.4.6.dist-info/METADATA,sha256=e67SnrUMOym9sz_4TjF3vxvAV4T3aF7NyqRHHH3YEMw,17158
colorama-0.4.6.dist-info/WHEEL,sha256=cdcF4Fbd0FPtw2EMIOwH-3rSOTUdTCeOSXRMD1iLUb8,105
colorama-0.4.6.dist-info/licenses/LICENSE.txt,sha256=ysNcAmhuXQSlpxQL-zs25zrtSWZW6JEQLkKIhteTAxg,1491
colorama-0.4.6.dist-info/REFER_TO,sha256=78po-M2GCc5fiAv64DwmnzCULroRpLQBU3QvCI4MXGI,57
colorama-0.4.6.dist-info/RECORD,,
colorama,,

from importlib_metadata.

jaraco avatar jaraco commented on June 16, 2024

I notice that even the cached wheel doesn't have a top_level.txt in its metadata:

<user>@1012b0756cc9 / # ls ~/.cache/pdm/packages/colorama-0.4.6-py2.py3-none-any/lib/colorama-0.4.6.dist-info/
METADATA  RECORD  WHEEL  licenses

And even if I install colorama without install.cache, it doesn't have a top_level.txt.

from importlib_metadata.

jaraco avatar jaraco commented on June 16, 2024

Isn't the top_level.txt generated at wheel build time? As such, I'm not sure PDM can be expected to manufacture a top_level.txt when installing a wheel that was built elsewhere and downloaded via PyPI?

I've confirmed that file is generated by setuptools (and possibly other tools) at build time as egg-info and (probably) carried over to the wheel and dist-info by wheel.

I don't think that precludes PDM from generating one, though it does beg the question - does this failure indicate that maybe these packages should be building with tools that provide this useful metadata? I'm not sure we want to add a fallback to a fallback to support these packages. Maybe PDM should warn when installing a package without top_level.txt into an environment with install.cache=true so that the user can report to that package that it isn't supplying the top_level.txt file.

Unfortunately, RECORD doesn't indicate if it's a directory.

True, although if it were a regular file, it should have an associated hash (and size). Quoting https://packaging.python.org/en/latest/specifications/binary-distribution-format/#the-dist-info-directory: "every file except RECORD, which cannot contain a hash of itself, must include its hash".

Is it too far-fetched to deduce from this that a RECORD entry without a hash (and size) (except RECORD itself) therefore must point to a directory (or more generally: something that might turn out to be importable)?

That seems reasonable, and from my perspective a modest tweak to the existing fallback, so more palatable.

Do you consider PDM's behavior here a bug, and would it be appropriate to raise an issue with PDM about this? I suspect it may be hard to argue this given that Python itself apparently has no problem importing packages that are installed this way.

Not precisely a bug, but definitely a missed expectation caused by challenging assumptions about how packages are installed. By creating a new abstraction layer between the metadata in site-packages and the package's original metadata, it's taken on the burden of owning that abstraction. That is, it's written a new RECORD to represent the virtual package and creating symlinks of all top-level names to the original package. That means importlib metadata no longer has a clear model to follow for "files supplied by this package" - should it supply the files actually installed in site-packages (the virtual package and its links) or should it supply files represented by the original package (in the cache)?

Both PDM and importlib metadata are operating outside the bounds of specifications here, providing best-effort solutions that nominally work, but which make assumptions and attempt to provide a reasonable experience in an ill-defined space.

from importlib_metadata.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.