Annotate ImportGraph.get_import_details with a TypedDict

It would be great to improve the type annotations on get_import_details by using a TypedDict.

This will help downstream packages such as Import Linter better work with the results of that method.

Note: TypedDict is only available in typing from Python 3.8, so for the time being we'll need to import it from typing_extensions.

Fix importlib.util issue

In src/grimp/adaptors/packagefinder.py, should be import importlib.util.

Change downstream/upstream to importer/imported

determine_package_directory doesn't support path as argument

When creating a new graph, the determine_package_directory in the ImportLibPackageFinder adaptor doesn't support the path argument. Which means if you try to create a new Graph via grimp.build_graph outside the directory containing the package, it will throw a ValueError. Programmatically changing the directory doesn't appear to address the issue, which may be due to peculiarities of importlib.utils.find_spec.

Potential Solutions:

Demonstrate a working example of programmatically changing a directory (via shell commands, os.chdir, ctx.cd (invoke), or other systems). Maybe it's just my understanding of the shell?
Add a filepath argument to build_graph and the importlib.utils.find_spec call in determine_package_directory.

as_package parameter for direct import methods

find_modules_directly_imported_by('foo.bar', as_package=True)
find_modules_that_directly_import('foo.bar', as_package=True)
direct_import_exists(importer='foo.bar', imported='foo.baz', as_packages=True)

RuntimeError: Set changed size during iteration

Certain methods return sets from the graph's internals, e.g.

find_modules_directly_imported_by

If the graph is mutated while iterating through these sets, this can can a RuntimeError, e.g.

for imported_module in graph.find_modules_directly_imported_by("foo"):
   graph.remove_import(importer="foo", imported=imported_module)

This can be mitigated by copying the set before iterating over it, e.g.

for imported_module in graph.find_modules_directly_imported_by("foo").copy():
   graph.remove_import(importer="foo", imported=imported_module)

But probably it would be better for the method to return a copy.

Better support for multiline import statements

>>> g.get_import_details(importer='django.core.checks', imported='django.core.checks.messages')
[{'importer': 'django.core.checks', 'imported': 'django.core.checks.messages', 'line_number': 1, 'line_contents': 'from .messages import ('}]

Improve handling of subpackages passed to `build_graph`

build_graph was built with the intention of passing top level packages only. It does (unintentionally) accept subpackages - but the behaviour is not coherent.

We should decide whether to formally support this, and if so how it should behave.

In the current version of Grimp, compare building the graph using the root, and doing the same with a subpackage. Here's the root first:

>>> graph = grimp.build_graph("django")
>>> graph.find_modules_directly_imported_by("django.db.models.base")
{'django.db.models.deletion', 'django.db', 'django.utils.encoding', 'django.core.exceptions', 'django.utils.hashable', 'django.db.models.signals', 'django.db.models.constants', 'django.db.models.utils', 'django', 'django.core.checks', 'django.db.models.fields.related', 'django.db.models.constraints', 'django.conf', 'django.db.models', 'django.db.models.functions', 'django.db.transaction', 'django.utils.text', 'django.db.models.manager', 'django.db.models.options', 'django.db.models.query', 'django.utils.translation', 'django.apps'}

All well and good. This time we build using a subpackage. Note we use include_external_packages so that we can see any imports happening outside that subpackage that are still in Django:

>>> graph = grimp.build_graph("django.db", include_external_packages=True)

The first undesirable behaviour is that with the modules in the graph are not under the django namespace but rather the subpackage's namespace:

>>> sorted(graph.modules)[20:25]
['db.backends.base.introspection', 'db.backends.base.operations', 'db.backends.base.schema', 'db.backends.base.validation', 'db.backends.ddl_references']

Secondly (because of this) the imports don't correctly map on to the models:

>>> graph.find_modules_directly_imported_by("db.models.base")
{'django', 'functools', 'copy', 'warnings', 'inspect', 'itertools'}

This should be including the imports within django.db, such as django.db.models.deletion.

Course of action

I think the best thing should probably be to raise an exception if a subpackage (i.e. anything with a .) is passed to build_graph. Although this means users will be forced to build a larger graph than they may need (if they're only interested in a subpackage), it keeps the API simple, which I think is more important.

Open to other opinions...

return_import_paths parameter

We want the graph to make it easy to inspect dependencies between modules as packages rather than just as individual modules.

This needs more thought.

Correct typo in usage example

https://grimp.readthedocs.io/en/v1.0b5/usage.html#building-the-graph

graph = grimp.build_graph('name_)

Should be

graph = grimp.build_graph('mypackage')

Set up mypy

Add support for detecting whether an import is only made during type checking

Specifically, be able to detect when imports are used under typing.TYPE_CHECKING.

Raising this in Grimp, as it appears the functionality isn't in place, and so can't be implemented with a custom contract.

While I understand that this is solved by introducing an interface, this results in a lot of boilerplate code when all you're seeking is basic static analysis + hints in an editor.

Given a Contract:

name = Prevent cross-reliance on API Clients
type = independence
modules =
    ulama.api_client_1
    ulama.api_client_2
    ulama.remote_transfer

and modules of:

# ulama/api_client_1.py
class ApiClient1: ...

# ulama/api_client_2.py
class ApiClient2: ...

# ulama/remote_transfer.py
from typing import TYPE_CHECKING
  from .api_client_1 import ApiClient1
  from .api_client_2 import ApiClient2

class RemoteTransfer:
  from_client: ApiClient1
  to_client: ApiClient2

I would expect this to pass the contract.

to reiterate, understand it'd be better to introduce a stop gap, but feel there's a use-case for keeping things simple in the code, while still being able to offer an external contract to prevent mistakes.

Bug with _trim_to_known_module

For example:

grimp.build_graph('django')
...
ValueError: Could not trim django.core.messages.CRITICAL.

Warn instead of raising an exception on import failure

Example:

graph = grimp.build_graph('django')

ValueError: Could not trim django.contrib.auth.management.commands.changepassword.

What's happening here is it can't find the module because there is no __init__.py file within django.contrib.auth.management.commands. Instead, we should warn that we couldn't find the module, and skip it.

Ability to count dependencies in the graph

This is needed if we're going to use it in Layer Linter.

Adding an import should always set import_details

If you add an import using add_import without supplying a line number and contents, then get_import_details will not include it in the results (contrary to the documentation).

Support for building stdlib packages

Currently, grimp fails to build a graph of a stdlib package:

>>> import grimp
>>> g = grimp.build_graph('sys')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/grimp/src/grimp/application/usecases.py", line 26, in build_graph
    file_system=file_system,
  File ".../lib/grimp/src/grimp/adaptors/packagefinder.py", line 23, in determine_package_directory
    assert package_filename.origin  # For type checker.
AssertionError

Functional tests

Support for namespace packages

Python provides the ability to organize a package so that it exists in multiple directory hierarchies. These are known as 'namespace packages'.

It is not a priority to support such packages in Grimp for the time being, though pull requests would be considered!

Feature request: Expose cycles as part of the public API

Presently I would like to detect dependency cycles in my graphs which requires either access to private use of networkx

from typing import Iterator, List

import networkx
from grimp.adaptors.graph import ImportGraph


def dependency_cycles(graph: ImportGraph) -> Iterator[List[str]]:
    # Violates private internals of grimp by accessing private implementation
    cycles = networkx.simple_cycles(graph._networkx_graph)

    for cycle in cycles:
        if len(cycle) > 1:
            yield cycle

Alternatively I would need to re-implement topsort on-top of walking through grimp.

Benefits: Less redundant code, less violations of internals.
Drawbacks: Your project adds a new algo/feature to the public API.

Caching

This would make a huge difference to the usability of Import Linter for large graphs.

Better handling of modules that are not in the graph

For example:

File ".../grimp/adaptors/graph.py", line 201, in find_shortest_chain
    target=imported))
  File ".../networkx/algorithms/shortest_paths/generic.py", line 170, in shortest_path
    paths = nx.bidirectional_shortest_path(G, source, target)
  File ".../networkx/algorithms/shortest_paths/unweighted.py", line 223, in bidirectional_shortest_path
    raise nx.NodeNotFound(msg.format(source, target))
networkx.exception.NodeNotFound: Either source myproject.foo or target myproject.bar is not in G

This should raise a specific grimp exception class ModuleNotInGraph with a nicer error message.

depth parameter

It would be useful to be able to specify the module depth of certain results.

For example:

find_upstream_modules('mypackage.foo') == {
    'mypackage.bar.one',
    'mypackage.baz.two.green',
    'mypackage.baz.two.blue',
}

# Possibly this should error as it's not very useful?
find_upstream_modules('mypackage.foo', depth=1) == {
    'mypackage',
}

find_upstream_modules('mypackage.foo', depth=2) == {
    'mypackage.bar',
    'mypackage.baz',
}

find_upstream_modules('mypackage.foo', depth=3) == {
    'mypackage.bar.one',
    'mypackage.baz.two.green',
    'mypackage.baz.two.blue',
}

Add examples page to readme

It would be good to have an example of how to use it.

API for getting details about a module

E.g. if it's squashed (or external?)

Raise exception if non top level package is passed to build_graph

Hide domain.valueobjects

The domain level classes (e.g. Modules) should be pushed down so that the Graph does not require the user to interact with them.

Raise exception if calling methods on non-existent modules.

E.g. graph.find_children('mypackage.nonexistent') just returns an empty set - it should raise an exception.

Create minimal README

Better handling of installable packages without init.py

If you have a package that's installed via a setup.py, but it doesn't have an __init__.py file at the top level, you get the following error:

...
File ".../grimp/adaptors/packagefinder.py", line 23, in determine_package_directory
    assert package_filename.origin  # For type checker.
AssertionError

The simple fix at the user's end is to add the missing __init__.py file. But it's a confusing error message.

The problem is, it's assuming that the origin attribute on the returned ModuleSpec will always be set. That's not true, see the docs: https://docs.python.org/3.6/library/importlib.html#importlib.machinery.ModuleSpec.origin

Feature request: Scan packages from external project

I have a project where I want to scan a target directory of another project that is not in the project grimp is installed in. I was reading through the code and it appears to be possible as the import graph is determined by AST parsing of files in a directory. Has a feature like this been considered or even seen as desirable to have (in the scope of grimp's goals)?

I was thinking a lot could be possible if we changed some behaviour inside def build_graph(

grimp/src/grimp/application/usecases.py

Line 15 in 8b4e33a

def build_graph(

or provided a different "usecase" that is filesystem based for the parameter. What do you think?

find_shortest_chains method

graph.find_shortest_chains(importer, imported)

The as_packages equivalent to find_shortest_chain. This would always treat the importer and imported as packages.

Warn when encountering a module with a dot in the name

As reported here, if there is a module with a dot in the name, Grimp will raise a FileNotFoundError. While these modules are not validly named Python modules, it should still be possible to build a graph, ignoring the module in question.

Probably Grimp should log a warning in this case.

Sort out readme rendering on Pypi and release

Excludable paths

It needs to be possible to exclude certain paths from the graph when testing for chains.

Reverse import arrow notation

Arrows should point from importer -> imported. This is the more conventional notational, and it is also nicer when including alongside UML implementation arrows (as both arrows face in the same direction).

Ability to squash a module in the graph

Ability to remove a module from the graph

This would allow features like this: seddonym/import-linter#44

Set up travis

Set up coverage

Add support for external packages

Set up flake8

Correct documentation for find_shortest_path

The order of the tuple is described wrongly.

While doing this, it's probably worth thinking about the ordering in general around upstream/downstream, importer/imported and make sure it's standardised and intuitive.

Starred imports

g = grimp.build_graph('networkx')

Gives:
ValueError: Could not trim networkx.layout.*.

Set up tox

Failure to handle some modules on Windows due to encoding

Here is how to reproduce the error:

Use Windows
Install grimp.
Create a python package test_encoding. Inside __init__.py add this content : print('變量').
Open a python shell and do:

>>> import grimp
>>> graph = grimp.build_graph('test_encoding')

Fails with :

  File "C:\Users\adam\Anaconda3\lib\site-packages\grimp\adaptors\filesystem.py", line 26, in read
    return file.read()
  File "C:\Users\adam\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 12: character maps to <undefined>

Feature: Add a use-case to handle multiple projects at run-time.

I've been working with grimp to experiment with mapping project dependencies and structures over multiple projects at run-time, including previous versions of a project to watch changes in dependencies over time.

I have a fork that lets me do this but if there is interest in the broader collection of people using grimp it might be useful to make it a proper part of the project. This issue is mostly here to anchor any interest in a feature like that and discuss if anyone else wants to use that kind of feature.

seddonym / grimp Goto Github PK

grimp's People

Contributors

Stargazers

Watchers

Forkers

grimp's Issues

Course of action

Recommend Projects

Recommend Topics

Recommend Org