seddonym / grimp Goto Github PK
View Code? Open in Web Editor NEWBuilds a graph of a Python project's internal dependencies.
Home Page: https://grimp.readthedocs.io
License: BSD 2-Clause "Simplified" License
Builds a graph of a Python project's internal dependencies.
Home Page: https://grimp.readthedocs.io
License: BSD 2-Clause "Simplified" License
It would be great to improve the type annotations on get_import_details by using a TypedDict.
This will help downstream packages such as Import Linter better work with the results of that method.
Note: TypedDict is only available in typing
from Python 3.8, so for the time being we'll need to import it from typing_extensions
.
In src/grimp/adaptors/packagefinder.py, should be import importlib.util
.
When creating a new graph, the determine_package_directory
in the ImportLibPackageFinder
adaptor doesn't support the path argument. Which means if you try to create a new Graph via grimp.build_graph
outside the directory containing the package, it will throw a ValueError
. Programmatically changing the directory doesn't appear to address the issue, which may be due to peculiarities of importlib.utils.find_spec
.
Potential Solutions:
os.chdir
, ctx.cd
(invoke), or other systems). Maybe it's just my understanding of the shell?build_graph
and the importlib.utils.find_spec
call in determine_package_directory
.find_modules_directly_imported_by('foo.bar', as_package=True)
find_modules_that_directly_import('foo.bar', as_package=True)
direct_import_exists(importer='foo.bar', imported='foo.baz', as_packages=True)
Certain methods return sets from the graph's internals, e.g.
find_modules_directly_imported_by
If the graph is mutated while iterating through these sets, this can can a RuntimeError, e.g.
for imported_module in graph.find_modules_directly_imported_by("foo"):
graph.remove_import(importer="foo", imported=imported_module)
This can be mitigated by copying the set before iterating over it, e.g.
for imported_module in graph.find_modules_directly_imported_by("foo").copy():
graph.remove_import(importer="foo", imported=imported_module)
But probably it would be better for the method to return a copy.
>>> g.get_import_details(importer='django.core.checks', imported='django.core.checks.messages')
[{'importer': 'django.core.checks', 'imported': 'django.core.checks.messages', 'line_number': 1, 'line_contents': 'from .messages import ('}]
build_graph
was built with the intention of passing top level packages only. It does (unintentionally) accept subpackages - but the behaviour is not coherent.
We should decide whether to formally support this, and if so how it should behave.
In the current version of Grimp, compare building the graph using the root, and doing the same with a subpackage. Here's the root first:
>>> graph = grimp.build_graph("django")
>>> graph.find_modules_directly_imported_by("django.db.models.base")
{'django.db.models.deletion', 'django.db', 'django.utils.encoding', 'django.core.exceptions', 'django.utils.hashable', 'django.db.models.signals', 'django.db.models.constants', 'django.db.models.utils', 'django', 'django.core.checks', 'django.db.models.fields.related', 'django.db.models.constraints', 'django.conf', 'django.db.models', 'django.db.models.functions', 'django.db.transaction', 'django.utils.text', 'django.db.models.manager', 'django.db.models.options', 'django.db.models.query', 'django.utils.translation', 'django.apps'}
All well and good. This time we build using a subpackage. Note we use include_external_packages so that we can see any imports happening outside that subpackage that are still in Django:
>>> graph = grimp.build_graph("django.db", include_external_packages=True)
The first undesirable behaviour is that with the modules in the graph are not under the django namespace but rather the subpackage's namespace:
>>> sorted(graph.modules)[20:25]
['db.backends.base.introspection', 'db.backends.base.operations', 'db.backends.base.schema', 'db.backends.base.validation', 'db.backends.ddl_references']
Secondly (because of this) the imports don't correctly map on to the models:
>>> graph.find_modules_directly_imported_by("db.models.base")
{'django', 'functools', 'copy', 'warnings', 'inspect', 'itertools'}
This should be including the imports within django.db
, such as django.db.models.deletion
.
I think the best thing should probably be to raise an exception if a subpackage (i.e. anything with a .
) is passed to build_graph
. Although this means users will be forced to build a larger graph than they may need (if they're only interested in a subpackage), it keeps the API simple, which I think is more important.
Open to other opinions...
We want the graph to make it easy to inspect dependencies between modules as packages rather than just as individual modules.
This needs more thought.
https://grimp.readthedocs.io/en/v1.0b5/usage.html#building-the-graph
graph = grimp.build_graph('name_)
Should be
graph = grimp.build_graph('mypackage')
Specifically, be able to detect when imports are used under typing.TYPE_CHECKING
.
Raising this in Grimp, as it appears the functionality isn't in place, and so can't be implemented with a custom contract.
While I understand that this is solved by introducing an interface, this results in a lot of boilerplate code when all you're seeking is basic static analysis + hints in an editor.
Given a Contract:
name = Prevent cross-reliance on API Clients
type = independence
modules =
ulama.api_client_1
ulama.api_client_2
ulama.remote_transfer
and modules of:
# ulama/api_client_1.py
class ApiClient1: ...
# ulama/api_client_2.py
class ApiClient2: ...
# ulama/remote_transfer.py
from typing import TYPE_CHECKING
from .api_client_1 import ApiClient1
from .api_client_2 import ApiClient2
class RemoteTransfer:
from_client: ApiClient1
to_client: ApiClient2
I would expect this to pass the contract.
to reiterate, understand it'd be better to introduce a stop gap, but feel there's a use-case for keeping things simple in the code, while still being able to offer an external contract to prevent mistakes.
For example:
grimp.build_graph('django')
...
ValueError: Could not trim django.core.messages.CRITICAL.
Example:
graph = grimp.build_graph('django')
ValueError: Could not trim django.contrib.auth.management.commands.changepassword.
What's happening here is it can't find the module because there is no __init__.py
file within django.contrib.auth.management.commands
. Instead, we should warn that we couldn't find the module, and skip it.
This is needed if we're going to use it in Layer Linter.
If you add an import using add_import
without supplying a line number and contents, then get_import_details
will not include it in the results (contrary to the documentation).
Currently, grimp fails to build a graph of a stdlib package:
>>> import grimp
>>> g = grimp.build_graph('sys')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File ".../lib/grimp/src/grimp/application/usecases.py", line 26, in build_graph
file_system=file_system,
File ".../lib/grimp/src/grimp/adaptors/packagefinder.py", line 23, in determine_package_directory
assert package_filename.origin # For type checker.
AssertionError
Python provides the ability to organize a package so that it exists in multiple directory hierarchies. These are known as 'namespace packages'.
It is not a priority to support such packages in Grimp for the time being, though pull requests would be considered!
Presently I would like to detect dependency cycles in my graphs which requires either access to private use of networkx
from typing import Iterator, List
import networkx
from grimp.adaptors.graph import ImportGraph
def dependency_cycles(graph: ImportGraph) -> Iterator[List[str]]:
# Violates private internals of grimp by accessing private implementation
cycles = networkx.simple_cycles(graph._networkx_graph)
for cycle in cycles:
if len(cycle) > 1:
yield cycle
Alternatively I would need to re-implement topsort on-top of walking through grimp.
Benefits: Less redundant code, less violations of internals.
Drawbacks: Your project adds a new algo/feature to the public API.
This would make a huge difference to the usability of Import Linter for large graphs.
For example:
File ".../grimp/adaptors/graph.py", line 201, in find_shortest_chain
target=imported))
File ".../networkx/algorithms/shortest_paths/generic.py", line 170, in shortest_path
paths = nx.bidirectional_shortest_path(G, source, target)
File ".../networkx/algorithms/shortest_paths/unweighted.py", line 223, in bidirectional_shortest_path
raise nx.NodeNotFound(msg.format(source, target))
networkx.exception.NodeNotFound: Either source myproject.foo or target myproject.bar is not in G
This should raise a specific grimp exception class ModuleNotInGraph
with a nicer error message.
It would be useful to be able to specify the module depth of certain results.
For example:
find_upstream_modules('mypackage.foo') == {
'mypackage.bar.one',
'mypackage.baz.two.green',
'mypackage.baz.two.blue',
}
# Possibly this should error as it's not very useful?
find_upstream_modules('mypackage.foo', depth=1) == {
'mypackage',
}
find_upstream_modules('mypackage.foo', depth=2) == {
'mypackage.bar',
'mypackage.baz',
}
find_upstream_modules('mypackage.foo', depth=3) == {
'mypackage.bar.one',
'mypackage.baz.two.green',
'mypackage.baz.two.blue',
}
It would be good to have an example of how to use it.
E.g. if it's squashed (or external?)
The domain level classes (e.g. Modules) should be pushed down so that the Graph does not require the user to interact with them.
E.g. graph.find_children('mypackage.nonexistent') just returns an empty set - it should raise an exception.
If you have a package that's installed via a setup.py, but it doesn't have an __init__.py
file at the top level, you get the following error:
...
File ".../grimp/adaptors/packagefinder.py", line 23, in determine_package_directory
assert package_filename.origin # For type checker.
AssertionError
The simple fix at the user's end is to add the missing __init__.py
file. But it's a confusing error message.
The problem is, it's assuming that the origin
attribute on the returned ModuleSpec will always be set. That's not true, see the docs: https://docs.python.org/3.6/library/importlib.html#importlib.machinery.ModuleSpec.origin
I have a project where I want to scan a target directory of another project that is not in the project grimp is installed in. I was reading through the code and it appears to be possible as the import graph is determined by AST parsing of files in a directory. Has a feature like this been considered or even seen as desirable to have (in the scope of grimp's goals)?
I was thinking a lot could be possible if we changed some behaviour inside def build_graph(
grimp/src/grimp/application/usecases.py
Line 15 in 8b4e33a
graph.find_shortest_chains(importer, imported)
The as_packages equivalent to find_shortest_chain
. This would always treat the importer and imported as packages.
As reported here, if there is a module with a dot in the name, Grimp will raise a FileNotFoundError. While these modules are not validly named Python modules, it should still be possible to build a graph, ignoring the module in question.
Probably Grimp should log a warning in this case.
It needs to be possible to exclude certain paths from the graph when testing for chains.
Arrows should point from importer -> imported. This is the more conventional notational, and it is also nicer when including alongside UML implementation arrows (as both arrows face in the same direction).
This would allow features like this: seddonym/import-linter#44
The order of the tuple is described wrongly.
While doing this, it's probably worth thinking about the ordering in general around upstream/downstream, importer/imported and make sure it's standardised and intuitive.
g = grimp.build_graph('networkx')
Gives:
ValueError: Could not trim networkx.layout.*.
Here is how to reproduce the error:
grimp
.test_encoding
. Inside __init__.py
add this content : print('่ฎ้')
.>>> import grimp
>>> graph = grimp.build_graph('test_encoding')
Fails with :
File "C:\Users\adam\Anaconda3\lib\site-packages\grimp\adaptors\filesystem.py", line 26, in read
return file.read()
File "C:\Users\adam\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 12: character maps to <undefined>
I've been working with grimp to experiment with mapping project dependencies and structures over multiple projects at run-time, including previous versions of a project to watch changes in dependencies over time.
I have a fork that lets me do this but if there is interest in the broader collection of people using grimp it might be useful to make it a proper part of the project. This issue is mostly here to anchor any interest in a feature like that and discuss if anyone else wants to use that kind of feature.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.