Coder Social home page Coder Social logo

grimp's People

Contributors

kasium avatar mwgamble avatar peter554 avatar seddonym avatar yakmm avatar zipfile avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

grimp's Issues

determine_package_directory doesn't support path as argument

When creating a new graph, the determine_package_directory in the ImportLibPackageFinder adaptor doesn't support the path argument. Which means if you try to create a new Graph via grimp.build_graph outside the directory containing the package, it will throw a ValueError. Programmatically changing the directory doesn't appear to address the issue, which may be due to peculiarities of importlib.utils.find_spec.

Potential Solutions:

  • Demonstrate a working example of programmatically changing a directory (via shell commands, os.chdir, ctx.cd (invoke), or other systems). Maybe it's just my understanding of the shell?
  • Add a filepath argument to build_graph and the importlib.utils.find_spec call in determine_package_directory.

as_package parameter for direct import methods

find_modules_directly_imported_by('foo.bar', as_package=True)
find_modules_that_directly_import('foo.bar', as_package=True)
direct_import_exists(importer='foo.bar', imported='foo.baz', as_packages=True)

RuntimeError: Set changed size during iteration

Certain methods return sets from the graph's internals, e.g.

  • find_modules_directly_imported_by

If the graph is mutated while iterating through these sets, this can can a RuntimeError, e.g.

for imported_module in graph.find_modules_directly_imported_by("foo"):
   graph.remove_import(importer="foo", imported=imported_module)

This can be mitigated by copying the set before iterating over it, e.g.

for imported_module in graph.find_modules_directly_imported_by("foo").copy():
   graph.remove_import(importer="foo", imported=imported_module)

But probably it would be better for the method to return a copy.

Better support for multiline import statements

>>> g.get_import_details(importer='django.core.checks', imported='django.core.checks.messages')
[{'importer': 'django.core.checks', 'imported': 'django.core.checks.messages', 'line_number': 1, 'line_contents': 'from .messages import ('}]

Improve handling of subpackages passed to `build_graph`

build_graph was built with the intention of passing top level packages only. It does (unintentionally) accept subpackages - but the behaviour is not coherent.

We should decide whether to formally support this, and if so how it should behave.

In the current version of Grimp, compare building the graph using the root, and doing the same with a subpackage. Here's the root first:

>>> graph = grimp.build_graph("django")
>>> graph.find_modules_directly_imported_by("django.db.models.base")
{'django.db.models.deletion', 'django.db', 'django.utils.encoding', 'django.core.exceptions', 'django.utils.hashable', 'django.db.models.signals', 'django.db.models.constants', 'django.db.models.utils', 'django', 'django.core.checks', 'django.db.models.fields.related', 'django.db.models.constraints', 'django.conf', 'django.db.models', 'django.db.models.functions', 'django.db.transaction', 'django.utils.text', 'django.db.models.manager', 'django.db.models.options', 'django.db.models.query', 'django.utils.translation', 'django.apps'}

All well and good. This time we build using a subpackage. Note we use include_external_packages so that we can see any imports happening outside that subpackage that are still in Django:

>>> graph = grimp.build_graph("django.db", include_external_packages=True)

The first undesirable behaviour is that with the modules in the graph are not under the django namespace but rather the subpackage's namespace:

>>> sorted(graph.modules)[20:25]
['db.backends.base.introspection', 'db.backends.base.operations', 'db.backends.base.schema', 'db.backends.base.validation', 'db.backends.ddl_references']

Secondly (because of this) the imports don't correctly map on to the models:

>>> graph.find_modules_directly_imported_by("db.models.base")
{'django', 'functools', 'copy', 'warnings', 'inspect', 'itertools'}

This should be including the imports within django.db, such as django.db.models.deletion.

Course of action

I think the best thing should probably be to raise an exception if a subpackage (i.e. anything with a .) is passed to build_graph. Although this means users will be forced to build a larger graph than they may need (if they're only interested in a subpackage), it keeps the API simple, which I think is more important.

Open to other opinions...

return_import_paths parameter

We want the graph to make it easy to inspect dependencies between modules as packages rather than just as individual modules.

This needs more thought.

Add support for detecting whether an import is only made during type checking

Specifically, be able to detect when imports are used under typing.TYPE_CHECKING.

Raising this in Grimp, as it appears the functionality isn't in place, and so can't be implemented with a custom contract.

While I understand that this is solved by introducing an interface, this results in a lot of boilerplate code when all you're seeking is basic static analysis + hints in an editor.

Given a Contract:

name = Prevent cross-reliance on API Clients
type = independence
modules =
    ulama.api_client_1
    ulama.api_client_2
    ulama.remote_transfer

and modules of:

# ulama/api_client_1.py
class ApiClient1: ...

# ulama/api_client_2.py
class ApiClient2: ...

# ulama/remote_transfer.py
from typing import TYPE_CHECKING
  from .api_client_1 import ApiClient1
  from .api_client_2 import ApiClient2

class RemoteTransfer:
  from_client: ApiClient1
  to_client: ApiClient2

I would expect this to pass the contract.

to reiterate, understand it'd be better to introduce a stop gap, but feel there's a use-case for keeping things simple in the code, while still being able to offer an external contract to prevent mistakes.

Warn instead of raising an exception on import failure

Example:

graph = grimp.build_graph('django')

ValueError: Could not trim django.contrib.auth.management.commands.changepassword.

What's happening here is it can't find the module because there is no __init__.py file within django.contrib.auth.management.commands. Instead, we should warn that we couldn't find the module, and skip it.

Support for building stdlib packages

Currently, grimp fails to build a graph of a stdlib package:

>>> import grimp
>>> g = grimp.build_graph('sys')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/grimp/src/grimp/application/usecases.py", line 26, in build_graph
    file_system=file_system,
  File ".../lib/grimp/src/grimp/adaptors/packagefinder.py", line 23, in determine_package_directory
    assert package_filename.origin  # For type checker.
AssertionError

Support for namespace packages

Python provides the ability to organize a package so that it exists in multiple directory hierarchies. These are known as 'namespace packages'.

It is not a priority to support such packages in Grimp for the time being, though pull requests would be considered!

Feature request: Expose cycles as part of the public API

Presently I would like to detect dependency cycles in my graphs which requires either access to private use of networkx

from typing import Iterator, List

import networkx
from grimp.adaptors.graph import ImportGraph


def dependency_cycles(graph: ImportGraph) -> Iterator[List[str]]:
    # Violates private internals of grimp by accessing private implementation
    cycles = networkx.simple_cycles(graph._networkx_graph)

    for cycle in cycles:
        if len(cycle) > 1:
            yield cycle

Alternatively I would need to re-implement topsort on-top of walking through grimp.

Benefits: Less redundant code, less violations of internals.
Drawbacks: Your project adds a new algo/feature to the public API.

Caching

This would make a huge difference to the usability of Import Linter for large graphs.

Better handling of modules that are not in the graph

For example:

File ".../grimp/adaptors/graph.py", line 201, in find_shortest_chain
    target=imported))
  File ".../networkx/algorithms/shortest_paths/generic.py", line 170, in shortest_path
    paths = nx.bidirectional_shortest_path(G, source, target)
  File ".../networkx/algorithms/shortest_paths/unweighted.py", line 223, in bidirectional_shortest_path
    raise nx.NodeNotFound(msg.format(source, target))
networkx.exception.NodeNotFound: Either source myproject.foo or target myproject.bar is not in G

This should raise a specific grimp exception class ModuleNotInGraph with a nicer error message.

depth parameter

It would be useful to be able to specify the module depth of certain results.

For example:

find_upstream_modules('mypackage.foo') == {
    'mypackage.bar.one',
    'mypackage.baz.two.green',
    'mypackage.baz.two.blue',
}

# Possibly this should error as it's not very useful?
find_upstream_modules('mypackage.foo', depth=1) == {
    'mypackage',
}

find_upstream_modules('mypackage.foo', depth=2) == {
    'mypackage.bar',
    'mypackage.baz',
}

find_upstream_modules('mypackage.foo', depth=3) == {
    'mypackage.bar.one',
    'mypackage.baz.two.green',
    'mypackage.baz.two.blue',
}

Hide domain.valueobjects

The domain level classes (e.g. Modules) should be pushed down so that the Graph does not require the user to interact with them.

Better handling of installable packages without __init__.py

If you have a package that's installed via a setup.py, but it doesn't have an __init__.py file at the top level, you get the following error:

...
File ".../grimp/adaptors/packagefinder.py", line 23, in determine_package_directory
    assert package_filename.origin  # For type checker.
AssertionError

The simple fix at the user's end is to add the missing __init__.py file. But it's a confusing error message.

The problem is, it's assuming that the origin attribute on the returned ModuleSpec will always be set. That's not true, see the docs: https://docs.python.org/3.6/library/importlib.html#importlib.machinery.ModuleSpec.origin

Feature request: Scan packages from external project

I have a project where I want to scan a target directory of another project that is not in the project grimp is installed in. I was reading through the code and it appears to be possible as the import graph is determined by AST parsing of files in a directory. Has a feature like this been considered or even seen as desirable to have (in the scope of grimp's goals)?

I was thinking a lot could be possible if we changed some behaviour inside def build_graph(

def build_graph(
or provided a different "usecase" that is filesystem based for the parameter. What do you think?

find_shortest_chains method

graph.find_shortest_chains(importer, imported)

The as_packages equivalent to find_shortest_chain. This would always treat the importer and imported as packages.

Warn when encountering a module with a dot in the name

As reported here, if there is a module with a dot in the name, Grimp will raise a FileNotFoundError. While these modules are not validly named Python modules, it should still be possible to build a graph, ignoring the module in question.

Probably Grimp should log a warning in this case.

Excludable paths

It needs to be possible to exclude certain paths from the graph when testing for chains.

Reverse import arrow notation

Arrows should point from importer -> imported. This is the more conventional notational, and it is also nicer when including alongside UML implementation arrows (as both arrows face in the same direction).

Correct documentation for find_shortest_path

The order of the tuple is described wrongly.

While doing this, it's probably worth thinking about the ordering in general around upstream/downstream, importer/imported and make sure it's standardised and intuitive.

Starred imports

g = grimp.build_graph('networkx')

Gives:
ValueError: Could not trim networkx.layout.*.

Failure to handle some modules on Windows due to encoding

Here is how to reproduce the error:

  1. Use Windows
  2. Install grimp.
  3. Create a python package test_encoding. Inside __init__.py add this content : print('่ฎŠ้‡').
  4. Open a python shell and do:
>>> import grimp
>>> graph = grimp.build_graph('test_encoding')

Fails with :

  File "C:\Users\adam\Anaconda3\lib\site-packages\grimp\adaptors\filesystem.py", line 26, in read
    return file.read()
  File "C:\Users\adam\Anaconda3\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 12: character maps to <undefined>

Feature: Add a use-case to handle multiple projects at run-time.

I've been working with grimp to experiment with mapping project dependencies and structures over multiple projects at run-time, including previous versions of a project to watch changes in dependencies over time.

I have a fork that lets me do this but if there is interest in the broader collection of people using grimp it might be useful to make it a proper part of the project. This issue is mostly here to anchor any interest in a feature like that and discuss if anyone else wants to use that kind of feature.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.