
catalogue's Introduction

catalogue: Super lightweight function registries for your library

catalogue is a tiny, zero-dependency library that makes it easy to add function (or object) registries to your code. Function registries are helpful when you have objects that need to be both easily serializable and fully customizable. Instead of passing a function into your object, you pass in an identifier name, which the object can use to look up the function in the registry. This makes the object easy to serialize, because the name is a simple string. If you instead saved the function, you'd have to use pickle for serialization, which has many drawbacks.


⏳ Installation

pip install catalogue
conda install -c conda-forge catalogue

⚠️ Important note: catalogue v2.0+ is only compatible with Python 3.6+. For Python 2.7+ compatibility, use catalogue v1.x.

👩‍💻 Usage

Let's imagine you're developing a Python package that needs to load data somewhere. You've already implemented some loader functions for the most common data types, but you want to allow the user to easily add their own. Using catalogue.create you can create a new registry under the namespace your_package → loaders.

# YOUR PACKAGE
import catalogue

loaders = catalogue.create("your_package", "loaders")

This gives you a loaders.register decorator that your users can import and decorate their custom loader functions with.

# USER CODE
from your_package import loaders

@loaders.register("custom_loader")
def custom_loader(data):
    # Load something here...
    return data

The decorated function will be registered automatically, and in your package you'll be able to access all loaders by calling loaders.get_all.

# YOUR PACKAGE
def load_data(data, loader_id):
    print("All loaders:", loaders.get_all()) # {"custom_loader": <custom_loader>}
    loader = loaders.get(loader_id)
    return loader(data)

The user can now refer to their custom loader using only its string name ("custom_loader") and your application will know what to do and will use their custom function.

# USER CODE
from your_package import load_data

load_data(data, loader_id="custom_loader")

❓ FAQ

But can't the user just pass in the custom_loader function directly?

Sure, that's the more classic callback approach. Instead of a string ID, load_data could also take a function, in which case you wouldn't need a package like this. catalogue helps you when you need to produce a serializable record of which functions were passed in. For instance, you might want to write a log message, or save a config to load back your object later. With catalogue, your functions can be parameterized by strings, so logging and serialization remain easy, while still giving you full extensibility.
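As a minimal sketch of the serialization point above, the following uses a plain dict as a stand-in for a catalogue registry, so that it runs without catalogue installed. The names LOADERS, register, save_config and load_from_config are hypothetical and not part of catalogue. Only the string ID goes into the saved config, so plain JSON is enough.

```python
import json

# Stand-in for a catalogue registry: a plain dict keyed by string name.
LOADERS = {}


def register(name):
    """Return a decorator that stores the function under the given name."""
    def decorator(func):
        LOADERS[name] = func
        return func
    return decorator


@register("upper_loader")
def upper_loader(data):
    return data.upper()


def save_config(loader_id):
    # Only the string ID is stored, so no pickling of functions is needed.
    return json.dumps({"loader": loader_id})


def load_from_config(config_str, data):
    # Resolve the string ID back to the registered function.
    config = json.loads(config_str)
    loader = LOADERS[config["loader"]]
    return loader(data)


config_str = save_config("upper_loader")
result = load_from_config(config_str, "hello")
print(config_str, result)
```

With a real catalogue registry, the lookup step would presumably be loaders.get(config["loader"]) instead of the dict access.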

How do I make sure all of the registration decorators have run?

Decorators normally run when modules are imported. Relying on this side-effect can sometimes lead to confusion, especially if there's no other reason the module would be imported. One solution is to use entry points.

For instance, in spaCy we're starting to use function registries to make the pipeline components much more customizable. Let's say one user, Jo, develops a better tagging model using new machine learning research. End-users of Jo's package should be able to write spacy.load("jo_tagging_model"). They shouldn't need to remember to write import jos_tagged_model first, just to run the function registries as a side-effect. With entry points, the registration happens at install time – so you don't need to rely on the import side-effects.
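To make the mechanism a little more concrete: an entry point is essentially a "name = module:attribute" record stored in an installed package's metadata, which can be resolved to the real object on demand. The sketch below builds such a record by hand with the standard library's importlib.metadata (Python 3.8+) instead of reading it from an installed package. The entry point name and the choice of json:dumps as the target are purely illustrative.

```python
from importlib.metadata import EntryPoint

# Hand-built stand-in for what an installed package would advertise in its
# metadata under the group "spacy_architectures".
entry_point = EntryPoint(
    name="illustrative_architecture",
    value="json:dumps",
    group="spacy_architectures",
)

# load() imports the module named before the colon and returns the
# attribute named after it. No explicit import is needed in user code.
func = entry_point.load()
print(entry_point.name, "->", func.__module__ + "." + func.__name__)
```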

🎛 API

function catalogue.create

Create a new registry for a given namespace. Returns a Registry object whose register method can be used as a decorator or called with a name and a func keyword argument. If entry_points=True is set, the registry will check for Python entry points advertised for the given namespace, e.g. the entry point group spacy_architectures for the namespace "spacy", "architectures", in Registry.get and Registry.get_all. This allows other packages to auto-register functions.

| Argument | Type | Description |
| --- | --- | --- |
| *namespace | str | The namespace, e.g. "spacy" or "spacy", "architectures". |
| entry_points | bool | Whether to check for entry points of the given namespace and pre-populate the global registry. |
| RETURNS | Registry | The Registry object with methods to register and retrieve functions. |

architectures = catalogue.create("spacy", "architectures")

# Use as decorator
@architectures.register("custom_architecture")
def custom_architecture():
    pass

# Use as regular function
architectures.register("custom_architecture", func=custom_architecture)

class Registry

The registry object that can be used to register and retrieve functions. It's usually created internally when you call catalogue.create.

method Registry.__init__

Initialize a new registry. If entry_points=True is set, the registry will check for Python entry points advertised for the given namespace, e.g. the entry point group spacy_architectures for the namespace "spacy", "architectures", in Registry.get and Registry.get_all.

| Argument | Type | Description |
| --- | --- | --- |
| namespace | Tuple[str] | The namespace, e.g. "spacy" or "spacy", "architectures". |
| entry_points | bool | Whether to check for entry points of the given namespace in get and get_all. |
| RETURNS | Registry | The newly created object. |

# User-facing API
architectures = catalogue.create("spacy", "architectures")
# Internal API
architectures = Registry(("spacy", "architectures"))

method Registry.__contains__

Check whether a name is in the registry.

| Argument | Type | Description |
| --- | --- | --- |
| name | str | The name to check. |
| RETURNS | bool | Whether the name is in the registry. |

architectures = catalogue.create("spacy", "architectures")

@architectures.register("custom_architecture")
def custom_architecture():
    pass

assert "custom_architecture" in architectures

method Registry.__call__

Register a function in the registry's namespace. Can be used as a decorator or called as a function with the func keyword argument supplying the function to register. Delegates to Registry.register.

method Registry.register

Register a function in the registry's namespace. Can be used as a decorator or called as a function with the func keyword argument supplying the function to register.

| Argument | Type | Description |
| --- | --- | --- |
| name | str | The name to register under the namespace. |
| func | Any | Optional function to register (if not used as decorator). |
| RETURNS | Callable | The decorator that takes one argument, the function to register. |

architectures = catalogue.create("spacy", "architectures")

# Use as decorator
@architectures.register("custom_architecture")
def custom_architecture():
    pass

# Use as regular function
architectures.register("custom_architecture", func=custom_architecture)

method Registry.get

Get a function registered in the namespace.

| Argument | Type | Description |
| --- | --- | --- |
| name | str | The name. |
| RETURNS | Any | The registered function. |

custom_architecture = architectures.get("custom_architecture")

method Registry.get_all

Get all functions in the registry's namespace.

| Argument | Type | Description |
| --- | --- | --- |
| RETURNS | Dict[str, Any] | The registered functions, keyed by name. |

all_architectures = architectures.get_all()
# {"custom_architecture": <custom_architecture>}
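One way to picture how a namespaced lookup like this might work is a single global dict whose keys are tuples of the namespace parts plus the function name. This is only a sketch under that assumption, not catalogue's actual implementation. REGISTRY, register and get_all below are hypothetical module-level stand-ins.

```python
# Hypothetical global store: (namespace part, ..., name) -> function
REGISTRY = {}


def register(namespace, name, func):
    """Store func under the key formed by the namespace plus the name."""
    REGISTRY[tuple(namespace) + (name,)] = func
    return func


def get_all(namespace):
    """Return all functions registered directly under the namespace."""
    namespace = tuple(namespace)
    result = {}
    for keys, value in list(REGISTRY.items()):
        # A key belongs to the namespace if everything except its last
        # element (the name) equals the namespace.
        if keys[:-1] == namespace:
            result[keys[-1]] = value
    return result


register(("spacy", "architectures"), "cnn", lambda: "cnn")
register(("spacy", "loaders"), "json", lambda: "json")

architectures = get_all(("spacy", "architectures"))
print(sorted(architectures))
```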

method Registry.get_entry_points

Get registered entry points from other packages for this namespace. The name of the entry point group is the namespace joined by _.

| Argument | Type | Description |
| --- | --- | --- |
| RETURNS | Dict[str, Any] | The loaded entry points, keyed by name. |

architectures = catalogue.create("spacy", "architectures", entry_points=True)
# Will get all entry points of the group "spacy_architectures"
all_entry_points = architectures.get_entry_points()

method Registry.get_entry_point

Check if registered entry point is available for a given name in the namespace and load it. Otherwise, return the default value.

| Argument | Type | Description |
| --- | --- | --- |
| name | str | Name of entry point to load. |
| default | Any | The default value to return. Defaults to None. |
| RETURNS | Any | The loaded entry point or the default value. |

architectures = catalogue.create("spacy", "architectures", entry_points=True)
# Will get entry point "custom_architecture" of the group "spacy_architectures"
custom_architecture = architectures.get_entry_point("custom_architecture")

method Registry.find

Find the information about a registered function, including the module and path to the file it's defined in, the line number and the docstring, if available.

| Argument | Type | Description |
| --- | --- | --- |
| name | str | Name of the registered function. |
| RETURNS | Dict[str, Union[str, int]] | The information about the function. |

import catalogue

architectures = catalogue.create("spacy", "architectures", entry_points=True)

@architectures("my_architecture")
def my_architecture():
    """This is an architecture"""
    pass

info = architectures.find("my_architecture")
# {'module': 'your_package.architectures',
#  'file': '/path/to/your_package/architectures.py',
#  'line_no': 5,
#  'docstring': 'This is an architecture'}
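Information of this kind can be gathered with the standard library's inspect module. The sketch below shows one way to do it. The helper name describe is hypothetical, and this is not necessarily how Registry.find is implemented.

```python
import inspect
import json


def describe(func):
    """Collect module, file, line number and docstring for a callable."""
    module = inspect.getmodule(func)
    try:
        file_name = inspect.getsourcefile(func)
        _, line_no = inspect.getsourcelines(func)
    except (TypeError, OSError):
        # Built-ins and functions without retrievable source end up here.
        file_name, line_no = None, None
    return {
        "module": module.__name__ if module else None,
        "file": file_name,
        "line_no": line_no,
        "docstring": inspect.getdoc(func),
    }


info = describe(json.dumps)
print(info["module"], info["file"], info["line_no"])
```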

function catalogue.check_exists

Check if a namespace exists.

| Argument | Type | Description |
| --- | --- | --- |
| *namespace | str | The namespace, e.g. "spacy" or "spacy", "architectures". |
| RETURNS | bool | Whether the namespace exists. |

catalogue's People

Contributors

adrianeboyd, honnibal, ines, joel-odlund, justindujardin, patjouk, pmbaumgartner, rmitsch, sbrugman, svlandeg, tamuhey

catalogue's Issues

Can't get functions from registry if entry points only points to the correct module and not to the specific functions

I'm getting this extremely odd error that tells me chain.v1 is not available while it is showing up as available name at the same time.

catalogue.RegistryError: Cant't find 'chain.v1' in registry horizon -> components. Available names: Pipeline.v1, _, chain.v1, fetch.v1, response_to_bytes.v1, response_to_dict.v1, response_to_text.v1

I've tried to run my test in debug mode with a break point where the error occurs. If I run Registry.get then it fails as in the test but if I run Registry.get() a second time then it works.

Reason

In Registry.get() we only look for the specific name, either via from_entry_point = self.get_entry_point(name) or in the global variable REGISTRY. On the first run it exists in neither. On the second run it does exist in REGISTRY, because Registry.get_entry_points() was called to build the error message, and that method loads all the modules behind the entry points and thereby populates REGISTRY.

Proposed solution

Load all modules from entry points in Registry.get() if there are entry points but the requested name is not found directly.
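A rough sketch of that fallback, using a self-contained stand-in rather than catalogue's real Registry: the class SketchRegistry, its add_entry_point method and the zero-argument "loader" callables (which stand in for importing the module an entry point refers to) are all hypothetical.

```python
class SketchRegistry:
    """Stand-in registry that loads all entry points before giving up."""

    def __init__(self):
        self._functions = {}
        self._entry_point_loaders = []
        self._entry_points_loaded = False

    def register(self, name, func):
        self._functions[name] = func
        return func

    def add_entry_point(self, loader):
        # loader is a zero-argument callable standing in for "import the
        # module this entry point refers to", which registers functions
        # as a side effect.
        self._entry_point_loaders.append(loader)

    def _load_entry_points(self):
        for loader in self._entry_point_loaders:
            loader()
        self._entry_points_loaded = True

    def get(self, name):
        # Fallback: if the name is unknown, load every entry point once
        # and only then decide whether the lookup has failed.
        if name not in self._functions and not self._entry_points_loaded:
            self._load_entry_points()
        if name not in self._functions:
            available = ", ".join(sorted(self._functions)) or "none"
            raise KeyError(f"Can't find '{name}'. Available names: {available}")
        return self._functions[name]


def chain(*layers):
    return layers


registry = SketchRegistry()
# The entry point only refers to a module; importing it registers chain.v1.
registry.add_entry_point(lambda: registry.register("chain.v1", chain))

found = registry.get("chain.v1")  # succeeds on the first call
```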

Importlib issue with python3.10

I'm seeing an issue running the tests with python3.10. The error is

python3.10-catalogue> ============================= test session starts ==============================
python3.10-catalogue> platform linux -- Python 3.10.1, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
python3.10-catalogue> rootdir: /build/catalogue-2.0.6
python3.10-catalogue> collected 8 items
python3.10-catalogue> catalogue/tests/test_catalogue.py ......F.                               [100%]
python3.10-catalogue> =================================== FAILURES ===================================
python3.10-catalogue> ______________________________ test_entry_points _______________________________
python3.10-catalogue>     def test_entry_points():
python3.10-catalogue>         # Create a new EntryPoint object by pretending we have a setup.cfg and
python3.10-catalogue>         # use one of catalogue's util functions as the advertised function
python3.10-catalogue>         ep_string = "[options.entry_points]test_foo\n    bar = catalogue:check_exists"
python3.10-catalogue> >       ep = catalogue.importlib_metadata.EntryPoint._from_text(ep_string)
python3.10-catalogue> E       AttributeError: type object 'EntryPoint' has no attribute '_from_text'
python3.10-catalogue> catalogue/tests/test_catalogue.py:108: AttributeError
python3.10-catalogue> =========================== short test summary info ============================
python3.10-catalogue> FAILED catalogue/tests/test_catalogue.py::test_entry_points - AttributeError:...

The same route works fine with python3.9. Here's a full log from the NixOS CI system: https://hydra.nixos.org/log/pfyk1v5fl14yf1n31v8ppjknqxwzgrgm-python3.10-catalogue-2.0.6.drv

1.0.1 appears to break py2.7 compatibility

Is v1.x supposed to maintain python 2.7 compatibility? It looks like the recent commit from 10 days ago (ef4fd81) creates 1.0.1 but introduces changes that require python 3.

This was discovered when trying to install an older compatible spacy package, but pip grabbed 1.0.1 which ends up producing this error:

.../site-packages/catalogue/_importlib_metadata/__init__.py", line 170
    def __len__(self) -> int:
                      ^
SyntaxError: invalid syntax

Deprecation Warning with Python 3.10.0 and Spacy 3.2.4.

How to reproduce the behaviour

Run this with Python 3.10.0 and Spacy 3.2.4.

import en_core_web_md
from warnings import filterwarnings
filterwarnings('error')

pipeline = en_core_web_md.load()

You'll get:

Traceback (most recent call last):
  File "/home/username/project/prog.py", line 5, in <module>
    pipeline = en_core_web_md.load()
  File "/home/username/.local/share/virtualenvs/project-2ZeatEXR/lib/python3.10/site-packages/en_core_web_md/__init__.py", line 10, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/home/username/.local/share/virtualenvs/project-2ZeatEXR/lib/python3.10/site-packages/spacy/util.py", line 615, in load_model_from_init_py
    return load_model_from_path(
  File "/home/username/.local/share/virtualenvs/project-2ZeatEXR/lib/python3.10/site-packages/spacy/util.py", line 488, in load_model_from_path
    nlp = load_model_from_config(config, vocab=vocab, disable=disable, exclude=exclude)
  File "/home/username/.local/share/virtualenvs/project-2ZeatEXR/lib/python3.10/site-packages/spacy/util.py", line 524, in load_model_from_config
    lang_cls = get_lang_class(nlp_config["lang"])
  File "/home/username/.local/share/virtualenvs/project-2ZeatEXR/lib/python3.10/site-packages/spacy/util.py", line 325, in get_lang_class
    if lang in registry.languages:
  File "/home/username/.local/share/virtualenvs/project-2ZeatEXR/lib/python3.10/site-packages/catalogue/__init__.py", line 49, in __contains__
    has_entry_point = self.entry_points and self.get_entry_point(name)
  File "/home/username/.local/share/virtualenvs/project-2ZeatEXR/lib/python3.10/site-packages/catalogue/__init__.py", line 135, in get_entry_point
    for entry_point in AVAILABLE_ENTRY_POINTS.get(self.entry_point_namespace, []):
  File "/usr/lib/python3.10/importlib/metadata/__init__.py", line 400, in get
    self._warn()
DeprecationWarning: SelectableGroups dict interface is deprecated. Use select.

Environment

  • spaCy version: 3.2.4
  • Platform: Linux-5.13.0-39-generic-x86_64-with-glibc2.34
  • Python version: 3.10.0
  • Pipelines: en_core_web_md (3.2.0)
  • Operating System: Ubuntu 21.10
  • Python Version Used: 3.10.0
  • spaCy Version Used: 3.2.4
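The warning text itself points at the select API. A minimal compatibility sketch, assuming only the standard library's importlib.metadata on Python 3.8+, could look like this. The helper name entry_points_for_group is hypothetical, and this is not a claim about how catalogue resolved the issue.

```python
from importlib.metadata import entry_points


def entry_points_for_group(group):
    """Return the entry points of one group across Python versions."""
    all_entry_points = entry_points()
    if hasattr(all_entry_points, "select"):
        # Newer Pythons (3.10+) expose a selection API.
        return list(all_entry_points.select(group=group))
    # Python 3.8 and 3.9 return a plain dict mapping group -> entry points.
    return list(all_entry_points.get(group, []))


found = entry_points_for_group("spacy_architectures")
print([entry_point.name for entry_point in found])
```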

RuntimeError: dictionary changed size during iteration with spacy.load()

I am loading a spaCy model as part of a step in my Dataflow streaming pipeline. To load the pre-downloaded spaCy model for a specific language I am using nlp_model = spacy.load(SPACY_KEYS[lang]) where SPACY_KEYS is a dictionary containing the names of the models for each language (e.g. 'en': 'en_core_web_sm').

This works without any issues for the majority of the jobs run by the pipeline, but for a few iterations I am getting the following error, which seems to be coming from catalogue:

Error message from worker: generic::unknown: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1232, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 752, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 870, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "apache_beam/runners/common.py", line 1368, in apache_beam.runners.common._OutputProcessor.process_outputs
  File "/usr/local/lib/python3.7/site-packages/submodules/entities_and_pii_removal.py", line 259, in entities_and_PII
    nlp_model = spacy.load(SPACY_KEYS[lang])  # load spacy model
  File "/usr/local/lib/python3.7/site-packages/spacy/__init__.py", line 52, in load
    name, vocab=vocab, disable=disable, exclude=exclude, config=config
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 420, in load_model
    return load_model_from_package(name, **kwargs)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 453, in load_model_from_package
    return cls.load(vocab=vocab, disable=disable, exclude=exclude, config=config)  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.7/site-packages/de_core_news_sm/__init__.py", line 10, in load
    return load_model_from_init_py(__file__, **overrides)
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 621, in load_model_from_init_py
    config=config,
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 489, in load_model_from_path
    return nlp.from_disk(model_path, exclude=exclude, overrides=overrides)
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 2042, in from_disk
    util.from_disk(path, deserializers, exclude)  # type: ignore[arg-type]
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 1299, in from_disk
    reader(path / key)
  File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 2037, in <lambda>
    p, exclude=["vocab"]
  File "spacy/pipeline/trainable_pipe.pyx", line 343, in spacy.pipeline.trainable_pipe.TrainablePipe.from_disk
  File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 1299, in from_disk
    reader(path / key)
  File "spacy/pipeline/trainable_pipe.pyx", line 333, in spacy.pipeline.trainable_pipe.TrainablePipe.from_disk.load_model
  File "spacy/pipeline/trainable_pipe.pyx", line 334, in spacy.pipeline.trainable_pipe.TrainablePipe.from_disk.load_model
  File "/usr/local/lib/python3.7/site-packages/thinc/model.py", line 593, in from_bytes
    return self.from_dict(msg)
  File "/usr/local/lib/python3.7/site-packages/thinc/model.py", line 624, in from_dict
    loaded_value = deserialize_attr(default_value, value, attr, node)
  File "/usr/local/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/usr/local/lib/python3.7/site-packages/thinc/model.py", line 804, in deserialize_attr
    return srsly.msgpack_loads(value)
  File "/usr/local/lib/python3.7/site-packages/srsly/_msgpack_api.py", line 27, in msgpack_loads
    msg = msgpack.loads(data, raw=False, use_list=use_list)
  File "/usr/local/lib/python3.7/site-packages/srsly/msgpack/__init__.py", line 76, in unpackb
    for decoder in msgpack_decoders.get_all().values():
  File "/usr/local/lib/python3.7/site-packages/catalogue/__init__.py", line 110, in get_all
    for keys, value in REGISTRY.items():
RuntimeError: dictionary changed size during iteration
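The failure mode in the last frame can be reproduced in a single thread: a dict that gains a key while it is being iterated raises exactly this error. The sketch below simulates a registration happening mid-iteration and contrasts it with iterating over a snapshot. The REGISTRY contents and the namespace tuple are made up for illustration, and whether a snapshot alone is sufficient under real concurrency is an assumption. A lock around registration and iteration would be the more conservative option.

```python
# Hypothetical global registry keyed by namespace tuple.
REGISTRY = {("srsly", "msgpack_decoders", "numpy"): "decoder_a"}


def simulate_registration():
    """Pretend another thread registers a new function right now."""
    key = ("srsly", "msgpack_decoders", f"extra_{len(REGISTRY)}")
    REGISTRY[key] = "decoder_b"


def get_all_unsafe(on_each):
    result = {}
    for keys, value in REGISTRY.items():  # iterates the live dict
        on_each()
        result[keys[-1]] = value
    return result


def get_all_snapshot(on_each):
    result = {}
    for keys, value in list(REGISTRY.items()):  # iterates a copy
        on_each()
        result[keys[-1]] = value
    return result


try:
    get_all_unsafe(simulate_registration)
    unsafe_failed = False
except RuntimeError as err:
    unsafe_failed = True
    print("unsafe iteration failed:", err)

snapshot_result = get_all_snapshot(simulate_registration)
print(sorted(snapshot_result))
```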

import 'importlib.metadata' try-catch doesn't work on Jupyter?

There's a stackoverflow post that seems to point at a problem with catalogue:

File "C:\Users\user1\AppData\Local\Continuum\anaconda3\envs\py37\lib\site-packages\catalogue.py", line 8, in <module>
    import importlib.metadata as importlib_metadata
ModuleNotFoundError: No module named 'importlib.metadata'

Referring to https://github.com/explosion/catalogue/blob/master/catalogue.py#L8, which makes no sense to me as ModuleNotFoundError is a subtype of ImportError and that is properly caught as an exception.

There are a lot of other things going on in those error logs, too, but this caught my attention and I wanted to log it here for future reference.
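For reference, the pattern in question can be checked in isolation. The sketch below confirms the subclass relationship and shows a try/except import fallback of the same shape. The helper can_import and the missing module name are hypothetical, and the snippet assumes Python 3.8+ (or the importlib_metadata backport being installed).

```python
import importlib

# ModuleNotFoundError is a subclass of ImportError, so a handler for
# ImportError also catches it.
is_subclass = issubclass(ModuleNotFoundError, ImportError)

try:
    import importlib.metadata as importlib_metadata  # Python 3.8+
except ImportError:
    import importlib_metadata  # backport package for older Pythons


def can_import(module_name):
    """Return True if the module can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


print(is_subclass, can_import("json"), can_import("hypothetical_missing_module_xyz"))
```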

suggestion: dry registrations for us lazy registrars

Thanks for building this.

Would it be easy to allow the __name__ of the callable being passed in as the default, but allowing a name kwarg to overwrite it? I started going down a decorator rabbit-hole on stack overflow to try and pitch in a solution, but got lost in decorator hell.

Using loaders = catalogue.create("mypackage", "loaders") as our shared example, here's what things look like at the moment:

#passing the name in explicitly
@loaders.register(name='custom_func')
def custom_func(data):
    pass

vs.

#letting the func name itself
@loaders.register
def custom_func(data):
    pass

vs.

#a cool shorthand version
@loaders
def custom_func(data):
    pass

This was my attempt, but it doesn't accept passing in the name kwarg.

class Registry:
    callables = {}
    
    def __call__(self, func, name=None):
        if not name: name = func.__name__
        self.callables[name] = func 
    
    def __contains__(self, func):
        return func in self.callables
        
    def __repr__(self):
        return f"{self.callables}"
    
loaders = Registry()

#this works fine
@loaders
def custom_func(data):
    pass

#this not so much
@loaders(name='blah')
def custom_func(data):
    pass

What do you think?
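One way the optional-name behaviour could be sketched is the usual "decorator with optional arguments" pattern: the function is an optional first positional argument, and the name is keyword-only. This is a standalone illustration with a hypothetical SketchRegistry class. Its signature differs from catalogue's Registry.register, which takes the name as its first argument. It also stores callables per instance rather than on the class.

```python
class SketchRegistry:
    def __init__(self):
        # Per-instance storage, so separate registries don't share state.
        self.callables = {}

    def register(self, func=None, *, name=None):
        def do_register(f):
            # Fall back to the function's own name if none was given.
            self.callables[name or f.__name__] = f
            return f

        if func is not None:
            # Used bare: @loaders or @loaders.register
            return do_register(func)
        # Used with arguments: @loaders(name="...")
        return do_register

    # Calling the registry itself behaves like register.
    __call__ = register

    def __contains__(self, name):
        return name in self.callables


loaders = SketchRegistry()


@loaders
def first(data):
    return data


@loaders.register
def second(data):
    return data


@loaders(name="blah")
def third(data):
    return data


print(sorted(loaders.callables))
```

A caveat of this sketch: passing the name positionally, as in @loaders("blah"), would be mistaken for the function, which is why name is keyword-only here.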

🐛 Pinning importlib-metadata from above creates incompatibility with twine

Dear explosion,

In your release v2.0.2 you have introduced a max version constraint for importlib-metadata which needs to be <3.3.0.

importlib_metadata>=0.20,<3.3.0; python_version < "3.8"

This makes catalogue incompatible with twine (a quite useful package for testing packaging and uploading to PyPI), since twine has required importlib-metadata >= 3.6 since its v3.4.0 release.

Was there any specific reason for pinning importlib-metadata to <3.3.0?
Can you please suggest some solution to allow compatibility with twine?

Thank you in advance!

Question: In spaCy you use catalogue to automatically register cli commands, how exactly you are doing that?

In spacy.cli.util.py you did this:

app = typer.Typer(name=NAME, help=HELP)

def setup_cli() -> None:
    # Make sure the entry-point for CLI runs, so that they get imported.
    registry.cli.get_all()
    # Ensure that the help messages always display the correct prompt
    command = get_command(app)
    command(prog_name=COMMAND)

Here, registry.cli is a catalogue Registry. But I can't work out where you actually register the commands in it: they live in different Python files, so they wouldn't normally be registered, yet somehow it works. Can you please explain?

PS: When I use the same structure, commands in other files don't get imported.

Catalogue v2.1.0 depends on dependencies not declared

Hello,

I just noticed that the catalogue version v2.1.0 (Most recent version on PyPI currently) depends on dependencies not declared in the setup related files.
This is probably due to the temporary inclusion of the config system, which now lives in confection (#33).

The traceback when importing catalogue is the following:

Traceback (most recent call last):
  File "...", line 8, in <module>
    from functions import FUNCTIONS
  File "./functions.py", line 1, in <module>
    import catalogue
  File ".../lib/python3.10/site-packages/catalogue/__init__.py", line 2, in <module>
    from catalogue.config import *
  File ".../lib/python3.10/site-packages/catalogue/config/__init__.py", line 1, in <module>
    from .config import *
  File ".../lib/python3.10/site-packages/catalogue/config/config.py", line 10, in <module>
    from pydantic import BaseModel, create_model, ValidationError, Extra
ModuleNotFoundError: No module named 'pydantic'

When installing pydantic to fix the problem, the traceback states it cannot find srsly.
