padre-lab-eu / pypads

Building on the MLflow toolset, this project extends MLflow's functionality and increases automation to reduce the user's workload. Producing structured results is an additional goal of the extension.

Home Page: https://pypads.readthedocs.io/en/latest/

License: GNU General Public License v3.0

Python 100.00%

pypads's Introduction

PyPads

This project aims to ease automated tracking and logging of meta-information about machine learning experiments without forcing the experiment or its data to follow any particular conceptual paradigm.

Installing

PyPads requires the following to work:

Python (>= 3.6),
cloudpickle (>= 1.3.3),
mlflow (>= 1.6.0),
boltons (>= 19.3.0),
loguru (>= 0.4.1)
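
The minimum versions listed above can be checked against the current environment with a short script. This is only a sketch: it relies on `importlib.metadata`, which is part of the standard library from Python 3.8 onward (so it does not cover the 3.6 and 3.7 interpreters PyPads also supports), and the helper names `version_tuple` and `unmet_requirements` are made up for this illustration.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onward

# Minimum versions copied from the requirements list above.
MINIMUM_VERSIONS = {
    "cloudpickle": "1.3.3",
    "mlflow": "1.6.0",
    "boltons": "19.3.0",
    "loguru": "0.4.1",
}


def version_tuple(version):
    """Turn '1.6.0' into (1, 6, 0); suffixes such as 'rc1' are ignored."""
    numbers = [int(n) for n in re.findall(r"\d+", version)[:3]]
    return tuple(numbers + [0] * (3 - len(numbers)))


def unmet_requirements(get_version=metadata.version):
    """Return {package: reason} for every requirement that is not satisfied."""
    problems = {}
    for package, minimum in MINIMUM_VERSIONS.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[package] = f"{installed} < {minimum}"
    return problems


if __name__ == "__main__":
    print(unmet_requirements() or "all listed requirements are satisfied")
```

The comparison only looks at the first three numeric components, which is enough for the versions listed here but is not a full implementation of Python's version ordering rules.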

PyPads only supports Python 3.6 and higher. To install PyPads, use one of the following methods from your terminal.

Using source code

First, install poetry, then build the package from the root folder of the repository (pypads/):

pip install poetry
poetry build

This creates two files under pypads/dist, either of which can be used to install PyPads:

pip install dist/pypads-X.X.X.tar.gz
OR
pip install dist/pypads-X.X.X-py3-none-any.whl

Using pip (PyPI release)

The package is available on PyPI:

pip install pypads

Tests

The unit tests are located under 'test/' and can be run with:

poetry run pytest test/

Documentation

For more information, see the official PyPads documentation at https://pypads.readthedocs.io/en/latest/.

Getting Started

Usage example

PyPads is easy to use: define what should be tracked in the config and instantiate PyPads.

A simple example looks like this:

from pypads.app.base import PyPads
# define the configuration, in this case we want to track the parameters, 
# outputs and the inputs of each called function included in the hooks (pypads_fit, pypads_predict)
hook_mappings = {
    "parameters": {"on": ["pypads_fit"]},
    "output": {"on": ["pypads_fit", "pypads_predict"]},
    "input": {"on": ["pypads_fit"]}
}
# A simple initialization of the class will activate the tracking
PyPads(hooks=hook_mappings)

# An example
from sklearn import datasets, metrics
from sklearn.tree import DecisionTreeClassifier

# load the iris datasets
dataset = datasets.load_iris()

# fit a model to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target) # pypads will track the parameters, output, and input of the model fit function.
# get the predictions
predicted = model.predict(dataset.data) # pypads will track only the output of the model predict function.
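
To make the effect of the hooks configuration more concrete, here is a conceptual sketch of what "tracking the input and output of a call" can mean. It is not PyPads' implementation and uses none of its API: `track_calls`, `ToyModel`, and the in-memory `log` list are hypothetical names introduced purely for illustration, and PyPads' real injection mechanism is more involved.

```python
import functools


def track_calls(cls, method_name, what, log):
    """Wrap cls.method_name so that each call appends a record to log.

    `what` is a set drawn from {"input", "output"}. It loosely mirrors the
    keys of the hooks config above but is not PyPads' API.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        record = {"call": f"{cls.__name__}.{method_name}"}
        if "input" in what:
            record["input"] = {"args": args, "kwargs": kwargs}
        result = original(self, *args, **kwargs)
        if "output" in what:
            record["output"] = result
        log.append(record)
        return result

    setattr(cls, method_name, wrapper)


class ToyModel:
    """Hypothetical stand-in for an estimator: predicts the training mean."""

    def fit(self, data):
        self.mean_ = sum(data) / len(data)
        return self

    def predict(self, data):
        return [self.mean_ for _ in data]


log = []
track_calls(ToyModel, "fit", {"input", "output"}, log)  # like "pypads_fit"
track_calls(ToyModel, "predict", {"output"}, log)       # like "pypads_predict"

model = ToyModel().fit([1.0, 2.0, 3.0])
predictions = model.predict([10.0, 20.0])
for record in log:
    print(record["call"], sorted(record))
```

As in the example above, the `fit` call is recorded with both input and output, while the `predict` call is recorded with its output only.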

The hooks used for each event are defined in the mapping file, where each hook names the functions to listen to. Users can group functions with regular expressions and even provide paths to hook functions. In the sklearn mapping YAML file, an example entry would be:

fragments:
  default_model:
    !!python/pPath __init__:
      hooks: "pypads_init"
    !!python/rSeg (fit|.fit_predict|fit_transform)$:
      hooks: "pypads_fit"
    !!python/rSeg (fit_predict|predict|score)$:
      hooks: "pypads_predict"
    !!python/rSeg (fit_transform|transform)$:
      hooks: "pypads_transform"

mappings:
  !!python/pPath sklearn:
    !!python/pPath base.BaseEstimator:
      ;default_model: ~

For instance, "pypads_fit" is an event listener on any fit, fit_predict, or fit_transform call made by the tracked model class, which in this case is BaseEstimator, the class most estimators inherit from.
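
The way such patterns group function names can be explored with Python's `re` module. The sketch below copies the three regular expressions from the mapping entry above and applies them with `re.match` to bare function names. How PyPads itself applies the patterns is not reproduced here, so the choice of `re.match` and the helper name `hooks_for` are assumptions. Note that the `.fit_predict` alternative starts with `.`, which matches any single character, so whether a bare `fit_predict` triggers "pypads_fit" depends on how PyPads performs the match.

```python
import re

# Patterns copied from the mapping entry above. Applying them with re.match
# to the bare function name is an assumption of this sketch.
HOOK_PATTERNS = {
    "pypads_fit": r"(fit|.fit_predict|fit_transform)$",
    "pypads_predict": r"(fit_predict|predict|score)$",
    "pypads_transform": r"(fit_transform|transform)$",
}


def hooks_for(function_name):
    """Return the names of all hooks whose pattern matches function_name."""
    return [
        hook
        for hook, pattern in HOOK_PATTERNS.items()
        if re.match(pattern, function_name)
    ]


if __name__ == "__main__":
    for name in ("fit", "predict", "fit_transform", "transform", "score"):
        print(name, "->", hooks_for(name))
```

A function such as `fit_transform` appears in two patterns, so under this sketch it triggers both "pypads_fit" and "pypads_transform".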

Without custom YAML types and fragments, the mapping file would be equivalent to the following definition:

mappings:
  :sklearn:
    :base.BaseEstimator:
        :__init__:
          hooks: "pypads_init"
        :{re:(fit|.fit_predict|fit_transform)$}:
          hooks: "pypads_fit"
        :{re:(fit_predict|predict|score)$}:
          hooks: "pypads_predict"
        :{re:(fit_transform|transform)$}:
          hooks: "pypads_transform"
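
The relationship between the two definitions can be sketched with plain Python dictionaries. This is an illustration only: it assumes, based on the example above, that a key starting with `;` refers to a fragment of that name, it drops the custom YAML tags entirely, and `expand_fragments` is a hypothetical helper rather than a PyPads function.

```python
import copy

# Plain-dict stand-ins for the "fragments" and "mappings" sections above.
FRAGMENTS = {
    "default_model": {
        "__init__": {"hooks": "pypads_init"},
        "(fit|.fit_predict|fit_transform)$": {"hooks": "pypads_fit"},
        "(fit_predict|predict|score)$": {"hooks": "pypads_predict"},
        "(fit_transform|transform)$": {"hooks": "pypads_transform"},
    }
}
MAPPINGS = {"sklearn": {"base.BaseEstimator": {";default_model": None}}}


def expand_fragments(node, fragments):
    """Recursively replace every ';name' key with a copy of fragments['name']."""
    if not isinstance(node, dict):
        return node
    expanded = {}
    for key, value in node.items():
        if key.startswith(";"):
            # Assumed meaning of ';name': inline the fragment called 'name'.
            expanded.update(copy.deepcopy(fragments[key[1:]]))
        else:
            expanded[key] = expand_fragments(value, fragments)
    return expanded


expanded = expand_fragments(MAPPINGS, FRAGMENTS)
for pattern, entry in expanded["sklearn"]["base.BaseEstimator"].items():
    print(pattern, "->", entry["hooks"])
```

After expansion, the entry for base.BaseEstimator carries the four hook definitions directly, which mirrors the structure of the fragment-free mapping file.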

Acknowledgement

This work has been partially funded by the Bavarian Ministry of Economic Affairs, Regional Development and Energy by means of the funding program "Internetkompetenzzentrum Ostbayern" as well as by the German Federal Ministry of Education and Research in the project "Provenance Analytics" with grant agreement number 03PSIPT5C.

pypads's Issues

Write output to OpenML

Add support for logging information to OpenML.

Using PyPads in Docker

When trying to use PyPads (built from source) inside a Docker container, you get the following error:

2020-09-18 12:45:04.696 | INFO     | pypads.injections.setup.misc_setup:_call:70 - Tracking execution to run with id 6370a226e32f4da8853c486cd9c85594
2020-09-18 12:45:05.522 | WARNING  | pypads.app.base:start_track:612 - Active run doesn't match given input name Default-PyPads. Recreating new run.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pypads/app/base.py", line 614, in start_track
    self.api.start_run(experiment_id=experiment_name, nested=True)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 61, in wrapper
    return Cmd(fn=f)(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 34, in __call__
    return self.__real_call__(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/misc/mixins.py", line 255, in __real_call__
    return self._fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 149, in start_run
    out = mlflow.start_run(run_id=run_id, experiment_id=experiment_id, run_name=run_name, nested=nested)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/fluent.py", line 159, in start_run
    active_run_obj = MlflowClient().create_run(experiment_id=exp_id_for_run, tags=tags)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/client.py", line 137, in create_run
    return self._tracking_client.create_run(experiment_id, start_time, tags)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/_tracking_service/client.py", line 95, in create_run
    tags=[RunTag(key, value) for (key, value) in iteritems(tags)],
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 457, in create_run
    experiment = self.get_experiment(experiment_id)
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 338, in get_experiment
    experiment = self._get_experiment(experiment_id)
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 309, in _get_experiment
    databricks_pb2.RESOURCE_DOES_NOT_EXIST,
mlflow.exceptions.MlflowException: Could not find experiment with ID Default-PyPads

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "sklearn_example.py", line 2, in <module>
    tracker = PyPads(uri="mlruns", autostart=True)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/base.py", line 195, in __init__
    self.start_track()
  File "/usr/local/lib/python3.7/site-packages/pypads/app/base.py", line 617, in start_track
    self.api.start_run(experiment_id=experiment_name)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 61, in wrapper
    return Cmd(fn=f)(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 34, in __call__
    return self.__real_call__(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/misc/mixins.py", line 255, in __real_call__
    return self._fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 149, in start_run
    out = mlflow.start_run(run_id=run_id, experiment_id=experiment_id, run_name=run_name, nested=nested)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/fluent.py", line 159, in start_run
    active_run_obj = MlflowClient().create_run(experiment_id=exp_id_for_run, tags=tags)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/client.py", line 137, in create_run
    return self._tracking_client.create_run(experiment_id, start_time, tags)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/_tracking_service/client.py", line 95, in create_run
    tags=[RunTag(key, value) for (key, value) in iteritems(tags)],
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 457, in create_run
    experiment = self.get_experiment(experiment_id)
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 338, in get_experiment
    experiment = self._get_experiment(experiment_id)
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 309, in _get_experiment
    databricks_pb2.RESOURCE_DOES_NOT_EXIST,
mlflow.exceptions.MlflowException: Could not find experiment with ID Default-PyPads

For reference, here is the Dockerfile:

FROM python:3.7
COPY pypads .
RUN pip install GitPython==3.1.8
RUN pip install ipython==7.18.1
RUN pip install Pygments==2.6.1
RUN pip install scikit-learn==0.21.3
RUN pip install psutil==5.7.0
RUN pip install poetry
RUN poetry build
RUN pip install dist/pypads-0.3.2.tar.gz
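
Both tracebacks show `start_run` being called with `experiment_id=experiment_name`, so the experiment name `Default-PyPads` reaches MLflow where an experiment ID is expected, and the file store cannot find an experiment with that ID. One way to avoid this, sketched below, is to resolve the name to an ID first and create the experiment if it does not exist. The helper `resolve_experiment_id` is hypothetical; it takes the `mlflow` module (or any object exposing `get_experiment_by_name` and `create_experiment`) as an argument. Whether this alone resolves the Docker case has not been verified here.

```python
def resolve_experiment_id(tracking, experiment_name):
    """Return the ID of the named experiment, creating the experiment if needed.

    `tracking` is expected to be the `mlflow` module, or any object that
    exposes get_experiment_by_name() and create_experiment().
    """
    experiment = tracking.get_experiment_by_name(experiment_name)
    if experiment is not None:
        return experiment.experiment_id
    # create_experiment() returns the ID of the newly created experiment.
    return tracking.create_experiment(experiment_name)


# Intended use (requires mlflow to be installed):
#   import mlflow
#   experiment_id = resolve_experiment_id(mlflow, "Default-PyPads")
#   mlflow.start_run(experiment_id=experiment_id)
```

The last traceback on this page, taken from a newer PyPads installation, shows `start_track` following the same pattern (`get_experiment_by_name` followed by `create_experiment`).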

ModuleNotFoundError: No module named '_py_abc'

I have installed pypads v0.5.7 on Python v3.6.4.
When I run from pypads.app.base import PyPads in a Jupyter notebook, it raises ModuleNotFoundError.

~\AppData\Roaming\Python\Python36\site-packages\pypads\importext\wrapping\base_wrapper.py in
1 import inspect
----> 2 from _py_abc import ABCMeta
3 from abc import abstractmethod
4 from copy import copy
5 from types import ModuleType
ModuleNotFoundError: No module named '_py_abc'

How can I resolve this?
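
A likely cause, judging from the traceback: `_py_abc` is a private CPython module that was only introduced in Python 3.7, so the import on line 2 of base_wrapper.py cannot succeed on Python 3.6.4. The public `abc` module exports `ABCMeta` on every Python 3 version. The sketch below shows the portable import; the class names are hypothetical and not taken from PyPads.

```python
# Portable: the public abc module provides ABCMeta on all Python 3 versions,
# whereas the private _py_abc module only exists from Python 3.7 onward.
from abc import ABCMeta, abstractmethod


class BaseWrapper(metaclass=ABCMeta):
    """Hypothetical abstract base class, for illustration only."""

    @abstractmethod
    def wrap(self, target):
        """Return a wrapped version of target."""


class PassThroughWrapper(BaseWrapper):
    """Hypothetical concrete subclass that returns its input unchanged."""

    def wrap(self, target):
        return target


print(PassThroughWrapper().wrap(42))
```

Until the import is changed in the installed package, running PyPads on Python 3.7 or newer should avoid this particular error, although that has not been tested here.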

KeyError                                  Traceback (most recent call last)
C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\registry.py in get_store_builder(self, store_uri)
     74         try:
---> 75             store_builder = self._registry[scheme]
     76         except KeyError:

KeyError: 'c'

During handling of the above exception, another exception occurred:

UnsupportedModelRegistryStoreURIException Traceback (most recent call last)
<ipython-input-1-56fef9159740> in <module>
      1 from pypads.app.base import PyPads
----> 2 PyPads(autostart=True)
      3 
      4 # An example
      5 

C:\Users\USER\anaconda3\lib\site-packages\pypads\app\base.py in __init__(self, uri, folder, mappings, hooks, events, setup_fns, config, pre_initialized_cache, disable_plugins, autostart)
    187                 self.start_track(autostart)
    188             else:
--> 189                 self.start_track()
    190 
    191     @staticmethod

C:\Users\USER\anaconda3\lib\site-packages\pypads\app\base.py in start_track(self, experiment_name, disable_run_init)
    590             experiment_name = experiment_name or DEFAULT_EXPERIMENT_NAME
    591             # Create run if run doesn't already exist
--> 592             experiment = mlflow.get_experiment_by_name(experiment_name)
    593             experiment_id = experiment.experiment_id if experiment else mlflow.create_experiment(experiment_name)
    594             run = self.api.start_run(experiment_id=experiment_id)

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\fluent.py in get_experiment_by_name(name)
    343     :return: :py:class:`mlflow.entities.Experiment`
    344     """
--> 345     return MlflowClient().get_experiment_by_name(name)
    346 
    347 

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\client.py in __init__(self, tracking_uri, registry_uri)
     39         final_tracking_uri = tracking_uri or utils.get_tracking_uri()
     40         self._registry_uri = registry_uri or final_tracking_uri
---> 41         self._tracking_client = TrackingServiceClient(final_tracking_uri)
     42         # `MlflowClient` also references a `ModelRegistryClient` instance that is provided by the
     43         # `MlflowClient._get_registry_client()` method. This `ModelRegistryClient` is not explicitly

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\_tracking_service\client.py in __init__(self, tracking_uri)
     30         """
     31         self.tracking_uri = tracking_uri
---> 32         self.store = utils._get_store(self.tracking_uri)
     33 
     34     def get_run(self, run_id):

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\_tracking_service\utils.py in _get_store(store_uri, artifact_uri)
    122 
    123 def _get_store(store_uri=None, artifact_uri=None):
--> 124     return _tracking_store_registry.get_store(store_uri, artifact_uri)
    125 
    126 

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\_tracking_service\registry.py in get_store(self, store_uri, artifact_uri)
     34         from mlflow.tracking._tracking_service import utils
     35         store_uri = store_uri if store_uri is not None else utils.get_tracking_uri()
---> 36         builder = self.get_store_builder(store_uri)
     37         return builder(store_uri=store_uri, artifact_uri=artifact_uri)

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\registry.py in get_store_builder(self, store_uri)
     77             raise UnsupportedModelRegistryStoreURIException(
     78                 unsupported_uri=store_uri,
---> 79                 supported_uri_schemes=list(self._registry.keys()))
     80         return store_builder

UnsupportedModelRegistryStoreURIException:  Model registry functionality is unavailable; got unsupported URI 'C:\cocoa-5.2\emacs\.pypads\.mlruns' for model registry data storage. Supported URI schemes are: ['', 'file', 'databricks', 'http', 'https', 'postgresql', 'mysql', 'sqlite', 'mssql']. See https://www.mlflow.org/docs/latest/tracking.html#storage for how to run an MLflow server against one of the supported backend storage locations.
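
The final message shows MLflow receiving the bare Windows path `C:\cocoa-5.2\emacs\.pypads\.mlruns` as a URI. URL parsing treats the drive letter before the colon as the scheme, which matches the `KeyError: 'c'` at the top of the traceback. A workaround worth trying is to pass a `file` URI instead, for example through the `uri` argument that appears in the `PyPads.__init__` signature above. Whether this resolves the issue in this PyPads version is an assumption and has not been tested here; `to_file_uri` is a hypothetical helper.

```python
from urllib.parse import quote, urlparse

raw_path = r"C:\cocoa-5.2\emacs\.pypads\.mlruns"  # path from the error message


def to_file_uri(windows_path):
    """Convert an absolute Windows path with a drive letter into a file URI."""
    # On Windows itself, pathlib.Path(windows_path).as_uri() does the same job.
    return "file:///" + quote(windows_path.replace("\\", "/"), safe="/:")


print(urlparse(raw_path).scheme)      # the drive letter is parsed as the scheme
tracking_uri = to_file_uri(raw_path)
print(tracking_uri)
print(urlparse(tracking_uri).scheme)  # "file" is in MLflow's list of supported schemes
```

The string manipulation keeps the sketch runnable on any operating system, and `file` appears among the supported URI schemes listed in the error message.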