
padre-lab-eu / pypads


Building on the MLflow toolset, this project aims to extend MLflow's functionality and increase automation, thereby reducing the workload for the user. Producing structured results is an additional goal of the extension.

Home Page: https://pypads.readthedocs.io/en/latest/

License: GNU General Public License v3.0

Python 100.00%

pypads's Introduction

PyPads

This project aims to ease automated tracking and logging of meta information about experiments without enforcing any conceptual paradigm to be followed by the machine learning experiment and its data.


Installing

This tool requires the following:

Python (>= 3.6),
cloudpickle (>= 1.3.3),
mlflow (>= 1.6.0),
boltons (>= 19.3.0),
loguru (>= 0.4.1)

PyPads only supports Python 3.6 and higher. It can be installed either from source or from PyPI by running the following commands in your terminal.

Using source code

First, install Poetry and build the package in the root folder of the repository (pypads/):

pip install poetry
poetry build

This creates two files under pypads/dist, either of which can be used for installation:

pip install dist/pypads-X.X.X.tar.gz
OR
pip install dist/pypads-X.X.X-py3-none-any.whl

Using pip (PyPI release)

The package is published on PyPI and can be installed with:

pip install pypads

Tests

The unit tests are located under 'test/' and can be executed with:

poetry run pytest test/

Documentation

For more information, look into the official documentation of PyPads.

Getting Started

Usage example

PyPads is easy to use: define what should be tracked in the configuration and instantiate PyPads.

A simple example looks like the following:

from pypads.app.base import PyPads
# Define the configuration: each event (parameters, output, input) lists the hooks
# it is triggered on (here pypads_fit and pypads_predict)
hook_mappings = {
    "parameters": {"on": ["pypads_fit"]},
    "output": {"on": ["pypads_fit", "pypads_predict"]},
    "input": {"on": ["pypads_fit"]}
}
# A simple initialization of the class will activate the tracking
PyPads(hooks=hook_mappings)

# An example
from sklearn import datasets, metrics
from sklearn.tree import DecisionTreeClassifier

# load the iris datasets
dataset = datasets.load_iris()

# fit a model to the data
model = DecisionTreeClassifier()
model.fit(dataset.data, dataset.target) # pypads will track the parameters, output, and input of the model fit function.
# get the predictions
predicted = model.predict(dataset.data) # pypads will track only the output of the model predict function.
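The configuration above maps each event to the hooks it fires on. The sketch below reads a hook_mappings dictionary of that shape to show which events a given hook triggers. The helper events_for_hook is hypothetical and not part of the PyPads API. It only makes the configuration's meaning concrete.

```python
from typing import Dict, List


def events_for_hook(hook_mappings: Dict[str, Dict[str, List[str]]], hook: str) -> List[str]:
    """Hypothetical helper: return every event whose "on" list contains `hook`."""
    return [event for event, spec in hook_mappings.items() if hook in spec.get("on", [])]


# Same configuration as in the README example
hook_mappings = {
    "parameters": {"on": ["pypads_fit"]},
    "output": {"on": ["pypads_fit", "pypads_predict"]},
    "input": {"on": ["pypads_fit"]},
}

# model.fit(...) is mapped to the "pypads_fit" hook,
# model.predict(...) to the "pypads_predict" hook.
fit_events = events_for_hook(hook_mappings, "pypads_fit")
predict_events = events_for_hook(hook_mappings, "pypads_predict")
```

Under this reading, fit triggers the parameters, output, and input events, while predict triggers only output. This matches the comments in the example.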

The hooks for each event are defined in a mapping file, where each hook names the functions to listen to. Functions can be grouped using regular expressions, and paths to the functions to hook can be given explicitly. In the sklearn mapping YAML file, an example entry looks like this:

fragments:
  default_model:
    !!python/pPath __init__:
      hooks: "pypads_init"
    !!python/rSeg (fit|.fit_predict|fit_transform)$:
      hooks: "pypads_fit"
    !!python/rSeg (fit_predict|predict|score)$:
      hooks: "pypads_predict"
    !!python/rSeg (fit_transform|transform)$:
      hooks: "pypads_transform"

mappings:
  !!python/pPath sklearn:
    !!python/pPath base.BaseEstimator:
      ;default_model: ~

For instance, the "pypads_fit" hook listens for any fit, fit_predict, or fit_transform call made on the tracked class. Here that class is BaseEstimator, from which most scikit-learn estimators inherit.

Without custom YAML types and fragments, the mapping file would be equivalent to the following definition:

mappings:
  :sklearn:
    :base.BaseEstimator:
        :__init__:
          hooks: "pypads_init"
        :{re:(fit|.fit_predict|fit_transform)$}:
          hooks: "pypads_fit"
        :{re:(fit_predict|predict|score)$}:
          hooks: "pypads_predict"
        :{re:(fit_transform|transform)$}:
          hooks: "pypads_transform"
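The equivalence between the two listings can be illustrated with plain Python dictionaries. The ;default_model: ~ entry under base.BaseEstimator is replaced by the entries of the default_model fragment. The sketch makes two assumptions:

- a key starting with ";" names a fragment to splice in, and
- keys use the ":"-prefixed notation of the second listing.

expand_fragments is a hypothetical helper, not PyPads' actual mapping loader.

```python
from typing import Any, Dict

# The default_model fragment, written with the ":"-prefixed keys of the expanded listing
fragments: Dict[str, Dict[str, Any]] = {
    "default_model": {
        ":__init__": {"hooks": "pypads_init"},
        ":{re:(fit|.fit_predict|fit_transform)$}": {"hooks": "pypads_fit"},
        ":{re:(fit_predict|predict|score)$}": {"hooks": "pypads_predict"},
        ":{re:(fit_transform|transform)$}": {"hooks": "pypads_transform"},
    }
}

# The compact mapping: BaseEstimator only references the fragment
compact_mappings: Dict[str, Any] = {
    ":sklearn": {":base.BaseEstimator": {";default_model": None}},
}


def expand_fragments(node: Any, fragments: Dict[str, Dict[str, Any]]) -> Any:
    """Recursively replace every ";name" key with the (expanded) entries of fragment `name`."""
    if not isinstance(node, dict):
        return node
    expanded: Dict[str, Any] = {}
    for key, value in node.items():
        if key.startswith(";"):
            # Fragment reference: splice the fragment's entries into the current level
            expanded.update(expand_fragments(fragments[key[1:]], fragments))
        else:
            expanded[key] = expand_fragments(value, fragments)
    return expanded


expanded_mappings = expand_fragments(compact_mappings, fragments)
```

After expansion, the entry for :base.BaseEstimator holds the four hook definitions directly, as in the second listing.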

Acknowledgement

This work has been partially funded by the Bavarian Ministry of Economic Affairs, Regional Development and Energy by means of the funding program "Internetkompetenzzentrum Ostbayern", as well as by the German Federal Ministry of Education and Research in the project "Provenance Analytics" with grant agreement number 03PSIPT5C.


pypads's Issues

Using PyPads in Docker

When using PyPads (built from source) inside a Docker container, the following error occurs:

2020-09-18 12:45:04.696 | INFO     | pypads.injections.setup.misc_setup:_call:70 - Tracking execution to run with id 6370a226e32f4da8853c486cd9c85594
2020-09-18 12:45:05.522 | WARNING  | pypads.app.base:start_track:612 - Active run doesn't match given input name Default-PyPads. Recreating new run.
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/pypads/app/base.py", line 614, in start_track
    self.api.start_run(experiment_id=experiment_name, nested=True)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 61, in wrapper
    return Cmd(fn=f)(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 34, in __call__
    return self.__real_call__(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/misc/mixins.py", line 255, in __real_call__
    return self._fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 149, in start_run
    out = mlflow.start_run(run_id=run_id, experiment_id=experiment_id, run_name=run_name, nested=nested)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/fluent.py", line 159, in start_run
    active_run_obj = MlflowClient().create_run(experiment_id=exp_id_for_run, tags=tags)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/client.py", line 137, in create_run
    return self._tracking_client.create_run(experiment_id, start_time, tags)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/_tracking_service/client.py", line 95, in create_run
    tags=[RunTag(key, value) for (key, value) in iteritems(tags)],
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 457, in create_run
    experiment = self.get_experiment(experiment_id)
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 338, in get_experiment
    experiment = self._get_experiment(experiment_id)
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 309, in _get_experiment
    databricks_pb2.RESOURCE_DOES_NOT_EXIST,
mlflow.exceptions.MlflowException: Could not find experiment with ID Default-PyPads

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "sklearn_example.py", line 2, in <module>
    tracker = PyPads(uri="mlruns", autostart=True)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/base.py", line 195, in __init__
    self.start_track()
  File "/usr/local/lib/python3.7/site-packages/pypads/app/base.py", line 617, in start_track
    self.api.start_run(experiment_id=experiment_name)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 61, in wrapper
    return Cmd(fn=f)(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 34, in __call__
    return self.__real_call__(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/misc/mixins.py", line 255, in __real_call__
    return self._fn(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/pypads/app/api.py", line 149, in start_run
    out = mlflow.start_run(run_id=run_id, experiment_id=experiment_id, run_name=run_name, nested=nested)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/fluent.py", line 159, in start_run
    active_run_obj = MlflowClient().create_run(experiment_id=exp_id_for_run, tags=tags)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/client.py", line 137, in create_run
    return self._tracking_client.create_run(experiment_id, start_time, tags)
  File "/usr/local/lib/python3.7/site-packages/mlflow/tracking/_tracking_service/client.py", line 95, in create_run
    tags=[RunTag(key, value) for (key, value) in iteritems(tags)],
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 457, in create_run
    experiment = self.get_experiment(experiment_id)
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 338, in get_experiment
    experiment = self._get_experiment(experiment_id)
  File "/usr/local/lib/python3.7/site-packages/mlflow/store/tracking/file_store.py", line 309, in _get_experiment
    databricks_pb2.RESOURCE_DOES_NOT_EXIST,
mlflow.exceptions.MlflowException: Could not find experiment with ID Default-PyPads

For reference, here is the Dockerfile:

FROM python:3.7
COPY pypads .
RUN pip install GitPython==3.1.8
RUN pip install ipython==7.18.1
RUN pip install Pygments==2.6.1
RUN pip install scikit-learn==0.21.3
RUN pip install psutil==5.7.0
RUN pip install poetry
RUN poetry build
RUN pip install dist/pypads-0.3.2.tar.gz
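The traceback suggests that, in this build, start_track passes the experiment name Default-PyPads directly as experiment_id (base.py, lines 614 and 617). MLflow's file store then looks for an experiment whose ID is literally Default-PyPads and fails.

The Windows traceback further below comes from a later PyPads version. There, start_track first looks the experiment up by name and creates it if necessary (base.py, lines 592-594).

A minimal sketch of that name-to-ID resolution follows. It works with any object exposing get_experiment_by_name and create_experiment, and the mlflow module provides both. resolve_experiment_id and the in-memory FakeClient used to exercise it are hypothetical.

```python
from typing import Any, Dict, Optional


def resolve_experiment_id(client: Any, experiment_name: str) -> str:
    """Return the ID of the experiment called `experiment_name`, creating it if needed.

    `client` is any object with get_experiment_by_name(name) and create_experiment(name);
    the mlflow module itself exposes both functions.
    """
    experiment = client.get_experiment_by_name(experiment_name)
    if experiment is not None:
        return experiment.experiment_id
    return client.create_experiment(experiment_name)


class FakeExperiment:
    """Hypothetical stand-in for mlflow.entities.Experiment (only experiment_id is used)."""

    def __init__(self, experiment_id: str) -> None:
        self.experiment_id = experiment_id


class FakeClient:
    """Hypothetical in-memory client used to exercise resolve_experiment_id without MLflow."""

    def __init__(self) -> None:
        self._by_name: Dict[str, FakeExperiment] = {}

    def get_experiment_by_name(self, name: str) -> Optional[FakeExperiment]:
        return self._by_name.get(name)

    def create_experiment(self, name: str) -> str:
        experiment_id = str(len(self._by_name))
        self._by_name[name] = FakeExperiment(experiment_id)
        return experiment_id
```

With MLflow installed, the resolved ID (rather than the name) would then be passed on, for example mlflow.start_run(experiment_id=resolve_experiment_id(mlflow, "Default-PyPads")). This is a sketch of the pattern, not a patch for PyPads 0.3.2.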

ModuleNotFoundError: No module named '_py_abc'

I have installed pypads v0.5.7 on Python v3.6.4.
When I run from pypads.app.base import PyPads in a Jupyter notebook, it raises a ModuleNotFoundError:

~\AppData\Roaming\Python\Python36\site-packages\pypads\importext\wrapping\base_wrapper.py in
1 import inspect
----> 2 from _py_abc import ABCMeta
3 from abc import abstractmethod
4 from copy import copy
5 from types import ModuleType
ModuleNotFoundError: No module named '_py_abc'

How can I resolve this?
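For context, _py_abc is CPython's pure-Python implementation of the abc machinery and only exists from Python 3.7 onward. The import in base_wrapper.py therefore fails on Python 3.6.4, even though the README lists Python 3.6 as supported.

One possible workaround is a guarded import that falls back to the standard abc.ABCMeta. This sketch assumes abc.ABCMeta is an acceptable substitute for PyPads' use case, which is not verified here. The Wrapper classes are hypothetical illustrations, not PyPads code.

```python
try:
    # Pure-Python ABCMeta; the _py_abc module only exists on Python 3.7+
    from _py_abc import ABCMeta
except ImportError:
    # Python 3.6: fall back to the standard abc module's ABCMeta
    from abc import ABCMeta
from abc import abstractmethod


class Wrapper(metaclass=ABCMeta):
    """Hypothetical abstract base class standing in for a PyPads wrapper."""

    @abstractmethod
    def wrap(self, obj):
        ...


class IdentityWrapper(Wrapper):
    """Hypothetical concrete subclass that returns its input unchanged."""

    def wrap(self, obj):
        return obj
```

With either metaclass, Wrapper cannot be instantiated directly while IdentityWrapper can, so abstract-method checks keep working.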

Support for sacred

We could support Sacred experiments by duck punching (monkey patching) its decorators with our own versions that track with PyPads instead of Sacred, or at least by defining similar decorators to use in place of Sacred's.
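Duck punching here means replacing a decorator on an existing class at runtime, so that decorated functions are also tracked. The sketch applies this idea to a minimal stand-in Experiment class with a main decorator. It is not Sacred's real class, and track is a hypothetical placeholder for a PyPads logging call.

```python
import functools
from typing import Any, Callable, List

tracked_calls: List[str] = []


def track(fn_name: str) -> None:
    """Hypothetical placeholder for recording a call with PyPads."""
    tracked_calls.append(fn_name)


class Experiment:
    """Minimal stand-in for a Sacred-like experiment class (not Sacred's real API)."""

    def main(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        self.main_fn = fn
        return fn


def duck_punch_main(experiment_cls: type) -> None:
    """Replace experiment_cls.main with a version that also tracks the decorated function."""
    original_main = experiment_cls.main

    @functools.wraps(original_main)
    def patched_main(self, fn):
        @functools.wraps(fn)
        def tracked(*args, **kwargs):
            track(fn.__name__)
            return fn(*args, **kwargs)

        # Register the tracking wrapper through the original decorator
        return original_main(self, tracked)

    experiment_cls.main = patched_main


duck_punch_main(Experiment)
ex = Experiment()


@ex.main
def run(x):
    return x * 2
```

After patching, calling run(21) returns 42 and records "run" in tracked_calls, without changing how the experiment is declared.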

Windows Path ModelRegistryStoreURI

In the current version there seems to be a problem with the Windows artifact storage path:

KeyError                                  Traceback (most recent call last)
C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\registry.py in get_store_builder(self, store_uri)
     74         try:
---> 75             store_builder = self._registry[scheme]
     76         except KeyError:

KeyError: 'c'

During handling of the above exception, another exception occurred:

UnsupportedModelRegistryStoreURIException Traceback (most recent call last)
<ipython-input-1-56fef9159740> in <module>
      1 from pypads.app.base import PyPads
----> 2 PyPads(autostart=True)
      3 
      4 # An example
      5 

C:\Users\USER\anaconda3\lib\site-packages\pypads\app\base.py in __init__(self, uri, folder, mappings, hooks, events, setup_fns, config, pre_initialized_cache, disable_plugins, autostart)
    187                 self.start_track(autostart)
    188             else:
--> 189                 self.start_track()
    190 
    191     @staticmethod

C:\Users\USER\anaconda3\lib\site-packages\pypads\app\base.py in start_track(self, experiment_name, disable_run_init)
    590             experiment_name = experiment_name or DEFAULT_EXPERIMENT_NAME
    591             # Create run if run doesn't already exist
--> 592             experiment = mlflow.get_experiment_by_name(experiment_name)
    593             experiment_id = experiment.experiment_id if experiment else mlflow.create_experiment(experiment_name)
    594             run = self.api.start_run(experiment_id=experiment_id)

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\fluent.py in get_experiment_by_name(name)
    343     :return: :py:class:`mlflow.entities.Experiment`
    344     """
--> 345     return MlflowClient().get_experiment_by_name(name)
    346 
    347 

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\client.py in __init__(self, tracking_uri, registry_uri)
     39         final_tracking_uri = tracking_uri or utils.get_tracking_uri()
     40         self._registry_uri = registry_uri or final_tracking_uri
---> 41         self._tracking_client = TrackingServiceClient(final_tracking_uri)
     42         # `MlflowClient` also references a `ModelRegistryClient` instance that is provided by the
     43         # `MlflowClient._get_registry_client()` method. This `ModelRegistryClient` is not explicitly

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\_tracking_service\client.py in __init__(self, tracking_uri)
     30         """
     31         self.tracking_uri = tracking_uri
---> 32         self.store = utils._get_store(self.tracking_uri)
     33 
     34     def get_run(self, run_id):

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\_tracking_service\utils.py in _get_store(store_uri, artifact_uri)
    122 
    123 def _get_store(store_uri=None, artifact_uri=None):
--> 124     return _tracking_store_registry.get_store(store_uri, artifact_uri)
    125 
    126 

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\_tracking_service\registry.py in get_store(self, store_uri, artifact_uri)
     34         from mlflow.tracking._tracking_service import utils
     35         store_uri = store_uri if store_uri is not None else utils.get_tracking_uri()
---> 36         builder = self.get_store_builder(store_uri)
     37         return builder(store_uri=store_uri, artifact_uri=artifact_uri)

C:\Users\USER\anaconda3\lib\site-packages\mlflow\tracking\registry.py in get_store_builder(self, store_uri)
     77             raise UnsupportedModelRegistryStoreURIException(
     78                 unsupported_uri=store_uri,
---> 79                 supported_uri_schemes=list(self._registry.keys()))
     80         return store_builder

UnsupportedModelRegistryStoreURIException:  Model registry functionality is unavailable; got unsupported URI 'C:\cocoa-5.2\emacs\.pypads\.mlruns' for model registry data storage. Supported URI schemes are: ['', 'file', 'databricks', 'http', 'https', 'postgresql', 'mysql', 'sqlite', 'mssql']. See https://www.mlflow.org/docs/latest/tracking.html#storage for how to run an MLflow server against one of the supported backend storage locations.
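The KeyError: 'c' indicates that the storage path C:\cocoa-5.2\emacs\.pypads\.mlruns is parsed as a URI whose scheme is the drive letter c. That scheme is not among those MLflow lists as supported.

The sketch shows this parsing behaviour and one possible workaround: converting the path into a file: URI with the standard library. Whether MLflow then accepts the URI for the model registry in the affected version has not been verified here.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

raw_path = r"C:\cocoa-5.2\emacs\.pypads\.mlruns"

# A bare Windows path is parsed with the drive letter as the URI scheme
raw_scheme = urlparse(raw_path).scheme  # 'c'

# Converting the path into a file URI yields the supported 'file' scheme
file_uri = PureWindowsPath(raw_path).as_uri()  # 'file:///C:/cocoa-5.2/emacs/.pypads/.mlruns'
file_scheme = urlparse(file_uri).scheme  # 'file'
```

On Windows itself, pathlib.Path(raw_path).as_uri() produces the same kind of URI. It could then be tried as an explicit uri argument, e.g. PyPads(uri=file_uri, autostart=True).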
