CPU and GPU deterministic and therefore fully reproducible machine learning pipelines using MLflow.

License: Apache License 2.0

mlf-core

mlf-core: a framework for deterministic machine learning

Overview

mlf-core provides CPU and GPU deterministic machine learning templates based on MLflow, Conda, Docker and a strong Github integration. Templates are available for PyTorch, TensorFlow and XGBoost. A custom linter ensures that projects stay deterministic in all phases of development and deployment.

Installing

Start your journey with mlf-core by installing it via $ pip install mlf-core.

See Installation.

run

See a mlf-core project in action.

config

Configure mlf-core to get started.

See Configuring mlf-core.

list

List all available mlf-core templates.

See Listing all templates.

info

Get detailed information on a mlf-core template.

See Get detailed template information.

create

Kickstart your deterministic machine learning project with one of mlf-core's templates in no time.

See Create a project.

lint

Use advanced linting to ensure your project always adheres to mlf-core's standards and stays deterministic.

See Linting your project

bump-version

Bump your project version across several files.

See Bumping the version of an existing project.
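As a rough illustration of what bumping a version "across several files" involves, here is a minimal Python sketch. It is not mlf-core's implementation: the function name `bump_version` is hypothetical, and it simply replaces one version string with another in a given list of files.

```python
from pathlib import Path


def bump_version(project_dir, old: str, new: str, files) -> list:
    """Hypothetical helper: replace `old` with `new` in each listed file.

    `files` are paths relative to `project_dir`. Returns the relative paths
    that actually contained `old` and were rewritten.
    """
    changed = []
    for rel in files:
        path = Path(project_dir) / rel
        text = path.read_text()
        if old in text:
            path.write_text(text.replace(old, new))
            changed.append(rel)
    return changed
```

For example, `bump_version(".", "0.1.0-SNAPSHOT", "0.1.0", [".mlf_core.yml", "docs/conf.py", "MLproject"])` would cover some of the files that carry the version in a generated project. A plain string replacement also rewrites unrelated matches (such as a dependency pinned to the same version), so a real implementation likely needs per-file patterns.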

sync

Sync your project with the latest mlf-core release to get the latest template features.

See Syncing a project.

upgrade

Check whether you are using the latest mlf-core version and update automatically to benefit from the latest features.

See https://mlf-core.readthedocs.io/en/latest/upgrade.html.

Credits

Primary idea and main development by Lukas Heumos. mlf-core is inspired by nf-core. This package was created with cookietemple based on a modified audreyr/cookiecutter-pypackage project template using cookiecutter.

Contributors

dependabot-preview[bot], dependabot[bot], edmundmiller, imipenem, kevinmenden, zethson

mlf-core's Issues

Ask again if users decline Github support

Since the Github support sets up quite a lot, including the repository secrets, we should ensure that users always use it unless they really don't want to or can't (e.g. GitLab/Bitbucket users).

I suggest asking again, e.g.:

"Do you really want to skip this step? Creating a Github repository automatically also sets up automatic template syncing."

when users decline the automatic GH repository creation. It should be a yellow warning.
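A minimal sketch of the proposed second prompt, assuming a plain `ask` callable (for example the built-in `input`) instead of the prompt library mlf-core actually uses. The function name `want_github_repo` is hypothetical; the warning is coloured yellow with a standard ANSI escape code.

```python
from typing import Callable

YELLOW = "\033[33m"  # ANSI escape code for yellow text
RESET = "\033[0m"


def want_github_repo(ask: Callable[[str], str]) -> bool:
    """Hypothetical helper: return True if a Github repository should be created.

    `ask` shows a prompt and returns the user's answer, e.g. the built-in input().
    """
    answer = ask("Do you want to create a Github repository and push your template to it? [Yes]: ")
    if answer.strip().lower() in ("", "y", "yes"):
        return True
    # The user declined: ask once more, shown as a yellow warning.
    confirm = ask(
        f"{YELLOW}Do you really want to skip this step? Creating a Github repository "
        f"automatically also sets up automatic template syncing. [No]: {RESET}"
    )
    # Only an explicit "yes" skips the repository creation.
    return confirm.strip().lower() not in ("y", "yes")
```

Defaulting the second question to "No" means that simply pressing Enter keeps the Github integration enabled.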

Verify that all mlf-core features work with default "main" branches.

Github recently changed the default branch of newly created repositories to "main", and git itself took similar measures.
I switched back to master myself, purely out of muscle memory.
Since mlf-core automates a lot with specific branch names we must verify that everything still works.
I verified the following questions for master, but not yet main.

Questions to answer:

  1. Does the automatic repository creation still work?
  2. Does automatic syncing (which can also be triggered manually) still work? This can be tested by deliberately decreasing the template version in the .mlf_core.yml file.
  3. Is the documentation still pushed to the gh-pages branch when changes are pushed to the main branch? I am fairly sure that this one will break, but maybe the two branch names are treated as synonyms behind the scenes?
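Instead of hard-coding "master", code that needs the default branch could ask git for it. A minimal sketch, assuming the repository was cloned (so that `origin/HEAD` is set); both function names are hypothetical and not part of mlf-core.

```python
import subprocess


def parse_default_branch(symbolic_ref: str) -> str:
    """Turn the output of `git symbolic-ref --short refs/remotes/origin/HEAD`
    (e.g. "origin/main") into a bare branch name (e.g. "main")."""
    ref = symbolic_ref.strip()
    # Split off only the remote name, so branches containing "/" survive.
    return ref.split("/", 1)[1] if "/" in ref else ref


def default_branch(repo_dir: str = ".") -> str:
    """Return the default branch of a cloned repository.

    Assumes origin/HEAD exists, which `git clone` sets up automatically.
    """
    out = subprocess.run(
        ["git", "symbolic-ref", "--short", "refs/remotes/origin/HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_default_branch(out)
```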

If anything here breaks, please report and we can fix it.

Add information on how to create releases

  1. Start from the development branch.
  2. Create a release/0.1.0 branch.
  3. Merge it into master.
  4. Use mlf-core bump-version to switch the version from a SNAPSHOT version to a release version:
     mlf-core bump-version 0.1.0 .
  5. Create a release on Github.
  6. A new Docker container is published with the latest version.
  7. Switch back to development.
  8. Bump the version again to a SNAPSHOT version:
     mlf-core bump-version 0.2.0-SNAPSHOT .
  9. The changelog will automatically add sections.

etc.
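The version transitions in these steps follow a simple rule. A minimal sketch of that rule, assuming plain X.Y.Z versions with an optional -SNAPSHOT suffix and that the next development cycle bumps the minor version (as in 0.1.0 to 0.2.0-SNAPSHOT); the helper names are hypothetical, not mlf-core functions.

```python
SNAPSHOT_SUFFIX = "-SNAPSHOT"


def to_release(version: str) -> str:
    """Drop the -SNAPSHOT suffix: '0.1.0-SNAPSHOT' -> '0.1.0'."""
    if version.endswith(SNAPSHOT_SUFFIX):
        return version[: -len(SNAPSHOT_SUFFIX)]
    return version


def next_snapshot(release: str) -> str:
    """Open the next development cycle: '0.1.0' -> '0.2.0-SNAPSHOT'.

    Bumps the minor version and resets the patch version.
    """
    major, minor, _patch = (int(part) for part in to_release(release).split("."))
    return f"{major}.{minor + 1}.0{SNAPSHOT_SUFFIX}"
```

The resulting strings are what would be passed to `mlf-core bump-version` in steps 4 and 8.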

mlflow-package should have ready-to-use prediction examples for pytorch, tensorflow (both mnist) and xgboost (covertype)

Currently, the mlflow-package templates for pytorch, tensorflow and xgboost do not provide predictions that work out of the box.
Only xgboost works, but it does not use the covertype example and therefore does not match the general mlflow-xgboost model.

The mlflow-package for pytorch should load a trained mnist model and perform a prediction, ideally even plotting an example.
The same applies to Tensorflow (mnist) and XGBoost (covertype).

Linting errors when the version string is not a SNAPSHOT version

Describe the bug
Linting errors when creating a project

To Reproduce

? Choose the project's domain:   mlflow
? Choose the project's primary framework:   pytorch
? Project name [Exploding Springfield]:  mlfcore test
Looking up mlfcore test at readthedocs.io!
? Short description of your project [mlfcore test. A mlf-core based .]:  This is a test for the mlfcore project.
? Initial version of your project [0.1.0-SNAPSHOT]:  0.1.0dev
? License:   MIT
? Do you want to create a Github repository and push your template to it? [Yes]:   Yes
? Do you want to create an organization repository? [No]:   No
? Do you want your repository to be private? [No]:   No
Fixing too short underlines of *.rst file (usually index.rst)
Running general linting
Running lint checks ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8 of 8 check_no_cookiecutter_strings
Running mlflow-pytorch linting
Running lint checks ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3 of 3 pytorch_reproducibility_seeds

──────────── LINT RESULTS ────────────

     [[✔]]    5 tests passed
     [[!]]    8 tests had warnings
     [[✗]]    7 tests failed

──────────── [[✔]] Tests Passed ────────────

1 https://mlf-core.readthedocs.io/en/latest/lint.html#general-1 : All required general files were found!
2 https://mlf-core.readthedocs.io/en/latest/lint.html#general-2 : Dockerfile check passed
3 https://mlf-core.readthedocs.io/en/latest/lint.html#general-7 : Passed conda environment checks.
4 https://mlf-core.readthedocs.io/en/latest/lint.html#mlflow-pytorch-1 : All required mlflow-pytorch specific files were found!
5 https://mlf-core.readthedocs.io/en/latest/lint.html#mlflow-pytorch-2 : All required reproducibility settings enabled.

──────────── [[!]] Test Warnings ────────────

1 https://mlf-core.readthedocs.io/en/latest/lint.html#general-7 : Version 10.1 of cudatoolkit is not the latest available: 11.0.221
2 https://mlf-core.readthedocs.io/en/latest/lint.html#general-7 : Version 3.7 of python is not the latest available: 3.8.5
3 https://mlf-core.readthedocs.io/en/latest/lint.html#general-7 : Version 6.0.0 of rich is not the latest available: 6.1.1
4 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in README.rst: * Write features here
5 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in model.rst: Write your model documentation here.
6 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in usage.rst: Write your usage and parameter documentation here.
7 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in publish_docker.yml: Adapt the username, password and registry if you do not want to publish to Github Packages.
8 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in publish_docker.yml: Note that for organization repositories the Github Container Registry needs to be enabled

──────────── [[✗]] Test Failures ────────────

1 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in .mlf_core.yml: version: 0.1.0dev should be version: 0.1.0devdev
2 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in docs/conf.py: version = '0.1.0dev' should be version = '0.1.0devdev'
3 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in docs/conf.py: release = '0.1.0dev' should be release = '0.1.0devdev'
4 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in MLproject:

      Version: 0.1.0dev should be # Version: 0.1.0devdev

1 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in MLproject: image: ghcr.io/ggabernet/mlfcore_test:0.1.0dev should be image: ghcr.io/ggabernet/mlfcore_test:0.1.0devdev
2 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in .github/workflows/publish_docker.yml: tags: "latest,0.1.0dev" should be tags: "latest,0.1.0devdev"
3 https://mlf-core.readthedocs.io/en/latest/lint.html#general-6 : No changelog sections detected!

7 tests failed! Exiting with non-zero error code.

Expected behavior

System [please complete the following information]:

  • OS: MacOS 10.15
  • Language Version: Python 3.8
  • Virtual environment: Conda

Additional context
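One possible fix, sketched under the assumption that only X.Y.Z and X.Y.Z-SNAPSHOT versions should be supported, is to reject strings such as 0.1.0dev already at the "Initial version" prompt, before any files are generated. The function name is hypothetical; the alternative would be to make the linter handle arbitrary suffixes.

```python
import re

# Accepts X.Y.Z and X.Y.Z-SNAPSHOT, the two forms used by the template default
# (0.1.0-SNAPSHOT) and by the bump-version examples above.
VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+(-SNAPSHOT)?")


def validate_initial_version(version: str) -> str:
    """Hypothetical prompt validator: reject versions the linter cannot handle."""
    version = version.strip()
    if not VERSION_PATTERN.fullmatch(version):
        raise ValueError(
            f"Invalid version '{version}'. Use X.Y.Z or X.Y.Z-SNAPSHOT, e.g. 0.1.0-SNAPSHOT."
        )
    return version
```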

Github Packages Docker push not correctly named

https://github.com/elgohr/Publish-Docker-Github-Action

{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug_no_hyphen }}

should be

{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug_no_hyphen }}/{{ cookiecutter.project_slug_no_hyphen }}

This looks a little ugly, so perhaps:

{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug_no_hyphen }}/mlf-core_{{ cookiecutter.project_slug_no_hyphen }}

I must not forget to adapt this in the MLproject file as well.
Also, once project creation has finished, a message should tell users to make their container public.
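The GitHub Packages Docker registry expects image references of the form docker.pkg.github.com/OWNER/REPOSITORY/IMAGE_NAME:TAG, which is why the two-part name above is not enough. A minimal sketch of assembling the proposed three-part name; the helper is hypothetical, and it lowercases the path because Docker repository names must be lowercase (tags may keep uppercase letters such as SNAPSHOT).

```python
def github_packages_image(owner: str, repo: str, image: str, tag: str) -> str:
    """Hypothetical helper: build docker.pkg.github.com/OWNER/REPOSITORY/IMAGE_NAME:TAG."""
    # Docker repository paths must be lowercase; the tag is left as-is.
    path = f"{owner}/{repo}/{image}".lower()
    return f"docker.pkg.github.com/{path}:{tag}"
```

With the mlf-core_ prefix proposed above, `github_packages_image("ggabernet", "mlfcore_test", "mlf-core_mlfcore_test", "0.1.0")` yields `docker.pkg.github.com/ggabernet/mlfcore_test/mlf-core_mlfcore_test:0.1.0`.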

Add more status badges to READMEs

.. image:: https://github.com/mlf-core/lcep/workflows/Publish%20Container%20to%20Docker%20Packages/badge.svg
        :target: https://github.com/mlf-core/lcep/workflows/Publish%20Container%20to%20Docker%20Packages/badge.svg
        :alt: Publish Container to Docker Packages

.. image:: https://github.com/mlf-core/lcep/workflows/mlf-core%20lint/badge.svg
        :target: https://github.com/mlf-core/lcep/workflows/mlf-core%20lint/badge.svg
        :alt: mlf-core lint

Just cookiecutter them, i.e. template the user and repository names.
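In the templates the owner and repository would come from cookiecutter variables. This Python sketch only illustrates how the badge markup above is derived, including the percent-encoding of the workflow name; `workflow_badge` is a hypothetical helper, and it reuses the image URL as target exactly like the badges above.

```python
from urllib.parse import quote


def workflow_badge(owner: str, repo: str, workflow_name: str) -> str:
    """Render an RST status badge for a Github Actions workflow,
    following the URL pattern of the badges above."""
    # quote() turns spaces into %20, e.g. "mlf-core lint" -> "mlf-core%20lint".
    url = f"https://github.com/{owner}/{repo}/workflows/{quote(workflow_name)}/badge.svg"
    return (
        f".. image:: {url}\n"
        f"        :target: {url}\n"
        f"        :alt: {workflow_name}\n"
    )
```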

Add a fix-artifact-paths command

MLflow does not support relative paths in the meta.yaml files.

Hence, they must be fixed manually or a cloud provider must be used.

I will provide a command that finds all meta.yaml files and fixes their paths, wherever they are.
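A minimal sketch of such a command, assuming MLflow's local file store layout, where experiment meta.yaml files contain an `artifact_location:` entry and run meta.yaml files an `artifact_uri:` entry holding absolute paths. The function name and the prefix-replacement approach are hypothetical; it edits lines as text to avoid a YAML dependency.

```python
from pathlib import Path

# Keys in MLflow's file-store meta.yaml files that hold absolute artifact paths.
KEYS = ("artifact_location:", "artifact_uri:")


def fix_artifact_paths(mlruns_dir, old_prefix: str, new_prefix: str) -> int:
    """Rewrite artifact paths in every meta.yaml below `mlruns_dir`.

    Replaces `old_prefix` with `new_prefix` on artifact lines only and
    returns the number of files that were changed.
    """
    changed = 0
    for meta in Path(mlruns_dir).rglob("meta.yaml"):
        lines = meta.read_text().splitlines(keepends=True)
        new_lines = []
        for line in lines:
            if line.lstrip().startswith(KEYS) and old_prefix in line:
                line = line.replace(old_prefix, new_prefix, 1)
            new_lines.append(line)
        if new_lines != lines:
            meta.write_text("".join(new_lines))
            changed += 1
    return changed
```

For example, after moving a project, `fix_artifact_paths("mlruns", "file:///old/location/mlruns", "file:///new/location/mlruns")` would point all runs at the new location.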

Track input data with md5

We currently do not track that the input data is/was the same.

I suggest unifying the input training data parameter into --input or --training-data, always computing the md5 checksum of all input files (or something similar), and logging the obtained value to MLflow.
This should be added to the general mlf_core.py part, and the linter should check for it as well.
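A minimal sketch of the checksum part, assuming the input is either a single file or a directory. Files are visited in sorted order and their relative paths are hashed together with their contents, so the result does not depend on filesystem ordering and also changes when a file is renamed. The function name is hypothetical.

```python
import hashlib
from pathlib import Path


def dataset_md5(path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: one md5 digest over a file or all files in a directory."""
    root = Path(path)
    if root.is_file():
        files = [(Path(root.name), root)]
    else:
        # Sort by relative path for a deterministic order.
        files = sorted((p.relative_to(root), p) for p in root.rglob("*") if p.is_file())
    digest = hashlib.md5()
    for rel, file in files:
        # Hash the relative path (with a separator) before the content.
        digest.update(rel.as_posix().encode() + b"\0")
        with file.open("rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

The value could then be logged with something like `mlflow.log_param("input_md5", dataset_md5(training_data))`, where the parameter name is only a placeholder.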

Add mlf-core logo + citation to footer of README of templates

.. image:: https://user-images.githubusercontent.com/21954664/84388841-84b4cc80-abf5-11ea-83f3-b8ce8de36e25.png
    :target: https://mlf-core.com
    :alt: mlf-core logo

|

This could be added at the top of the README of all templates. I am not sure whether this is actually beneficial, since it may suggest that any such project is an official mlf-core project.

fix list of templates

xgboost_dask is not always GPU deterministic

The list should state explicitly for each template whether it is CPU and/or GPU deterministic, or something along those lines.

Use new general mlflow autologging API and enable it with Pytorch Lightning

This is a twofold issue, but I did not split it into two since one leads to the other.

mlflow-pytorch

Most libraries like Tensorflow or XGBoost have an autolog() function, which automatically grabs all parameters, metrics etc. and tracks them. Until now, Pytorch did not have such functionality for technical reasons.

MLflow recently released version 1.12.0, which includes autologging functionality for Pytorch Lightning.
Read: https://databricks.com/blog/2020/11/13/mlflow-1-12-features-extended-pytorch-integration.html

The mlflow-pytorch template of mlf-core must replace all manual logging statements (mlflow.log_metrics(), mlflow.log_model(), ...) with the new Pytorch autologging API.
An example for MNIST is provided here: https://github.com/mlflow/mlflow/tree/master/examples/pytorch/MNIST/example2
Since we are also using MNIST for the base template I recommend to have a look at it.

Note: Ensure that you have MLflow 1.12.0+ installed on your machine AND that 1.12 (currently 1.11!) is used here: https://github.com/mlf-core/mlf-core/blob/master/mlf_core/create/templates/mlflow/mlflow_pytorch/%7B%7B%20cookiecutter.project_slug%20%7D%7D/environment.yml

mlflow autologging

Since version 1.12 a new general autologging API exists.
https://github.com/mlflow/mlflow/releases/tag/v1.12.0

Add universal mlflow.autolog which enables autologging for all supported integrations (#3561, #3590, @andrewnitu)
Add mlflow.pytorch.autolog API for automatic logging of metrics, params, and models from Pytorch Lightning training (#3601, @shrinath-suresh, #3636, @karthik-77). This API is also enabled by mlflow.autolog.
Please note that the new mlflow.pytorch.autolog() requires pytorch lightning. Therefore, this must also be added here: https://github.com/mlf-core/mlf-core/blob/master/mlf_core/create/templates/mlflow/mlflow_pytorch/%7B%7B%20cookiecutter.project_slug%20%7D%7D/environment.yml

There are two ways to approach this: either directly integrate the new mlflow.pytorch.autolog() API into the template, or start with the example provided above and keep refactoring it until it closely resembles the existing template. The second option might be easier.

Once autologging is enabled for Pytorch, we should replace the manual calls of LIBRARY.AUTOLOG() in the respective templates with mlflow.autolog(). This should be a trivial change.
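Since the replacement is mechanical, it could be scripted over the template sources. A minimal sketch of such a migration helper (hypothetical, not an mlf-core command): it only rewrites argument-less flavor-specific calls and leaves calls with arguments, such as every_n_iter, for manual review, because the universal API does not necessarily accept the same parameters.

```python
import re

# Matches argument-less flavor-specific autolog calls, e.g. mlflow.xgboost.autolog()
FLAVOR_AUTOLOG = re.compile(
    r"\bmlflow\.(?:tensorflow|keras|xgboost|pytorch|sklearn|lightgbm)\.autolog\(\)"
)


def use_universal_autolog(source: str) -> str:
    """Replace argument-less flavor-specific autolog calls with mlflow.autolog()."""
    return FLAVOR_AUTOLOG.sub("mlflow.autolog()", source)
```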

enable mlflow autolog for loss (default every 100 iterations)

If you want to log the loss or some other metric for every epoch, you have to set the every_n_iter parameter in the mlflow.tensorflow.autolog or mlflow.pytorch.autolog function. The default is 100.

We should quickly check whether the new mlflow.autolog() supports this and then most likely use a much smaller value.
Maybe even 1?

XGBoost parameters reported twice

The issue is that MLproject logs the parameter once, and XGBoost's autologging then logs it a second time.
Candidates are: single-precision and seed

I am not sure what the best approach is here...

fix sync

Sven F fix

Basically: clone the repository in the workflow using the token; then we can get rid of the workarounds for workflow deletion.

Add support for a common_domain folder for template creations

We currently have many files that are not common to every template, but only to the templates of one domain.

For example, all mlflow domain templates share lots of files. We should add support for a common domain folder, which is cookiecuttered at template creation and then copied into the baked template.

To break it up into small subtasks:

The following subtasks should only take effect when the matching domain is selected:

  1. Try to include a useless textfile into the template
  2. Try to include nested files into the template (e.g. .github/workflows/test.yml should be copied in the final template into .github/workflows/test.yml as well, where already other files may be)
  3. Add cookiecutter support! Add a .cookiecutter.json file and try to cookiecutter a single value in the e.g. test.yml file
  4. Refactor all files, which are common for all mlflow templates into this new common_mlflow folder and ensure that everything works
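The copying part of subtasks 1 and 2 can be sketched with the standard library, assuming a folder named common_<domain> next to the templates (as with common_mlflow above). The function name is hypothetical, cookiecutter rendering is left out, and files that already exist in the project with the same name would be overwritten.

```python
import shutil
from pathlib import Path


def copy_common_domain_files(templates_root, domain: str, project_dir) -> bool:
    """Hypothetical helper: merge templates_root/common_<domain>/ into a baked project.

    Nested paths such as .github/workflows/test.yml are preserved and merged
    into existing directories (dirs_exist_ok requires Python 3.8+).
    Returns False if the selected domain has no common folder.
    """
    common_dir = Path(templates_root) / f"common_{domain}"
    if not common_dir.is_dir():
        return False
    shutil.copytree(common_dir, project_dir, dirs_exist_ok=True)
    return True
```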

Add checks that no non deterministic functions of pytorch are used

PyTorch functions that use atomicAdd in the forward kernels include torch.Tensor.index_add_(), torch.Tensor.scatter_add_(), torch.bincount().

A number of operations have backwards kernels that use atomicAdd, including torch.nn.functional.embedding_bag(), torch.nn.functional.ctc_loss(), torch.nn.functional.interpolate(), and many forms of pooling, padding, and sampling.

There is currently no simple way of avoiding nondeterminism in these functions.

https://pytorch.org/docs/stable/notes/randomness.html

This should become another linting function or be added to mlflow-pytorch-2.
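A minimal sketch of such a check, using Python's ast module to flag calls to the functions named in the PyTorch notes quoted above. The function name is hypothetical, matching is by name only (so false positives are possible), and the pooling, padding and sampling operations are not covered.

```python
import ast

# Function names taken from the PyTorch reproducibility notes quoted above; not exhaustive.
NONDETERMINISTIC = {
    "index_add_", "scatter_add_", "bincount",
    "embedding_bag", "ctc_loss", "interpolate",
}


def find_nondeterministic_calls(source: str, filename: str = "<string>"):
    """Return sorted (line, name) pairs for calls to known nondeterministic functions."""
    hits = []
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handles both torch.bincount(...) and bare bincount(...) after an import.
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name in NONDETERMINISTIC:
                hits.append((node.lineno, name))
    return sorted(hits)
```

Each hit could then be reported as a lint warning or failure, in the same style as the existing checks.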
