mlf-core / mlf-core Goto Github PK


CPU and GPU deterministic and therefore fully reproducible machine learning pipelines using MLflow.

License: Apache License 2.0

Dockerfile 0.82% Makefile 3.20% Python 95.86% Shell 0.12%

mlflow template machinelearning artificialintelligence reproducible deterministic gpu hacktoberfest

mlf-core's Introduction

mlf-core

Preprint

mlf-core: a framework for deterministic machine learning

Overview

mlf-core provides CPU and GPU deterministic machine learning templates based on MLflow, Conda, Docker and a strong Github integration. Templates are available for PyTorch, TensorFlow and XGBoost. A custom linter ensures that projects stay deterministic in all phases of development and deployment.

Installing

Start your journey with mlf-core by installing it via $ pip install mlf-core.

See Installation.

run

See a mlf-core project in action.

config

Configure mlf-core to get started.

See Configuring mlf-core

list

List all available mlf-core templates.

See Listing all templates.

info

Get detailed information on a mlf-core template.

See Get detailed template information.

create

Kickstart your deterministic machine learning project with one of mlf-core's templates in no time.

See Create a project.

lint

Use advanced linting to ensure your project always adheres to mlf-core's standards and stays deterministic.

See Linting your project

bump-version

Bump your project version across several files.

See Bumping the version of an existing project.

sync

Sync your project with the latest mlf-core release to get the latest template features.

See Syncing a project.

upgrade

Check whether you are using the latest mlf-core version and update automatically to benefit from the latest features.

See https://mlf_core.readthedocs.io/en/latest/upgrade.html.

Credits

Primary idea and main development by Lukas Heumos. mlf-core is inspired by nf-core. This package was created with cookietemple based on a modified audreyr/cookiecutter-pypackage project template using cookiecutter.


mlf-core's Issues

Ask again if users decline Github support

Since quite a lot, including the repository secrets, is only set up with GH support, we should ensure that users always use the GH support unless they really don't want to or can't (GitLab/Bitbucket users).

I would suggest to ask again like

"Do you really want to skip this step? Creating a Github repository automatically also sets up automatic template syncing."

when users decline the automatic GH repository creation. It should be a yellow warning.

Update XGBoost from 1.1.1 to 1.2.0

The release notes are not up yet, so it is too early to judge whether we can update the version without any issues.

Don't run system-intelligence when the OS is not Linux

system-intelligence currently doesn't support macOS and Windows. It is however installable on those platforms.

Users should be warned and the usage of Docker should be suggested.
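
A minimal sketch of such a guard, assuming the check lives in a small helper that runs before system-intelligence is invoked. The function names are hypothetical and not part of mlf-core; only `platform.system()` and `warnings.warn()` are standard library calls.

```python
import platform
import warnings


def system_intelligence_supported(os_name=None):
    """Return True only on Linux, the platform the issue names as supported."""
    name = platform.system() if os_name is None else os_name
    return name == "Linux"


def maybe_warn_unsupported(os_name=None):
    """Warn and return False when hardware reporting should be skipped."""
    if system_intelligence_supported(os_name):
        return True
    warnings.warn(
        "system-intelligence does not support this operating system; "
        "consider running the project inside Docker instead.",
        RuntimeWarning,
    )
    return False
```

A caller would skip the hardware report whenever `maybe_warn_unsupported()` returns `False`.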

mlf_core.py should not use 'framework', but the project slug directly

It's not in common files anyways, so let's just use the project_slug.

list/info screenshots are outdated

aka CT style

Switch to Github container registry

https://docs.github.com/en/packages/getting-started-with-github-container-registry/about-github-container-registry

Verify that all mlf-core features work with default "main" branches.

Github recently changed the default branch of newly created repositories to "main". git itself went for similar measures.
I myself reverted to master, because I am just so used to it with muscle memory.
Since mlf-core automates a lot with specific branch names we must verify that everything still works.
I verified the following questions for master, but not yet main.

Questions to answer:

Does the automatic repository creation still work?
Does automatic syncing (can also be triggered manually) still work? Can be tested by deliberately decreasing the template version in the .mlf_core.yml file
Is the documentation still pushed to the gh-pages branch when pushed to the main branch? I am pretty sure that this one will break, but maybe there are some synonyms going on behind the scenes?

If anything here breaks, please report and we can fix it.

Add information on how to create releases

Assume you are currently on the development branch
Create a release/0.1.0 branch
Merge into master
Use mlf-core bump-version to switch the version from a SNAPSHOT version to a release version
mlf-core bump-version 0.1.0 .
Create a release on Github
New Docker container is published with the latest version
Switch back to development
Bump the version again to a SNAPSHOT version
mlf-core bump-version 0.2.0-SNAPSHOT .
The changelog will automatically add sections

etc blabla

mlflow-package should have ready examples for predictions for pytorch, tensorflow (both mnist) and xgboost (covertype)

Currently the mlflow-packages templates for pytorch, tensorflow and xgboost do not really have out of the box working predictions.
Only xgboost works, but it is not using the covertype example and therefore does not match with the general mlflow-xgboost model.

mlflow-package for pytorch should load a trained mnist model and perform a prediction. Maybe even plot one thing.
Same for Tensorflow (mnist) and XGBoost with covertype.

mlf-core config defaults to former values

When running mlf-core config for the second time, the defaults could be set to the previous values that were entered.

Additional context

New PAT requires Package permissions -> document

I have to document the new PAT permissions (requires repo, create and update packages).

Also a note on how the Docker container publishing works etc.

Add a package template, which allows for simple PyPI package creation

Proposed Template

package-pytorch
package-tensorflow
package-xgboost

Describe the proposed new template.

Describe the suggested technology stack.

Additional context

Github workflow status badges in README should link to the corresponding Action on Github

See title.

Currently they link to the image.

Add vscode to gitignore of templates

# visual studio code
.vscode

Linting errors when the version contains a string other than SNAPSHOT

Describe the bug
Linting errors when creating project

To Reproduce

? Choose the project's domain:   mlflow
? Choose the project's primary framework:   pytorch
? Project name [Exploding Springfield]:  mlfcore test
Looking up mlfcore test at readthedocs.io!
? Short description of your project [mlfcore test. A mlf-core based .]:  This is a test for the mlfcore project.
? Initial version of your project [0.1.0-SNAPSHOT]:  0.1.0dev
? License:   MIT
? Do you want to create a Github repository and push your template to it? [Yes]:   Yes
? Do you want to create an organization repository? [No]:   No
? Do you want your repository to be private? [No]:   No
Fixing too short underlines of *.rst file (usually index.rst)
Running general linting
Running lint checks ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8 of 8 check_no_cookiecutter_strings
Running mlflow-pytorch linting
Running lint checks ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3 of 3 pytorch_reproducibility_seeds

─────────────────────────────────────────────────────────────────────────────────────────────────────────────── LINT RESULTS ───────────────────────────────────────────────────────────────────────────────────────────────────────────────

     [[✔]]    5 tests passed
     [[!]]    8 tests had warnings
     [[✗]]    7 tests failed

──────────────────────────────────────────────────────────────────────────────────────────────────────────── [[✔]] Tests Passed ────────────────────────────────────────────────────────────────────────────────────────────────────────────
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                                                                                          │
│ 1 https://mlf-core.readthedocs.io/en/latest/lint.html#general-1 : All required general files were found!                                                                                                                                 │
│ 2 https://mlf-core.readthedocs.io/en/latest/lint.html#general-2 : Dockerfile check passed                                                                                                                                                │
│ 3 https://mlf-core.readthedocs.io/en/latest/lint.html#general-7 : Passed conda environment checks.                                                                                                                                       │
│ 4 https://mlf-core.readthedocs.io/en/latest/lint.html#mlflow-pytorch-1 : All required mlflow-pytorch specific files were found!                                                                                                          │
│ 5 https://mlf-core.readthedocs.io/en/latest/lint.html#mlflow-pytorch-2 : All required reproducibility settings enabled.                                                                                                                  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

─────────────────────────────────────────────────────────────────────────────────────────────────────────── [[!]] Test Warnings ────────────────────────────────────────────────────────────────────────────────────────────────────────────
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                                                                                          │
│ 1 https://mlf-core.readthedocs.io/en/latest/lint.html#general-7 : Version 10.1 of cudatoolkit is not the latest available: 11.0.221                                                                                                      │
│ 2 https://mlf-core.readthedocs.io/en/latest/lint.html#general-7 : Version 3.7 of python is not the latest available: 3.8.5                                                                                                               │
│ 3 https://mlf-core.readthedocs.io/en/latest/lint.html#general-7 : Version 6.0.0 of rich is not the latest available: 6.1.1                                                                                                               │
│ 4 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in README.rst: * Write features here                                                                                                                 │
│ 5 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in model.rst: Write your model documentation here.                                                                                                   │
│ 6 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in usage.rst: Write your usage and parameter documentation here.                                                                                     │
│ 7 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in publish_docker.yml: Adapt the username, password and registry if you do not want to publish to Github Packages.                                   │
│ 8 https://mlf-core.readthedocs.io/en/latest/lint.html#general-3 : TODO string found in publish_docker.yml: Note that for organization repositories the Github Container Registry needs to be enabled                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

─────────────────────────────────────────────────────────────────────────────────────────────────────────── [[✗]] Test Failures ────────────────────────────────────────────────────────────────────────────────────────────────────────────
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                                                                                                                                                                                                          │
│ 1 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in .mlf_core.yml: version: 0.1.0dev should be version: 0.1.0devdev                                                                          │
│ 2 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in docs/conf.py: version = '0.1.0dev' should be version = '0.1.0devdev'                                                                     │
│ 3 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in docs/conf.py: release = '0.1.0dev' should be release = '0.1.0devdev'                                                                     │
│ 4 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in MLproject:                                                                                                                               │
│                                                                                                                                                                                                                                          │
│╔════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗│
│║                                                                                           Version: 0.1.0dev should be # Version: 0.1.0devdev                                                                                           ║│
│╚════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝│
│                                                                                                                                                                                                                                          │
│ 1 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in MLproject: image: ghcr.io/ggabernet/mlfcore_test:0.1.0dev should be image: ghcr.io/ggabernet/mlfcore_test:0.1.0devdev                    │
│ 2 https://mlf-core.readthedocs.io/en/latest/lint.html#general-5 : Version number don´t match in .github/workflows/publish_docker.yml: tags: "latest,0.1.0dev" should be tags: "latest,0.1.0devdev"                                       │
│ 3 https://mlf-core.readthedocs.io/en/latest/lint.html#general-6 : No changelog sections detected!                                                                                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
 7 tests failed! Exiting with non-zero error code.

Expected behavior

System [please complete the following information]:

OS: MacOS 10.15
Language Version: Python 3.8
Virtual environment: Conda

Additional context
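
One possible way to catch this early would be to validate the version at the prompt. The sketch below assumes that only `X.Y.Z` and `X.Y.Z-SNAPSHOT` are acceptable, which is inferred from the default `0.1.0-SNAPSHOT` shown above and is not confirmed mlf-core behavior. The function name is hypothetical.

```python
import re

# Assumption: three numeric components with an optional -SNAPSHOT suffix.
_VERSION_PATTERN = re.compile(r"\d+\.\d+\.\d+(-SNAPSHOT)?")


def is_valid_version(version):
    """Return True if the whole string matches the assumed version scheme."""
    return _VERSION_PATTERN.fullmatch(version) is not None
```

With this check, `0.1.0dev` from the report above would be rejected before any files are written.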

qube github support update

adapt gh support from qube

Github Packages Docker push not correctly named

https://github.com/elgohr/Publish-Docker-Github-Action

{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug_no_hyphen }}

should be

{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug_no_hyphen }}/{{ cookiecutter.project_slug_no_hyphen }}

It looks a little bit ugly, hence maybe

{{ cookiecutter.github_username }}/{{ cookiecutter.project_slug_no_hyphen }}/mlf-core_{{ cookiecutter.project_slug_no_hyphen }}

I should not forget to adapt this in the MLproject file.
Also, once project creation has finished, there should be a message telling people to make their container public.

Add more status badges to READMEs

.. image:: https://github.com/mlf-core/lcep/workflows/Publish%20Container%20to%20Docker%20Packages/badge.svg
        :target: https://github.com/mlf-core/lcep/workflows/Publish%20Container%20to%20Docker%20Packages/badge.svg
        :alt: Publish Container to Docker Packages

.. image:: https://github.com/mlf-core/lcep/workflows/mlf-core%20lint/badge.svg
        :target: https://github.com/mlf-core/lcep/workflows/mlf-core%20lint/badge.svg
        :alt: mlf-core lint

Just cookiecutter them

run flake8 on all templates in the create workflows

So that we see earlier whether during refactoring we killed flake8.

Add a fix-artifact-paths command

MLflow does not allow for relative paths in the meta.yaml files.

Hence, they must be fixed manually or a Cloud provider must be used.

I shall provide a command that finds all meta.yaml files, wherever they are, and fixes the paths in them.
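
A rough sketch of what such a command could do. It assumes the paths are stored under the keys `artifact_uri` and `artifact_location` and that every stored path contains an `mlruns` directory component; both are assumptions about the local file store layout. The function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: these are the keys that hold paths in the meta.yaml files.
_PATH_KEYS = re.compile(r"^(artifact_uri|artifact_location):\s*(.+)$")


def fix_meta_yaml(meta_file, mlruns_root):
    """Rewrite the stored path in one meta.yaml so it points below mlruns_root."""
    meta_file = Path(meta_file)
    mlruns_root = Path(mlruns_root).resolve()
    fixed_lines = []
    for line in meta_file.read_text().splitlines():
        match = _PATH_KEYS.match(line)
        if match:
            key, old_value = match.groups()
            value = old_value.strip().strip("'\"").replace("\\", "/")
            # Keep only what follows the last 'mlruns' component of the old path.
            _, sep, tail = ("/" + value).rpartition("/mlruns/")
            new_path = mlruns_root / tail if sep else mlruns_root
            line = f"{key}: {new_path.as_uri()}"
        fixed_lines.append(line)
    meta_file.write_text("\n".join(fixed_lines) + "\n")


def fix_artifact_paths(mlruns_root):
    """Apply fix_meta_yaml to every meta.yaml below mlruns_root; return the count."""
    found = sorted(Path(mlruns_root).rglob("meta.yaml"))
    for meta_file in found:
        fix_meta_yaml(meta_file, mlruns_root)
    return len(found)
```

The line-based rewrite avoids a YAML dependency and leaves all other keys untouched.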

Fetch documentation structure from CT

After validation.

Move the mlf-core base container from Dockerhub to quay?

https://quay.io/

The 100 pulls per day limit is not fine for us.

Add new gifs to README like CT

Every command should have a gif.

Should discuss whether an mlf-core overview image should be at the top or so

Add a .gitattributes file to set Python as the primary language

Can either ignore *.html or set it as Python only.

Likely the first option?

mlf-core lint check_docker should verify that it contains something like FROM mlf-core

should ensure with a regex or something that it is using FROM mlf-core/base:something whatever
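
A minimal sketch of the regex check, assuming the base image is referenced as `mlf-core/base:<tag>`, optionally with a registry prefix. The function name is hypothetical.

```python
import re

# Assumption: the base image is named mlf-core/base and may carry a registry
# prefix such as ghcr.io/. Commented-out FROM lines do not count.
_FROM_MLF_CORE = re.compile(
    r"^\s*FROM\s+(?:[\w.\-]+/)?mlf-core/base:[\w.\-]+",
    re.IGNORECASE | re.MULTILINE,
)


def dockerfile_uses_mlf_core_base(dockerfile_text):
    """Return True if any FROM instruction references an mlf-core base image."""
    return _FROM_MLF_CORE.search(dockerfile_text) is not None
```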

Update documentation for templates

Some still mention outdated docker building instructions etc.

Add Rich to templates

To add the traceback

because why the hell not

Track input data with md5

We currently do not track that the input data is/was the same.

I suggest that we unify the input training data parameter into --input or --training-data and always compute the md5 of all files or something and log the obtained value into mlflow.
This part should be added into the general mlf_core.py part. The linter should also reflect this.
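
The hashing part could look roughly like the sketch below. It only uses the standard library; the function names are hypothetical, and combining per-file hashes with their relative paths is one design choice among several.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 of one file without loading it fully into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_input(path):
    """Hash a single file, or all files below a directory in sorted order."""
    root = Path(path)
    if root.is_file():
        return md5_of_file(root)
    combined = hashlib.md5()
    for file in sorted(p for p in root.rglob("*") if p.is_file()):
        # Include the relative path so that renamed files change the hash too.
        combined.update(file.relative_to(root).as_posix().encode("utf-8"))
        combined.update(md5_of_file(file).encode("ascii"))
    return combined.hexdigest()
```

The resulting value could then be logged with something like `mlflow.log_param("input_md5", md5_of_input(path))`, where the parameter name is only an example.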

ModuleAttributeError: 'DataParallel' object has no attribute 'log_weights'

Only happens on multi GPU machines. Does not happen with the CPU or single GPU.

Maybe we need to access the wrapped model's weights directly or something?

https://stackoverflow.com/questions/50442000/dataparallel-object-has-no-attribute-init-hidden
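
A possible workaround, sketched under the assumption that the wrapper exposes the original model through a `module` attribute, as `torch.nn.DataParallel` does. The helper name is hypothetical, and it uses duck typing so that it also works for unwrapped models.

```python
def unwrap_model(model):
    """Return the wrapped model if `model` looks like a DataParallel wrapper.

    Caveat: a plain model that itself has an attribute called `module`
    would be unwrapped by mistake; an isinstance check against
    torch.nn.DataParallel would be stricter.
    """
    return getattr(model, "module", model)
```

Custom methods would then be called as `unwrap_model(model).log_weights(...)` on both single and multi GPU machines.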

Github support currently doesn't allow for hyphens -> replaces with underscores

CT cli-python treatment is required.

-> Introduce project_slug_no_hyphen
-> prompt_general_template_configuration
-> replace project_slugs where required with no hyphen version in templates
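
For illustration, the two values could be derived as in the sketch below. The exact slug rules are an assumption; the function names mirror the variable names above but are otherwise hypothetical.

```python
def project_slug(project_name):
    """Lowercase the name and replace spaces with hyphens (assumed rule)."""
    return project_name.strip().lower().replace(" ", "-")


def project_slug_no_hyphen(slug):
    """Replace hyphens with underscores for places that do not allow hyphens."""
    return slug.replace("-", "_")
```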

Add mlf-core logo + citation to footer of README of templates

.. image:: https://user-images.githubusercontent.com/21954664/84388841-84b4cc80-abf5-11ea-83f3-b8ce8de36e25.png
    :target: https://mlf-core.com
    :alt: mlf-core logo

|

Could be added at the top of the README of all templates. I am not quite sure whether this is beneficial at all, since it may suggest that every project is an mlf-core project.

fix list of templates

xgboost_dask is not always GPU deterministic

should write CPU and GPU deterministic for all templates or something along those lines

Add check for all_reduce in xgboost linters -> should not be used

dmlc/xgboost#5023

Structure the model.rst file better

I should come up with a reasonable general structure for the model.rst file to guide the user through a reasonable documentation.

Use new general mlflow autologging API and enable it with Pytorch Lightning

This is a twofold issue, but I did not split it into two since one leads to another.

mlflow-pytorch

Most libraries like Tensorflow or XGBoost have an autolog() function, which automatically grabs all parameters, metrics etc. and tracks them. Until now Pytorch did not have such functionality for technical reasons.

MLflow recently released version 1.12.0, which includes autologging functionality with Pytorch Lightning.
Read: https://databricks.com/blog/2020/11/13/mlflow-1-12-features-extended-pytorch-integration.html

The mlflow-pytorch template of mlf-core must replace all manual logging statements (mlflow.log_metrics(), mlflow.log_model(), ...) with the new Pytorch autologging API.
An example for MNIST is provided here: https://github.com/mlflow/mlflow/tree/master/examples/pytorch/MNIST/example2
Since we are also using MNIST for the base template I recommend to have a look at it.

Note 2: Ensure that you have MLflow 1.12.0+ installed on your machine AND that 1.12 (currently 1.11!) is used here: https://github.com/mlf-core/mlf-core/blob/master/mlf_core/create/templates/mlflow/mlflow_pytorch/%7B%7B%20cookiecutter.project_slug%20%7D%7D/environment.yml

mlflow autologging

Since version 1.12 a new general autologging API exists.
https://github.com/mlflow/mlflow/releases/tag/v1.12.0

Add universal mlflow.autolog which enables autologging for all supported integrations (#3561, #3590, @andrewnitu)
Add mlflow.pytorch.autolog API for automatic logging of metrics, params, and models from Pytorch Lightning training (#3601, @shrinath-suresh, #3636, @karthik-77). This API is also enabled by mlflow.autolog.
Please note that the new mlflow.pytorch.autolog() requires pytorch lightning. Therefore, this must also be added here: https://github.com/mlf-core/mlf-core/blob/master/mlf_core/create/templates/mlflow/mlflow_pytorch/%7B%7B%20cookiecutter.project_slug%20%7D%7D/environment.yml

There are two ways to approach this: either directly integrate the new mlflow.pytorch.autolog() API into the template, or start with the example provided above and keep refactoring it until it pretty much looks like the already existing template. Option 2 might be easier.

Hence, after we enabled autologging for Pytorch, we should replace the manual calls of LIBRARY.AUTOLOG() in the respective templates with mlflow.autolog(). This should be a trivial change.

Add ability to ignore warnings about outdated dependencies

We could add a # MLF-CORE IGNORE statement optionally into the conda environment, which will be picked up when linting and subsequently ignored.
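
A sketch of how the linter could pick this up, assuming the marker is written as a trailing comment on the dependency line. The function name and the line-based parsing are illustrative only.

```python
IGNORE_MARKER = "# MLF-CORE IGNORE"


def dependencies_to_check(environment_yml_text):
    """Return pinned dependencies, skipping lines that carry the ignore marker."""
    dependencies = []
    for raw_line in environment_yml_text.splitlines():
        line = raw_line.strip()
        if not line.startswith("- "):
            continue
        if IGNORE_MARKER in raw_line.upper():
            continue
        # Drop any trailing comment, then keep only pinned entries (name=version).
        entry = line[2:].split("#", 1)[0].strip()
        if "=" in entry:
            dependencies.append(entry)
    return dependencies
```

Only the returned entries would then be compared against the latest available versions.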

Add automatic pushes to Dockerhub for Dockerfiles

Github Packages for now.

mount /data in all templates

See lcep.

Also document this behavior.

Observed project_slug and project_slug_no_hyphen discrepancy for train_cpu workflow

Issue:

Workflow is called

enable mlflow autolog for loss (default every 100 iterations)

If you want to log the loss or some other metric for every epoch, you have to set the every_n_iter parameter in the mlflow.tensorflow.autolog function or in the mlflow.pytorch.autolog function. The default is 100.

Should quickly check whether the new mlflow.autolog() supports this and then likely go with a much smaller number.
Maybe even 1?

Add SHAP explanations to templates

The new 1.12 release also brings SHAP explanations with it.
It would be awesome if all templates featured it. Recommended reading: https://databricks.com/blog/2020/11/13/mlflow-1-12-features-extended-pytorch-integration.html

Test sync with CT and ensure that it works

Fix any issues that arise.

XGBoost parameters reported twice

The issue is that the parameter is logged once through MLproject and then reported a second time by XGBoost's autologging.
Candidates are: single-precision and seed

Not sure actually what the best approach here is...

Incorporate findings from Duncan

https://developer.download.nvidia.com/video/gputechconf/gtc/2019/presentation/s9911-determinism-in-deep-learning.pdf

Fusion threshold and dynamically patching bias_add

Enable colored linting output in GH Actions

GH Actions currently does not print colored linting output.

Fix is here: https://github.com/nf-core/tools/pull/760/files

Just force colored output when the environment is GITHUB_ACTIONS, which is a predefined env variable in GHA.
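
The environment check itself is small. The helper name below is hypothetical; `GITHUB_ACTIONS` is the predefined variable mentioned above.

```python
import os


def running_in_github_actions(environ=None):
    """Return True when the GITHUB_ACTIONS variable is set to 'true'."""
    env = os.environ if environ is None else environ
    return env.get("GITHUB_ACTIONS", "").lower() == "true"
```

With Rich, the result could be passed on as `Console(force_terminal=True if running_in_github_actions() else None)`, so that the default terminal detection stays in place everywhere else.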

To break it up into small sub tasks:

The following subtasks should work only when the correct domain was selected

Try to include a useless textfile into the template
Try to include nested files into the template (e.g. .github/workflows/test.yml should be copied in the final template into .github/workflows/test.yml as well, where already other files may be)
Add cookiecutter support! Add a .cookiecutter.json file and try to cookiecutter a single value in the e.g. test.yml file
Refactor all files, which are common for all mlflow templates into this new common_mlflow folder and ensure that everything works

Add checks that no non deterministic functions of pytorch are used

PyTorch functions that use atomicAdd in the forward kernels include torch.Tensor.index_add_(), torch.Tensor.scatter_add_(), torch.bincount().

A number of operations have backwards kernels that use atomicAdd, including torch.nn.functional.embedding_bag(), torch.nn.functional.ctc_loss(), torch.nn.functional.interpolate(), and many forms of pooling, padding, and sampling.

There is currently no simple way of avoiding nondeterminism in these functions.

https://pytorch.org/docs/stable/notes/randomness.html

Should be another linting function or added to mlflow-pytorch-2
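
Such a linting function could start out as a plain text scan, as sketched below. The list only covers the functions named above, the function name is hypothetical, and a scan like this will miss aliased imports and flag unrelated functions with the same name.

```python
import re

# Functions named in the issue text above; the list is not exhaustive.
NONDETERMINISTIC_CALLS = (
    "index_add_",
    "scatter_add_",
    "bincount",
    "embedding_bag",
    "ctc_loss",
    "interpolate",
)

_CALL_PATTERN = re.compile(r"\b(" + "|".join(NONDETERMINISTIC_CALLS) + r")\s*\(")


def find_nondeterministic_calls(source):
    """Return (line number, function name) pairs for every suspicious call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # ignore trailing comments
        for match in _CALL_PATTERN.finditer(code):
            hits.append((lineno, match.group(1)))
    return hits
```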
