
paletteml / mlsync


Sync your ML data with your favorite productivity tools!

Home Page: https://mlsync.dev

License: Apache License 2.0

Python 100.00%
deep-learning machine-learning mlflow notion python pytorch

mlsync's Introduction

Sync your ML data seamlessly with productivity tools you love



Overview

What is MLSync?

MLSync is a Python library that acts as a bridge between your ML workflow and your project planning and management tools.

Installation

pip install mlsync

Why MLSync?

ML projects are a lot of fun to develop, but they are also hard to plan and manage. The ML community has built several tools that help developers track and visualize their ML workflow data, yet that data remains disconnected from the tools used for project management. MLSync is designed to bridge this gap.

How Does it Work?

There are four main aspects of MLSync:

  1. MLSync interfaces with modern ML experiment tracking tools such as MLflow and imports the raw data.
  2. Raw data from the ML experiment tracking tools is converted to MLSync's internal, user-defined data format and stored in a database.
  3. The MLSync engine processes this raw data and generates consolidated insights for your project.
  4. The insights are then converted to suitable formats and sent to your project planning and management tools such as Notion.
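
The four stages above can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not MLSync's actual implementation: `RAW_RUNS`, `RunRecord`, `to_internal`, `summarize`, and `to_report_rows` are hypothetical names, and the record fields are assumptions chosen for the example.

```python
from dataclasses import dataclass
from statistics import mean

# Step 1 (stubbed): hypothetical raw records, shaped loosely like what an
# experiment tracker might return. Real tracker output will differ.
RAW_RUNS = [
    {"run_id": "a1", "name": "baseline", "metrics": {"accuracy": 0.91}, "params": {"lr": "0.01"}},
    {"run_id": "b2", "name": "wider-net", "metrics": {"accuracy": 0.94}, "params": {"lr": "0.001"}},
]


@dataclass
class RunRecord:
    """Illustrative internal format; in MLSync the format is user-defined."""
    run_id: str
    name: str
    accuracy: float
    learning_rate: float


def to_internal(raw: dict) -> RunRecord:
    # Step 2: map tracker-specific fields onto the internal format.
    return RunRecord(
        run_id=raw["run_id"],
        name=raw["name"],
        accuracy=raw["metrics"]["accuracy"],
        learning_rate=float(raw["params"]["lr"]),
    )


def summarize(records: list) -> dict:
    # Step 3: consolidate per-run data into project-level insights.
    best = max(records, key=lambda r: r.accuracy)
    return {
        "num_runs": len(records),
        "best_run": best.name,
        "mean_accuracy": round(mean(r.accuracy for r in records), 3),
    }


def to_report_rows(records: list) -> list:
    # Step 4: render records as flat rows that a table-like tool could ingest.
    return [
        {"Name": r.name, "Accuracy": f"{r.accuracy:.2%}", "Learning rate": r.learning_rate}
        for r in records
    ]


records = [to_internal(raw) for raw in RAW_RUNS]
summary = summarize(records)
rows = to_report_rows(records)
print(summary)
print(rows)
```

Keeping the internal format separate from both the tracker's and the destination's formats means that adding a new tracker or a new productivity tool would touch only one conversion function.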

We are actively building MLSync with the vision of making it a one-stop standard interface for mapping data from ML experiments to project management tools. The above figure shows the high-level architecture of MLSync. Not all of the functionality is available yet; please refer to the Roadmap for the current status. If you would like to contribute to MLSync, please refer to the Contributing section.

Example

In this example, we will sync your machine learning experiments to Notion in three simple steps!

1. Install MLSync

pip install mlsync

2. Setup the Example

  1. git clone https://github.com/paletteml/mlsync.git: Check out the MLSync repository.
  2. cd mlsync/examples/mlflow-notion/: Change to the example directory.
  3. pip install -r requirements.txt: Install the requirements for this example.
    • Note that this step installs PyTorch. If you run into issues, please refer to the PyTorch documentation.
  4. python mlflow_pytorch.py --run-name <name>: Run the example training. Make sure it starts successfully; the run does not need to complete.

3. Notion Setup

Let us now link Notion to MLSync. This is required only for the first time you run MLSync.

  1. Create a new integration to Notion.
    1. Visit notion.so/my-integrations
    2. Click the + New Integration button.
    3. Name it MLSync and click Submit.
    4. Copy your "Internal Integration Token" from your Notion integration page.
    5. Open the .env file in your path and update the Notion token.
      • NOTION_TOKEN=secret_0000000000000000000000000000000000000000000
  2. Create a new page in Notion. This will serve as the root page for your MLflow runs.
    1. Let us name the page as Demo.
    2. Click the Share button on the top right corner of the page.
    3. Click the Invite button and then choose MLSync integration.
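
To check that the token from step 1 is visible before starting a sync, a small standard-library helper like the one below can read it back. This is a sketch under assumptions: `load_env_file` and `resolve_notion_token` are hypothetical helpers, and how MLSync itself reads the .env file is not described here. Only the `NOTION_TOKEN` variable name comes from the steps above.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_notion_token(env_path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    token = os.environ.get("NOTION_TOKEN")
    if not token and Path(env_path).exists():
        token = load_env_file(env_path).get("NOTION_TOKEN")
    if not token:
        raise RuntimeError(f"NOTION_TOKEN not found in the environment or in {env_path}")
    return token


# Usage: write a throwaway .env (with a placeholder token) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_env = Path(tmp) / ".env"
    demo_env.write_text("# Notion credentials\nNOTION_TOKEN=secret_example\n")
    parsed = load_env_file(demo_env)
    print(sorted(parsed))
```

Printing only the key names, never the value, keeps the token out of terminal history and logs.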

All Done

You are now all set! Now let us sync your MLflow runs to Notion.

mlsync --config config.yaml

Note: The first time you run mlsync, you will be prompted to choose a page to sync to. From the options, choose the page you created in the previous step (Demo).

That's it! You can now view your MLflow runs in Notion. As long as mlsync is running in the background, all your future experiments and runs in this directory should appear in the selected Notion page.
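
The background behavior described above can be pictured as a polling loop that pushes only the runs it has not seen before. The sketch below is an assumption about the general technique, not MLSync's code: `sync_once`, `sync_forever`, `fetch_runs`, and `push_run` are hypothetical names, the tracker and destination are in-memory stand-ins, and treating the refresh interval as seconds is also an assumption.

```python
import time


def sync_once(fetch_runs, push_run, synced_ids):
    """Push every run not seen before; return the newly synced run ids."""
    new_ids = []
    for run in fetch_runs():
        if run["run_id"] not in synced_ids:
            push_run(run)
            synced_ids.add(run["run_id"])
            new_ids.append(run["run_id"])
    return new_ids


def sync_forever(fetch_runs, push_run, refresh_rate=5.0, max_cycles=None):
    """Poll at a fixed interval; max_cycles exists only so demos can stop."""
    synced_ids = set()
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        sync_once(fetch_runs, push_run, synced_ids)
        cycle += 1
        time.sleep(refresh_rate)
    return synced_ids


# Usage with in-memory stand-ins for the tracker and the destination page.
tracker = [{"run_id": "r1", "name": "baseline"}]
destination_rows = []
seen = set()

first = sync_once(lambda: tracker, destination_rows.append, seen)
tracker.append({"run_id": "r2", "name": "wider-net"})  # a new run appears later
second = sync_once(lambda: tracker, destination_rows.append, seen)
print(first, second, len(destination_rows))
```

Tracking synced ids makes each cycle idempotent: a run that was already pushed is skipped rather than duplicated.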

Troubleshooting

  1. If you get an error saying that NOTION_TOKEN was not found, pass the --notion-token flag to mlsync to specify the token directly.
  2. If you are having trouble with the MNIST dataset download, you can try to download the data manually from here.
  3. For any other errors, please raise an issue or contact us.

Advanced

  1. You can override the Notion page id, token, and other configurations by either modifying the config.yaml file or by passing the arguments to the mlsync command. Run mlsync --help to see the available arguments.
  2. Custom Report Formats: You can customize the report further by adding your own format.yaml file. Read the documentation here to learn more.
  3. Custom Refresh Rates: You can control the refresh rate of the report by setting the refresh_rate field in the configuration file.
  4. Restarting mlsync: You can restart mlsync any time without losing earlier runs.
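
The override behavior in item 1 follows a common pattern: values from the configuration file act as defaults, and any argument given on the command line wins. A minimal sketch of that precedence rule is shown below. `merge_config` is a hypothetical helper and the keys `notion_token` and `page_id` are assumptions for illustration; only `refresh_rate` is named above. Check `mlsync --help` and your config.yaml for the real names.

```python
def merge_config(file_config, cli_overrides):
    """Return file_config updated with every CLI value that was actually supplied."""
    merged = dict(file_config)  # copy so the file values are left untouched
    for key, value in cli_overrides.items():
        if value is not None:  # None means the argument was not passed
            merged[key] = value
    return merged


# Hypothetical values as they might look after loading config.yaml ...
file_config = {"notion_token": None, "page_id": "demo-page", "refresh_rate": 5}
# ... and after parsing the command line (None = argument not given).
cli_overrides = {"notion_token": "secret_example", "page_id": None, "refresh_rate": 30}

effective = merge_config(file_config, cli_overrides)
print(effective)
```

Using `None` as the "not given" marker lets a command-line value of `0` or an empty string still count as a deliberate override.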

Enjoy! If you have any further questions, please contact us.

Roadmap

We want to support different training environments and different productivity tools.

  1. Productivity Tools
    1. Notion: Supported
    2. Trello: Planned
    3. Confluence: In progress
    4. Jira: Planned
  2. Monitoring Frameworks
    1. MLflow: Supported
    2. TensorBoard: In progress
    3. ClearML: Planned
  3. Programmatic API
    1. Planned

Do you have other tools/frameworks you would like to see supported? Let us know!

Contributing

We welcome contributions from the community. Please feel free to open an issue or pull request. Or, if you are interested in working closely with us, please contact us directly. We will be happy to talk to you!

mlsync's People

Contributors

dhirajk1, kartik-hegde, msharmavikram


mlsync's Issues

Enhancement: Slackbot support

Most development tracking is currently done with JIRA tools, but a Slackbot that automatically generates summaries of changes would be a helpful addition to MLSync. MLSync should not only update the JIRA ticket but also notify a Slack channel that the ticket is cleared, so that the tickets that depend on it can move forward.

Enhancement: Backtracking with logs

There are different degrees to which we can enable Jupyter Notebook support. The simplest version is to use an MLflow-based mechanism, as we did in #2. However, what I have in mind is a more versatile solution that assumes an ever-changing notebook, one that keeps updating as developers add new pieces of code. Here, MLSync tracks the history of these changes and creates a sort of mind map of the reasoning behind them and their impact. In an ideal MLSync world, we should enable tracking via git commit IDs and tags that determine which PR ID or JIRA ID corresponds to which changes, and how these changes evolved over time. Personally, I see this as an exceptionally helpful tool when working with large codebases.

Enhancement: Support Jupyter notebook

Creating an enhancement request to enable support for Jupyter Notebook. Jupyter Notebook is widely used for developing ML projects, but tracking with it is currently done manually. Automating this would be of significant benefit.
