rsokl / noggin

A simple tool for logging and plotting measurements during machine learning experiments

Home Page: https://noggin.readthedocs.io/en/latest

License: MIT License

Python 100.00%

data-visualization livedata machine-learning matplotlib neural-network python real-time

noggin's Introduction

Hello! I do work in the areas of machine learning, physics, and software development. I also work on improving methods for testing scientific/research software, and am a maintainer for the Hypothesis testing library. I am passionate about education, and created the CogWorks course at the MIT Beaver Works Summer Institute as well as the website Python Like You Mean It.

Libraries for accelerating and improving ML research

hydra-zen: Making Hydra more pythonic and easier to use at scale for ML workflows and experiments
responsible-ai-toolbox: PyTorch-centric library for evaluating and enhancing the robustness of AI technologies.

Other open source projects

MyGrad: Drop-in automatic differentiation for NumPy
noggin: A simple tool for logging and plotting metrics in real time
custom_inherit: inheriting and merging docstrings in customizable ways (my first ever open source project!)

Tutorials

Property-based testing tutorial (with co-author Zac Dodds)

noggin's Issues

recreate_plot should take a figsize argument

It would be lovely to be able to pass a figsize to recreate_plot so as not to end up with a minuscule plot. When I work outside of interactive mode (e.g. when I work in emacs), I'd like to be able to simply construct the plot at the size I want via an interface like:

plotter, fig, ax = recreate_plot(train_metrics=train, test_metrics=test, figsize=(8, 12))

rather than:

plotter, fig, ax = recreate_plot(train_metrics=train, test_metrics=test)
fig.set_size_inches(8, 12)

liveplot needs proper docs page

Experiment with various tips for speeding up plotting

Provide a compressed-save method

My logged data is taking up too much space!

Plotting in server mode

Add the ability to serve logged data to a plotter. This would let people manage a live plot in a separate notebook, or across multiple notebooks.

This is an ambitious enhancement that has the potential for a large payoff. I would like to carefully consider the best means for serving/listening to data in a simple but robust way. I'd like to get input from others about how to move forward with this (@davidmascharka, @ptran516, @arjunmajum)

Add support for alternate plotting backends

Abstract away the specific plotting backend (i.e. matplotlib) from LivePlot. Thus the current version of LivePlot would become MatplotlibLivePlot, and would retain the matplotlib-specific functionality. Otherwise LivePlot will serve as an abstract base class that handles all of the metric logging, saving, refresh logic, etc.

Ultimately, it would be nice to support bokeh and toyplot as backends.
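A minimal sketch of how this split might look, using Python's abc module. Only the names LivePlot and MatplotlibLivePlot come from the proposal above; the method names set_batch and _draw, and the RecordingLivePlot stand-in backend, are hypothetical and are not noggin's actual API.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class LivePlot(ABC):
    """Backend-agnostic base class: owns metric logging, defers drawing.

    Hypothetical sketch; the method names are illustrative, not noggin's API.
    """

    def __init__(self):
        # metric name -> list of recorded batch values
        self._metrics = defaultdict(list)

    def set_batch(self, metrics):
        """Record one batch of metrics, then ask the backend to redraw."""
        for name, value in metrics.items():
            self._metrics[name].append(float(value))
        self._draw(dict(self._metrics))

    @abstractmethod
    def _draw(self, metrics):
        """Render the logged metrics with a specific plotting backend."""


class RecordingLivePlot(LivePlot):
    """Stand-in backend that records draw calls instead of plotting."""

    def __init__(self):
        super().__init__()
        self.draw_calls = []

    def _draw(self, metrics):
        # Copy the lists so later batches do not mutate earlier snapshots
        self.draw_calls.append({k: list(v) for k, v in metrics.items()})


plotter = RecordingLivePlot()
plotter.set_batch({"loss": 1.0})
plotter.set_batch({"loss": 0.5})
```

Under this layout, a MatplotlibLivePlot would implement _draw with matplotlib calls, and a bokeh or toyplot backend would do the same with its own drawing code, while the logging logic stays in the base class.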

Investigate and document compatibility with Jupyter lab

At first glance, it looks like the %matplotlib notebook magic doesn't work

Logger should permit per-metric batch domains / missing data

Users should be able to set nan in their batch

Add fast-style plotting

https://matplotlib.org/tutorials/introductory/usage.html?highlight=fast%20style#using-the-fast-style

LivePlot should be a true drop-in replacement for LiveLogger

Need to update docs afterwards

last_n_batches breakage

Occurs when last_n_batches is set mid-run and epoch data is present in the second metric.

x-axis values

Iteration number can be pretty unwieldy. It would be nice to have an option to label the x-axis by iteration number, epoch, etc.
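As a rough sketch of the conversion such an option would need, assuming every epoch contains the same number of batches. The helper iterations_to_epochs and its parameters are hypothetical and not part of noggin.

```python
def iterations_to_epochs(iterations, batches_per_epoch):
    """Convert 1-based iteration counts to (fractional) epoch numbers.

    Hypothetical helper; assumes a fixed number of batches per epoch.
    """
    if batches_per_epoch <= 0:
        raise ValueError("batches_per_epoch must be positive")
    return [it / batches_per_epoch for it in iterations]


# With 100 batches per epoch, iteration 150 sits halfway through epoch 2
epochs = iterations_to_epochs([50, 100, 150, 200], batches_per_epoch=100)
```

The resulting values could be used directly as x-coordinates, so batch-level and epoch-level data share one axis.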

Is noggin coming to conda?

I like to use conda rather than pip to keep all of my packages in one place. Will noggin be coming to conda via conda install at any point?

merge pytest.ini with setup.cfg

Warn users when plotting is a substantial portion of their loop time
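One possible shape for such a check is sketched below: accumulate the time spent per iteration and the time spent plotting, and warn once the ratio passes a threshold. The PlotTimeMonitor class, its max_fraction parameter, and the 50% default are assumptions made for illustration; none of them exist in noggin.

```python
import warnings


class PlotTimeMonitor:
    """Tracks the fraction of total loop time that is spent plotting.

    Hypothetical helper, not part of noggin.
    """

    def __init__(self, max_fraction=0.5):
        self.max_fraction = max_fraction
        self.plot_time = 0.0
        self.total_time = 0.0

    @property
    def fraction(self):
        """Share of accumulated loop time spent plotting (0.0 if no data)."""
        return self.plot_time / self.total_time if self.total_time else 0.0

    def record(self, loop_seconds, plot_seconds):
        """Add one iteration's timings; warn if plotting dominates."""
        self.total_time += loop_seconds
        self.plot_time += plot_seconds
        if self.fraction > self.max_fraction:
            warnings.warn(
                f"Plotting accounts for {self.fraction:.0%} of loop time",
                RuntimeWarning,
            )


monitor = PlotTimeMonitor(max_fraction=0.5)
monitor.record(loop_seconds=1.0, plot_seconds=0.2)  # below threshold
```

In a real loop the two durations would come from time.perf_counter() calls placed around the whole iteration and around the plotting call.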

Make metrics saveable/loadable as xarray objects

Live metrics are already handled as ordered dictionaries of numpy arrays; this is nearly exactly the data format needed to form an xarray of the metrics.

This would permit users to seamlessly access their data as N-dimensional arrays with labeled axes.

Update binder notebook / env

Add copy button to docs

Upgrade nbsphinx to 0.6

Add https://github.com/choldgraf/sphinx-copybutton capability

Create gif of liveplot in action

The README needs a brief gif that shows liveplot in action. It should show at least two metrics (e.g. loss and accuracy) being plotted with both batch and epoch-level statistics.

fix indentation

    # record training epoch
    if i%10 == 0 and i > 0:
        plotter.plot_train_epoch()

        # cue test-evaluation of model
        for x in np.linspace(0, 10, 5):
            x += (np.random.rand(1) - 0.5)*5
            test_metrics = {"accuracy": x**2}
            plotter.set_test_batch(test_metrics, batch_size=1)
        plotter.plot_test_epoch()
plotter.plot()  # ensures final data gets plotted

Add tests for last-N batches

Limit data rate for plotting

Currently liveplot will plot all available data regardless of how much data that is. This can lead to large computational costs, making plotting a bottleneck.

We should establish a heuristic for limiting the amount of data being plotted. Ideally this would involve estimating the computational cost of each "draw" during live plotting, and how this scales with the amount of data available.

We would also want to estimate the maximum visually-resolvable density of data. That is, if I am drawing 10,000 points on a typically-sized plot, does drawing every 10th point look just the same as drawing every point?

With these two pieces of analysis, we should be able to arrive at a sensible default for limiting the number of points that we draw in a given call. We could potentially plot sliding-window averages to coarsen the plot.
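The two coarsening strategies mentioned above can be sketched as follows. Both helpers, decimate and window_means, are hypothetical and not part of noggin, and window_means averages non-overlapping blocks, which is a simpler variant of a true sliding-window average. The right value for max_points would still have to come from the cost and resolvability analysis described in this issue.

```python
def decimate(values, max_points):
    """Keep at most `max_points` values by taking every k-th point."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    stride = -(-len(values) // max_points)  # ceiling division
    return values[:: max(stride, 1)]


def window_means(values, window):
    """Coarsen a series by averaging consecutive, non-overlapping blocks."""
    if window <= 0:
        raise ValueError("window must be positive")
    means = []
    for start in range(0, len(values), window):
        block = values[start:start + window]
        means.append(sum(block) / len(block))  # last block may be shorter
    return means


data = list(range(10_000))
plotted = decimate(data, max_points=1_000)  # every 10th point
smoothed = window_means([1, 2, 3, 4, 5], window=2)
```

Decimation is cheaper but can hide short spikes, while block averaging keeps their influence at the cost of smoothing the curve.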