Currently, we just store the values of the nodes. However, some (popular) visualization libraries rely on side effects to render the visualization, so the variables the user is dealing with often do not hold the chart itself.
Take Matplotlib as an example, using a snippet from their gallery:
import matplotlib.pyplot as plt
import numpy as np
# Data for plotting
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)
fig, ax = plt.subplots()
ax.plot(t, s)
ax.set(xlabel='time (s)', ylabel='voltage (mV)',
       title='About as simple as it gets, folks')
ax.grid()
fig.savefig("test.png")
plt.show()
Which variable should we tell the Linea user to publish?
In order to support the DataAssetManager-based logic (saving some variable value), we would need to look into the return values of fig.savefig and plt.show to see which is easiest for us to use. I dug around for 2 min, but it seems like it would take longer; e.g., the show function traces to a few overloads that we'd need to read through to understand.
Matplotlib is known to be tricky to deal with (global variables everywhere, exemplified by the use of plt). Vega-Lite (through Altair) is much better, I think, but they still do not have a commonly used function that just returns the image binary, which would be pretty useless for the "normal" use cases anyway.
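For what it's worth, here is a minimal sketch of one way to get image bytes out of Matplotlib without a named file, based on the fact that savefig accepts a file-like object. It assumes the non-interactive Agg backend is acceptable, and it still requires us to locate the right Figure object, so it does not answer the "which variable" question by itself.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: a non-interactive backend is fine here
import matplotlib.pyplot as plt
import numpy as np

# Same data as the gallery example above
t = np.arange(0.0, 2.0, 0.01)
s = 1 + np.sin(2 * np.pi * t)

fig, ax = plt.subplots()
ax.plot(t, s)

# savefig writes to any file-like object, so the PNG can stay in memory
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()

plt.close(fig)  # release the figure held in pyplot's global state
```

Something like png_bytes could then be stored as an ordinary value, but only if we can reliably get hold of fig.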
Most visualization libraries do, however, offer very easy support for writing to an image file. For example, in the Matplotlib case we have fig.savefig("test.png"), and for Altair we have the example below. Notice the different file formats and their implications: the JS-based one we can easily render in our UI, but it's less portable than a PDF/PNG, while the latter requires some additional libraries, i.e., altair_saver.
import altair as alt
from vega_datasets import data
chart = alt.Chart(data.cars.url).mark_point().encode(
    x='Horsepower:Q',
    y='Miles_per_Gallon:Q',
    color='Origin:N'
)
chart.save('chart.json')
chart.save('chart.pdf')
So maybe, instead of trying to dynamically figure out how to work with each library, we just intercept directly at the file level? I think both the pro and the con are that we rely on the user to figure out how to save, which makes Linea less magical but also means less work for us (though there would still be more work on the DataAssetManager, which would now also need to handle files rather than only in-memory variables).
My vote is to let the user give us the file containing the visualization. @dorx what do you think?
If you are also on board, then we should think about what the implications are for the lineapy.publish API.
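As a starting point for that discussion, here is a rough sketch of what file-level handling could look like. Everything in it is hypothetical: FileAssetStore, publish_file, and load are illustrative names, not the existing DataAssetManager or lineapy.publish interface. The idea is simply to copy the file the user saved into a store we control and register it under a name.

```python
import hashlib
import shutil
from pathlib import Path


class FileAssetStore:
    """Hypothetical stand-in for a file-aware DataAssetManager."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.index = {}  # asset name -> path of the stored copy

    def publish_file(self, path, name):
        """Copy a user-saved file into the store and register it under `name`."""
        src = Path(path)
        if not src.is_file():
            raise FileNotFoundError(f"nothing to publish at {src}")
        # Prefix with a content hash so that re-publishing a changed chart
        # does not overwrite the copy from an earlier publish.
        digest = hashlib.sha256(src.read_bytes()).hexdigest()[:12]
        dest = self.root / f"{digest}_{src.name}"
        shutil.copy2(src, dest)
        self.index[name] = dest
        return dest

    def load(self, name):
        """Return the raw bytes of the most recently published file for `name`."""
        return self.index[name].read_bytes()
```

With something like this, the user would call fig.savefig("test.png") as usual and then hand us "test.png". Copying rather than referencing the path keeps the asset stable if the user later overwrites the file.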