I like the idea outlined in the 'manifesto' section of having a pipe-able api. Howeve

idea: auto-generating a pipe-able api, about vegawidget/altair

Comments (24)

ijlyttle commented on May 31, 2024 1

I have read this a few times, and it is taking me too long to get to an opinion that I can get myself to agree with for any length of time 😳

I think that pype is an interesting idea - but I will need to ask for a little while longer to be able to respond properly, commensurate with the thought you have put into it.

from altair.

eibanez commented on May 31, 2024 1

The $ operator remains unchanged, but the user would have to choose one or the other.

ijlyttle commented on May 31, 2024

I totally agree! On the one hand, it would be great to have, but on the other, I share the nightmare you point out, so I have been apprehensive.

Please explore, as what you propose would be ideal! I would think that such an idea would generalize beyond this package, and would be of interest to the greater "reticulate" community.

I remain at your service to discuss ideas, and I'll do my best to keep up :)

AliciaSchep commented on May 31, 2024

Digging a bit, it looks like reticulate already has a helper function for generating R APIs:

reticulate::py_function_wrapper("alt$Chart")

Just trying that copied & pasted didn't work out right away (some issues with Undefined, documentation parsing error) but those might be things to report (and/or propose solution) in reticulate repo

However, it doesn't seem to do the right thing for class methods (e.g. reticulate::py_function_wrapper("alt$Chart$mark_point") does not do something very useful)

Probably a "py_method_wrapper" for making an S3 method rather than a function would be helpful. I am thinking that kind of thing would either be an addition to the reticulate library itself or a separate package dedicated to this auto-generation....

ijlyttle commented on May 31, 2024

I like it!

AliciaSchep commented on May 31, 2024

I have a minimalist proof-of-concept version of auto-generating an api somewhat working in the branch 'autogen' of my fork. It enables:

alt_Chart(r_to_py(mtcars)) %>%
  alt_mark_point() %>%
  alt_encode(
    x = "mpg:Q",
    y = "hp:Q",
    color = "cyl:N"
  ) %>%
  vegalite()

All the api does is create dummy functions or S3 methods that pass their arguments to the right python/reticulate construct. It doesn't have any niceties like default arguments, documentation, RStudio autocomplete, etc. Those will likely need to make use of things like what was mentioned above (functionality from reticulate that might not yet be 100% up to the task). In addition, there is a function to generate the R api, but it has to be run manually, while in an ideal case it would run when building the package.

# Manual process
ra <- generate_r_api(alt, prefix = "alt")
cat(ra, file = "R/r_api.R")
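
To make the idea of generated "dummy" wrappers concrete, here is a minimal sketch of the technique in plain R. It is not the code from the 'autogen' branch: `generate_wrappers()` and `fake_chart` are hypothetical names used only for illustration, and a plain list of functions stands in for a Python object so the example runs without Python.

```r
# Hypothetical sketch: build wrapper source text for a set of method names.
# Each generated function forwards its arguments to the object's method.
generate_wrappers <- function(methods, prefix = "alt") {
  template <- "%s_%s <- function(object, ...) {\n  object$%s(...)\n}\n"
  vapply(
    methods,
    function(m) sprintf(template, prefix, m, m),
    character(1),
    USE.NAMES = FALSE
  )
}

src <- generate_wrappers(c("mark_point", "encode"))
cat(src, sep = "\n")

# Evaluate the generated source, then try the wrappers on a stand-in
# object (a list of functions instead of a reticulate object).
eval(parse(text = src))
fake_chart <- list(
  mark_point = function(...) "mark_point called",
  encode = function(...) names(list(...))
)
alt_mark_point(fake_chart)
alt_encode(fake_chart, x = "mpg:Q", y = "hp:Q")
```

Writing `src` to a file with `cat()` mirrors the manual step above; default arguments and documentation would need information this sketch does not have.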

Lots more to do, but I think this idea might really be feasible!

ijlyttle commented on May 31, 2024

I had a quick peek - this looks really cool! I agree that being able to automate bringing over defaults, documentation, ..., would be amazing.

As a side note, on the tooltip side, I am experiencing the tedium and brittleness that you came to experience with manual translations of APIs 😥- the tooltip is a relatively small API so I think I can handle it (famous last words...). Larger point being (which you are making with your investigation), that at the scale of the Altair API, automation seems the only feasible option.

AliciaSchep commented on May 31, 2024

Exploring this a bit further, I am having my doubts that auto-generating the full altair api is a sensible thing to do. Automatically bringing over defaults seems feasible, but auto-generating the documentation in a helpful way seems like a much bigger challenge. The easiest approach would be bare-bones documentation that points the user to the right py_help command to get more info on arguments.

One issue is that the altair api is pretty large -- lots of classes & methods, and making r versions of all plus even some minimal documentation would mean a much larger package footprint, and a big challenge for testing (how to test that the auto-generation is successful for everything?). If the documentation was anything beyond the bare-bones approach, it would also be hard to verify that each class/method had decent documentation.

All-in-all, since this functionality is more of a nice-to-have rather than something essential, it seems like perhaps it is not worth the downsides?

I also think there are other ways to get at some of those niceties. I think an alternative could be to handcraft (perhaps with some help from auto-generated starter function) some R constructors for the main classes of altair (Chart + the various compound Charts). Those could have some extra niceties like doing r_to_py for the data, and checking for "." in column names. Since this is only six constructors, this wouldn't be so bad to manually curate (especially if there was a helper function to auto-generate a starting point).
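
As a rough illustration of the column-name check such a constructor might do, here is a small sketch in plain R. The helper names are hypothetical, and the choice to warn rather than stop or rename is an assumption; the concern is presumably that Vega-Lite treats a dot in a field name as nested-field access.

```r
# Hypothetical helper: return the column names that contain a ".".
find_dotted_columns <- function(data) {
  grep(".", names(data), fixed = TRUE, value = TRUE)
}

# Hypothetical check a constructor could run before converting the data.
check_column_names <- function(data) {
  dotted <- find_dotted_columns(data)
  if (length(dotted) > 0) {
    warning(
      "Column names containing '.': ",
      paste(dotted, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(data)
}

find_dotted_columns(iris)    # iris has names such as Sepal.Length
find_dotted_columns(mtcars)  # mtcars has no dotted names
```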

For calling methods, there could be a simple helper function that allows you to pipe into one of the object's methods rather than using the "$" method for chaining. E.g:

pype <- function(pyobj, pymethod, ...){

  pyobj[[as.character(substitute(pymethod))]](...)

}

That plus your proposed Chart constructor

alt_chart <- function(data, ...) {
  alt$Chart(r_to_py(data), ...)
}

enables this kind of chart construction

mtcars %>% 
  alt_chart()  %>% 
  pype(mark_point) %>%
  pype(encode,
    x = "mpg:Q",
    y = "hp:Q",
    color = "cyl:N") %>%
  vegalite()
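
The way `pype` resolves a bare method name can be tried without Python. The sketch below repeats the `pype` definition from above and applies it to a hypothetical stand-in object (`new_fake_chart()`, a list of functions that records which methods were called); a real reticulate object would be indexed with `[[` in the same way.

```r
pype <- function(pyobj, pymethod, ...) {
  # substitute() captures the unevaluated method name, e.g. mark_point
  pyobj[[as.character(substitute(pymethod))]](...)
}

# Hypothetical stand-in for a Python chart: each method returns a new
# object that remembers the calls made so far.
new_fake_chart <- function(calls = character()) {
  list(
    calls = calls,
    mark_point = function(...) new_fake_chart(c(calls, "mark_point")),
    encode = function(...) new_fake_chart(c(calls, "encode"))
  )
}

# Nested calls are used here so no pipe package is needed.
chart <- pype(pype(new_fake_chart(), mark_point), encode, x = "mpg:Q")
chart$calls
```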

g3o2 commented on May 31, 2024

I've been through reading this with lots of interest, though the more I think about it I wonder whether it would be any more difficult to programmatically translate directly from the vega-lite schema rather than from the Altair API. After all, that's what Altair is doing, isn't it?

ijlyttle commented on May 31, 2024

@g3o2, I agree that translating directly from the Vega-Lite schema is what Altair is doing, and I can see the appeal of such an approach.

That being said, I don't have enough experience to judge the relative difficulty among these approaches. However, if this package stops using the Altair python package, it ceases to be "altair". This does not mean that I think it should not be done, it just means that it lies outside the scope of this package.

AliciaSchep commented on May 31, 2024

@g3o2 I have also been thinking of the possibility of auto-generating an R api from JSON schema. I think in long term that would have several advantages:

- Such an approach would also be valuable for wrappers for other schema-based js visualization libraries (e.g. I think you can specify a g2 plot with a json spec)
- Cuts out the python middleman

In the short/medium term, however, I think there are many advantages to auto-generating an r api from a python api:

- The Python api is closer to what an r api would be than just a JSON schema.
- While Altair starts with auto-generating an api, my understanding is that there is a fair amount of additional work that takes the auto-generated code and makes a fully featured api. An r api built on top of Python takes advantage of all the thought & hard work put into the Altair api design.
- Altair takes advantage of the "traitlets" python package, which enforces strong typing for Python objects. For R, potentially S4 classes would make sense to have the same typing... in general I think a lot of thought would need to go into how a json schema should turn into an R api...

@ijlyttle one thought about scope: In python, (to the best of my knowledge) Altair does not do any rendering of plots -- it relies on packages like ipyvega to use the spec and turn it into a plot. One thought would be to separate out the vegalite (and vega_embed) functionalities into a separate package that has a singular focus on turning vega-lite spec into a widget. It could support specs generated either manually, by the altair package, or some other package. The altair package would then focus just on easing use of the Altair python package to create a spec.

ijlyttle commented on May 31, 2024

@g3o2 @AliciaSchep: I am certainly supportive of an effort to have a native, pipeable R API to build Vega-Lite (or Vega) specs, as I think it gets us another step closer to the elusive promise of ggvis. I yield to both of your expertise on the challenges of getting "there from here", but I will echo @AliciaSchep's comment that I think this approach can be useful, for now, by giving R people a chance to start building Vega-Lite specs.

@AliciaSchep I have had a similar thought on splitting the package, but I had hoped that no-one else would notice :). If this did split into two packages, I think it should come after getting Altair v2.0.1 merged into master and before submitting to CRAN.

One thing I would want to think through: let's say that the new package was called vegarender (it would not surprise me that there is a better name out there). If multiple packages relied on vegarender, do we risk a reverse-dependency headache if vegarender and altair wanted to advance the JS versions, but another package could not?

AliciaSchep commented on May 31, 2024

Yeah, the dependency issue would have to be thought through. One would hope that minor version updates of vega-lite and vega-embed would be backwards compatible, such that if the vegarender (or vegawidget or whatever) package stayed fairly up-to-date, things would be okay for dependencies even if they updated more slowly. For breaking, major version changes, one possibility would be to transition to a new name, e.g. "vegarender4" or something like that, so that the previous version could remain for a while as a dependency and then be archived when no longer needed. I think that may be what the python rendering package has done, judging from the name "vega3"...

Despite my earlier doubts about auto-generating the whole api, I've continued a bit of effort along that front... I put the auto-generation code itself into a separate package (https://github.com/AliciaSchep/autopyr) and have updated the api in the 'autogen' branch of my fork of this package (although that will likely move elsewhere too at some point). The main updates:

- Now adds default arguments (not just passing in ...)
- Does r_to_py conversion internally so you don't have to call it explicitly for an input data.frame

Still need to figure out the thornier documentation issue... and as I've only tested out very simple examples, I'm guessing there are lots of bugs and such to work out

ijlyttle commented on May 31, 2024

I like vegawidget!

I agree with you that it should work - the breaking of the tooltips in vega-embed gives me pause.

My thought is that I favor the idea of a widget package, but given that this might not happen in the next few weeks (I am trying to get the v2.0.1 working before becoming unavailable for a couple weeks), we could use the time to keep an eye on the JS side to make sure that the tooltips thing was a rarity.

As for autopyr - I am really interested in what you are doing with this, as I'm sure the rest of the community will be!

g3o2 commented on May 31, 2024

Breaking of tooltips in vega-embed? vega-tooltip is about to see its first release version and is now integrated into vega-embed.

Breaking change for vega-embed resides in no longer needing to specify anything in or outside of vega-embed. In vega and vegalite, tooltip channels have been added as a new feature, so no break there either.

terrytangyuan commented on May 31, 2024

@AliciaSchep I added a py_function_custom_scaffold() in reticulate some time ago here. Would that meet your needs? Note that it's not currently exported yet. I haven't looked into the details of your autopyr package yet, but the purpose seems to be similar, so there might be things you can leverage and reuse from reticulate.

AliciaSchep commented on May 31, 2024

@terrytangyuan thanks for sharing - I think that function may indeed be very useful for these purposes, I'll try it out!

eibanez commented on May 31, 2024

Hello! I was pleasantly surprised when I learned about this project. I've been using vega-lite for a while and I love the idea of this package.

I've also been following Hadley Wickham's developments on non-standard evaluation (currently implemented in the rlang package). I thought this was a great opportunity to merge the two, so I took a stab at it.

Because of the way the normal pipe (%>%) is implemented, you cannot use it in this case without explicitly defining the altair functions or coming up with a workaround like the pype above. The reasons are explained here: tidyverse/magrittr#101

In that same link, the magrittr creator suggests using the dollar pipe (%$%) and that's what I did. The code below uses rlang magic to correctly apply the altair python methods. Parentheses are optional too!!

Happy to discuss more. Keep up the good work!

library(altair)
library(magrittr)
library(rlang)

# Load datasets
vega_data <- import_vega_data()

# Overload %$% for Charts
with.altair.vegalite.v2.api.Chart <- function(chart, method) {
  # Capture the method
  mexpr <- rlang::enexpr(method)

  if (inherits(mexpr, "call")) {
    # Convert method name to string
    method.name <- as.character(mexpr[1])

    # Capture arguments
    if (length(mexpr) > 1) {
      args <- as.list(mexpr[2:length(mexpr)])
    } else {
      args <- list()
    }

    # Apply method and pass arguments
    out <- do.call(chart[[method.name]], args)
  } else {
    # "method" captured as a symbol
    method.name <- rlang::as_string(mexpr)
    out <- chart[[method.name]]
    
    # User tried to call a method without parentheses
    if (inherits(out, "python.builtin.instancemethod")) {
      out <- do.call(out, list())
    }
  }

  out
}

chart <- 
  alt$Chart(r_to_py(vega_data$cars())) %$%
  encode(
    x = "Horsepower:Q",
    y = "Miles_per_Gallon:Q",
    color = "Origin:N"
  ) %$%
  mark_point

AliciaSchep commented on May 31, 2024

Very cool @eibanez ! I did not know about %$%, thanks for sharing. I like the approach above because it allows:

vega_data$cars() %>% r_to_py() %>% alt$Chart() %$%
     encode(
         x = "Horsepower:Q",
         y = "Miles_per_Gallon:Q",
         color = "Origin:N"
     ) %$%
     mark_point

which fails when doing $ instead of %$% unless you awkwardly put parentheses around the first part. That pattern is useful because you can then start with a bit of data transformation and then pipe it directly into a plotting call...
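
The precedence difference can be inspected with `quote()`, which only parses an expression and never evaluates it, so none of the packages need to be loaded. This is a small sketch; `df` is a placeholder name.

```r
# `$` binds tighter than any %op%, so `$mark_point()` attaches to
# alt$Chart() before the pipe is considered.
with_dollar <- quote(df %>% alt$Chart()$mark_point())

# %>% and %$% have the same precedence and group left to right.
with_dollar_pipe <- quote(df %>% alt$Chart() %$% mark_point)

# Top-level operator of each expression
as.character(with_dollar[[1]])
as.character(with_dollar_pipe[[1]])

# Right-hand side of %>% in the first case: the whole method chain
deparse(with_dollar[[3]])

# Left-hand side of %$% in the second case: the completed pipeline
deparse(with_dollar_pipe[[2]])
```

In the first form the data would be piped into `mark_point()` rather than into `alt$Chart()`, which is why the parentheses around the first part are needed.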

@ijlyttle thoughts on whether this type of overload function could have a home in altair package itself or a helper package?

eibanez commented on May 31, 2024

Cool! I had not thought about the precedence of $ and the pipe.

I can put together a PR if you like the approach. I should probably test it some more and perhaps allow all Altair objects to take advantage of this, not just Chart.

We could also promote this approach upstream and see if it could live in reticulate.

Thoughts? @terrytangyuan

ijlyttle commented on May 31, 2024

Hi @eibanez, thanks for your kind words and your constructive trouble-making :) Welcome!

Sorry for taking so long to jump in - I wanted to make sure I understand before saying too much.

However, at this point, perhaps I shall just have to risk my ignorance and ask you (all) to correct me if I misunderstand something.

This seems like an interesting compromise between using the existing $ operator and becoming fully pipeable.

As I understand tidyverse/magrittr#101, the "tidyverse" view is that $ and %>% are fundamentally different enough that we should have different operators. They outline some edge cases where things could fail in weird-and-difficult-to-detect ways. Although the proponent of tidyverse/magrittr#101 claims to have avoided problems, I fundamentally agree with the "safe" approach. From what I can tell, your proposal follows this safer approach.

I don't have the deep knowledge of the tidyverse folks for knowing where the dangers might be in overloading %$%, so I will have to ask you all (@eibanez, @AliciaSchep, @terrytangyuan), can you think of any edge cases where we could be burned?

That being said, I am open to bringing this into altair - I see this whole package as a bit of an experiment :) Perhaps we can use altair as a proving ground, and see what @terrytangyuan thinks about its use in reticulate.

Maybe the class to operate on would be altair.vegalite.v2.api.TopLevelMixin, but I agree with @eibanez that we probably have to figure this one out.

eibanez commented on May 31, 2024

As I understand it, the summary of the magrittr issue is that we cannot use %>% without defining the functions that come afterwards. It seems to me that the level of effort it would take to create those functions from the altair codebase would be similar to developing a native R implementation (which would also have the benefit of not depending on having python correctly configured). I think there would be value in doing that and developing a vegalite grammar that takes advantage of stuff that R can do and python cannot, but that is an entirely different issue that we can discuss at another point (which I might open with some thoughts).

I agree on the change of the class to operate. I simply used the first class listed, but we could go down even further than TopLevelMixin if we wanted.

I've read Hadley's chapter on non-standard evaluation a few times and I think the proposed solution will work. We probably would want to check some special cases, e.g., if you save the encode X string into a variable and use that variable instead.
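
One way to probe the "variable instead of a string" case without Python is sketched below. It is an assumption-laden illustration, not the proposed implementation: `new_mock_chart()` is a hypothetical stand-in for a chart object, base `substitute()` replaces the rlang capture, base `with()` is called directly instead of going through %$%, and the captured arguments are evaluated in the caller's frame so that local variables resolve.

```r
# Hypothetical stand-in for a Python chart: a classed list of closures.
new_mock_chart <- function(spec = list()) {
  self <- list(spec = spec)
  self$encode <- function(...) {
    new_mock_chart(modifyList(spec, list(encoding = list(...))))
  }
  self$mark_point <- function() {
    new_mock_chart(modifyList(spec, list(mark = "point")))
  }
  class(self) <- "mock_chart"
  self
}

with.mock_chart <- function(data, expr, ...) {
  mexpr <- substitute(expr)
  caller <- parent.frame()
  if (is.call(mexpr)) {
    method_name <- as.character(mexpr[[1]])
    # Evaluate each argument where the user wrote it, not in this frame
    args <- lapply(as.list(mexpr)[-1], eval, envir = caller)
    do.call(data[[method_name]], args)
  } else {
    # Bare method name without parentheses
    data[[as.character(mexpr)]]()
  }
}

# The field is held in a variable local to a function.
plot_by <- function(x_field) {
  with(new_mock_chart(), encode(x = x_field))
}
plot_by("Horsepower:Q")$spec$encoding$x
```

Whether `parent.frame()` is still the right environment when the method is reached through %$% depends on how magrittr invokes `with()`, so that path would need its own test.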

The major problems of this approach are:

- You block the with method for these objects (I don't think that's a huge issue for altair, but might be for reticulate as with is much more common in python programming).
- The mixing of %>% and %$% will be confusing. Users will mess up and the errors are not going to be intuitive.

I hope I answered all the questions.

ijlyttle commented on May 31, 2024

Thanks!

I think a native-R way into Vega-Lite, with all of the tidyverse and tidyeval goodness, will be a great and wonderful thing. I agree that it is a different discussion (and one that I will be happy to try to keep-up with).

I will be interested to hear @AliciaSchep's further thoughts, but I like what I see so far:

- it should be straightforward to test the "X as a variable" case.
- I don't see a lot of overlap with other python-usage (in which case the $ operator would remain available, no?).
- the mixing of the operators is unavoidable, and is perhaps preferable given the magrittr discussion. We would be no-worse-off, in that respect, than where ggplot2 is today :)

ijlyttle commented on May 31, 2024

Closing this issue in favor of vlbuildr 🚀
