
Comments (7)

neoeinstein commented on August 16, 2024

I know what the issue is here, and will have a PR that fixes this anyway. There's already a solution for this that comes from pbjson. Just using the #[serde] attributes blindly is going to lead to weird inconsistencies with the canonical serialized JSON form. pbjson fixes all that.

future-highway commented on August 16, 2024

While we wait for the CLA bot, I took a look at pbjson. I'm a little skeptical that it will solve this issue, because I don't think the JS/TS <-> proto serialization is the problem; I think the issue is in the JS/TS <-> Rust conversion that Deno does. I'm looking forward to being pleasantly surprised by the PR.

But I'd like to probe your comment here if you have time to respond (at your convenience), somewhat for my own knowledge, but it will help aurae too when we tackle #126.

My preferred approach for JSON serialization/deserialization has been to be liberal in what is accepted (e.g., camelCase, snake_case, etc.) to a degree, but strict in the output (match the convention, e.g., camelCase fields, SCREAMING_SNAKE_CASE enums, etc.). Your comment reads to me like you have insight on why that might be a bad idea. I appreciate any thoughts/input you might have.

neoeinstein commented on August 16, 2024

In terms of the V8/Rust interop boundary, deno_core::op uses serde_v8 as the serializer/deserializer for non-fast-call-capable APIs. During serialization into V8, Deno doesn't really care about any TS definition file; it just creates an object in the V8 runtime with properties named as specified in the serialization implementation. Thus, if the serialization specifies snake_case names, then the object in V8 will have snake_case names.

However, there is a mismatch: serde by default generates serializers that use the field names verbatim, and prost/tonic use the Rust-standard snake_case names, while the default code generation for ts-proto uses camelCase names (though it can be configured to use the original snake_case if preferred). Because Auraescript uses the TS definition, it expects the script to use the camelCase convention, but through serialization, the actual V8 object has fields in the snake_case convention.
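As a concrete sketch of that mismatch, assuming the serde and serde_json crates (serde_json stands in for serde_v8 here, since both go through the same derived Serialize impl; AllocateCellRequest is a hypothetical message):

```rust
use serde::{Deserialize, Serialize};

// prost/tonic generate Rust-standard snake_case fields, and a derived serde
// Serializer emits those field names verbatim.
#[derive(Serialize, Deserialize)]
struct AllocateCellRequest {
    cell_name: String,
}

fn main() {
    let req = AllocateCellRequest {
        cell_name: "my-cell".into(),
    };
    // Prints {"cell_name":"my-cell"}. The ts-proto definitions declare a
    // camelCase `cellName` field, so the properties on the actual V8 object
    // don't line up with what the TS types say.
    println!("{}", serde_json::to_string(&req).unwrap());
}
```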

The same process applies on deserialization out of V8. The deserializer runs through the fields in the V8 object, but, with the default code generation, the new camelCased V8 object doesn't have any of the snake_cased fields that the derived serde deserializer is looking for. This is in part where pbjson comes in.

What you'll find is that the JSON serialization provided by pbjson is indeed a liberal reader for deserialization: it will accept both snake_case and the canonical camelCase when deserializing JSON. When dealing with the canonical JSON form of Protobuf, it is important not to be too liberal in what you accept; otherwise you might find yourself bound by an edge case of the "liberal" reader, which would be breaking to change, and we want the protobuf to define the schema (and, through its canonical form, the valid JSON), not the edge semantics of a particular JSON reader.
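To make the liberal-read/strict-write behavior concrete, here is an approximation using plain serde attributes (pbjson generates dedicated Serialize/Deserialize impls rather than derives, but the observable behavior is similar; the field name is hypothetical):

```rust
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
struct Example {
    // Deserialization accepts both the canonical camelCase name and the
    // original snake_case name; serialization always emits camelCase.
    #[serde(rename = "cellName", alias = "cell_name")]
    cell_name: String,
}

fn main() {
    // Liberal reader: both spellings deserialize successfully.
    let a: Example = serde_json::from_str(r#"{"cellName":"x"}"#).unwrap();
    let b: Example = serde_json::from_str(r#"{"cell_name":"x"}"#).unwrap();
    assert_eq!(a.cell_name, b.cell_name);
    // Strict writer: output is always the canonical camelCase form.
    assert_eq!(serde_json::to_string(&a).unwrap(), r#"{"cellName":"x"}"#);
}
```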

There may be a reason that you don't want to use pbjson: in order to have some custom serialization semantics that aren't built around plain old JSON objects. For example, if you're passing byte arrays, it would be more efficient to pass them over the FFI layer without going through the base64 canonical serialization (and that would probably require the TS harness you generate to include a MessageType.fromJSON/toJSON call with the canonical JSON semantics).

What you end up with is a problem where your gRPC API semantics are placing constraints on your FFI semantics, using the same data structure to cross multiple layers, kinda like exposing a database structure directly through the API. You may actually prefer to have a distinct type for FFI; that way you can guarantee, or require from TS, that certain fields are inhabited which proto3 semantically allows to be empty. This would also let you provide the canonical JSON serialization for the gRPC API types, while allowing you to specify a more efficient serialization for the FFI interaction.
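A rough sketch of that split, with hypothetical type names; the gRPC type keeps proto3's "anything may be empty" semantics, while the FFI type enforces stronger requirements at the boundary:

```rust
use serde::{Deserialize, Serialize};

// The prost-generated gRPC type: under proto3 semantics, cell_name may be "".
#[derive(Debug, Default)]
struct AllocateCellRequest {
    cell_name: String,
}

// A distinct type for the V8 FFI boundary, with its own serialization and
// its own (stricter) invariants.
#[derive(Serialize, Deserialize)]
struct AllocateCellFfi {
    #[serde(rename = "cellName")]
    cell_name: String,
}

impl TryFrom<AllocateCellFfi> for AllocateCellRequest {
    type Error = String;

    fn try_from(ffi: AllocateCellFfi) -> Result<Self, Self::Error> {
        // Guarantee at the FFI boundary what proto3 itself cannot: the
        // field must actually be inhabited.
        if ffi.cell_name.is_empty() {
            return Err("cellName must not be empty".into());
        }
        Ok(AllocateCellRequest {
            cell_name: ffi.cell_name,
        })
    }
}

fn main() {
    let ffi = AllocateCellFfi {
        cell_name: "my-cell".into(),
    };
    let req: AllocateCellRequest = ffi.try_into().expect("non-empty name");
    println!("{:?}", req);
}
```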

If you're not passing specialized types, and instead are primarily passing strings, integers, arrays, and objects, then the canonical JSON representation and the V8 FFI representation will overlap.

neoeinstein commented on August 16, 2024

I know I just left a wall of text there, but wanted to provide another alternate option: Drop ts-proto and use our own protoc compiler plugin to generate TypeScript definitions, the Deno TS glue, and the Rust FFI glue with appropriate V8 serialization.

I am the author of protoc-gen-prost (and protoc-gen-tonic), and have experience writing custom protoc compiler plugins in Rust. It should be pretty straightforward, and then we'd have more control over the FFI boundary and could specify it more exactly. With ts-proto, users will see the Protobuf encode, decode, toJSON, and fromJSON members on the interface types, which really have no relevance within auraescript. We'd also perhaps be able to do all the code generation in that plugin, rather than doing half with ts-proto and the other half with a proc-macro.

future-highway commented on August 16, 2024

Just to get some very initial thoughts down:

In my opinion, the main goal to keep in mind is that we want to stay as close to fully generated as possible for auraescript; small changes to the proto files should need a rebuild at most. That way development speed on the core functionality isn't hindered. Having only looked at your comments so far, I think your solutions meet that goal, which is great.

I've got no issue with finding another solution or dropping ts-proto (it's just what I was able to find). If we keep it, we can change the settings to eliminate toJSON/fromJSON, but they seem possibly useful. (I thought I already eliminated encode/decode).

Considering the two alternate solutions so far, buf gives me hesitation for multiple (potentially unfounded) reasons.

  1. I would rather rely on protoc as I think that is the standard and widely used generator.
  2. Does it give us a way to inject attributes like tonic/prost does? If not, it may limit our ability to generate other code using proc_macros (or force workarounds), and limit our ability to keep Rust lints on.
  3. Related to 2, we've been generating slightly different outputs between auraed and auraescript. The difference has been reduced since switching to Deno, potentially to the point of not mattering, but I'm not sure we should limit our ability to diverge again. I assume this can be solved by having multiple passes of the generator instead of the proposed aurae_proto crate. However, you're not the first to want a single crate for the proto output (I think #95), so the benefits may be adding up.

Admittedly, I don't know much about buf, except that I believe it is relatively new, and I'm happy to adapt to whatever others think is best. However, if we can use the lints from buf, whether we use it for more or not, I think that is a great idea.

A custom protoc plugin sounds appealing. I don't know how much effort is needed to make one or if we want to take on the maintenance cost, but I like that it will still use protoc.

neoeinstein commented on August 16, 2024

I am 100% behind trying to be fully generated, and agree with the idea of using the protobuf schemas as the source of truth for everything downstream.

In regard to relying on buf vs. protoc, we can do both. Buf is effectively a driver for protoc for code generation, but also looks to build up a more complete ecosystem around protobuf. Thus, we could get rid of buf altogether and just string together the right incantations to protoc. Everything we can do via protoc, we can do via buf [1], but we can also do more: linting, formatting, and breaking-change detection among them. The key thing that I used here was "remote plugins". Effectively, instead of needing to install all the various plugins locally (protoc-gen-prost, protoc-gen-tonic, protoc-gen-ts_proto, and protoc-gen-doc), we can instead invoke them remotely from the Buf Schema Registry. This does require network access for code generation, but if we wanted something in the middle, we can use local plugins instead. We just need to make sure that we install all such plugins.

In this, I'm happy to back out buf, but we may want a Makefile entry that installs those code generation prerequisites, and then make proto could drive protoc directly as appropriate. I tend to like the additional things that buf brings, but I completely understand if those aren't compelling at this point in the project.

On the code generation side of things, I am the author of protoc-gen-prost and protoc-gen-tonic, co-maintainer of prost, and I've authored additional internal protoc plugins based off of protoc-gen-prost. I believe that, with the pattern already established, the difficulty of creating a custom plugin is not that high, and likely a bit less difficult than writing a proc-macro.

A protoc compiler plugin receives a protobuf CodeGeneratorRequest, which includes options passed on the command line [2], as well as the files that code generation is being requested for. From there, we can enumerate through the input protobuf files and messages, and generate arbitrary output files through the CodeGeneratorResponse that is passed back to protoc. The hardest part in that particular endeavor is pulling comments out of the source info [3], but I've done this as well.
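A minimal sketch of that request/response flow, assuming the prost and prost-types crates (the output here is a placeholder; a real plugin would emit the TypeScript and Rust glue instead):

```rust
use std::io::{self, Read, Write};

use prost::Message;
use prost_types::compiler::{
    code_generator_response::File, CodeGeneratorRequest, CodeGeneratorResponse,
};

fn main() -> io::Result<()> {
    // protoc pipes a serialized CodeGeneratorRequest into the plugin's stdin.
    let mut buf = Vec::new();
    io::stdin().read_to_end(&mut buf)?;
    let request = CodeGeneratorRequest::decode(buf.as_slice())
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?;

    let mut response = CodeGeneratorResponse::default();

    // proto_file includes transitive dependencies; only generate output for
    // the files that were actually requested.
    for file in &request.proto_file {
        if !request.file_to_generate.iter().any(|n| n == file.name()) {
            continue;
        }
        // Placeholder output: list the messages found in each input file.
        let mut content = format!("// generated from {}\n", file.name());
        for message in &file.message_type {
            content.push_str(&format!("// message {}\n", message.name()));
        }
        response.file.push(File {
            name: Some(format!("{}.gen.txt", file.name())),
            content: Some(content),
            ..File::default()
        });
    }

    // protoc reads the serialized CodeGeneratorResponse from stdout.
    io::stdout().write_all(&response.encode_to_vec())?;
    Ok(())
}
```

protoc discovers a plugin as an executable named protoc-gen-<name> on the PATH and invokes it when given the corresponding --<name>_out flag.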

I think that a purpose-built protoc compiler plugin would give aurae the best level of control, and avoid needing to generate intermediate TypeScript, parse it, and then modify and generate more TypeScript and Rust. We'd also have explicit control over the FFI boundary for V8, rather than relying on implicit guarantees, and the proc-macro could likely be retired in favor of this plugin.

Footnotes:

[1]: You may notice that buf.gen.yaml contains some demonstration of it providing additional command line options to the compiler plugins.

[2]: Both protoc-gen-prost and protoc-gen-tonic support all of the relevant configuration options that are provided by directly accessing prost-build or tonic-build. This was an explicit goal of these projects.

[3]: The SourceCodeInfo structure that holds these comments is stored in a parallel data structure as a sorted vector that is addressed in a non-intuitive manner.

future-highway commented on August 16, 2024

First of all, thank you for all the info/context.

I finally had a chance to look at #129, and since it has been merged, I think we are ok with what we have for now. Let's see how things progress and where the pain points are (if any).

Some notes:

  • The proc-macro isn't actually parsing any TS at the moment, so I'm not concerned about that.
  • Re pbjson: I don't know if squeezing every bit of efficiency out of the TS <-> Rust conversions is necessary, so the code it generates is likely sufficient.
