cosmos / cosmos-proto Goto Github PK
View Code? Open in Web Editor NEWLicense: Other
License: Other
Whilst iterating over the official goproto repo, many developers asked for reasons for which code generation fast paths (specifically related to marshalling/unmarshalling) implementation were never done. One of the answers given by the owners was that the big code size was a problem for some environments.
Might this something we need to consider in the scope of the cosmos ecosystem?
A solution could be generating a build tag for slow paths, ex:
types.slow.pb.go
// +build slow
... reflection code...
types.pb.go
// +build !slow
.. fast path code ...
we want to fork in protoc-gen-go to make it easier to introduce new features and have only 1 binary
Currently, fastreflection implements protomethods as nil
, which means that in order to marshal a protobuf message, it's going to get the fields one by one using protoreflect.Message.Range
. We should reconcile this with the fastmarshal implementation.
the AST patcher currently shifts comments after removing the ProtoReflect method from *pb.go files.
Regarding the issue documented at cosmos/cosmos-sdk#19213, I am seeking clarity on the potential deprecation of the Proto API v1. In API v1, the utilization of custom type functions is facilitated by gogoproto, which appears not to be supported in API v2. Could you please provide guidance on how to address this limitation within API v2?
Probably the only things that really need to be exposed are in runtime
and the cosmos_proto.proto
extensions.
cosmos_proto.scalar
proto option that applies string
and bytes
typesscalars:
cosmos.Int: github.com/cosmos/cosmos-sdk/types.Int
CustomType
golang interface that all golang scalar type implementations need to implement (ex. Int
), this is what we're using for gogo proto and can maybe reuse:type CustomType interface {
Marshal() ([]byte, error)
MarshalTo(data []byte) (n int, err error)
Unmarshal(data []byte) error
Size() int
MarshalJSON() ([]byte, error)
UnmarshalJSON(data []byte) error
}
An alternative that just deals with string
and bytes
types could be:
type CustomStringType interface {
MarshalString() (string, error)
UnmarshalString(string) error
}
type CustomBytesTypes interface {
MarshalBytes() ([]byte, error)
UnmarshalBytes([]byte) error
}
Depending on whether cosmos_proto.scalar
is used on a string
or bytes
type, the corresponding interface would be used.
protoreflect.Message
handlingNeeded for custom types and Any
s, will also speed up other codes.
Types to implement:
ProtoReflect.Messsage:
Protoreflect.List:
Protoreflect.Map:
Ex for https://pkg.go.dev/google.golang.org/[email protected]/reflect/protoreflect#Message Has
method:
func (foo *Foo) Has(f FieldDescriptor) bool {
switch f.Number() {
case 1:
return foo.Bar != nil
default:
return false
}
}
If we wanted to reuse the default protoreflect implementation, maybe this will work but it won't speed up other code:
func (foo *Foo) Has(f FieldDescriptor) bool {
switch f.Number() {
case 1:
return foo.Bar != nil
default:
return f.ProtoReflect().Has(f)
}
}
Implementing all of the protoreflect.Message
methods to be as fast as possible is the goal. Some benchmarks are here: cosmos/cosmos-sdk#9156 (reply in thread).
Given the following proto message type:
message Foo {
string type = 1;
}
We will produce a Foo.Type
field and a Foo.Type() protoreflect.MessageType
method, this causes code conflicts.
If a message uses a reserved name, for the sake of simplicity, we should suffix the field with an underscore _
.
I would also output a warning message that this is bad, and we might break it in the future (reason for breaking is that we are incompatible with protoc field codegen which I do not think is good).
Currently we're generating custom protoreflect.Message
methods on the message struct itself. But this may unnecessarily pollute the methods that all structs implement. What if a user wanted to implement a method named Has
or Range
for instance? They wouldn't be able to do it because it would conflict with the generated code.
An alternative would be a wrapper struct that implements protoreflect.Message
, i.e.:
type barReflect struct {
*Bar
}
func (x *Bar) ProtoReflect() protoreflect.Message {
return barReflect{x}
}
Based on the discussion, since protov3 does not support extensions, which makes protoreflect fast paths implementation a lot more easier, we shoudl either disable proto2 support or fallback to the original codegen (and maybe output a warning?)
This will reuse the existing cosmos_proto.implement_interface
and cosmos_proto.accepts_interface
: https://github.com/regen-network/cosmos-proto/blob/master/cosmos.proto
We want a mapping from proto interface names to golang interface names, ex:
interfaces:
cosmos.gov.Content: github.com/cosmos/cosmos-sdk/x/gov/types.Content
Then the generated code should use the interface and not Any
, ex:
type MsgSubmitProposal struct {
Content Content
InitialDeposit []*Coin
Proposer string
}
protoreflect.Message
and fast marshaling handling will need to be generated
bring in protoc #23
struct code gen
protoreflection & proto methods
Proposal:
Add a feature that allows certain fields to disallow omitempty json flag. This is used in a few places in the sdk, keeping inline with this could remove some headache for users.
FastReflection currently does not support nested messages. It produces code that does not compile.
Write a new utililty function that's the same as dynamicpb.NewMessageType
, but when a nested field is a known type, use that resolved type, instead of a new dynamicpb.Message.
ref: cosmos/cosmos-sdk#15302 (comment)
Consider a custom Msg, suppose it's only registered in gogo, and not in protov2:
message MsgCustom {
google.protobuf.Any a = 1;
}
Then when our Unpack
utility receives this message, it sees MsgCustom
and will use dynamicpb.NewMessageType to generate the whole proto.Message, including the nested field. MsgCustom.A
will have type dynamicpb.Message
.
Is this expected, or is it more helpful to have MsgCustom.A
be an anypb.Any
?
dynamicpb.NewMessageType
Using this, we know that proto.Messages created with dynamicpb are dynamicpb all the way down. There are no statically-created types used.
However, this also means that callers in general cannot type-cast. For example, in Textual, we use a switch case to handle both cases (dynamicpb or known type): https://github.com/cosmos/cosmos-sdk/blob/9a83c202cd545265f54d15090c6b150dc692f0f6/x/tx/signing/textual/any.go#L134-L148
We probably can move some utility functions like this to cosmos-proto
if they are repeated.
dynamicpb.NewMessageType
with known typesWe should pass a file resolver / type resolver to this new function, but it shouldn't be too hard to write it. The positive point is that callers can always type-cast to known types.
The plugin currently does not support optional fields:
testpb/1.proto: is a proto3 file that contains optional fields, but code generator protoc-gen-go-fasteflection hasn't been updated to support optional fields in proto3.
Currently ProtoReflect()
returns the reflection based protoreflect.Message
, but since we are implemented protoreflect.Message
on our message type directly (for now), we need to change this to:
func (x *Bar) ProtoReflect() protoreflect.Message {
return x
}
This also means that all the generated methods like Descriptor
can't just call ProtoReflect().Descriptor()
because that would be self reflexive. But we can deal with that in #3.
Our initial approach was to attempt to create a protoc-gen-go-lite and only generate what was needed, but as we found a need for descriptors and protoc lang features, more code was needed from protoc-gen-go.
The issue that was brought up from using isolated plugins was that we would end up with multiple generated files per proto file. Solutions to this problem:
x/upgrade/generated/*.pb.go
The Range function implemented on the fastreflection structs currently check the fields against their default values to decide if we should call f
on these fields
https://github.com/cosmos/cosmos-proto/blob/main/testpb/1.pulsar.go#L316-L335
We should probably just call f
on all fields unless they have the possibility of being nil, since we don't really have notion of "presence" in proto3:
In proto3, field presence for scalar fields simply doesn't exist. Your mental model for proto3 should be that it's a C++ or Go struct. For integers and strings, there is no such thing as being set or not, it always has a value. For submessages, it's a pointer to the submessage instance which can be NULL, that's why you can test presence for it.
Hi.
I hope you are having a great new year so far.
Is there documentation on how to use this library? Someone recommended I consider it for a project we are working and I would like to know how to do so.
Thanks in advance.
To reproduce the bug generate the following proto with pulsar:
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
syntax = "proto3";
package goproto.proto.test3;
option go_package = "github.com/cosmos/cosmos-proto/internal/testprotos2";
enum NestedEnum {
FOO = 0;
BAR = 1;
BAZ = 2;
NEG = -1; // Intentionally negative.
}
message TestAllTypes {
NestedEnum singular_nested_enum = 101;
map <string, NestedEnum> map_string_nested_enum = 73;
}
The following are the codegen snippets which do not match with goproto and that cause descriptors to panic:
Sample 1:
var file_internal_testprotos_test_proto_depIdxs = []int32{
2, // 0: goproto.proto.test3.TestAllTypes.map_string_nested_enum:type_name -> goproto.proto.test3.TestAllTypes.MapStringNestedEnumEntry
0, // 1: goproto.proto.test3.TestAllTypes.singular_nested_enum:type_name -> goproto.proto.test3.NestedEnum
0, // 2: goproto.proto.test3.TestAllTypes.MapStringNestedEnumEntry.value:type_name -> goproto.proto.test3.NestedEnum
3, // [3:3] is the sub-list for method output_type
3, // [3:3] is the sub-list for method input_type
3, // [3:3] is the sub-list for extension type_name
3, // [3:3] is the sub-list for extension extendee
0, // [0:3] is the sub-list for field type_name
}
Expected:
var file_internal_testprotos2_test_proto_depIdxs = []int32{
0, // 0: goproto.proto.test3.TestAllTypes.singular_nested_enum:type_name -> goproto.proto.test3.NestedEnum
2, // 1: goproto.proto.test3.TestAllTypes.map_string_nested_enum:type_name -> goproto.proto.test3.TestAllTypes.MapStringNestedEnumEntry
0, // 2: goproto.proto.test3.TestAllTypes.MapStringNestedEnumEntry.value:type_name -> goproto.proto.test3.NestedEnum
3, // [3:3] is the sub-list for method output_type
3, // [3:3] is the sub-list for method input_type
3, // [3:3] is the sub-list for extension type_name
3, // [3:3] is the sub-list for extension extendee
0, // [0:3] is the sub-list for field type_name
}
Sample 2:
type TestAllTypes struct {
state protoimpl.MessageState
sizeCache protoimpl.SizeCache
unknownFields protoimpl.UnknownFields
MapStringNestedEnum map[string]NestedEnum `protobuf:"bytes,73,rep,name=map_string_nested_enum,json=mapStringNestedEnum,proto3" json:"map_string_nested_enum,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"varint,2,opt,name=value,proto3,enum=goproto.proto.test3.NestedEnum"`
SingularNestedEnum NestedEnum `protobuf:"varint,101,opt,name=singular_nested_enum,json=singularNestedEnum,proto3,enum=goproto.proto.test3.NestedEnum" json:"singular_nested_enum,omitempty"`
}
Expected:
type TestAllTypes struct {
state protoimpl.MessageState
sizeCache protoimpl.SizeCache
unknownFields protoimpl.UnknownFields
SingularNestedEnum NestedEnum `protobuf:"varint,101,opt,name=singular_nested_enum,json=singularNestedEnum,proto3,enum=goproto.proto.test3.NestedEnum" json:"singular_nested_enum,omitempty"`
MapStringNestedEnum map[string]NestedEnum `protobuf:"bytes,73,rep,name=map_string_nested_enum,json=mapStringNestedEnum,proto3" json:"map_string_nested_enum,omitempty" protobuf_key:"bytes,1,opt,name=key,proto3" protobuf_val:"varint,2,opt,name=value,proto3,enum=goproto.proto.test3.NestedEnum"`
}
If we replaced the malformed snippet in Sample 1
with the expected one the panic does not happen. Although generated code should match 1 on 1 what goproto produces.
ref: #21
tracking issue for testing extensions
In order to test that the behavior of fast reflection codegen is correct, let's write tests against the slow reflection path using dynamicpb for the same messages. We can use https://github.com/flyingmutant/rapid to write property based tests for this.
As a follow-up to #8, let's explore a way to have generic map and list wrapper types rather than doing codegen for each of them.
This will make codegen much easier. See https://docs.buf.build/bsr/remote-generation/plugin-example
Let's remove testpb package and use the internal/testing/test3.TestAllTypes for testing.
We should generate random messages through the fuzzer and then test the protoreflect logic on it.
Marshal and Unmarshal fast methods produce some functions with constant names, ex: skip
, encodeVarint
and variables such as ErrOverflow
.
This becomes problematic when protoc
compiles two files because it will redeclare those functions and go code will not compile.
We can get cosmos_proto
to be loaded from the buf schema registry (https://buf.build) if we do the proper buf.yaml
setup. I already have a repo reserved: https://buf.build/cosmos/cosmos-proto
Here's the current cosmos_proto
file: https://github.com/regen-network/cosmos-proto/blob/master/cosmos.proto
We can delete the interface_type
option as that's no longer used.
Our deterministic marshalling implementation enforces canonical marshalling for messages.
This is achieved because we generated that code with pulsar codegen.
But what happens when we import a message that was not generated with cosmos-proto?
Ex:
message GeneratedWithCosmosProto {
google.Protobuf.FileDescriptor not_generated_with_cosmos_proto = 1;
string field = 1;
map<string, int64> map_field = 2;
...
}
How do we detect that the import is not generated with pulsar from marshal codegen, and then how do we marshal it in a canonical way without impacting performance much?
Use cases:
google.Protobuf.FileDescriptor
message.NOTE:
We can enforce deterministic marshalling for imported objects, protobuf marshal options supports this, but there is no guarantee that that deterministic marshalling follows https://github.com/cosmos/cosmos-sdk/blob/master/docs/architecture/adr-027-deterministic-protobuf-serialization.md
Pulsar generated files are missing a comment like:
// Code generated by protoc-gen-go-pulsar. DO NOT EDIT.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.