google / proto-lens
API for protocol buffers using modern Haskell language and library patterns.
Home Page: https://google.github.io/proto-lens
License: BSD 3-Clause "New" or "Revised" License
We should preserve unknown fields and enums in proto2 messages.
From the docs:
https://developers.google.com/protocol-buffers/docs/proto
Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. However, the unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it – so if the message is passed on to new code, the new fields are still available.
Also, enums with unknown values should be treated like unknown fields and preserved for reserialization. (Note that this behavior is proto2-only; #28 describes the desired behavior for proto3.)
This library is a memory hog when it comes to parsing repeated packed scalar types. It would be nice if it would use contiguous arrays instead of lazy lists where appropriate.
Hello,
I was trying to build this with stack install
and got this error:
-- Dumping log file due to warnings: /home/g/src/haskell/proto-lens/.stack-work/logs/proto-lens-protobuf-types-0.2.2.0.log
[1 of 2] Compiling Main ( /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/Setup.hs, /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/Main.o )
[2 of 2] Compiling StackSetupShim ( /home/g/.stack/setup-exe-src/setup-shim-mPHDZzAJ.hs, /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/StackSetupShim.o )
Linking /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/setup ...
Configuring proto-lens-protobuf-types-0.2.2.0...
proto-src: warning: directory does not exist.
proto-src/google/protobuf/any.proto: No such file or directory
callProcess: /usr/local/bin/protoc
"--plugin=protoc-gen-haskell=/home/g/src/haskell/proto-lens/.stack-work/install/x86_64-linux-nopie/lts-9.0/8.0.2/bin/proto-lens-protoc"
"--haskell_out=.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/build/autogen"
"--proto_path=proto-src" "proto-src/google/protobuf/any.proto"
"proto-src/google/protobuf/duration.proto"
"proto-src/google/protobuf/wrappers.proto" (exit 1): failed
-- End of log file: /home/g/src/haskell/proto-lens/.stack-work/logs/proto-lens-protobuf-types-0.2.2.0.log
Log files have been written to: /home/g/src/haskell/proto-lens/.stack-work/logs/
Progress: 33/36
-- While building package proto-lens-protobuf-types-0.2.2.0 using:
/home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/setup --builddir=.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0 build lib:proto-lens-protobuf-types --ghc-options " -ddump-hi -ddump-to-file"
Process exited with code: ExitFailure 1
Logs have been written to: /home/g/src/haskell/proto-lens/.stack-work/logs/proto-lens-protobuf-types-0.2.2.0.log
Apparently it is looking for proto-src/google/protobuf/any.proto
and that might be causing the issue?
The FieldDescriptor name is a String, which is probably less efficient than Text. The same goes for fieldsByTextFormatName.
This probably only affects text format decoding, since the wire format uses it only for error messages. We can add a benchmark and see whether it makes a difference.
Some related changes to help simplify the API around fields in proto-lens:
- Hide the constructors for proto messages. They're even less useful than before now that we have unknown fields. (Developers can still click "source" in the Haddock docs to see the underlying implementation.)
- Make the Show instance not display the internal fields, instead using the text format, for example: showsPrec _ x = showChar '{' . showString (showMessageShort x) . showChar '}'
- Add Haddock comments for every proto message that list the names and types of all available lenses. Note: this is a little tricky (but doable) since haskell-src-exts doesn't easily support inserting top-level comments. (Note: include the accessor for unknown fields.)
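The Show proposal in the second bullet can be sketched end to end. The `Date` type and the stand-in `showMessageShort` below are invented for illustration; the real `showMessageShort` is proto-lens's text-format renderer:

```haskell
data Date = Date { year :: Int, month :: Int }

-- Stand-in for proto-lens's text-format renderer:
showMessageShort :: Date -> String
showMessageShort (Date y m) = "year: " ++ show y ++ " month: " ++ show m

-- The proposed Show instance delegates to the text format instead of
-- showing the internal record fields:
instance Show Date where
  showsPrec _ x = showChar '{' . showString (showMessageShort x) . showChar '}'
```

With this, show (Date 2016 9) renders as "{year: 2016 month: 9}" rather than derived record syntax.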
Description of proto3 enums (reformatted from the docs):
During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent.
- In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation.
- In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.
Currently (i.e., on HEAD) we're using option #1. That is, if we had enum Foo { A = 1; B = 2; } then we would generate newtype Foo = Foo Int32 and define A and B as pattern synonyms:
pattern A = Foo 1
pattern B = Foo 2
This is simpler, but limits our ability to get exhaustiveness checking from the compiler. Specifically,
if someone adds a new enum case to the proto, the type checker won't tell us that we're now missing a case. This issue happened to us in real code.
GHC 8.2.1 does support COMPLETE pragmas for pattern synonyms, but (a) it's too soon to drop support for 8.0, and (b) that's a newer and less-well-understood feature.
The proposal for the new API is similar to what already exists for Scala and Java. For:
enum Foo { A = 1; B = 2; }
generate the following code:
data Foo = A | B | Foo'Unrecognized Foo'UnrecognizedValue

-- | Representation of an unknown value. Uses a newtype to make
-- the different branches of `Foo` provably distinct.
-- For example, this way we don't have to worry about whether
-- `A == Foo'Unrecognized (Foo'UnrecognizedValue 1)`.
newtype Foo'UnrecognizedValue = Foo'UnrecognizedValue Int32  -- hidden constructor

unrecognizedValue'Foo :: Foo'UnrecognizedValue -> Int32

instance Enum Foo where
  toEnum 1 = A
  toEnum 2 = B
  toEnum n = Foo'Unrecognized (Foo'UnrecognizedValue (fromIntegral n))
  fromEnum = ...
showMessage and the related pprintMessage and showMessageShort functions use the Haskell string escaping conventions instead of the C ones. This means that non-printing characters get written as, e.g., "\SOH", which https://github.com/google/protobuf/blob/master/src/google/protobuf/io/tokenizer.cc#L1039 won't parse. Worse, in Haskell the escape "\101" means decimal 101 (the character 'e'), whereas the tokenizer.cc code (following C convention) interprets it as octal, i.e. decimal 65 (the character 'A').
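A minimal sketch of the fix, assuming we escape characters ourselves instead of relying on Show: non-printable bytes get three-digit octal escapes, which the C-family tokenizers accept. `cEscape` is a hypothetical helper, not current proto-lens code:

```haskell
import Data.Char (ord, isPrint, isAscii)
import Text.Printf (printf)

-- Escape one character using C/text-format conventions: common escapes
-- by name, printable ASCII verbatim, everything else as three-digit octal.
cEscape :: Char -> String
cEscape '\n' = "\\n"
cEscape '\t' = "\\t"
cEscape '"'  = "\\\""
cEscape '\\' = "\\\\"
cEscape c
  | isAscii c && isPrint c = [c]
  | otherwise              = printf "\\%03o" (ord c)
```

For example, '\SOH' (code 1) comes out as "\001" rather than Haskell's "\SOH".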
In order, the following will be built (use -v for more details):
- proto-lens-combinators-0.1.0.8 {proto-lens-combinators-0.1.0.8-inplace} (lib:proto-lens-combinators) (first run)
[1 of 1] Compiling Main ( /tmp/matrix-worker/1501476906/dist-newstyle/build/x86_64-linux/ghc-8.2.1/proto-lens-combinators-0.1.0.8/setup/setup.hs, /tmp/matrix-worker/1501476906/dist-newstyle/build/x86_64-linux/ghc-8.2.1/proto-lens-combinators-0.1.0.8/setup/Main.o )
Linking /tmp/matrix-worker/1501476906/dist-newstyle/build/x86_64-linux/ghc-8.2.1/proto-lens-combinators-0.1.0.8/setup/setup ...
<<ghc: 195021320 bytes, 90 GCs, 9019646/22334576 avg/max bytes residency (7 samples), 57M in use, 0.001 INIT (0.001 elapsed), 0.123 MUT (1.892 elapsed), 0.185 GC (0.185 elapsed) :ghc>>
Configuring proto-lens-combinators-0.1.0.8...
<<ghc: 190663776 bytes, 110 GCs, 12726334/39654304 avg/max bytes residency (8 samples), 102M in use, 0.001 INIT (0.001 elapsed), 0.101 MUT (0.101 elapsed), 0.218 GC (0.221 elapsed) :ghc>>
==========
Error: couldn't find the executable "proto-lens-protoc" in your $PATH.
Please file a bug at https://github.com/google/proto-lens/issues .
==========
Missing executable "proto-lens-protoc"
CallStack (from HasCallStack):
error, called at src/Data/ProtoLens/Setup.hs:297:13 in proto-lens-protoc-0.2.2.0-6fbfcc9fefb6f837231240070e1fad9e51f23d5d830dd28e2a4fa31f1e705ca4:Data.ProtoLens.Setup
proto-lens-protoc touches (updates the modification time of) generated .hs files even when the contents did not change, causing unnecessary rebuilds when compiling with ghc.
This is because currently GHC considers only the mtime of the input file for determining whether something has to be recompiled, not its contents.
The problematic code is here:
proto-lens/proto-lens-protoc/src/Data/ProtoLens/Setup.hs
Lines 307 to 312 in fe05638
It seems that it is protoc that touches / rewrites the files.
Which way should this be fixed?
- Have the Setup script write to a temporary output directory, then move the .hs files over only if they aren't identical?
- Change protoc itself to not write the files if the contents are the same?

Please upload the packages which are updated for ghc-8.2.1 to Hackage.
These packages (on Hackage) still can't be used with ghc-8.2.1 because they depend on base (>=4.8 && <4.10).
Given this protobuf
message Date {
int32 year = 1;
int32 month = 2;
int32 day = 3;
}
you get
day: 1 month: 9 year: 2016
I think it would be better to order by tag number.
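The suggested fix amounts to sorting the rendered fields by tag number before joining them. A small sketch (the field strings are stand-ins for whatever the text-format renderer produces):

```haskell
import Data.List (sortOn)

-- Render a list of (tag, rendered field) pairs in tag order, so year (1)
-- comes before month (2) and day (3) regardless of name ordering.
renderByTag :: [(Int, String)] -> String
renderByTag = unwords . map snd . sortOn fst
```

So renderByTag [(3, "day: 1"), (2, "month: 9"), (1, "year: 2016")] yields "year: 2016 month: 9 day: 1".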
While writing the implementation for prisms, I had to go back and forth between the language extension library, Combinators.hs, and Generate.hs, figuring out what did what.
I think some documentation of Combinators.hs would be super helpful for any future development on those files.
Very useful for JSON parsing, etc.
The proto-lens, proto-lens-protoc and proto-lens-descriptors packages are tied pretty closely together. Consider consolidating some or all of them.
The main concern is bootstrapping. Changing the internals of lens-labels or proto-lens effectively breaks proto-lens-descriptors until the descriptor modules can be regenerated -- but regenerating them requires a working proto-lens-protoc, which depends on proto-lens-descriptors, introducing a cycle. The current bootstrap script solves this using the fact that they're all separate packages: it builds a new proto-lens-protoc against an old, working version of lens-labels, proto-lens and proto-lens-descriptors, and uses that compiler to generate the new descriptor modules. I'm not sure how to implement that process if they're all in the same Cabal package.
Cabal 2.0 added a function autogenPackageModulesDir, which we should use in Data.ProtoLens.Setup if it's available. That would let us generate modules separately for each component (e.g. library vs. exe vs. tests), rather than generating them all in one place.
At a minimum, this would prevent confusing GHC/Cabal errors when an exe imports a proto module but doesn't list it in other-modules, or when the module is listed for the library but a test accidentally doesn't depend on the library.
Currently proto-lens returns an error when decoding unknown enum values. It should instead accept and preserve such values.
Quoting the proto3 docs:
During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation. In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.
.proto files with "import public" statements don't re-export the public imports.
For example, if foo.proto contains
import public "bar.proto";
then the generated module Foo.hs should re-export all the names defined in Bar.hs.
I think this is doable, but it might be a little tricky to avoid name conflicts between the autogenerated field accessors in Foo.hs and Bar.hs.
In stackage nightly. Full trace:
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ghc -clear-package-db -global-package-db -package-db=/var/stackage/work/builds/nightly/pkgdb -hide-all-packages -package=Cabal -package=base -package=proto-lens-protoc Setup
[1 of 1] Compiling Main ( Setup.hs, Setup.o )
Linking Setup ...
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ./Setup configure --enable-tests --package-db=clear --package-db=global --package-db=/var/stackage/work/builds/nightly/pkgdb --libdir=/var/stackage/work/builds/nightly/lib --bindir=/var/stackage/work/builds/nightly/bin --datadir=/var/stackage/work/builds/nightly/share --libexecdir=/var/stackage/work/builds/nightly/libexec --sysconfdir=/var/stackage/work/builds/nightly/etc --docdir=/var/stackage/work/builds/nightly/doc/proto-lens-combinators-0.1.0.8 --htmldir=/var/stackage/work/builds/nightly/doc/proto-lens-combinators-0.1.0.8 --haddockdir=/var/stackage/work/builds/nightly/doc/proto-lens-combinators-0.1.0.8 --flags=
Configuring proto-lens-combinators-0.1.0.8...
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ghc -clear-package-db -global-package-db -package-db=/var/stackage/work/builds/nightly/pkgdb -hide-all-packages -package=Cabal -package=base -package=proto-lens-protoc Setup
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ./Setup build
unrecognized option `--plugin=protoc-gen-haskell=/var/stackage/work/builds/nightly/bin/proto-lens-protoc'
unrecognized option `--haskell_out=dist/build/global-autogen'
unrecognized option `--proto_path=tests'
Usage: protoc [OPTION]... FILES
-h --help show usage
-v --version show version number
Preprocessing library for proto-lens-combinators-0.1.0.8..
Building library for proto-lens-combinators-0.1.0.8..
[1 of 1] Compiling Data.ProtoLens.Combinators ( src/Data/ProtoLens/Combinators.hs, dist/build/Data/ProtoLens/Combinators.o )
Preprocessing test suite 'combinators_test' for proto-lens-combinators-0.1.0.8..
Setup: can't find source for Proto/Combinators in tests,
dist/build/combinators_test/autogen, dist/build/global-autogen
Currently every proto file needs to be specified twice in the .cabal file: the raw .proto file in extra-source-files, and the Proto.* module in exposed-modules/other-modules.
From very basic experiments with stack, I think it's possible to drop the latter requirement and have our Setup script populate the list of Haskell modules automatically, by changing the PackageDescription and/or LocalBuildInfo.
In addition to less redundancy, this would help with hpack (in particular, its ability to autodetect the exposed-modules and other-modules).
The exact design of this feature is still an open question: can (should?) we give the user control over whether their protos end up in exposed-modules or other-modules? Or in individual components (e.g. tests, executables or benchmarks)? For example, proto-lens-combinators contains a proto test file which is test-only and not intended to be exported from the library.
Hi, thanks for proto-lens; I got it to work on a gRPC client project with somewhat complicated .protos. While making it work, I had to patch a workaround to support valid enum definitions which don't follow the recommended style guide and use lower-cased enum value names.
My workaround is to call toUpper on enum names. This solution is not really great, so I wanted to discuss the best implementation choices with you before making a proper pull request. You can find my patches at: master...lucasdicioccio:workarounds.
Extensions (a proto2-only feature) aren't supported yet. It's not clear what the API should look like.
This would primarily be useful for legacy code, since proto3 replaces extensions with the Any type (#22).
Currently, required fields are defaulted to the "zero" value for that type. We should instead provide smart construction that checks at compile time whether all the required fields have been set properly.
Note that this is moot for proto3, which got rid of the concept of required fields altogether.
One possible, nebulously-described approach: for every datatype Foo, also define a Foo'Builder which is parametrized by the type of each required field (and which may be () if it's not set). This Foo'Builder can be an instance of Default (instead of Foo), and we can provide lenses to build up its individual fields, as well as a class to "freeze" a Foo'Builder into a Foo once all its fields have been set.
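A hypothetical sketch of the builder idea for a message with two required fields (the field names, setters, and freeze function are all invented; the real design would use lenses and Default). Each type parameter starts as () and becomes the field's real type once it's set, so freeze only typechecks when everything has been filled in:

```haskell
-- The finished message, with both required fields present:
data Foo = Foo { fooUri :: String, fooId :: Int } deriving Show

-- Builder: one type parameter per required field.
data Foo'Builder uri ident = Foo'Builder { bUri :: uri, bId :: ident }

emptyBuilder :: Foo'Builder () ()
emptyBuilder = Foo'Builder () ()

-- Type-changing record updates flip () to the real field type:
setUri :: String -> Foo'Builder uri ident -> Foo'Builder String ident
setUri u b = b { bUri = u }

setId :: Int -> Foo'Builder uri ident -> Foo'Builder uri Int
setId i b = b { bId = i }

-- Only a fully-populated builder can be frozen:
freeze :: Foo'Builder String Int -> Foo
freeze (Foo'Builder u i) = Foo u i
```

Here freeze emptyBuilder is a compile-time type error, which is exactly the "required field not set" check we want.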
message AcmeObservation {
oneof status {
ActionWin win = 2;
CompletedHurdleStatus completed_hurdle = 3;
QualifyTransaction qualify_transaction = 4;
}
}
results in the generated haskell:
data AcmeObservation'Status = AcmeObservation'Win !ActionWin
| AcmeObservation'Completed_hurdle !CompletedHurdleStatus
| AcmeObservation'Qualify_transaction !QualifyTransaction
deriving (Prelude.Show, Prelude.Eq)
Notice AcmeObservation'Completed_hurdle, which should become AcmeObservation'CompletedHurdle according to the renaming of all other snake-case identifiers.
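The expected renaming can be sketched as a snake_case-to-CamelCase conversion (this is an illustration of the desired behavior, not the generator's actual code):

```haskell
import Data.Char (toUpper)

-- Convert a snake_case field name to the CamelCase form used for
-- other generated identifiers, e.g. "completed_hurdle" -> "CompletedHurdle".
camelCase :: String -> String
camelCase = concatMap capitalize . splitOn '_'
  where
    splitOn c s = case break (== c) s of
      (w, [])       -> [w]
      (w, _ : rest) -> w : splitOn c rest
    capitalize []       = []
    capitalize (c : cs) = toUpper c : cs
```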
Hello - this is to start a discussion about whether proto-lens and haskell-indexer could support this feature. For background, please read https://kythe.io/docs/schema/indexing-protobuf.html .
TL;DR for proto-lens: the generated Haskell code should (on specific request) be annotated with proto2.GeneratedCodeInfo-equivalent data, most importantly the path of the proto file and the "magic path string" of the proto entity.
A complication is that proto-lens, AFAIU, doesn't generate direct field lenses, but rather lenses that go through a string proxy (what's the correct term for this?), so maybe the specific typeclass instance methods (these are the lenses, am I right?) need to be annotated.
Then a complication for haskell-indexer is that it now emits a reference to the class method instead of the instance method from the use site (assuming the instance is fixed at the use site). The indexer should instead reference the instance method (and of course emit a generates edge from the proto VName to the instance-method lens VName), which might be possible to work out from the AST, though some digging is required here.
Open questions:
Does this sound reasonable for proto-lens?
How to parametrize proto-lens to get the metadata emitted? How do we arrange that this happens only in haskell-indexer mode?
How should the metadata be emitted? In the C++ example, a new .pb.meta include is generated and included into the .pb.h. I think the main point is that the indexer should have somewhat convenient access to this - for example, the data (generated Haskell spans -> proto source info) could also be shipped in a side-channel file.
+@judah @blackgnezdo for proto-lens
Hi again. On top of #152, I had to make another workaround for the project I was working on: the project had a UNIX hidden directory, such as .protodir/myfilename.proto. The current plugin will generate modules named Proto..protodir.MyFileName, which is invalid Haskell.
As with the other bug I filed, I wanted to discuss the best implementation choices with you before making a proper pull request. You can find my patches at: master...lucasdicioccio:workarounds.
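One possible fix can be sketched as sanitizing each path component before it becomes a module-name segment (`moduleSegment` is a hypothetical helper, not the plugin's actual code):

```haskell
import Data.Char (isAlphaNum, toUpper)

-- Turn one path component into a valid Haskell module-name segment:
-- drop characters that can't appear (such as the leading dot of
-- ".protodir") and capitalize the first remaining letter.
moduleSegment :: String -> String
moduleSegment s = case filter isAlphaNum s of
  []       -> "X"  -- fallback for components with no usable characters
  (c : cs) -> toUpper c : cs
```

With this, ".protodir" becomes "Protodir", so the module name would be Proto.Protodir.MyFileName instead of the invalid Proto..protodir.MyFileName.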
If a field has the wrong wire type, currently we fail the decode. Instead, we should just ignore that field (and/or add it to the unknown fields set, once we've implemented #29).
I confirmed that C++, Java and Go all have this more lenient behavior (although it's not documented well).
Also mentioned here:
apple/swift-protobuf#342
I've got a socket connection that is used to stream data with delimited protobuf messages (see this). In the Java API there is a convenient method called parseDelimitedFrom. How can this be achieved using this library?
I am new to Haskell, so I might have overlooked something. I am sorry if this is very obvious.
Is the omission of the predefined timestamp proto intentional?
The TextFormat encoding may use either single or double quotes (though they must match). However proto-lens only supports double-quotes so far.
The error message looks like:
unexpected "'"
expecting "-", number, literal string or identifier
I found documentation of this behavior in the protobuf sources:
https://github.com/google/protobuf/blob/master/src/google/protobuf/io/tokenizer.h#L116
Currently proto-lens's TextFormat parser uses Text.Parsec.Token.stringLiteral, which doesn't support single-quoted strings:
https://hackage.haskell.org/package/parsec-3.1.9/docs/Text-Parsec-Token.html#v:stringLiteral
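A sketch of a replacement parser that accepts either quote character and requires the closing quote to match the opening one (escape handling is deliberately naive here; the real fix would implement the full text-format escape rules):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Parse a string literal delimited by matching single or double quotes.
quoted :: Parser String
quoted = do
  q <- oneOf "'\""                          -- remember which quote opened
  body <- many (escaped <|> noneOf [q, '\\'])
  _ <- char q                               -- closing quote must match
  return body
  where
    escaped = char '\\' >> anyChar  -- placeholder: real escapes need more care
```

This parses both "hello" and 'hello', and 'don"t' / "don't" work since only the opening quote character terminates the literal.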
We should support the Any type that was introduced in proto3:
https://developers.google.com/protocol-buffers/docs/proto3#any
At its core, Any is just another protocol buffer message (defined in google/protobuf/any.proto), so we should already be able to handle protos that reference it. However, we can add a nicer API on top for converting to/from an arbitrary message type, similar to what the C++ and Java bindings provide.
We currently treat "oneof" fields similar to optional fields. This is an intended backwards-compatible behavior of the wire encoding, but makes some use cases more awkward.
https://developers.google.com/protocol-buffers/docs/proto#oneof
One possible approach is to store the value as a sum type internally, and provide lenses that return a default value when their case isn't set (as well as "maybe'foo" variants). Another option (less memory efficient) is to store the fields normally, but make each field's lens clear out all the other fields when it's being set.
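The first approach can be sketched with plain functions standing in for lenses (all names here are invented for illustration): the oneof is a sum type internally, per-case getters fall back to the proto default, and setting one case necessarily clears the others because the sum holds only one value at a time.

```haskell
-- Internal representation: the oneof as a sum type, absent when unset.
data Status = StatusWin Int | StatusHurdle String

data Msg = Msg { msgStatus :: Maybe Status }

-- Getter for the "win" case: returns the proto default (0) when the
-- oneof is unset or a different case is set.
getWin :: Msg -> Int
getWin m = case msgStatus m of
  Just (StatusWin n) -> n
  _                  -> 0

-- Setter: choosing the "win" case implicitly clears any other case,
-- since the sum type can only hold one alternative.
setWin :: Int -> Msg -> Msg
setWin n m = m { msgStatus = Just (StatusWin n) }
```

The "maybe'win" variant would simply expose msgStatus pattern-matched to Maybe Int, letting callers distinguish "unset" from "set to the default".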
It would be nice if proto-lens automatically generated Ord instances for messages. This could be hidden behind a cabal flag if it turned out to be egregiously slow.
I've got the following protobuf definition:
syntax = "proto2";
message Request {
required string uri = 1;
required string userUuid = 2;
}
But the generated code downcases the userUuid field, so the "accessor" function is useruuid, which doesn't seem right. The only reference to changing case is to do with groups, so I assume this isn't correct.
I've put the resultant file inline as I can't attach it:
{- This file was auto-generated from Request.proto by the proto-lens-protoc program. -}
{-# LANGUAGE ScopedTypeVariables, DataKinds, TypeFamilies,
MultiParamTypeClasses, FlexibleContexts, FlexibleInstances,
PatternSynonyms #-}
{-# OPTIONS_GHC -fno-warn-unused-imports #-}
module Proto.Request where
import qualified Prelude
import qualified Data.Int
import qualified Data.Word
import qualified Data.ProtoLens.Reexport.Data.ProtoLens
as Data.ProtoLens
import qualified
Data.ProtoLens.Reexport.Data.ProtoLens.Message.Enum
as Data.ProtoLens.Message.Enum
import qualified Data.ProtoLens.Reexport.Lens.Family2
as Lens.Family2
import qualified Data.ProtoLens.Reexport.Lens.Family2.Unchecked
as Lens.Family2.Unchecked
import qualified Data.ProtoLens.Reexport.Data.Default.Class
as Data.Default.Class
import qualified Data.ProtoLens.Reexport.Data.Text as Data.Text
import qualified Data.ProtoLens.Reexport.Data.Map as Data.Map
import qualified Data.ProtoLens.Reexport.Data.ByteString
as Data.ByteString
data Request = Request{_Request'uri :: !Data.Text.Text,
_Request'useruuid :: !Data.Text.Text}
deriving (Prelude.Show, Prelude.Eq)
type instance Data.ProtoLens.Field "uri" Request = Data.Text.Text
instance Data.ProtoLens.HasField "uri" Request Request where
field _
= Lens.Family2.Unchecked.lens _Request'uri
(\ x__ y__ -> x__{_Request'uri = y__})
type instance Data.ProtoLens.Field "useruuid" Request =
Data.Text.Text
instance Data.ProtoLens.HasField "useruuid" Request Request where
field _
= Lens.Family2.Unchecked.lens _Request'useruuid
(\ x__ y__ -> x__{_Request'useruuid = y__})
instance Data.Default.Class.Default Request where
def
= Request{_Request'uri = Data.ProtoLens.fieldDefault,
_Request'useruuid = Data.ProtoLens.fieldDefault}
instance Data.ProtoLens.Message Request where
descriptor
= let uri__field_descriptor
= Data.ProtoLens.FieldDescriptor "uri"
(Data.ProtoLens.StringField ::
Data.ProtoLens.FieldTypeDescriptor Data.Text.Text)
(Data.ProtoLens.PlainField Data.ProtoLens.Required uri)
useruuid__field_descriptor
= Data.ProtoLens.FieldDescriptor "userUuid"
(Data.ProtoLens.StringField ::
Data.ProtoLens.FieldTypeDescriptor Data.Text.Text)
(Data.ProtoLens.PlainField Data.ProtoLens.Required useruuid)
in
Data.ProtoLens.MessageDescriptor
(Data.Map.fromList
[(Data.ProtoLens.Tag 1, uri__field_descriptor),
(Data.ProtoLens.Tag 2, useruuid__field_descriptor)])
(Data.Map.fromList
[("uri", uri__field_descriptor),
("userUuid", useruuid__field_descriptor)])
uri ::
forall msg msg' . Data.ProtoLens.HasField "uri" msg msg' =>
Lens.Family2.Lens msg msg' (Data.ProtoLens.Field "uri" msg)
(Data.ProtoLens.Field "uri" msg')
uri
= Data.ProtoLens.field
(Data.ProtoLens.ProxySym :: Data.ProtoLens.ProxySym "uri")
useruuid ::
forall msg msg' . Data.ProtoLens.HasField "useruuid" msg msg' =>
Lens.Family2.Lens msg msg' (Data.ProtoLens.Field "useruuid" msg)
(Data.ProtoLens.Field "useruuid" msg')
useruuid
= Data.ProtoLens.field
(Data.ProtoLens.ProxySym :: Data.ProtoLens.ProxySym "useruuid")
We should do something similar to happy/alex/etc. for generated files, i.e., bundle them into the release archive that's uploaded to Hackage/Stackage. That way, packages that depend on protos won't require installing the protoc executable.
Cabal has special logic for happy
and alex
, but the logic around when to rebuild the generated files is somewhat flaky: haskell/cabal#2940, haskell/cabal#2311, haskell/cabal#2362. Part of the problem is that when Cabal unpacks the tarball of the package, it doesn't set the modification times consistently (this may be fixed on newer versions of Cabal, not sure though).
One option is for us to do something simpler than Cabal:
- Don't run protoc when building from an archive that was created by cabal sdist.
- Run protoc otherwise (in particular: when building from the git repo).
This would require cabal sdist to do something special in order for cabal build to tell the difference. One hacky option is to include an extra dummy file in extra-source-files. A more involved option would be to copy the generated files from the autogen dir (where they are now) to one of the hs-source-dirs; but that may be complicated in the presence of multiple binaries/tests.

readMessage is using Haskell string escaping conventions. This will lead it to read text protocol data, e.g. "\101", as decimal (i.e. 'e') instead of as octal (i.e. 'A'). I think proto-lens should match the behavior of https://github.com/google/protobuf/blob/master/src/google/protobuf/io/tokenizer.cc#L1039.
Build fails due to lack of the HasLens' class when building against the latest release of lens-labels instead of from git.
Protobuf compiler plugins can request that their generated code be pasted into an existing generated file immediately above a specified insertion point, using the File.insertion_point field. These insertion points are specified by placing "@@protoc_insertion_point(some_name)" in the generated source. proto-lens-protoc should have insertion points for at least the imports and the module top-level scope.
Hi,
I'm curious if you'd consider including proto-lens and friends in Stackage, to simplify (for users) their use in stack projects?
https://github.com/fpco/stackage/blob/master/MAINTAINERS.md#adding-a-package
Thanks.
I define a file LinkParser.proto
syntax = "proto3";
message LinkParseResult {
string title = 1;
map<string, string> og = 2;
repeated string imgs = 3;
}
When I run stack build, and also when I run
protoc --plugin=protoc-gen-haskell=`which proto-lens-protoc` --haskell_out . LinkParser.proto
manually, I get the following error:
proto-lens-protoc: definedFieldType: Field type .LinkParseResult.OgEntry not found in environment.
--haskell_out: protoc-gen-haskell: Plugin failed with status code 1.
Happens on both libprotoc 3.0.0 and libprotoc 2.6.1 (where the latter used the "equivalent syntax" mentioned in the protoc docs).
Would it be possible to bump the constraint on data-default-class == 0.0.* to data-default-class == 0.1.*, or to relax the constraint? (Stackage LTS-7.2 is pinned to data-default-class-0.1.2.0, for example.) Thanks!
See https://developers.google.com/protocol-buffers/docs/proto3#specifying-field-rules. proto-lens generated code does not implement this correctly. Trying to decode repeated fields of scalar numeric types (like int32 and float) results in a message like "Field 1 expects wire type 0 but found 2", because packed repeated fields are wire type 2 (see https://developers.google.com/protocol-buffers/docs/encoding#structure).
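Decoding a packed repeated field means reading a length-delimited (wire type 2) span and then parsing back-to-back values until the span is exhausted. A sketch using the binary package (the function names here are illustrative, not proto-lens's):

```haskell
import Data.Binary.Get (Get, getWord8, isolate, isEmpty, runGet)
import Data.Bits ((.&.), (.|.), shiftL, testBit)
import Data.Word (Word64)
import qualified Data.ByteString.Lazy as BL

-- Decode one base-128 varint: 7 payload bits per byte, high bit set
-- means "more bytes follow".
getVarint :: Get Word64
getVarint = go 0 0
  where
    go sh acc = do
      b <- getWord8
      let acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` sh)
      if testBit b 7 then go (sh + 7) acc' else return acc'

-- Decode a packed field: read varints until the isolated span of
-- `len` bytes is fully consumed.
getPacked :: Int -> Get [Word64]
getPacked len = isolate len loop
  where
    loop = do
      done <- isEmpty
      if done then return [] else (:) <$> getVarint <*> loop

-- e.g. runGet (getPacked 3) (BL.pack [1, 0xAC, 0x02]) == [1, 300]
```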
The Data.ProtoLens.Compiler.Combinators module has some combinators that make working with TH much easier. It'd be fantastic if it were moved to a separate library where it could be more widely adopted.
We should have a benchmark for encoding to/from the wire format, to make sure that our reflection and abstraction in the parser doesn't cause a significant slowdown compared to other Haskell code, and to give us more confidence when refactoring. (So far, we haven't done any performance tuning of the code.)
One arbitrary data point: decoding a 1MB proto took ~60ms on my desktop, and decoding followed by encoding (which includes forcing all the fields) took ~230ms. (The code was compiled with -O2.)
We should probably also benchmark the text format, though that's usually less performance-critical.
We can use google.protobuf.Any to support basic parametric polymorphism. For example, if we define a file haskell_type_variables.proto with custom options:
extend google.protobuf.MessageOptions {
repeated string haskell_type_var = 50000;
}
extend google.protobuf.FieldOptions {
optional string haskell_type_var = 50000;
}
Then we can use those options to annotate a type:
import ".../haskell_type_variables.proto";
message Foo {
option (haskell_type_var) = "a";
option (haskell_type_var) = "b";
int32 x = 1;
google.protobuf.Any y = 2 [(haskell_type_var)="a"];
google.protobuf.Any z = 3 [(haskell_type_var)="b"];
}
And generate the following type from that file:
data Foo a b = Foo { _Foo'x :: Int32, _Foo'y :: Maybe a, _Foo'z :: Maybe b}
instance (Message a, Message b) => Message (Foo a b) where ...
encodeMessage/decodeMessage could serialize the submessage as an Any, but in Haskell code represent it as a regular (not serialized) Haskell type. This makes it easier to use proto types directly in Haskell (instead of via wrapper types).
Following the above option-based design will require us to support extensions (#27) so that proto-lens-protoc can understand the new options that we add.
We can provide functions to convert proto messages to/from JSON, using the existing reflection capabilities of Data.ProtoLens.Message.
This is the canonical mapping:
https://developers.google.com/protocol-buffers/docs/proto3#json
Some example language bindings:
Note that they don't include proto2-only features; extensions and unknown fields are dropped from the JSON output.
Enums can have "aliases", where two different constructors may map to the same int value (in both proto2 and proto3). This breaks our codegen, in particular the fromEnum instances.
Documentation:
https://developers.google.com/protocol-buffers/docs/proto3#enum
The user enables this feature by adding option allow_alias = true to the enum declaration. I don't know whether the protobuf compiler is the one doing the checking, or if our proto-lens-protoc plugin needs to check it manually.
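Under the newtype-plus-pattern-synonyms encoding described earlier, one possible way to handle aliases is to emit a second pattern synonym for the same wire value (a sketch with invented names, not a settled design):

```haskell
{-# LANGUAGE PatternSynonyms #-}

newtype Foo = Foo Int deriving (Eq, Show)

pattern Started :: Foo
pattern Started = Foo 1

-- Alias declared with allow_alias: same numeric value as Started.
pattern Running :: Foo
pattern Running = Foo 1
```

Both names match and construct the same underlying value, so Started == Running holds, which mirrors the protobuf semantics of aliases. A data-constructor encoding would instead need one canonical constructor per value plus synonyms for the aliases.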
Awesome project! One of my first published haskell projects was an attempt at doing protobufs well: https://github.com/mgsloan/sproto
Anyway, if making this fast is a priority, then you might be interested in porting it to store, as it's almost certainly faster than attoparsec.
It's still new, mgsloan/store#36 probably ought to be implemented before it's a responsible choice for this lib.
Currently, when parsing messages we force a strict ByteString for every submessage. It might be better to parse in a more chunked fashion, e.g., for use with lazy ByteStrings or with conduits/pipes.
Counterpoint: regardless, parsing has to read the whole input into a Haskell object that's at least as big as the input data anyway. So it's not clear how much you'd gain from this change.
If we did want this, it might be easier to move from attoparsec to another library. Protobufs are a little tricky to parse because the wire format is not naturally delimited; messages are just sequences of tagged field/value pairs, and sub-messages are encoded as a varint followed by the message (with no "ending" marker).
For example, from the binary package we could use isolate :: Int -> Get a -> Get a, which restricts a sub-parser to a specific number of bytes, and isEmpty :: Get Bool, which detects end-of-input correctly within a call to isolate. In comparison:
- attoparsec doesn't provide an isolate function, AFAIK; currently we mimic it by running a parser on the output of take :: Parser ByteString.
- cereal provides an isolate function, but it still reads the full span into a single ByteString.
- store's isolate doesn't work yet for our use case (mgsloan/store#40), and the library also lacks support for architecture-independent serialization (mgsloan/store#36). See #5 for more discussion.