google / proto-lens
API for protocol buffers using modern Haskell language and library patterns.
Home Page: https://google.github.io/proto-lens
License: BSD 3-Clause "New" or "Revised" License
We should preserve unknown fields and enums in proto2 messages.
From the docs:
https://developers.google.com/protocol-buffers/docs/proto
Similarly, messages created by your new code can be parsed by your old code: old binaries simply ignore the new field when parsing. However, the unknown fields are not discarded, and if the message is later serialized, the unknown fields are serialized along with it – so if the message is passed on to new code, the new fields are still available.
Also, enums with unknown values should be treated like unknown fields and preserved for reserialization. (Note that this behavior is proto2-only; #28 describes the desired behavior for proto3.)
This library is a memory hog when it comes to parsing repeated packed scalar types. It would be nice if it would use contiguous arrays instead of lazy lists where appropriate.
Hello,
I was trying to build this with stack install
and got this error:
-- Dumping log file due to warnings: /home/g/src/haskell/proto-lens/.stack-work/logs/proto-lens-protobuf-types-0.2.2.0.log
[1 of 2] Compiling Main ( /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/Setup.hs, /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/Main.o )
[2 of 2] Compiling StackSetupShim ( /home/g/.stack/setup-exe-src/setup-shim-mPHDZzAJ.hs, /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/StackSetupShim.o )
Linking /home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/setup ...
Configuring proto-lens-protobuf-types-0.2.2.0...
proto-src: warning: directory does not exist.
proto-src/google/protobuf/any.proto: No such file or directory
callProcess: /usr/local/bin/protoc
"--plugin=protoc-gen-haskell=/home/g/src/haskell/proto-lens/.stack-work/install/x86_64-linux-nopie/lts-9.0/8.0.2/bin/proto-lens-protoc"
"--haskell_out=.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/build/autogen"
"--proto_path=proto-src" "proto-src/google/protobuf/any.proto"
"proto-src/google/protobuf/duration.proto"
"proto-src/google/protobuf/wrappers.proto" (exit 1): failed
-- End of log file: /home/g/src/haskell/proto-lens/.stack-work/logs/proto-lens-protobuf-types-0.2.2.0.log
Log files have been written to: /home/g/src/haskell/proto-lens/.stack-work/logs/
Progress: 33/36
-- While building package proto-lens-protobuf-types-0.2.2.0 using:
/home/g/src/haskell/proto-lens/proto-lens-protobuf-types/.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0/setup/setup --builddir=.stack-work/dist/x86_64-linux-nopie/Cabal-1.24.2.0 build lib:proto-lens-protobuf-types --ghc-options " -ddump-hi -ddump-to-file"
Process exited with code: ExitFailure 1
Logs have been written to: /home/g/src/haskell/proto-lens/.stack-work/logs/proto-lens-protobuf-types-0.2.2.0.log
Apparently it is looking for proto-src/google/protobuf/any.proto
and that might be causing the issue?
The FieldDescriptor name is a String, which is probably less efficient than Text. The same goes for fieldsByTextFormatName.
This probably only affects text format decoding, since the wire format uses it only for error messages. We can add a benchmark and see whether it makes a difference.
Some related changes to help simplify the API around fields in proto-lens:
- Hide the constructors for proto messages. They're even less useful than before now that we have unknown fields. (Developers can still click "source" in the Haddock docs to see the underlying implementation.)
- Make the Show instance not display the internal fields, instead using the text format, for example: showsPrec _ x = showChar '{' . showString (showMessageShort x) . showChar '}'
- Add Haddock comments for every proto message that list the names and types of all available lenses. Note: this is a little tricky (but doable) since haskell-src-exts doesn't easily support inserting top-level comments. (Note: include the accessor for unknown fields.)
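The Show proposal in the second bullet can be sketched end to end. The `Date` type and the stand-in `showMessageShort` below are invented for illustration; the real `showMessageShort` is proto-lens's text-format renderer:

```haskell
data Date = Date { year :: Int, month :: Int }

-- Stand-in for proto-lens's text-format renderer:
showMessageShort :: Date -> String
showMessageShort (Date y m) = "year: " ++ show y ++ " month: " ++ show m

-- The proposed Show instance delegates to the text format instead of
-- showing the internal record fields:
instance Show Date where
  showsPrec _ x = showChar '{' . showString (showMessageShort x) . showChar '}'
```

With this, show (Date 2016 9) renders as "{year: 2016 month: 9}" rather than derived record syntax.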
Description of proto3 enums (reformatted from the docs):
During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent.
- In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation.
- In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.
Currently (i.e., on HEAD) we're using option #1. That is, if we had enum Foo { A = 1; B = 2; } then we would generate newtype Foo = Foo Int32 and define A and B as pattern synonyms:
pattern A = Foo 1
pattern B = Foo 2
This is simpler, but limits our ability to get exhaustiveness checking from the compiler. Specifically,
if someone adds a new enum case to the proto, the type checker won't tell us that we're now missing a case. This issue happened to us in real code.
GHC 8.2.1 does support COMPLETE pragmas for pattern synonyms, but (a) it's too soon to drop support for 8.0, and (b) that's a newer and less-well-understood feature.
The proposal for the new API is similar to what already exists for Scala and Java. For:
enum Foo { A = 1; B = 2; }
generate the following code:
data Foo = A | B | Foo'Unrecognized Foo'UnrecognizedValue

-- | Representation of an unknown value. Uses a newtype to make
-- the different branches of `Foo` provably distinct.
-- For example, this way we don't have to worry about whether
-- `A == Foo'Unrecognized (Foo'UnrecognizedValue 1)`.
newtype Foo'UnrecognizedValue = Foo'UnrecognizedValue Int32  -- hidden constructor

unrecognizedValue'Foo :: Foo'UnrecognizedValue -> Int32

instance Enum Foo where
  toEnum 1 = A
  toEnum 2 = B
  toEnum n = Foo'Unrecognized (Foo'UnrecognizedValue (fromIntegral n))
  fromEnum = ...
showMessage and the related pprintMessage and showMessageShort functions use the Haskell string escaping conventions instead of the C ones. This means that non-printing characters get written as, e.g., "\SOH", which https://github.com/google/protobuf/blob/master/src/google/protobuf/io/tokenizer.cc#L1039 won't parse. Worse, in Haskell the escape "\101" means decimal 101 (the character 'e'), whereas the tokenizer.cc code (following C convention) interprets it as octal, i.e. decimal 65 (the character 'A').
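A minimal sketch of the fix, assuming we escape characters ourselves instead of relying on Show: non-printable bytes get three-digit octal escapes, which the C-family tokenizers accept. `cEscape` is a hypothetical helper, not current proto-lens code:

```haskell
import Data.Char (ord, isPrint, isAscii)
import Text.Printf (printf)

-- Escape one character using C/text-format conventions: common escapes
-- by name, printable ASCII verbatim, everything else as three-digit octal.
cEscape :: Char -> String
cEscape '\n' = "\\n"
cEscape '\t' = "\\t"
cEscape '"'  = "\\\""
cEscape '\\' = "\\\\"
cEscape c
  | isAscii c && isPrint c = [c]
  | otherwise              = printf "\\%03o" (ord c)
```

For example, '\SOH' (code 1) comes out as "\001" rather than Haskell's "\SOH".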
In order, the following will be built (use -v for more details):
- proto-lens-combinators-0.1.0.8 {proto-lens-combinators-0.1.0.8-inplace} (lib:proto-lens-combinators) (first run)
[1 of 1] Compiling Main ( /tmp/matrix-worker/1501476906/dist-newstyle/build/x86_64-linux/ghc-8.2.1/proto-lens-combinators-0.1.0.8/setup/setup.hs, /tmp/matrix-worker/1501476906/dist-newstyle/build/x86_64-linux/ghc-8.2.1/proto-lens-combinators-0.1.0.8/setup/Main.o )
Linking /tmp/matrix-worker/1501476906/dist-newstyle/build/x86_64-linux/ghc-8.2.1/proto-lens-combinators-0.1.0.8/setup/setup ...
<<ghc: 195021320 bytes, 90 GCs, 9019646/22334576 avg/max bytes residency (7 samples), 57M in use, 0.001 INIT (0.001 elapsed), 0.123 MUT (1.892 elapsed), 0.185 GC (0.185 elapsed) :ghc>>
Configuring proto-lens-combinators-0.1.0.8...
<<ghc: 190663776 bytes, 110 GCs, 12726334/39654304 avg/max bytes residency (8 samples), 102M in use, 0.001 INIT (0.001 elapsed), 0.101 MUT (0.101 elapsed), 0.218 GC (0.221 elapsed) :ghc>>
==========
Error: couldn't find the executable "proto-lens-protoc" in your $PATH.
Please file a bug at https://github.com/google/proto-lens/issues .
==========
Missing executable "proto-lens-protoc"
CallStack (from HasCallStack):
error, called at src/Data/ProtoLens/Setup.hs:297:13 in proto-lens-protoc-0.2.2.0-6fbfcc9fefb6f837231240070e1fad9e51f23d5d830dd28e2a4fa31f1e705ca4:Data.ProtoLens.Setup
proto-lens-protoc touches (updates the modification time of) generated .hs files even when the contents did not change, causing unnecessary rebuilds when compiling with ghc.
This is because currently GHC considers only the mtime of the input file for determining whether something has to be recompiled, not its contents.
The problematic code is here:
proto-lens/proto-lens-protoc/src/Data/ProtoLens/Setup.hs
Lines 307 to 312 in fe05638
It seems that it is protoc that touches / rewrites the files.
Which way should this be fixed?
- Have the Setup script write to a temporary output directory, then move the .hs files over only if they aren't identical?
- Change protoc itself to not write the files if the contents are the same?

Please upload the packages which are updated for ghc-8.2.1 to Hackage.
These packages (on Hackage) still can't be used with ghc-8.2.1 because they depend on base (>=4.8 && <4.10).
Given this protobuf
message Date {
int32 year = 1;
int32 month = 2;
int32 day = 3;
}
you get
day: 1 month: 9 year: 2016
I think it would be better to order by tag number.
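The suggested fix amounts to sorting the rendered fields by tag number before joining them. A small sketch (the field strings are stand-ins for whatever the text-format renderer produces):

```haskell
import Data.List (sortOn)

-- Render a list of (tag, rendered field) pairs in tag order, so year (1)
-- comes before month (2) and day (3) regardless of name ordering.
renderByTag :: [(Int, String)] -> String
renderByTag = unwords . map snd . sortOn fst
```

So renderByTag [(3, "day: 1"), (2, "month: 9"), (1, "year: 2016")] yields "year: 2016 month: 9 day: 1".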
While writing the implementation for prisms, I had to go back and forth between the language extension library, Combinators.hs, and Generate.hs, figuring out what did what.
I think some documentation of Combinators.hs would be super helpful for any future development on those files.
Very useful for JSON parsing, etc.
The proto-lens, proto-lens-protoc and proto-lens-descriptors packages are tied pretty closely together. Consider consolidating some or all of them.
The main concern is bootstrapping. Changing the internals of lens-labels or proto-lens effectively breaks proto-lens-descriptors until the descriptor modules can be regenerated -- but regenerating them requires a working proto-lens-protoc, which depends on proto-lens-descriptors, introducing a cycle. The current bootstrap script solves this using the fact that they're all separate packages: it builds a new proto-lens-protoc against an old, working version of lens-labels, proto-lens and proto-lens-descriptors, and uses that compiler to generate the new descriptor modules. I'm not sure how to implement that process if they're all in the same Cabal package.
Cabal 2.0 added a function autogenPackageModulesDir, which we should use in Data.ProtoLens.Setup if it's available. That would let us generate modules separately for each component (e.g. library vs. exe vs. tests), rather than generating them all in one place.
At a minimum, this would prevent confusing GHC/Cabal errors when an exe imports a proto module but doesn't list it in other-modules, or when the module is listed for the library but a test accidentally doesn't depend on the library.
Currently proto-lens returns an error when decoding unknown enum values. It should instead accept and preserve such values.
Quoting the proto3 docs:
During deserialization, unrecognized enum values will be preserved in the message, though how this is represented when the message is deserialized is language-dependent. In languages that support open enum types with values outside the range of specified symbols, such as C++ and Go, the unknown enum value is simply stored as its underlying integer representation. In languages with closed enum types such as Java, a case in the enum is used to represent an unrecognized value, and the underlying integer can be accessed with special accessors. In either case, if the message is serialized the unrecognized value will still be serialized with the message.
.proto files with "import public" statements don't re-export the public imports.
For example, if foo.proto contains
import public "bar.proto";
then the generated module Foo.hs should re-export all the names defined in Bar.hs.
I think this is doable, but it might be a little tricky to avoid name conflicts between the autogenerated field accessors in Foo.hs and Bar.hs.
In stackage nightly. Full trace:
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ghc -clear-package-db -global-package-db -package-db=/var/stackage/work/builds/nightly/pkgdb -hide-all-packages -package=Cabal -package=base -package=proto-lens-protoc Setup
[1 of 1] Compiling Main ( Setup.hs, Setup.o )
Linking Setup ...
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ./Setup configure --enable-tests --package-db=clear --package-db=global --package-db=/var/stackage/work/builds/nightly/pkgdb --libdir=/var/stackage/work/builds/nightly/lib --bindir=/var/stackage/work/builds/nightly/bin --datadir=/var/stackage/work/builds/nightly/share --libexecdir=/var/stackage/work/builds/nightly/libexec --sysconfdir=/var/stackage/work/builds/nightly/etc --docdir=/var/stackage/work/builds/nightly/doc/proto-lens-combinators-0.1.0.8 --htmldir=/var/stackage/work/builds/nightly/doc/proto-lens-combinators-0.1.0.8 --haddockdir=/var/stackage/work/builds/nightly/doc/proto-lens-combinators-0.1.0.8 --flags=
Configuring proto-lens-combinators-0.1.0.8...
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ghc -clear-package-db -global-package-db -package-db=/var/stackage/work/builds/nightly/pkgdb -hide-all-packages -package=Cabal -package=base -package=proto-lens-protoc Setup
> /tmp/stackage-build14/proto-lens-combinators-0.1.0.8$ ./Setup build
unrecognized option `--plugin=protoc-gen-haskell=/var/stackage/work/builds/nightly/bin/proto-lens-protoc'
unrecognized option `--haskell_out=dist/build/global-autogen'
unrecognized option `--proto_path=tests'
Usage: protoc [OPTION]... FILES
-h --help show usage
-v --version show version number
Preprocessing library for proto-lens-combinators-0.1.0.8..
Building library for proto-lens-combinators-0.1.0.8..
[1 of 1] Compiling Data.ProtoLens.Combinators ( src/Data/ProtoLens/Combinators.hs, dist/build/Data/ProtoLens/Combinators.o )
Preprocessing test suite 'combinators_test' for proto-lens-combinators-0.1.0.8..
Setup: can't find source for Proto/Combinators in tests,
dist/build/combinators_test/autogen, dist/build/global-autogen
Currently every proto file needs to be specified twice in the .cabal file: the raw .proto file in extra-source-files, and the Proto.* module in exposed-modules/other-modules.
From very basic experiments with stack, I think it's possible to drop the latter requirement and have our Setup script populate the list of Haskell modules automatically, by changing the PackageDescription and/or LocalBuildInfo.
In addition to less redundancy, this would help with hpack (in particular, its ability to autodetect the exposed-modules and other-modules).
The exact design of this feature is still an open question: can (should?) we give the user control over whether their protos end up in exposed-modules or other-modules? Or in individual components (e.g. tests, executables or benchmarks)? For example, proto-lens-combinators contains a proto test file which is test-only and not intended to be exported from the library.
Hi, thanks for proto-lens; I got it to work on a gRPC client project with somewhat complicated .protos. While making it work, I had to patch a workaround to support valid enum definitions which don't follow the recommended style guide and use lower-cased enum value names.
My workaround is to call toUpper on enum names. This solution is not really great, so I wanted to discuss the best implementation choices with you before making a proper pull request. You can find my patches at: master...lucasdicioccio:workarounds.
Extensions (a proto2-only feature) aren't supported yet. It's not clear what the API should look like.
This would primarily be useful for legacy code, since proto3 replaces extensions with the Any type (#22).
Currently, required fields are defaulted to the "zero" value for that type. We should instead provide smart construction that checks at compile time whether all the required fields have been set properly.
Note that this is moot for proto3, which got rid of the concept of required fields altogether.
One possible, nebulously-described approach: for every datatype Foo, also define a Foo'Builder which is parametrized by the type of each required field (and which may be () if it's not set). This Foo'Builder can be an instance of Default (instead of Foo), and we can provide lenses to build up its individual fields, as well as a class to "freeze" a Foo'Builder into a Foo once all its fields have been set.
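A hypothetical sketch of the builder idea for a message with two required fields (the field names, setters, and freeze function are all invented; the real design would use lenses and Default). Each type parameter starts as () and becomes the field's real type once it's set, so freeze only typechecks when everything has been filled in:

```haskell
-- The finished message, with both required fields present:
data Foo = Foo { fooUri :: String, fooId :: Int } deriving Show

-- Builder: one type parameter per required field.
data Foo'Builder uri ident = Foo'Builder { bUri :: uri, bId :: ident }

emptyBuilder :: Foo'Builder () ()
emptyBuilder = Foo'Builder () ()

-- Type-changing record updates flip () to the real field type:
setUri :: String -> Foo'Builder uri ident -> Foo'Builder String ident
setUri u b = b { bUri = u }

setId :: Int -> Foo'Builder uri ident -> Foo'Builder uri Int
setId i b = b { bId = i }

-- Only a fully-populated builder can be frozen:
freeze :: Foo'Builder String Int -> Foo
freeze (Foo'Builder u i) = Foo u i
```

Here freeze emptyBuilder is a compile-time type error, which is exactly the "required field not set" check we want.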
message AcmeObservation {
oneof status {
ActionWin win = 2;
CompletedHurdleStatus completed_hurdle = 3;
QualifyTransaction qualify_transaction = 4;
}
}
results in the generated haskell:
data AcmeObservation'Status = AcmeObservation'Win !ActionWin
| AcmeObservation'Completed_hurdle !CompletedHurdleStatus
| AcmeObservation'Qualify_transaction !QualifyTransaction
deriving (Prelude.Show, Prelude.Eq)
Notice AcmeObservation'Completed_hurdle, which should become AcmeObservation'CompletedHurdle according to the renaming of all other snake-case identifiers.
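The expected renaming can be sketched as a snake_case-to-CamelCase conversion (this is an illustration of the desired behavior, not the generator's actual code):

```haskell
import Data.Char (toUpper)

-- Convert a snake_case field name to the CamelCase form used for
-- other generated identifiers, e.g. "completed_hurdle" -> "CompletedHurdle".
camelCase :: String -> String
camelCase = concatMap capitalize . splitOn '_'
  where
    splitOn c s = case break (== c) s of
      (w, [])       -> [w]
      (w, _ : rest) -> w : splitOn c rest
    capitalize []       = []
    capitalize (c : cs) = toUpper c : cs
```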
Hello - this is to start a discussion about whether proto-lens and haskell-indexer could support this feature. For background, please read https://kythe.io/docs/schema/indexing-protobuf.html .
TL;DR for proto-lens: the generated Haskell code should (on specific request) be annotated with proto2.GeneratedCodeInfo-equivalent data, most importantly the path of the proto file and the "magic path string" of the proto entity.
A complication is that proto-lens, AFAIU, doesn't generate direct field lenses, but rather lenses that go through a string proxy (what's the correct term for this?), so maybe the specific typeclass instance methods (these are the lenses, am I right?) need to be annotated.
Then a complication for haskell-indexer is that it now emits a reference to the class method instead of the instance method from the use site (assuming the instance is fixed at the use site). The indexer should instead reference the instance method (and of course emit a generates edge from the proto VName to the instance-method lens VName), which might be possible to work out from the AST, though some digging is required here.
Open questions:
Does this sound reasonable for proto-lens?
How to parametrize proto-lens to get the metadata emitted? How do we arrange that this happens only in haskell-indexer mode?
How should the metadata be emitted? In the C++ example, a new .pb.meta include is generated and included into the .pb.h. I think the main point is that the indexer should have somewhat convenient access to this - for example, the data (generated Haskell spans -> proto source info) could also be shipped in a side-channel file.
+@judah @blackgnezdo for proto-lens
Hi again. On top of #152, I had to make another workaround for the project I was working on: the project had a UNIX hidden directory, such as .protodir/myfilename.proto. The current plugin will generate modules named Proto..protodir.MyFileName, which is invalid Haskell.
As with the other bug I filed, I wanted to discuss the best implementation choices with you before making a proper pull request. You can find my patches at: master...lucasdicioccio:workarounds.
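One possible fix can be sketched as sanitizing each path component before it becomes a module-name segment (`moduleSegment` is a hypothetical helper, not the plugin's actual code):

```haskell
import Data.Char (isAlphaNum, toUpper)

-- Turn one path component into a valid Haskell module-name segment:
-- drop characters that can't appear (such as the leading dot of
-- ".protodir") and capitalize the first remaining letter.
moduleSegment :: String -> String
moduleSegment s = case filter isAlphaNum s of
  []       -> "X"  -- fallback for components with no usable characters
  (c : cs) -> toUpper c : cs
```

With this, ".protodir" becomes "Protodir", so the module name would be Proto.Protodir.MyFileName instead of the invalid Proto..protodir.MyFileName.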
If a field has the wrong wire type, currently we fail the decode. Instead, we should just ignore that field (and/or add it to the unknown fields set, once we've implemented #29).
I confirmed that C++, Java and Go all have this more lenient behavior (although it's not documented well).
Also mentioned here:
apple/swift-protobuf#342
I've got a socket connection that is used to stream data with delimited protobuf messages (see this). In the Java API there is a convenient method called parseDelimitedFrom. How can this be achieved using this library?
I am new to Haskell, so I might have overlooked something. I am sorry if this is very obvious.
Is the omission of the predefined timestamp proto intentional?
The TextFormat encoding may use either single or double quotes (though they must match). However proto-lens only supports double-quotes so far.
The error message looks like:
unexpected "'"
expecting "-", number, literal string or identifier
I found documentation of this behavior in the protobuf sources:
https://github.com/google/protobuf/blob/master/src/google/protobuf/io/tokenizer.h#L116
Currently proto-lens's TextFormat parser uses Text.Parsec.Token.stringLiteral, which doesn't support single-quoted strings:
https://hackage.haskell.org/package/parsec-3.1.9/docs/Text-Parsec-Token.html#v:stringLiteral
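A sketch of a replacement parser that accepts either quote character and requires the closing quote to match the opening one (escape handling is deliberately naive here; the real fix would implement the full text-format escape rules):

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Parse a string literal delimited by matching single or double quotes.
quoted :: Parser String
quoted = do
  q <- oneOf "'\""                          -- remember which quote opened
  body <- many (escaped <|> noneOf [q, '\\'])
  _ <- char q                               -- closing quote must match
  return body
  where
    escaped = char '\\' >> anyChar  -- placeholder: real escapes need more care
```

This parses both "hello" and 'hello', and 'don"t' / "don't" work since only the opening quote character terminates the literal.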
We should support the Any type that was introduced in proto3:
https://developers.google.com/protocol-buffers/docs/proto3#any
At its core, Any is just another protocol buffer message (defined in google/protobuf/any.proto), so we should already be able to handle protos that reference it. However, we can add a nicer API on top for converting to/from an arbitrary message type, similar to what the C++ and Java bindings provide.
We currently treat "oneof" fields similar to optional fields. This is an intended backwards-compatible behavior of the wire encoding, but makes some use cases more awkward.
https://developers.google.com/protocol-buffers/docs/proto#oneof
One possible approach is to store the value as a sum type internally, and provide lenses that return a default value when their case isn't set (as well as "maybe'foo" variants). Another option (less memory efficient) is to store the fields normally, but make each field's lens clear out all the other fields when it's being set.
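The first approach can be sketched with plain functions standing in for lenses (all names here are invented for illustration): the oneof is a sum type internally, per-case getters fall back to the proto default, and setting one case necessarily clears the others because the sum holds only one value at a time.

```haskell
-- Internal representation: the oneof as a sum type, absent when unset.
data Status = StatusWin Int | StatusHurdle String

data Msg = Msg { msgStatus :: Maybe Status }

-- Getter for the "win" case: returns the proto default (0) when the
-- oneof is unset or a different case is set.
getWin :: Msg -> Int
getWin m = case msgStatus m of
  Just (StatusWin n) -> n
  _                  -> 0

-- Setter: choosing the "win" case implicitly clears any other case,
-- since the sum type can only hold one alternative.
setWin :: Int -> Msg -> Msg
setWin n m = m { msgStatus = Just (StatusWin n) }
```

The "maybe'win" variant would simply expose msgStatus pattern-matched to Maybe Int, letting callers distinguish "unset" from "set to the default".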
It would be nice if proto-lens automatically generated Ord instances for messages. This could be hidden behind a cabal flag if it turned out to be egregiously slow.
I've got the following protobuf definition:
syntax = "proto2";
message Request {
required string uri = 1;
required string userUuid = 2;
}
But the generated code downcases the userUuid field, so the "accessor" function is useruuid, which doesn't seem right. The only reference to changing case is to do with groups, so I assume this isn't correct.
I've put the resultant file inline as I can't attach it:
{- This file was auto-generated from Request.proto by the proto-lens-protoc program. -}
{-# LANGUAGE ScopedTypeVariables, DataKinds, TypeFamilies,
MultiParamTypeClasses, FlexibleContexts, FlexibleInstances,
PatternSynonyms #-}
{-# OPTIONS_GHC -fno-warn-unused-imports #-}
module Proto.Request where
import qualified Prelude
import qualified Data.Int
import qualified Data.Word
import qualified Data.ProtoLens.Reexport.Data.ProtoLens
as Data.ProtoLens
import qualified
Data.ProtoLens.Reexport.Data.ProtoLens.Message.Enum
as Data.ProtoLens.Message.Enum
import qualified Data.ProtoLens.Reexport.Lens.Family2
as Lens.Family2
import qualified Data.ProtoLens.Reexport.Lens.Family2.Unchecked
as Lens.Family2.Unchecked
import qualified Data.ProtoLens.Reexport.Data.Default.Class
as Data.Default.Class
import qualified Data.ProtoLens.Reexport.Data.Text as Data.Text
import qualified Data.ProtoLens.Reexport.Data.Map as Data.Map
import qualified Data.ProtoLens.Reexport.Data.ByteString
as Data.ByteString
data Request = Request{_Request'uri :: !Data.Text.Text,
_Request'useruuid :: !Data.Text.Text}
deriving (Prelude.Show, Prelude.Eq)
type instance Data.ProtoLens.Field "uri" Request = Data.Text.Text
instance Data.ProtoLens.HasField "uri" Request Request where
field _
= Lens.Family2.Unchecked.lens _Request'uri
(\ x__ y__ -> x__{_Request'uri = y__})
type instance Data.ProtoLens.Field "useruuid" Request =
Data.Text.Text
instance Data.ProtoLens.HasField "useruuid" Request Request where
field _
= Lens.Family2.Unchecked.lens _Request'useruuid
(\ x__ y__ -> x__{_Request'useruuid = y__})
instance Data.Default.Class.Default Request where
def
= Request{_Request'uri = Data.ProtoLens.fieldDefault,
_Request'useruuid = Data.ProtoLens.fieldDefault}
instance Data.ProtoLens.Message Request where
descriptor
= let uri__field_descriptor
= Data.ProtoLens.FieldDescriptor "uri"
(Data.ProtoLens.StringField ::
Data.ProtoLens.FieldTypeDescriptor Data.Text.Text)
(Data.ProtoLens.PlainField Data.ProtoLens.Required uri)
useruuid__field_descriptor
= Data.ProtoLens.FieldDescriptor "userUuid"
(Data.ProtoLens.StringField ::
Data.ProtoLens.FieldTypeDescriptor Data.Text.Text)
(Data.ProtoLens.PlainField Data.ProtoLens.Required useruuid)
in
Data.ProtoLens.MessageDescriptor
(Data.Map.fromList
[(Data.ProtoLens.Tag 1, uri__field_descriptor),
(Data.ProtoLens.Tag 2, useruuid__field_descriptor)])
(Data.Map.fromList
[("uri", uri__field_descriptor),
("userUuid", useruuid__field_descriptor)])
uri ::
forall msg msg' . Data.ProtoLens.HasField "uri" msg msg' =>
Lens.Family2.Lens msg msg' (Data.ProtoLens.Field "uri" msg)
(Data.ProtoLens.Field "uri" msg')
uri
= Data.ProtoLens.field
(Data.ProtoLens.ProxySym :: Data.ProtoLens.ProxySym "uri")
useruuid ::
forall msg msg' . Data.ProtoLens.HasField "useruuid" msg msg' =>
Lens.Family2.Lens msg msg' (Data.ProtoLens.Field "useruuid" msg)
(Data.ProtoLens.Field "useruuid" msg')
useruuid
= Data.ProtoLens.field
(Data.ProtoLens.ProxySym :: Data.ProtoLens.ProxySym "useruuid")
We should do something similar to happy/alex/etc. for generated files, i.e., bundle them into the release archive that's uploaded to Hackage/Stackage. That way, packages that depend on protos won't require installing the protoc executable.
Cabal has special logic for happy
and alex
, but the logic around when to rebuild the generated files is somewhat flaky: haskell/cabal#2940, haskell/cabal#2311, haskell/cabal#2362. Part of the problem is that when Cabal unpacks the tarball of the package, it doesn't set the modification times consistently (this may be fixed on newer versions of Cabal, not sure though).
One option is for us to do something simpler than Cabal:
- Don't run protoc when building from an archive that was created by cabal sdist.
- Run protoc otherwise (in particular: when building from the git repo).
This would require cabal sdist to do something special in order for cabal build to tell the difference. One hacky option is to include an extra dummy file in extra-source-files. A more involved option would be to copy the generated files from the autogen dir (where they are now) to one of the hs-source-dirs; but that may be complicated in the presence of multiple binaries/tests.

readMessage is using Haskell string escaping conventions. This will lead it to read text protocol data, e.g. "\101", as decimal (i.e. 'e') instead of as octal (i.e. 'A'). I think proto-lens should match the behavior of https://github.com/google/protobuf/blob/master/src/google/protobuf/io/tokenizer.cc#L1039.
Build fails due to lack of the HasLens' class when building against the latest release of lens-labels instead of from git.
Protobuf compiler plugins can request that their generated code be pasted into an existing generated file immediately above a specified insertion point, using the File.insertion_point field. These insertion points are specified by placing "@@protoc_insertion_point(some_name)" in the generated source. proto-lens-protoc should have insertion points for at least the imports and the module top-level scope.
Hi,
I'm curious if you'd consider including proto-lens and friends in Stackage, to simplify (for users) their use in stack projects?
https://github.com/fpco/stackage/blob/master/MAINTAINERS.md#adding-a-package
Thanks.
I define a file LinkParser.proto
syntax = "proto3";
message LinkParseResult {
string title = 1;
map<string, string> og = 2;
repeated string imgs = 3;
}
When I run stack build, and also when I run
protoc --plugin=protoc-gen-haskell=`which proto-lens-protoc` --haskell_out . LinkParser.proto
manually, I get the following error:
proto-lens-protoc: definedFieldType: Field type .LinkParseResult.OgEntry not found in environment.
--haskell_out: protoc-gen-haskell: Plugin failed with status code 1.
Happens on both libprotoc 3.0.0 and libprotoc 2.6.1 (where the latter used the "equivalent syntax" mentioned in the protoc docs).
Would it be possible to bump the constraint on data-default-class == 0.0.* to data-default-class == 0.1.*, or to relax the constraint? (Stackage LTS-7.2 is pinned to data-default-class-0.1.2.0, for example.) Thanks!
See https://developers.google.com/protocol-buffers/docs/proto3#specifying-field-rules. proto-lens generated code does not implement this correctly. Trying to decode repeated fields of scalar numeric types (like int32 and float) results in a message like "Field 1 expects wire type 0 but found 2", because packed repeated fields are wire type 2 (see https://developers.google.com/protocol-buffers/docs/encoding#structure).
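Decoding a packed repeated field means reading a length-delimited (wire type 2) span and then parsing back-to-back values until the span is exhausted. A sketch using the binary package (the function names here are illustrative, not proto-lens's):

```haskell
import Data.Binary.Get (Get, getWord8, isolate, isEmpty, runGet)
import Data.Bits ((.&.), (.|.), shiftL, testBit)
import Data.Word (Word64)
import qualified Data.ByteString.Lazy as BL

-- Decode one base-128 varint: 7 payload bits per byte, high bit set
-- means "more bytes follow".
getVarint :: Get Word64
getVarint = go 0 0
  where
    go sh acc = do
      b <- getWord8
      let acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` sh)
      if testBit b 7 then go (sh + 7) acc' else return acc'

-- Decode a packed field: read varints until the isolated span of
-- `len` bytes is fully consumed.
getPacked :: Int -> Get [Word64]
getPacked len = isolate len loop
  where
    loop = do
      done <- isEmpty
      if done then return [] else (:) <$> getVarint <*> loop

-- e.g. runGet (getPacked 3) (BL.pack [1, 0xAC, 0x02]) == [1, 300]
```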
The Data.ProtoLens.Compiler.Combinators module has some combinators that make working with TH much easier. It'd be fantastic if it were moved to a separate library where it could be more widely adopted.
We should have a benchmark for encoding to/from the wire format, to make sure that our reflection and abstraction in the parser doesn't cause a significant slowdown compared to other Haskell code, and to give us more confidence when refactoring. (So far, we haven't done any performance tuning of the code.)
One arbitrary data point: decoding a 1MB proto took ~60ms on my desktop, and decoding followed by encoding (which includes forcing all the fields) took ~230ms. (The code was compiled with -O2.)
We should probably also benchmark the text format, though that's usually less performance-critical.
We can use google.protobuf.Any to support basic parametric polymorphism. For example, if we define a file haskell_type_variables.proto with custom options:
extend google.protobuf.MessageOptions {
repeated string haskell_type_var = 50000;
}
extend google.protobuf.FieldOptions {
optional string haskell_type_var = 50000;
}
Then we can use those options to annotate a type:
import ".../haskell_type_variables.proto";
message Foo {
option (haskell_type_var) = "a";
option (haskell_type_var) = "b";
int32 x = 1;
google.protobuf.Any y = 2 [(haskell_type_var)="a"];
google.protobuf.Any z = 3 [(haskell_type_var)="b"];
}
And generate the following type from that file:
data Foo a b = Foo { _Foo'x :: Int32, _Foo'y :: Maybe a, _Foo'z :: Maybe b}
instance (Message a, Message b) => Message (Foo a b) where ...
encodeMessage/decodeMessage could serialize the submessage as an Any, but in Haskell code represent it as a regular (not serialized) Haskell type. This makes it easier to use proto types directly in Haskell (instead of via wrapper types).
Following the above option-based design will require us to support extensions (#27) so that proto-lens-protoc can understand the new options that we add.
We can provide functions to convert proto messages to/from JSON, using the existing reflection capabilities of Data.ProtoLens.Message.
This is the canonical mapping:
https://developers.google.com/protocol-buffers/docs/proto3#json
Some example language bindings:
Note that they don't include proto2-only features; extensions and unknown fields are dropped from the JSON output.
Enums can have "aliases", where two different constructors may map to the same int value (in both proto2 and proto3). This breaks our codegen, in particular the fromEnum instances.
Documentation:
https://developers.google.com/protocol-buffers/docs/proto3#enum
The user enables this feature by adding option allow_alias = true to the enum declaration. I don't know whether the protobuf compiler is the one doing the checking, or if our proto-lens-protoc plugin needs to check it manually.
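Under the newtype-plus-pattern-synonyms encoding described earlier, one possible way to handle aliases is to emit a second pattern synonym for the same wire value (a sketch with invented names, not a settled design):

```haskell
{-# LANGUAGE PatternSynonyms #-}

newtype Foo = Foo Int deriving (Eq, Show)

pattern Started :: Foo
pattern Started = Foo 1

-- Alias declared with allow_alias: same numeric value as Started.
pattern Running :: Foo
pattern Running = Foo 1
```

Both names match and construct the same underlying value, so Started == Running holds, which mirrors the protobuf semantics of aliases. A data-constructor encoding would instead need one canonical constructor per value plus synonyms for the aliases.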
Awesome project! One of my first published haskell projects was an attempt at doing protobufs well: https://github.com/mgsloan/sproto
Anyway, if making this fast is a priority, then you might be interested in porting it to store, as it's almost certainly faster than attoparsec.
It's still new, mgsloan/store#36 probably ought to be implemented before it's a responsible choice for this lib.
Currently, when parsing messages we force a strict ByteString for every submessage. It might be better to parse in a more chunked fashion, e.g., for use with lazy ByteStrings or with conduits/pipes.
Counterpoint: regardless, parsing has to read the whole input into a Haskell object that's at least as big as the input data anyway. So it's not clear how much you'd gain from this change.
If we did want this, it might be easier to move from attoparsec to another library. Protobufs are a little tricky to parse because the wire format is not naturally delimited; messages are just sequences of tagged field/value pairs, and sub-messages are encoded as a varint followed by the message (with no "ending" marker).
For example, from the binary package we could use isolate :: Int -> Get a -> Get a, which restricts a sub-parser to a specific number of bytes, and isEmpty :: Get Bool, which detects end-of-input correctly within a call to isolate. In comparison:
- attoparsec doesn't provide an isolate function, AFAIK; currently we mimic it by running a parser on the output of take :: Parser ByteString.
- cereal provides an isolate function, but it still reads the full span into a single ByteString.
- store's isolate doesn't work yet for our use case (mgsloan/store#40), and the library also lacks support for architecture-independent serialization (mgsloan/store#36). See #5 for more discussion.