
pbdirect's Introduction


PBDirect

Read/Write Scala objects directly to Protobuf with no .proto file definitions

Context

Protobuf is a fast and efficient way to serialize data. While .proto files are great for sharing schema definitions between components, it is sometimes much simpler and more straightforward to encode Scala objects directly, without a .proto schema definition file.

PBDirect aims to do just that: make it easier to serialize to and deserialize from Protobuf.

Setup

In order to use PBDirect you need to add the following lines to your build.sbt:

resolvers += Resolver.bintrayRepo("beyondthelines", "maven")

libraryDependencies += "beyondthelines" %% "pbdirect" % "0.1.0"

Dependencies

PBDirect depends on:

  • protobuf-java, the Protobuf Java library (maintained by Google)
  • shapeless for the generation of type-class instances
  • cats to deal with optional and repeated fields

Usage

In order to use PBDirect you need to import the following:

import cats.instances.list._
import cats.instances.option._
import pbdirect._

Note: It's not recommended to use import cats.instances.all._ as it may cause issues with implicit resolution.

Example

Schema definition

PBDirect serialises case classes into protobuf and there is no need for a .proto schema definition file.

case class MyMessage(
  id: Option[Int],
  text: Option[String],
  numbers: List[Int]
)

is equivalent to the following protobuf definition:

message MyMessage {
   optional int32  id      = 1;
   optional string text    = 2;
   repeated int32  numbers = 3;
}

The field numbers correspond to the order of the fields inside the case class.
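
To make the role of the field numbers concrete, here is a minimal sketch of how a field number and a wire type combine into the tag that precedes each field on the wire. It is plain Scala and does not use PBDirect; the `WireTag` object and its member names are hypothetical and only illustrate the standard Protobuf tag formula.

```scala
// Illustration only: standard Protobuf tag computation, independent of PBDirect.
object WireTag {
  val Varint = 0          // wire type used for int32, int64, bool, enum
  val LengthDelimited = 2 // wire type used for string, bytes, nested messages

  // tag = (field number << 3) | wire type
  def tag(fieldNumber: Int, wireType: Int): Int = (fieldNumber << 3) | wireType

  // Tags for MyMessage, numbering the fields 1, 2, 3 in declaration order.
  val idTag: Int      = tag(1, Varint)          // 8
  val textTag: Int    = tag(2, LengthDelimited) // 18
  val numbersTag: Int = tag(3, Varint)          // 24
}
```

Reordering the fields of the case class changes these tags, which is why the declaration order matters for compatibility.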

Serialization

You only need to call the toPB method on your case class. This method is implicitly added with import pbdirect._.

val message = MyMessage(
  id = Some(123),
  text = Some("Hello"),
  numbers = List(1, 2, 3, 4)
)
val bytes = message.toPB

Deserialization

Deserializing bytes into a case class is also straightforward. You only need to call the pbTo[A] method on the byte array containing the protobuf-encoded data. This method is added implicitly to all Array[Byte] values by importing pbdirect._.

val bytes: Array[Byte] = Array[Byte](8, 123, 18, 5, 72, 101, 108, 108, 111, 24, 1, 32, 2, 40, 3, 48, 4)
val message = bytes.pbTo[MyMessage]

Indexing

The protobuf indexes directly reflect the order of the field declarations. E.g.

case class MyMessage(
  id: Option[Int],
  text: Option[String],
  numbers: List[Int]
)

is equivalent to the following protobuf definition:

message MyMessage {
   optional int32  id      = 1;
   optional string text    = 2;
   repeated int32  numbers = 3;
}

It's possible to specify the protobuf index by using the @Index annotation.

case class MyMessage(
  @Index(10) id: Option[Int],
  @Index(20) text: Option[String],
  @Index(30) numbers: List[Int]
)

is equivalent to the following protobuf definition:

message MyMessage {
   optional int32  id      = 10;
   optional string text    = 20;
   repeated int32  numbers = 30;
}

This is particularly useful to model an ADT where several members have the same generic type (i.e. the same HList).
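
One practical consequence of choosing larger indexes is the size of the tag. In the Protobuf wire format, tags are written as base-128 varints, so field numbers 1 to 15 fit in one byte while 16 and above need at least two. The sketch below is plain Scala with hypothetical names (`TagBytes`, `varint`, `tagBytes`) and does not call PBDirect.

```scala
// Illustration only: how many bytes a tag occupies for a given field number.
object TagBytes {
  // Encode a non-negative Int as a base-128 varint, least significant group first.
  def varint(value: Int): List[Int] = {
    require(value >= 0, "this sketch handles non-negative values only")
    if (value < 0x80) List(value)
    else ((value & 0x7f) | 0x80) :: varint(value >>> 7)
  }

  // Tag bytes for a field number and wire type (0 = varint, 2 = length-delimited).
  def tagBytes(fieldNumber: Int, wireType: Int): List[Int] =
    varint((fieldNumber << 3) | wireType)
}
```

For the annotated example, `TagBytes.tagBytes(10, 0)` is a single byte (80), whereas `TagBytes.tagBytes(20, 2)` and `TagBytes.tagBytes(30, 0)` each take two bytes.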

Extension

You might want to define your own formats for unsupported types. E.g. to add a format to write java.time.Instant you can do:

import java.time.Instant
import cats.syntax.invariant._

implicit val instantFormat: PBFormat[Instant] =
  PBFormat[Long].imap(Instant.ofEpochMilli)(_.toEpochMilli)

If you only need a reader, you can map over an existing PBReader:

import java.time.Instant
import cats.syntax.functor._

implicit val instantReader: PBReader[Instant] =
  PBReader[Long].map(Instant.ofEpochMilli)

And for a writer you simply contramap over it:

import java.time.Instant
import cats.syntax.contravariant._

implicit val instantWriter: PBWriter[Instant] =
  PBWriter[Long].contramap(_.toEpochMilli)
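
For readers unfamiliar with imap, the following self-contained sketch shows the idea on a toy type class. `Codec` is a hypothetical stand-in that encodes to a String; it is not PBDirect's PBFormat and makes no claim about how PBDirect implements it. The point is only that a format for a new type can be derived from an existing one by supplying a conversion in each direction.

```scala
import java.time.Instant

object CodecSketch {
  // Hypothetical stand-in for a read/write format; NOT PBDirect's PBFormat.
  trait Codec[A] { self =>
    def write(a: A): String
    def read(s: String): A

    // Derive a Codec[B] from this Codec[A] and a pair of conversions.
    def imap[B](f: A => B)(g: B => A): Codec[B] = new Codec[B] {
      def write(b: B): String = self.write(g(b)) // convert B to A, then write
      def read(s: String): B = f(self.read(s))   // read an A, then convert to B
    }
  }

  val longCodec: Codec[Long] = new Codec[Long] {
    def write(a: Long): String = a.toString
    def read(s: String): Long = s.toLong
  }

  // Same shape as the Instant format above: store epoch milliseconds as a Long.
  val instantCodec: Codec[Instant] =
    longCodec.imap(Instant.ofEpochMilli)(_.toEpochMilli)
}
```

A reader-only instance needs just the first function (map) and a writer-only instance just the second (contramap), which mirrors the three variants shown above.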

More information

Finally you can find more implementation details over here

pbdirect's People

Contributors

akara, btlines, eperinan, henryem, juanpedromoreno, xuwei-k


pbdirect's Issues

Nested object - java.util.NoSuchElementException: head of empty list

An exception is thrown when:

    case class A(b: B)
    case class B(c: Option[C])
    case class C(s1: String, s2: String) 

    val msg = A(B(Some(C("ds", "xd"))))
    val bytes = msg.toPB
    val parsed = bytes.pbTo[A]

If you change B to case class B(c: C) it works.
If you change C to case class B(c: C) it works.

Am I doing something wrong or is it a bug?

Case class with nested inner class not resolving properly.

Simple example:

object TestMessages {
  case class InnerMessage(id: Int, name: String)
}

import TestMessages._

case class NestedInnerMessage(id: Int, inner: InnerMessage)

Writing this message works correctly:

    "write a message with inner nested type to Protobuf" in {
      import TestMessages._
      val nested = NestedInnerMessage(0, InnerMessage(1, "Hi"))
      nested.toPB shouldBe Array[Byte](8, 0, 18, 6, 8, 1, 18, 2, 72, 105)
    }

However, reading gives a compiler error:

    "read a message with inner nested type to Protobuf" in {
      import TestMessages._
      val bytes = Array[Byte](8, 0, 18, 6, 8, 1, 18, 2, 72, 105)
      bytes.pbTo[NestedInnerMessage] shouldBe NestedInnerMessage(0, InnerMessage(1, "Hi"))
    }
error: could not find implicit value for parameter reader: pbdirect.PBParser[pbdirect.NestedInnerMessage]

Bad coproduct serialization behaviour with nested messages

We're having problems using pbdirect for serializing case classes with nested optional arguments. I've reproduced this error with a simple main program. It's very strange because it works in tests but fails in the main code (see the code here).

I've created a new test case in PBWriterSpec that works flawlessly:

    "write a nested empty message to Protobuf" in {
      case class InnerMessage(value: Int)
      case class OuterMessage(text: Option[String], inner: Option[InnerMessage])
      val message = OuterMessage(Some("Hello"), None)
      message.toPB shouldBe Array[Byte](10, 5, 72, 101, 108, 108, 111)
    }

But a similar code block fails when it is moved to the src/main/ code:

package pbdirect

object TestToPB extends App {

  case class InnerMessage(value: Int)
  case class OuterMessage(text: Option[String], inner: Option[InnerMessage])
  val message = OuterMessage(Some("Hello"), None)
  val a: Array[Byte] = message.toPB
  val e: Array[Byte] = Array[Byte](10, 5, 72, 101, 108, 108, 111)
  assert(a.sameElements(e))

}

We debugged the problem a bit and found that, in the second case, the serialization adds an extra character at the end:

[Screenshot: 2018-07-13 at 10:23:38]

It seems to be a problem related to how shapeless resolves the coproduct implicits:
Test case:

--------------prodWriter
--------------A = OuterMessage(Some(Hello),None)
--------------prodWriter

--------------consWriter
-------------- H :: T = Some(Hello) :: None :: HNil
--------------consWriter

--------------4
--------------consWriter
-------------- H :: T = None :: HNil
--------------consWriter

Bytes: 10,5,72,101,108,108,111

[info] PBWriterSpec:
[info] PBWriter
[info] - should write a nested empty message to Protobuf

In the main class case:

--------------prodWriter
--------------A = OuterMessage(Some(Hello),None)
--------------prodWriter

--------------consWriter
-------------- H :: T = Some(Hello) :: None :: HNil
--------------consWriter

--------------coprodWriter
--------------A = Some(Hello)
--------------coprodWriter

--------------cconsWriter
--------------H:+: T = Inr(Inl(Some(Hello)))
--------------cconsWriter

--------------cconsWriter
--------------H:+: T = Inl(Some(Hello))
--------------cconsWriter

--------------prodWriter
--------------A = Some(Hello)
--------------prodWriter

--------------consWriter
-------------- H :: T = Hello :: HNil
--------------consWriter

--------------4
--------------consWriter
-------------- H :: T = None :: HNil
--------------consWriter

--------------coprodWriter
--------------A = None
--------------coprodWriter

--------------cconsWriter
--------------H:+: T = Inl(None)
--------------cconsWriter

--------------prodWriter
--------------A = None
--------------prodWriter

Bytes: 10, 7, 10, 5, 72, 101, 108, 108, 111, 18, 0
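
The two byte dumps can be compared field by field with a small wire-format reader. This is a sketch in plain Scala that only understands varint and length-delimited fields; `WireDump`, `Field` and `fields` are hypothetical names, and none of this is pbdirect code.

```scala
// Illustration only: split a Protobuf message into its top-level fields.
object WireDump {
  // One decoded field: its number, wire type, and raw payload bytes.
  final case class Field(number: Int, wireType: Int, payload: Vector[Byte])

  // Read a base-128 varint starting at `pos`; returns (value, next position).
  private def readVarint(bytes: Vector[Byte], pos: Int): (Long, Int) = {
    var result = 0L
    var shift = 0
    var i = pos
    var more = true
    while (more) {
      val b = bytes(i)
      result |= (b & 0x7fL) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    (result, i)
  }

  // Walk the message once, collecting fields in the order they appear.
  def fields(bytes: Vector[Byte]): List[Field] = {
    def loop(pos: Int, acc: List[Field]): List[Field] =
      if (pos >= bytes.length) acc.reverse
      else {
        val (tag, afterTag) = readVarint(bytes, pos)
        val number = (tag >>> 3).toInt
        val wireType = (tag & 7).toInt
        wireType match {
          case 0 => // varint: payload is the varint bytes themselves
            val (_, next) = readVarint(bytes, afterTag)
            loop(next, Field(number, 0, bytes.slice(afterTag, next)) :: acc)
          case 2 => // length-delimited: a length varint followed by that many bytes
            val (len, start) = readVarint(bytes, afterTag)
            val end = start + len.toInt
            loop(end, Field(number, 2, bytes.slice(start, end)) :: acc)
          case other =>
            throw new IllegalArgumentException(s"wire type $other not handled in this sketch")
        }
      }
    loop(0, Nil)
  }
}
```

Applied to the test-case bytes, this yields a single field 1 holding "Hello". Applied to the main-class bytes, it yields field 1 with a 7-byte payload that is itself a message (its own field 1 is "Hello"), followed by field 2 with an empty payload. That reading is consistent with the trace above, where Some(Hello) and None each pass through coprodWriter and prodWriter as if they were nested messages.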

Wrong deserialization for coproducts with empty lists

import cats.instances.list._
import pbdirect._

sealed trait Foo

case class Foo1(li: List[Int]) extends Foo
case class Foo2(ll: List[Long]) extends Foo

println(Foo1(List.empty).toPB.toSeq)
println(Foo2(List.empty).toPB.toSeq)

// prints
// WrappedArray()
// WrappedArray()

So, in general, (a: A).toPB.pbTo[A] == a can be false.
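
The reason both values print as empty can be sketched without pbdirect. Assuming an unpacked repeated encoding (an assumption for this illustration, with the hypothetical names `RepeatedSketch` and `encodeRepeated`), each element is written as its own tag and value pair, so an empty list contributes no bytes at all.

```scala
// Illustration only: unpacked repeated-field encoding for small values.
object RepeatedSketch {
  // Every element becomes a (tag, value) pair; restricted to field numbers
  // below 16 and values below 128 so that each fits in a single byte.
  def encodeRepeated(fieldNumber: Int, values: List[Int]): List[Int] = {
    require(fieldNumber > 0 && fieldNumber < 16, "single-byte tags only")
    require(values.forall(v => v >= 0 && v < 128), "single-byte values only")
    val tag = fieldNumber << 3 // wire type 0 (varint)
    values.flatMap(v => List(tag, v))
  }
}
```

`RepeatedSketch.encodeRepeated(1, List(1, 2, 3, 4))` produces eight bytes, while `RepeatedSketch.encodeRepeated(1, Nil)` produces none. With nothing on the wire, a reader has no information from which to decide which member of the coproduct was written.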

Diverging implicit expansion error when deserializing optional timestamp fields

I am trying to deserialize an optional timestamp field, as in the following example:

package test

import java.sql.Timestamp

import com.google.protobuf.CodedInputStream
import pbdirect._

object Main {

  import cats.syntax.invariant._

  //implicit object TimestampExtractor extends PBExtractor[Timestamp] {
  //  override def extract(input: CodedInputStream): Timestamp = new Timestamp(input.readInt64())
  //}

  implicit val instantFormat: PBFormat[Timestamp] = PBFormat[Long].imap(new Timestamp(_))(_.getTime)

  def main(args: Array[String]): Unit = {

    import cats.instances.list._
    import cats.instances.option._

    println(Array[Byte](1, 2, 3).pbTo[TimestampOpt])

  }

}

case class TimestampOpt(opt: Option[Timestamp])

I get a compilation error:

[info] Compiling 1 Scala source to /Users/vshchu/pbtest/target/scala-2.11/classes...
[error] /Users/vshchu/pbtest/src/main/scala/test/Main.scala:23: diverging implicit expansion for type pbdirect.PBReader[List[List[test.TimestampOpt]]]
[error] starting with method repeatedReader in trait PBReaderImplicits
[error]     println(Array[Byte](1, 2, 3).pbTo[TimestampOpt])
[error]                                      ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 4 s, completed Jan 10, 2018 5:39:53 PM

No compilation errors if I use TimestampExtractor implicit (commented out).
Scala 2.11.11, pbdirect 0.0.8, sbt 0.13.15

Update cats to 1.0.0-MF

A new version of Cats, "1.0.0-MF", is available.

I am going to send a PR with this update in a few minutes.

Bintray repository returns unauthorized 403

Hi, when I add the lines below to my build.sbt:

resolvers += Resolver.bintrayRepo("beyondthelines", "maven")
libraryDependencies += "beyondthelines" %% "pbdirect" % "0.1.0"

I get http response code 403 from the bintray repo url:

download error: Caught java.io.IOException: Server returned HTTP response code: 403 for URL: https://dl.bintray.com/beyondthelines/maven/beyondthelines/pbdirect_2.12/0.2.1/pbdirect_2.12-0.2.1.pom

Is there some kind of credential that I should put in the build.sbt file?

Serialization performance is poor for nested objects

For objects with nested structure, the algorithm currently implemented in LowPriorityPBWriterImplicits.prodWriter performs poorly. For example, if we construct a payload of size K nested inside D layers of Tuple1 (or single-field case classes), then the write performance is O(D^2 + DK). This is because at each layer prodWriter will allocate and copy from a buffer containing the payload plus the headers for the layers beneath. The D^2 term is not going to be a problem in most cases, but the DK term is. (The performance is more complicated with real-world structures, but this example gets the point across.)

The fundamental issue seems to be that we don't know the size of the nested object until it is written. So the current strategy is to compute the nested object first, and then copy it as a byte array into the CodedOutputStream.

It looks like the standard Java Protobuf implementation solves this problem by first computing the size of the nested object with a separate memoized call. If we do that, we should be able to write the nested object directly into the CodedOutputStream, replacing the current use of writeByteArray with manual writing of the header and length of the serialized nested object. Without memoization of the size computation, this approach would give us O(D^2 + K) performance. Memoizing would further reduce this to O(D+K), but it would introduce extra code complexity and probably a performance hit for shallower objects.
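
A rough sketch of the two strategies, under simplifying assumptions: every message has a single length-delimited field numbered 1, `java.io.ByteArrayOutputStream` stands in for CodedOutputStream, and all names (`NestedWriteSketch`, `Msg`, `writeCopy`, `writeDirect`) are hypothetical. It is the non-memoized variant and is not pbdirect code.

```scala
import java.io.ByteArrayOutputStream

// Illustration only: copy-per-layer writing versus size-first direct writing.
object NestedWriteSketch {
  sealed trait Msg
  final case class Leaf(payload: Array[Byte]) extends Msg // field 1 holds raw bytes
  final case class Wrap(inner: Msg) extends Msg           // field 1 holds a nested message

  private val Tag = 0x0a // field 1, wire type 2 (length-delimited)

  private def writeVarint(out: ByteArrayOutputStream, value: Int): Unit = {
    var v = value // assumed non-negative
    while (v >= 0x80) { out.write((v & 0x7f) | 0x80); v >>>= 7 }
    out.write(v)
  }

  private def varintSize(value: Int): Int = {
    var v = value
    var n = 1
    while (v >= 0x80) { n += 1; v >>>= 7 }
    n
  }

  // Current strategy: serialize the inner message to its own array, then copy it.
  def writeCopy(m: Msg): Array[Byte] = {
    val body = m match {
      case Leaf(p)     => p
      case Wrap(inner) => writeCopy(inner) // whole payload is copied again at each layer
    }
    val out = new ByteArrayOutputStream()
    out.write(Tag)
    writeVarint(out, body.length)
    out.write(body, 0, body.length)
    out.toByteArray
  }

  // Serialized size of a message, computed without writing anything.
  def size(m: Msg): Int = {
    val body = m match {
      case Leaf(p)     => p.length
      case Wrap(inner) => size(inner)
    }
    1 + varintSize(body) + body
  }

  // Proposed strategy: write the header from the computed size, then stream the body.
  def writeDirect(m: Msg, out: ByteArrayOutputStream): Unit = m match {
    case Leaf(p) =>
      out.write(Tag)
      writeVarint(out, p.length)
      out.write(p, 0, p.length)
    case Wrap(inner) =>
      out.write(Tag)
      writeVarint(out, size(inner)) // recomputed at every layer (no memoization)
      writeDirect(inner, out)
  }
}
```

In `writeDirect` the payload bytes are written exactly once, and only the size computation is repeated per layer, which corresponds to the O(D^2 + K) behaviour described above. Caching `size` per message would remove the repeated traversal.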

If there's agreement that this is a problem we should solve, I can probably contribute a fix (at least the non-memoized version).

Writing integers produces strange error

If I'm trying to write simple integer types like Long or Int I get the com.google.protobuf.InvalidProtocolBufferException error

import pbdirect._
1L.toPB

results in

com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either that the input has been truncated or that an embedded message misreported its own length.

By contrast, "foobar".toPB and Array[Byte](1, 2, 3).toPB work fine, which is quite confusing.

It should print something meaningful, or the implicit extension .toPB should not exist for these types.

Maven central publication

It would be great to have this library in the maven central repo, for example through sonatype. We're using this library in some other libraries and the resolver needs to be specified in the projects using our library.

Thanks

Support for Schema Evolution

Nice library 👍 It would be nice to have support for schema evolution, so that I can add and remove fields and still be able to deserialize older data.

Thanks

Simple examples not working

package example

import cats.instances.list._
import pbdirect._

object Hello extends App {
  case class MyMessage( id: Option[Int], text: Option[String], numbers: List[Int])
  val message = MyMessage( id = Some(123),  text = Some("Hello"),  numbers = List(1, 2, 3, 4))

  println(List(message, message.copy(id = Some(999))).toPB.pbTo[List[MyMessage]])  //List(MyMessage(Some(123),Some(Hello),List(1, 2, 3, 4)))
  println(List(1, 2, 3).toPB.pbTo[List[Int]])
  /* Exception in thread "main" com.google.protobuf.InvalidProtocolBufferException:
  While parsing a protocol message, the input ended unexpectedly in the middle of a field.
  This could mean either that the input has been truncated or that an embedded message misreported its own length. */
}

with
sbt.version=1.2.6
and

lazy val root = (project in file(".")).
  settings(
    inThisBuild(List(
      organization := "com.example",
      scalaVersion := "2.12.7",
      version      := "0.1.0-SNAPSHOT"
    )),
    name := "pbdirectTest",
    resolvers += Resolver.bintrayRepo("beyondthelines", "maven"),
    libraryDependencies +=  "beyondthelines" %% "pbdirect" % "0.1.0"
  )

Diverging implicit expansion for type pbdirect.PBWriter

I am trying to serialize two case classes but I am getting the following error:

[error] /Users/eperinan/workspace/47deg/open-source-github/freestyle-opscenter/src/main/scala/Main.scala:71:92: diverging implicit expansion for type pbdirect.PBWriter[freestyle.Main.Metric :: List[freestyle.Main.Metric] :: shapeless.HNil]
[error] starting with method enumerationWriter in trait PBWriterImplicits
[error]     Binary(MetricsList(List(Metric("metric", "microservices", "node", 12.toFloat, 12345))).toPB)

Here you can see the app that I am building and the two case classes.

object Main extends App {

  import cats.instances.list._
  import pbdirect._

  case class Metric(metric: String, microservice: String, node: String, value: Float, count: Int)
  case class MetricsList(metrics: List[Metric])

  println(
    Binary(MetricsList(List(Metric("metric", "microservices", "node", 12.toFloat, 12345))).toPB)
  )
}

I took a look at the code and we have implicits for all of them. Do you know what the problem could be?

Thanks so much!

Can't infer PBWriter for recursive Coproducts

For example, I want to infer pbdirect instances for the following sealed trait:

import pbdirect._

sealed trait MyList

case class Cons(i: Int, l: MyList) extends MyList
case object Nil extends MyList

PBWriter[MyList] // error: could not find implicit value for evidence parameter of type pbdirect.PBWriter[MyList]
PBReader[MyList] // ok

PBReader is inferred just fine, but PBWriter is not.

My guess is that it should be tail: Lazy[PBWriter[T]] in implicit def cconsWriter instead of just tail: PBWriter[T]

implicit def cconsWriter[H, T <: Coproduct](
      implicit head: PBWriter[H],
      tail: PBWriter[T]): PBWriter[H :+: T] // it should be Lazy[PBWriter[T]]

just as it's implemented for PBReader

implicit def cconsParser[H, T <: Coproduct](
      implicit
      head: PBParser[H],
      tail: Lazy[PBParser[T]]): PBParser[H :+: T]

Deserialization performance is poor for large products

Currently, as noted in the project page, we use an O(N*M) algorithm for deserializing products with N fields repeated a total of M times. In profiling the deserialization of a complex deeply-nested object, I found that we end up spending a huge amount of time in the resulting calls to CodedInputStream.skipField.

One way to eliminate this is to (1) create a map from field index to parser, (2) loop through fields, passing each to its appropriate parser, and (3) finally build the results of each parser. I have prepared a PR that does this, which I'll be submitting soon (once my previous PR, which is a dependency, is merged). My PR also reduces memory pressure for deserializing nested objects by eliminating allocation of extra byte buffers in all cases except coproduct parsing.
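
The single-pass idea in steps (1) to (3) can be sketched as follows. This is plain Scala with hypothetical names (`SinglePassSketch`, `groupByField`); it only handles varint and length-delimited fields, is not the code from the PR, and stops at grouping raw payloads so that each field's parser can then look up its own group instead of rescanning the input.

```scala
// Illustration only: one pass over the input, grouping payloads by field number.
object SinglePassSketch {
  // Read a base-128 varint starting at `pos`; returns (value, next position).
  private def readVarint(bytes: Array[Byte], pos: Int): (Long, Int) = {
    var result = 0L
    var shift = 0
    var i = pos
    var more = true
    while (more) {
      val b = bytes(i)
      result |= (b & 0x7fL) << shift
      shift += 7
      i += 1
      more = (b & 0x80) != 0
    }
    (result, i)
  }

  // Each field is visited once; no field is skipped and revisited later.
  def groupByField(bytes: Array[Byte]): Map[Int, Vector[Array[Byte]]] = {
    var pos = 0
    var groups = Map.empty[Int, Vector[Array[Byte]]]
    while (pos < bytes.length) {
      val (tag, afterTag) = readVarint(bytes, pos)
      val number = (tag >>> 3).toInt
      val (start, end) = (tag & 7).toInt match {
        case 0 => // varint payload
          (afterTag, readVarint(bytes, afterTag)._2)
        case 2 => // length-delimited payload
          val (len, s) = readVarint(bytes, afterTag)
          (s, s + len.toInt)
        case other =>
          throw new IllegalArgumentException(s"wire type $other not handled in this sketch")
      }
      val previous = groups.getOrElse(number, Vector.empty[Array[Byte]])
      groups = groups.updated(number, previous :+ bytes.slice(start, end))
      pos = end
    }
    groups
  }
}
```

For an input with N distinct fields repeated M times in total, this visits each of the M occurrences once and performs one map lookup per occurrence, rather than one full scan per field.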
