gogen-avro's Issues

Support for unions with records

The current implementation (decodeUnionDefinition) assumes that union members are plain strings (primitive type names). If a union contains a type that is not primitive, it fails.

Error:
Decoded field err map[default: name:someId type:[null map[name:ID namespace:com.mystuff size:16 type:fixed]]] - Union members for someId is not of type strings

Schema file used:
unionFailure.txt
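The attachment isn't reproduced here, but judging from the error message the failing field presumably looks something like this (a reconstruction, not the actual unionFailure.txt):

```json
{
  "name": "someId",
  "type": [
    "null",
    { "type": "fixed", "name": "ID", "namespace": "com.mystuff", "size": 16 }
  ],
  "default": null
}
```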

Ideas

Hello,

I just wanted to bring up a few things that I ran across when using your lib in my project. I am able to make things work as-is, but thought it might be helpful to share with you.

First, my project is rather large. I am writing a program to extract over 1000 SQL views out of one product and dump them into another product. The destination product accepts Avro files, so this is the intermediate format I have chosen. So thank you for your work! :)

I wrote a program to generate go structs for each view being extracted. These structs are then used with the Go SQL package to extract data. I then generate avro schema files and use gogen-avro to generate go code that can be used to export the data to avro container files. So, I execute gogen-avro with 1300 avro schema files on the command line to generate the code.

Issues:

  1. The comment block that is inserted into almost every generated source file contains the full list of all Avro schema files. So I get 1300 files, each with a list of 1300 source files at the top. This quickly gets out of hand at scale.
  2. It would be really nice if there were a way to invoke your code programmatically. I could include your code in my code generation project and pass the Avro schema definitions to your code via a slice of []byte or string. This way I would not need to first write the schema files to the file system and then invoke your command-line tool. A wish-list item I suppose, but I think it would be easy to rewrite things a bit to provide a programmatic API that can be invoked in the same manner as the command-line tool.

(question) Production ready?

I'm currently evaluating the fastest method to use Avro with Go, and this seems like the ideal lib. Does anyone have experience working with this in a production setting? Our particular setting is using Avro over Kafka with relatively simple schemas.

Type name references get converted to copies of type definition in generated schema

For example, if I have one field in my schema that looks like this:

{"name": "Hash", "type": {"type": "fixed", "size": 32, "name": "sha256"}}

...and then another field, later on in the schema, that looks like this:

{"name": "AnotherHash", "type": "com.example.sha256"}

...then the schema string-literal which gogen-avro emits as the body of the Schema() method for this record type, should look pretty much the same. The first occurrence of the sha256 type should be a fully-expanded definition, and then all later occurrences should be (fully-qualified) references to the type.
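Concretely, the two emitted fields would presumably look like this (a sketch based on the description above; the exact namespace placement is assumed):

```json
[
  {"name": "Hash", "type": {"type": "fixed", "size": 32, "name": "sha256", "namespace": "com.example"}},
  {"name": "AnotherHash", "type": "com.example.sha256"}
]
```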

Instead, though, in the generated code of gogen-avro, the expanded definitional form of the sha256 type appears every time.

This isn't allowed, per the Avro spec, and causes the official Apache avro-tools to barf:

Exception in thread "main" org.apache.avro.SchemaParseException: Can't redefine: com.example.sha256
	at org.apache.avro.Schema$Names.put(Schema.java:1128)
	at org.apache.avro.Schema$Names.add(Schema.java:1123)
	at org.apache.avro.Schema.parse(Schema.java:1317)
	at org.apache.avro.Schema.parse(Schema.java:1269)
	at org.apache.avro.Schema.parse(Schema.java:1269)
	at org.apache.avro.Schema$Parser.parse(Schema.java:1032)
	at org.apache.avro.Schema$Parser.parse(Schema.java:1020)
	at org.apache.avro.Schema.parse(Schema.java:1081)
	at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:124)
	at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
	at org.apache.avro.tool.CatTool.nextInput(CatTool.java:147)
	at org.apache.avro.tool.CatTool.run(CatTool.java:88)
	at org.apache.avro.tool.Main.run(Main.java:87)
	at org.apache.avro.tool.Main.main(Main.java:76)

support for nested records

Great to see this tool in the making. Would you consider adding support for nested records?
For example, the following schema contains a nested record.
{
  "type": "record",
  "name": "OuterRecord",
  "namespace": "...",
  "doc": "Schema containing record type",
  "fields": [
    {
      "name": "nested-record",
      "type": {
        "type": "record",
        "name": "NestedRecord",
        "doc": "...",
        "fields": [
          { "name": "some-attribute", "type": "string" }
        ]
      }
    }
  ]
}

Refactoring various components

While going through some packages I noticed a lot of golint warnings. I am wondering if you are open to a PR that resolves these lint warnings/errors. I have included some of the lint errors I found below.

types/schema.go:67  receiver name n should be consistent with previous receiver name namespace for Namespace
types/schema:27  exported type Schema should have comment or be unexported
types/schema:12  don't use ALL_CAPS in Go names; use CamelCase

While working on #77, I also noticed that many strings are hard-coded inside methods. I want to propose making constants out of them to improve/enforce consistency.

Missing package management

Can you @actgardner please add something like dep to manage dependencies? Right now I'm not certain I have the same ones (I'm assuming this is the main reason most of my tests are failing).

container_test.go:85: cannot create OCFReader: cannot read OCF header with invalid avro.schema: Record "event" field 1 ought to be valid Avro named type: unknown type name: "string.uuid"

New release

The latest release was released in March, and since then there have been some useful additions.
Using gogen-avro with Glide or Dep makes us use 4.0.2 by default, while installing the compiler using go get installs the latest commit.
Unfortunately, the code doesn't compile because of differences in some signatures.
I think it would be much easier if you created a new release, possibly with prebuilt binaries. What do you think?

Handling default values

Hey Alan,

I noticed that default values don't seem to be handled at the moment (unless I'm missing something).

I generated code based on:

{
    "type": "record",
    "name": "example",
    "subject": "example",
    "fields": [
        {
            "name": "name",
            "type": "string"
        },
        {
            "name": "hobby",
            "type": "string",
            "default": "golf"
        }
    ]
}

Then ran:

func main() {
	el := example.Example{
		Name: "Or",
		// Hobby: "PRs",
	}

	var buf bytes.Buffer
	if err := el.Serialize(&buf); err != nil {
		log.Fatal(err)
	}

	eln, err := example.DeserializeExample(&buf)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("%+v\n", eln)
}

which output:

&{Name:Or Hobby:}

If I uncomment the line above, I get a value as expected.

Suppress "SOURCES" comment?

When I generate code it adds a comment to every file listing the "SOURCES". So, every time I add a new file, all existing files are affected. This creates conflicts when collaborating via git. Could we omit this comment?

Accessing schema metadata fields

Hey Alan,

Is it possible somehow to access metadata fields on a schema?

Notice the uuid_keys metadata field on this schema:

[
  { "name": "ip_address", "type": "fixed", "size": 16 },
  {
      "type": "record",
      "name": "event",
      "subject": "event",
      "fields": [
          { "name": "id", "type": "string" },
          { "name": "domain", "type": "string" },
          { "name": "start_ip_address", "type": "ip_address" },
          { "name": "end_ip_address", "type": "ip_address" }
      ],
      "uuid_keys": [ "domain" ]
  }
]

Originally, we were using that field to generate a GenerateID method on the record struct.
However, after the recent changes, it seems a bit more problematic to get that field.
I tried implementing it again here, but I can't access that field anymore.

Another approach which could work is to generate this function on the fly (without making any changes to gogen-avro), but for that to happen the Serializable's Schema method would have to return metadata fields as well, which it does not seem to do at the moment.

Support for standard library MarshalJSON() ([]byte, error) for struct fields

From #79

  • Configurable support to add MarshalJSON() ([]byte, error) during code generation. MarshalJSON is called by the standard library "encoding/json" on all fields, as per the Marshal documentation.
    Many other libraries call this function internally to convert native structs to JSON.

  • Enumerations, which are currently returned as numbers instead of their string representations, can be resolved by the above approach. Example generated code:

type ESomeType int32
const (
    ESomeTypeA ESomeType = 0
    ESomeTypeB ESomeType = 1
)

func (e ESomeType) String() string {
    switch e {
    case ESomeTypeA:
        return "A"
    case ESomeTypeB:
        return "B"
    }
    return "unknown"
}

func (e ESomeType) MarshalJSON() ([]byte, error) {
    switch e {
    case ESomeTypeA:
        return []byte("\"A\""), nil
    case ESomeTypeB:
        return []byte("\"B\""), nil
    }
    return []byte("\"unknown\""), nil // quoted so the result is valid JSON
}

There could be a way to reuse the existing String function.

Replace gopkg.in and dep with Go Modules

gopkg.in is the old way of locking versions of packages, in the Dark Ages before we had proper dependency management in Go. Even using just dep would be an improvement over gopkg.in.
Hosting the project under two domains -- gopkg.in and github.com -- adds unnecessary confusion and complexity to anybody trying to use gogen-avro. I ran into this while trying to use gogen-avro using modules. I found that I could not stop go from fetching from both URLs. I have references in my go.mod file to both URLs, and the generated code is using gopkg.in for imports.

"gopkg.in/actgardner/gogen-avro.v5/container"

This is not what I want. I want to import the github.com URLs, not the obsolete old URLs.

The best solution would be to migrate to using modules, which are the new standard for Go dependency management. If you don't like modules for some reason, please pick one method of dependency management -- gopkg.in or dep -- so we can avoid these headaches. Thanks!

Update: When I say headaches, I mean this is breaking my builds and I'm struggling to understand why.

.generated/avro/gulp_metadata.go:27:2: no Go files in /go/src/github.com/foo/bar/vendor/gopkg.in/actgardner/gogen-avro.v5/container

Add a private flag so generated structs aren't visible outside the package

Hi,

I noticed that the generated code will contain many exported things that don't necessarily need to be exported.

A few examples are:

DeserializeAvroContainerBlock
DeserializeAvroContainerHeader
AvroContainerBlock
AvroContainerHeader
Magic
Sync

Should those be left as is or possibly made private?

Struct not generated for record inside fields

In the following example, the SpfSub struct is not included in the generated code, but it is referenced:

{
    "type": "record",
    "name": "spf",
    "subject": "spf",
    "version": 1,
    "fields": [
        {
            "name": "dns_txt",
            "type": "string"
        },
        {
            "name": "spf_subrecords",
            "type": [
                "null",
                {
                    "type": "map",
                    "values": {
                        "name": "spf_sub",
                        "type": "record",
                        "fields": [
                            {"name": "dns_txt", "type": "string"},
                            {"name": "malformed_spf", "type": "boolean", "default": false}
                        ]
                    }
                }
            ],
            "default": null
        }
    ]
}

Generated spf_sub.go

package avro

import (
	"io"
)

func DeserializeSpfSub(r io.Reader) (*SpfSub, error) {
	return readSpfSub(r)
}

func (r SpfSub) Serialize(w io.Writer) error {
	return writeSpfSub(&r, w)
}

Naming Conflicts in generated code

I just ran across an issue caused by the way field names are generated, where we end up with two fields with the same name, and obviously the Go code will not compile.

Example:

{
	"type": "record",
	"name": "PROFILE_INFO",
	"fields" : [
		{"name": "PERMRESCARD", "type": ["null", "string"]},
		{"name": "PERM_RES_CARD", "type": ["null", "string"]}
	]
}

The resulting struct will not compile:

type PROFILEINFO struct {
	PERMRESCARD     UnionNullString
	PERMRESCARD     UnionNullString
}


container.Writer.Flush() writes empty blocks

This is a problem for two reasons:

  1. They take up unnecessary extra space. For thousands of Avro files this matters.
  2. Google BigQuery actually will fail with an EOF error if there are empty blocks in a file.

Handling primitive field types in a type map

Support maps for type, even when the nested type is just a primitive and some metadata. Also, pass through the metadata for those primitive types so container files are written correctly.

Expanding on this PR: #26

Generation is type ordering dependent

Schema

{
    "type": "record",
    "name": "Account",
    "fields": [
        {"name": "accountnumber", "type": ["null", "string"]},
        {"name": "name", "type": ["string", "null"]},
    ]
}

yields

type Account struct {
        Accountnumber        UnionNullString
        Name                 UnionStringNull
}

Since the union type really is a set, I expect Accountnumber to have the same Go type as Name, but they differ.

container.NewWriter should not accept empty compression codec

For now, the container.Writer factory method doesn't return an error if one forgets to assign a value to the codec parameter. I would expect the method to return an error or fall back to the Null codec. This mistake/omission can be difficult to troubleshoot, as it causes a nil pointer dereference inside the AvroRecord's Serialize method of the record being written.

Proposal:

	if codec == Deflate {
		avroWriter.compressedWriter, err = flate.NewWriter(avroWriter.blockBuffer, flate.DefaultCompression)
		if err != nil {
			return nil, err
		}
	} else if codec == Snappy {
		avroWriter.compressedWriter = newSnappyWriter(avroWriter.blockBuffer)
	} else { // codec == Null
		avroWriter.compressedWriter = avroWriter.blockBuffer
	}

Unable to install `gogen-avro` since `v5.1.0`

It seems that import paths are broken in the new tag:

go get gopkg.in/actgardner/gogen-avro.v5
go install gopkg.in/actgardner/gogen-avro.v5/gogen-avro
/go/src/gopkg.in/actgardner/gogen-avro.v5/gogen-avro/main.go:11:2: cannot find package "github.com/actgardner/gogen-avro/generator" in any of:
        /usr/local/go/src/github.com/actgardner/gogen-avro/generator (from $GOROOT)
        /go/src/github.com/actgardner/gogen-avro/generator (from $GOPATH)
/go/src/gopkg.in/actgardner/gogen-avro.v5/gogen-avro/main.go:12:2: cannot find package "github.com/actgardner/gogen-avro/types" in any of:
        /usr/local/go/src/github.com/actgardner/gogen-avro/types (from $GOROOT)
        /go/src/github.com/actgardner/gogen-avro/types (from $GOPATH)

I see the following change between 5.0.1 and 5.1.0:

https://github.com/actgardner/gogen-avro/blob/v5.0.1/gogen-avro/main.go#L11
https://github.com/actgardner/gogen-avro/blob/v5.1.0/gogen-avro/main.go#L11

Any chance a new tag would fix this?

Thanks!

Support for multiple root objects

Hi,

Generating code for the following schema works:

[
  {
      "type": "record",
      "name": "event",
      "subject": "event",
      "fields": [
          {
              "name": "id",
              "type": "string",
              "logicalType": "uuid",
              "doc": "Unique ID for this event."
          },
          {
              "name": "start_ip",
              "type": {
                  "type": "fixed",
                  "size": 16,
                  "name": "ip_address"
              },
              "doc": "Start IP of this observation's IP range."
          },
          {
              "name": "end_ip",
              "type": {
                  "type": "fixed",
                  "size": 16,
                  "name": "ip_address"
              },
              "doc": "End IP of this observation's IP range."
          }
      ]
  }
]

It turns out that the above schema is not valid Avro (because it defines the type ip_address twice).

It was noticed that it's possible to reformat the schema in the following way:

[
  {
    "type": "fixed",
    "size": 16,
    "name": "ip_address"
  },
  {
      "type": "record",
      "name": "event",
      "subject": "event",
      "fields": [
          {
              "name": "id",
              "type": "string",
              "logicalType": "uuid",
              "doc": "Unique ID for this event."
          },
          {
              "name": "start_ip",
              "type": "ip_address",
              "doc": "Start IP of this observation's IP range."
          },
          {
              "name": "end_ip",
              "type": "ip_address",
              "doc": "End IP of this observation's IP range."
          }
      ]
  }
]

but it seems this schema structure is not supported by gogen-avro.

nested array of records does not generate Go files properly

When I try to generate Go files for a schema without a nested array, like the one below, it works as expected:

{
    "name": "Parent",
    "type":"record",
    "fields":[
        {
            "name":"children",
            "type":{
                "name":"Child",
                "type":"record",
                "fields":[
                    {"name":"name", "type":"string"}
                ]
            }
        }
    ] 
}

For the schema above, gogen-avro generates the structs below as expected:

type Parent struct {
    Children *Child
}
type Child struct {
    Name string
}

But when I make the schema consist of an array of records, like

{
   "name":"Parent",
   "type":"record",
   "fields":[
      {
         "name":"children",
         "type":{
            "type":"array",
            "items":{
               "name":"Child",
               "type":"record",
               "fields":[
                  {
                     "name":"name",
                     "type":"string"
                  }
               ]
            }
         }
      }
   ]
}

gogen-avro generates a struct like

type Parent struct {
    Children []*Record
}

I would expect it to be something like the below structs instead


type Parent struct {
    Children []*Child
}
type Child struct {
    Name string
}

Please look into it and let me know if you require any further details or if I am missing something.

Allow globbing the spec files

There seems to be no support for glob patterns like *.avsc or **/*.avsc on the command line. Allowing globbing would be a great feature for this code generator.

Handling schema compatibility

Is the parsing lib able to support parsing content generated with a different schema, assuming the Avro schema compatibility rules are properly applied?

For example, adding a field with a default value should not break schema compatibility.

I made some tests with it, but I am hitting an EOF error.

For example, here is a possible acceptable change:

  1. Add a field at the end of your structure in new version, with a default value.
  2. When you read an old structure with the new schema and hit EOF, if there are default values to apply, they should be applied (without error).
  3. Reading a new structure with an old schema should be just fine as the new field at the end will be ignored.

Are my assumptions correct?

Generated files are executable

Go source files should not have execute access.

tv@dark ~/z$ ls
schema.avro
tv@dark ~/z$ cat schema.avro 
{
     "type": "record",
     "name": "A",
     "fields": [
       { "name": "name", "type": "string" },
       { "name": "birthday", "type": "long" },
       { "name": "phone", "type": "string" },
       { "name": "siblings", "type": "int" },
       { "name": "spouse", "type": "boolean" },
       { "name": "money", "type": "double" }
     ]
}
tv@dark ~/z$ gogen-avro . schema.avro 
tv@dark ~/z$ ls -l
total 4
-rwxrwxr-x 1 tv tv  773 Mar 29 15:02 a.go*
-rwxrwxr-x 1 tv tv 5224 Mar 29 15:02 primitive.go*
-rw-rw-r-- 1 tv tv  346 Mar 29 15:02 schema.avro
tv@dark ~/z$ 

This is incorrect:

err = ioutil.WriteFile(targetFile, fileContent, os.ModePerm)

That's a bitmask you're passing in: you're trying to give all possible access to the file, tamed only by the umask. Pass in 0444 or something like that instead.

question: using gogen-avro with schemas that contain a map

I'm wondering whether gogen-avro supports Avro schemas with the type "map".

{
    "namespace": "my.namespace",
    "type": "record",
    "name": "Metric",
    "fields": [
        {"name": "labels", "type": "map", "values": "string"}
    ]
}

When I run gogen-avro I get the following error:

Error generating code for schema - Unable to resolve definition of type my.namespace.map

I've also tried removing the namespace with a similar result:

Error generating code for schema - Unable to resolve definition of type map

I think that #27 added support for maps but perhaps I'm missing something. Is "map" supported?
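For reference, the Avro specification requires complex types like maps to be declared inline as an object, rather than via "type": "map" with a sibling "values" key at the field level, so the schema above would presumably need to look like:

```json
{
    "namespace": "my.namespace",
    "type": "record",
    "name": "Metric",
    "fields": [
        {"name": "labels", "type": {"type": "map", "values": "string"}}
    ]
}
```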

Support generating packages that match the Avro namespaces

Hello.

While using version 3 of the generator, if the Avro schema contains a type that is part of a namespace, code generation fails. This is because the name of the generated object is invalid (it contains dots).

Avro Schema sample:

{ "type" : "record", "name" : "test", "namespace" : "com.avro.test", "doc" : "GoGen test", "fields" : [ { "name" : "header", "type" : [ "null", { "type" : "record", "name" : "CoreHeader", "namespace" : "headerworks", "doc" : "Common information related to the event which must be included in any clean event", "fields" : [ { "name" : "uuid", "type" : [ "null", { "type" : "record", "name" : "UUID", "namespace" : "headerworks.datatype", "doc" : "A Universally Unique Identifier, in canonical form in lowercase. Example: de305d54-75b4-431b-adb2-eb6b9e546014", "fields" : [ { "name" : "uuid", "type" : "string", "default" : "" } ] } ], "doc" : "Unique identifier for the event used for de-duplication and tracing.", "default" : null }, { "name" : "hostname", "type" : [ "null","string"], "doc" : "Fully qualified name of the host that generated the event that generated the data.", "default" : null }, { "name" : "trace", "type" : ["null", { "type" : "record", "name" : "Trace", "doc" : "Trace", "fields" : [ { "name" : "traceId", "type" : [ "null", "headerworks.datatype.UUID" ], "doc" : "Trace Identifier", "default" : null } ] } ], "doc" : "Trace information not redundant with this object", "default" : null } ] } ], "doc" : "Core data information required for any event", "default" : null } ] }

Error:

gogen-avro.v3 --container --package test vendor/test schemas/core.avsc
Error writing source files to directory "vendor/test" - Error formatting file trace.go - 14:38: expected ';', found '.' (and 2 more errors)

Contents: /*

 * CODE GENERATED AUTOMATICALLY WITH github.com/alanctgardner/gogen-avro
 * THIS FILE SHOULD NOT BE EDITED BY HAND
 * SOURCE:
 *     schemas/core.avsc

*/
package test
import (
	"io"
)

type Trace struct {
	TraceId UnionNullheaderworks.datatype.UUID
}

func DeserializeTrace(r io.Reader) (*Trace, error) {
	return readTrace(r)
}

func (r Trace) Serialize(w io.Writer) error {
	return writeTrace(&r, w)
}

Is this unsupported functionality?

Support AVSC "doc" attribute as a struct/field comment.

This is eye candy, but should not be too hard to implement. I personally would welcome support for the Avro schema "doc" attribute, resulting in a comment directly before the record-related struct definition and field, if provided.

Following AVSC...

{
  "type": "record",
  "name": "MyRecord",
  "doc": "MyRecord is a record that represents something that is mine.",
  "fields": [
    {
      "name": "MyField",
      "type": "string",
      "doc": "MyField contains my special value."
    }
  ]
}

...could produce following struct definition...

// MyRecord is a record that represents something that is mine.
type MyRecord struct {
	MyField string // MyField contains my special value.
}

Add support for JSON tags on the generated structs automatically

Provide support for generating struct tags for JSON parsing according to the Avro schema specification.

Use case: marshaling Avro records to JSON responses from a REST API. Useful when publishing Avro records to Confluent REST Proxy, which takes an Avro schema + JSON records.

Example of a JSON-tagged struct:

type User struct {
  Id        int       `json:"id"`
  Name      string    `json:"name"`
  Bio       string    `json:"about,omitempty"`
  Active    bool      `json:"active"`
  Admin     bool      `json:"-"`
  CreatedAt time.Time `json:"created_at"`
}

By default the json tag name should be the original name defined in the schema.

Optional:
Ability to ignore a field: json:"-"
Ability to omit empty fields: json:",omitempty" (not sure about use within tools like Confluent REST Proxy, if it expects explicit null definitions or not)

Record Schema method becomes camel case

Hi,

Currently, when generating the struct for a record, field names get converted to camel case, e.g. ip_address -> IPAddress. This makes struct field names more idiomatic Go.

That said, I also noticed that the generated Schema method of a record will contain the CamelCase field names, instead of the original names. This means that the schema loses the original casing of the names.

Read OCF files?

Hi, I'm new to Go, so feel free to close if this is very obvious.

What's the canonical way to read multiple records from an OCF container file using this repo? A naive read produces an error (presumably because you're trying to read records where the OCF headers are).

Make `primitive.go` independent of *.avsc files

I just had a package with a single file containing

package avro

//go:generate $GOPATH/bin/gogen-avro --containers . users.avsc
//go:generate $GOPATH/bin/gogen-avro --containers . items.avsc

When running go generate, primitive.go was overwritten by //go:generate $GOPATH/bin/gogen-avro --containers . items.avsc, which led to writeUser(...) and readUser(...) not being available. After some debugging I realized 1) that

//go:generate $GOPATH/bin/gogen-avro --containers . *.avsc

doesn't work and 2) I had to do

//go:generate $GOPATH/bin/gogen-avro --containers . users.avsc items.avsc


That said, I would have avoided having to debug this if a new go:generate did not overwrite a previous one. My proposal is that all user-specific functions be put in the generated user.go file instead.

Thoughts?

Option to disable UnionNull... struct code generation

Hello, I would like to propose a tweak (a code generation option) that would reduce the number of generated structures by using pointers in the following situations:

  1. when null and a primitive type are used, for example ["null", "int"]
  2. when null and a record are used, for example ["null", { "type" : "record" ...

The above tweaks are explained using the following examples:

  1. The first tweak can be explained with the example in the existing README.md.

Instead of:

type EnclosingRecord struct {
	Field UnionNullInt
}

type UnionNullInt struct {
	// All the possible types the union could take on
	Null               interface{}
	Int                int32
	// Which field actually has data in it - defaults to the first type in the list, "null"
	UnionType          UnionNullTypeEnum
}

I am proposing: let's generate just a pointer to the primitive type - in this particular example it would be:

type EnclosingRecord struct {
	Field *int32
}
  2. The second tweak:

Instead of:

type EnclosingRecord struct {
	Field UnionNullRecordXY
}

type UnionNullRecordXY struct {
	// All the possible types the union could take on
	Null               interface{}
	RecordXY           *RecordXY
	// Which field actually has data in it - defaults to the first type in the list, "null"
	UnionType          UnionNullTypeEnum
}

I am proposing: let's generate just a pointer to the record type - in this particular example it would be:

type EnclosingRecord struct {
	Field *RecordXY
}

What do you think? If it makes sense to you, I could provide a pull request.

Thanks

Jozef

Allow adding tags to struct fields

As of #67, it's possible to add comments to the generated structs. AFAICT, however, it's impossible to add tags. This makes it tricky to use libraries that depend on tags, such as validator. Please consider adding a way to add tags to struct fields, such as:

{
  "type": "record",
  "name": "MyRecord",
  "fields": [
    {
      "name": "MyField",
      "type": "string",
      "go_tags": "validate:\"required\""
    }
  ]
}

... generates ...

type MyRecord struct {
	MyField string `validate:"required"`
}

Common interfaces

Hey Alan,

I would like to discuss something with you.

It's come up that the generated code contains some things that could be better represented as interfaces. Specifically, the following two:

type Serializable interface {
	Serialize(w io.Writer) error
}
type Writer interface {
	WriteRecord(v Serializable) error
	Flush() error
}

Internally, for us, it made a lot of sense to define those interfaces in another package (avroutil) and make some changes to the generated code that allow us to treat some of the functions more generically. Examples:

Changing the function signature of Deserialize to:

func(r io.Reader) (avroutil.Serializable, error)

Changing the function signature of NewContainerWriter to:

func(writer io.Writer, codec string, recordsPerBlock int64) (avroutil.Writer, error)

Allows us to treat the generated Deserialize and NewContainerWriter functions as having the same type (which lets you pass them around, etc).

I believe this might be beneficial to gogen-avro in general but it would require defining those interfaces somewhere and the generated code trying to import them, which I'm not sure if you would be interested in doing or not.

Very long type names

Hi everyone,

First of all thank you for your efforts and for writing this really nice library. I came across it the other day and decided to give it a try so that we could hopefully start using it in production at Foodchain.

So I tried to generate some code from a schema we wrote a while ago, and I ended up with 2 files that have very long names. More than the file names, what concerns me is the names of the types inside, which are of course more or less the same length as the file names.

Here's an example:

// Code generated by github.com/actgardner/gogen-avro. DO NOT EDIT.
/*
 * SOURCE:
 *     products-events.avsc
 */

package product_event

type UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdated struct {
	SupplierCreated                *SupplierCreated
	ProductCreated                 *ProductCreated
	ProductAddedToGroupCatalog     *ProductAddedToGroupCatalog
	ProductRemovedFromGroupCatalog *ProductRemovedFromGroupCatalog
	SupplierDetailsUpdated         *SupplierDetailsUpdated
	ProductDetailsUpdated          *ProductDetailsUpdated
	UnionType                      UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnum
}

type UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnum int

const (
	UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnumSupplierCreated                UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnum = 0
	UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnumProductCreated                 UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnum = 1
	UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnumProductAddedToGroupCatalog     UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnum = 2
	UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnumProductRemovedFromGroupCatalog UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnum = 3
	UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnumSupplierDetailsUpdated         UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnum = 4
	UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnumProductDetailsUpdated          UnionSupplierCreatedProductCreatedProductAddedToGroupCatalogProductRemovedFromGroupCatalogSupplierDetailsUpdatedProductDetailsUpdatedTypeEnum = 5
)

Attached the schema I used to generate the code above and below the generate.go file:

package avro

// Use a go:generate directive to build the Go structs for `example.avsc`
// These files are used for all of the example projects
// Source files will be in a package called `example/avro`

//go:generate mkdir -p ./product_event
//go:generate $GOPATH/bin/gogen-avro --package=product_event ./product_event ./../../schemas/products-events.avsc

Would there be a way to avoid generating such long names?

Thanks!

products-events.zip

Error writing file xxx.go open xxx.go: permission denied

Running `gogen-avro avro/ schemas/*.avsc` fails with "permission denied". All files are created read-only, as expected:

-r--r----- 1 simshi simshi 1521 Jul  6 11:49 a_v3.go
-r--r----- 1 simshi simshi 1201 Jul  6 11:49 b_v2.go
-r--r----- 1 simshi simshi 1038 Jul  6 11:49 c_v1.go
-r--r----- 1 simshi simshi 1043 Jul  6 11:49 d_v1.go
-r--r----- 1 simshi simshi 8936 Jul  6 11:49 primitive.go

So maybe it's trying to write the same file twice?

Evolution branch

Is there a plan to merge the evolution branch?
It would be good to support schema evolution by accepting both a writer and a reader schema during deserialization.

Conflicting definitions when they are identical

My use case is that I have IDL protocols generating multiple overlapping avsc files (the default behavior of idl2schemata in avro-tools-1.8.2.jar). When registering definitions, it would be nice if the contents were checked for equality to decide whether it's OK to proceed. Right now, any definition with a duplicate FQN or alias aborts the processing.

Relevant parts:
https://github.com/alanctgardner/gogen-avro/blob/master/types/schema.go#L71
https://github.com/alanctgardner/gogen-avro/blob/master/types/schema.go#L77

Supporting confluent compliant schema registry

We use the Confluent Schema Registry to maintain our schemas and versions. It would be great to have an integration with it.

This is what I propose the solution should look like:

  • Generator to accept 2 new arguments:
    • Schema Registry URL
    • Schema subject name in the registry
    • Schema version
  • Generated code should talk to the schema registry instead of using the schema text in the code to get the schema
  • While deserializing, the generated code strips out the Confluent-specific framing bytes (the first 5) and then deserializes the object.
  • Serializer prepends 5 extra bytes to make it compatible with the schema registry

I am happy to take it up and open a PR since we will be using it in-house very actively.
