
go-avro's People

Contributors

arunmk, aviflax, edgefox, erimatnor, joestein, serejja, spenczar, spirentnilesh, yanzay


go-avro's Issues

Basic Documentation / Readme

It would be great to have a basic README.md to explain the status of this project. A simple usage example plus a note about what is and is not supported would go a long way. Alternatively, if this library is just for internal use, even a note saying that would be useful.

End of file reached

What could be the problem if I get this error:

End of file reached

At:

decoder := avro.NewBinaryDecoder(message)
decodedRecord := avro.NewGenericRecord(avroSchema)
err := avroReader.Read(decodedRecord, decoder)

The same messages are decoded correctly by another Avro decoder app (written in Java).

The schema contains only "string"-typed fields, but a lot of them:

{
    "type": "record",
    "name": "schema",
    "fields": [
        {
            "name": "prop1",
            "type": "string"
        },
        {
            "name": "prop2",
            "type": "string"
        },
        ...
    ]
}

Panic: reflect: call of reflect.Value.Elem on struct Value

I’m confused… I’ve mostly just copied-and-pasted your example code from specific_datum.go and adjusted it to use my own values, but it’s not working. I’m sure I’m just missing something… help?

The panic:

reflect: call of reflect.Value.Elem on struct Value
    /usr/local/Cellar/go/1.4.2/libexec/src/runtime/panic.go:387

    Full Stack Trace
    /usr/local/Cellar/go/1.4.2/libexec/src/runtime/panic.go:387 (0x15328)
        gopanic: reflectcall(unsafe.Pointer(d.fn), deferArgs(d), uint32(d.siz), uint32(d.siz))
    /usr/local/Cellar/go/1.4.2/libexec/src/reflect/value.go:703 (0x113db5)
        Value.Elem: panic(&ValueError{"reflect.Value.Elem", v.kind()})
    /Users/thavi/dev/go/src/github.com/stealthly/go-avro/datum_writer.go:104 (0xfd32c)
        (*SpecificDatumWriter).findField: elem := where.Elem() //TODO maybe check first?
    /Users/thavi/dev/go/src/github.com/stealthly/go-avro/datum_writer.go:93 (0xfd197)
        (*SpecificDatumWriter).writeRecord: field, err := this.findField(v, schemaField.Name)
    /Users/thavi/dev/go/src/github.com/stealthly/go-avro/datum_writer.go:55 (0xfcad8)
        (*SpecificDatumWriter).write: return this.writeRecord(v, enc, s)
    /Users/thavi/dev/go/src/github.com/stealthly/go-avro/datum_writer.go:34 (0xfc9ea)
        (*SpecificDatumWriter).Write: return this.write(rv, enc, this.schema)
    /Users/thavi/dev/go/src/github.com/timehop/streams/tests/integration/lastopens_integration_test.go:209 (0x6386a)
        serialize: writer.Write(appopen, encoder)

My schema and target struct:

package events

// This is an Avro deserialization “target” or “template”.
type AppOpen struct {
    Timestamp int64 // Unix timestamp (the number of seconds elapsed since January 1, 1970 UTC)
    UserID    int64
    Platform  string
}

type schema string

const AppOpenSchema schema = `{
     "type": "record",
     "name": "AppOpen",
     "fields": [
       { "name": "Timestamp", "type": "long" },
       { "name": "UserID", "type": "long" },
       { "name": "Platform", "type": "string" }
     ]
}`

My serialize func:

func serialize(appopens []events.AppOpen, schema avro.Schema) [][]byte {
    writer := avro.NewSpecificDatumWriter()
    writer.SetSchema(schema)
    results := make([][]byte, len(appopens))
    for i, appopen := range appopens {
        buffer := new(bytes.Buffer)
        encoder := avro.NewBinaryEncoder(buffer)
        fmt.Println("About to serialize:", appopen, "to", writer, "using", buffer, "and", encoder)
        writer.Write(appopen, encoder)
        results[i] = buffer.Bytes()
    }
    return results
}

The schema is parsed outside of this func, but that seems to be working just fine.

What am I missing?

Invalid default value for duration field of type long

Hi,
why is 1 an invalid default value for a long type? It's the same for int. Thanks!

{
    "name": "Packet",
    "type": "record",
    "fields": [{
                "name": "duration",
                "type": "long",
                "default": 1
    }]
}
E:\Repositorys\hemera-golang>go run codegen.go --schema schemas/packet.avsc --out foo.go
Invalid default value for duration field of type long
exit status 1

AVSC file reusing schema

I have a schema file containing an array of record schemas like so:

[
  {
    "name": "ReusedRecord",
    "type": "record",
    "fields": [
      {
        "name": "aString",
        "type": "string"
      }
    ]
  },
  {
     "name": "MainRecord",
     "type": "record",
     "fields": [
        {
          "name": "reusedRecord1",
          "type": "ReusedRecord"
        },
        {
          "name": "reusedRecord2",
          "type": "ReusedRecord"
        }
     ]
  }
]

The code generation works as expected in java through avro-tools but using go-avro it fails with this error: https://github.com/elodina/go-avro/blob/master/codegen.go#L93-L96 since the root schema is a UnionSchema and not a RecordSchema.

Does someone know how to circumvent this issue? Or should I repeat the schema x times?
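One possible workaround (a sketch, assuming the codegen accepts multiple schema strings, as NewCodeGenerator's signature suggests): split the top-level JSON array into one document per record before handing them to the generator. This uses only the stdlib:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// splitSchemas takes an .avsc file whose root is a JSON array of record
// schemas and returns each element as its own JSON document, so they
// can be fed one by one to a codegen that only accepts record roots.
func splitSchemas(avsc []byte) ([]string, error) {
	var records []json.RawMessage
	if err := json.Unmarshal(avsc, &records); err != nil {
		return nil, err
	}
	out := make([]string, len(records))
	for i, r := range records {
		out[i] = string(r)
	}
	return out, nil
}

func main() {
	avsc := []byte(`[
	  {"name": "ReusedRecord", "type": "record", "fields": []},
	  {"name": "MainRecord", "type": "record", "fields": []}
	]`)
	schemas, err := splitSchemas(avsc)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(schemas)) // 2
}
```

Note the records must still be parsed in dependency order so that ReusedRecord is registered before MainRecord references it.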

Panic when name is omitted

I tried to use the codegen to generate code for the following schema, obtained from:

http://wiki.pentaho.com/display/EAI/Avro+Input

{
    "type": "map",
    "values": {
        "type": "record",
        "name": "ATM",
        "fields": [
            {"name": "serial_no", "type": "string"},
            {"name": "location", "type": "string"}
        ]
    }
}

However, a panic occurs:

panic: interface conversion: interface is nil, not string

goroutine 1 [running]:
github.com/stealthly/go-avro.parseSchemaField(0x234780, 0x8205cde90, 0x8205a5128, 0x8205cde60, 0x22, 0x1, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:1028 +0x142
github.com/stealthly/go-avro.parseRecordSchema(0x8205cde30, 0x8205a5128, 0x8205cde60, 0x22, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:1013 +0x969
github.com/stealthly/go-avro.schemaByType(0x234780, 0x8205cde30, 0x8205a5128, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:962 +0x1361
github.com/stealthly/go-avro.ParseSchemaWithRegistry(0x3815c0, 0x246, 0x8205a5128, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:882 +0x182
github.com/stealthly/go-avro.ParseSchema(0x3815c0, 0x246, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/schema.go:870 +0xd2
github.com/stealthly/go-avro.(*CodeGenerator).Generate(0x8205a5560, 0x0, 0x0, 0x0, 0x0)
/mypath/golang/src/github.com/stealthly/go-avro/codegen.go:88 +0xe7
main.main.func1(0x82057c3c0)
/mypath/golang/src/.../schema.go:88 +0xaef
github.com/codegangsta/cli.Command.Run(0x2feb18, 0x7, 0x0, 0x0, 0x0, 0x0, 0x0, 0x338360, 0x1d, 0x0, ...)
/mypath/golang/src/github.com/codegangsta/cli/command.go:127 +0x1052
github.com/codegangsta/cli.(*App).Run(0x820594200, 0x820552100, 0x4, 0x4, 0x0, 0x0)
/mypath/golang/src/github.com/codegangsta/cli/app.go:159 +0xc2f
main.main()

If a name property is added to the map object, then at least the panic is resolved...

Proposed API updates/breakages

Since you're talking about having master be an API break in #47, I would like to collect a few changes to make this library a bit more idiomatic Go:

  • Hide the underlying implementations of BinaryEncoder and BinaryDecoder. Either turn them to unexported types, or move them to a subpackage. This will drastically reduce the visual clutter in the docs
    • NewBinaryDecoder should return a Decoder interface, not the concrete type
    • Same deal for NewBinaryEncoder: return an Encoder interface
  • Remove Tell() from the Encoder interface. From what I can tell, it's not used at all and constrains writing your own Encoder should you desire.
    • Have NewBinaryEncoder subsequently take an io.Writer at construction. This is actually not a breaking change for user code because *bytes.Buffer satisfies io.Writer, but it allows people to pass in other writers, like, for example, a network socket to encode avro directly to a network connection, or a file, or the like.
  • Consider moving all the concrete implementations of schema types to subpackages (that is, FixedSchema, EnumSchema, IntSchema and so on). This reduces visual clutter and tab-completion confusion. It should also be noted that even though there are pieces which switch on type codes using schema.Type(), the datum writers do type asserts to get the concrete types of many of the schema types, so it's not like someone can simply implement the Schema interface with their own type (or even embed the type) and use it as a replacement as it stands.
    • To deal with those special cases, you could have superinterfaces of Schema such as a RecordSchema interface with getters to get the list of fields, and so on.

Nearly all of these changes, while technically breaking, shouldn't break the vast majority of user code, because most users aren't manipulating schema types or embedding the BinaryEncoder type, they just want to encode and decode from / to avro.

I'm happy to submit PRs for any and all of these changes if you approve of them.

AVDL-based codegen

We just recently open sourced our fork of the Avro Java code generator, which generates Go Avro objects from AVDL files. It currently generates bindings and associated round-trip unit tests that use the Avro C libraries for serialization/deserialization, as at the time it was written last summer we were not aware of any Go Avro implementations.

Unfortunately, we no longer use Go in our environment, so we aren't planning on updating the code generator. However, with a little work, I suspect either this project or the other Go Avro project could adapt it to use Go bindings instead of the C library.

Has go<>java binary avro interop been verified?

I have a basic avro schema that includes some nested records, enums, and some arrays & maps.

We are observing no issues when roundtripping this data purely in go. By this, I mean serializing a record to binary data using go and then deserializing the same data back into memory.

However, a simple java test program using the java 1.7.7 avro libraries cannot deserialize binary avro data written in go. The inverse is not working either, i.e. we are not able to deserialize (in go) data generated using the same schema (by java).

As far as I can grok from looking at the binary data generated by each runtime, Java appears to generate more densely packed 'array' and 'map' count values, which precede the actual data.

I have a suspicion that java is generating one byte 'count' values and this library is generating two byte values.

Has anybody tried this kind of java <> go interop for binary serialized data (specifically, maps/arrays)?

I can possibly supply some sample java & go code.

NewDataFileReader returns all nil values on fields and ends with "Block read is unfinished"

Here's my code (pretty much copy pasted from the example)...

reader, err := avro.NewDataFileReader(fileName, avro.NewSpecificDatumReader())
if err != nil {
    fmt.Println(err)
    return
}
for {
    obj := &PADirectJustListedItem{}
    ok, err := reader.Next(obj)
    if !ok {
        if err != nil {
            fmt.Println(err)
            return
        }
        break
    } else {
        fmt.Printf("%#v\n", obj)
    }
}

Output:
go run main.go
&main.PADirectJustListedItem{snapshotdate:(*int64)(nil), propertyid:(*int32)(nil), accountid:(*int32)(nil), bedrooms:(*int)(nil), bathrooms:(*string)(nil), finishedsqft:(*int)(nil), lotsizesqft:(*int)(nil), city:(*string)(nil), state:(*string)(nil), postalcode:(*string)(nil), propetyaddress:(*string)(nil), image1id:(*int64)(nil), image2id:(*int64)(nil), image3id:(*int64)(nil), manualimageid:(*int64)(nil), sellingpricedollarcnt:(*int32)(nil), realestatebrokerid:(*int32)(nil), daysonzillow:(*int32)(nil), multiplelistingservicecode:(*string)(nil), postingid:(*int32)(nil), postingdateinitial:(*int64)(nil), auditdatecreated:(*int64)(nil)}
&main.PADirectJustListedItem{snapshotdate:(*int64)(nil), propertyid:(*int32)(nil), accountid:(*int32)(nil), bedrooms:(*int)(nil), bathrooms:(*string)(nil), finishedsqft:(*int)(nil), lotsizesqft:(*int)(nil), city:(*string)(nil), state:(*string)(nil), postalcode:(*string)(nil), propetyaddress:(*string)(nil), image1id:(*int64)(nil), image2id:(*int64)(nil), image3id:(*int64)(nil), manualimageid:(*int64)(nil), sellingpricedollarcnt:(*int32)(nil), realestatebrokerid:(*int32)(nil), daysonzillow:(*int32)(nil), multiplelistingservicecode:(*string)(nil), postingid:(*int32)(nil), postingdateinitial:(*int64)(nil), auditdatecreated:(*int64)(nil)}
Block read is unfinished

I have a couple of files to test with. For all of them, I'm able to use avro-tools to convert them to JSON, and it works fine:

java -jar avro-tools-1.8.2.jar tojson part-m-00000.avro > 00001.json

Issues in enum caching code.

I was going to take issue with the enum caching code from 2f447d1, partially because it disambiguates on the schema name, and I don't feel that's a good idea: enums can have very common names like 'Type', which mean contextually different things depending on the enclosing schema. I have a potential solution for that coming down the line which still provides the speedup advantage.

However, more importantly, while reading it over I found a data race. The source of the issue is that the map reads are not locked via mutexes. This is not allowed if the map can be written to (even if the writes are locked via mutex), and the end result is that it can cause run-time crashes in any application running multiple goroutines.
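One standard fix, sketched generically here (this is not the actual go-avro cache, just the double-checked RWMutex pattern it could adopt):

```go
package main

import (
	"fmt"
	"sync"
)

// enumCache guards its map with a sync.RWMutex so concurrent readers
// are safe even while writers insert new entries.
type enumCache struct {
	mu    sync.RWMutex
	cache map[string]int32
}

func (c *enumCache) lookup(symbol string, resolve func(string) int32) int32 {
	c.mu.RLock()
	v, ok := c.cache[symbol]
	c.mu.RUnlock()
	if ok {
		return v
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.cache[symbol]; ok { // re-check: another goroutine may have won the race
		return v
	}
	v = resolve(symbol)
	c.cache[symbol] = v
	return v
}

func main() {
	c := &enumCache{cache: make(map[string]int32)}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.lookup("A", func(string) int32 { return 7 })
		}()
	}
	wg.Wait()
	fmt.Println(c.lookup("A", nil)) // 7
}
```

This passes `go test -race` because every map access, read or write, holds the mutex.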

With these tests: crast@enum-data-race-proof

Running go test -race finds this race:

$ go test -race -run Race -v
=== RUN   TestEnumCachingRace
==================
WARNING: DATA RACE
Read by goroutine 8:
  runtime.mapaccess1_faststr()
      /usr/local/Cellar/go/1.5.3/libexec/src/runtime/hashmap_fast.go:179 +0x0
  github.com/elodina/go-avro.(*GenericDatumReader).mapEnum()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:477 +0x1a9
  github.com/elodina/go-avro.(*GenericDatumReader).readValue()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:422 +0x14a
  github.com/elodina/go-avro.(*GenericDatumReader).findAndSet()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:382 +0xb5
  github.com/elodina/go-avro.(*GenericDatumReader).mapRecord()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:555 +0x266
  github.com/elodina/go-avro.(*GenericDatumReader).readValue()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:430 +0xd4
  github.com/elodina/go-avro.(*GenericDatumReader).Read()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:364 +0x1c6
  github.com/elodina/go-avro.enumRaceTest.func1()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:363 +0x3b7
  github.com/elodina/go-avro.parallelF.func1()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:375 +0x74

Previous write by goroutine 7:
  runtime.mapassign1()
      /usr/local/Cellar/go/1.5.3/libexec/src/runtime/hashmap.go:411 +0x0
  github.com/elodina/go-avro.(*GenericDatumReader).mapEnum()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:484 +0x438
  github.com/elodina/go-avro.(*GenericDatumReader).readValue()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:422 +0x14a
  github.com/elodina/go-avro.(*GenericDatumReader).findAndSet()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:382 +0xb5
  github.com/elodina/go-avro.(*GenericDatumReader).mapRecord()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:555 +0x266
  github.com/elodina/go-avro.(*GenericDatumReader).readValue()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:430 +0xd4
  github.com/elodina/go-avro.(*GenericDatumReader).Read()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader.go:364 +0x1c6
  github.com/elodina/go-avro.enumRaceTest.func1()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:363 +0x3b7
  github.com/elodina/go-avro.parallelF.func1()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:375 +0x74

Goroutine 8 (running) created at:
  github.com/elodina/go-avro.parallelF()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:377 +0xb4
  github.com/elodina/go-avro.enumRaceTest()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:364 +0x18d
  github.com/elodina/go-avro.TestEnumCachingRace()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:346 +0xc2
  testing.tRunner()
      /usr/local/Cellar/go/1.5.3/libexec/src/testing/testing.go:456 +0xdc

Goroutine 7 (running) created at:
  github.com/elodina/go-avro.parallelF()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:377 +0xb4
  github.com/elodina/go-avro.enumRaceTest()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:364 +0x18d
  github.com/elodina/go-avro.TestEnumCachingRace()
      $GOPATH/src/github.com/elodina/go-avro/datum_reader_test.go:346 +0xc2
  testing.tRunner()
      /usr/local/Cellar/go/1.5.3/libexec/src/testing/testing.go:456 +0xdc
==================
--- PASS: TestEnumCachingRace (0.01s)
=== RUN   TestEnumCachingRace2
--- PASS: TestEnumCachingRace2 (0.00s)
PASS
Found 1 data race(s)
exit status 66
FAIL    github.com/elodina/go-avro  1.090s

GenericDatumWriter encodes Fixed type as Bytes type

I'm not sure if this is intentional or not, but the GenericDatumWriter currently writes the Fixed type as a Bytes type. That is to say, it prepends the bytes with the length of the byte array. This does not follow Avro's serialization for Fixed types (which does not include the length of the array). I noticed that in SpecificDatumWriter the Fixed type is encoded as a Fixed type, hence my uncertainty about whether this was intentional.

func (writer *GenericDatumWriter) writeFixed(v interface{}, enc Encoder, s Schema) error {
    return writer.writeBytes(v, enc)
}

Serialize array of strings?

Hi Team,

When I try to serialize Array of strings, I am getting an index out of range panic. Can you please post an example of writing an array of strings?

GenericDatum: Process default values correctly

Currently:

  1. Default values are not processed in GenericDatum; they are not set at all.
  2. If a number literal is used as a default value, it is converted internally to float64 by Go's JSON unmarshalling.

The above need to be fixed.

Can the code gen handle maps (et al)?

I tried this, which is based on an example I grokked on Pentaho's website:

...

schemas := []string{
    `{
        "type": "record",
        "namespace": "com.philips.lighting.dna.ingestion",
        "name": "LongList",
        "fields": [
            {
                "type": "map",
                "name": "inner_name",
                "values": {
                    "type": "record",
                    "name": "ATM",
                    "fields": [
                        {
                            "name": "serial_no",
                            "type": "string"
                        },
                        {
                            "name": "location",
                            "type": "string"
                        }
                    ]
                }
            }
        ]
    }`,
}
gen := avro.NewCodeGenerator(schemas)
code, err := gen.Generate()

And the following error is generated:

2015/09/21 11:58:05 Unknown type name: map
make: *** [codegen] Error 1

It looks like the code that parses the "type" sees the string value "map" on the field and can then only interpret basic types, excluding maps, arrays, enums, etc.

Perhaps I misunderstood, or this example Avro schema is invalid? (Source for the schema: http://wiki.pentaho.com/display/EAI/Avro+Input)
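The schema is indeed invalid as written: per the Avro specification, a record field whose type is complex must carry that type as a full schema object nested under the field's "type" attribute, not inline the map's attributes at the field level. A corrected version of the schema above (same names kept; whether go-avro's codegen then supports map-valued fields is a separate question):

```json
{
    "type": "record",
    "namespace": "com.philips.lighting.dna.ingestion",
    "name": "LongList",
    "fields": [
        {
            "name": "inner_name",
            "type": {
                "type": "map",
                "values": {
                    "type": "record",
                    "name": "ATM",
                    "fields": [
                        {"name": "serial_no", "type": "string"},
                        {"name": "location", "type": "string"}
                    ]
                }
            }
        }
    ]
}
```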

Add Godoc

We need some Godoc once the API is more or less stable.

Codegen - Generate schema dependent record

Hi there,

I'm trying to generate code with Codegen using two schemas.

In device.avsc, I have a record of type Device.

In status.avsc, I have a record of type Status that have a field of type Device.

I tried generating device.avsc first and then status.avsc, and using both schemas to generate one Go file, but Codegen says the type Device is undefined.

Is there a way to make that happen? If not, do you have ideas for modifying Codegen to make it possible? I'm willing to help.

Thanks and regards,
Albin.

usage multiple schema test case / code example

The project is looking very interesting, but it's missing an example of how to load multiple Avro schemas and then use them.

I can't find a test case showing how to load several Avro schemas where one depends on another, and then serialize/deserialize a message.
