
Comments (4)

samv avatar samv commented on June 10, 2024

FWIW, the Avro spec does not specify a format for this. Typical approaches are to write segments of 10 or so rows with the embedded schema at the front (github.com/linkedin/goavro), to write a "schema ID" at the start of each emitted row (bottledwater; presumably the schema ID is the 128-bit MD5 specified in the RPC handshake section of the spec), or to just write out JSON rows (e.g., the Confluent Kafka REST proxy and at least one database-to-Kafka CDC tool I looked at).

This is because, unlike Thrift, Protocol Buffers, etc., Avro's binary format is not forward compatible on its own: the encoded bytes carry no field tags, so they cannot be interpreted without the schema. This "saves space" on larger files and "forces everyone to implement the schema protocol" or something like that.

from go-avro.

crast avatar crast commented on June 10, 2024

The Avro object container file format includes the schema, so a future reader can still parse the file even if the schema has changed since it was written.

Strictly speaking, though, the schema does not need to travel with the data as long as the reading end knows the schema of what it is getting. It could simply be hard-coded or agreed upon by the communicating ends, or communicated some other way than an object container file, for example by sending the MD5 hash of the schema before the record (which is what the Avro RPC protocol does). How you implement that, though, is not a formal part of the Avro spec.
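To make the "hash before the record" idea concrete, here is a rough sketch in Go. It is only an illustration under assumptions: the function names `fingerprint`, `frame` and `unframe` are made up for this example, the two schemas are hypothetical, and the hash is taken over the raw schema text rather than any normalized form, so both ends would have to hold byte-identical schema JSON.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"errors"
	"fmt"
)

// fingerprint returns the MD5 digest of the schema text. Assumption: both ends
// hash byte-identical schema JSON; a real system would normalize it first.
func fingerprint(schemaJSON string) [md5.Size]byte {
	return md5.Sum([]byte(schemaJSON))
}

// frame prepends the 16-byte schema fingerprint to an already-encoded record.
func frame(schemaJSON string, record []byte) []byte {
	fp := fingerprint(schemaJSON)
	out := make([]byte, 0, len(fp)+len(record))
	out = append(out, fp[:]...)
	return append(out, record...)
}

// unframe checks the leading fingerprint against the schema the reader expects
// and returns the record bytes that follow it.
func unframe(expectedSchemaJSON string, msg []byte) ([]byte, error) {
	if len(msg) < md5.Size {
		return nil, errors.New("message shorter than fingerprint")
	}
	fp := fingerprint(expectedSchemaJSON)
	if !bytes.Equal(msg[:md5.Size], fp[:]) {
		return nil, errors.New("schema fingerprint mismatch")
	}
	return msg[md5.Size:], nil
}

func main() {
	// Two hypothetical schema versions; v2 adds a field with a default.
	const v1 = `{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}`
	const v2 = `{"type":"record","name":"User","fields":[{"name":"id","type":"long"},{"name":"age","type":"long","default":0}]}`

	msg := frame(v1, []byte{0x0e}) // 0x0e is the Avro binary encoding of the long 7

	if _, err := unframe(v2, msg); err != nil {
		fmt.Println("reader expecting v2:", err)
	}
	rec, err := unframe(v1, msg)
	fmt.Printf("reader expecting v1: % x (err=%v)\n", rec, err)
}
```

A reader that sees an unknown fingerprint at least fails cleanly here instead of misreading the bytes; looking up the matching schema by fingerprint would be the next step, and that part is left out.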

Important note: Avro binary serialization is not inherently forwards or backwards compatible unless the reader knows the exact schema the record was encoded with. Any change, including adding new fields, adding defaults, adding new options to a type union, or even adding entries to an enum, produces a new and different schema, and a reader that does not know the data was written with a different schema is likely to fail.
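A small sketch of why this fails, assuming two hypothetical record schemas: a writer using {id: long, name: string} and a reader that believes the layout is {id: long, age: long, name: string}. The encoder and decoder below are hand-rolled for illustration and are not go-avro's API. They rely on Go's `encoding/binary` signed varints, which use the same zig-zag variable-length coding Avro uses for longs (`binary.AppendVarint` needs Go 1.19 or later).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// readLong decodes one Avro long (zig-zag varint) and returns the remaining bytes.
func readLong(buf []byte) (int64, []byte, error) {
	v, n := binary.Varint(buf)
	if n <= 0 {
		return 0, nil, errors.New("truncated or invalid long")
	}
	return v, buf[n:], nil
}

// readString decodes one Avro string: a long byte count followed by UTF-8 data.
func readString(buf []byte) (string, []byte, error) {
	n, rest, err := readLong(buf)
	if err != nil {
		return "", nil, err
	}
	if n < 0 || n > int64(len(rest)) {
		return "", nil, fmt.Errorf("string length %d exceeds %d remaining bytes", n, len(rest))
	}
	return string(rest[:n]), rest[n:], nil
}

// encodeV1 writes a record under the hypothetical writer schema {id: long, name: string}.
// Fields are simply concatenated in schema order, with no tags or names.
func encodeV1(id int64, name string) []byte {
	buf := binary.AppendVarint(nil, id)
	buf = binary.AppendVarint(buf, int64(len(name)))
	return append(buf, name...)
}

// decodeV1 reads the bytes with the schema they were written with.
func decodeV1(buf []byte) (id int64, name string, err error) {
	if id, buf, err = readLong(buf); err != nil {
		return
	}
	name, _, err = readString(buf)
	return
}

// decodeV2 reads the same bytes as if they were {id: long, age: long, name: string}.
func decodeV2(buf []byte) (id, age int64, name string, err error) {
	if id, buf, err = readLong(buf); err != nil {
		return
	}
	if age, buf, err = readLong(buf); err != nil {
		return
	}
	name, _, err = readString(buf)
	return
}

func main() {
	data := encodeV1(7, "bob")

	id, name, err := decodeV1(data)
	fmt.Println("with writer schema:", id, name, err)

	// The reader takes the string's length prefix for the "age" field and
	// then treats the first character of the name as a length.
	_, _, _, err = decodeV2(data)
	fmt.Println("with wrong schema:", err)
}
```

Nothing in the bytes tells the second reader it is wrong. In this case it happens to run out of input and errors; with other data it could just as easily return garbage without any error.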


samv avatar samv commented on June 10, 2024

I don't dispute any of that. However, I should issue a correction based on something I've since discovered: the Confluent platform has invented its own Avro binary format for efficiently representing a single row. I thought it was writing JSON, but it appears I read the Java sources wrong. The row format consists of a null byte, a 32-bit schema ID, and then the binary data, column by column. I'm not sure how the 32-bit schema ID is generated; it's nothing canonical (it might be an identifier allocated by the Kafka Schema Registry).
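For reference, that row layout can be sketched in Go roughly as follows. The names `frameRow` and `parseRow` are invented for this example, the schema ID 42 is arbitrary, and the big-endian byte order for the ID is an assumption on top of what is described above; the Avro-encoded payload is treated as an opaque byte slice.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	magicByte = 0x00 // the leading null byte
	headerLen = 5    // 1 magic byte + 4-byte schema ID
)

// frameRow prefixes Avro-encoded row bytes with the null byte and a
// 32-bit schema ID (assumed big-endian).
func frameRow(schemaID uint32, avroBinary []byte) []byte {
	out := make([]byte, headerLen, headerLen+len(avroBinary))
	out[0] = magicByte
	binary.BigEndian.PutUint32(out[1:headerLen], schemaID)
	return append(out, avroBinary...)
}

// parseRow splits a framed message back into its schema ID and the
// Avro-encoded payload. It does not decode the payload itself.
func parseRow(msg []byte) (uint32, []byte, error) {
	if len(msg) < headerLen {
		return 0, nil, errors.New("message shorter than 5-byte header")
	}
	if msg[0] != magicByte {
		return 0, nil, fmt.Errorf("unexpected leading byte 0x%02x", msg[0])
	}
	return binary.BigEndian.Uint32(msg[1:headerLen]), msg[headerLen:], nil
}

func main() {
	// 0x0e 0x06 'b' 'o' 'b' is the Avro binary form of the long 7
	// followed by the string "bob".
	payload := []byte{0x0e, 0x06, 'b', 'o', 'b'}

	msg := frameRow(42, payload)
	fmt.Printf("framed: % x\n", msg)

	id, body, err := parseRow(msg)
	fmt.Println("schema ID:", id, "payload bytes:", len(body), "err:", err)
}
```

The reader still needs some way to turn the ID into an actual schema before it can decode the payload; that lookup is outside this sketch.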


crast avatar crast commented on June 10, 2024

Side note: I was trying to avoid advertising, but this project has not responded to PRs for a year now, and when I contacted the original maintainer last year he said he is no longer able to access the elodina project. So I am going to mention that I've forked this project here:

https://github.com/go-avro/avro#about-this-fork

The new Go import path is gopkg.in/avro.v0

