airbytehq / json-avro-converter
This project forked from allegro/json-avro-converter
Airbyte-specific Json to Avro object converter used in blob storage destinations.
License: Other
I was playing around with Airbyte and wanted to pass a binary field from a source system to an S3 destination using Parquet. The S3 / Parquet destination uses the json-avro-converter library to transform JSON data to Avro before storing it as Parquet files.
My data was originally hex encoded. Looking at the source code, only base64-encoded data is supported, so I had to change my source to produce base64-encoded strings. However, even with this change things did not work, and I believe the issue is in this library. I wrote the following test case that confirms the issue (I also implemented a fix here).
// Note: WRITER and JsonHelper come from the surrounding test code and are not shown here.
@Test
public void testJsonToAvroConverterBinary() throws JsonProcessingException {
    String base64String = "4E/QIOo6aRCi2AgAKzAwnQ=="; // Corresponds to Hex: e04fd020ea3a6910a2d808002b30309d
    // The following works because all bytes have a mapping when using UTF-8 encoding.
    // echo "r@nd0mT3xt" | base64 --> ckBuZDBtVDN4dAo=
    // String base64String = "ckBuZDBtVDN4dAo=";
    final JsonAvroConverter converter = JsonAvroConverter.builder()
        .build();
    final String avroSchema = """
        {
          "type": "record",
          "name": "test_schema",
          "fields": [
            {
              "name": "binary_field_base64",
              "type": "bytes"
            }
          ]
        }""";
    final Schema schema = new Schema.Parser().parse(avroSchema);
    final JsonNode jsonObject = JsonHelper.deserialize(String.format("{\"binary_field_base64\":\"%s\"}", base64String));
    final GenericData.Record actualAvroObject = converter.convertToGenericDataRecord(WRITER.writeValueAsBytes(jsonObject), schema);
    // The Avro record should hold exactly the bytes encoded by the base64 string.
    java.nio.ByteBuffer retrievedByteBuffer = (java.nio.ByteBuffer) actualAvroObject.get("binary_field_base64");
    Assertions.assertArrayEquals(java.util.Base64.getDecoder().decode(base64String), retrievedByteBuffer.array());
}
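To illustrate the comment in the test about bytes "having a mapping" in UTF-8, here is a minimal standalone sketch. It assumes the failure comes from the decoded bytes passing through a UTF-8 String somewhere along the way; that is only a guess at the mechanism, not something confirmed from the library source. The class and method names are hypothetical and only use the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class, not part of json-avro-converter.
final class Utf8RoundTripDemo {

    // Returns true if the decoded bytes are unchanged after
    // bytes -> UTF-8 String -> bytes. Byte sequences that are not valid
    // UTF-8 are replaced during decoding, so they do not survive.
    static boolean survivesUtf8RoundTrip(String base64) {
        byte[] original = Base64.getDecoder().decode(base64);
        byte[] roundTripped = new String(original, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        return Arrays.equals(original, roundTripped);
    }

    public static void main(String[] args) {
        // Plain ASCII text: every byte is valid UTF-8.
        System.out.println(survivesUtf8RoundTrip("ckBuZDBtVDN4dAo="));         // true
        // Arbitrary binary: 0xE0 followed by 0x4F is not valid UTF-8.
        System.out.println(survivesUtf8RoundTrip("4E/QIOo6aRCi2AgAKzAwnQ==")); // false
    }
}
```

If that assumption holds, it would explain why the ASCII-only value passes the test while the arbitrary binary value does not.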
Are you interested in PRing the fix?
Does it make sense to support hexadecimal-formatted strings as well? For example, Postgres outputs hex strings for bytea columns.
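Until something like that exists, a source could convert hex to base64 before the data reaches the converter. The following is a rough sketch of that workaround, not a proposal for the library's API: the class and method names are made up, and it assumes the input is either bare hex digits or Postgres-style hex output with a leading `\x`.

```java
import java.util.Base64;

// Hypothetical helper, not part of json-avro-converter.
final class HexToBase64 {

    // Converts a hex string (optionally prefixed with "\x", as in Postgres
    // bytea hex output) into the base64 encoding of the same bytes.
    static String hexToBase64(String hex) {
        String digits = hex.startsWith("\\x") ? hex.substring(2) : hex;
        if (digits.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even number of digits");
        }
        byte[] bytes = new byte[digits.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(digits.charAt(2 * i), 16);
            int lo = Character.digit(digits.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Invalid hex digit near position " + (2 * i));
            }
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        // The hex value from the test case in this issue.
        System.out.println(hexToBase64("e04fd020ea3a6910a2d808002b30309d"));
        // prints 4E/QIOo6aRCi2AgAKzAwnQ==
    }
}
```

This keeps the converter's input contract (base64 only) unchanged and pushes the format decision to the source side.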