- Protobuf, short for Protocol Buffers, is Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data: think XML/JSON, but smaller, faster, and simpler.
- You define how you want your data to be structured once, then use generated source code to write and read your structured data to and from a variety of data streams, in a variety of languages.
- See what a sample .proto file, which contains the schema definition, looks like.
- See what the Python code generated by the protocol buffer compiler (used for serializing/deserializing the data) looks like. To generate the Java, Python, or C++ code you need to work with the message types defined in a .proto file, you run the protocol buffer compiler, protoc, on the .proto file.
- See how you can generate sample data that follows your defined schema and serialize it to its binary form, all using the Python data access classes.
- See how you can deserialize the sample data from its binary form back to its original form.
- See how efficient it is compared to JSON, in terms of the space needed to store the data or send it across the network.
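To make "smaller" concrete, here is a minimal pure-Python sketch of the protobuf wire format for a Person message. It is for illustration only; the project itself uses the classes generated by protoc. It assumes the field numbers firstName=1, lastName=2, age=3 and email=4, and a varint type (such as int32) for age. These are inferred from the binary output shown later in this README (b'\n\x07Naushad\x12\x07Shukoor\x18\x19"...'), not read from protos/person.proto, and all function names are hypothetical.

```python
# Hand-rolled sketch of the protobuf wire format, for illustration only.
# Assumption: field numbers inferred from this project's sample output
# (firstName=1, lastName=2, age=3, email=4), not read from person.proto.

WIRE_VARINT = 0            # int32, int64, bool, enum, ...
WIRE_LENGTH_DELIMITED = 2  # string, bytes, embedded messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_tag(field_number: int, wire_type: int) -> bytes:
    # A tag is the varint of (field_number << 3) | wire_type.
    return encode_varint((field_number << 3) | wire_type)


def encode_string(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_tag(field_number, WIRE_LENGTH_DELIMITED) + encode_varint(len(data)) + data


def encode_person(first_name: str, last_name: str, age: int, email: str) -> bytes:
    # Hypothetical helper approximating Person.SerializeToString().
    # Unlike real proto3 serialization, it does not skip default values ("" or 0).
    return (
        encode_string(1, first_name)
        + encode_string(2, last_name)
        + encode_tag(3, WIRE_VARINT) + encode_varint(age)
        + encode_string(4, email)
    )


if __name__ == "__main__":
    # Placeholder email address, not the one used in the project.
    print(encode_person("Naushad", "Shukoor", 25, "user@example.com"))
```

Field names never appear in the bytes: each field costs a one-byte tag (plus a length byte for strings), which is where most of the saving over JSON comes from.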
To set up the environment with conda, run the following:
conda env create -f conda-environment.yml
conda activate protobuf-python
Alternatively, to set up a virtual environment with venv and pip, run the following:
python -m venv ./protobuf-python
source ./protobuf-python/bin/activate
pip install -r requirements.txt
- From the project root, run:
protoc protos/person.proto --proto_path generated=protos/ --python_out=protos/
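If you prefer to run code generation from a Python build script, the command above can be wrapped with subprocess. This sketch assumes protoc is on your PATH and that you run it from the project root; build_protoc_command and run_protoc are hypothetical helpers, and the flags are copied from the command above. Because of the generated=protos/ mapping, protoc should treat the file as generated/person.proto and write the module to protos/generated/person_pb2.py, which matches the generated.person_pb2 type names in the sample output.

```python
from __future__ import annotations

import shutil
import subprocess


def build_protoc_command(proto_file: str = "protos/person.proto",
                         proto_path: str = "generated=protos/",
                         python_out: str = "protos/") -> list[str]:
    # Mirrors: protoc protos/person.proto --proto_path generated=protos/ --python_out=protos/
    return ["protoc", proto_file, "--proto_path", proto_path, f"--python_out={python_out}"]


def run_protoc(command: list[str]) -> None:
    # Fail early with a clear message if the compiler is not installed.
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("protoc not found on PATH; install the protocol buffer compiler first")
    # check=True raises CalledProcessError if protoc exits with an error.
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_protoc(...) to actually generate code.
    print(" ".join(build_protoc_command()))
```

Calling run_protoc(build_protoc_command()) performs the actual generation; check=True makes a failing protoc raise subprocess.CalledProcessError instead of failing silently.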
########################################################################################################################
Encoding -> person
########################################################################################################################
'person in python-dictionary form' representation:
------------------------------------------------------------------------------------------------------------------------
{'firstName': 'Naushad', 'lastName': 'Shukoor', 'age': 25, 'email': '[email protected]'}
------------------------------------------------------------------------------------------------------------------------
'person in json form' representation:
------------------------------------------------------------------------------------------------------------------------
{"firstName": "Naushad", "lastName": "Shukoor", "age": 25, "email": "[email protected]"}
------------------------------------------------------------------------------------------------------------------------
Took 95 bytes to store json data in json format
'person in proto form' representation:
------------------------------------------------------------------------------------------------------------------------
firstName: "Naushad"
lastName: "Shukoor"
age: 25
email: "[email protected]"
------------------------------------------------------------------------------------------------------------------------
'person in binary form' representation:
------------------------------------------------------------------------------------------------------------------------
b'\n\x07Naushad\x12\x07Shukoor\x18\x19"\[email protected]'
------------------------------------------------------------------------------------------------------------------------
Took 46 bytes to store serialized data(using protobuf) in binary format
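The byte counts above can be checked by hand. This sketch assumes the JSON size is the UTF-8 length of json.dumps() with its default separators, and that the protobuf layout matches the one inferred from the binary output (one-byte tags, string lengths under 128). json_size, varint_size and person_proto_size are hypothetical helpers, and the email address is a placeholder.

```python
import json


def json_size(record: dict) -> int:
    # Bytes taken by the JSON text that json.dumps produces with default separators.
    return len(json.dumps(record).encode("utf-8"))


def varint_size(value: int) -> int:
    # Number of bytes a non-negative value takes as a base-128 varint.
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def person_proto_size(record: dict) -> int:
    # Assumption: every field is set and all field numbers are < 16 (one-byte tags).
    size = 0
    for key in ("firstName", "lastName", "email"):
        payload = len(record[key].encode("utf-8"))
        size += 1 + varint_size(payload) + payload  # tag + length + bytes
    size += 1 + varint_size(record["age"])           # tag + varint value
    return size


if __name__ == "__main__":
    person = {"firstName": "Naushad", "lastName": "Shukoor", "age": 25, "email": "user@example.com"}
    print(json_size(person), person_proto_size(person))  # 87 38
```

With the real email address of length n, the same arithmetic gives 71 + n bytes for JSON and 22 + n bytes for the binary form, so the measured 95 and 46 bytes are consistent with each other (both imply n = 24).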
########################################################################################################################
Encoding -> persons (person array)
########################################################################################################################
'persons in python-list form' representation:
------------------------------------------------------------------------------------------------------------------------
[{'firstName': 'Naushad', 'lastName': 'Shukoor', 'age': 25, 'email': '[email protected]'}, {'firstName': 'John', 'lastName': 'Doe', 'age': 26, 'email': '[email protected]'}, {'firstName': 'Bruce', 'lastName': 'Wayne', 'age': 27, 'email': '[email protected]'}, {'firstName': 'Clark', 'lastName': 'Kent', 'age': 28, 'email': '[email protected]'}, {'firstName': 'Peter', 'lastName': 'Parker', 'age': 29, 'email': '[email protected]'}]
------------------------------------------------------------------------------------------------------------------------
'persons in JSON form' representation:
------------------------------------------------------------------------------------------------------------------------
[{"firstName": "Naushad", "lastName": "Shukoor", "age": 25, "email": "[email protected]"}, {"firstName": "John", "lastName": "Doe", "age": 26, "email": "[email protected]"}, {"firstName": "Bruce", "lastName": "Wayne", "age": 27, "email": "[email protected]"}, {"firstName": "Clark", "lastName": "Kent", "age": 28, "email": "[email protected]"}, {"firstName": "Peter", "lastName": "Parker", "age": 29, "email": "[email protected]"}]
------------------------------------------------------------------------------------------------------------------------
Took 443 bytes to store json data in json format
'persons in proto form' representation:
------------------------------------------------------------------------------------------------------------------------
persons {
  firstName: "Naushad"
  lastName: "Shukoor"
  age: 25
  email: "[email protected]"
}
persons {
  firstName: "John"
  lastName: "Doe"
  age: 26
  email: "[email protected]"
}
persons {
  firstName: "Bruce"
  lastName: "Wayne"
  age: 27
  email: "[email protected]"
}
persons {
  firstName: "Clark"
  lastName: "Kent"
  age: 28
  email: "[email protected]"
}
persons {
  firstName: "Peter"
  lastName: "Parker"
  age: 29
  email: "[email protected]"
}
------------------------------------------------------------------------------------------------------------------------
'persons in binary form' representation:
------------------------------------------------------------------------------------------------------------------------
b'\n.\n\x07Naushad\x12\x07Shukoor\x18\x19"\[email protected]\n\x1f\n\x04John\x12\x03Doe\x18\x1a"\[email protected]\n%\n\x05Bruce\x12\x05Wayne\x18\x1b"\[email protected]\n#\n\x05Clark\x12\x04Kent\x18\x1c"\[email protected]\n\'\n\x05Peter\x12\x06Parker\x18\x1d"\[email protected]'
------------------------------------------------------------------------------------------------------------------------
Took 198 bytes to store serialized data(using protobuf) (5 records) in binary format
Took 9988890 bytes(9.99 MB) to store data (100k records) in JSON format
Took 4983486 bytes(4.98 MB) to store serialized data(using protobuf) (100k records) in binary format
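The 198 bytes for five records follow from how a repeated message field is encoded: every Person is written as its own length-delimited field, that is a tag byte (0x0a for field 1), a varint length, and then the Person bytes. The binary output above shows this: it starts with b'\n.', i.e. tag 0x0a and length 0x2e (46, the size of the single-person encoding). The sketch assumes persons is field 1 of Persons, as that tag suggests; frame_repeated is a hypothetical helper, and the payloads are stand-ins with the same lengths as the five records in the output.

```python
from __future__ import annotations


def encode_varint(value: int) -> bytes:
    # Base-128 varint: 7 bits per byte, high bit set while more bytes follow.
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def frame_repeated(field_number: int, payloads: list[bytes]) -> bytes:
    # Each element of a repeated message field becomes tag + length + payload.
    tag = encode_varint((field_number << 3) | 2)  # wire type 2 = length-delimited
    return b"".join(tag + encode_varint(len(p)) + p for p in payloads)


if __name__ == "__main__":
    # Stand-in payloads sized like the five encoded Person messages in the
    # sample output (lengths 0x2e, 0x1f, 0x25, 0x23, 0x27).
    sizes = [46, 31, 37, 35, 39]
    framed = frame_repeated(1, [b"\x00" * n for n in sizes])
    print(len(framed), framed[:2])  # 198 b'\n.'
```

Wrapping costs two bytes per record here (10 bytes on top of 188 bytes of Person data), because each length is below 128 and fits in a single varint byte.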
########################################################################################################################
Summary
########################################################################################################################
person(json):95 bytes, person(binary):46 bytes ------> (binary is 48.4% of the JSON size, a 51.6% reduction in storage size)
persons_5_records(json):443 bytes, persons_5_records(binary):198 bytes ------> (binary is 44.7% of the JSON size, a 55.3% reduction in storage size)
persons_100k_records(json):9988890 bytes (9.99 MB), persons_100k_records(binary):4983486 bytes (4.98 MB) ------> (binary is 49.9% of the JSON size, a 50.1% reduction in storage size)
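To express the byte counts above as a storage saving, compute 1 - binary/json rather than the plain ratio binary/json; for example, 46 bytes instead of 95 is a reduction of about 52%. A minimal sketch using the numbers reported above (reduction_percent is a hypothetical helper):

```python
def reduction_percent(json_bytes: int, binary_bytes: int) -> float:
    """Percentage of storage saved by the binary encoding relative to JSON."""
    return (1 - binary_bytes / json_bytes) * 100


if __name__ == "__main__":
    measurements = {
        "person": (95, 46),
        "persons_5_records": (443, 198),
        "persons_100k_records": (9_988_890, 4_983_486),
    }
    for name, (json_bytes, binary_bytes) in measurements.items():
        print(f"{name}: {reduction_percent(json_bytes, binary_bytes):.1f}% smaller")
    # person: 51.6% smaller
    # persons_5_records: 55.3% smaller
    # persons_100k_records: 50.1% smaller
```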
########################################################################################################################
Decoding -> person
########################################################################################################################
person(in binary) deserialized:
------------------------------------------------------------------------------------------------------------------------
firstName: "Naushad"
lastName: "Shukoor"
age: 25
email: "[email protected]"
------------------------------------------------------------------------------------------------------------------------
type -> <class 'generated.person_pb2.Person'>
convert person to Dict using MessageToDict(from google.protobuf.json_format):
------------------------------------------------------------------------------------------------------------------------
{'firstName': 'Naushad', 'lastName': 'Shukoor', 'age': 25, 'email': '[email protected]'}
------------------------------------------------------------------------------------------------------------------------
type -> <class 'dict'>
convert person to JSON using MessageToJson(from google.protobuf.json_format):
------------------------------------------------------------------------------------------------------------------------
{
  "firstName": "Naushad",
  "lastName": "Shukoor",
  "age": 25,
  "email": "[email protected]"
}
------------------------------------------------------------------------------------------------------------------------
type -> <class 'str'>
person(in JSON file) read back:
------------------------------------------------------------------------------------------------------------------------
{'firstName': 'Naushad', 'lastName': 'Shukoor', 'age': 25, 'email': '[email protected]'}
------------------------------------------------------------------------------------------------------------------------
type -> <class 'dict'>
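Deserialization reverses the encoding: read a tag, split it into a field number and a wire type, then read either a varint or a length-prefixed byte string. The sketch below parses a Person by hand to show what the generated class does for you when it parses binary data (for example via ParseFromString). It is a simplified illustration that only handles wire types 0 and 2; the field-number-to-name mapping is inferred from the sample output, decode_person and read_varint are hypothetical, and the input uses a placeholder email.

```python
from __future__ import annotations

# Assumed mapping, inferred from the tags in the sample binary output.
FIELD_NAMES = {1: "firstName", 2: "lastName", 3: "age", 4: "email"}


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_person(data: bytes) -> dict:
    record = {}
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint (age)
            value, pos = read_varint(data, pos)
        elif wire_type == 2:  # length-delimited (strings)
            length, pos = read_varint(data, pos)
            value = data[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        # Unknown field numbers are kept under their number instead of a name.
        record[FIELD_NAMES.get(field_number, field_number)] = value
    return record


if __name__ == "__main__":
    sample = b'\n\x07Naushad\x12\x07Shukoor\x18\x19"\x10user@example.com'
    print(decode_person(sample))
    # {'firstName': 'Naushad', 'lastName': 'Shukoor', 'age': 25, 'email': 'user@example.com'}
```

The bytes carry only field numbers, so the reader needs the schema (here the FIELD_NAMES dictionary) to recover names; in the project that knowledge lives in the generated person_pb2 module.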
########################################################################################################################
Decoding -> persons (person array)
########################################################################################################################
persons(in binary) deserialized:
------------------------------------------------------------------------------------------------------------------------
persons {
  firstName: "Naushad"
  lastName: "Shukoor"
  age: 25
  email: "[email protected]"
}
persons {
  firstName: "John"
  lastName: "Doe"
  age: 26
  email: "[email protected]"
}
persons {
  firstName: "Bruce"
  lastName: "Wayne"
  age: 27
  email: "[email protected]"
}
persons {
  firstName: "Clark"
  lastName: "Kent"
  age: 28
  email: "[email protected]"
}
persons {
  firstName: "Peter"
  lastName: "Parker"
  age: 29
  email: "[email protected]"
}
------------------------------------------------------------------------------------------------------------------------
type -> <class 'generated.person_pb2.Persons'>
convert person to Dict using MessageToDict(from google.protobuf.json_format):
------------------------------------------------------------------------------------------------------------------------
{'persons': [{'firstName': 'Naushad', 'lastName': 'Shukoor', 'age': 25, 'email': '[email protected]'}, {'firstName': 'John', 'lastName': 'Doe', 'age': 26, 'email': '[email protected]'}, {'firstName': 'Bruce', 'lastName': 'Wayne', 'age': 27, 'email': '[email protected]'}, {'firstName': 'Clark', 'lastName': 'Kent', 'age': 28, 'email': '[email protected]'}, {'firstName': 'Peter', 'lastName': 'Parker', 'age': 29, 'email': '[email protected]'}]}
------------------------------------------------------------------------------------------------------------------------
type -> <class 'dict'>
convert person to JSON using MessageToJson(from google.protobuf.json_format):
------------------------------------------------------------------------------------------------------------------------
{
  "persons": [
    {
      "firstName": "Naushad",
      "lastName": "Shukoor",
      "age": 25,
      "email": "[email protected]"
    },
    {
      "firstName": "John",
      "lastName": "Doe",
      "age": 26,
      "email": "[email protected]"
    },
    {
      "firstName": "Bruce",
      "lastName": "Wayne",
      "age": 27,
      "email": "[email protected]"
    },
    {
      "firstName": "Clark",
      "lastName": "Kent",
      "age": 28,
      "email": "[email protected]"
    },
    {
      "firstName": "Peter",
      "lastName": "Parker",
      "age": 29,
      "email": "[email protected]"
    }
  ]
}
------------------------------------------------------------------------------------------------------------------------
type -> <class 'str'>
Printing value of 'persons' key from the Dict:
------------------------------------------------------------------------------------------------------------------------
[{'firstName': 'Naushad', 'lastName': 'Shukoor', 'age': 25, 'email': '[email protected]'}, {'firstName': 'John', 'lastName': 'Doe', 'age': 26, 'email': '[email protected]'}, {'firstName': 'Bruce', 'lastName': 'Wayne', 'age': 27, 'email': '[email protected]'}, {'firstName': 'Clark', 'lastName': 'Kent', 'age': 28, 'email': '[email protected]'}, {'firstName': 'Peter', 'lastName': 'Parker', 'age': 29, 'email': '[email protected]'}]
------------------------------------------------------------------------------------------------------------------------
type -> <class 'list'>
persons(in JSON file) read back:
------------------------------------------------------------------------------------------------------------------------
[{'firstName': 'Naushad', 'lastName': 'Shukoor', 'age': 25, 'email': '[email protected]'}, {'firstName': 'John', 'lastName': 'Doe', 'age': 26, 'email': '[email protected]'}, {'firstName': 'Bruce', 'lastName': 'Wayne', 'age': 27, 'email': '[email protected]'}, {'firstName': 'Clark', 'lastName': 'Kent', 'age': 28, 'email': '[email protected]'}, {'firstName': 'Peter', 'lastName': 'Parker', 'age': 29, 'email': '[email protected]'}]
------------------------------------------------------------------------------------------------------------------------
type -> <class 'list'>
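The "read back" steps and the "Took N bytes to store" lines can be reproduced by writing each representation to a file, measuring it, and loading it again. A minimal standard-library sketch, assuming one JSON file and one binary file in a temporary directory; the file names, the store_and_measure helper and the sample record are hypothetical, and in the project the binary bytes would come from calling SerializeToString() on the generated message.

```python
import json
import os
import tempfile


def store_and_measure(records: list, binary: bytes, directory: str) -> tuple:
    """Write records as JSON and `binary` as raw bytes; return both file sizes and the JSON read back."""
    json_path = os.path.join(directory, "persons.json")  # hypothetical file names
    bin_path = os.path.join(directory, "persons.bin")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f)
    with open(bin_path, "wb") as f:
        f.write(binary)
    with open(json_path, encoding="utf-8") as f:
        read_back = json.load(f)  # list of dicts, like the output above
    return os.path.getsize(json_path), os.path.getsize(bin_path), read_back


# Placeholder data: one record plus a hand-encoded Persons message for it.
persons = [{"firstName": "John", "lastName": "Doe", "age": 26, "email": "user@example.com"}]
serialized = b'\n\x1f\n\x04John\x12\x03Doe\x18\x1a"\x10user@example.com'

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        json_bytes, bin_bytes, back = store_and_measure(persons, serialized, tmp)
        print(f"JSON: {json_bytes} bytes, binary: {bin_bytes} bytes")  # JSON: 82 bytes, binary: 33 bytes
        print(back == persons)  # True
```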