Protocol Buffers is a serialization format used for data exchange and data storage. Use cases include batch and stream processing and platform-neutral communication between microservices. Protocol Buffers focuses only on serializing and deserializing data as quickly as possible and on making the encoded data as small as possible to reduce the bandwidth required. Furthermore, Protocol Buffers, like Avro, supports schema evolution. The schema is defined in a plain-text `.proto` file, while the serialized data itself is binary. On the other hand, Protocol Buffers data is not splittable the way CSV is, and the format has no built-in data compression (unlike ORC, Parquet and Avro).
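To give a sense of how compact the binary encoding is, here is a minimal sketch in pure Python of how a single integer field is laid out on the wire. It does not use the official `protobuf` library; the helper names `encode_varint` and `encode_int_field` are hypothetical and only cover non-negative integers, assuming a schema such as `message Test1 { int32 a = 1; }`.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: a tag (field number + wire type), then the value."""
    wire_type_varint = 0
    tag = (field_number << 3) | wire_type_varint
    return encode_varint(tag) + encode_varint(value)


# Field number 1 set to 150, as in `int32 a = 1;` with a = 150.
encoded = encode_int_field(1, 150)
print(encoded.hex())  # 089601
```

The field name never appears in the output: only the field number and the value are written, so this message takes three bytes, against nine characters for the compact JSON text `{"a":150}`.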
Protocol Buffers, often abbreviated as Protobuf, was created by Google and open-sourced in 2008. It is the default serialization format used by gRPC. Protocol Buffers initially supported only three languages: C++, Java and Python. Today, it supports additional languages such as Go, Ruby, JavaScript, PHP, C# and Objective-C.