Protocol Buffers

Protocol Buffers is a serialization format used for data exchange and data storage. Use cases include batch and streaming processing, and platform-neutral communication between microservices. Protocol Buffers focuses on serializing and deserializing data as quickly as possible and on keeping the encoded data as small as possible to reduce the bandwidth required. Like Avro, Protocol Buffers supports schema evolution. The schema is defined in a text file (.proto), while the serialized data itself is binary. On the other hand, Protocol Buffers files are not splittable (unlike CSV) and the format has no built-in data compression (unlike ORC, Parquet and Avro).
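To illustrate why the encoded data stays small and how schema evolution can work on the wire, here is a minimal Python sketch of the varint field encoding. It is not the official protobuf library: the function names (encode_varint, encode_varint_field, decode_known_fields) are hypothetical, only non-negative integer fields are handled, and real projects would use code generated from a .proto file instead.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def decode_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one varint starting at pos; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one integer field: a tag (field number + wire type 0), then the value."""
    tag = (field_number << 3) | 0  # wire type 0 means "varint"
    return encode_varint(tag) + encode_varint(value)


def decode_known_fields(data: bytes, known: set[int]) -> dict[int, int]:
    """Decode varint fields, silently skipping field numbers the reader does not know."""
    pos = 0
    fields: dict[int, int] = {}
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 0:
            raise NotImplementedError("only varint fields are handled in this sketch")
        value, pos = decode_varint(data, pos)
        if field_number in known:
            fields[field_number] = value
    return fields


# Field 1 with value 150 takes three bytes on the wire.
print(encode_varint_field(1, 150).hex())  # 089601

# A newer writer adds field 2; an older reader that only knows field 1 still works.
newer_payload = encode_varint_field(1, 150) + encode_varint_field(2, 7)
print(decode_known_fields(newer_payload, known={1}))  # {1: 150}
```

Field names never appear in the payload, only field numbers, which is one reason messages are compact. Because a reader can skip field numbers it does not recognize, a schema can gain new fields without breaking older consumers.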

Protocol Buffers, often shortened to Protobuf, was developed at Google and released as open source in 2008. It is the default serialization format used by gRPC. Protocol Buffers initially supported only three languages: C++, Java and Python. Today, it supports additional languages such as Go, Ruby, JavaScript, PHP, C# and Objective-C.

Learn more: Wikipedia
Related tags: Big Data; gRPC

Data Lake ingestion best practices

Categories: Big Data, Data Engineering | Tags: Data Governance, HDF, Operation, Avro, Hive, NiFi, ORC, Spark, Data Lake, File Format, Protocol Buffers, Registry, Schema

Creating a Data Lake requires rigor and experience. Here are some good practices around data ingestion both for batch and stream architectures that we recommend and implement with our customers…

By David WORMS

Jun 18, 2018

Comparison of different file formats in Big Data

Categories: Big Data, Data Engineering | Tags: Business intelligence, Data structures, Avro, HDFS, ORC, Parquet, Batch processing, Big Data, CSV, JavaScript Object Notation (JSON), Kubernetes, Protocol Buffers

In data processing, there are different types of file formats to store your data sets. Each format has its own pros and cons depending on the use case and exists to serve one or several purposes…

By Aida NGOM

Jul 23, 2020
