Apache Avro

Avro is a row-based data serialization format hosted by the Apache Software Foundation. An Avro file consists of a header, which contains the schema serialized in JSON, followed by the data. Data can be serialized in JSON or in binary. Most applications store data in the binary format because it is smaller and faster to process. Thus, the schema is interpretable by machines while remaining readable by humans, and the data is highly optimized. Another key feature is that Avro binary files are compressible and splittable (divisible into blocks that can be read independently).
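To illustrate why the binary format is smaller, here is a minimal sketch in pure Python of how a single record body could be encoded. It relies on two facts from the Avro specification: int and long values use zig-zag variable-length encoding, and a string is written as its byte length followed by its UTF-8 bytes. The record shape (`name`, `age`) and the helper names are assumptions made for the example. This is not the Avro library and it does not produce a complete Avro file, which would also carry the header and block markers.

```python
import json

def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro encodes int/long: zig-zag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small negative numbers become small unsigned ones
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """Encode a string as its UTF-8 length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return zigzag_varint(len(data)) + data

def encode_user(user: dict) -> bytes:
    """Hypothetical record: fields are written in schema order, without field names."""
    return encode_string(user["name"]) + zigzag_varint(user["age"])

user = {"name": "foo", "age": 64}
binary = encode_user(user)
as_json = json.dumps(user).encode("utf-8")

print(binary.hex())               # 06666f6f8001
print(len(binary), len(as_json))  # binary size versus JSON size, in bytes
```

The binary form omits field names and delimiters because the reader recovers them from the schema, which is where most of the size difference comes from.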

Avro is particularly suited for use cases requiring schema migration. Indeed, the schema is stored with the data and can be modified over time. Different versions of the schema are kept, which allows conflicts between the schema used to write the data and the schema expected by the reader to be resolved. This is useful to manage data quality in data streaming platforms like Kafka, where consumers can adapt to the currently available schema. In addition, consumers and Hadoop MapReduce tasks can split the binary files to process them in parallel.
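The resolution between two schema versions can be sketched as follows. This is a simplified illustration in pure Python, not the Avro library: the function `resolve` and both schemas are hypothetical, and the sketch covers only two rules (fields unknown to the reader are ignored, fields missing from the written data take the reader's default). Type promotion, aliases and nested records are left out.

```python
# Version 1 of a hypothetical schema, used by the producer to write the data.
writer_schema_v1 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}

# Version 2, used by the consumer: "age" was dropped, "email" was added with a default.
reader_schema_v2 = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

def resolve(record: dict, reader_schema: dict) -> dict:
    """Project a decoded record onto the reader's schema (simplified)."""
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]      # field present in the written data
        elif "default" in field:
            resolved[name] = field["default"]  # field added later: use its default
        else:
            raise ValueError(f"no value and no default for field {name!r}")
    return resolved  # fields unknown to the reader (here "age") are dropped

old_record = {"name": "foo", "age": 64}  # written with version 1
print(resolve(old_record, reader_schema_v2))
```

Giving every added field a default is what keeps older data readable by newer consumers in this sketch.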

The supported data types are:

  • Primitive: null, boolean, int, long, float, double, bytes, and string.
  • Complex: arrays, enums, fixed, maps, records, and unions.
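As an illustration of these types, the following sketch declares a hypothetical `User` schema that combines several primitive types with each of the complex ones, then loads it with Python's standard `json` module, since an Avro schema is a JSON document. The record, its namespace and its field names are assumptions made for the example. The helper `kind` is not part of any Avro API and only inspects the top-level type of each field.

```python
import json

schema_text = """
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "name", "type": "string"},
    {"name": "active", "type": "boolean"},
    {"name": "score", "type": "double"},
    {"name": "avatar", "type": "bytes"},
    {"name": "email", "type": ["null", "string"], "default": null},
    {"name": "tags", "type": {"type": "array", "items": "string"}},
    {"name": "attributes", "type": {"type": "map", "values": "int"}},
    {"name": "role", "type": {"type": "enum", "name": "Role", "symbols": ["ADMIN", "MEMBER"]}},
    {"name": "checksum", "type": {"type": "fixed", "name": "MD5", "size": 16}}
  ]
}
"""

schema = json.loads(schema_text)

def kind(avro_type) -> str:
    """Return the top-level kind of a field type (hypothetical helper)."""
    if isinstance(avro_type, list):
        return "union"            # a JSON array declares a union
    if isinstance(avro_type, dict):
        return avro_type["type"]  # complex types are declared as JSON objects
    return avro_type              # primitive types are plain strings

kinds = {field["name"]: kind(field["type"]) for field in schema["fields"]}
for name, field_kind in kinds.items():
    print(f"{name}: {field_kind}")
```

A union with `null` as its first branch and a `null` default, as in the `email` field, is the usual way to declare an optional field.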

Avro can also be used for remote procedure calls (RPC): the client and the server share their schemas when the connection is established. The compressibility of the files increases the efficiency of data exchanges and storage.
