Apache ORC

Related articles

Comparaison of different file formats in Big Data

Categories: Big Data, Data Engineering | Tags: Analytics, Avro, HDFS, Hive, Kafka, MapReduce, ORC, Spark, Batch processing, Big Data, CSV, Data Analytics, Data structures, Database, JSON, Protocol Buffers, Hadoop, Parquet, Kubernetes, XML

In data processing, there are different types of files formats to store your data sets. Each format has its own pros and cons depending upon the use cases and exists to serve one or several purposes…

Aida NGOM

By Aida NGOM

Jul 23, 2020

Data Lake ingestion best practices

Categories: Big Data, Data Engineering | Tags: Avro, Hive, NiFi, ORC, Spark, Data Governance, HDF, Operation, Protocol Buffers, Registry, Schema, Data Lake, File Format

Creating a Data Lake requires rigor and experience. Here are some good practices around data ingestion both for batch and stream architectures that we recommend and implement with our customers…

David WORMS

By David WORMS

Jun 18, 2018

What's new in Apache Spark 2.3?

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Arrow, ORC, Spark, PySpark, Docker, Streaming, Tuning, Spark MLlib, Kubernetes, pandas

Let’s dive into the new features offered by the 2.3 distribution of Apache Spark. This article is a composition of the following talks seen at the DataWorks Summit 2018 and additional research: Apache…

César BEREZOWSKI

By César BEREZOWSKI

May 23, 2018

Essential questions about Time Series

Categories: Big Data | Tags: Druid, Hive, ORC, Elasticsearch, Graphana, IOT, HBase

Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles … IoT and Big Data go hand in hand. We…

David WORMS

By David WORMS

Mar 19, 2018

HDFS and Hive storage - comparing file formats and compression methods

Categories: Big Data | Tags: Analytics, Hive, ORC, Parquet, File Format

A few days ago, we have conducted a test in order to compare various Hive file formats and compression methods. Among those file formats, some are native to HDFS and apply to all Hadoop users. The…

David WORMS

By David WORMS

Mar 13, 2012

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabbat
Canada

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.