File Format

Related articles

Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop

Spark Streaming part 2: run Spark Structured Streaming pipelines in Hadoop

Categories: Data Engineering, Learning | Tags: Spark, Apache Spark Streaming, Big Data, File Format, Data Governance, Python, Streaming, Hadoop

Spark can process streaming data on a multi-node Hadoop cluster relying on HDFS for the storage and YARN for the scheduling of jobs. Thus, Spark Structured Streaming integrates well with Big Data…

By Oskar RYNKIEWICZ

May 28, 2019

Data Lake ingestion best practices

Data Lake ingestion best practices

Categories: Big Data, Data Engineering | Tags: Avro, Hive, NiFi, ORC, Spark, Data Lake, File Format, Data Governance, HDF, Operation, Protocol Buffers, Registry, Schema

Creating a Data Lake requires rigor and experience. Here are some good practices around data ingestion both for batch and stream architectures that we recommend and implement with our customers…

By David WORMS

Jun 18, 2018

State of the Hadoop open-source ecosystem in early 2013

State of the Hadoop open-source ecosystem in early 2013

Categories: Big Data | Tags: Flume, Kafka, Mahout, Mesos, Phoenix, Pig, File Format, Hadoop

Hadoop is already a large ecosystem and my guess is that 2013 will be the year where it grows even larger. There are some pieces that we no longer need to present. ZooKeeper, hbase, Hive, Pig, Flume…

By David WORMS

Jul 8, 2013

Apache Hive Essentials How-to by Darren Lee

Apache Hive Essentials How-to by Darren Lee

Categories: Business Intelligence, Learning | Tags: Hive, File Format, UDF, Hadoop, SQL

Recently, I’ve been ask to review a new book on Apache Hive called “Apache Hive Essentials How-to” written by Darren Lee and published by Packt Publishing. To say it short, I sincerely recommend it. I…

By David WORMS

Apr 23, 2013

Convert .flac music files to .mp3 on osx

Convert .flac music files to .mp3 on osx

Categories: Hack | Tags: File Format, OS X

As an osx user for years now, one should know by then that iTunes doesn’t support the flac format. We are now in 2012, I’ve been waiting for this to happen since years know. Loosing patience, dark…

By David WORMS

Jul 20, 2012

HDFS and Hive storage - comparing file formats and compression methods

HDFS and Hive storage - comparing file formats and compression methods

Categories: Big Data | Tags: Analytics, HBase, HDFS, Hive, ORC, Parquet, File Format

A few days ago, we have conducted a test in order to compare various Hive file formats and compression methods. Among those file formats, some are native to HDFS and apply to all Hadoop users. The…

By David WORMS

Mar 13, 2012

Timeseries storage in Hadoop and Hive

Timeseries storage in Hadoop and Hive

Categories: Data Engineering | Tags: HDFS, Hive, CRM, File Format, timeseries, Tuning, Hadoop

In the next few weeks, we will be exploring the storage and analytic of a large generated dataset. This dataset is composed of CRM tables associated to one timeserie table of about 7,000 billiard rows…

By David WORMS

Jan 10, 2012

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabbat
Canada

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and increase the time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.