Data Engineering

Apache Flink: past, present and future

Apache Flink is a little gem that deserves a lot more attention. Let’s dive into Flink’s past, its current state and where it is heading by following the keynotes and presentations at Flink Forward 2018. […]

November 5th, 2018 | Categories: Big Data, Data Engineering

Data Lake ingestion best practices

Creating a Data Lake requires rigor and experience. Here are some good practices for data ingestion, in both batch and stream architectures, that we recommend and implement with our customers. […]

June 18th, 2018 | Categories: Data Engineering, DevOps

Accelerating query processing with materialized views in Apache Hive

Jesus Camacho Rodriguez from Hortonworks gave a talk, “Accelerating query processing with materialized views in Apache Hive”, about the new materialized view feature coming in Apache Hive 3.0. This article covers the main principle of the feature, gives some examples and lists the improvements on the roadmap. […]

May 31st, 2018 | Categories: Data Engineering, DataWorks Summit 2018

Apache Beam: a unified programming model for data processing pipelines

In this article, we review the concepts, the history and the future of Apache Beam, which may well become the new standard for defining data processing pipelines. […]

Essential questions about Time Series

Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles … IoT and Big Data go hand in hand. […]

March 19th, 2018 | Categories: Big Data, Data Engineering

MiNiFi: Data at Scales & the Values of Starting Small

This talk briefly presented Apache NiFi and explained where MiNiFi came from: basically, it is a minimal NiFi agent deployed on small devices (e.g. IoT) to bring data to a cluster's NiFi pipeline. This post is part of the series on the DataWorks Summit 2017 (formerly Hadoop Summit); the speaker is Aldrin Piri from Hortonworks. [...]

July 8th, 2017 | Categories: Blog, Data Engineering, Events, Infrastructure

EclairJS – Putting a Spark in Web Apps

Presentation by David Fallside from IBM; images are taken from the presentation. Web app development has moved from Java to Node.js and JavaScript. Node.js provides a simple and rich environment with npm. EclairJS is a Node.js library that provides bindings to a Spark application: an RDD is bound to a JS object that is made [...]

July 17th, 2016 | Categories: Data Engineering, Events