Blog, last published articles

Present and future of Hadoop workflow scheduling: Oozie 5.x

During the DataWorks Summit Europe 2018 in Berlin, I had the opportunity to attend a breakout session on Apache Ambari’s Workflow Scheduler and its way of designing and visualizing Apache Oozie workflows. The talk was given by Artem Ervits, solutions engineer at Hortonworks, and Clay Baenziger, member of the Hadoop Infrastructure team at Bloomberg. They [...]

May 23rd, 2018 | Categories: Big Data, DataWorks Summit 2018

What’s new in Apache Spark 2.3?

Let’s dive into the new features offered by the 2.3 release of Apache Spark. […]

Essential questions about Time Series

Today, the bulk of Big Data is temporal. We see it in the media and among our customers: smart meters, banking transactions, smart factories, connected vehicles, and more. IoT and Big Data go hand in hand. […]

March 19th, 2018 | Categories: Big Data, Data Engineering

Notes after Katacoda Training on Kubernetes Container Orchestration

A few weeks ago, I dedicated two days to following the tutorials available on Katacoda, the interactive learning platform for Kubernetes or any other container orchestration platform. I’m sharing my notes, which I use regularly as a cheat sheet. […]

December 14th, 2017 | Categories: Container

Open Source Summit 2017 – a week in Prague

The Adaltas crew went to the Open Source Summit 2017 as well as the Mesos Summit 2017, both held in Prague about 3 weeks ago. On this occasion, we compiled a series of articles about the conferences that made the strongest impression on us. Over the 3-day period of the Open Source Summit, there is no doubt [...]

November 23rd, 2017 | Categories: Events

Scaling massive, real-time data pipelines with Go

Last week at the Open Source Summit in Prague, Jean de Klerk held a talk called Scaling massive, real-time data pipelines with Go. This article goes over the main points of the talk, detailing the steps Jean went through when optimising his pipelines, explaining critical parts of his code and reproducing his benchmark results. [...]

Mesos Introduction

Apache Mesos is an open source cluster management project designed to implement and optimize distributed systems. Mesos enables fine-grained, dynamic management and sharing of resources between different nodes and for various applications. This article covers Mesos architecture, its fundamentals, and its support for NVIDIA GPUs. […]

November 15th, 2017 | Categories: Open Source Summit Europe 2017