Events

Notes and articles based on events such as meetups and conventions

Running Apache Hive 3, new features and tips and tricks

Apache Hive 3 brings a number of welcome new features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It has been available since July 2018 as part of HDP3 (Hortonworks Data Platform version 3). I will first review the new features available with [...]

July 25th, 2019 | Categories: Big Data, DataWorks Summit 2019

Auto-scaling Druid with Kubernetes

Apache Druid is an open-source analytics data store which could leverage the auto-scaling abilities of Kubernetes due to its distributed nature and its reliance on memory. I was inspired by the talk “Apache Druid Auto Scale-out/in for Streaming Data Ingestion on Kubernetes” by Jinchul Kim during DataWorks Summit 2019 Europe in Barcelona. […]

Google Cloud Summit Paris Notes

Google held the 2019 edition of its yearly Summit in Paris on the 18th of June. This year's event was the biggest yet in Paris, which reflects Google's commitment to positioning itself in the French market. In terms of cloud market share, Google Cloud Platform (GCP) is still far behind its competitors Amazon AWS and Microsoft Azure. [...]

June 26th, 2019 | Categories: Events

Multihoming on Hadoop

Multihoming, which means having multiple networks attached to one node, is one of the main tools for managing the heterogeneous network usage of an Apache Hadoop cluster. This article is an introduction to the concept and its applications for real-world businesses. […]

March 5th, 2019 | Categories: Adaltas Summit 2018, Big Data, Data Engineering

Apache Knox made easy!

Apache Knox is the secure entry point of a Hadoop cluster, but can it also be the entry point for my REST applications? […]

Self-sovereign identities with verifiable claims

Towards a trusted, personal, persistent, and portable digital identity for all. […]

Hadoop cluster takeover with Apache Ambari

We recently migrated a large production Hadoop cluster from a “manual” automated install to Apache Ambari. We called this the Ambari Takeover. This is a risky process, and we will detail why this operation was required and how we did it. […]

November 15th, 2018 | Categories: Adaltas Summit 2018, Big Data

One week to discuss technology in a Moroccan riad

This year, Adaltas is organizing its first conference, from the 22nd to the 26th of October. On the agenda for these 5 days of conference: discussing technology in one of the most beautiful riads of Marrakech. Mixing business with pleasure, learning and sharing with our feet in the swimming pool. The rule is simple, each participant [...]

October 11th, 2018 | Categories: Adaltas Summit 2018

Deep learning on YARN: running TensorFlow and friends on a Hadoop cluster

With the arrival of Hadoop 3, YARN offers more flexibility in resource management. It is now possible to perform Deep Learning analysis on GPUs with specific development environments, leveraging available resources. This article is based on the presentation of Wangda Tan, Apache Hadoop PMC member, at the DataWorks Summit 2018. It mostly focuses on [...]

July 24th, 2018 | Categories: Data Science, DataWorks Summit 2018