Gauthier Leonard is a recently graduated Big Data engineer. During his internship at Adaltas, he became familiar with the Hadoop ecosystem and the deployment of secure clusters by developing a cluster provisioning automation tool.

Gauthier consolidated his skills during his first assignment as the Big Data specialist on a Data Lake project. He helped the customer design and install an HDP 3 cluster and set up a first data pipeline using NiFi, Hive 3 (Hive ACID and Hive LLAP) and Oozie.

Published articles

Running Apache Hive 3, new features and tips and tricks

Categories: Big Data, Business Intelligence, DataWorks Summit 2019 | Tags: Druid, Hive, Kafka, Cloudera, Data Warehouse, JDBC, LLAP, Active Directory, Release and features, Hadoop

Apache Hive 3 brings a bunch of new and nice features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It is available since…

By Gauthier LEONARD

Jul 25, 2019

Jumbo, the Hadoop cluster bootstrapper

Categories: Infrastructure | Tags: Ansible, Ambari, Automation, HDP, REST, Vagrant

Introducing Jumbo, a Hadoop cluster bootstrapper for developers. Jumbo helps you deploy development environments for Big Data technologies. It takes a few minutes to get a custom virtualized Hadoop…

By Gauthier LEONARD

Nov 29, 2018

KVM machines for Vagrant on Archlinux

Categories: DevOps & SRE | Tags: Arch Linux, KVM, Linux, Vagrant, Virtualization, VM

Vagrant supports different providers to manage virtualization. In a Linux environment, you can dramatically improve VM performance by using the libvirt provider and the KVM hypervisor. This tutorial…

By Gauthier LEONARD

Sep 19, 2018

Apache Beam: a unified programming model for data processing pipelines

Categories: Data Engineering, DataWorks Summit 2018 | Tags: Apex, Beam, Flink, Spark, Batch processing, Java, Pipeline, Python, Streaming

In this article, we review the concepts, history and future of Apache Beam, which may well become the new standard for defining data processing pipelines. At DataWorks Summit 2018 in…

By Gauthier LEONARD

May 24, 2018

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to turn their use cases into production projects, reduce their costs and shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.