Gauthier Leonard is a recently graduated Big Data engineer. During his internship at Adaltas, he became familiar with the Hadoop ecosystem and the deployment of secure clusters by developing a cluster provisioning automation tool.
Gauthier consolidated his skills during his first assignment as the Big Data referent in a Data Lake project. He helped the customer to design and install an HDP 3 cluster, and set up a first data pipeline using NiFi, Hive 3 (Hive ACID and Hive LLAP) and Oozie.
Published articles
Connecting to ADLS Gen2 from Hadoop (HDP) and Nifi (HDF)
Categories: Big Data, Cloud Computing, Data Engineering | Tags: HDFS, NiFi, Authentication, Authorization, Hadoop, Azure Data Lake Storage (ADLS), Azure, OAuth2
As data projects built in the Cloud become more and more frequent, a common use case is to interact with Cloud storage from an existing on-premises Big Data platform. Microsoft Azure recently…
Nov 5, 2020
Running Apache Hive 3, new features and tips and tricks
Categories: Big Data, Business Intelligence, DataWorks Summit 2019 | Tags: Druid, Hive, Kafka, JDBC, LLAP, Hadoop, Release and features
Apache Hive 3 brings a bunch of nice new features to the data warehouse. Unfortunately, like many major FOSS releases, it comes with a few bugs and not much documentation. It has been available since…
Jul 25, 2019
Jumbo, the Hadoop cluster bootstrapper
Categories: Infrastructure | Tags: Ansible, Ambari, Automation, HDP, REST, Cluster, Vagrant
Introducing Jumbo, a Hadoop cluster bootstrapper for developers. Jumbo helps you deploy development environments for Big Data technologies. It takes a few minutes to get a custom virtualized Hadoop…
Nov 29, 2018
KVM machines for Vagrant on Archlinux
Categories: DevOps & SRE | Tags: Arch Linux, KVM, Linux, Virtualization, VM, Vagrant
Vagrant supports different providers to manage virtualization. In a Linux environment, you can dramatically improve VM performance by using the libvirt provider and the KVM hypervisor. This tutorial…
Sep 19, 2018
Apache Beam: a unified programming model for data processing pipelines
Categories: Data Engineering, DataWorks Summit 2018 | Tags: Apex, Beam, Flink, Spark, Pipeline
In this article, we will review the concepts, the history and the future of Apache Beam, which may well become the new standard for defining data processing pipelines. At DataWorks Summit 2018 in…
May 24, 2018