Léo is a Big Data & Hadoop solution architect with several years of experience with Hadoop and Distributed Systems. He designs, develops and operates data ingestion workflows and real-time services while accompanying his clients in defining their needs and implementing them.
He is versatile across Big Data platforms, covering the planning, design and architecture of cluster deployments, their administration and maintenance, as well as prototyping and application industrialization, in collaboration with business users, analysts, Data Scientists, Engineers and Operations Teams. More recently, he started working with Kubernetes and its integration with the Big Data ecosystem.
Published articles

Introducing Trunk Data Platform: the Open-Source Big Data Distribution Curated by TOSIT
Categories: Big Data, DevOps & SRE, Infrastructure | Tags: Ansible, Ranger, DevOps, Hortonworks, Hadoop, HBase, Knox, Spark, Cloudera, CDP, CDH, Open source, TDP
Ever since Cloudera and Hortonworks merged, the choice of commercial Hadoop distributions for on-prem workloads essentially boils down to CDP Private Cloud. CDP can be seen as the “best of both worlds…
Apr 14, 2022

Build your open source Big Data distribution with Hadoop, HBase, Spark, Hive & Zeppelin
Categories: Big Data, Infrastructure | Tags: Hive, Maven, Unit tests, Hadoop, HBase, Spark, Git, Release and features, TDP
The Hadoop ecosystem gave birth to many popular projects including HBase, Spark and Hive. While technologies like Kubernetes and S3 compatible object storages are growing in popularity, HDFS and YARN…
Dec 18, 2020

Rebuilding HDP Hive: patch, test and build
Categories: Big Data, Infrastructure | Tags: Hive, Maven, GitHub, Java, Unit tests, Git, Release and features, TDP
The Hortonworks HDP distribution will soon be deprecated in favor of Cloudera’s CDP. One of our clients wanted a new Apache Hive feature backported into HDP 2.6.0. We thought it was a good opportunity…
Oct 6, 2020

Installing Hadoop from source: build, patch and run
Categories: Big Data, Infrastructure | Tags: Maven, Java, LXD, Unit tests, Hadoop, HDFS, Docker, TDP
Commercial Apache Hadoop distributions have come and gone. The two leaders, Cloudera and Hortonworks, have merged: HDP is no more and CDH is now CDP. MapR has been acquired by HP and IBM BigInsights…
Aug 4, 2020

Expose a Rook-based Ceph cluster outside of Kubernetes
Categories: Containers Orchestration | Tags: Debug, Rook, Ceph, Container, Docker, Kubernetes
We recently deployed an LXD-based Hadoop cluster and we wanted to be able to apply size quotas on some filesystems (i.e. service logs, user homes). Quota is a built-in feature of the Linux kernel used…
Apr 16, 2020

Install and debug Kubernetes inside LXD
Categories: Containers Orchestration | Tags: Debug, Linux, LXD, Container, Docker, Kubernetes, Node
We recently deployed a Kubernetes cluster with the need to maintain cluster isolation on our bare metal nodes across our infrastructure. We knew that Virtual Machines would provide the required…
Feb 4, 2020

Policy enforcing with Open Policy Agent
Categories: Cyber Security, Data Governance | Tags: Ranger, REST, Kafka, Authorization, Cloud, Kubernetes, SSL/TLS
Open Policy Agent is an open-source multi-purpose policy engine. Its main goal is to unify policy enforcement across the cloud native stack. The project was created by Styra and it is currently…
Jan 22, 2020

Auto-scaling Druid with Kubernetes
Categories: Big Data, Business Intelligence, Containers Orchestration | Tags: EC2, Druid, CNCF, Container Orchestration, Data Analytics, Helm, Metrics, OLAP, Operation, Cloud, Kubernetes, Prometheus, Python
Apache Druid is an open-source analytics data store which could leverage the auto-scaling abilities of Kubernetes due to its distributed nature and its reliance on memory. I was inspired by the talk…
Jul 16, 2019

Hadoop cluster takeover with Apache Ambari
Categories: Big Data, DevOps & SRE, Adaltas Summit 2018 | Tags: Ambari, Automation, HDP, iptables, Kerberos, Nikita, REST, Systemd, Cluster, Node, Node.js
We recently migrated a large production Hadoop cluster from a “manual” automated install to Apache Ambari. We called this the Ambari Takeover. This is a risky process and we will detail why this…
Nov 15, 2018

Present and future of Hadoop workflow scheduling: Oozie 5.x
Categories: Big Data, DataWorks Summit 2018 | Tags: Hive, Sqoop, HDP, REST, Hadoop, Oozie, CDH
During the DataWorks Summit Europe 2018 in Berlin, I had the opportunity to attend a breakout session on Apache Oozie. It covers the new features released in Oozie 5.0, including future features of…
May 23, 2018

Apache Thrift vs REST
Categories: DevOps & SRE, Open Source Summit Europe 2017 | Tags: Thrift, gRPC, HTTP, REST, JavaScript Object Notation (JSON)
Adaltas recently attended the Open Source Summit Europe 2017 in Prague. I had the opportunity to follow a presentation made by Randy Abernethy and Jens Geyer of RM-X, a cloud native consulting company…
Oct 28, 2017