Leo SCHOUKROUN

Big Data Solution Architect

Published articles

Operating Kafka in Kubernetes with Strimzi

Categories: Big Data, Containers Orchestration, Infrastructure | Tags: Kafka, Big Data, Kubernetes, Open source, Streaming

Kubernetes is not the first platform that comes to mind to run Apache Kafka clusters. Indeed, Kafka’s strong dependency on storage might be a pain point regarding Kubernetes’ way of doing things when…

By Leo SCHOUKROUN

Mar 7, 2023

Introducing Trunk Data Platform: the Open-Source Big Data Distribution Curated by TOSIT

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: DevOps, Hortonworks, Ansible, Hadoop, HBase, Knox, Ranger, Spark, Cloudera, CDP, CDH, Open source, TDP

Ever since Cloudera and Hortonworks merged, the choice of commercial Hadoop distributions for on-prem workloads essentially boils down to CDP Private Cloud. CDP can be seen as the “best of both worlds…

By Leo SCHOUKROUN

Apr 14, 2022

Build your open source Big Data distribution with Hadoop, HBase, Spark, Hive & Zeppelin

Categories: Big Data, Infrastructure | Tags: Maven, Hadoop, HBase, Hive, Spark, Git, Release and features, TDP, Unit tests

The Hadoop ecosystem gave birth to many popular projects including HBase, Spark and Hive. While technologies like Kubernetes and S3 compatible object storages are growing in popularity, HDFS and YARN…

By Leo SCHOUKROUN

Dec 18, 2020

Rebuilding HDP Hive: patch, test and build

Categories: Big Data, Infrastructure | Tags: Maven, Java, Hive, Git, GitHub, Release and features, TDP, Unit tests

The Hortonworks HDP distribution will soon be deprecated in favor of Cloudera’s CDP. One of our clients wanted a new Apache Hive feature backported into HDP 2.6.0. We thought it was a good opportunity…

By Leo SCHOUKROUN

Oct 6, 2020

Installing Hadoop from source: build, patch and run

Categories: Big Data, Infrastructure | Tags: Maven, Java, LXD, Hadoop, HDFS, Docker, TDP, Unit tests

Commercial Apache Hadoop distributions have come and gone. The two leaders, Cloudera and Hortonworks, have merged: HDP is no more and CDH is now CDP. MapR has been acquired by HP and IBM BigInsights…

By Leo SCHOUKROUN

Aug 4, 2020

Expose a Rook-based Ceph cluster outside of Kubernetes

Categories: Containers Orchestration | Tags: Debug, Rook, Ceph, Docker, Kubernetes

We recently deployed an LXD-based Hadoop cluster and we wanted to be able to apply size quotas on some filesystems (i.e., service logs, user homes). Quota is a built-in feature of the Linux kernel used…

By Leo SCHOUKROUN

Apr 16, 2020

Install and debug Kubernetes inside LXD

Categories: Containers Orchestration | Tags: Debug, Linux, LXD, Docker, Kubernetes, Node

We recently deployed a Kubernetes cluster with the need to maintain cluster isolation on our bare metal nodes across our infrastructure. We knew that Virtual Machines would provide the required…

By Leo SCHOUKROUN

Feb 4, 2020

Policy enforcing with Open Policy Agent

Categories: Cyber Security, Data Governance | Tags: Kafka, Ranger, Authorization, Cloud, Kubernetes, REST, SSL/TLS

Open Policy Agent is an open-source multi-purpose policy engine. Its main goal is to unify policy enforcement across the cloud native stack. The project was created by Styra and it is currently…

By Leo SCHOUKROUN

Jan 22, 2020

Auto-scaling Druid with Kubernetes

Categories: Big Data, Business Intelligence, Containers Orchestration | Tags: CNCF, Helm, Metrics, OLAP, Operation, Container Orchestration, EC2, Druid, Cloud, Data Analytics, Kubernetes, Prometheus, Python

Apache Druid is an open-source analytics data store which could leverage the auto-scaling abilities of Kubernetes due to its distributed nature and its reliance on memory. I was inspired by the talk…

By Leo SCHOUKROUN

Jul 16, 2019

Hadoop cluster takeover with Apache Ambari

Categories: Big Data, DevOps & SRE, Adaltas Summit 2018 | Tags: Ambari, Automation, iptables, Kerberos, Nikita, Systemd, Cluster, HDP, Node, Node.js, REST

We recently migrated a large production Hadoop cluster from a “manual” automated install to Apache Ambari; we called this the Ambari Takeover. This is a risky process and we will detail why this…

By Leo SCHOUKROUN

Nov 15, 2018

Present and future of Hadoop workflow scheduling: Oozie 5.x

Categories: Big Data, DataWorks Summit 2018 | Tags: Hadoop, Hive, Oozie, Sqoop, CDH, HDP, REST

During the DataWorks Summit Europe 2018 in Berlin, I had the opportunity to attend a breakout session on Apache Oozie. It covers the new features released in Oozie 5.0, including future features of…

By Leo SCHOUKROUN

May 23, 2018

Apache Thrift vs REST

Categories: DevOps & SRE, Open Source Summit Europe 2017 | Tags: Thrift, gRPC, HTTP, JavaScript Object Notation (JSON), REST

Adaltas recently attended the Open Source Summit Europe 2017 in Prague. I had the opportunity to follow a presentation made by Randy Abernethy and Jens Geyer of RM-X, a cloud native consulting company…

By Leo SCHOUKROUN

Oct 28, 2017

Canada - Morocco - France

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases to projects in production, how to reduce their costs and how to shorten their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.

Support Ukraine