Jumbo - Presentation and future of the Hadoop cluster bootstrapper

Introduction

As experienced Data Engineers, you have probably deployed dozens of Hadoop clusters on your computer or in the cloud, and you know how time-consuming it is to manually adapt your provisioning scripts each time. Jumbo was built to bootstrap those scripts in minutes, based on your needs.

  • Speaker: Gauthier Leonard
  • Duration: 1h15
  • Format: talk

Presentation

Jumbo is an Open Source project hosted on GitHub that was developed at Adaltas by two interns who had to gain experience with the Hadoop ecosystem. It is a CLI tool written in Python. It offers an abstraction layer that allows any user, experienced or not with Big Data technologies, to describe a cluster that has to be provisioned. It then generates scripts and leverages trusted DevOps tools to provision the cluster.

In its latest version, Jumbo is able to create and provision virtual clusters with the HDP (Hortonworks Data Platform) stack and to Kerberise them, using Vagrant (with VirtualBox or KVM), Ansible and Ambari. Future versions will allow deploying other Hadoop stacks (e.g. CDH - Cloudera Distribution for Hadoop), and other Big Data technologies (e.g. Elasticsearch).
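To make the idea of "describe a cluster, then generate scripts" more concrete, here is a minimal sketch in Python. It is not Jumbo's actual code or data model: the `Node` class, the `render_vagrantfile` function, the `centos/7` box and the IP addresses are all assumptions chosen for illustration. It only shows how a small cluster description could be turned into a Vagrantfile for the VirtualBox provider.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One virtual machine of the described cluster (hypothetical model)."""
    name: str
    ip: str
    ram_mb: int
    cpus: int = 1


def render_vagrantfile(nodes, box="centos/7"):
    """Render a Vagrantfile (Ruby DSL) defining one VirtualBox VM per node."""
    lines = ['Vagrant.configure("2") do |config|']
    for n in nodes:
        lines += [
            f'  config.vm.define "{n.name}" do |node|',
            f'    node.vm.box = "{box}"',
            f'    node.vm.hostname = "{n.name}"',
            f'    node.vm.network "private_network", ip: "{n.ip}"',
            '    node.vm.provider "virtualbox" do |vb|',
            f'      vb.memory = {n.ram_mb}',
            f'      vb.cpus = {n.cpus}',
            '    end',
            '  end',
        ]
    lines.append("end")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-node cluster: names, IPs and sizes are illustrative only.
    cluster = [
        Node("master", "10.10.10.11", ram_mb=4096, cpus=2),
        Node("worker1", "10.10.10.12", ram_mb=2048),
    ]
    print(render_vagrantfile(cluster))
```

In a real tool, the generated file would be written to disk and handed to Vagrant, with Ansible playbooks taking over once the machines are up. The sketch stops at the generation step.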

In the talk, we will go through the concepts that Jumbo uses to generate deployment scripts and how it leverages DevOps tools under the hood. We will also take a look at what's to come for Jumbo and how you can get involved. The talk will be followed by a demo/tutorial of Jumbo.

I invite you to bring your laptop so that you can see the magic in action. To follow the demo, you need Vagrant, a hypervisor (VirtualBox or KVM), and Python 3 installed on your computer!
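If you want to check your laptop before the demo, a small script along these lines may help. It is only a sketch and not part of Jumbo: the function name `missing_prerequisites` is made up, and it assumes that VirtualBox exposes the `VBoxManage` command and that a KVM setup exposes libvirt's `virsh` command on your `PATH`, which may not hold on every system (on Windows, for instance, `VBoxManage` is often not on the `PATH` by default).

```python
import shutil
import sys


def missing_prerequisites(which=shutil.which, version=sys.version_info):
    """Return a list of problems found; an empty list means the laptop looks ready."""
    problems = []
    if version[0] < 3:
        problems.append("Python 3 is required")
    if which("vagrant") is None:
        problems.append("vagrant not found on PATH")
    # Assumption: VirtualBox ships VBoxManage, and KVM is managed through libvirt's virsh.
    if which("VBoxManage") is None and which("virsh") is None:
        problems.append("neither VBoxManage (VirtualBox) nor virsh (KVM/libvirt) found on PATH")
    return problems


if __name__ == "__main__":
    found = missing_prerequisites()
    if found:
        for problem in found:
            print("Missing:", problem)
    else:
        print("All demo prerequisites seem to be installed.")
```

The `which` and `version` parameters exist only so that the check can be exercised without touching the real system.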

Author

I am Gauthier Leonard, a Data Engineer who has been working at Adaltas since September 2018. Before that, I was an intern at the same company, where I developed Jumbo with my colleague Xavier Hermand.

I am currently on assignment at Stago, a leader in blood analysis equipment production, as the Big Data lead on a Data Lake project that is just starting. The project involves the two Hortonworks Big Data stacks, HDP (Hortonworks Data Platform) and HDF (Hortonworks DataFlow).

I like designing coherent and optimized Big Data architectures, although I still have a lot to learn in that field. I am also a grammar Nazi when it comes to coding.

