Infrastructure

Because infrastructure is critical to any Big Data project, we help you define and implement an infrastructure that is compatible with your existing and anticipated IT environment.

Our skills cover key topics in design and architecture such as networking, monitoring, diagnostics and reporting, automated deployment, configuration and security. Our expertise extends to a multitude of technologies and distributions.

We have repeatedly secured distributions from Hortonworks, Cloudera and MapR with Kerberos, and we have experience conducting workshops with multiple stakeholders in your organization to integrate Big Data platforms with technologies such as SSL, Active Directory, FreeIPA, MIT Kerberos, and OpenLDAP.

Articles related to IT infrastructure

Multihoming on Hadoop

Categories: Infrastructure | Tags: HDFS, Kerberos, Network, Hadoop

Multihoming, which means having multiple networks attached to one node, is one of the main components for managing the heterogeneous network usage of an Apache Hadoop cluster. This article is an…

By Joris RUMMENS

Mar 5, 2019

Jumbo, the Hadoop cluster bootstrapper

Categories: Infrastructure | Tags: Ansible, Ambari, Automation, HDP, REST, Vagrant

Introducing Jumbo, a Hadoop cluster bootstrapper for developers. Jumbo helps you deploy development environments for Big Data technologies. It takes a few minutes to get a custom virtualized Hadoop…

By Gauthier LEONARD

Nov 29, 2018

Clusters and workloads migration from Hadoop 2 to Hadoop 3

Categories: Big Data, Infrastructure | Tags: HBase, HDFS, Oozie, Slider, Spark, YARN, Docker, Erasure Coding, Operation, Rolling Upgrade, SLA, Hadoop

Migrating from Hadoop 2 to Hadoop 3 is a hot topic. How to upgrade your clusters, which features in the new release may solve current problems and bring new opportunities, how are your current…

By Lucas BAKALIAN

Jul 25, 2018

A CoreOS development cluster with Vagrant and VirtualBox

Categories: Hack, Infrastructure | Tags: Arch Linux, Clustering, CoreOS, Linux, Vagrant, VirtualBox, etcd

Following CoreOS’s instructions on how to set up a development environment in VirtualBox did not work out well for me. Here are the steps I followed to get Container Linux up and running with Vagrant…

By Arthur BUSSER

Jun 20, 2018

Lightweight containerization with Tupperware

Categories: Containers Orchestration, Open Source Summit Europe 2017, Infrastructure | Tags: Zookeeper, Btrfs, Cloud, LXD, Red Hat, Systemd

In this article, I will present Tupperware, a lightweight containerization system set up by Facebook. What is Tupperware? Tupperware is a homemade framework written and used internally at Facebook…

By Lucas BAKALIAN

Nov 3, 2017

Nobody* puts Java in a Container

Categories: Containers Orchestration, Open Source Summit Europe 2017, Infrastructure | Tags: cgroups, Docker, Java, JRE, JVM, Namespaces

This talk was about the issues of putting Java in a container and how, in its latest version, the JDK is now more aware of the container it is running in. The presentation is led by @joerg_schad…

By Paul-Adrien CORDONNIER

Oct 28, 2017

MariaDB integration with Hadoop

Categories: Infrastructure | Tags: Hive, Database, HA, MariaDB, Hadoop

During a workshop with one of our customers, Adaltas identified a potential risk in using MariaDB’s High Availability (HA) strategy. Since the customer selected Cloudera’s CDH 5 distribution, the…

By David WORMS

Jul 31, 2017

Exposing Kafka on two different networks

Categories: Infrastructure | Tags: Kafka, Cloudera, CDH, Cyber Security, Network, VLAN

A Big Data setup usually requires multiple network interfaces. Let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream processing software platform…

By César BEREZOWSKI

Jul 22, 2017

MiNiFi: Data at Scales & the Values of Starting Small

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: MiNiFi, NiFi, Cloudera, C++, HDP, HDF, IOT

This conference talk briefly presented Apache NiFi and explained where MiNiFi came from: basically, it is a minimal NiFi agent deployed on small devices to bring data to a cluster’s NiFi pipeline (e.g. IoT…

By César BEREZOWSKI

Jul 8, 2017

Advanced multi-tenant Hadoop and Zookeeper protection

Categories: Big Data, Infrastructure | Tags: Zookeeper, Clustering, DoS, iptables, Operation, Scalability

Zookeeper is a critical component of Hadoop’s high availability operation. Zookeeper protects itself by limiting the maximum number of connections (maxConns = 400). However, Zookeeper does not protect…

By Pierre SAUVAGE

Jul 5, 2017

HDP cluster monitoring

Categories: Big Data, DevOps & SRE, Infrastructure | Tags: Alert, Ambari, HDP, Metrics, Monitoring, REST

With the current growth of Big Data technologies, more and more companies are building their own clusters in the hope of getting value from their data. One main concern while building these infrastructures…

By Joris RUMMENS

Jul 5, 2017

Hadoop development cluster of virtual machines with static IP using VirtualBox

Categories: Infrastructure | Tags: Ambari, Cloudera, Hortonworks, Network, Red Hat, VirtualBox, VM, VMware

A few days ago, I explained how to set up a cluster of virtual machines with static IPs and Internet access suitable to host your Hadoop cluster locally for development. At the time I made use of VMWare…

By David WORMS

Mar 14, 2013

Virtual machines with static IP for your Hadoop development cluster

Categories: Infrastructure | Tags: Ambari, Cloudera, Hortonworks, Network, Red Hat, VirtualBox, VM, VMware

As I am about to install and test Ambari, this article is an opportunity to illustrate how I set up my development environment with multiple virtual machines. Ambari, the deployment and monitoring…

By David WORMS

Feb 27, 2013

Canada - Morocco - France

International locations

10 rue de la Kasbah
2393 Rabbat
Canada

We are a team of Open Source enthusiasts doing consulting in Big Data, Cloud, DevOps, Data Engineering, Data Science…

We provide our customers with accurate insights on how to leverage technologies to convert their use cases into projects in production, how to reduce their costs and how to accelerate their time to market.

If you enjoy reading our publications and have an interest in what we do, contact us and we will be thrilled to cooperate with you.