Cluster

A cluster is a group of two or more nodes that work together. Each node is identified in the cluster by its IP address and/or domain name and has its own storage, RAM, CPU, etc. Server clusters provide better resource availability, scalability, and reliability.

Learn more: Wikipedia

Hadoop cluster takeover with Apache Ambari

Categories: Big Data, DevOps & SRE, Adaltas Summit 2018 | Tags: Ambari, Automation, iptables, Nikita, Systemd, Cluster, HDP, Kerberos, Node, Node.js, REST

We recently migrated a large production Hadoop cluster from a “manual” automated install to Apache Ambari. We called this the Ambari Takeover. This is a risky process and we will detail why this…

By Leo SCHOUKROUN

Nov 15, 2018

Jumbo, the Hadoop cluster bootstrapper

Categories: Infrastructure | Tags: Ambari, Automation, Ansible, Cluster, Vagrant, HDP, REST

Introducing Jumbo, a Hadoop cluster bootstrapper for developers. Jumbo helps you deploy development environments for Big Data technologies. It takes a few minutes to get a custom virtualized Hadoop…

By Gauthier LEONARD

Nov 29, 2018

Monitoring a production Hadoop cluster with Kubernetes

Categories: DevOps & SRE | Tags: Thrift, Shinken, Hadoop, Knox, Cluster, Docker, Elasticsearch, Grafana, Kubernetes, Node, Node.js, Prometheus, Python

Monitoring a production-grade Hadoop cluster is a real challenge and needs to be constantly evolving. The software we use today is based on Nagios. Very efficient when it comes to the simplest…

By Paul-Adrien CORDONNIER

Dec 21, 2018

Rook with Ceph doesn't provision my Persistent Volume Claims!

Categories: DevOps & SRE | Tags: PVC, Linux, Rook, Ubuntu, Ceph, Cluster, Internship, Kubernetes

Ceph installation inside Kubernetes can be provisioned using Rook. Currently doing an internship at Adaltas, I was in charge of participating in the setup of a Kubernetes (k8s) cluster. To avoid…

By Eyal CHOJNOWSKI

Sep 9, 2019

Hadoop Ozone part 1: an introduction of the new filesystem

Categories: Infrastructure | Tags: HDFS, Ozone, Cluster, Kubernetes

Hadoop Ozone is an object store for Hadoop. It is designed to scale to billions of objects of varying sizes. It is currently in development. The roadmap is available on the project wiki. This article…

By Paul-Adrien CORDONNIER

Dec 3, 2019

Hadoop Ozone part 3: advanced replication strategy with Copyset

Categories: Infrastructure | Tags: HDFS, Ozone, Cluster, Kubernetes, Node

Hadoop Ozone provides a way of setting a ReplicationType for every write you make on the cluster. Right now HDFS and Ratis are supported, but more advanced replication strategies can be achieved. In this…

By Paul-Adrien CORDONNIER

Dec 3, 2019

Hadoop Ozone part 2: tutorial and getting started of its features

Categories: Infrastructure | Tags: CLI, Learning and tutorial, HDFS, Ozone, Amazon S3, Cluster, REST

The releases of Hadoop Ozone come with a handy docker-compose file to try out Ozone. The instructions below provide details on how to use it. You can also use the Katacoda training sandbox which…

By Paul-Adrien CORDONNIER

Dec 3, 2019

Using Cloudera Deploy to install Cloudera Data Platform (CDP) Private Cloud

Categories: Big Data, Cloud Computing | Tags: Ansible, Cloudera, CDP, Cluster, Data Warehouse, Vagrant, IaC

Following our recent Cloudera Data Platform (CDP) overview, we cover how to deploy CDP Private Cloud on your local infrastructure. It is entirely automated with the Ansible cookbooks published by…

By Alexander HOFFMANN

Jul 23, 2021

MinIO object storage within a Kubernetes cluster

Categories: Big Data, Data Governance, Learning | Tags: Amazon S3, Big Data, Cluster, Data Lake, Kubernetes, Storage

MinIO is a popular object storage solution. Often recommended for its simple setup and ease of use, it is not only a great way to get started with object storage: it also provides excellent…

By Luka BIGOT

Jul 9, 2022

Ceph object storage within a Kubernetes cluster with Rook

Categories: Big Data, Data Governance, Learning | Tags: Amazon S3, Big Data, Ceph, Cluster, Data Lake, Kubernetes, Storage

Ceph is a distributed all-in-one storage system. Reliable and mature, its first stable version was released in 2012 and has since then been the reference for open source storage. Ceph’s main perk is…

By Luka BIGOT

Aug 4, 2022

Ingresses and Load Balancers in Kubernetes with MetalLB and nginx-ingress

Categories: Containers Orchestration, Infrastructure, Tech Radar | Tags: Kubeadm, Cluster, Deployment, Ingress, Kubernetes

When it comes to exposing services from a Kubernetes cluster and making them accessible from outside the cluster, the recommended option is to use a load-balancer type service to redirect incoming…

By Kellian COTTART

Sep 8, 2022

Big data infrastructure internship

Categories: Big Data, Data Engineering, DevOps & SRE, Infrastructure | Tags: Infrastructure, Hadoop, Big Data, Cluster, Internship, Kubernetes, TDP

Job description: Big Data and distributed computing are at the core of Adaltas. We accompany our partners in the deployment, maintenance, and optimization of some of the largest clusters in France…

By Stephan BAUM

Dec 2, 2022
