A cluster is a group of servers presented as a single system in order to provide more computing power and higher availability.
Several architectures exist, the most common being "active/active", in which every server is permanently ready to serve requests. This architecture requires load distribution, which can be static or dynamic: requests are dispatched either according to fixed rules (static) or by a scheduling algorithm that takes the current state of the servers into account (dynamic).
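The two distribution modes can be sketched as follows, with round-robin as a static rule and least-connections as a dynamic scheduling decision. The server names and counters are illustrative assumptions, not a real load balancer API:

```python
import itertools

# Hypothetical server pool (names are illustrative).
servers = ["web-1", "web-2", "web-3"]

# Static distribution: a fixed rule, here simple round-robin.
rr = itertools.cycle(servers)
def pick_static():
    return next(rr)

# Dynamic distribution: a scheduling decision based on live state,
# here the server currently holding the fewest active connections.
active = {s: 0 for s in servers}
def pick_dynamic():
    return min(active, key=active.get)

# Dispatch a few requests with each strategy.
for _ in range(4):
    pick_static()          # cycles web-1, web-2, web-3, web-1, ...
for _ in range(3):
    target = pick_dynamic()
    active[target] += 1    # track load so later choices adapt
```

A static rule stays correct as long as the servers are homogeneous and the requests uniform; the dynamic variant adapts when some requests are heavier than others, at the cost of tracking server state.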
Cluster deployments also involve fault tolerance mechanisms, such as transferring a server's processes to another node when that server fails, or integrating new servers into the cluster without restarting it entirely.
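These two mechanisms, failover and rolling integration, can be sketched with a toy session map. The node and session names, and the `fail_over`/`join` helpers, are hypothetical and only illustrate the idea:

```python
# Toy cluster state: each node holds a list of sessions.
cluster = {"node-1": ["s1", "s2"], "node-2": ["s3"], "node-3": []}

def fail_over(failed):
    """Re-home a failed node's sessions onto the least-loaded survivors."""
    orphans = cluster.pop(failed)
    for session in orphans:
        target = min(cluster, key=lambda n: len(cluster[n]))
        cluster[target].append(session)

def join(node):
    """Integrate a new node without a full cluster restart."""
    cluster.setdefault(node, [])

fail_over("node-1")   # s1 and s2 are redistributed to surviving nodes
join("node-4")        # new, empty node joins the live cluster
```

Real cluster managers add detection (heartbeats, timeouts) and state replication on top of this; the sketch only shows the reassignment step itself.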