Advanced multi-tenant Hadoop and ZooKeeper protection
ZooKeeper is a critical component of Hadoop’s high-availability operation. It protects itself by capping the maximum number of connections (maxConns = 400). However, ZooKeeper does not protect itself intelligently: it simply refuses connections once the threshold is reached. In that case, the core components (HBase RegionServers, HDFS ZKFC) can no longer open a connection, and the service becomes degraded or unavailable.
It is very easy to mount a DoS attack on ZooKeeper. However, because most ZooKeeper installations sit inside trusted or semi-trusted zones, these attacks are usually unintentional. All it takes is a careless developer launching custom code that opens ZooKeeper sessions in a loop without closing them. ZooKeeper is then compromised, and so is every component that depends on it.
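The failure mode described above can be illustrated with a toy model. This is only a sketch: `CappedServer`, `leaky_job` and `tidy_job` are hypothetical names, and the class imitates a fixed connection cap rather than ZooKeeper's real connection handling.

```python
import itertools


class CappedServer:
    """Toy model of a server that refuses connections past a fixed cap."""

    def __init__(self, max_conns):
        self.max_conns = max_conns
        self.sessions = {}               # session id -> client name
        self._ids = itertools.count(1)

    def connect(self, client):
        # The cap is applied blindly: it does not matter who is asking.
        if len(self.sessions) >= self.max_conns:
            return None
        sid = next(self._ids)
        self.sessions[sid] = client
        return sid

    def close(self, sid):
        self.sessions.pop(sid, None)


def leaky_job(server, attempts):
    """Open sessions in a loop and never close them."""
    return [server.connect("custom-job") for _ in range(attempts)]


def tidy_job(server, attempts):
    """Same loop, but each session is closed before the next one opens."""
    for _ in range(attempts):
        sid = server.connect("custom-job")
        if sid is not None:
            server.close(sid)


if __name__ == "__main__":
    server = CappedServer(max_conns=400)
    tidy_job(server, 1000)
    print("after tidy job, core service connects:",
          server.connect("regionserver") is not None)
    leaky_job(server, 1000)
    print("after leaky job, core service connects:",
          server.connect("zkfc") is not None)
```

In this model the tidy job leaves the server empty, while the leaky job fills every remaining slot and the next core service is turned away.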
Several workarounds can be set up independently or jointly.
Observers are a special kind of ZooKeeper node:
- They do not take part in the quorum (they do not vote)
- They synchronize with the participating nodes
- They forward write requests to the participating nodes
They therefore make it possible to increase the number of nodes without slowing down the election process.
It is possible to protect yourself from external DoS with iptables, by limiting the number of connections to the ZooKeeper port (2181) per IP address. Setting this per-address limit below ZooKeeper's maxConns makes it possible to block one particular address without blocking access from other machines.
Imagine the following cluster topology:
- 3 edge nodes:
edge1.adaltas.com, edge2.adaltas.com, edge3.adaltas.com
- 3 master nodes:
master1.adaltas.com, master2.adaltas.com, master3.adaltas.com
- n worker nodes: not involved in this case
- These machines are located in the subnet:
The master nodes form the voting quorum, and the edge nodes act as observers in order to increase the load the ensemble can absorb.
NB: the even total number of nodes is not a problem, since only the 3 masters are voting participants.
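The arithmetic behind this note can be sketched in a few lines. The `ENSEMBLE` dictionary and the helper names are illustrative, and the majority rule used is the usual "more than half of the voters" one.

```python
# Roles mirror the topology above; the dictionary itself is illustrative.
ENSEMBLE = {
    "master1.adaltas.com": "participant",
    "master2.adaltas.com": "participant",
    "master3.adaltas.com": "participant",
    "edge1.adaltas.com": "observer",
    "edge2.adaltas.com": "observer",
    "edge3.adaltas.com": "observer",
}


def voters(ensemble):
    """Only participants vote; observers are ignored for the majority."""
    return [host for host, role in ensemble.items() if role == "participant"]


def quorum_size(voter_count):
    """Smallest strict majority of the voting members."""
    return voter_count // 2 + 1


def tolerated_failures(voter_count):
    """How many voters can be lost while a majority still exists."""
    return voter_count - quorum_size(voter_count)


if __name__ == "__main__":
    n = len(voters(ENSEMBLE))
    print(f"{n} voters, quorum of {quorum_size(n)}, "
          f"{tolerated_failures(n)} failure(s) tolerated")
```

With six servers but only three voters, the majority is computed over three, so adding the observers does not change the quorum size.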
We set up these configurations (/etc/zookeeper/conf/zoo.cfg):
On the master nodes:
clientPort=2181
maxClientCnxns=200
peerType=participant
server.1=master1.adaltas.com:2888:3888
server.2=master2.adaltas.com:2888:3888
server.3=master3.adaltas.com:2888:3888
server.4=edge1.adaltas.com:2888:3888:observer
server.5=edge2.adaltas.com:2888:3888:observer
server.6=edge3.adaltas.com:2888:3888:observer
On the edge nodes:
clientPort=2181
maxClientCnxns=200
peerType=observer
server.1=master1.adaltas.com:2888:3888
server.2=master2.adaltas.com:2888:3888
server.3=master3.adaltas.com:2888:3888
server.4=edge1.adaltas.com:2888:3888:observer
server.5=edge2.adaltas.com:2888:3888:observer
server.6=edge3.adaltas.com:2888:3888:observer
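Since the two files differ only in peerType, they can be generated from a single host list. The sketch below is one way to do it; `render_zoo_cfg` is a hypothetical helper. The `:observer` suffix on the edge entries is an assumption based on the ZooKeeper Observers guide, which asks for it on every observer's server line.

```python
MASTERS = ["master1.adaltas.com", "master2.adaltas.com", "master3.adaltas.com"]
EDGES = ["edge1.adaltas.com", "edge2.adaltas.com", "edge3.adaltas.com"]


def render_zoo_cfg(peer_type, masters=MASTERS, edges=EDGES,
                   max_client_cnxns=200):
    """Render the zoo.cfg fragment for a 'participant' or 'observer' node."""
    if peer_type not in ("participant", "observer"):
        raise ValueError(f"unknown peerType: {peer_type}")
    lines = [
        "clientPort=2181",
        f"maxClientCnxns={max_client_cnxns}",
        f"peerType={peer_type}",
    ]
    # Server ids are assigned in order: masters first, then edges.
    for server_id, host in enumerate(masters + edges, start=1):
        # Assumption: observers are flagged with a trailing ':observer'.
        suffix = ":observer" if host in edges else ""
        lines.append(f"server.{server_id}={host}:2888:3888{suffix}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_zoo_cfg("participant"))
    print(render_zoo_cfg("observer"))
```

Generating both variants from the same list keeps the server ids identical on every node, which the ensemble relies on.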
On the master nodes, external machines are barred from port 2181 and only the local network is allowed, via the following iptables rule:
-A INPUT -m state --state NEW -m tcp -p tcp -s 10.10.10.0/24 --dport 2181 -j ACCEPT
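Thus these ZooKeeper instances are accessed only by our internal services and processes. This assumes that any other inbound traffic on the port is dropped or rejected by the chain's default policy or by a later rule.
Edge nodes limit connections from external machines on port 2181 to 100 simultaneous connections per IP address via the rule: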
iptables -A INPUT -p tcp --syn --dport 2181 -m connlimit --connlimit-above 100 --connlimit-mask 32 -j REJECT --reject-with tcp-reset
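To make the effect of this rule concrete, here is a simplified model of per-address connection limiting. The `ConnLimit` class is hypothetical and only approximates the connlimit match: it allows up to `above` open connections per source network and rejects the rest. The kernel's exact counting at the boundary may differ.

```python
import ipaddress
from collections import Counter


class ConnLimit:
    """Simplified model of '--connlimit-above N --connlimit-mask M'."""

    def __init__(self, above=100, mask=32):
        self.above = above
        self.mask = mask
        self.open = Counter()   # source network -> open connection count

    def _key(self, src_ip):
        # A /32 mask groups by single address; a shorter mask by subnet.
        return ipaddress.ip_network(f"{src_ip}/{self.mask}", strict=False)

    def new_connection(self, src_ip):
        key = self._key(src_ip)
        if self.open[key] >= self.above:
            return "REJECT"     # stands in for the TCP reset sent by the rule
        self.open[key] += 1
        return "ACCEPT"

    def close(self, src_ip):
        key = self._key(src_ip)
        if self.open[key] > 0:
            self.open[key] -= 1


if __name__ == "__main__":
    fw = ConnLimit(above=100, mask=32)
    noisy = [fw.new_connection("192.0.2.10") for _ in range(150)]
    print("noisy client accepted:", noisy.count("ACCEPT"))
    print("other client:", fw.new_connection("192.0.2.11"))
```

The point of the per-address key is visible in the last line: the noisy address is cut off while another address still connects normally.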
For Hadoop services (HDFS ZKFC, HBase Master, etc.), the following connection string is specified:
For “client” configurations (YARN containers, HBase clients, third-party applications, etc.), the following connection string is specified:
Thus, when an external job or application launches, it cannot saturate the quorum and does not compromise the state of the cluster.
If a “rogue” external application uses the string edge1.adaltas.com:2181,edge2.adaltas.com:2181,edge3.adaltas.com:2181, it may saturate all 3 observer nodes. Although the cluster remains stable, some services will be unavailable, since clients will no longer be able to reach ZooKeeper.
We can limit the impact of observer saturation by splitting the connection string into several shorter strings, each listing a subset of the observers, and distributing them across the client configurations. Example:
- String 1:
- String 2:
- String 3:
Thus, if string 1 is saturated, edge3 remains available. Applications targeting strings 2 and 3 will not be blocked.
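The exact strings are not reproduced here, so the sketch below shows only one possible decomposition, assumed for illustration: every pair of observers becomes one connection string. It is consistent with the statement that edge3 stays available when string 1 is saturated. `connection_strings` and `surviving_hosts` are hypothetical helpers.

```python
from itertools import combinations

EDGES = ["edge1.adaltas.com", "edge2.adaltas.com", "edge3.adaltas.com"]


def connection_strings(hosts, size=2, port=2181):
    """Build one connection string per subset of `size` observers."""
    return [
        ",".join(f"{host}:{port}" for host in combo)
        for combo in combinations(hosts, size)
    ]


def surviving_hosts(connection_string, saturated):
    """Hosts of a connection string that are not in the saturated set."""
    return [h for h in connection_string.split(",") if h not in saturated]


if __name__ == "__main__":
    strings = connection_strings(EDGES)
    # Pretend every host listed in string 1 has been saturated.
    saturated = set(strings[0].split(","))
    for number, string in enumerate(strings, start=1):
        print(f"string {number}: {string} "
              f"-> still reachable: {surviving_hosts(string, saturated)}")
```

With pairs, each string overlaps the others on only one host, so saturating one string still leaves the clients of the other two with a reachable observer.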