A Big Data setup usually requires you to have multiple network interfaces; let’s see how to set up Kafka on more than one of them. Kafka is an open-source stream processing platform which functions as a distributed publish/subscribe messaging system. It is designed for high throughput with built-in partitioning, replication, and fault tolerance.

This article was implemented using CDH 5.7.1 with Kafka installed using parcels.

One of the clusters we are working on has the following network configuration:

  • A “data” network exposing our edge, Kafka and master nodes to the outside world
  • An “internal” network dedicated to the cluster for our worker nodes

We use Kafka for data ingestion and also to send processed data to another system exposing UIs for the analysts, so we have:

  • A Spark Streaming job consuming Kafka topics from YARN (our “internal” network)
  • The other system’s app consuming Kafka topics from the outside (our “data” network)
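Both consumers point at the same broker address; what differs is the network from which that address is resolved. As an illustrative sketch (the hostname and port below are made up, not the ones from our cluster), a client on either network would configure:

```properties
# Both the Spark Streaming job (internal network) and the external app
# (data network) use the same bootstrap address; DNS on each network
# resolves it to the interface reachable from that network.
bootstrap.servers=kafka01.cluster:9092
```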

Thus, Kafka must be available on two different networks. To do so, the following configuration must be applied to each Kafka broker in the kafka.properties safety valve input, and the Kafka nodes must share the same hostname on both networks:
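The exact snippet from our cluster is not reproduced here; as a hedged sketch using the standard Kafka broker properties (the hostname and port are placeholders), a safety-valve entry along these lines binds the broker on all interfaces while advertising the shared hostname:

```properties
# Bind on all interfaces so the broker answers on both networks
listeners=PLAINTEXT://0.0.0.0:9092
# Advertise the hostname that resolves on both networks (placeholder name);
# each client resolves it to the IP reachable from its own network
advertised.listeners=PLAINTEXT://kafka01.cluster:9092
```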

That’s it!

NB: with this setup, Kafka listens on every interface instead of just the ones you need. Supposedly, Kafka accepts the following configuration to bind to specific IP addresses:
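The attempted snippet is not shown in the source; presumably it declared one listener per interface, something like this (the addresses below are invented for illustration):

```properties
# Hypothetical attempt: one listener per network interface,
# both using the PLAINTEXT protocol on the same port
listeners=PLAINTEXT://10.0.0.12:9092,PLAINTEXT://192.168.0.12:9092
```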

however, it will throw this exception on startup:

and a variation of it, “Each listener must have a different protocol”, when changing the ports. Kafka versions of that era required each listener to use a distinct security protocol; named listeners, which lift this restriction, only arrived later with KIP-103 in Kafka 0.10.2.