Setting up Apache Kafka for use with an Apache ZooKeeper quorum on Ubuntu

There are lots of guides out there describing how to set up simple Apache Kafka configurations, but they generally stop short of describing how to use Kafka with a three-node Apache ZooKeeper quorum so that ZooKeeper isn’t a single point of failure. The machines that I am working with run these components:

  • Server1 (static 192.168.10.11) – ZooKeeper
  • Server2 (static 192.168.10.12) – ZooKeeper
  • Server3 (static 192.168.10.13) – ZooKeeper, Kafka broker
  • Desktop (static 192.168.10.14) – Kafka producer and Kafka consumer

This setup doesn’t use multiple Kafka brokers but that’s a relatively simple extension.

Setting up the ZooKeeper quorum

The first thing to do is to get the ZooKeeper quorum configured. I am using ZooKeeper 3.4.7 downloaded from here and it comes with an example configuration called conf/zoo_sample.cfg. Copy this to a file called conf/zoo.cfg. Then add three lines at the bottom of conf/zoo.cfg:

server.1=192.168.10.11:2888:3888
server.2=192.168.10.12:2888:3888
server.3=192.168.10.13:2888:3888

Also, change the dataDir path in the file to something appropriate. The resulting file can be copied (using scp, for example) to the other two servers, as the conf/zoo.cfg files can all be the same. Note that, even if hostname resolution is working, it appears that the server on which the ZooKeeper instance is running must have an IP address in its own server entry, not a hostname (or localhost). So the simplest thing is just to use IP addresses for all entries and keep the files identical (this may not be an issue if DNS is being used to resolve local hostnames, however).
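The zoo.cfg edits described above can be scripted. This is only a minimal sketch: the function name make_zoo_cfg and the /var/lib/zookeeper path in the example are my own illustrative choices, not anything that ships with ZooKeeper, and it assumes the sample file contains a dataDir= line.

```shell
#!/bin/sh
# Sketch: build a quorum zoo.cfg from the sample file.
# make_zoo_cfg is a hypothetical helper, not part of ZooKeeper.
make_zoo_cfg() {
    sample="$1"    # path to conf/zoo_sample.cfg
    target="$2"    # path to conf/zoo.cfg
    data_dir="$3"  # directory ZooKeeper should use for dataDir

    # Copy the sample, replacing its dataDir line with our own.
    sed "s|^dataDir=.*|dataDir=${data_dir}|" "$sample" > "$target"

    # Append one server.N entry per quorum member.
    cat >> "$target" <<'EOF'
server.1=192.168.10.11:2888:3888
server.2=192.168.10.12:2888:3888
server.3=192.168.10.13:2888:3888
EOF
}

# Example (run from the ZooKeeper directory; the dataDir is an assumption):
# make_zoo_cfg conf/zoo_sample.cfg conf/zoo.cfg /var/lib/zookeeper
```

Because the output is identical on every server, it can be generated once and copied to the others.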

Create a file in the directory pointed to by dataDir called myid. This should contain a single number, the server number used in the zoo.cfg for this server. So Server1 would have this in the myid file:

1

The other servers would have 2 and 3 in their myid files.
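Creating the myid file is a one-liner, but wrapping it in a small function makes the link between dataDir and the server number explicit. Again, this is a sketch: write_myid is a hypothetical helper name and the path in the example is an assumption.

```shell
#!/bin/sh
# Sketch: write the myid file for one quorum member.
# write_myid is a hypothetical helper name.
write_myid() {
    data_dir="$1"   # must match dataDir in conf/zoo.cfg
    server_id="$2"  # the N from this host's server.N line

    mkdir -p "$data_dir"
    echo "$server_id" > "$data_dir/myid"
}

# Example for Server1, assuming dataDir=/var/lib/zookeeper:
# write_myid /var/lib/zookeeper 1
```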

So now it is just a case of starting up the ZooKeepers. On each server, navigate to the ZooKeeper directory and enter:

bin/zkServer.sh start

To see if things are ok, enter:

bin/zkServer.sh status

If things are working properly, two of the servers should respond:

Mode: follower

and the other:

Mode: leader

If so, that’s the ZooKeeper quorum established.
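If you would rather not eyeball three terminals, the check can be automated by collecting the "Mode:" lines from all three servers and counting them. The sketch below assumes you have some way of gathering that output in one place; the function name check_quorum is hypothetical, and the nc-based example in the comment assumes nc is installed and that ZooKeeper's four-letter srvr command is reachable on the client port.

```shell
#!/bin/sh
# Sketch: read status output from all servers on stdin and succeed
# only if it shows exactly one leader and two followers.
# check_quorum is a hypothetical helper name.
check_quorum() {
    awk '
        /^Mode: leader/   { leaders++ }
        /^Mode: follower/ { followers++ }
        END { exit !(leaders == 1 && followers == 2) }
    '
}

# Example (assumes nc is available on the machine running the check):
# for ip in 192.168.10.11 192.168.10.12 192.168.10.13; do
#     echo srvr | nc "$ip" 2181
# done | check_quorum && echo "quorum looks healthy"
```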

Setting up Kafka to use the ZooKeeper quorum

For this example, I am going to start a Kafka broker on Server3. Kafka can be downloaded from here and I am using version 0.9.0.0. Included is a config/server.properties example file. Copy this to config/server3z.properties.  The example file is set to use localhost as the standalone ZooKeeper location. To change this to use the quorum, change the zookeeper.connect line to be:

zookeeper.connect=Server1:2181,Server2:2181,Server3:2181
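The copy-and-edit step can be done in one go with sed. This is a sketch that assumes the example file contains a single zookeeper.connect= line; set_zk_connect is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: copy a Kafka properties file, pointing zookeeper.connect
# at the quorum. set_zk_connect is a hypothetical helper name.
set_zk_connect() {
    src="$1"     # e.g. config/server.properties
    dst="$2"     # e.g. config/server3z.properties
    quorum="$3"  # comma-separated host:port list

    # Only the zookeeper.connect= line is rewritten; other settings
    # (including zookeeper.connection.timeout.ms) are left untouched.
    sed "s|^zookeeper\.connect=.*|zookeeper.connect=${quorum}|" "$src" > "$dst"
}

# Example (run from the Kafka directory):
# set_zk_connect config/server.properties config/server3z.properties \
#     Server1:2181,Server2:2181,Server3:2181
```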

Note that it is possible to use IP addresses here. However, running Kafka on multiple machines does not seem to work unless hostname resolution is working. The easiest way to make this happen is to add some entries to the /etc/hosts file:

192.168.10.11  Server1
192.168.10.12  Server2
192.168.10.13  Server3
192.168.10.14  Desktop

These entries should be added to the /etc/hosts file on every machine, and networking restarted. It’s obvious why static IP addresses for the servers make life a lot easier!
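Since the same entries go on every machine, it is handy to add them in a way that can be re-run without creating duplicates. The sketch below takes the hosts file path as an argument so it can be tried on a scratch file first; add_host_entry is a hypothetical helper name, and writing to the real /etc/hosts needs root.

```shell
#!/bin/sh
# Sketch: append "IP  name" to a hosts file unless the name is
# already mapped. add_host_entry is a hypothetical helper name.
add_host_entry() {
    hosts_file="$1"
    ip="$2"
    name="$3"

    # Look for the name as a whole field after some whitespace.
    if ! grep -qE "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
        printf '%s  %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Example (as root, on each machine):
# add_host_entry /etc/hosts 192.168.10.11 Server1
# add_host_entry /etc/hosts 192.168.10.12 Server2
# add_host_entry /etc/hosts 192.168.10.13 Server3
# add_host_entry /etc/hosts 192.168.10.14 Desktop
```

Afterwards, something like getent hosts Server1 should print the matching address if resolution is working.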

Now it’s time to start the broker on Server3 – navigate to the Kafka directory and enter:

bin/kafka-server-start.sh config/server3z.properties

To check if things are working, I used the example console producer and consumer included in the Kafka download. So, download Kafka to the Desktop, open a couple of console windows and in one enter:

bin/kafka-console-producer.sh --broker-list 192.168.10.13:9092 --topic test

and in the other:

bin/kafka-console-consumer.sh --zookeeper Server1:2181,Server2:2181,Server3:2181 --topic test

Now with any luck, if you type lines into the producer window, the lines will appear in the consumer window. Note that the producer seems to want an IP address rather than hostname for the broker address. I am not using DNS locally and I assume that the Kafka producer isn’t looking in /etc/hosts. On the other hand, the consumer is happy to get hostnames for the ZooKeeper quorum. I am guessing that using DNS for local hostname resolution will make all of this more consistent but setting that up is for another day…
