There are lots of guides out there describing how to set up simple Apache Kafka configurations, but they generally stop short of describing how to use Kafka with a three-node Apache ZooKeeper quorum so that ZooKeeper isn’t a single point of failure. The machines that I am working with are running these components:
- Server1 (static 192.168.10.11) – ZooKeeper
- Server2 (static 192.168.10.12) – ZooKeeper
- Server3 (static 192.168.10.13) – ZooKeeper, Kafka broker
- Desktop (static 192.168.10.14) – Kafka producer and Kafka consumer
This setup doesn’t use multiple Kafka brokers but that’s a relatively simple extension.
Setting up the ZooKeeper quorum
The first thing to do is to get the ZooKeeper quorum configured. I am using ZooKeeper 3.4.7 downloaded from here and it comes with an example configuration called conf/zoo_sample.cfg. Copy this to a file called conf/zoo.cfg. Then add three lines at the bottom of conf/zoo.cfg:
server.1=192.168.10.11:2888:3888
server.2=192.168.10.12:2888:3888
server.3=192.168.10.13:2888:3888
Also, change the dataDir path in the file to something appropriate. The resulting file can be copied (using scp, for example) to the other two servers, as the conf/zoo.cfg files can all be the same. Note that, even if hostname resolution is working, it appears that the server on which the ZooKeeper instance is running must have an IP address in its own server entry, not a hostname (or localhost). So the simplest thing is to use IP addresses for all entries and keep the files identical (this may not be an issue if DNS is being used to resolve local hostnames, however).
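As a rough sketch of the zoo.cfg steps so far, the copy, the dataDir change and the three server lines can be scripted. The function name make_zoo_cfg and the /var/lib/zookeeper data directory used in the usage note are placeholders of mine, not anything ZooKeeper requires.

```shell
# Hypothetical helper: build conf/zoo.cfg from the shipped sample.
#   $1 = directory where ZooKeeper was unpacked
#   $2 = directory to use as dataDir (your choice)
make_zoo_cfg() {
  zk_home="$1"
  data_dir="$2"
  # Copy the sample, rewriting the dataDir line on the way through.
  sed "s|^dataDir=.*|dataDir=$data_dir|" \
    "$zk_home/conf/zoo_sample.cfg" > "$zk_home/conf/zoo.cfg"
  # Append the three quorum members (IP addresses, as discussed above).
  cat >> "$zk_home/conf/zoo.cfg" <<'EOF'
server.1=192.168.10.11:2888:3888
server.2=192.168.10.12:2888:3888
server.3=192.168.10.13:2888:3888
EOF
}
```

Running something like make_zoo_cfg . /var/lib/zookeeper from the ZooKeeper directory on Server1 would produce a file that can then be copied unchanged to the other two servers.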
Create a file called myid in the directory pointed to by dataDir. This should contain a single number: the server number used in zoo.cfg for this server. So Server1 would have just 1 in its myid file.
The other servers would have 2 and 3 in their myid files.
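A minimal sketch of creating the myid file follows. The helper name write_myid is made up for illustration, and the data directory is whatever dataDir was set to in zoo.cfg.

```shell
# Hypothetical helper: write this server's number into <dataDir>/myid.
#   $1 = the dataDir configured in conf/zoo.cfg
#   $2 = this server's number (1, 2 or 3, matching its server.N entry)
write_myid() {
  mkdir -p "$1" && echo "$2" > "$1/myid"
}
```

On Server1 this would be called with 1 as the second argument, on Server2 with 2, and on Server3 with 3.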
So now it is just a case of starting up the ZooKeepers. On each server, navigate to the ZooKeeper directory and enter:
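Assuming the stock control script that ships in the ZooKeeper 3.4.x download, starting a server comes down to running bin/zkServer.sh start. The wrapper function below is only a convenience of mine so the ZooKeeper directory can be passed in.

```shell
# Hypothetical wrapper: start the local ZooKeeper server.
#   $1 = directory where ZooKeeper was unpacked (defaults to the current one)
start_zookeeper() {
  # zkServer.sh is the control script included in the ZooKeeper distribution.
  (cd "${1:-.}" && bin/zkServer.sh start)
}
```

This needs to be run on each of the three servers. The quorum only forms once a majority (two of the three) are up.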
To see if things are ok, enter:
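The same control script has a status subcommand, which is presumably what is wanted here. Again, the wrapper name zookeeper_status is just for illustration.

```shell
# Hypothetical wrapper: ask the local ZooKeeper server for its status.
#   $1 = directory where ZooKeeper was unpacked (defaults to the current one)
zookeeper_status() {
  # Prints, among other things, a "Mode:" line once the quorum has formed.
  (cd "${1:-.}" && bin/zkServer.sh status)
}
```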
If things are working properly, two of the servers should report Mode: follower and the other Mode: leader.
If so, that’s the ZooKeeper quorum established.
Setting up Kafka to use the ZooKeeper quorum
For this example, I am going to start a Kafka broker on Server3. Kafka can be downloaded from here and I am using version 0.9.0.0. Included is a config/server.properties example file. Copy this to config/server3z.properties. The example file is set to use localhost as the standalone ZooKeeper location. To change this to use the quorum, change the zookeeper.connect line to be:
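As a sketch, the copy and the zookeeper.connect change can be done in one pass. I am assuming the hostname form of the quorum address (matching the consumer command used later in this post) and the default ZooKeeper client port of 2181. The function name make_broker_config is hypothetical.

```shell
# Hypothetical helper: derive the broker config from the shipped example,
# pointing zookeeper.connect at the three-node quorum.
#   $1 = source file (e.g. config/server.properties)
#   $2 = destination file (e.g. config/server3z.properties)
make_broker_config() {
  # Only the zookeeper.connect line is rewritten; everything else is copied as-is.
  sed 's|^zookeeper\.connect=.*|zookeeper.connect=Server1:2181,Server2:2181,Server3:2181|' \
    "$1" > "$2"
}
```

From the Kafka directory, make_broker_config config/server.properties config/server3z.properties would then produce the file used below.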
Note that it is possible to use IP addresses here. However, running Kafka on multiple machines does not seem to work unless hostname resolution is working. The easiest way to make this happen is to add some entries in the /etc/hosts file:
192.168.10.11 Server1
192.168.10.12 Server2
192.168.10.13 Server3
192.168.10.14 Desktop
This should be added to the /etc/hosts file of all machines and networking restarted. It’s obvious why static IP addresses for the servers make life a lot easier!
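One way to confirm that the new entries are being picked up is to ask the system resolver for each name. This is only a sketch: it assumes a Linux machine where the getent utility is available, and the function name check_hosts is mine.

```shell
# Hypothetical check: report any of the four hostnames that do not resolve.
# Returns non-zero if at least one name is unresolved.
check_hosts() {
  missing=0
  for h in Server1 Server2 Server3 Desktop; do
    # getent consults the same sources as normal lookups, including /etc/hosts.
    if ! getent hosts "$h" >/dev/null 2>&1; then
      echo "unresolved: $h"
      missing=1
    fi
  done
  return "$missing"
}
```

Running check_hosts on each machine should print nothing if all four names resolve.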
Now it’s time to start the broker on Server3 – navigate to the Kafka directory and enter:
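Assuming the standard start script included in the Kafka download, the broker is started by passing it the properties file created earlier. The wrapper function start_broker is only there so the Kafka directory can be supplied as an argument.

```shell
# Hypothetical wrapper: start the Kafka broker with the quorum-aware config.
#   $1 = directory where Kafka was unpacked (defaults to the current one)
start_broker() {
  # Runs in the foreground, so the broker log appears in this console.
  (cd "${1:-.}" && bin/kafka-server-start.sh config/server3z.properties)
}
```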
To check if things are working, I used the example console producer and consumer included in the Kafka download. So, download Kafka to the Desktop and, using a couple of console windows, enter in one:
bin/kafka-console-producer.sh --broker-list 192.168.10.13:9092 --topic test
and on the other:
bin/kafka-console-consumer.sh --zookeeper Server1:2181,Server2:2181,Server3:2181 --topic test
Now with any luck, if you type lines into the producer window, the lines will appear in the consumer window. Note that the producer seems to want an IP address rather than hostname for the broker address. I am not using DNS locally and I assume that the Kafka producer isn’t looking in /etc/hosts. On the other hand, the consumer is happy to get hostnames for the ZooKeeper quorum. I am guessing that using DNS for local hostname resolution will make all of this more consistent but setting that up is for another day…