Apache Kafka is a distributed streaming platform. With its rich API (Application Programming Interface) set, we can connect almost anything to Kafka as a source of data, and on the other end we can set up a large number of consumers that will receive the stream of records for processing. Kafka is highly scalable, and stores the streams of data in a reliable, fault-tolerant way. From the connectivity perspective, Kafka can serve as a bridge between many heterogeneous systems, which in turn can rely on its capabilities to transfer and persist the data provided.
In this tutorial we will install Apache Kafka on Red Hat Enterprise Linux 8, create the systemd unit files for ease of management, and test the functionality with the shipped command line tools.
In this tutorial you will learn:
- How to install Apache Kafka
- How to create systemd services for Kafka and Zookeeper
- How to test Kafka with command line clients
Software Requirements and Conventions Used
| Category | Requirements, Conventions or Software Version Used |
|---|---|
| System | Red Hat Enterprise Linux 8 |
| Software | Apache Kafka 2.1.0 |
| Other | Privileged access to your Linux system as root or via the `sudo` command. |
| Conventions | `#` – requires given linux commands to be executed with root privileges, either directly as the root user or by use of the `sudo` command; `$` – requires given linux commands to be executed as a regular non-privileged user |
How to install Kafka on Red Hat 8 step by step instructions
Apache Kafka is written in Java, so all we need is OpenJDK 8 installed to proceed with the installation. Kafka relies on Apache Zookeeper, a distributed coordination service that is also written in Java and is shipped with the package we will download. While installing HA (High Availability) services on a single node defeats their purpose, we'll install and run Zookeeper for Kafka's sake.
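If Java is not installed yet, RHEL 8 ships OpenJDK 8 in its repositories. A quick way to install it and sanity-check the runtime (the package name below is the one used on RHEL 8; the version check is just an illustrative test):

```shell
# Install OpenJDK 8 from the RHEL 8 repositories (run as root)
dnf install -y java-1.8.0-openjdk

# Sanity check: the runtime should report version 1.8.x
java -version 2>&1 | grep -q 'version "1\.8' && echo "Java 8 is available"
```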
- To download Kafka from the closest mirror, we need to consult the official download site. We can copy the URL of the `.tgz` file from there. We'll use `wget` and the pasted URL to download the package to the target machine:

```
# wget https://www-eu.apache.org/dist/kafka/2.1.0/kafka_2.11-2.1.0.tgz -O /opt/kafka_2.11-2.1.0.tgz
```
- We enter the `/opt` directory, and extract the archive:

```
# cd /opt
# tar -xvf kafka_2.11-2.1.0.tgz
```

And create a symlink called `/opt/kafka` that points to the now created `/opt/kafka_2.11-2.1.0` directory to make our lives easier:

```
# ln -s /opt/kafka_2.11-2.1.0 /opt/kafka
```
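To confirm the symlink points where we expect, we can resolve it; `readlink -f` prints the canonical target (a quick sanity check, not required for the installation):

```shell
# Resolve the symlink; should print the versioned directory, /opt/kafka_2.11-2.1.0
readlink -f /opt/kafka
```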
- We create a non-privileged user that will run both the `zookeeper` and `kafka` services:

```
# useradd kafka
```
- And set the new user as owner of the whole directory we extracted, recursively:

```
# chown -R kafka:kafka /opt/kafka*
```
- We create the unit file `/etc/systemd/system/zookeeper.service` with the following content:

```
[Unit]
Description=zookeeper
After=syslog.target network.target

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh

[Install]
WantedBy=multi-user.target
```
Note that we don't need to write the version number three times because of the symlink we created. The same applies to the next unit file for Kafka, `/etc/systemd/system/kafka.service`, that contains the following lines of configuration:

```
[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target
```
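Optionally, if we'd like systemd to restart Kafka automatically after a crash, a drop-in file can extend the unit without editing it. `Restart=` and `RestartSec=` are standard systemd directives; the drop-in file name below is illustrative:

```
# /etc/systemd/system/kafka.service.d/restart.conf
[Service]
Restart=on-failure
RestartSec=5
```

After creating a drop-in, `systemctl daemon-reload` must be run again for it to take effect.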
- We need to reload `systemd` so it reads the new unit files:

```
# systemctl daemon-reload
```
- Now we can start our new services, in this order:

```
# systemctl start zookeeper
# systemctl start kafka
```
If all goes well, `systemd` should report a running state in both services' status, similar to the outputs below:

```
# systemctl status zookeeper.service
  zookeeper.service - zookeeper
   Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-01-10 20:44:37 CET; 6s ago
 Main PID: 11628 (java)
    Tasks: 23 (limit: 12544)
   Memory: 57.0M
   CGroup: /system.slice/zookeeper.service
           11628 java -Xmx512M -Xms512M -server [...]

# systemctl status kafka.service
  kafka.service - Apache Kafka
   Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-01-10 20:45:11 CET; 11s ago
 Main PID: 11949 (java)
    Tasks: 64 (limit: 12544)
   Memory: 322.2M
   CGroup: /system.slice/kafka.service
           11949 java -Xmx1G -Xms1G -server [...]
```
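Another quick check is that the two services are listening on their default ports, 2181 for Zookeeper and 9092 for Kafka, the same ports the client tools in this tutorial connect to. A sketch using `ss`, which ships with RHEL 8:

```shell
# List listening TCP sockets and filter for the Zookeeper and Kafka ports
ss -tln | grep -E ':(2181|9092)\b'
```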
- Optionally we can enable automatic start on boot for both services:

```
# systemctl enable zookeeper.service
# systemctl enable kafka.service
```
- To test functionality, we'll connect to Kafka with one producer and one consumer client. The messages provided by the producer should appear on the console of the consumer. But before this we need a medium for these two to exchange messages on. We create a new channel of data, called a `topic` in Kafka's terms, where the producer will publish and where the consumer will subscribe. We'll call the topic `FirstKafkaTopic`, and use the `kafka` user to create it:

```
$ /opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic FirstKafkaTopic
```
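To confirm the topic was created with the intended settings, the same tool's `--describe` switch prints the partition count, replication factor and leader assignment:

```shell
# Inspect the topic we just created (run as the kafka user)
/opt/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic FirstKafkaTopic
```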
- We start a consumer client from the command line that will subscribe to the (at this point empty) topic created in the previous step:

```
$ /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic FirstKafkaTopic --from-beginning
```

We leave this console, and the client running in it, open. This console is where we will receive the messages we publish with the producer client.
- On another terminal, we start a producer client and publish some messages to the topic we created. We can query Kafka for available topics:

```
$ /opt/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
FirstKafkaTopic
```

Then we connect to the one the consumer is subscribed to, and send a message:

```
$ /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic FirstKafkaTopic
> new message published by producer from console #2
```
At the consumer terminal, the message should appear shortly:

```
$ /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic FirstKafkaTopic --from-beginning
new message published by producer from console #2
```
If the message appears, our test is successful, and our Kafka installation is working as intended. Many clients could publish and consume one or more topics' records the same way, even with the single-node setup we created in this tutorial.
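The same round trip can also be scripted: both console clients read stdin and write stdout, so we can pipe a message in and replay the topic non-interactively. A sketch assuming the single-node setup above; `--max-messages` and `--timeout-ms` are options of the console consumer in this release:

```shell
# Publish one message from stdin instead of an interactive prompt
echo "scripted test message" | /opt/kafka/bin/kafka-console-producer.sh \
    --broker-list localhost:9092 --topic FirstKafkaTopic

# Replay the topic from the beginning, stopping after the first message
# or after 10 seconds, whichever comes first
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
    --topic FirstKafkaTopic --from-beginning --max-messages 1 --timeout-ms 10000
```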