
Kafka : Message Streaming using Python

Kafka is a platform for message streaming and processing. This open-source platform was developed by LinkedIn and later donated to the Apache Software Foundation. It is built to move data streams with high throughput and low latency, and it works mainly on a publish-subscribe model. Kafka is written in Java and Scala and uses a binary TCP-based protocol. In this article I am going to discuss using Kafka with Python.

Kafka is built from the following fundamental components.

  • Topic : Category/feed name under which messages are stored and published. Messages are generally byte arrays and can carry any object in any format.
  • Consumer : Part of a consumer group. It subscribes to a particular topic and listens for incoming messages.
  • Producer : Acts as a client and is responsible for sending messages, which consumers receive via topics.
  • Broker : Receives messages from producers and stores them. It allows consumers to fetch messages by topic, partition, etc. A Kafka cluster consists of one or more servers (brokers).
  • Zookeeper : Responsible for storing configuration and for leader detection among brokers. It tracks the status of cluster nodes.

OUR OBJECTIVE

First, we will create a topic, a Kafka consumer and a producer. After that, our producer will send a JSON message (consisting of a timestamp and a serial number) to the topic at 1-second intervals, and the consumer will receive and display each message. We will implement this application in Python 3.

KAFKA & Installations

Kafka

You can install Kafka from this awesome DigitalOcean link.

Python libraries

  • pip : Follow this link to download.
  • Kafka-python : Python client for Apache Kafka. Install this library from this GitHub link.

TOPIC CREATION

If you haven’t created a topic as described in the DigitalOcean guide above, create it now. We will use a topic named ‘MyTopic’. On Ubuntu you can generally create it from the terminal with the topic script that ships with Kafka.
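The same step can also be driven from Python. Below is a minimal sketch, assuming Kafka 2.2 or newer (older releases take `--zookeeper localhost:2181` instead of `--bootstrap-server`) and a broker on localhost:9092. The helper names, the `/opt/kafka` install path and the single partition and replica are placeholders chosen for illustration.

```python
import os
import subprocess


def topic_create_command(topic, kafka_dir="/opt/kafka",
                         bootstrap_server="localhost:9092",
                         partitions=1, replication_factor=1):
    """Build the argument list for creating a topic with kafka-topics.sh.

    kafka_dir is an assumed install location; adjust it to your setup.
    """
    script = os.path.join(kafka_dir, "bin", "kafka-topics.sh")
    return [
        script,
        "--create",
        "--topic", topic,
        "--bootstrap-server", bootstrap_server,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


def create_topic(topic, **options):
    """Run the command; raises CalledProcessError if the script fails."""
    subprocess.run(topic_create_command(topic, **options), check=True)
```

Calling `create_topic("MyTopic")` runs the script once. Joining the list returned by `topic_create_command("MyTopic")` with spaces gives the equivalent command to type in a terminal.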

KAFKA CONSUMER & PRODUCER

Now we will create a consumer that is ready to accept messages from our topic ‘MyTopic’.
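A minimal sketch of such a consumer, assuming the kafka-python library and a JSON message body with the keys `time` and `serial` (the key names and the `parse_message` helper are our assumptions, not a fixed format):

```python
import json


def parse_message(raw_bytes):
    """Decode one message body (bytes) and return its (time, serial) values."""
    data = json.loads(raw_bytes.decode("utf-8"))
    return data["time"], data["serial"]


def start_kafka_consumer(host, topic):
    """Listen on the topic and print every message that arrives."""
    # Imported here so parse_message works even without kafka-python installed.
    from kafka import KafkaConsumer

    # No port is given, so the client falls back to the default 9092.
    consumer = KafkaConsumer(topic, bootstrap_servers=host)
    print("KAFKA Consumer Ready to Accept")
    for message in consumer:  # blocks and yields records as they arrive
        timestamp, serial = parse_message(message.value)
        print(f"time: {timestamp}, serial: {serial}")
```

Calling `start_kafka_consumer("localhost", "MyTopic")` starts the loop. It keeps running until the process is interrupted.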

The consumer is defined in a function named ‘start_kafka_consumer’, which takes the host name and the topic as arguments. Our consumer listens on the ‘MyTopic’ topic. Whenever it gets a new message, it parses it into a JSON object and reads the time and serial values to display them. As the Kafka server runs on port 9092 by default, we do not provide the port here.

To run it from the terminal, start the script with the Python 3 interpreter. Once it is up, it shows the message KAFKA Consumer Ready to Accept.

After that, we will create our producer, which will send messages to the topic.
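A minimal sketch of such a producer, again assuming kafka-python. The `build_payload` helper, the ISO timestamp format, the `time` and `serial` key names and the default of 10 messages are illustrative choices of ours:

```python
import json
import time
from datetime import datetime


def build_payload(serial, now=None):
    """Return the message body as UTF-8 encoded JSON bytes."""
    now = now or datetime.now()
    message = {"time": now.isoformat(), "serial": serial}
    # The producer sends bytes, so the dict becomes a JSON string, then bytes.
    return json.dumps(message).encode("utf-8")


def start_kafka_producer(host, topic, count=10, interval=1.0):
    """Send `count` messages to the topic, one every `interval` seconds."""
    # Imported here so build_payload works even without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=host)
    for serial in range(1, count + 1):
        producer.send(topic, build_payload(serial))
        time.sleep(interval)
    producer.flush()  # make sure buffered messages leave before exiting
    producer.close()
```

Calling `start_kafka_producer("localhost", "MyTopic")` while the consumer is running sends one message per second.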

In the producer, we create a timestamp and a serial number inside a for loop and send them to the ‘MyTopic’ topic. We convert our object to a string because the producer sends messages as byte arrays.

Run this producer from the terminal in the same way. Soon after starting it, you will see the messages sent by the producer arriving at the consumer, which displays the time and serial values of each one.

Fun, right? Now it’s up to you how you format your data and build your application.

REMOTE CALL & CHANGE PORT

By default, a Kafka producer and consumer run locally on the same server as the broker. To run them on remote machines, for instance a consumer on one remote server and a producer on another, you need to change the broker configuration. On Ubuntu, you can do this by modifying [kafka directory]/config/server.properties.

Uncomment the relevant line in the server.properties file and change it to the address and port on which the broker should accept connections.

In this way you can also change the Kafka server port from 9092 to another one. Remember to open that port in your firewall settings.
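The edit can also be scripted. Here is a minimal sketch, assuming the line in question is the `listeners` entry, which ships commented out as `#listeners=PLAINTEXT://:9092` in the stock server.properties. The `set_listener` helper is hypothetical, and the address 192.168.1.10 in the usage note is only an example.

```python
import re

# Matches an active or commented-out "listeners=" line, but not
# "advertised.listeners=", because the match is anchored at line start.
LISTENERS_LINE = re.compile(r"^#?[ \t]*listeners[ \t]*=.*$", re.MULTILINE)


def set_listener(properties_text, host, port=9092):
    """Return the properties text with the listeners entry set to host:port."""
    new_line = f"listeners=PLAINTEXT://{host}:{port}"
    if LISTENERS_LINE.search(properties_text):
        # Replace the first matching line, uncommenting it if needed.
        return LISTENERS_LINE.sub(lambda _: new_line, properties_text, count=1)
    # No such line yet: append one at the end of the file content.
    return properties_text.rstrip("\n") + "\n" + new_line + "\n"
```

To apply it, read the contents of server.properties, pass them through `set_listener(text, "192.168.1.10", 9093)` and write the result back to the file.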

Similarly, you can change the port for the internal consumer and producer by modifying the port number in the config/server.properties file.

After modifying the file, don’t forget to restart the Kafka service. Job done!
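After the restart, it can be worth checking from the remote machine that the broker port is actually reachable through the firewall. This is a minimal sketch using only the standard library. The `broker_reachable` helper is hypothetical, and it only tests that a TCP connection can be opened, not that Kafka itself is healthy.

```python
import socket


def broker_reachable(host, port=9092, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

A `False` result for your broker's address usually points to the firewall or to the listener configuration.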

If you like this article, please share it. Let us know about any improvements.


By Abhishek Pachal

Abhishek is a developer cum blogger who has been working for more than 6 years. He loves programming, especially open stack technologies. He has decent knowledge of Android development, WordPress, MongoDB, Node.js and so on. Besides this, Abhishek keeps himself busy with painting and front-end design.