Kafka Basics: Theory and Python Examples
Apache Kafka is a distributed platform for handling real‑time event streams. It provides three main capabilities:
- Publish and subscribe to streams of events.
- Store those streams durably and reliably for as long as needed.
- Process streams of events as they occur or later.
Events, Producers, and Consumers
An event (or record/message) captures that “something happened” in your system. Client applications that write events are called producers, while those that read events are consumers.
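Conceptually, an event is just a keyed payload plus a timestamp and optional headers. The sketch below is only an illustration: the field names mirror what the kafka-python client (used later in this tutorial) exposes on a consumed record, and the values themselves are made up.
# Illustrative only: the shape of a single event/record.
# Field names follow kafka-python's ConsumerRecord; the values are invented.
event = {
    "key": "user-42",                  # optional; drives partition assignment
    "value": '{"action": "login"}',    # the payload, often JSON or Avro bytes
    "timestamp": 1700000000000,        # milliseconds since the Unix epoch
    "headers": [("source", b"web")],   # optional producer-supplied metadata
}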
Topics and Partitions
- Events are grouped in topics, similar to folders in a filesystem.
- Topics are append-only and immutable: once an event is written it cannot be modified, only read (and eventually expired).
- Each topic is split into partitions. Messages in a partition are ordered and assigned a sequential offset.
- Kafka retains data for a configurable period (default: one week). Offsets are never reused, even if old events expire.
- Ordering is guaranteed only within a single partition. Without specifying a key, events are distributed round‑robin across partitions. Using a key ensures all events with the same key go to the same partition.
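As a minimal sketch of key-based routing (using the kafka-python client introduced later in this tutorial; the topic demo and the order-42 key are illustrative), every send with the same key lands in the same partition, so the three events below keep their relative order:
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: v.encode("utf-8"),
)

# The default partitioner hashes the key, so "order-42" always maps
# to the same partition and the events stay in order.
for event in ["created", "paid", "shipped"]:
    metadata = producer.send("demo", key="order-42", value=event).get(timeout=10)
    print(f"partition={metadata.partition} offset={metadata.offset}")

producer.flush()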
Replication and Durability
- Topics can be replicated across brokers for high availability. A replication factor of three is common in production.
- Each partition has a leader and zero or more followers. Producers always write to the leader, and consumers read from it by default.
- In‑Sync Replicas (ISR) are followers that have fully caught up with the leader. Kafka can elect a new leader from the ISR if the current leader fails.
- Producers choose the desired reliability via acknowledgements:
  - acks=0 – no confirmation (risk of loss).
  - acks=1 – leader only.
  - acks=all – leader and all ISRs.
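With the kafka-python client the acknowledgement level is a producer setting. The sketch below (broker address and topic are illustrative) trades some latency for durability by waiting for the leader and all in-sync replicas:
from kafka import KafkaProducer

# acks="all": send() only succeeds after the leader and every in-sync
# replica have stored the record; use acks=0 or acks=1 for lower latency.
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    acks="all",
    retries=5,                              # retry transient broker errors
    value_serializer=lambda v: v.encode("utf-8"),
)

producer.send("demo", value="durable write").get(timeout=10)
producer.flush()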
Consumers and Groups
- Kafka uses a pull model: consumers request data from brokers.
- Consumers can form consumer groups for parallelism and fault tolerance. Each partition is consumed by only one member of a group at a time. Multiple groups can independently read the same topic.
- The broker tracks the last processed offset for each group in the __consumer_offsets internal topic.
- Delivery semantics when committing offsets manually:
- At most once – commit before processing.
- At least once – commit after processing (may lead to duplicates).
- Exactly once – requires idempotent processing and transactional APIs.
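As a rough sketch of at-least-once consumption with kafka-python (the group id is illustrative and process() stands in for whatever your application does with a record): auto-commit is disabled and the offset is committed only after processing, so a crash in between causes reprocessing rather than data loss.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "demo",
    bootstrap_servers=["localhost:9092"],
    group_id="at-least-once-app",
    enable_auto_commit=False,        # we commit offsets ourselves
    auto_offset_reset="earliest",
)

while True:
    # poll() returns {TopicPartition: [ConsumerRecord, ...]}
    batch = consumer.poll(timeout_ms=1000)
    for records in batch.values():
        for record in records:
            process(record)          # hypothetical application logic
    if batch:
        consumer.commit()            # commit only after processing succeeded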
Brokers and Coordination
- A Kafka broker hosts topic partitions and has a unique ID. After connecting to any bootstrap broker, clients discover the rest of the cluster (see the sketch after this list).
- Legacy deployments rely on ZooKeeper for coordination, but modern Kafka versions support KRaft mode, removing ZooKeeper and improving scalability.
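To see bootstrap discovery from Python, hand the client a single broker address and ask it for cluster metadata; a small sketch with kafka-python (the address and topic are illustrative):
from kafka import KafkaConsumer

# One bootstrap broker is enough: the client fetches metadata describing
# every broker, topic, and partition in the cluster.
client = KafkaConsumer(bootstrap_servers=["localhost:9092"])
print(client.topics())                      # all topics in the cluster
print(client.partitions_for_topic("demo"))  # partition ids for one topic
client.close()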
Useful CLI Commands
# Create a topic
kafka-topics.sh --create --bootstrap-server $BROKERS \
--replication-factor 3 --partitions 2 --topic demo
# Produce messages (with optional keys)
kafka-console-producer.sh --bootstrap-server $BROKERS \
--topic demo --property parse.key=true --property key.separator=:
>key1:value1
# Consume from the beginning
kafka-console-consumer.sh --bootstrap-server $BROKERS \
--topic demo --from-beginning
# Inspect consumer groups
kafka-consumer-groups.sh --bootstrap-server $BROKERS --list
# Reset offsets for a group
kafka-consumer-groups.sh --bootstrap-server $BROKERS \
--group app-1 --reset-offsets --to-earliest --topic demo --execute
Python Client Tutorial
1. Run Kafka with Docker
Create a simple docker-compose.yml:
version: '3'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.5.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  kafka:
    image: confluentinc/cp-kafka:7.5.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
Start the cluster:
docker compose up -d
2. Listen with a console consumer
In one terminal, wait for messages:
kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic demo --from-beginning \
--property print.key=true --property print.value=true \
--property key.separator=:
# waiting for messages ...
3. Python producer
Install the client library and create producer.py:
pip install kafka-python
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    key_serializer=lambda k: k.encode("utf-8") if k else None,
    value_serializer=lambda v: v.encode("utf-8"),
)

# Send three keyed messages to the "demo" topic.
for i in range(3):
    producer.send("demo", key=f"key-{i}", value=f"hello-{i}")

# Block until all buffered messages have been delivered.
producer.flush()
Run the script:
python producer.py
The console consumer prints:
key-0:hello-0
key-1:hello-1
key-2:hello-2
4. Python consumer
Create consumer.py:
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "demo",
    bootstrap_servers=["localhost:9092"],
    group_id="python-app",
    auto_offset_reset="earliest",   # start from the oldest message if no offset is stored
    key_deserializer=lambda k: k.decode("utf-8") if k else None,
    value_deserializer=lambda v: v.decode("utf-8"),
)

# Iterate forever, printing each record as it arrives.
for msg in consumer:
    print(
        f"partition={msg.partition} offset={msg.offset} key={msg.key} value={msg.value}"
    )
Produce a message from the CLI:
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic demo
>cli-msg
Then run the consumer:
python consumer.py
Sample output:
partition=0 offset=3 key=None value=cli-msg
These scripts show how Python clients interact with Kafka alongside the command-line tools.
Summary
Kafka’s combination of publish/subscribe, durable storage, and stream processing makes it well suited for event‑driven architectures. Understanding topics, partitions, replication, and consumer groups is essential before building applications atop Kafka.
