Introduction
Apache Kafka is the go-to distributed streaming platform for handling large-scale real-time data streams. In 2026, KRaft (Kafka Raft) mode has replaced ZooKeeper, simplifying deployments by eliminating an external component while improving resilience and performance. This expert tutorial walks you through deploying a high-availability 3-broker cluster with Docker Compose, creating partitioned topics, implementing Python producers and consumers, and tuning for production.
Why it matters: In event-driven architectures (microservices, IoT, real-time analytics), Kafka handles millions of events per second with guaranteed persistence. You'll master precise KRaft configs, replication pitfalls, and copy-paste-ready examples. By the end, your cluster will be ready for Kafka Streams or Connect. Ideal for senior data engineers managing critical pipelines.
Prerequisites
- Docker 25+ and Docker Compose 2.24+ installed and working.
- Python 3.11+ with pip for the producer/consumer clients.
- 4 GB RAM minimum (8 GB recommended for load testing).
- Advanced knowledge of Docker networking, YAML, and Kafka concepts (topics, partitions, replication).
- Ports 29092, 9092-9094 free on localhost.
Docker Compose Configuration for KRaft Cluster
version: '3.8'
services:
  broker-1:
    image: apache/kafka:3.7.0
    container_name: kafka-broker-1
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_LISTENERS: 'PLAINTEXT://broker-1:29092,CONTROLLER://broker-1:29093,PLAINTEXT_HOST://0.0.0.0:9092'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker-1:29092,PLAINTEXT_HOST://localhost:9092'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@broker-1:29093,2@broker-2:29093,3@broker-3:29093'
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - ./data1:/tmp/kraft-combined-logs
    networks:
      - kafka-net
  broker-2:
    image: apache/kafka:3.7.0
    container_name: kafka-broker-2
    ports:
      - "9093:9093"
    environment:
      KAFKA_NODE_ID: 2
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_LISTENERS: 'PLAINTEXT://broker-2:29092,CONTROLLER://broker-2:29093,PLAINTEXT_HOST://0.0.0.0:9093'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker-2:29092,PLAINTEXT_HOST://localhost:9093'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@broker-1:29093,2@broker-2:29093,3@broker-3:29093'
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - ./data2:/tmp/kraft-combined-logs
    networks:
      - kafka-net
    depends_on:
      - broker-1
  broker-3:
    image: apache/kafka:3.7.0
    container_name: kafka-broker-3
    ports:
      - "9094:9094"
    environment:
      KAFKA_NODE_ID: 3
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: 'CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT'
      KAFKA_LISTENERS: 'PLAINTEXT://broker-3:29092,CONTROLLER://broker-3:29093,PLAINTEXT_HOST://0.0.0.0:9094'
      KAFKA_ADVERTISED_LISTENERS: 'PLAINTEXT://broker-3:29092,PLAINTEXT_HOST://localhost:9094'
      KAFKA_INTER_BROKER_LISTENER_NAME: 'PLAINTEXT'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@broker-1:29093,2@broker-2:29093,3@broker-3:29093'
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - ./data3:/tmp/kraft-combined-logs
    networks:
      - kafka-net
    depends_on:
      - broker-1
networks:
  kafka-net:
    driver: bridge

This docker-compose.yml deploys a 3-node cluster in combined KRaft mode: every node runs both the controller and broker roles (KAFKA_PROCESS_ROLES: 'controller,broker'), which is required because a node must run the controller role to appear in KAFKA_CONTROLLER_QUORUM_VOTERS. Each node gets three listeners: an internal one for inter-broker traffic (29092), a controller listener for the Raft quorum (29093), and a host listener exposed on localhost (9092-9094). Generate a unique CLUSTER_ID with kafka-storage.sh random-uuid; all three brokers must share it. Create the data1/, data2/, data3/ directories before launch so metadata and logs persist across restarts.
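If you prefer not to shell into a Kafka container just to generate the CLUSTER_ID, the format that kafka-storage.sh random-uuid emits (16 random bytes, base64url-encoded without padding, 22 characters) can be reproduced with the standard library alone. A stdlib-only sketch that also creates the data directories expected by the compose file:

```python
import base64
import os
import uuid


def random_cluster_id() -> str:
    # 16 random bytes, base64url without padding -> 22 characters,
    # the same shape as Kafka's Uuid string format
    return base64.urlsafe_b64encode(uuid.uuid4().bytes).decode("ascii").rstrip("=")


if __name__ == "__main__":
    print("CLUSTER_ID:", random_cluster_id())
    for d in ("data1", "data2", "data3"):
        os.makedirs(d, exist_ok=True)  # data dirs mounted by docker-compose.yml
```

Paste the printed value into the CLUSTER_ID field of all three brokers.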
Launching and Verifying the Cluster
Save the code above as docker-compose.yml and run docker compose up -d. Check with docker compose logs broker-1: look for 'Kafka Server started' and broker registration messages. Test connectivity: docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092 (the apache/kafka image ships its CLI tools with a .sh suffix under /opt/kafka/bin). The cluster is ready when all three brokers are registered. With a 3-voter quorum, this setup tolerates one node failure.
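Startup takes a few seconds. A small stdlib-only helper (port list taken from the compose file above) can poll the host listeners until they accept TCP connections:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    # Poll until a TCP connection to host:port succeeds or the deadline passes
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=2):
                return True
        except OSError:
            time.sleep(1)
    return False


if __name__ == "__main__":
    for port in (9092, 9093, 9094):  # host listeners from docker-compose.yml
        status = "up" if wait_for_port("localhost", port, timeout=5) else "DOWN"
        print(f"localhost:{port} is {status}")
```

Note that an accepted TCP connection only proves the listener is bound, not that the broker has joined the quorum; follow up with the kafka-broker-api-versions.sh check above.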
Creating a Partitioned Topic
#!/bin/bash
docker exec -it kafka-broker-1 \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic expert-tutorial \
  --partitions 6 \
  --replication-factor 3 \
  --config cleanup.policy=delete \
  --config retention.ms=3600000

docker exec -it kafka-broker-1 \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic expert-tutorial

This script creates an 'expert-tutorial' topic with 6 partitions and replication factor 3 for high availability. cleanup.policy=delete with retention.ms=3600000 (1 hour) keeps test data short-lived. The describe command confirms that replicas are spread across all 3 brokers. Run it after cluster startup.
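Topic creation can also be scripted from Python via kafka-python's admin client. A sketch mirroring the shell command above (same topic name, partition count, and configs; requires pip install kafka-python and the running cluster):

```python
# Desired topic layout, mirroring the shell script above.
TOPIC = "expert-tutorial"
PARTITIONS = 6
REPLICATION = 3
CONFIGS = {"cleanup.policy": "delete", "retention.ms": "3600000"}  # 1 hour


def create_tutorial_topic(bootstrap: str = "localhost:9092") -> None:
    # kafka-python is imported lazily so the constants stay usable offline
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers=[bootstrap])
    topic = NewTopic(
        name=TOPIC,
        num_partitions=PARTITIONS,
        replication_factor=REPLICATION,
        topic_configs=CONFIGS,
    )
    admin.create_topics([topic])
    admin.close()


# create_tutorial_topic()  # uncomment to run against the live cluster
```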
Testing with Console Producers and Consumers
Before moving to Python clients, validate the cluster with the built-in console tools. Producer: docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic expert-tutorial. Type a few messages, then Ctrl+C to exit. Consumer: docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic expert-tutorial --from-beginning. You should see the messages you produced, confirming cluster health.
Complete Python Producer
from kafka import KafkaProducer
import json
import time

# Prior installation: pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092', 'localhost:9093', 'localhost:9094'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',  # durability: wait for all in-sync replicas to acknowledge
    retries=3,
    max_in_flight_requests_per_connection=1  # preserves ordering across retries
)

for i in range(10):
    message = {'id': i, 'event': f"Expert Kafka event {i}", 'timestamp': time.time()}
    future = producer.send('expert-tutorial', value=message)
    result = future.get(timeout=10)  # block until the send is acknowledged
    print(f"Message {i} sent to partition {result.partition}")

producer.flush()
producer.close()
print("Production finished.")

This producer sends 10 JSON messages with built-in serialization; calling future.get() after each send makes delivery synchronous, which keeps the example simple. bootstrap_servers lists all brokers for resilience. acks='all' with retries=3 and max_in_flight=1 gives at-least-once delivery with strict ordering (exactly-once would additionally require an idempotent or transactional producer). Copy-paste after pip install kafka-python; run with python producer.py.
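For genuinely asynchronous sends, kafka-python futures accept callbacks instead of blocking on future.get(). A sketch under the same cluster and topic assumptions as above:

```python
import json
import time


def serialize(event: dict) -> bytes:
    # Same JSON encoding the synchronous producer uses
    return json.dumps(event).encode("utf-8")


def on_success(metadata):
    print(f"delivered to partition {metadata.partition} at offset {metadata.offset}")


def on_error(exc):
    print(f"delivery failed: {exc}")


def produce_async(n: int = 10) -> None:
    from kafka import KafkaProducer  # imported lazily; pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers=["localhost:9092", "localhost:9093", "localhost:9094"],
        acks="all",
        retries=3,
    )
    for i in range(n):
        event = {"id": i, "event": f"Expert Kafka event {i}", "timestamp": time.time()}
        # send() returns immediately; the callbacks fire when the broker responds
        (producer.send("expert-tutorial", value=serialize(event))
            .add_callback(on_success)
            .add_errback(on_error))
    producer.flush()  # wait for all in-flight sends before closing
    producer.close()


# produce_async()  # uncomment to run against the live cluster
```

Because sends overlap, throughput is much higher than the blocking variant, at the cost of handling failures in the errback rather than inline.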
Python Consumer with Group and Offset Management
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'expert-tutorial',
    bootstrap_servers=['localhost:9092', 'localhost:9093', 'localhost:9094'],
    group_id='expert-group',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
    enable_auto_commit=True,
    auto_offset_reset='earliest',
    fetch_min_bytes=1,
    max_poll_records=100
)

print("Starting consumer...")
for message in consumer:
    print(f"Received: {message.value} from partition {message.partition} offset {message.offset}")
    if message.value['id'] == 9:  # stop after the last test message
        break

consumer.close()
print("Consumption finished.")

This consumer uses a group_id so partitions are balanced across group members, with JSON deserialization and automatic offset commits (enable_auto_commit=True). auto_offset_reset='earliest' reads from the beginning, which is convenient for testing. Run it in a separate terminal after the producer; start several instances with the same group_id to consume partitions in parallel.
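Auto-commit can acknowledge offsets before a crashed process has actually handled the records. When that matters, commit manually after processing. A hedged sketch with the same topic and brokers, using a separate group id (expert-group-manual, chosen here for illustration):

```python
import json


def decode(raw: bytes) -> dict:
    # Mirrors the value_deserializer used above
    return json.loads(raw.decode("utf-8"))


def consume_with_manual_commits() -> None:
    from kafka import KafkaConsumer  # pip install kafka-python

    consumer = KafkaConsumer(
        "expert-tutorial",
        bootstrap_servers=["localhost:9092", "localhost:9093", "localhost:9094"],
        group_id="expert-group-manual",
        enable_auto_commit=False,  # commit only after processing succeeds
        auto_offset_reset="earliest",
    )
    try:
        while True:
            batch = consumer.poll(timeout_ms=1000, max_records=100)
            for tp, records in batch.items():
                for rec in records:
                    print(f"partition {tp.partition} offset {rec.offset}: {decode(rec.value)}")
            if batch:
                consumer.commit()  # synchronous commit of the batch just processed
    finally:
        consumer.close()


# consume_with_manual_commits()  # uncomment to run against the live cluster
```

This gives at-least-once semantics: on a crash between processing and commit, the batch is redelivered, so processing should be idempotent.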
Cluster Metrics Monitoring
#!/bin/bash
# Check brokers
docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092
# Topics and partitions
docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list
# Consumer groups
docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
# Node and cluster IDs written by the KRaft format step
cat data1/meta.properties
# Recent broker-1 logs
docker compose logs --tail=20 broker-1

This script checks brokers, topics, consumer groups, and logs. Use it for debugging: offset lag, under-replicated partitions, registration issues. kafka-broker-api-versions.sh confirms which protocol versions each broker supports. Integrate it with cron or a Prometheus exporter for production.
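The shell checks above can be complemented from Python. A sketch (kafka-python assumed; topic and group names from this tutorial) that computes per-partition consumer lag as the log end offset minus the group's committed offset:

```python
def lag_report(end_offsets: dict, committed_offsets: dict) -> dict:
    # Lag = log end offset minus committed offset (0 if the group never committed)
    return {tp: end_offsets[tp] - committed_offsets.get(tp, 0) for tp in end_offsets}


def print_group_lag(topic: str = "expert-tutorial", group: str = "expert-group") -> None:
    from kafka import KafkaConsumer, TopicPartition  # pip install kafka-python

    consumer = KafkaConsumer(bootstrap_servers=["localhost:9092"], group_id=group)
    parts = [TopicPartition(topic, p) for p in consumer.partitions_for_topic(topic)]
    end = consumer.end_offsets(parts)
    committed = {tp: (consumer.committed(tp) or 0) for tp in parts}
    for tp, lag in sorted(lag_report(end, committed).items()):
        print(f"partition {tp.partition}: lag={lag}")
    consumer.close()


# print_group_lag()  # uncomment to run against the live cluster
```

The same numbers are available from kafka-consumer-groups.sh --describe --group expert-group; the Python version is handy when feeding lag into alerting.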
Best Practices
- Partition tuning: aim for roughly 10k messages/s per partition; use min.insync.replicas=2 for durability without downtime.
- Security: enable SASL/SCRAM or mTLS in production; set KAFKA_AUTHORIZE... for ACLs.
- Scaling: add new brokers after formatting their storage with kafka-storage.sh format; monitor CPU/IO with JMX Exporter.
- Idempotence: always set enable.idempotence=true on producers to avoid duplicates.
- KRaft tuning: controller.socket.timeout.ms=30000 for large clusters; back up metadata via snapshots.
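The durability bullets above combine into a short configuration sketch. Property names follow the Java client and broker docs; kafka-python exposes the producer settings as constructor arguments, and idempotence support varies by client library:

```properties
# topic/broker side: with replication factor 3, still acknowledge acks=all
# writes when one replica is down
min.insync.replicas=2

# producer side: idempotent, fully acknowledged writes
acks=all
enable.idempotence=true
# with idempotence enabled, ordering is preserved for up to 5 in-flight requests
max.in.flight.requests.per.connection=5
```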
Common Errors to Avoid
- CLUSTER_ID mismatch: All brokers must share the same UUID; regenerate if you see 'InconsistentClusterId' error.
- Misconfigured listeners: distinguish advertised listeners (what clients connect to) from internal listeners; test reachability with nc -zv localhost 9092.
- No persistent volumes: without the ./dataX mounts, metadata and logs are lost on restart → KRaft corruption.
- Consumer lag spikes: increase session.timeout.ms (e.g. 30000) and max.poll.interval.ms (e.g. 300000) to tolerate GC pauses.
Next Steps
- Implement Kafka Streams for stateful processing.
- Integrate Kafka Connect for sources/sinks (e.g., PostgreSQL).
- Advanced security: Official KRaft Security Docs.
- Check out our Learni Data Streaming Courses for Confluent/Kafka expert certifications.