
How to Deploy an Apache Kafka Cluster in KRaft in 2026


Introduction

Apache Kafka is the go-to distributed streaming platform for handling large-scale real-time data streams. In 2026, KRaft mode (Kafka Raft Metadata) has replaced ZooKeeper, simplifying deployments by eliminating an external component while boosting resilience and performance. This expert tutorial walks you through deploying a high-availability 3-broker cluster with Docker Compose, creating partitioned topics, implementing Python producers and consumers, and tuning for production.

Why it matters: In event-driven architectures (microservices, IoT, real-time analytics), Kafka handles millions of events per second with guaranteed persistence. You'll master precise KRaft configs, replication pitfalls, and copy-paste-ready examples. By the end, your cluster will be ready for Kafka Streams or Connect. Ideal for senior data engineers managing critical pipelines.

Prerequisites

  • Docker 25+ and Docker Compose 2.24+ installed and working.
  • Python 3.11+ with pip for producer/consumer clients.
  • 4 GB RAM minimum (8 GB recommended for load testing).
  • Advanced knowledge of Docker networking, YAML, and Kafka concepts (topics, partitions, replication).
  • Ports 9092-9094 free on localhost (29092 and 29093 are used only inside the Docker network).

Docker Compose Configuration for KRaft Cluster

docker-compose.yml
version: '3.8'
services:
  broker-1:
    image: apache/kafka:3.7.0
    container_name: kafka-broker-1
    ports:
      - "9092:9092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker-1:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@broker-1:29093,2@broker-2:29093,3@broker-3:29093'
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      KAFKA_LISTENERS: 'PLAINTEXT://broker-1:29092,CONTROLLER://broker-1:29093,PLAINTEXT_HOST://0.0.0.0:9092'
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - ./data1:/tmp/kraft-combined-logs
    networks:
      - kafka-net
  broker-2:
    image: apache/kafka:3.7.0
    container_name: kafka-broker-2
    ports:
      - "9093:9093"
    environment:
      KAFKA_NODE_ID: 2
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker-2:29092,PLAINTEXT_HOST://localhost:9093
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@broker-1:29093,2@broker-2:29093,3@broker-3:29093'
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      KAFKA_LISTENERS: 'PLAINTEXT://broker-2:29092,CONTROLLER://broker-2:29093,PLAINTEXT_HOST://0.0.0.0:9093'
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - ./data2:/tmp/kraft-combined-logs
    networks:
      - kafka-net
    depends_on:
      - broker-1
  broker-3:
    image: apache/kafka:3.7.0
    container_name: kafka-broker-3
    ports:
      - "9094:9094"
    environment:
      KAFKA_NODE_ID: 3
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://broker-3:29092,PLAINTEXT_HOST://localhost:9094
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_CONTROLLER_QUORUM_VOTERS: '1@broker-1:29093,2@broker-2:29093,3@broker-3:29093'
      KAFKA_PROCESS_ROLES: 'controller,broker'
      KAFKA_CONTROLLER_LISTENER_NAMES: 'CONTROLLER'
      KAFKA_LISTENERS: 'PLAINTEXT://broker-3:29092,CONTROLLER://broker-3:29093,PLAINTEXT_HOST://0.0.0.0:9094'
      KAFKA_LOG_DIRS: '/tmp/kraft-combined-logs'
      CLUSTER_ID: 'MkU3OEVBNTcwNTJENDM2Qk'
    volumes:
      - ./data3:/tmp/kraft-combined-logs
    networks:
      - kafka-net
    depends_on:
      - broker-1
networks:
  kafka-net:
    driver: bridge

This docker-compose.yml deploys a 3-broker cluster in combined KRaft mode: each node runs both the broker and controller roles, matching the three voters declared in KAFKA_CONTROLLER_QUORUM_VOTERS. The environment variables set up internal listeners (inter-broker traffic on 29092), a controller listener (29093), and host listeners exposed on localhost. Generate a unique CLUSTER_ID with /opt/kafka/bin/kafka-storage.sh random-uuid and use the same value on all three brokers. Create the data1/, data2/, and data3/ folders before launching so metadata and logs persist.
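If you would rather not start a container just to run kafka-storage.sh random-uuid, the same 22-character form of a random UUID can be produced in plain Python, and the data folders created in one go (a convenience sketch, not the official tool):

```python
import base64
import os
import uuid

# Kafka cluster IDs are random UUIDs encoded as URL-safe base64 without
# padding, the same 22-character shape kafka-storage.sh random-uuid prints.
cluster_id = base64.urlsafe_b64encode(uuid.uuid4().bytes).decode().rstrip("=")
print(f"CLUSTER_ID={cluster_id}")

# Pre-create the bind-mount directories so each broker can persist its logs.
for d in ("data1", "data2", "data3"):
    os.makedirs(d, exist_ok=True)
```

Paste the printed value into the three CLUSTER_ID entries before the first `docker compose up`.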

Launching and Verifying the Cluster

Save the code above as docker-compose.yml and run docker compose up -d. Check with docker compose logs broker-1: look for 'Started Kafka Server' and 'Registered broker'. Test connectivity: docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092. The cluster is ready when all three brokers respond. With a three-voter quorum, the cluster tolerates the loss of one node.

Creating a Partitioned Topic

create-topic.sh
#!/bin/bash
docker exec -it kafka-broker-1 \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic expert-tutorial \
  --partitions 6 \
  --replication-factor 3 \
  --config cleanup.policy=delete \
  --config retention.ms=3600000

docker exec -it kafka-broker-1 \
  /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic expert-tutorial

This script creates an expert-tutorial topic with 6 partitions and a replication factor of 3 for high availability. cleanup.policy=delete with retention.ms=3600000 (1 hour) keeps test data short-lived. The describe command confirms that each partition's replicas are spread across all 3 brokers. Run it after the cluster is up.
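To see why a replication factor of 3 on a 3-broker cluster puts a copy of every partition on every broker, here is a simplified round-robin replica-assignment sketch (an illustration of the idea, not Kafka's exact assignment algorithm):

```python
def assign_replicas(num_partitions: int, broker_ids: list[int], rf: int) -> dict[int, list[int]]:
    """Spread rf replicas of each partition over the brokers round-robin.

    The first broker in each list plays the role of the partition leader.
    """
    n = len(broker_ids)
    if rf > n:
        raise ValueError("replication factor cannot exceed broker count")
    return {
        p: [broker_ids[(p + i) % n] for i in range(rf)]
        for p in range(num_partitions)
    }

assignment = assign_replicas(num_partitions=6, broker_ids=[1, 2, 3], rf=3)
for partition, replicas in assignment.items():
    print(f"partition {partition}: leader={replicas[0]} replicas={replicas}")
```

With rf=3 on 3 brokers, every broker hosts every partition and leadership rotates (partitions 0 and 3 led by broker 1, 1 and 4 by broker 2, and so on), which is what --describe should show.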

Testing with Console Producers and Consumers

Before writing Python clients, validate the cluster with Kafka's console tools. Producer: docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic expert-tutorial. Type a few messages, then Ctrl+C to exit. Consumer: docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic expert-tutorial --from-beginning. You should see the messages you produced. This confirms cluster health.

Python Producer with Per-Message Delivery Confirmation

producer.py
from kafka import KafkaProducer
import json
import time
import sys

# Prior installation: pip install kafka-python
producer = KafkaProducer(
    bootstrap_servers=['localhost:9092', 'localhost:9093', 'localhost:9094'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',  # Durability: wait for full replication
    retries=3,
    max_in_flight_requests_per_connection=1  # Guaranteed ordering
)

for i in range(10):
    message = {'id': i, 'event': f"Expert Kafka event {i}", 'timestamp': time.time()}
    future = producer.send('expert-tutorial', value=message)
    result = future.get(timeout=10)
    print(f"Message {i} sent to partition {result.partition}")

producer.flush()
producer.close()
print("Production finished.")

This producer sends 10 JSON messages with built-in serialization; blocking on future.get() confirms each delivery before sending the next, so it is effectively synchronous. bootstrap_servers lists all brokers for resilience. acks='all' with retries=3 gives at-least-once delivery (duplicates remain possible on retry, so this is not exactly-once), and max_in_flight_requests_per_connection=1 preserves ordering across retries. Run pip install kafka-python, then python producer.py.
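One detail worth internalizing before writing the consumer: Kafka transports opaque bytes, so the consumer's deserializer must exactly mirror the producer's serializer. A quick round-trip check of the JSON pair used in these scripts:

```python
import json
import time

# The same lambdas producer.py and consumer.py pass as
# value_serializer / value_deserializer.
serialize = lambda v: json.dumps(v).encode("utf-8")
deserialize = lambda m: json.loads(m.decode("utf-8"))

message = {"id": 0, "event": "Expert Kafka event 0", "timestamp": time.time()}
assert deserialize(serialize(message)) == message  # lossless round trip
print("round trip OK")
```

If the pair is asymmetric (e.g. producing JSON but consuming raw bytes), consumers see decode errors or garbage rather than a clean failure at produce time.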

Python Consumer with Group and Offset Management

consumer.py
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'expert-tutorial',
    bootstrap_servers=['localhost:9092', 'localhost:9093', 'localhost:9094'],
    group_id='expert-group',
    value_deserializer=lambda m: json.loads(m.decode('utf-8')),
    enable_auto_commit=True,
    auto_offset_reset='earliest',
    fetch_min_bytes=1,
    max_poll_records=100
)

print("Starting consumer...")
for message in consumer:
    print(f"Received: {message.value} from partition {message.partition} offset {message.offset}")
    if message.value['id'] == 9:  # Stop after the last test message
        break

consumer.close()
print("Consumption finished.")

This consumer uses a group_id so partitions are balanced across group members. Values are deserialized from JSON and offsets are committed automatically. auto_offset_reset='earliest' reads from the beginning for testing. Run it in a separate terminal after the producer; add consumers in the same group to parallelize, up to one consumer per partition.
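How does "scale with multiple consumers" actually divide the 6 partitions? A simplified version of the range assignor, one of the Kafka client's default partition-assignment strategies (an illustrative sketch, not the client's real implementation):

```python
def range_assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Give each consumer a contiguous range of partitions; extras go to the first ones."""
    per, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(sorted(consumers)):
        count = per + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

print(range_assign(list(range(6)), ["c1", "c2"]))
# {'c1': [0, 1, 2], 'c2': [3, 4, 5]}
print(range_assign(list(range(6)), ["c1", "c2", "c3", "c4"]))
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4], 'c4': [5]}
```

This also shows why more consumers than partitions buys nothing: with 6 partitions, a 7th group member would be assigned an empty range and sit idle.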

Cluster Metrics Monitoring

monitor.sh
#!/bin/bash
# Check brokers
docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092

# Topics and partitions
docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-topics.sh --bootstrap-server localhost:9092 --list

# Consumer groups
docker exec -it kafka-broker-1 /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Broker-1 metadata and recent logs
cat data1/meta.properties
docker compose logs --tail=20 broker-1

This script checks brokers, topics, consumer groups, and logs. For debugging, inspect offset lag with kafka-consumer-groups.sh --describe --group expert-group and under-replicated partitions with kafka-topics.sh --describe --under-replicated-partitions. kafka-broker-api-versions.sh confirms supported protocols. Integrate with cron or Prometheus for production.
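The "offset lag" the script helps you debug is simply the log-end offset minus the group's committed offset, per partition. A minimal sketch of the arithmetic, using hypothetical numbers of the kind kafka-consumer-groups.sh --describe reports:

```python
# Hypothetical per-partition offsets for the expert-group consumer group.
end_offsets = {0: 1200, 1: 1150, 2: 1210}  # log-end offset per partition
committed   = {0: 1200, 1: 1100, 2: 900}   # group's committed offset

# Lag = messages written but not yet processed by the group.
lag = {p: end_offsets[p] - committed[p] for p in end_offsets}
total_lag = sum(lag.values())
print(f"per-partition lag: {lag}, total: {total_lag}")
```

A total that keeps growing is the signal to add consumers to the group or speed up per-message processing.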

Best Practices

  • Partition tuning: a common rule of thumb is to size for roughly 10k messages/s per partition; set min.insync.replicas=2 (with replication factor 3) so writes stay durable while tolerating one broker outage.
  • Security: Enable SASL/SCRAM or mTLS in production; set KAFKA_AUTHORIZE... for ACLs.
  • Scaling: when adding a broker, initialize its storage with kafka-storage.sh format (same CLUSTER_ID) before it joins; monitor CPU/IO with the JMX Exporter.
  • Idempotence: Always set enable.idempotence=true on producers to avoid duplicates.
  • KRaft tuning: controller.socket.timeout.ms=30000 for large clusters; back up metadata via snapshots.
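The partition-tuning rule of thumb above reduces to simple arithmetic: divide the target throughput by the per-partition budget, round up, and add headroom for spikes and rebalances. A sketch (the 10k msgs/s budget and 1.5x headroom factor are illustrative assumptions, not hard limits):

```python
import math

def partitions_needed(target_msgs_per_s: int,
                      per_partition_msgs_per_s: int = 10_000,
                      headroom: float = 1.5) -> int:
    """Round up target/budget, then apply a headroom multiplier."""
    base = math.ceil(target_msgs_per_s / per_partition_msgs_per_s)
    return math.ceil(base * headroom)

print(partitions_needed(120_000))  # 12 base partitions * 1.5 headroom -> 18
```

Since partitions are easy to add but impossible to remove from a topic, erring slightly high at creation time is usually the cheaper mistake.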

Common Errors to Avoid

  • CLUSTER_ID mismatch: All brokers must share the same UUID; regenerate if you see 'InconsistentClusterId' error.
  • Misconfigured listeners: Distinguish advertised_listeners (clients) from internal listeners; test with nc -zv localhost 9092.
  • No persistent volumes: Without ./dataX, metadata/logs are lost on restart → KRaft corruption.
  • Consumer lag spikes: raise session.timeout.ms (e.g. 30000) and max.poll.interval.ms (e.g. 300000) to ride out GC pauses and slow batches.

Next Steps