Introduction
Data Mesh revolutionizes data management by decentralizing ownership to business domains, turning data into consumable products, providing self-serve infrastructure, and enforcing federated governance. Unlike monolithic data lakes that create bottlenecks, Data Mesh operates like a federation of data micro-enterprises: each domain manages its data sovereignly while adhering to shared standards.
This intermediate tutorial guides you through implementing these four patterns with a concrete e-commerce example featuring Products, Orders, and Users domains. We use Postgres for domains, FastAPI for data products, Kafka for inter-domain events, and YAML configs for self-serve infrastructure. The result: a scalable, observable, and maintainable system. By the end, you'll have a working prototype, ready for Kubernetes, in about 45 minutes.
Why 2026? Mature tools like dbt Cloud and Confluent Schema Registry now natively support these patterns, accelerating enterprise adoption.
Prerequisites
- Docker and Docker Compose installed
- Python 3.11+ with pip
- Basic knowledge of SQL, Python, and YAML
- Postgres 15+ (via Docker)
- Kafka (via Docker Compose)
- Estimated time: 45 minutes
1. Set Up Data Domains with Postgres Schemas
CREATE SCHEMA IF NOT EXISTS products;

CREATE TABLE products.product (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    price DECIMAL(10,2) NOT NULL,
    stock INTEGER DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE SCHEMA IF NOT EXISTS orders;

-- "order" is a reserved SQL keyword, so the table name must be quoted.
CREATE TABLE orders."order" (
    id SERIAL PRIMARY KEY,
    product_id INTEGER REFERENCES products.product(id),
    user_id INTEGER,
    quantity INTEGER NOT NULL,
    total DECIMAL(10,2),
    ordered_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO products.product (name, price, stock) VALUES
('Laptop Pro', 1299.99, 50),
('Ergonomic Mouse', 29.99, 200);

This SQL script creates a separate Postgres schema per domain (products, orders), embodying the domain ownership pattern. Each domain owns its tables, and foreign keys capture inter-domain relationships. Save it as init.sql and run it with docker run -e POSTGRES_PASSWORD=pass -v $(pwd)/init.sql:/docker-entrypoint-initdb.d/init.sql -p 5432:5432 postgres:15. Pitfall: avoid shared schemas to prevent tight coupling between domains.
Domain Ownership Pattern Explained
The first pattern assigns data ownership to domain-oriented teams. Think of autonomous 'data countries': Products handles the catalog and stock, Orders manages transactions. Postgres schemas isolate each domain's data and can later be split into separate databases as the domains scale. Benefit: business teams become data producers, reducing reliance on a central data team. One way to enforce this ownership at the database level is shown below.
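To make ownership concrete in Postgres itself, each domain team can get its own role scoped to its schema. A minimal sketch, reusing the superuser connection from step 2; the role names (products_team, orders_team) and passwords are hypothetical:

# grant_domain_roles.py - per-domain access control (sketch; role names are hypothetical)
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres:pass@localhost:5432/postgres")

statements = [
    # Each domain team gets a role limited to its own schema.
    "CREATE ROLE products_team LOGIN PASSWORD 'products_pass'",
    "GRANT USAGE ON SCHEMA products TO products_team",
    "GRANT ALL ON ALL TABLES IN SCHEMA products TO products_team",
    "GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA products TO products_team",
    "CREATE ROLE orders_team LOGIN PASSWORD 'orders_pass'",
    "GRANT USAGE ON SCHEMA orders TO orders_team",
    "GRANT ALL ON ALL TABLES IN SCHEMA orders TO orders_team",
    "GRANT USAGE, SELECT ON ALL SEQUENCES IN SCHEMA orders TO orders_team",
    # Orders may read the Products catalog but never write to it.
    "GRANT USAGE ON SCHEMA products TO orders_team",
    "GRANT SELECT ON ALL TABLES IN SCHEMA products TO orders_team",
]

with engine.begin() as conn:
    for stmt in statements:
        conn.execute(text(stmt))

The read-only grant on products for orders_team mirrors how domains should consume each other's data: through published interfaces or explicit read access, never shared write access.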
2. Build a Data Product API for the Products Domain
from fastapi import FastAPI, Depends, HTTPException
from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

app = FastAPI(title="Products Data Product")

DATABASE_URL = "postgresql://postgres:pass@localhost:5432/postgres"
engine = create_engine(DATABASE_URL)
SessionLocal = sessionmaker(autocommit=False, autoflush=False, bind=engine)

def get_db():
    db = SessionLocal()
    try:
        yield db
    finally:
        db.close()

@app.get("/products")
def list_products(db=Depends(get_db)):
    result = db.execute(text("SELECT id, name, price, stock FROM products.product"))
    return [dict(row._mapping) for row in result]

@app.get("/products/{product_id}")
def get_product(product_id: int, db=Depends(get_db)):
    result = db.execute(text("SELECT * FROM products.product WHERE id = :id"), {"id": product_id})
    row = result.fetchone()
    if not row:
        raise HTTPException(status_code=404, detail="Product not found")
    return dict(row._mapping)

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)

This FastAPI app exposes the Products domain as a data product: a consumable REST API with built-in metadata (auto-generated OpenAPI docs). SQLAlchemy connects to the isolated schema. Install the dependencies with pip install fastapi uvicorn sqlalchemy psycopg2-binary and run python products_api.py. Pitfall: always add pagination (limit/offset) to list endpoints in production; a sketch follows below.
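Here is a paginated drop-in replacement for the list_products endpoint in products_api.py, reusing the app and get_db defined above. A minimal sketch; the default page size of 50 and the cap of 200 are arbitrary choices:

from fastapi import Query

@app.get("/products")
def list_products(
    limit: int = Query(50, ge=1, le=200),   # page size, capped to protect the database
    offset: int = Query(0, ge=0),           # rows to skip
    db=Depends(get_db),
):
    result = db.execute(
        text(
            "SELECT id, name, price, stock FROM products.product "
            "ORDER BY id LIMIT :limit OFFSET :offset"
        ),
        {"limit": limit, "offset": offset},
    )
    return {
        "limit": limit,
        "offset": offset,
        "items": [dict(row._mapping) for row in result],
    }

Consumers then page with GET /products?limit=50&offset=100; the ORDER BY id keeps pages stable between requests.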
Data as a Product Pattern in Action
A data product is a stable, documented interface to reliable data, like a data SaaS offering. Here, the FastAPI app provides read access to Products, with automatic validation and Swagger docs at /docs. Consumers (other domains) call the published interface instead of touching raw tables, which improves discoverability, as the sketch below shows.
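In practice, the Orders domain consumes this data product over HTTP rather than querying the products schema directly. A minimal sketch with requests, assuming the step-2 API runs on localhost:8000:

# orders_side_client.py - Orders consuming the Products data product (sketch)
import requests

PRODUCTS_API = "http://localhost:8000"  # assumes the step-2 API is running

def fetch_product(product_id: int) -> dict:
    # Look up a product via its published API, not its raw tables.
    resp = requests.get(f"{PRODUCTS_API}/products/{product_id}", timeout=5)
    resp.raise_for_status()  # surfaces the 404 contract defined by the producer
    return resp.json()

if __name__ == "__main__":
    product = fetch_product(1)
    print(product["name"], product["price"])

If Products later migrates its storage, consumers like this keep working as long as the API contract holds.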
3. Implement Inter-Domain Event Streaming with Kafka
version: '3.8'
services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    ports:
      - "2181:2181"
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    depends_on:
      - zookeeper
    ports:
      - "9092:9092"
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # Two listeners: containers reach the broker at kafka:29092, host clients at localhost:9092.
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:29092,PLAINTEXT_HOST://localhost:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
  schema-registry:
    image: confluentinc/cp-schema-registry:7.6.0
    depends_on:
      - kafka
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:29092

This Docker Compose file deploys Kafka, Zookeeper, and Schema Registry for asynchronous events between domains (e.g., Orders notifies Products of stock changes). The advertised listeners are split so that containers such as the Schema Registry reach the broker at kafka:29092, while clients on the host use localhost:9092. Run docker compose up -d. Avro schemas registered in the Schema Registry keep event contracts governed. Pitfall: set a low retention.ms on ephemeral topics.
Self-Serve via Event Streaming
- Domains publish/subscribe to Kafka events without coupling APIs.
- Example: An order triggers an OrderPlaced event that the Products domain consumes to decrement stock; a consumer sketch follows this list.
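Here is what that consumer could look like on the Products side: a minimal sketch with kafka-python, reusing the orders-topic name from step 4 and the database connection from step 2 (retries and idempotency are omitted for brevity):

# products_consumer.py - Products domain reacting to OrderPlaced events (sketch)
import json

from kafka import KafkaConsumer
from sqlalchemy import create_engine, text

engine = create_engine("postgresql://postgres:pass@localhost:5432/postgres")

consumer = KafkaConsumer(
    "orders-topic",
    bootstrap_servers=["localhost:9092"],
    group_id="products-domain",  # one consumer group per domain
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    if event.get("event_type") != "OrderPlaced":
        continue  # ignore event types this domain does not handle
    with engine.begin() as conn:
        conn.execute(
            text("UPDATE products.product SET stock = stock - :qty WHERE id = :pid"),
            {"qty": event["quantity"], "pid": event["product_id"]},
        )
    print(f"Stock decremented for product {event['product_id']}")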
4. Kafka Producer for Orders Events
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8')
)

# Example of an inter-domain event
event = {
    "event_type": "OrderPlaced",
    "order_id": 1,
    "product_id": 1,
    "quantity": 2,
    "timestamp": "2026-01-01T10:00:00Z"
}

producer.send('orders-topic', value=event)
producer.flush()
print("Event sent to Kafka")

This Python producer publishes events from the Orders domain to Kafka, decoupling the domains. Install the client with pip install kafka-python. Integrate it into your Orders API so every new order emits an OrderPlaced event, as sketched below. Pitfall: use the Schema Registry to validate Avro schemas and avoid breaking changes.
5. Self-Serve Infrastructure with Basic Helm Chart
domain: products
replicaCount: 2
image:
  repository: yourrepo/products-api
  tag: "v1.0.0"
service:
  type: ClusterIP
  port: 8000
postgres:
  schema: products
  host: postgres-rw
  dbname: postgres
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
kafka:
  bootstrapServers: kafka:9092
  topic: products-events

This values.yaml for a Helm chart lets domain teams deploy their data products self-service on Kubernetes. Run helm install products ./chart -f values-domain.yaml. Each domain customizes its own values file without rewriting the chart. Pitfall: use one namespace per domain for isolation.
Self-Serve Data Infrastructure Pattern
Self-serve infrastructure is a 'golden path' platform: Helm templates, GitOps CI/CD (e.g., ArgoCD), and catalogs (e.g., Backstage). Teams clone a chart, tweak values.yaml, and deploy without filing a ticket with the ops team; a guardrail check is sketched below.
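Guardrails keep self-serve from becoming a free-for-all (see the pitfalls below). A minimal sketch of a CI check that validates a domain's values.yaml before helm install runs; the required keys mirror the example above, and the script name and policy choices are assumptions:

# validate_values.py - CI guardrail for self-serve Helm values (sketch)
import sys

import yaml  # pip install pyyaml

REQUIRED_KEYS = ["domain", "replicaCount", "image", "service", "resources"]

def validate(path: str) -> list[str]:
    # Return human-readable violations for one values file.
    with open(path) as f:
        values = yaml.safe_load(f)
    errors = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in values]
    # Governance rule from section 6: Kafka topics are prefixed with the domain name.
    topic = values.get("kafka", {}).get("topic", "")
    if topic and not topic.startswith(values.get("domain", "") + "-"):
        errors.append(f"topic '{topic}' must start with the domain name")
    # Resource limits are mandatory so one domain cannot starve the cluster.
    if "limits" not in values.get("resources", {}):
        errors.append("resources.limits must be set")
    return errors

if __name__ == "__main__":
    problems = validate(sys.argv[1])
    for p in problems:
        print(f"FAIL: {p}")
    sys.exit(1 if problems else 0)

Run python validate_values.py values-domain.yaml as a pipeline step; a non-zero exit blocks the deploy.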
6. Federated Governance with OPA Config
package data_mesh.governance

import future.keywords

default allow := false

# Rule: table schemas must include a created_at timestamp column
allow if {
    input.op == "create"
    schema := input.schema
    has_field(schema, "created_at")
    schema.created_at == "TIMESTAMP DEFAULT CURRENT_TIMESTAMP"
}

# Rule: Kafka topic names must be prefixed with their domain
allow if {
    input.op == "create_topic"
    startswith(input.topic, sprintf("%s-", [input.domain]))
}

has_field(schema, field) if {
    _ = schema[field]
}

OPA (Open Policy Agent) enforces federated governance rules: here, validating SQL schemas and Kafka topic names. Integrate the policy into CI pipelines and run opa test . against your policy tests. Pitfall: test policies thoroughly to avoid false positives; a query sketch follows below.
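To query the policy from CI or a service, start OPA as a server (opa run --server policy.rego) and call its Data API. A minimal sketch; the URL path mirrors the package name above, and the input documents match what the rules expect:

# check_policy.py - querying OPA's Data API (sketch)
import requests

OPA_URL = "http://localhost:8181/v1/data/data_mesh/governance/allow"

def is_allowed(input_doc: dict) -> bool:
    # POST the input document to OPA and read the boolean decision.
    resp = requests.post(OPA_URL, json={"input": input_doc}, timeout=5)
    resp.raise_for_status()
    return resp.json().get("result", False)

if __name__ == "__main__":
    # A domain-prefixed topic passes; a stray one is rejected.
    print(is_allowed({"op": "create_topic", "domain": "orders", "topic": "orders-events"}))  # True
    print(is_allowed({"op": "create_topic", "domain": "orders", "topic": "random-events"}))  # False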
Best Practices
- Domain Observability: Integrate Prometheus/Grafana metrics specific to each domain (e.g., queries/sec per schema).
- Centralized Metadata: Use a Data Catalog (Amundsen) for data product discoverability.
- Data Product Versioning: APIs v1/v2, backward-compatible schema evolution.
- Contract Testing: Pact for APIs, schema-evolution tests for Kafka (a compatibility check is sketched after this list).
- Security: RBAC per domain with Keycloak, at-rest encryption for Postgres.
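For the schema-evolution tests mentioned above, Confluent Schema Registry exposes a compatibility endpoint you can call before shipping a new event version. A minimal sketch, assuming the registry from step 3 on localhost:8081 and a previously registered schema under the hypothetical subject orders-topic-value:

# check_compat.py - verifying backward-compatible schema evolution (sketch)
import json

import requests

REGISTRY = "http://localhost:8081"
SUBJECT = "orders-topic-value"  # hypothetical subject for the OrderPlaced schema

new_schema = {
    "type": "record",
    "name": "OrderPlaced",
    "fields": [
        {"name": "order_id", "type": "int"},
        {"name": "product_id", "type": "int"},
        {"name": "quantity", "type": "int"},
        # A new optional field with a default keeps the change backward compatible.
        {"name": "coupon_code", "type": ["null", "string"], "default": None},
    ],
}

resp = requests.post(
    f"{REGISTRY}/compatibility/subjects/{SUBJECT}/versions/latest",
    headers={"Content-Type": "application/vnd.schemaregistry.v1+json"},
    json={"schema": json.dumps(new_schema)},
    timeout=5,
)
resp.raise_for_status()
print("Compatible:", resp.json()["is_compatible"])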
Common Pitfalls to Avoid
- Re-centralizing: Don't recreate a data lake; enforce strict ownership.
- Skipping Governance: Without standards (e.g., common schemas), semantic chaos ensues.
- Events Without Routing: Use Kafka Streams for inter-domain projections.
- Self-Serve Without Guardrails: Always validate templates via CI/CD.
Next Steps
- Foundational book: Data Mesh: Delivering Data-Driven Value at Scale by Zhamak Dehghani.
- Advanced tools: dbt for transformations, Collibra for governance.
- Deploy on Kubernetes with ArgoCD.
- Check out our Data Engineering training courses at Learni for enterprise Data Mesh masterclasses.