Introduction
A data warehouse centralizes your company's data for analysis. Unlike a transactional database, it is optimized for complex queries and historical tracking. This tutorial guides you step by step to create a simple data warehouse with PostgreSQL in 2026.
Prerequisites
- PostgreSQL 16+
- Basic SQL knowledge
- Python 3.11+
- Docker Desktop installed
Launch PostgreSQL with Docker
docker run --name dw-postgres -e POSTGRES_PASSWORD=secret -p 5432:5432 -d postgres:16This command starts a persistent PostgreSQL container. Port 5432 is exposed for local connections.
Create the Database and Star Schema
CREATE DATABASE datawarehouse;
\c datawarehouse;
CREATE TABLE dim_client (
client_id SERIAL PRIMARY KEY,
nom VARCHAR(100),
ville VARCHAR(50)
);
CREATE TABLE dim_produit (
produit_id SERIAL PRIMARY KEY,
nom VARCHAR(100),
categorie VARCHAR(50)
);
CREATE TABLE fact_ventes (
vente_id SERIAL PRIMARY KEY,
client_id INT REFERENCES dim_client(client_id),
produit_id INT REFERENCES dim_produit(produit_id),
montant DECIMAL(10,2),
date_vente DATE
);We create a classic star schema structure: two dimension tables and one fact table. This optimizes analytical joins.
Simple Python ETL Script
import psycopg2
conn = psycopg2.connect("host=localhost dbname=datawarehouse user=postgres password=secret")
cur = conn.cursor()
cur.execute("INSERT INTO dim_client (nom, ville) VALUES ('Jean Dupont', 'Paris');")
cur.execute("INSERT INTO dim_produit (nom, categorie) VALUES ('Laptop', 'Informatique');")
cur.execute("INSERT INTO fact_ventes (client_id, produit_id, montant, date_vente) VALUES (1, 1, 999.99, '2026-01-15');")
conn.commit()
cur.close()
conn.close()This script inserts test data into the tables. In production, replace the values with data extracted from external sources.
Analysis Query
SELECT
dc.ville,
dp.categorie,
SUM(fv.montant) AS total_ventes
FROM fact_ventes fv
JOIN dim_client dc ON fv.client_id = dc.client_id
JOIN dim_produit dp ON fv.produit_id = dp.produit_id
GROUP BY dc.ville, dp.categorie
ORDER BY total_ventes DESC;This query aggregates sales by city and category. It leverages the star schema for optimal performance.
Configure the .env File
DB_HOST=localhost
DB_PORT=5432
DB_NAME=datawarehouse
DB_USER=postgres
DB_PASSWORD=secretStore connection credentials in a .env file to avoid hardcoding them in source code.
Best Practices
- Always use surrogate primary keys
- Separate raw data (staging) from modeled data
- Document every table and column
- Automate ETL loads with tools like Airflow
Common Mistakes to Avoid
- Forgetting indexes on foreign keys
- Loading data without prior cleaning
- Using overly generic data types (TEXT instead of VARCHAR)
- Ignoring date and timezone management
Going Further
Check out our Learni courses on data engineering and advanced data warehouses.