How to Master Advanced SQL Queries in 2026

Introduction

In a world exploding with data, advanced SQL queries are essential for extracting valuable insights without overloading your database. This intermediate tutorial guides you through mastering complex joins, CTEs (Common Table Expressions), aggregations with GROUP BY/HAVING, subqueries, and window functions.

Why is this crucial in 2026? Modern databases like PostgreSQL or MySQL handle terabytes, but a poorly optimized query can cripple your app. Imagine analyzing e-commerce sales: top customers, average basket sizes, product trends... We use a realistic sample database (customers, orders, products).

By the end, you'll write scalable, testable queries. Ready to level up from junior to SQL pro? (128 words)

Prerequisites

Basic SQL knowledge (SELECT, WHERE, INSERT/UPDATE/DELETE)
A DBMS installed: PostgreSQL 16+ or MySQL 8+ (recommended: Docker for testing)
Client tool: pgAdmin, DBeaver, or VS Code with SQL extension
Estimated time: 20 minutes

1. Create the sample database

init.sql

DROP DATABASE IF EXISTS ecommerce;
CREATE DATABASE ecommerce;

\c ecommerce;

CREATE TABLE customers (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  email VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  price DECIMAL(10,2) NOT NULL
);

CREATE TABLE orders (
  id SERIAL PRIMARY KEY,
  customer_id INTEGER REFERENCES customers(id),
  order_date DATE NOT NULL,
  total DECIMAL(10,2) NOT NULL
);

CREATE TABLE order_items (
  id SERIAL PRIMARY KEY,
  order_id INTEGER REFERENCES orders(id),
  product_id INTEGER REFERENCES products(id),
  quantity INTEGER NOT NULL
);

-- Insert sample data
INSERT INTO customers (name, email) VALUES
('Alice Dupont', 'alice@ex.com'),
('Bob Martin', 'bob@ex.com'),
('Charlie Roux', 'charlie@ex.com'),
('Diana Lefevre', 'diana@ex.com');

INSERT INTO products (name, price) VALUES
('iPhone 15', 999.99),
('MacBook Pro', 2499.99),
('AirPods', 199.99),
('iPad Air', 599.99),
('Apple Watch', 399.99);

INSERT INTO orders (customer_id, order_date, total) VALUES
(1, '2026-01-15', 1199.98),
(1, '2026-02-20', 2699.98),
(2, '2026-01-10', 399.98),
(2, '2026-03-05', 999.99),
(3, '2026-02-28', 599.99),
(4, '2026-01-25', 1999.98); 

INSERT INTO order_items (order_id, product_id, quantity) VALUES
(1, 1, 1),
(1, 3, 1),
(2, 2, 1),
(2, 3, 1),
(3, 1, 2),
(4, 5, 1),
(5, 4, 1),
(6, 2, 1),
(6, 1, 1);

This script creates a complete e-commerce database with 4 tables linked by foreign keys. It includes 4 customers, 5 products, and 6 orders with line items. Run it in PostgreSQL to test all the following queries. Note: Adapt DROP/CREATE for MySQL (use USE instead of \c).

Understanding the data structure

The tables form a star-like schema: customers and products at the center, connected via orders and order_items. This simulates a real e-commerce setup. Each subsequent query leverages these relationships for realistic analyses, like revenue per customer or top-selling products.

2. Multiple joins for order details

jointures.sql

SELECT 
  c.name AS client,
  o.order_date,
  o.total,
  p.name AS produit,
  oi.quantity,
  (p.price * oi.quantity) AS ligne_total
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
ORDER BY o.order_date DESC, c.name;

This INNER JOIN query links 4 tables to list all order line items with totals. Use INNER to exclude NULLs; LEFT JOIN for optional data. Pitfall: Without indexes on foreign keys, it slows down on large volumes—add them in production.

Moving to aggregations

Joins lay the foundation; now aggregate for KPIs: total revenue, average basket, top products. GROUP BY segments data, HAVING filters groups.

3. Aggregations with GROUP BY and HAVING

agregations.sql

SELECT 
  c.name AS client,
  COUNT(o.id) AS nb_commandes,
  SUM(o.total) AS ca_total,
  AVG(o.total) AS panier_moyen
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
HAVING COUNT(o.id) >= 1 AND SUM(o.total) > 1000
ORDER BY ca_total DESC;

Aggregates by customer: COUNT/SUM/AVG on orders. LEFT JOIN includes customers without orders. HAVING filters post-group (vs. WHERE pre-group). Result: VIP customers. Pitfall: Forgetting GROUP BY on non-aggregated columns causes an error.

4. Subqueries for top products

sous-requetes.sql

SELECT 
  p.name AS produit,
  SUM(oi.quantity) AS qte_totale,
  SUM(p.price * oi.quantity) AS ca_produit
FROM products p
JOIN order_items oi ON p.id = oi.product_id
GROUP BY p.id, p.name
HAVING SUM(oi.quantity) > (
  SELECT AVG(qte_totale)
  FROM (
    SELECT SUM(quantity) AS qte_totale
    FROM order_items
    GROUP BY product_id
  ) AS moyennes
)
ORDER BY ca_produit DESC;

Subquery calculates average quantities per product; HAVING compares against it. Great for dynamic benchmarks. Pitfall: Correlated subqueries (with outer references) don't scale well—prefer CTEs for complexity.

Introduction to CTEs and window functions

CTEs (WITH) make queries readable like modular code. Window functions (OVER) compute rankings and running totals without sub-grouping, perfect for analytics.

5. CTEs for modular analysis

cte.sql

WITH ventes_par_client AS (
  SELECT 
    c.name,
    SUM(o.total) AS ca
  FROM customers c
  JOIN orders o ON c.id = o.customer_id
  GROUP BY c.id, c.name
),
ventes_par_produit AS (
  SELECT 
    p.name AS produit,
    SUM(p.price * oi.quantity) AS ca_prod
  FROM products p
  JOIN order_items oi ON p.id = oi.product_id
  GROUP BY p.id, p.name
)
SELECT 
  vpc.name AS top_client,
  vpc.ca,
  vpp.produit AS top_produit,
  vpp.ca_prod
FROM ventes_par_client vpc
CROSS JOIN ventes_par_produit vpp
ORDER BY vpc.ca DESC, vpp.ca_prod DESC
LIMIT 2;

CTEs separate logic: one for customers, one for products; CROSS JOIN combines tops. Reusable and indexable in Postgres. Pitfall: Materialized CTEs (Postgres) consume RAM on very large sets.

6. Window functions for rankings

window.sql

SELECT 
  c.name AS client,
  o.order_date,
  o.total,
  SUM(o.total) OVER (PARTITION BY c.id ORDER BY o.order_date) AS ca_cumulatif,
  RANK() OVER (PARTITION BY DATE_TRUNC('month', o.order_date) ORDER BY o.total DESC) AS rang_mensuel,
  AVG(o.total) OVER (PARTITION BY c.id) AS panier_moyen_client
FROM customers c
JOIN orders o ON c.id = o.customer_id
ORDER BY c.name, o.order_date;

OVER PARTITION segments (by customer), ORDER for cumulative/rank. RANK() handles ties. Ideal for dashboards. Pitfall: Without a frame clause (ROWS/RANGE), the default is unbounded—specify for precision.

Best practices

Always index foreign keys and WHERE/GROUP BY columns (use EXPLAIN ANALYZE to check).
Use CTEs for >3 subqueries: better readability + modern optimizations.
Limit results (LIMIT/OFFSET) and paginate in APIs.
Validate types: Use CAST for precise aggregations.
Test performance: pgBadger or EXPLAIN for bottlenecks.

Common errors to avoid

Forgetting GROUP BY on non-aggregated columns: syntax error.
Confusing WHERE/HAVING: WHERE filters rows, HAVING filters groups.
Slow non-correlated subqueries: Refactor to JOIN/CTE.
Window functions without PARTITION: Unexpected global results.

Next steps

Master optimization with PostgreSQL Advanced. Explore ClickHouse for big data. Check out our Learni database training for expert SQL and NoSQL.