Introduction
Grafana is a go-to tool for visualizing observability data in 2026, integrating metrics (Prometheus), logs (Loki), and traces (Tempo). This expert tutorial walks you through deploying a Prometheus, Loki, and Grafana stack via Docker Compose, with provisioning of datasources, a dashboard, and alerting rules. Why does it matter? In a DevOps world where distributed systems generate petabytes of data, unified observability cuts MTTR by 50% according to CNCF benchmarks. We start with a minimal setup, then add scraping, provisioning, alerting, and test log generation, and close with pointers toward production: authentication, high availability, and adaptive PromQL/LogQL queries. Every step includes copy-paste code. By the end, you'll have a provisioned dashboard that covers both metrics and logs.
Prerequisites
- Docker 27+ and Docker Compose 2.29+
- Advanced knowledge of YAML, JSON, and PromQL/LogQL
- Linux server with at least 4 GB RAM (for local testing)
- Root access only if you remap services to host ports < 1024 (the ports used here, 3000, 3100, and 9090, do not need it)
- Git for optional example cloning
Create the stack's docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:v2.54.1
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    ports:
      - "9090:9090"
    restart: unless-stopped
  loki:
    image: grafana/loki:3.1.1
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    restart: unless-stopped
  grafana:
    image: grafana/grafana:11.1.0
    user: "472:472"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=supersecret
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - "3000:3000"
    depends_on:
      - prometheus
      - loki
    restart: unless-stopped
volumes:
  prometheus_data: {}
  grafana_data: {}

This docker-compose.yml orchestrates Prometheus for metrics, Loki for logs, and Grafana for visualization. The named volumes persist Prometheus and Grafana data when containers are recreated. The Prometheus flags set retention to 200h and enable the /-/reload lifecycle endpoint. Pitfall: keep 'user: "472:472"' for Grafana (UID 472 is the image's default user) so the process does not run as root, and make sure the bind-mounted ./grafana/provisioning directory is readable by that user.
Launch the Stack and Verify
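Create prometheus.yml (shown in the next section) before starting: if the file is missing, Docker creates a directory at the bind-mount path and Prometheus fails to start. Then run docker compose up -d and follow the logs with docker compose logs -f. Access Grafana at http://localhost:3000 (admin/supersecret). Prometheus scrapes its own metrics, and Loki is ready to ingest logs. The same service layout carries over to Kubernetes later.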
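Before moving on, it can help to confirm that all three services actually answer. The following is a minimal sketch, assuming the ports published in the docker-compose.yml above and the readiness endpoints each service exposes (/-/ready for Prometheus, /ready for Loki, /api/health for Grafana). The function names probe, wait_for, and check_stack are illustrative, not part of any tool.

```shell
#!/usr/bin/env bash
# probe URL: succeed when the endpoint answers with a 2xx status.
probe() {
  curl -fsS -o /dev/null --max-time 2 "$1"
}

# wait_for NAME URL [RETRIES]: poll until the probe succeeds or retries run out.
wait_for() {
  local name="$1" url="$2" retries="${3:-30}" attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if probe "$url" 2>/dev/null; then
      echo "$name: ready"
      return 0
    fi
    sleep "${WAIT_INTERVAL:-2}"
    attempt=$((attempt + 1))
  done
  echo "$name: NOT ready after $retries attempts" >&2
  return 1
}

# check_stack: probe each service on its published port; non-zero if any fails.
check_stack() {
  local status=0
  wait_for prometheus "http://localhost:9090/-/ready" "${RETRIES:-30}" || status=1
  wait_for loki "http://localhost:3100/ready" "${RETRIES:-30}" || status=1
  wait_for grafana "http://localhost:3000/api/health" "${RETRIES:-30}" || status=1
  return "$status"
}
```

Source the file and call check_stack after docker compose up -d. Loki in particular can take a while to report ready, so a failure on the first attempts is not necessarily a problem.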
Configure prometheus.yml for Scraping
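global:
  scrape_interval: 15s
  evaluation_interval: 15s
rule_files:
  - "alert.rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            # Assumption: an Alertmanager service named "alertmanager" on the Compose
            # network. None is defined in the docker-compose.yml above, and inside the
            # Prometheus container "localhost" would point at Prometheus itself.
            - alertmanager:9093
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'grafana'
    static_configs:
      - targets: ['grafana:3000']
    metrics_path: /metrics
    scrape_interval: 30s
  - job_name: 'loki'
    static_configs:
      - targets: ['loki:3100']
    metrics_path: /metrics
    scrape_interval: 30s

This file sets up Prometheus to scrape its own metrics, plus those from Grafana and Loki, both of which expose a /metrics endpoint. metrics_path already defaults to /metrics, so it is spelled out here only for explicitness; the common pitfall is forgetting it for exporters that serve metrics on a different path. Check target health in a browser at http://localhost:9090/targets, or from the shell with curl localhost:9090/api/v1/targets.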
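To check the scrape configuration from the shell, one option is to validate the file with promtool (which ships in the prom/prometheus image) and then summarize target health from the HTTP API. This is a sketch under those assumptions; the helper names are illustrative, and the grep-based parsing relies on the API returning compact JSON, so a real JSON parser is the safer choice for anything beyond a quick check.

```shell
#!/usr/bin/env bash
PROM_URL="${PROM_URL:-http://localhost:9090}"

# check_config: validate the mounted configuration inside the running container.
check_config() {
  docker compose exec prometheus promtool check config /etc/prometheus/prometheus.yml
}

# fetch_targets: return the active scrape targets as JSON.
fetch_targets() {
  curl -fsS "$PROM_URL/api/v1/targets?state=active"
}

# count_target_health: read a targets response on stdin and print one
# "state=count" line per health state (up, down, unknown).
count_target_health() {
  grep -o '"health":"[a-z]*"' | cut -d'"' -f4 | sort | uniq -c | awk '{print $2 "=" $1}'
}
```

With the stack running, fetch_targets | count_target_health should list three targets; anything reported as down usually points at a wrong hostname or port in scrape_configs.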
Provision the Prometheus Datasource
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    orgId: 1
    url: http://prometheus:9090
    basicAuth: false
    isDefault: true
    editable: true
    jsonData:
      httpMethod: POST
      # exemplarTraceIdDestinations entries need a label name plus a tracing
      # datasourceUid or a url; add them once Tempo is part of the stack.

Save this file under grafana/provisioning/datasources/ (for example as prometheus.yml); Grafana provisions the datasource on startup. 'access: proxy' routes queries through the Grafana server instead of the browser; 'isDefault: true' makes it the datasource panels use when none is specified. Avoid manual edits in the UI after provisioning: change the file and restart the container to reapply.
Provision the Loki Datasource
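apiVersion: 1
datasources:
  - name: Loki
    type: loki
    # Fixed uid so dashboards can reference this datasource by a stable identifier
    uid: Loki
    access: proxy
    orgId: 1
    url: http://loki:3100
    basicAuth: false
    isDefault: false
    editable: true
    jsonData:
      maxLines: 1000

This provisions Loki for logs, with 'maxLines: 1000' capping the number of lines returned per query for better performance. 'proxy' keeps the internal Loki URL on the server side. Save the file under grafana/provisioning/datasources/ as well (for example as loki.yml). Pitfall: Grafana only reads datasource files from the datasources/ subfolder of the provisioning directory and ignores files placed elsewhere; check the Grafana logs for provisioning messages.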
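The folder pitfall above can be avoided by creating the layout up front and then confirming what Grafana actually loaded. A minimal sketch, assuming the ./grafana/provisioning bind mount and the admin credentials from the docker-compose.yml above; the helper names are illustrative.

```shell
#!/usr/bin/env bash
# scaffold_provisioning [ROOT]: create the subfolders Grafana scans for
# datasource and dashboard provisioning files.
scaffold_provisioning() {
  local root="${1:-./grafana/provisioning}"
  mkdir -p "$root/datasources" "$root/dashboards"
  echo "$root"
}

# provisioning_logs: show provisioning-related lines from the Grafana container.
provisioning_logs() {
  docker compose logs grafana 2>&1 | grep -i "provisioning"
}

# list_datasources: ask the Grafana HTTP API which datasources exist.
list_datasources() {
  curl -fsS -u "${GRAFANA_USER:-admin}:${GRAFANA_PASSWORD:-supersecret}" \
    "${GRAFANA_URL:-http://localhost:3000}/api/datasources"
}
```

Run scaffold_provisioning before the first docker compose up -d, drop the two datasource files into the datasources folder, and use list_datasources afterwards to confirm that both Prometheus and Loki appear.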
Create an Advanced Dashboard
Datasources are now live. Import a JSON dashboard via the API or the UI. This template combines a Prometheus Stat panel with a Loki Logs panel.
Dashboard JSON for Metrics and Logs
{
  "dashboard": {
    "id": null,
    "title": "Observability Stack",
    "tags": ["observability"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "CPU Usage",
        "type": "stat",
        "targets": [
          {"expr": "rate(process_cpu_seconds_total[5m])", "legendFormat": "CPU"}
        ],
        "fieldConfig": {
          "defaults": {
            "color": {"mode": "thresholds"},
            "thresholds": {
              "mode": "absolute",
              "steps": [
                {"color": "green", "value": null},
                {"color": "yellow", "value": 0.7},
                {"color": "red", "value": 0.9}
              ]
            }
          }
        },
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}
      },
      {
        "id": 2,
        "title": "Logs Errors",
        "type": "logs",
        "datasource": {"type": "loki", "uid": "Loki"},
        "targets": [
          {"expr": "{job=~\"grafana|prometheus\"} |~ \"(?i)error\""}
        ],
        "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}
      }
    ],
    "time": {"from": "now-6h", "to": "now"},
    "timepicker": {},
    "refresh": "30s",
    "schemaVersion": 39,
    "version": 1,
    "links": []
  },
  "overwrite": true
}

This API payload defines a Stat panel for process CPU usage (with color thresholds) and a Logs panel for Loki errors. The Stat panel names no datasource, so it uses the default one (Prometheus); the Logs panel references Loki by uid, which must match the uid of the Loki datasource in Grafana. The log filter is case-insensitive so that it matches both "error" and "ERROR". Save the payload (for example as dashboard.json) and import it with curl -u admin:supersecret -X POST -H 'Content-Type: application/json' -d @dashboard.json http://localhost:3000/api/dashboards/db. To provision from a file instead, save only the inner "dashboard" object as a .json file in the provisioning/dashboards/ folder and add a dashboard provider YAML file there.
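File-based dashboard provisioning needs a provider definition in addition to the JSON files, and the API import needs credentials and a request body. The sketch below covers both, assuming the ./grafana/provisioning mount and admin credentials from the docker-compose.yml above. The file name provider.yml and the helper names are hypothetical. Note that a provisioned file must contain the dashboard object itself, not the {"dashboard": ...} wrapper the API expects.

```shell
#!/usr/bin/env bash
# write_dashboard_provider [DIR]: write a file provider that makes Grafana load
# every *.json dashboard found in the mounted provisioning/dashboards folder.
write_dashboard_provider() {
  local dir="${1:-./grafana/provisioning/dashboards}"
  mkdir -p "$dir"
  cat > "$dir/provider.yml" <<'EOF'
apiVersion: 1
providers:
  - name: default
    type: file
    disableDeletion: false
    updateIntervalSeconds: 30
    options:
      path: /etc/grafana/provisioning/dashboards
EOF
}

# import_dashboard FILE: POST an API payload ({"dashboard": {...}}) to Grafana.
import_dashboard() {
  curl -fsS -X POST \
    -u "${GRAFANA_USER:-admin}:${GRAFANA_PASSWORD:-supersecret}" \
    -H 'Content-Type: application/json' \
    -d @"$1" \
    "${GRAFANA_URL:-http://localhost:3000}/api/dashboards/db"
}
```

Provisioning suits CI/CD because the dashboard lives in version control; the API import is handier for one-off experiments.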
Prometheus Alerting Rule
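groups:
  - name: stack_alerts
    rules:
      - alert: HighCPU
        expr: rate(process_cpu_seconds_total[5m]) > 0.8
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }} cores, above the 0.8 threshold"
      - alert: GrafanaDown
        expr: up{job="grafana"} == 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Grafana instance down"

Save these rules as alert.rules.yml next to prometheus.yml and mount the file into the container by adding - ./alert.rules.yml:/etc/prometheus/alert.rules.yml to the prometheus service's volumes; otherwise the rule_files entry matches nothing and no alerts are loaded. The rules cover high CPU and Grafana downtime and are evaluated every 15s (evaluation_interval). 'for: 2m' requires the condition to hold for two minutes before the alert fires, which filters out short spikes. Route notifications to Slack or Teams through Alertmanager (port 9093). After editing rules, reload Prometheus with curl -X POST http://localhost:9090/-/reload, which works because --web.enable-lifecycle is set.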
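A quick way to exercise the rules is to lint them, trigger a reload, and then ask Prometheus which alerts are pending or firing. The sketch below assumes the rules file is available inside the container at /etc/prometheus/alert.rules.yml (this requires a matching volume mount) and that Prometheus is published on port 9090; the helper names are illustrative.

```shell
#!/usr/bin/env bash
PROM_URL="${PROM_URL:-http://localhost:9090}"

# check_rules: lint the rules file with promtool inside the running container.
check_rules() {
  docker compose exec prometheus promtool check rules /etc/prometheus/alert.rules.yml
}

# reload_prometheus: reload configuration and rules without a restart.
# The endpoint only accepts POST (or PUT) and needs --web.enable-lifecycle.
reload_prometheus() {
  curl -fsS -X POST "$PROM_URL/-/reload"
}

# list_alerts: return currently pending and firing alerts as JSON.
list_alerts() {
  curl -fsS "$PROM_URL/api/v1/alerts"
}
```

A typical loop is check_rules && reload_prometheus, then stop the Grafana container and watch GrafanaDown move from pending to firing in list_alerts after about a minute.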
Bash Script to Generate Test Logs
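#!/bin/bash
# Push 100 test log lines (10% errors) to Loki's HTTP push API.
LOKI_URL="http://localhost:3100/loki/api/v1/push"
for i in {1..100}; do
  # Loki expects nanosecond Unix timestamps, sent as strings (%N requires GNU date).
  TIMESTAMP=$(date +%s%N)
  if [ $((i % 10)) -eq 0 ]; then
    LINE="ERROR: Simulated failure at $TIMESTAMP"
  else
    LINE="INFO: Normal operation $i"
  fi
  # One stream (label set) per request, carrying a single [timestamp, line] pair.
  curl -s -H "Content-Type: application/json" -X POST \
    -d "{\"streams\":[{\"stream\":{\"job\":\"grafana\"},\"values\":[[\"$TIMESTAMP\",\"$LINE\"]]}]}" \
    "$LOKI_URL" &
done
wait
echo "Logs generated. Check Grafana Explore > Loki."

This script pushes 100 log lines (10% errors) to Loki through the same HTTP push API that Promtail uses. Run chmod +x generate-logs.sh && ./generate-logs.sh. Pitfall: the push API only accepts the "streams" format, a label set plus [nanosecond timestamp, line] pairs, and rejects other JSON shapes. The job="grafana" label is set so the test query matches: try {job=~"grafana"} |~ "(?i)error" in Explore. A plain |= "error" filter is case-sensitive and would miss the uppercase ERROR lines.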
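The pushed lines can also be checked from the shell through Loki's query API, without opening Grafana. This is a sketch assuming Loki is published on port 3100 and that the pushed streams carry a job="grafana" label, as the test query in the text implies; query_loki and list_labels are illustrative names.

```shell
#!/usr/bin/env bash
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# query_loki LOGQL [LIMIT]: run a LogQL query over the API's default time
# range (the last hour) and return the matching entries as JSON.
query_loki() {
  curl -fsS -G "$LOKI_URL/loki/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "limit=${2:-10}"
}

# list_labels: show which label names Loki has indexed so far.
list_labels() {
  curl -fsS "$LOKI_URL/loki/api/v1/labels"
}
```

For example, query_loki '{job="grafana"} |~ "(?i)error"' 5 should return up to five of the simulated failures. If list_labels does not include job, the push requests were most likely rejected.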
Best Practices
- Always provision datasources/dashboards for reproducibility (CI/CD).
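- Use Grafana's built-in $__rate_interval variable as the rate() window so queries adapt to the scrape interval and zoom level.
- Keep anonymous access off (GF_AUTH_ANONYMOUS_ENABLED=false, which is the default) and put logins behind OAuth (GitHub/Keycloak).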
- Scale Grafana in HA with PostgreSQL backend and 3+ replicas.
- Monitor Grafana itself via /metrics and sidecar exporter.
Common Errors to Avoid
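- Port conflicts: Check for existing listeners (netstat -ltn or ss -ltn) before docker compose up; map Grafana to 3001:3000 if host port 3000 is taken.
- Non-persistent volumes: Without volumes:, data is lost whenever a container is removed or recreated.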
- Invalid PromQL queries: Test first in Prometheus UI (/graph).
- Grafana auth: Change admin/supersecret immediately via UI or env.
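The port check from the first item can be scripted. The following is a rough sketch that uses bash's /dev/tcp pseudo-device to test whether something accepts connections on a local port; it only sees listeners reachable on 127.0.0.1, so treat it as a convenience rather than a replacement for ss or netstat. The function names are illustrative.

```shell
#!/usr/bin/env bash
# port_in_use PORT: succeed if a TCP connection to 127.0.0.1:PORT is accepted.
# Relies on bash's /dev/tcp redirection; the subshell keeps the descriptor local.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# next_free_port START: print the first port at or above START with no listener.
next_free_port() {
  local port="$1"
  while port_in_use "$port"; do
    port=$((port + 1))
  done
  echo "$port"
}
```

For instance, next_free_port 3000 prints 3000 when nothing is listening there and 3001 when only port 3000 is taken; the result can then be used as the host side of the Grafana port mapping.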
Next Steps
Dive deeper with Tempo for traces (add it to docker-compose.yml). Explore plugins like 'grafana-clock-panel' or 'business-intelligence'. Check the official Grafana docs and our Learni observability training. Migrate to Kubernetes with Helm charts for production.