Introduction
DORA metrics remain the gold standard in 2026 for evaluating DevOps performance. They relate delivery speed (deployment frequency, lead time for changes) to system stability (change failure rate, time to restore service). This tutorial walks through an implementation covering event collection from the pipeline, metric computation, and visualization. You will learn to instrument your CI/CD pipelines and observability systems to obtain reliable, actionable data.
Prerequisites
- Kubernetes 1.29+ or equivalent with Prometheus
- GitLab CI or GitHub Actions
- TimescaleDB or ClickHouse database
- Strong knowledge of observability and scripting
- Access to deployment logs and incidents
Deployment Collection
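#!/bin/bash
# Report a successful deployment to the metrics collector.
set -euo pipefail

DEPLOY_TIME=$(date -u +"%Y-%m-%dT%H:%M:%SZ")  # UTC timestamp, ISO 8601
COMMIT_SHA=$(git rev-parse HEAD)
SERVICE_NAME="api-gateway"

# --fail makes curl exit non-zero on an HTTP error, so the pipeline step fails visibly.
# The Content-Type header assumes the collector expects a JSON body.
echo "{\"timestamp\": \"$DEPLOY_TIME\", \"commit\": \"$COMMIT_SHA\", \"service\": \"$SERVICE_NAME\", \"status\": \"success\"}" \
  | curl --fail -X POST -H "Content-Type: application/json" http://metrics-collector:8080/deployments -d @-

This script sends metadata for each successful deployment to a centralized collector. It records the UTC timestamp and the commit SHA, which the lead time and frequency calculations depend on.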
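Hand-escaped JSON inside a shell string breaks as soon as a value contains a quote or a backslash. As a minimal sketch, the same payload can be built with a JSON serializer instead. The helper name `build_deployment_event` is hypothetical; the field names mirror the ones sent by the script.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_deployment_event(commit_sha: str, service: str,
                           status: str = "success",
                           now: Optional[datetime] = None) -> str:
    """Return the deployment event as a JSON string (hypothetical helper)."""
    # Stamp events in UTC, matching the `date -u` call in the shell script.
    moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    event = {
        "timestamp": moment.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "commit": commit_sha,
        "service": service,
        "status": status,
    }
    # json.dumps takes care of quoting and escaping every value.
    return json.dumps(event)
```

The optional `now` argument exists only to make the function deterministic in tests; a pipeline would normally leave it unset.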
Deployment Frequency Calculation
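from datetime import datetime, timedelta, timezone
import psycopg2

def calculate_deployment_frequency(service: str, days: int = 30) -> float:
    """Return the average number of deployments per day over the last `days` days."""
    # Timezone-aware UTC cutoff (assumes the timestamp column is timestamptz).
    since = datetime.now(timezone.utc) - timedelta(days=days)
    conn = psycopg2.connect("dbname=metrics user=metrics")
    try:
        with conn.cursor() as cur:
            cur.execute("""SELECT COUNT(*) FROM deployments
                           WHERE service = %s AND timestamp > %s""",
                        (service, since))
            count = cur.fetchone()[0]
    finally:
        conn.close()  # release the connection even if the query fails
    return count / days  # deployments per day

This function calculates the average number of deployments per day over a trailing window (30 days by default). It queries the deployments table directly, so the result reflects the stored history rather than sampled metrics.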
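The windowing logic can also be checked without a database. The sketch below assumes the deployment timestamps have already been loaded as timezone-aware datetimes; `deployments_per_day` is a hypothetical helper used only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable


def deployments_per_day(timestamps: Iterable[datetime], now: datetime,
                        days: int = 30) -> float:
    """Average number of deployments per day over the trailing window."""
    if days <= 0:
        raise ValueError("days must be positive")
    window_start = now - timedelta(days=days)
    # Keep deployments newer than the cutoff; future-dated rows are ignored.
    count = sum(1 for ts in timestamps if window_start < ts <= now)
    return count / days
```

Passing `now` explicitly keeps the function deterministic, which makes it easy to unit-test the boundary between "inside" and "outside" the window.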
Lead Time Measurement
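SELECT
    service,
    PERCENTILE_CONT(0.95) WITHIN GROUP (
        ORDER BY EXTRACT(EPOCH FROM (deploy_time - commit_time)) / 3600
    ) AS p95_lead_time_hours
FROM deployments
WHERE deploy_time > NOW() - INTERVAL '30 days'
GROUP BY service;

This SQL query calculates the 95th-percentile lead time in hours for each service over the last 30 days. It measures the time between commit and production deployment to identify bottlenecks. It assumes each row stores both commit_time and deploy_time; the collection script sends only the deployment timestamp and the commit SHA, so the commit time must be added at ingest, for example by resolving it from the SHA.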
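To make the percentile concrete, here is a small sketch of the linear interpolation that PERCENTILE_CONT performs: the target position is `fraction * (n - 1)` in the sorted values, and the result is interpolated between the two neighbouring values. The function name `percentile_cont` and the sample lead times are illustrative assumptions.

```python
import math
from typing import Sequence


def percentile_cont(values: Sequence[float], fraction: float) -> float:
    """Continuous percentile with linear interpolation between neighbours."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    ordered = sorted(values)
    # Fractional index of the requested percentile in the sorted list.
    position = fraction * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


# Invented lead times in hours, with one slow change.
lead_times_hours = [1.0, 2.0, 3.0, 4.0, 100.0]
p95 = percentile_cont(lead_times_hours, 0.95)
```

With only five samples the 95th percentile sits between the two largest values, which is why small samples make tail percentiles jumpy; a 30-day window on a busy service smooths this out.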
MTTR Instrumentation
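apiVersion: v1
kind: ConfigMap
metadata:
  name: dora-mttr-config
data:
  rules.yaml: |
    groups:
      - name: dora-mttr
        rules:
          # Fires while the service is down. The notification carries the firing
          # time (startsAt) and, once the alert resolves, the recovery time (endsAt).
          - alert: ServiceDown
            expr: up{service="api-gateway"} == 0
            annotations:
              summary: "{{ $labels.service }} is down"

This Prometheus rule detects incidents automatically. Alert templates expose the sample value of the expression rather than a timestamp, so the incident start and end are taken from the startsAt and endsAt fields that Alertmanager attaches to the firing and resolved notifications of the same alert. A separate alert on up == 1 is not needed and would fire whenever the service is healthy. The two timestamps are then used to calculate Time to Restore Service.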
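Once incident start and end times are stored, the restore time is a plain subtraction. The sketch below assumes each incident is a record with `startsAt` and `endsAt` fields (the names Alertmanager uses in its webhook payload) holding second-precision ISO 8601 strings; the helper names are hypothetical.

```python
from datetime import datetime
from statistics import median
from typing import Iterable, List, Mapping


def parse_utc(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    # Older Python versions of fromisoformat reject the 'Z' suffix.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def restore_minutes(incidents: Iterable[Mapping[str, str]]) -> List[float]:
    """Return the restore duration of each closed incident, in minutes."""
    durations = []
    for incident in incidents:
        started = parse_utc(incident["startsAt"])
        ended = parse_utc(incident["endsAt"])
        # Skip incidents that are still open or have inconsistent timestamps.
        if ended <= started:
            continue
        durations.append((ended - started).total_seconds() / 60)
    return durations


def median_time_to_restore(incidents: Iterable[Mapping[str, str]]) -> float:
    """Median restore time in minutes; raises if no incident is closed."""
    return median(restore_minutes(incidents))
```

The median is used here rather than the mean so that a single long outage does not dominate the figure.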
DORA Grafana Dashboard
{
  "dashboard": {
    "title": "DORA Metrics 2026",
    "panels": [
      {
        "title": "Deployment Frequency",
        "targets": [{"expr": "sum(increase(deployments_total[24h]))"}]
      },
      {
        "title": "Change Failure Rate",
        "targets": [{"expr": "sum(failed_deployments)/sum(total_deployments)"}]
      }
    ]
  }
}

This JSON file defines the skeleton of a Grafana dashboard. It covers two of the four DORA metrics: deployment frequency, shown as the number of deployments over the last 24 hours, and change failure rate. Panels for lead time and time to restore follow the same pattern.
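The change failure rate is a ratio, so it is undefined when nothing has been deployed; in PromQL a 0/0 division yields NaN. The sketch below shows one way to make that case explicit when computing the metric outside Prometheus. The function name `change_failure_rate` and the choice to return `None` for "no data" are assumptions, not part of the dashboard.

```python
from typing import Optional


def change_failure_rate(failed: int, total: int) -> Optional[float]:
    """Share of deployments that caused a failure, or None if there were none."""
    if failed < 0 or total < 0:
        raise ValueError("counts must be non-negative")
    if failed > total:
        raise ValueError("failed deployments cannot exceed total deployments")
    if total == 0:
        # No deployments in the window: report "no data" instead of 0 %.
        return None
    return failed / total
```

Returning `None` instead of `0.0` keeps a quiet period from looking like a perfect one on a report.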
Best Practices
- Always store raw data with a minimum 90-day retention
- Use percentiles rather than averages for lead time and MTTR
- Separate environments (prod vs staging) in calculations
- Automate alerts when metrics fall below elite thresholds
- Correlate DORA metrics with business objectives
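To illustrate the recommendation on percentiles, the sketch below compares the mean and the median of a small set of lead times containing one outlier. The sample values are invented for the example.

```python
from statistics import mean, median
from typing import Dict, Sequence

# Invented lead times in hours: five routine changes and one stuck for four days.
lead_times_hours = [2.0, 3.0, 2.5, 4.0, 3.5, 96.0]


def summarize(samples: Sequence[float]) -> Dict[str, float]:
    """Return the mean and median of the samples."""
    return {"mean": mean(samples), "median": median(samples)}


summary = summarize(lead_times_hours)
```

Here the mean is 18.5 hours while the median is 3.25 hours: a single slow change makes the average suggest that a typical change takes most of a day, which none of the routine changes did.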
Common Mistakes
- Calculating frequency only on manual deployments and ignoring automated pipelines
- Forgetting to filter hotfix deployments in change failure rate
- Using local timestamps instead of UTC for international comparisons
- Not versioning metric calculation scripts
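The UTC pitfall can be avoided by normalizing every timestamp at ingest. A minimal sketch, assuming events arrive as ISO 8601 strings with an explicit offset; `to_utc` is a hypothetical helper that rejects timestamps without an offset rather than guessing the zone.

```python
from datetime import datetime, timezone


def to_utc(timestamp: str) -> datetime:
    """Convert an ISO 8601 timestamp with an offset to an aware UTC datetime."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        # A naive timestamp could come from any timezone; refuse to guess.
        raise ValueError("timestamp has no UTC offset: " + timestamp)
    return parsed.astimezone(timezone.utc)


# Two deployments reported by teams in different timezones at the same instant.
tokyo = to_utc("2026-03-01T09:00:00+09:00")
new_york = to_utc("2026-02-28T19:00:00-05:00")
```

After normalization both values compare equal, so deployments from different regions can be ordered and bucketed by day consistently.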
Further Reading
Explore our advanced training on observability and DevOps excellence: https://learni-group.com/formations. You will learn how to build scalable internal metrics platforms.