
How to Implement Opsgenie with Terraform in 2026


Introduction

Opsgenie, Atlassian's incident management solution, shines in advanced SRE environments thanks to its REST APIs and official Terraform provider. In 2026, with IaC maturity, provisioning teams, alert policies, integrations (Prometheus, Slack), and heartbeats via code is standard for scaling without manual errors.

This advanced tutorial guides you step by step through implementing a complete Opsgenie setup, from Terraform initialization to production-ready resources. Think of it like an electrical circuit: the provider is the power source, and resources are the interconnected components. The result is a reproducible, auditable, version-controlled stack that can significantly cut MTTR in production, ideal for teams handling 100+ alerts per day. Ready to turn your on-call rotations into a well-oiled machine?

Prerequisites

  • Opsgenie Enterprise account (trial OK for testing)
  • Terraform 1.9+ installed
  • Opsgenie API token (generated via UI > Settings > API Key Integrations)
  • Advanced knowledge of HCL/Terraform and REST APIs
  • Tools: curl, Git for versioning

Install the Opsgenie Terraform Provider

main.tf
terraform {
  required_providers {
    opsgenie = {
      source  = "opsgenie/opsgenie"
      version = "~> 0.6"
    }
  }
}

provider "opsgenie" {
  api_key = var.opsgenie_api_key
}

variable "opsgenie_api_key" {
  type        = string
  description = "Opsgenie API key"
  sensitive   = true
}

variable "team_name" {
  type    = string
  default = "SRE-Team"
}

This block initializes the official Opsgenie provider from the Terraform Registry. The api_key variable is marked sensitive so it is redacted in plan output and CI/CD logs. Set it via terraform.tfvars or the TF_VAR_opsgenie_api_key environment variable. Avoid hardcoding: a leaked key exposes all your resources.
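For local runs, both variables can be supplied through a git-ignored terraform.tfvars file; the values below are placeholders:

```hcl
# terraform.tfvars — keep out of version control (add to .gitignore)
opsgenie_api_key = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" # placeholder, not a real key
team_name        = "SRE-Team"
```

In CI, prefer injecting the key from a secret store as the TF_VAR_opsgenie_api_key environment variable.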

Provision an SRE Team

Why start with a team? Opsgenie routes alerts through teams, acting as a central hub. After provisioning, assign users and schedules.

Create the Team and a User

team.tf
resource "opsgenie_team" "sre_team" {
  name        = var.team_name
  description = "SRE team for critical incidents"
  members {
    id = opsgenie_user.sre_oncall.id
  }
}

resource "opsgenie_user" "sre_oncall" {
  username  = "oncall@example.com"
  full_name = "SRE On-Call"
  role      = "Admin"
}

The SRE team includes the on-call user via a dynamic ID reference, and the user's role controls their permissions. Ordering note: Terraform infers that the user must be created before the team from the implicit reference to opsgenie_user.sre_oncall.id, so no explicit depends_on is needed here, but always verify with terraform plan before applying.
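The section intro mentions assigning schedules after provisioning. A hedged sketch using the provider's opsgenie_schedule and opsgenie_schedule_rotation resources; the schedule name, timezone, and start date are illustrative:

```hcl
# A weekly on-call rotation owned by the SRE team
resource "opsgenie_schedule" "sre_schedule" {
  name          = "SRE-OnCall-Schedule"
  description   = "Primary on-call rotation"
  timezone      = "Europe/Paris"
  enabled       = true
  owner_team_id = opsgenie_team.sre_team.id
}

resource "opsgenie_schedule_rotation" "weekly" {
  schedule_id = opsgenie_schedule.sre_schedule.id
  name        = "weekly-rotation"
  start_date  = "2026-01-05T09:00:00Z" # ISO 8601 timestamp, illustrative
  type        = "weekly"

  participant {
    type = "user"
    id   = opsgenie_user.sre_oncall.id
  }
}
```

Add one participant block per engineer in the rotation; Opsgenie cycles through them week by week.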

Configure a Prometheus Integration

Integrations capture external alerts. Prometheus Alertmanager pushes to the integration's API endpoint: filter by severity to avoid alert fatigue.

Add a Generic API Integration

integration.tf
resource "opsgenie_api_integration" "prometheus" {
  name          = "Prometheus Alerts"
  type          = "Prometheus"
  enabled       = true
  owner_team_id = opsgenie_team.sre_team.id

  responders {
    type = "team"
    id   = opsgenie_team.sre_team.id
  }
}

The provider models API-based integrations, including Prometheus, with the opsgenie_api_integration resource; the integration exposes an API key that Prometheus Alertmanager uses to push rich alerts (labels, annotations). The responders block routes everything to the team; refine with conditions for production. Keep enabled set to true to activate without downtime.
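To "refine with conditions", the provider's opsgenie_integration_action resource can restrict which payloads create alerts. A hedged sketch, assuming the integration is declared as opsgenie_api_integration.prometheus and that filtering on the priority field fits your payloads:

```hcl
# Only create alerts for P1 payloads arriving through this integration
resource "opsgenie_integration_action" "prometheus_critical_only" {
  integration_id = opsgenie_api_integration.prometheus.id

  create {
    name = "create-critical-alerts"

    filter {
      type = "match-all-conditions"

      conditions {
        field          = "priority"
        operation      = "equals"
        expected_value = "P1"
      }
    }
  }
}
```

Lower-severity payloads that match no action are dropped before they page anyone, which directly addresses alert fatigue.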

Trigger an Alert via API

create-alert.sh
#!/bin/bash
set -euo pipefail

API_KEY="your-api-key-here"
TEAM_ID="$(terraform output -raw team_id)"   # -raw strips the surrounding quotes

curl -X POST "https://api.opsgenie.com/v2/alerts" \
  -H "Authorization: GenieKey ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "CPU > 90% on prod-app",
    "alias": "cpu-high-prod",
    "description": "Prometheus metric: cpu_usage=95%",
    "responders": [{"type": "team", "id": "'"${TEAM_ID}"'"}],
    "priority": "P2",
    "tags": ["prometheus", "cpu"],
    "details": {
      "host": "app-01",
      "value": "95"
    }
  }'

Functional Bash script for testing: it routes the alert to the Terraform-provisioned team via the responders field (the v2 Alert API expects responders rather than the legacy teams field). The alias makes the call idempotent: Opsgenie deduplicates open alerts sharing an alias. Replace the placeholder key; in production, let the Prometheus integration push alerts instead. Pitfall: without responders, the alert reaches no team and sits unrouted.
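A companion sketch for closing the test alert through its alias; the close endpoint and identifierType=alias query parameter come from the Opsgenie Alert API, while the note text is illustrative. The curl call only fires when OPSGENIE_API_KEY is set, so the script is safe to dry-run:

```shell
#!/bin/bash
# Close the test alert by its alias instead of its internal ID.
ALIAS="cpu-high-prod"
CLOSE_URL="https://api.opsgenie.com/v2/alerts/${ALIAS}/close?identifierType=alias"

if [ -n "${OPSGENIE_API_KEY:-}" ]; then
  # Real call: requires a valid GenieKey
  curl -X POST "$CLOSE_URL" \
    -H "Authorization: GenieKey ${OPSGENIE_API_KEY}" \
    -H "Content-Type: application/json" \
    -d '{"note": "Resolved after remediation"}'
else
  echo "Dry run: would POST to ${CLOSE_URL}"
fi
```

Because the alias is stable across runs, re-creating and re-closing the same alert never produces duplicates.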

Define an Escalation Policy

Escalations re-notify or hand off unacknowledged alerts after conditional delays. Analogy: like a relay race, passing the baton if the alert stays unresolved.

Alert Policy with Escalations

policy.tf
resource "opsgenie_escalation" "sre_policy" {
  name          = "SRE Escalation Policy"
  description   = "Escalate unacknowledged alerts after 5 minutes"
  owner_team_id = opsgenie_team.sre_team.id

  # Step 1: notify the on-call user immediately
  rules {
    condition   = "if-not-acked"
    notify_type = "default"
    delay       = 0

    recipient {
      type = "user"
      id   = opsgenie_user.sre_oncall.id
    }
  }

  # Step 2: notify the whole team if still unacknowledged after 5 minutes
  rules {
    condition   = "if-not-acked"
    notify_type = "all"
    delay       = 5

    recipient {
      type = "team"
      id   = opsgenie_team.sre_team.id
    }
  }
}

The escalation first pings the on-call user, then hands off to the whole team after the delay; the if-not-acked condition prevents unnecessary escalations. Terraform resolves creation order through the implicit references to the team and user, so iterate with terraform plan and apply rather than adding manual depends_on.

Configure a Heartbeat

heartbeat.tf
resource "opsgenie_heartbeat" "sre_heartbeat" {
  name          = "SRE-Prod-Heartbeat"
  description   = "Checks Prometheus -> Opsgenie connectivity"
  enabled       = true
  interval      = 5
  interval_unit = "minutes"
  owner_team_id = opsgenie_team.sre_team.id

  alert_message  = "Missing heartbeat: is SRE Prod down?"
  alert_priority = "P3"
  alert_tags     = ["sre", "monitoring"]
}

The heartbeat expects a ping at least every 5 minutes and raises a P3 alert for the owning team when one is missed. In production, a cron job calls the /v2/heartbeats/{name}/ping endpoint; Terraform manages the heartbeat's lifecycle.
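A minimal sketch of the cron-driven ping mentioned above; the heartbeat name must match the Terraform resource's name argument, and the curl call only fires when OPSGENIE_API_KEY is set, so the script is safe to dry-run:

```shell
#!/bin/bash
# Ping the heartbeat; schedule via cron, e.g.: */5 * * * * /opt/scripts/ping-heartbeat.sh
HEARTBEAT_NAME="SRE-Prod-Heartbeat"  # must match the opsgenie_heartbeat name
PING_URL="https://api.opsgenie.com/v2/heartbeats/${HEARTBEAT_NAME}/ping"

if [ -n "${OPSGENIE_API_KEY:-}" ]; then
  # A 202 response means Opsgenie accepted the ping
  curl -s -X GET "$PING_URL" -H "Authorization: GenieKey ${OPSGENIE_API_KEY}"
else
  echo "Dry run: would GET ${PING_URL}"
fi
```

If the job host itself dies, pings stop and the heartbeat alert fires, which is exactly the failure mode a heartbeat exists to catch.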

Apply and Outputs

Run terraform init, then terraform plan and terraform apply. Useful outputs: team_id, policy_id, heartbeat_name. Version everything in Git and use Atlantis for CI/CD.

Outputs and Apply Script

outputs.tf
output "team_id" {
  value = opsgenie_team.sre_team.id
}

output "policy_id" {
  value = opsgenie_escalation.sre_policy.id
}

output "heartbeat_name" {
  value = opsgenie_heartbeat.sre_heartbeat.name
}

Outputs expose IDs for scripts and orchestration. Mark any output carrying a secret as sensitive so CI systems such as GitHub Actions redact it. After apply, use terraform output -json to feed pipelines.
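If you later export secrets, such as the API key Opsgenie generates for an integration, mark the output sensitive so it is redacted in logs. A sketch assuming an opsgenie_api_integration resource named prometheus, which exports an api_key attribute:

```hcl
# Redacted in `terraform output` and in CI logs unless explicitly requested
output "prometheus_integration_key" {
  value     = opsgenie_api_integration.prometheus.api_key
  sensitive = true
}
```

Retrieve it deliberately with terraform output -raw prometheus_integration_key when wiring up Alertmanager.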

Best Practices

  • Modular IaC: extract reusable team/policy definitions into Terraform modules.
  • Secrets management: use Vault or Terraform Cloud; never commit tfvars files containing secrets.
  • State backend: S3 + DynamoDB locking for multi-developer collaboration.
  • Testing: Terratest to validate alerts post-apply.
  • Drift detection: a weekly scheduled plan to catch changes made in the UI.

Common Errors to Avoid

  • Expired or revoked API key: regenerate it in the UI (the old key is not restored), update your variable, and re-run terraform plan.
  • Rate limits: Opsgenie throttles API requests; lower parallelism (terraform apply -parallelism=n) and batch with count/for_each.
  • Legacy policy formats: migrate old V1 policies to the current V2 resources before importing.
  • Creation order: prefer implicit references (the user ID inside the team block); add depends_on explicitly only when no reference exists.
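One way to batch with for_each, as the rate-limit bullet suggests: declare several teams from a single map so Terraform plans them together, then lower parallelism if throttling persists. Team names below are illustrative:

```hcl
variable "extra_teams" {
  type = map(string)
  default = {
    "Platform-Team" = "Platform on-call"
    "DB-Team"       = "Database incidents"
  }
}

# One opsgenie_team per map entry, addressable as opsgenie_team.extra["Platform-Team"]
resource "opsgenie_team" "extra" {
  for_each    = var.extra_teams
  name        = each.key
  description = each.value
}
```

Combine with terraform apply -parallelism=2 to cap concurrent API calls against the Opsgenie backend.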

Next Steps

Dive deeper with the Opsgenie provider documentation on the Terraform Registry. Integrate hybrid PagerDuty setups. Check our Learni DevOps training courses for SRE certification, and read Atlassian's 2026 docs on advanced webhooks.
