PagerDuty Integration: On-Call Alerts for Critical Failures

Connect supaguard to PagerDuty to alert your on-call engineers when critical checks fail. This guide covers integration setup, escalation best practices, and reducing alert fatigue.

When to Use PagerDuty vs Slack

  Scenario                                        Channel
  Critical outage (site down, checkout broken)    PagerDuty
  Performance degradation                         Slack
  Informational updates                           Slack or Email
  Recovery notifications                          Slack

PagerDuty should wake people up. Slack should inform. Use both together for effective incident response.

Quick Setup

Step 1: Create a supaguard Service in PagerDuty

  1. Log in to PagerDuty
  2. Go to Services → Service Directory
  3. Click + New Service
  4. Configure the service:
    • Name: "supaguard Synthetic Monitoring"
    • Description: "Alerts from supaguard synthetic checks"
    • Escalation Policy: Select your existing policy or create new
  5. Click Next
  6. On Integrations, select Events API V2
  7. Click Create Service
  8. Copy the Integration Key (also called Routing Key)

Your integration key looks like:

a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6
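If you store the key in configuration, a quick format check can catch copy/paste mistakes before the first alert fires. This sketch assumes the 32-character lowercase alphanumeric shape of the example above; PagerDuty may issue keys in other shapes, so treat the pattern as illustrative:

```python
import re

# Matches the 32-character lowercase alphanumeric shape of the example
# key above (an assumption based on that example, not an official spec).
KEY_PATTERN = re.compile(r"^[a-z0-9]{32}$")

def looks_like_integration_key(key: str) -> bool:
    """Cheap sanity check before saving the key in supaguard."""
    return bool(KEY_PATTERN.match(key))

print(looks_like_integration_key("a1b2c3d4e5f6g7h8i9j0k1l2m3n4o5p6"))  # True
print(looks_like_integration_key("not-a-key"))                         # False
```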

Step 2: Add PagerDuty to supaguard

  1. Go to supaguard dashboard → Settings → Communications
  2. Click Add Channel
  3. Select PagerDuty
  4. Enter your integration key
  5. Name it (e.g., "On-Call Alerts")
  6. Click Save

Step 3: Create an Alert Policy for Critical Failures

  1. Go to Alert Policies → Create Policy
  2. Configure:
    • Name: "Critical - Page On-Call"
    • Trigger: On failure
    • Severity Filter: Critical only
    • Channels: Select your PagerDuty channel
  3. Click Save

Step 4: Assign to Checks

  1. Edit each check that should page on-call
  2. Go to the Alerting tab
  3. Select your "Critical - Page On-Call" policy
  4. Save

Best Practice: Escalation Flow

Don't page immediately for every failure. Use escalation:

Failure Detected (3:00 AM)
    ├── Immediate: Slack notification (#alerts)
    ├── Wait 5 minutes...
    ├── Still failing? → PagerDuty alert
    └── Recovery: Slack notification + PagerDuty auto-resolve

Implementation in supaguard

Create two alert policies:

Policy 1: Slack Immediate

  • Trigger: On failure
  • Delay: None
  • Channel: Slack

Policy 2: PagerDuty Escalation

  • Trigger: On failure
  • Delay: 5 minutes
  • Severity: Critical only
  • Channel: PagerDuty

Assign both policies to critical checks.
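Together, the two policies amount to a simple routing rule. As a hedged sketch (the function name and signature are illustrative, not supaguard's API), the combined decision logic looks like:

```python
def channels_to_notify(severity: str, minutes_failing: int) -> list[str]:
    """Route a failing check per the two policies above.

    Policy 1 (Slack Immediate): every failure posts to Slack at once.
    Policy 2 (PagerDuty Escalation): critical checks still failing
    after the 5-minute delay also page on-call.
    """
    channels = ["slack"]                       # Policy 1: no delay
    if severity == "critical" and minutes_failing >= 5:
        channels.append("pagerduty")           # Policy 2: delayed page
    return channels

print(channels_to_notify("critical", 0))   # ['slack']
print(channels_to_notify("critical", 5))   # ['slack', 'pagerduty']
print(channels_to_notify("warning", 10))   # ['slack']
```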

PagerDuty Alert Details

supaguard sends rich incident data:

{
  "routing_key": "your-integration-key",
  "event_action": "trigger",
  "dedup_key": "supaguard-check-abc123",
  "payload": {
    "summary": "CRITICAL: Checkout Flow - site checkout is broken",
    "severity": "critical",
    "source": "supaguard",
    "custom_details": {
      "check_name": "Checkout Flow",
      "url": "https://shop.example.com/checkout",
      "location": "San Francisco",
      "error": "Button 'Pay Now' not clickable",
      "duration_ms": 30000,
      "trace_url": "https://app.supaguard.com/trace/..."
    }
  },
  "links": [{
    "href": "https://app.supaguard.com/checks/abc123",
    "text": "View in supaguard"
  }]
}
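The payload follows PagerDuty's Events API v2 schema, so you can reproduce it yourself when experimenting with the integration. A minimal sketch using only the standard library (the check ID and summary are placeholders; `send_event` is shown but not called here, since posting to PagerDuty's documented enqueue endpoint would open a real incident):

```python
import json
import urllib.request

# PagerDuty's documented Events API v2 enqueue endpoint.
EVENTS_API = "https://events.pagerduty.com/v2/enqueue"

def build_trigger_event(routing_key: str, check_id: str, summary: str) -> dict:
    """Assemble an Events API v2 trigger payload like the one above."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": f"supaguard-check-{check_id}",
        "payload": {
            "summary": summary,
            "severity": "critical",
            "source": "supaguard",
        },
    }

def send_event(event: dict) -> None:
    """POST the event to PagerDuty (not invoked here to avoid real pages)."""
    req = urllib.request.Request(
        EVENTS_API,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

event = build_trigger_event(
    "your-integration-key", "abc123",
    "CRITICAL: Checkout Flow - site checkout is broken",
)
print(event["dedup_key"])  # supaguard-check-abc123
```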

Auto-Resolution

When a check recovers, supaguard automatically resolves the PagerDuty incident:

{
  "routing_key": "your-integration-key",
  "event_action": "resolve",
  "dedup_key": "supaguard-check-abc123"
}

This ensures incidents don't stay open after the issue is fixed.

Deduplication

supaguard uses consistent dedup_key values per check. This means:

  • Multiple failures of the same check = one PagerDuty incident
  • No duplicate pages for the same issue
  • Clean incident timeline
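Judging from the payloads above, the dedup key is derived from the check ID alone, which is exactly what makes repeat failures collapse into one incident. A sketch of that idea (the precise derivation inside supaguard is an assumption):

```python
def dedup_key(check_id: str) -> str:
    """Stable per-check key, matching the 'supaguard-check-abc123' example."""
    return f"supaguard-check-{check_id}"

# Three failures of the same check yield the same key, so PagerDuty
# folds them into a single incident instead of paging three times.
keys = {dedup_key("abc123") for _ in range(3)}
print(keys)  # {'supaguard-check-abc123'}
```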

PagerDuty Service Configuration Tips

Set Appropriate Urgency

Configure your PagerDuty service urgency:

  • High Urgency: For production-critical checks (checkout, login)
  • Low Urgency: For less critical monitoring (docs, marketing pages)

Configure Intelligent Grouping

Enable PagerDuty's Intelligent Alert Grouping to combine related supaguard alerts during widespread outages.

Set Support Hours

If you don't need 3 AM pages for certain checks, configure support hours on the PagerDuty service or use supaguard's scheduling features.

Multiple PagerDuty Services

You might want different escalation paths:

  Check Type        PagerDuty Service
  Payment flows     Payments On-Call
  Authentication    Platform On-Call
  Marketing site    Marketing Team

Create multiple PagerDuty integrations in supaguard and assign appropriate policies to each check.

Troubleshooting

Incidents Not Being Created

  1. Verify integration key — Test with PagerDuty's API directly
  2. Check service status — Ensure the PagerDuty service is active
  3. Verify escalation policy — Must have at least one target
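For step 1, you can verify the key directly by enqueuing a throwaway event against PagerDuty's Events API v2 and checking for a 202 Accepted response. A stdlib-only sketch (the dedup key and summary text are placeholders; note that actually running `send_test_event` opens a real incident on the service, which you should then resolve):

```python
import json
import urllib.request

def build_test_event(routing_key: str) -> dict:
    """A low-stakes trigger payload for verifying the routing key."""
    return {
        "routing_key": routing_key,
        "event_action": "trigger",
        "dedup_key": "supaguard-integration-test",   # placeholder key
        "payload": {
            "summary": "supaguard integration test - safe to resolve",
            "severity": "info",                       # lowest-urgency severity
            "source": "supaguard-troubleshooting",
        },
    }

def send_test_event(routing_key: str) -> int:
    """POST the test event; a 202 status means the key is valid."""
    req = urllib.request.Request(
        "https://events.pagerduty.com/v2/enqueue",
        data=json.dumps(build_test_event(routing_key)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# send_test_event("your-integration-key")  # opens a real incident; resolve it after
```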

Duplicate Incidents

This shouldn't happen with supaguard's deduplication. If you see duplicates:

  1. Check for multiple alert policies assigned
  2. Verify you're using a single PagerDuty channel

Not Auto-Resolving

Ensure:

  1. The PagerDuty integration supports Events API V2
  2. supaguard recovery notifications are enabled

Security Best Practices

  • Use dedicated service — Don't share with unrelated integrations
  • Restrict integration key access — Only admins should see it
  • Enable PagerDuty audit logs — Track who acknowledges/resolves
