Datadog Research//Q2 2026
The Incident Response Reckoning
Research Report  •  April 2026

The Incident Response Reckoning.

94% of teams have brought AI into their incident response workflow in some form. But across 103 SREs, platform leaders, and engineering executives, 44% say their IR hasn't kept pace with AI-accelerated delivery. New research on where AI in IR is delivering value today, where it's stalling, and the structural conditions that separate the two.

Sample: 103 Practitioners
Field Period: April 2026
Audience: SRE / Platform / Eng. Leadership
Coverage: Cloud-Native & Hybrid Orgs
The Finding

AI is now inside almost every incident workflow we surveyed: 94% of teams are using or trialing it. But adoption is not the same as readiness. 44% of respondents say their IR process hasn't kept pace with AI-accelerated delivery, and the gap is widest in the region most often assumed to be ahead: only 28% of AMER respondents say their IR has kept pace, vs. 67% in APAC.

Two structural conditions sit underneath these numbers. Half of teams run observability and incident management on completely separate platforms, and three in four say that disconnect slows their root cause analysis. The downstream cost is a loop that won't close: 93% of teams who clearly answered yes/no to recurrence said the same incident has happened more than once. Practitioners describe wanting AI to help close it, and they describe wanting fewer tools, more automation, and less alert noise to get there.

▸AI in IR Today
94%
are using or trialing AI in their incident response workflow, yet 44% report their IR hasn't kept pace with AI-accelerated delivery.
▸The Repeat Loop
93%
of teams who clearly answered yes/no said the same incident has happened more than once.
▸Business Stakes
79%
have already put a dollar figure on what an outage costs their business.
01
Finding 01 / Infrastructure

The stack AI inherited.

Before AI agents and copilots ever entered the incident workflow, the workflow itself was already a relay race: a handoff across half a dozen tools rather than a single connected pipeline. Half of teams operate observability and incident management on completely separate platforms, and a majority run five or more distinct tools across the incident lifecycle. This is the substrate AI is now being asked to operate on: a stack that looks comprehensive on paper, yet 77% of respondents say that keeping observability and IM on separate platforms slows their root cause analysis.

▸Platform Architecture

Where Observability Lives vs. Where Incidents Get Managed

Of respondents whose answers directly addressed platform architecture, half run observability and incident management on completely separate platforms.

Separate platforms: Observability and IM live apart
51%
Partially integrated: Some integration, not full
31%
Same platform: Unified observability + IM
18%
Asked verbatim: "Do your observability tools (metrics, logs, traces) live in the same platform as your incident management, or are they separate?"
▸Tool Sprawl

How Many Tools Touch a Single Incident

54% use 5 or more tools across the incident lifecycle (28% with 5–6, 15% with 7–9, 11% with 10+).

1–2 tools
22%
3–4 tools
24%
5–6 tools
28%
7–9 tools
15%
10+ tools
11%
Asked verbatim: "Walk me through the tools your team uses across the incident lifecycle, from alerting and paging through triage, investigation, and resolution. How many separate tools are involved?"
Our workflow is a 5–7 tool relay race with high manual overhead. Alerting in Datadog or New Relic, paging through PagerDuty, coordination in Slack via incident.io, manual tab-hopping between Splunk and Datadog for investigation, Jira for tracking. The handoffs are where we lose time.
— Manager / Director, SaaS & Technology, APAC
▸The Incident Stack

The Tools Practitioners Actually Reach For

Share of all 103 respondents who named each tool anywhere in their interview.

PagerDuty
47%
Datadog
36%
Slack
34%
Jira
28%
Splunk
15%
Prometheus
13%
Confluence
12%
New Relic
10%
ServiceNow
8%
Opsgenie
7%
Asked verbatim: "Walk me through the tools your team uses across the incident lifecycle, from alerting and paging through triage, investigation, and resolution."
▸Stack Anatomy

How the Incident Stack Breaks Down by Function

The same tools mapped to the lifecycle role they play. 56% of teams who articulate their stack span 3 or more functional categories — that's where the boundary-crossing happens during a Sev-1.

Alerting / Paging
82%
Observability
70%
Communication / Chat
61%
Ticketing / Tracking
49%
Postmortem / Docs
21%
Dedicated incident management
10%
Asked verbatim: "Walk me through the tools your team uses across the incident lifecycle, from alerting and paging through triage, investigation, and resolution."
▸Key Insight

What fragmentation actually costs is context.

Every tool boundary a responder crosses during an incident is a moment where signal can get lost, attention fractures, and time-to-resolution stretches. 77% of respondents told us separate observability and incident tools slow root cause analysis, and a majority of teams reach for five or more tools across the incident lifecycle. Tool sprawl looks like a budgeting line on the invoice, but it shows up as a latency problem during a Sev-1. As AI moves further inside the incident workflow, the same boundaries that slow human responders are the ones an AI assistant has to bridge.

During a Sev-1, today's reality is usually: alert fires in PagerDuty, metrics live in Datadog, logs live somewhere else, deploy history is in CI/CD, discussion happens in Slack.
— Manager / Director, Financial Services, AMER
02
Finding 02 / Workflow Impact

The fragmentation tax shows up everywhere.

A fragmented stack doesn't stay theoretical for long. Disconnected tools bend response times, cloud judgment in the first 15 minutes, and turn outages into business events. 77% of respondents said separate observability and incident tools slow root cause analysis, and the pattern holds across every region we surveyed.

The dollars aren't hypothetical either: nearly four out of five teams have already quantified what an outage costs them. The cost shows up in MTTR data, in on-call rotations that lean on resilience instead of process, and in a "first 15 minutes" that more responders describe as decoded in real time than as executed from a runbook. And as AI-driven delivery pushes more change into production at more teams, the seams in the stack are exactly where that cost is most likely to compound.

▸The Time Penalty

Do Separate Tools Slow Root Cause Analysis?

77% say yes (50% somewhat + 27% significantly). Only 16% claim no impact at all.

Yes, significantly: 27%
Yes, somewhat: 50%
No impact: 16%
Other / not classified: 7%
Asked verbatim: "Do you feel like having observability data and incident management in separate tools slows down time to root cause?"
▸Outage Math

Have You Quantified What an Outage Costs?

79% have quantified outage cost in dollar terms (38% specific estimates + 41% rough ballpark figures). Leadership now expects a number, not a feeling.

Specific dollar estimate: 38%
Rough ballpark figure: 41%
Have not quantified / N/A: 21%
Asked verbatim: "Have you ever been able to quantify the cost of an outage in dollar terms, even roughly?"
▸The Quantum, In Their Own Words

What practitioners said an hour of downtime actually costs them

$150K/hr · VP / C-Level · AMER · payments outage
£50K/hr · Senior IC, SaaS & Technology · EMEA
$50K/30min · Manager / Director, SaaS & Technology · APAC
$10–50K/hr · Senior IC, Financial Services · AMER · core service
Illustrative figures from open-text answers to the outage-cost question. The survey didn't ask respondents to give a per-hour figure, so these aren't a panel average; they're representative of the kind of numbers practitioners volunteered when asked to quantify cost.
▸Where the Pain Lands

What Major Incidents Actually Hurt

When asked open-ended how downtime impacts the business, respondents named these areas.

Revenue loss
56%
Customer trust / experience
56%
Leadership confidence
38%
Reputational / brand
21%
Operations / support load
14%
SLA / contract penalty
6%
Asked verbatim: "When a major incident causes downtime, how does that impact the business beyond the engineering team (things like revenue, customer experience, or leadership confidence)?"
▸The Human Cost

How Practitioners Describe the On-Call Experience

Only 40% described on-call as cleanly stable or supported. 48% flagged stress, fatigue, or mixed strain (39% mixed: stressful but manageable + 9% negative: morale strain), even where rotations are well-run.

Positive / stable
40%
Mixed: stressful but managed
39%
Negative: morale strain
9%
Asked verbatim: "How would you describe the on-call experience at your organization, and what's the impact on your team's morale, retention, and day-to-day productivity?"
▸Regional Lens

The Fragmentation Tax, Region by Region

Every region reports majority slowdown — though the small APAC universe (n=9 classifiable answers) makes that region's specific share less stable.

EMEA · n=39
79%
say separate tools slow root cause analysis
APAC · n=18
72%
say separate tools slow root cause analysis
AMER · n=46
65%
say separate tools slow root cause analysis
Asked verbatim, broken out by region: "Do you feel like having observability data and incident management in separate tools slows down time to root cause?"
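To see why a small base makes a regional share less stable, the sketch below puts an approximate 95% Wilson score interval around each regional figure. The percentages and n values are the ones reported above. The choice of the Wilson interval, the function name `wilson_interval`, and the treatment of each share as a simple proportion are illustrative assumptions, not the method used in this research.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Approximate Wilson score interval for a proportion.

    p_hat: observed share (0..1); n: number of answers behind it;
    z: normal quantile (1.96 corresponds to roughly 95% confidence).
    """
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


# Shares and bases as reported in this section. The n=9 row reuses the APAC
# share with the smaller "classifiable answers" base mentioned in the text;
# pairing them this way is an assumption made only for illustration.
regions = [
    ("EMEA", 0.79, 39),
    ("APAC", 0.72, 18),
    ("APAC (classifiable answers only)", 0.72, 9),
    ("AMER", 0.65, 46),
]

for name, share, n in regions:
    low, high = wilson_interval(share, n)
    print(f"{name}: {share:.0%} on n={n} -> roughly {low:.0%} to {high:.0%}")
```

The smaller the base, the wider the range around the same headline share, which is why the APAC figure should be read as directional.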
▸Regional Lens

Who Names Revenue Loss as a Top Business Impact, by Region

EMEA respondents are 16 percentage points more likely than AMER to name revenue loss as a primary business impact of major incidents.

EMEA · n=39
64%
named revenue loss as a top impact
APAC · n=18
61%
named revenue loss as a top impact
AMER · n=46
48%
named revenue loss as a top impact
Asked verbatim, broken out by region: "When a major incident causes downtime, how does that impact the business beyond the engineering team?"
▸Key Insight

Incidents have graduated from engineering events to board-level events.

The era of "we'll deal with downtime when it happens" is over. 56% of teams cite revenue loss as a direct impact, 56% point to customer trust, and 38% feel it in leadership confidence. The regional split sharpens the picture: 64% of EMEA respondents and 61% of APAC respondents named revenue loss as a top business impact, vs. 48% in AMER. Outside North America, finance leaders appear to have already wired downtime into the P&L conversation.

Engineering organizations are increasingly expected to put a number on MTTR. And as AI accelerates the rate at which change moves into production, the cost of incidents is positioned to land harder, faster, and on more visible parts of the business than it did even two years ago. That cost shows up first in the seams between every tool a responder has to cross.

It mainly hits customer experience first — users lose trust quickly. If it lasts, it can affect revenue and trigger escalation up to leadership because it becomes a reputational risk. Engineering feels it too, but the real pressure is business confidence and customer retention.
— VP / C-Level, SaaS & Technology, EMEA
Revenue loss up to £50K per hour, customer complaints increase, leadership becomes anxious for the post-mortem.
— Senior IC, SaaS & Technology, EMEA
03
Finding 03 / The AI Gap

Software ships faster. Response is racing to keep up.

AI is no longer a curiosity in incident response: 94% of teams are using or trialing it in some form. But adoption is not the same as readiness. 44% of respondents say their incident response process hasn't kept pace with AI-accelerated software delivery, and the readiness gap is widest in AMER. Only 28% of AMER respondents say their IR has kept pace, vs. 51% in EMEA and 67% in APAC. The pattern flips the typical "North America leads" assumption.

Where respondents do report AI delivering value inside IR, they point first to specific stages: triage and investigation. Those are the early minutes that decide everything. Where they describe AI stalling, the open-text answers cluster around two themes: trust in AI outputs (concerns about hallucinations and verification) and the practical problem of AI agents that can only see part of the picture. Both are easier to address when the data the AI is reasoning over already lives in one place.

▸Adoption Curve

AI Inside the Incident Response Workflow Today

94% are using or trialing AI in their IR workflow (59% actively using + 24% limited / early adoption + 11% piloting / planning). Adoption is near-universal but uneven.

Actively using
59%
Limited / early adoption
24%
Piloting / planning
11%
Not using
6%
Asked verbatim: "Has your team adopted any AI or automation in your incident response workflow? Where are you using it, what improvements have you seen, and where are the biggest gaps or barriers to adoption?"
▸The Velocity Gap

Has IR Kept Pace with AI-Accelerated Delivery?

44% report a pace gap between how fast software ships and how fast incidents get resolved (27% somewhat behind + 17% significant gap).

Somewhat behind
27%
Significant gap
17%
Asked verbatim: "Do you feel like your incident response processes have kept pace with AI-accelerated software delivery, or is there a widening gap?"
AI tools like copilots have sped up code velocity, leading to 20–30% more deployments weekly. This correlates with a similar rise in incidents — mostly P2/P3 config drifts or integration bugs from AI-generated code lacking edge-case handling.
— VP / C-Level, AMER
▸Where AI Earns Its Keep

The Stages Where AI Delivers the Most Value

Respondents pointed first to investigation and triage: the early minutes that decide everything.

Investigation / RCA
47%
Triage
44%
Postmortem / docs
16%
Alert noise reduction
12%
Asked verbatim: "Where do you see the most value for AI in IR specifically: is it more in triage, in investigation, in postmortems, somewhere else?"
▸Regional Lens

Where the AI Velocity Gap Is Widest

APAC reports the highest readiness; AMER the lowest. The pattern flips the typical "North America leads" assumption.

APAC · n=18
67%
say IR has kept pace with AI delivery
EMEA · n=39
51%
say IR has kept pace with AI delivery
AMER · n=46
28%
say IR has kept pace with AI delivery
Asked verbatim, broken out by region: "Do you feel like your incident response processes have kept pace with AI-accelerated software delivery, or is there a widening gap?"
▸Key Insight

The AI gap and the data gap are the same gap.

The pace gap between software delivery and incident response is real (44% of respondents report it), and the regional pattern is striking: only 28% of AMER respondents say their IR has kept pace with AI delivery, vs. 51% in EMEA and 67% in APAC. The open-text answers around AI inside IR cluster around two themes: trust in AI outputs (concerns about hallucinations and the need for verification) and the practical limits of AI agents that can only see part of the picture during an incident.

Both barriers point at the same underlying condition: AI in IR works best when the data it reasons over already lives in one place. An assistant that has to stitch context across logs in one tool, traces in another, deploys in a third, and incident state in a fourth is doing harder work than the same assistant with unified context. That's not a panel finding — it's the structural reason buyers in this study are reaching so consistently for native integration.

Biggest concern is unreliable AI outputs — hallucinations in root cause suggestions could escalate incidents, like misfiring rollbacks on bad correlations. We need 99%+ confidence thresholds before autonomous actions.
— VP / C-Level, AMER
04
Finding 04 / What Comes Next

What buyers are reaching for next.

Practitioners aren't waiting for the gap to close itself. Asked the open-ended question, "if you could change one thing about how your organization handles incident response, what would it be?", they most often asked for fewer tools rather than more. Tool consolidation and automation came back as the two most-cited wishes (24% each), followed by faster detection and response (17%) and alert noise reduction (16%). Each of these maps to a different lever for getting more out of the AI teams have already adopted.

Nearly half are already evaluating or considering a switch of incident management platform in the next 12 months. The buying intent is strongest in APAC: 72% of APAC respondents are weighing a switch, vs. 41% in AMER and 38% in EMEA. And among teams already weighing a switch, 80% say native integration with observability is "very important" or a "critical requirement." The follow-through gap reinforces the urgency: postmortem action items don't reliably ship, and incidents repeat. That is the structural loop teams say they're hoping AI can help them close.

▸The Wishlist

What They'd Change First, In Their Own Words

Top themes from open-ended responses to "what would you change?"

Tool consolidation / fewer tools
24%
More automation
24%
Faster detection / response
17%
Alert noise reduction
16%
Better cross-team comms
9%
Better / more AI
9%
Standardization / process maturity
8%
Postmortem follow-through
7%
Asked verbatim: "If you could change one thing about how your organization handles incident response (tooling, process, culture, anything), what would it be and why?"
▸Buying Signal

Are You Considering Switching Your IM Platform?

48% are already evaluating alternatives or considering a switch in the next 12 months. Only 43% are firmly satisfied.

Firmly satisfied: 43%
Considering or actively evaluating: 48%
Other / not classified: 9%
Asked verbatim: "How well is your current incident management platform meeting your team's needs, and is your team actively evaluating or considering switching platforms in the next 12 months?"
▸The Follow-Through Gap

Share of Postmortem Action Items That Actually Get Completed

64% of teams complete fewer than three out of every four postmortem action items (5% complete only 0–25% + 11% complete 26–50% + 48% complete 51–75%). The follow-through gap leaves the loop open.

0–25% complete
5%
26–50% complete
11%
51–75% complete
48%
76–90% complete
28%
91–100% complete
8%
Asked verbatim: "Of the action items that come out of postmortems, what percentage would you say actually get completed versus falling off the radar?"
▸The Loop That Won't Close

Have You Experienced the Same Incident Repeating?

93% of teams who answered directly said yes: the same incident has recurred because follow-up actions weren't completed. Only 7% said no.

Yes, same incident has recurred: 93%
No: 7%
Asked verbatim: "After an incident is resolved, what does your postmortem or post-incident review process look like? Has your team ever experienced the same incident repeating because follow-up actions weren't completed?"
▸The Stated Need

How Important Is Native Integration Between Observability & IM?

Among teams already considering a platform switch, 80% say native integration is "very important" or a "critical requirement" (62% very important + 18% critical).

Critical requirement
18%
Very important
62%
Asked verbatim: "How important is it that your incident response tooling is natively integrated with your observability data (metrics, logs, traces) rather than bolted on as a separate tool?"
▸Regional Lens

Where the Buying Signal Is Loudest

APAC is the most-active market by a wide margin: nearly twice the rate of EMEA.

APAC · n=18
72%
weighing an IM platform switch
AMER · n=46
41%
weighing an IM platform switch
EMEA · n=39
38%
weighing an IM platform switch
Asked verbatim, broken out by region: "Is your team actively evaluating or considering switching incident management platforms in the next 12 months?"
▸Key Insight

The market knows. The buyers are in motion.

When 24% of a panel that skews heavily senior volunteers "consolidate our tools" as its top wish, and another 24% say "more automation," the signal is hard to miss: teams are tired of stitching together best-of-breed tools that don't talk to each other. The follow-through data exposes the cost: 64% of teams complete fewer than 75% of their postmortem action items, and 93% of those who clearly answered yes/no said incidents do repeat.

Action items don't reliably survive the handoff between observability, paging, ticketing, and chat. Four out of five of the teams already weighing a switch say native integration is "very important" or a "critical requirement." For these buyers, native integration is at the top of the spec sheet, and Datadog reads that consistency as pointing the same way AI in IR is pushing: toward a unified surface where the data already lives together. APAC is by far the most actively shopping region, with 72% already weighing a switch. The buyers are in motion. The question is which platform delivers the unified surface they're now actively asking for.

The Datadog Perspective

One platform. One surface. Every signal.

Datadog's incident management is built where the metrics, logs, traces, deploys, and team conversations already live. So when a Sev-1 fires, your responders are already on the signal — no six-tab scramble to find the truth. That's how AI in IR actually works: when the underlying data plane is unified, the assistant can finally see the whole picture.

Methodology

How this research was conducted.

Total respondents: 103
Field period: Apr 2026
Have IR involvement / oversight: 98%

Research was conducted via structured conversational interviews with 103 SREs, platform engineers, DevOps practitioners, and engineering leaders across cloud-native, hybrid, and migrating organizations. Respondents span three geographic regions: AMER (45%), EMEA (38%), and APAC (17%). The sample skews senior, with 88% in manager-level roles or above (40% VP/C-level, 48% Manager/Director) and 11% Senior IC. The industry mix is heavily concentrated in SaaS & Technology (62%), with smaller representation from Retail / E-commerce (11%), Financial Services (6%), Healthcare (3%), and Media & Entertainment (2%); 17% are categorized as Other. Org size skews mid-market, with 81% of respondents at companies of 500–5,000 employees.

All percentages are calculated as a share of unique respondents (not total mentions). Multi-mention questions, such as which business areas are impacted by major incidents and what teams would most like to change, may sum to more than 100% as respondents could cite multiple themes. All findings, including the regional and integration-status cross-tabs woven through this report, draw on the full 103-respondent sample.
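As a minimal sketch of the counting rule described above, the example below computes each theme's share of unique respondents, so a respondent who mentions a theme twice is counted once, and shows how multi-mention shares can sum past 100%. The function name `theme_shares`, the respondent IDs, and the toy answers are hypothetical and are not drawn from the survey data.

```python
from collections import Counter


def theme_shares(responses):
    """Percent of unique respondents who cited each theme.

    responses: dict mapping a respondent ID to the list of themes coded
    from that respondent's answer (the list may be empty or repeat a theme).
    """
    n = len(responses)  # base is all respondents, not total mentions
    counts = Counter()
    for themes in responses.values():
        counts.update(set(themes))  # set(): one count per respondent per theme
    return {theme: round(100 * c / n) for theme, c in counts.items()}


# Hypothetical coded answers for four respondents.
toy = {
    "r1": ["revenue loss", "customer trust"],
    "r2": ["revenue loss", "revenue loss"],  # repeated mention counts once
    "r3": ["customer trust", "leadership confidence"],
    "r4": [],  # named no theme, still part of the base
}

shares = theme_shares(toy)
print(shares)                # revenue loss 50, customer trust 50, leadership confidence 25
print(sum(shares.values()))  # 125: multi-mention shares can exceed 100
```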
