Datadog Research Report  //  Q2 2026

The Incident Response Reckoning.

Engineering teams are firefighting Sev-1s across half a dozen disconnected tools while AI-accelerated software ships faster than their workflows can adapt. New research with 103 SREs, platform leaders, and engineering executives reveals an industry caught between fragmented stacks and rising stakes.
Sample: 103 Practitioners
Field Period: April 2026
Audience: SRE / Platform / Eng. Leadership
Coverage: Cloud-Native & Hybrid Orgs
The Finding

The teams responsible for keeping production running don't have a tooling problem — they have a fragmentation problem. Half run observability and incident management on entirely separate platforms; three in four say that disconnect slows their root cause analysis. The downstream cost is a loop that won't close: 93% of teams who clearly answered yes/no to recurrence said the same incident has happened more than once. And as AI accelerates delivery, the response gap is widest in the region most assumed to be ahead — only 28% of AMER respondents say their incident response has kept pace with AI-accelerated software shipping, vs. 67% in APAC.

▸ Tooling Reality
77%
say separate observability and incident tools slow down their root cause analysis.
▸ Business Stakes
79%
have already put a dollar figure on what an outage costs their business.
▸ The Broken Loop
64%
complete fewer than three in four postmortem action items — and incidents keep repeating.
Finding 01 / Infrastructure

A stack pulled in too many directions.

Modern incident response is rarely a single workflow — it's a relay race between half a dozen tools. Half of teams operate observability and incident management on completely separate platforms, and half run five or more distinct tools across the incident lifecycle. The result is a stack that looks comprehensive on paper and feels chaotic the moment a Sev-1 fires.

▸ Platform Architecture

Where Observability Lives vs. Where Incidents Get Managed

Half of respondents say their metrics, logs, and traces sit on a different platform than their incident management.
Asked verbatim: "Do your observability tools (metrics, logs, traces) live in the same platform as your incident management, or are they separate?"
▸ Tool Sprawl

How Many Tools Touch a Single Incident

50% use 5 or more tools across the incident lifecycle (28% with 5–6 tools + 14% with 7–9 + 8% with 10+).
Asked verbatim: "Walk me through the tools your team uses across the incident lifecycle — from alerting and paging through triage, investigation, and resolution. How many separate tools are involved?"
Note: Universe = the 72 respondents whose answers could be quantified — counting digits ("5 tools"), word-numbers ("about five"), and explicit lists of tool brands. Bands sum to 100% within that universe; 31 respondents gave qualitative answers ("a variety of tools") that couldn't be counted.
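To make those coding rules concrete, here is a minimal sketch of how the quantification pass could work. Everything in it (the helper names, the word list, the brand list) is an illustrative assumption, not the survey's actual pipeline.

```python
import re

# Word-number lookup for answers like "about five" (illustrative).
WORD_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
# Hypothetical brand list for the explicit-list counting rule.
KNOWN_TOOLS = {"datadog", "pagerduty", "slack", "jira", "grafana", "splunk"}

def tool_count(answer: str) -> int | None:
    """Return a tool count for a free-text answer, or None if uncodable."""
    text = answer.lower()
    if m := re.search(r"\b(\d+)\b", text):            # digits: "5 tools"
        return int(m.group(1))
    for word, n in WORD_NUMBERS.items():              # word-numbers: "about five"
        if re.search(rf"\b{word}\b", text):
            return n
    named = {t for t in KNOWN_TOOLS if t in text}     # explicit brand lists
    return len(named) or None                         # None -> qualitative answer

def band(n: int) -> str:
    """Map a count onto the report's bands."""
    if n <= 4:
        return "1-4"
    if n <= 6:
        return "5-6"
    return "7-9" if n <= 9 else "10+"
```

Under these rules, "we use about five tools" codes as 5 and lands in the 5–6 band, while "a variety of tools" returns None and falls outside the 72-respondent universe.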
▸ The Incident Stack

The Tools Practitioners Actually Reach For

Mentions across alerting, observability, communication, and ticketing — share of all 103 respondents who named the tool.
Asked verbatim: "Walk me through the tools your team uses across the incident lifecycle — from alerting and paging through triage, investigation, and resolution."
Note: Multi-mention: each respondent could name multiple tools, so percentages sum to more than 100%.
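Because these are multi-mention shares, the percentages are computed respondent-wise rather than mention-wise. A toy calculation with hypothetical tallies (not the survey's actual counts) shows why the bars can sum well past 100%:

```python
# Minimal illustration of multi-mention math: each respondent can name
# several tools, so per-tool shares divide by respondents (n = 103),
# not by total mentions -- and the shares can exceed 100% in aggregate.
mentions = {"PagerDuty": 61, "Slack": 58, "Datadog": 54, "Jira": 40}  # hypothetical tallies
n_respondents = 103
for tool, count in mentions.items():
    print(f"{tool}: {count / n_respondents:.0%}")
# Prints 59%, 56%, 52%, 39%: roughly 206% across just four tools,
# which is expected when one respondent contributes multiple mentions.
```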
▸ Stack Anatomy

How the Incident Stack Breaks Down by Function

The same tools mapped to the lifecycle role they play. 49% of teams who articulate their stack span 3 or more functional categories — that's where the boundary-crossing happens during a Sev-1.
Asked verbatim: "Walk me through the tools your team uses across the incident lifecycle — from alerting and paging through triage, investigation, and resolution. How many separate tools are involved?"
Note: Universe = the 46 respondents who named specific tools (others gave a count without naming brands). Each bar is the share within that universe who mentioned at least one tool in the category. Multi-mention: nearly half of these teams span 3+ categories, so percentages do not sum to 100%.
Key Insight

The cost of fragmentation isn't licensing. It's context loss.

Every tool boundary a responder crosses during an incident is a moment where signal is lost, attention fractures, and time-to-resolution stretches. The teams hit hardest aren't the ones with too few tools — they're the ones whose tools refuse to talk to each other. Tool sprawl isn't a budgeting problem; it's a latency problem dressed up as a procurement decision.

During a Sev-1, today's reality is usually: alert fires in PagerDuty, metrics live in Datadog, logs live somewhere else, deploy history is in CI/CD, discussion happens in Slack.
— Manager / Director, Tech Ops & IT Operations
Finding 02 / Workflow Impact

The fragmentation tax shows up everywhere.

A fragmented stack doesn't stay theoretical for long. Disconnected tools don't just feel inconvenient: they bend response times, cloud judgment in the first 15 minutes, and turn outages into business events. 77% of respondents said separate observability and incident tools meaningfully slow root cause analysis, but the pain isn't evenly distributed: EMEA respondents feel it hardest (79%), followed by APAC (72%) and AMER (65%). Every region is above 60%; only the intensity varies. The dollars aren't hypothetical either: nearly four out of five teams have already quantified what an outage costs them. The cost shows up in MTTR data, in on-call rotations that lean on resilience instead of process, and in a "first 15 minutes" that more responders describe as decoded in real time than executed from a runbook. Beneath all of these numbers sits one structural truth: the cost of fragmentation isn't a single line item; it's distributed across every seam a responder has to cross.

▸ The Time Penalty

Do Separate Tools Slow Root Cause Analysis?

77% say yes (50% somewhat + 27% significantly). Only 16% claim no impact at all.
Asked verbatim: "Do you feel like having observability data and incident management in separate tools slows down time to root cause?"
▸ Outage Math

Have You Quantified What an Outage Costs?

79% have quantified outage cost in dollar terms (38% specific estimates + 41% rough ballpark figures). Leadership now expects a number, not a feeling.
Asked verbatim: "Have you ever been able to quantify the cost of an outage in dollar terms, even roughly?"
▸ Where the Pain Lands

What Major Incidents Actually Hurt

When asked open-ended how downtime impacts the business, respondents named these areas.
Asked verbatim: "When a major incident causes downtime, how does that impact the business beyond the engineering team — things like revenue, customer experience, or leadership confidence?"
Note: Multi-mention: open-ended answers were coded into themes; respondents could cite more than one impact, so percentages sum to more than 100%.
▸ The Human Cost

How Practitioners Describe the On-Call Experience

Only 40% described on-call as cleanly stable or supported. 48% flagged stress, fatigue, or mixed strain (39% mixed: stressful but manageable + 9% negative: morale strain) — even where rotations are well-run.
Asked verbatim: "How would you describe the on-call experience at your organization, and what's the impact on your team's morale, retention, and day-to-day productivity?"
Note: Open-ended responses categorized by sentiment. Universe = 101 respondents who described their on-call experience.
▸ Regional Lens

The Fragmentation Tax, Region by Region

EMEA practitioners feel the slowdown most acutely (79%); AMER feels it least (65%). Every region is above 60% — the pain is global; the intensity isn't.
Asked verbatim, broken out by region: "Do you feel like having observability data and incident management in separate tools slows down time to root cause?"
Note: Each bar is the share within its own region (n=46 / 39 / 18) who answered "Yes, significantly" or "Yes, somewhat." Bars are independent regional statistics, not parts of a single distribution.
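The cross-tab arithmetic is worth seeing once. In the sketch below, the per-region yes-counts are back-solved from the reported shares purely for illustration; they are not published survey figures.

```python
# Each bar divides a region's "yes" count by that region's own n, so the
# three bars are independent shares, not slices of one 100% distribution.
region_n   = {"AMER": 46, "EMEA": 39, "APAC": 18}
region_yes = {"AMER": 30, "EMEA": 31, "APAC": 13}   # back-solved, illustrative

for region, n in region_n.items():
    print(f"{region}: {region_yes[region] / n:.0%} of n={n}")
# AMER: 65%, EMEA: 79%, APAC: 72% -- matching the shares in the chart.
```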
▸ Regional Lens

Who Names Revenue Loss as a Top Business Impact, by Region

EMEA respondents are 19 percentage points more likely than AMER to name revenue loss as a primary business impact of major incidents. Outside North America, finance leaders appear to have already wired downtime into the P&L conversation.
Asked verbatim, broken out by region: "When a major incident causes downtime, how does that impact the business beyond the engineering team — things like revenue, customer experience, or leadership confidence?"
Note: Each bar is the share within its own region (n=46 / 39 / 18) who explicitly named revenue, sales, financial loss, or P&L impact in their open-ended answer. Multi-mention coding: respondents could cite multiple business impacts, so this is the share who included revenue, not an exclusive choice. Bars are independent regional statistics.
Key Insight

Incidents have graduated from engineering events to board-level events.

The era of "we'll deal with downtime when it happens" is over. 58% of teams cite revenue loss as a direct impact, 55% point to customer trust, and 39% feel it in leadership confidence. The regional split sharpens the picture: 69% of EMEA respondents and 61% of APAC respondents named revenue loss as a top business impact, vs. 50% in AMER — outside North America, finance leaders appear to have already wired downtime into the P&L conversation. Engineering organizations are now expected to defend their MTTR the way finance defends its books. The fragmentation tax is the cost they're paying first — and it shows up not in dashboards, but in burnout, in follow-through, and in the seams between every tool a responder has to cross.

It mainly hits customer experience first — users lose trust quickly. If it lasts, it can affect revenue and trigger escalation up to leadership because it becomes a reputational risk. Engineering feels it too, but the real pressure is business confidence and customer retention.
— Chief Technology Officer, SaaS & Technology
Finding 03 / The AI Gap

Software ships faster. Response can't catch up.

Then AI accelerated everything. AI is no longer a curiosity in incident response — 94% of teams are using or trialing it in some form. But adoption isn't the same as readiness: 44% of respondents say their incident response process hasn't kept pace with AI-accelerated software delivery, and the readiness gap is widest in AMER. Only 28% of AMER respondents say their IR has kept pace, vs. 51% in EMEA and 67% in APAC — the pattern flips the typical "North America leads" assumption. Code is being shipped at machine velocity; investigation, triage, and postmortem are still mostly running at human speed. The teams pulling ahead aren't necessarily the ones with the most AI features — they're the ones whose AI sits on top of unified data, not a federation of disconnected dashboards.

▸ Adoption Curve

AI Inside the Incident Response Workflow Today

94% are using or trialing AI in their IR workflow (59% actively using + 24% limited / early adoption + 11% piloting / planning). Adoption is near-universal but uneven.
Asked verbatim: "Has your team adopted any AI or automation in your incident response workflow? Where are you using it, what improvements have you seen, and where are the biggest gaps or barriers to adoption?"
Note: Open-ended responses categorized by adoption stage.
▸ The Velocity Gap

Has IR Kept Pace with AI-Accelerated Delivery?

44% report a pace gap (27% somewhat behind + 17% significant gap) between how fast software ships and how fast incidents get resolved.
Asked verbatim: "Do you feel like your incident response processes have kept pace with AI-accelerated software delivery, or is there a widening gap?"
▸ Where AI Earns Its Keep

The Stages Where AI Delivers the Most Value

Respondents pointed first to triage and investigation — the early minutes that decide everything.
Asked verbatim: "Where do you see the most value for AI in IR specifically — is it more in triage, in investigation, in postmortems, somewhere else?"
Note: Multi-mention: respondents could name more than one stage, so percentages sum to more than 100%.
▸ Regional Lens

Where the AI Velocity Gap Is Widest

APAC reports the highest readiness; AMER the lowest. The pattern flips the typical "North America leads, the rest catches up" story.
Asked verbatim, broken out by region: "Do you feel like your incident response processes have kept pace with AI-accelerated software delivery, or is there a widening gap?"
Note: Cross-tab: each bar is a share within its own region (n=46 / 39 / 18). Bars are independent statistics, not parts of a single distribution.
Key Insight

AI in IR is mostly a Band-Aid until the underlying stack is unified.

An AI agent that can't see logs, traces, deploys, and incident state in the same context is doing pattern-matching with one eye closed. When the data plane is fragmented, every AI suggestion is one disconnected dashboard away from a wrong answer responders can't verify. The regional split sharpens the point: only 28% of AMER respondents say their IR has kept pace with AI delivery, vs. 51% in EMEA and 67% in APAC. The teams getting real lift from AI in incident response have one thing in common: their AI sits on top of a unified data plane, not a federation of disconnected dashboards. That's why "we adopted AI" and "our IR has kept pace with AI-driven delivery" are not the same answer — and why the gap is biggest in the region that ships fastest.

AI tools like copilots have sped up code velocity, leading to 20–30% more deployments weekly. This correlates with a similar rise in incidents — mostly P2/P3 config drifts or integration bugs from AI-generated code lacking edge-case handling.
— CTO, VP / C-Level
Finding 04 / What Comes Next

The improvement loop is broken.

Practitioners aren't waiting for the gap to close itself. Asked the open-ended question — "if you could change one thing about how your organization handles incident response, what would it be?" — they didn't ask for more dashboards. They asked for fewer. Tool consolidation, automation, and noise reduction were the three most-cited wishes, and nearly half are already actively evaluating switching their incident management platform. The buying intent is strongest in APAC: 72% of APAC respondents are weighing a switch, vs. 41% in AMER and 38% in EMEA. The follow-through gap reinforces the same message: postmortem action items don't reliably ship, and four out of five of the teams already weighing a platform switch say native integration with observability is "very important" or a "critical requirement".

▸ The Wishlist

What They'd Change First, In Their Own Words

Top themes from open-ended responses to "what would you change?"
Asked verbatim: "If you could change one thing about how your organization handles incident response — tooling, process, culture, anything — what would it be and why?"
Note: Multi-mention themes coded from open-ended responses. Respondents could name more than one wish, so percentages sum to more than 100%.
▸ Buying Signal

Are You Considering Switching Your IM Platform?

48% are already evaluating alternatives in the next 12 months — only 43% are firmly satisfied.
Asked verbatim: "How well is your current incident management platform meeting your team's needs, and is your team actively evaluating or considering switching platforms in the next 12 months?"
▸ The Follow-Through Gap

Share of Postmortem Action Items That Actually Get Completed

64% of teams complete fewer than three out of every four postmortem action items (5% complete only 0–25% + 11% complete 26–50% + 48% complete 51–75%). The follow-through gap leaves the loop open.
Asked verbatim: "Of the action items that come out of postmortems, what percentage would you say actually get completed versus falling off the radar?"
Note: Universe = the 97 respondents whose answers could be quantified — counting digit percentages ("70%"), word percentages ("eighty percent"), and ranges via midpoint ("70–80%" → 75%). 3 respondents weren't asked the question; another 3 gave answers that couldn't be converted to a number ("action items are vague").
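As a concrete illustration of those conversion rules, here is a minimal parser sketch. The function name and word list are assumptions for this example, not the survey's actual coding script.

```python
import re

# Word-percentage lookup for answers like "eighty percent" (illustrative).
WORD_PCT = {"fifty": 50, "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90}

def completion_pct(answer: str) -> float | None:
    """Convert a free-text completion answer to a number, or None if uncodable."""
    text = answer.lower()
    # Ranges like "70-80%" (hyphen or en dash) collapse to their midpoint.
    if m := re.search(r"(\d+)\s*[-\u2013]\s*(\d+)\s*%?", text):
        return (int(m.group(1)) + int(m.group(2))) / 2
    # Bare digit percentages like "70%".
    if m := re.search(r"(\d+)\s*%?", text):
        return float(m.group(1))
    # Word percentages like "eighty percent".
    for word, pct in WORD_PCT.items():
        if word in text:
            return float(pct)
    return None  # e.g. "action items are vague" stays uncoded

assert completion_pct("around 70-80% get done") == 75.0
assert completion_pct("eighty percent") == 80.0
assert completion_pct("action items are vague") is None
```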
▸ The Loop That Won't Close

Have You Experienced the Same Incident Repeating?

93% of teams who answered directly said yes — incidents do recur because follow-up actions don't get completed. Only 7% said no.
Asked verbatim: "After an incident is resolved, what does your postmortem or post-incident review process look like? Has your team ever experienced the same incident repeating because follow-up actions weren't completed?"
Note: Universe = 41 respondents who gave a clear yes/no answer to whether incidents recur due to incomplete follow-ups. Of the remaining 62, most either focused their answer on their postmortem process without addressing recurrence, or gave qualified responses ("rarely," "depends") that couldn't be cleanly bucketed.
▸ The Stated Need

How Important Is Native Integration Between Observability & IM?

Among teams already considering a platform switch, 80% say native integration is "very important" or a "critical requirement" (62% very important + 18% critical). Almost no one called it optional or "nice to have." This is the active-buyer segment speaking.
Asked verbatim: "How important is it that your incident response tooling is natively integrated with your observability data — metrics, logs, traces — rather than bolted on as a separate tool?"
Note: Q13-a branch question: only asked of respondents whose Q13 answer classified as "evaluating or considering switching platforms in the next 12 months." So this chart represents the views of the active-buyer segment (n=60), not the full panel. Open responses were categorized into the survey's 5-point scale (Critical requirement / Very important / Somewhat important / Nice to have / Not important); zero responses fell into "Nice to have" or "Not important," and 6 of 60 couldn't be cleanly categorized.
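To show how a branch question like this narrows the universe, here is a minimal sketch of the Q13-a logic under assumed field names (nothing here reflects the survey's actual schema or pipeline).

```python
# Only respondents classified as actively evaluating a switch (Q13) are
# asked the integration-importance question; shares are then computed
# against that active-buyer segment (n=60), including uncodable answers.
SCALE = ["Critical requirement", "Very important", "Somewhat important",
         "Nice to have", "Not important"]

def importance_shares(respondents: list[dict]) -> dict[str, float]:
    # Branch: keep only the active-buyer segment (hypothetical field name).
    buyers = [r for r in respondents if r.get("q13_evaluating_switch")]
    if not buyers:
        return {}
    # Tally answers already coded onto the 5-point scale, per level.
    return {level: sum(r.get("q13a_importance") == level for r in buyers) / len(buyers)
            for level in SCALE}
```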
▸ Regional Lens

Where the Buying Signal Is Loudest

APAC is the most-active market by a wide margin — nearly twice the rate of EMEA.
Asked verbatim, broken out by region: "Is your team actively evaluating or considering switching incident management platforms in the next 12 months?"
Note: Cross-tab: each bar is a share within its own region (n=46 / 39 / 18). Bars are independent statistics, not parts of a single distribution.
Key Insight

The market knows. The buyers are in motion.

When 1 in 5 of the most senior practitioners in the field volunteer "consolidate our tools" as their top wish, and another 18% say "more automation," the signal is unambiguous: teams are tired of stitching together best-of-breed tools that don't talk to each other. The follow-through data exposes the cost: 64% of teams complete fewer than 75% of their postmortem action items, and 93% of those who clearly answered yes/no said incidents do repeat. Action items don't survive the handoff between observability, paging, ticketing, and chat — they get lost in the seams of the stack. Four out of five of the teams already weighing a switch say native integration is "very important" or a "critical requirement" — the active-buyer segment knows exactly what it wants. And in APAC, 72% are already weighing a switch — by far the most actively shopping region. The buyers are in motion. The question is which platform finally delivers the unified surface the work demands.

The Datadog Perspective

One platform.
One surface.
Every signal.

Datadog's incident management is built where the metrics, logs, traces, deploys, and team conversations already live — so when a Sev-1 fires, your responders aren't context-switching across six tabs to find the truth. They're already on it. That's how AI in IR actually works: when the underlying data plane is unified, the assistant can finally see the whole picture.

Explore Datadog Incident Management
Methodology

How this research was conducted.

103
Total Respondents
April 2026
Field Period
98%
Have IR Involvement / Oversight

Region

Asked verbatim: "Which region are you primarily based in?"

Industry

Asked verbatim: "Which industry best describes your organization?"

Org Size (Employees)

Asked verbatim: "Approximately how many employees does your organization have?"

Respondent Roles

Asked verbatim: "What is your primary role?"

Seniority

Asked verbatim: "What is your level / seniority at your organization?"

Research conducted via structured conversational interviews with 103 SREs, platform engineers, DevOps practitioners, and engineering leaders across cloud-native, hybrid, and migrating organizations. Respondents span three geographic regions — AMER (45%), EMEA (38%), and APAC (17%) — and the sample skews senior (40% VP/C-level, 48% Manager/Director, 11% Senior IC). The industry mix is heavily concentrated in SaaS & Technology (62%), with smaller representation from Retail / E-commerce (11%), Financial Services (6%), Healthcare (3%), and Media & Entertainment (2%); 17% are categorized as Other. Org size skews mid-market, with 81% of respondents at companies of 500–5,000 employees. All percentages are calculated as a share of unique respondents (not total mentions). Multi-mention questions — such as which business areas are impacted by major incidents and what teams would most like to change — may sum to more than 100% as respondents could cite multiple themes. All findings, including the regional and integration-status cross-tabs woven through this report, draw on the full 103-respondent sample.
