Datadog Research // Q2 2026
The AI Observability Reckoning

The Smoke-Detector Problem

A new study of 27 senior infrastructure leaders finds that AI in observability has crossed the credibility threshold, but mostly as an alert engine. The teams running production want it to investigate, diagnose, and fix. Most of their tools cannot yet.

Sample
27 Leaders
Field Period
April 2026
Audience
CTO / VP / Director
Coverage
500–5,000+ Employees
The Finding

Adoption is no longer the question. 96% of respondents are at "Significant," "Mature," or "Growing" stages of AI in observability, and every team in the study is using AI for at least one workflow today. But only 37% trust AI to act beyond low-risk tasks. The capability gap is what these leaders raised again and again: they want AI that investigates, diagnoses, and fixes, not just flags and summarizes. That gap is what will decide where the next round of observability dollars goes.

The Gap
85%
use AI today for log summarization. Only 33% name root cause analysis as AI's biggest impact, even though it tops the list when leaders are asked where AI matters most.
The Stakes
89%
are increasing AI-for-observability budget in the next year. 52% are spending it on data quality and telemetry, not new AI tools.
The Test
70%
say a proven track record of accurate, reliable outputs is their top prerequisite for trusting AI more, narrowly ahead of explainability.
01
Finding 01 / The Capability Gap
Detector vs. Investigator

AI does the summarizing. Engineers still do the thinking.

96% of respondents are at "Significant," "Mature," or "Growing" stages of AI in observability. Only one team described themselves as "Early." But adoption clusters around the easiest, lowest-risk uses: pattern recognition in logs, anomaly detection, alert noise reduction. The harder work (root cause analysis across systems, correlating signals into a verdict, taking action without human review) is exactly where these leaders say their tooling falls short.

Where AI Lives Today

Which observability workflows AI is actually doing

Log summarization is near-universal at 85%. Anomaly detection (63%) and alert correlation (52%) follow. Automated remediation and change impact analysis trail the leaders by more than 30 points.

Asked verbatim: Which observability or engineering workflows are you currently using AI for?
Note: Multi-select. Each bar is the share of all 27 respondents who named the workflow. Bars do not sum to 100%.
Where AI Matters Most

The workflow most impacted by AI today

Asked to pick the single workflow where AI delivers the biggest impact, 33% named root cause analysis. Anomaly detection (22%) and alert correlation (15%) round out the top three.

Asked verbatim: Of the workflows you mentioned, which one do you think is most impacted by your use of AI today — and why?
Note: Open-ended responses coded into eight workflow categories from the Q1 multi-select list. Universe = 27.
The Capability Gap

What teams say AI tools don't do yet

Asked open-ended where AI falls short, leaders named four capabilities most often: end-to-end automation (30%), reliability and consistency (26%), accurate root cause analysis (22%), and business-context understanding (19%). Only 15% said there was no real gap.

Asked verbatim: What's the biggest gap between what your AI tooling does for you today and what you need it to do over the next year or two?
Note: Open-ended responses coded into themes. Multi-mention; respondents could name more than one gap, so percentages sum to more than 100%. Universe = 27.
Key Insight

Detectors versus investigators.

The gap between AI's most-used capability (log summarization, 85%) and its most-impactful capability (root cause analysis, 33%) captures the problem in this dataset. Today's AI flags signals well, but respondents say it does not connect them. The capabilities respondents most often described as missing were end-to-end automation and accurate root cause analysis across systems: a tool that takes the signals AI already surfaces and produces a verdict an engineer can act on.

Until that capability exists, advances in alert noise reduction concentrate the human triage step rather than remove it: AI gets faster at flagging, but the engineer still has to figure out why.

The gap is the distance between a tool that acts like a smoke detector and one that acts like a lead investigator. We have plenty of detectors; we need the investigator.
— Director of Engineering, SaaS / United States
02
Finding 02 / Delivery Architecture
Native, Hybrid, and Why

Teams want AI where the data already is.

70% deliver AI through native platforms or a combination model. Native (built into the observability platform they already use) leads at 44%, with another 26% taking a combination approach. Only 22% rely primarily on bolt-on integrations. Just 7% built it in-house. The reasoning given by respondents is consistent across roles and regions: speed to value, reduced architectural complexity, and AI that already understands their telemetry.

Delivery Model

How AI is delivered in observability today

Native AI built into the existing observability platform leads at 44%, with another 26% taking a combination approach. Just 7% of teams have built AI internally.

Asked verbatim: How is AI delivered in your observability environment today?
Note: Single-select. Bars sum to 100% within universe = 27 (rounded).
Approach Preference

What teams want their AI experience to feel like

85% favor either a hybrid model or full customization. They want turnkey defaults they can extend or override. Only one respondent wanted out-of-the-box AI with predefined workflows. Only three said they had no strong preference.

Asked verbatim: When it comes to AI in your observability stack, which approach do you prefer — out-of-the-box AI with predefined workflows, hybrid (turnkey defaults with the ability to customize), fully customizable, or no strong preference?
Note: Single-select. Bars sum to 100% within universe = 27 (rounded).
The Foundation

How monitoring quality limits AI performance

Among the 41% of teams who say gaps in their monitoring limit what their AI tooling can do, the consequences are concrete: missed incidents, wasted engineer time, and false alarms that erode trust. Strong monitoring is the precondition for AI to be useful at all.

Asked verbatim: What impact does your monitoring quality have on the performance of your AI tooling — does AI's accuracy depend on how strong your monitoring is, or is it largely independent?
Note: Single-select. Universe = 27.
Key Insight

Buyers want AI that is unified, customizable, and grounded.

Three findings move together. 70% of teams choose native or combination delivery. 85% want hybrid or fully customizable AI. And 41% say monitoring quality has at least some impact on how their AI tools perform. The pattern points to one architectural preference: AI grounded in unified telemetry, with platform defaults that engineers can override. Neither bolt-on AI living outside the data plane nor locked-down out-of-the-box AI is where buyers are spending.

For vendors, this is a positioning gate. The buyers signaling they are ready to spend more next year are not asking for more AI features. They are asking for AI that is already plugged into their telemetry and tunable to their environment.

Native AI tools are already integrated with our telemetry data, ensuring better security and more accurate insights without the high overhead of building and maintaining an internal LLM infrastructure.
— Director of Engineering, SaaS / United States
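
What the "turnkey defaults with the ability to customize" preference looks like mechanically: a minimal Python sketch of platform-supplied defaults that a team selectively overrides. The keys, thresholds, and action names here are illustrative assumptions, not any vendor's actual configuration surface.

# Hypothetical hybrid-model sketch: platform defaults an engineering
# team can selectively override. All keys and values are illustrative.
PLATFORM_DEFAULTS = {
    "anomaly_detection": {"sensitivity": "medium", "baseline_window_hours": 24},
    "alert_correlation": {"group_by": ["service", "env"], "window_minutes": 10},
    "auto_remediation": {"enabled": False},  # conservative out of the box
}

def effective_config(defaults: dict, overrides: dict) -> dict:
    """Deep-merge team overrides onto platform defaults; overrides win."""
    merged = {}
    for key in defaults.keys() | overrides.keys():
        d, o = defaults.get(key), overrides.get(key)
        if isinstance(d, dict) and isinstance(o, dict):
            merged[key] = effective_config(d, o)  # recurse into nested sections
        else:
            merged[key] = o if key in overrides else d
    return merged

# A team keeps turnkey anomaly detection, tightens correlation, and allows
# one narrow low-risk action: the conservative posture most respondents hold.
team_overrides = {
    "alert_correlation": {"window_minutes": 5},
    "auto_remediation": {"enabled": True, "allowed_actions": ["restart_service"]},
}
config = effective_config(PLATFORM_DEFAULTS, team_overrides)

The design point is the default direction: automation ships off, and every step past recommendations is an explicit, reviewable override.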
03
Finding 03 / Business Impact
What AI Failure Actually Costs

When AI gets it wrong, the bill arrives in engineer-hours.

Asked to walk through the most recent time AI fell short, whether it was a hallucinated root cause, a missed alert, or a false positive, leaders described real, quantifiable downstream cost. 59% named slower incident response and stretched MTTR. 41% pointed to wasted engineer time. 19% described customer-facing impact: checkout errors, two-hour service degradations, customer experience hits. These are not theoretical AI safety concerns. They are the receipts from the most recent incidents these teams worked through.

The Reliability Blocker

The hardest part of getting AI to work reliably

Forced to name the single hardest part, respondents clustered around four roughly equal frustrations: data quality, model drift, context and interpretability, and trust and accuracy. Notably, respondents treated "data quality" and "signal-to-noise tuning" as distinct problems rather than one.

Asked verbatim: What has been the hardest part of getting AI to work reliably in your observability stack?
Note: Single-select. Universe = 27.
Downstream Cost

What AI failure actually does to the business

From open-ended descriptions of recent AI-related incidents, six business-impact themes emerged. Slower incident response and wasted engineer time dominate. Customer-facing impact and trust erosion appeared often enough to be reportable.

Asked verbatim: Tell me about the last time that was an issue — what did it look like in practice, and what was the downstream impact on your business (engineer time, customer experience, revenue, etc.)?
Note: Multi-mention themes coded from open-ended responses. Each bar is the share of all 27 respondents who named the impact. Universe = 27.
Key Insight

Bad AI and a broken stack produce the same engineering bill.

The most consistent pattern across the incident stories: when AI is wrong, the response time gets longer, not shorter. One Director of Engineering described AI flagging a planned mitigation as a critical incident, pulling senior engineers into a 45-minute investigation of a non-issue. A false alarm pulls a team into a non-incident. A missed signal lets a real outage stretch into customer impact. Trust erosion sits underneath all of it. Three respondents described AI failures that had affected how willing their team was to rely on AI again, citing concerns like looking "incompetent" and ongoing engineering "stress."

Practitioners are now able to put hours, dollars, and customer impact against specific AI failures their teams have lived through. That concreteness is what makes the trust prerequisites in the next section so specific.

The AI hallucinated a root cause during a database spike — wasting hours of senior engineer time and delaying resolution, which directly impacted customer experience.
— Director of Engineering, SaaS / United States
04
Finding 04 / Trust & Autonomy
The Earn-Out Question

Trust earns itself with a track record.

Asked which two qualities most matter for trusting AI more broadly, leaders gave a sharply differentiated answer. 70% named a proven track record of accurate, reliable outputs. Explainability followed at 52%. Consistent performance over time at 48%. Visibility into model behavior trailed at 26%. Regulatory oversight at 19%. The top two describe a single concept: show me the receipts.

The Trust Bar

What earns broader AI trust

Picking the top two prerequisites yields clear differentiation. Track record (70%) and explainability (52%) lead. Consistent performance over time follows at 48%. Visibility into model behavior and compliance oversight are clearly secondary.

Asked verbatim: What are the two most important prerequisites for trusting AI more broadly in your environment? (Pick top 2.)
Note: Respondents picked their top two prerequisites (a few named more). Bars are share of all 27 and sum to more than 100%.
The Autonomy Ceiling

How much autonomy teams give AI in production today

Despite high adoption, the autonomy bar is conservative. 63% limit AI to either recommendations only or low-risk, well-defined actions such as auto-scaling and service restarts. Only 11% allow AI to operate autonomously across production workflows.

Asked verbatim: What level of autonomy are you comfortable giving AI in your production environment today?
Note: Single-select. Universe = 27.
Cross-Cut: Adoption Stage

The autonomy gap between advanced and earlier-stage teams

Splitting the same autonomy question by adoption stage exposes a sharp threshold. 100% of Early and Growing teams keep AI at "recommendations only" or "low-risk only." Among Significant and Mature teams, half have crossed into letting AI act on most operational tasks or operate autonomously. Earning the next bracket of autonomy is what differentiates an advanced AI-observability practice from a growing one.

Note: Single-select autonomy levels split by adoption stage. Universe = 27 (Significant + Mature n=20, Growing + Early n=7). Small n on the Growing/Early side; report as directional.
Cross-Cut: Adoption Stage

Earlier-stage teams want explainability. Advanced teams want a track record.

The trust prerequisites question splits cleanly by maturity. Among Early and Growing teams, explainability leads at 71% — they want to see how AI thinks before they trust it. Among Significant and Mature teams, track record leads at 75% — they have lived with AI long enough to want evidence it works, not theory of how. The shift is the practitioner's journey from "I want to understand it" to "I want it to be right."

Note: Top-2 multi-select trust prerequisites split by adoption stage. Bars are share of respondents within each group who named the prerequisite. Universe = 27 (Significant + Mature n=20, Growing + Early n=7). Small n on the Growing/Early side; report as directional.
Cross-Cut: Region

EMEA leaders are notably more conservative on autonomy

A second cut, by region, surfaces a directional signal worth flagging for the next fielding wave. 67% of EMEA respondents keep AI at low-risk actions only, compared to 29% in AMER. None of the six EMEA respondents allow AI to act independently on most operational tasks; 33% of AMER respondents do. With only six EMEA respondents in this round, this is a directional read rather than a finding, but the absolute zero on "act independently" warrants follow-up at scale.

Note: Single-select autonomy levels split by region. Universe = 27 (AMER n=21, EMEA n=6). EMEA n is small; report as directional.
Key Insight

The path to autonomy runs through audit logs.

Two questions answered together describe the entire trust ladder in this market. What earns trust? A proven track record, ahead of explainability. What does AI get to do today? Mostly recommendations and low-risk actions. The connective tissue is one word both groups used: proof. Several respondents proposed concrete metrics. "False positives under 5% across our monitoring stack, measured against known incidents we've manually validated." "A traceability log linking every AI action to specific telemetry markers." "99% success rate on automated low-risk remediations over six months without false positives."

The cross-cuts sharpen the picture. Earlier-stage teams want to understand AI; advanced teams want it to be right. Explainability dominates the trust list at 71% among Growing and Early teams; track record dominates at 75% among Significant and Mature. And the autonomy ceiling moves with maturity: not a single Early or Growing team has crossed past low-risk action, while half of Significant + Mature teams have.

In these respondents' words, the answer to "how do we earn more autonomy?" runs through audit. The proof they describe ("this AI was right, this many times, in environments like ours, with this exact log") is what they say would unlock the next bracket of autonomy.

"Proof" would be a 99% success rate on automated low-risk remediations over a six-month period without any false positives.
— Director of Cloud / Reliability, SaaS / United States
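
A minimal sketch of what that traceability log and track-record check could look like, in Python. Every field name, outcome label, and threshold is an illustrative assumption drawn from the respondents' own proof criteria, not a real product's schema.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class AIActionRecord:
    # One entry in the traceability log respondents describe: every AI
    # action linked to the telemetry that triggered it and to a
    # human-validated outcome.
    action: str                  # e.g. "flag_incident", "restart_service"
    telemetry_refs: list[str]    # IDs of the signals the AI acted on
    outcome: str                 # "true_positive" | "false_positive" | "missed"
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

def track_record(log: list[AIActionRecord]) -> dict:
    """Compute the 'receipts': rates measured against manually validated incidents."""
    total = len(log)
    tp = sum(1 for r in log if r.outcome == "true_positive")
    fp = sum(1 for r in log if r.outcome == "false_positive")
    return {
        "actions": total,
        "success_rate": tp / total if total else 0.0,
        "false_positive_rate": fp / total if total else 0.0,
    }

# Illustrative log; in practice this accumulates over months of incidents.
audit_log = [
    AIActionRecord("flag_incident", ["trace:9f2", "metric:cpu.p99"], "true_positive"),
    AIActionRecord("restart_service", ["log:oom-killer"], "true_positive"),
    AIActionRecord("flag_incident", ["metric:latency.p50"], "false_positive"),
]

# Two respondents' bars, applied literally: false positives under 5% and a
# 99% success rate on automated low-risk remediations.
stats = track_record(audit_log)
earns_next_bracket = (
    stats["false_positive_rate"] < 0.05 and stats["success_rate"] >= 0.99
)

The point of the sketch is the linkage: autonomy decisions keyed to an auditable, human-validated history rather than to model confidence.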
05
Finding 05 / Investment Forecast
Where the Dollars Are Going

The next round of spend goes to the data underneath AI.

89% of respondents are increasing their AI-for-observability investment in the next year: 26% by more than 30%, another 63% by 10–30%. Asked where the dollars are going, the top answer was not "buy more AI tools." It was data quality, pipelines, and telemetry work to support AI, named by 52%. Training engineers (41%) and buying additional AI-capable observability platforms (37%) followed.

Investment Direction

Expected change in AI-for-observability investment

89% are increasing. 7% (two respondents) are holding steady or scaling back. One is too early to say. The direction is unambiguous. The question is what the money is being spent on.

Asked verbatim: How do you expect your organization's investment in AI for observability to change over the next 12 months?
Note: Single-select. Universe = 27.
Investment Destinations

Where the AI-for-observability dollars are going

Asked to pick their top two destinations, respondents put data quality, pipeline, and telemetry work first at 52%. Training existing engineering staff is second at 41%. Vendor tools are third at 37%. Hiring AI and ML engineers, infrastructure costs, and internal AI tooling cluster together in a third tier.

Asked verbatim: Where specifically are those investment dollars going? (Pick your top 2.)
Note: Multi-select capped at 2 per respondent (some chose 3). Bars are share of 27.
What's Driving It

The forces making the case internally

Three drivers dominate the open-ended responses: executive and board interest (41%), pressure to do more with less (37%), and competitive pressure (30%). Specific incidents and reliability improvements are secondary motivators.

Asked verbatim: What are the key drivers behind that decision — what's making the case internally?
Note: Multi-mention themes coded from open-ended responses. Each bar is the share of all 27 respondents who cited the driver.
Key Insight

Most of the new AI spend is foundation work.

The headline statistic, 89% increasing AI investment, sets the direction. What matters is where the money lands. The top two spending priorities (data quality at 52%, training at 41%) are not AI features at all. They are the foundation that makes AI work in the first place. Only 22% are spending on infrastructure and compute. Only 22% on building internal AI tooling. Only 19% on consolidating onto a single platform.

Leadership pressure shows up clearly: 41% of respondents cited executive or board-level interest as a top driver, ahead of do-more-with-less pressure (37%) and competitive pressure (30%). The dollars are going to the engineering substrate as often as to the AI ribbon on top of it. Vendors who can show their AI is grounded in clean telemetry, and who can help teams clean it up, are aligned with where these 27 buyers are spending next.

It's pressure to do more with less. We need AI to handle increasing system complexity and reduce the manual troubleshooting burden on our engineers.
— Director of Engineering, SaaS / United States
The Datadog Perspective

One platform. Grounded in your telemetry.

Datadog Bits AI is built where your logs, metrics, traces, and incident data already live. That is the delivery model 70% of respondents in this study chose for their own AI, paired with the hybrid, customizable experience 85% said they prefer. When AI sits on a unified data plane, the leap from detector to investigator becomes a software problem rather than an architecture problem. Root cause across systems. Audit trails any senior engineer can review. Native, customizable, grounded.

Explore Datadog Bits AI →
Methodology

How this research was conducted.

Total Respondents
27
Field Period
April 2026
Conversational Survey
100%

Industry Mix

Industry classification for the 16 Cint panelists drawn from the panel-side STANDARD_INDUSTRY_PERSONAL field. The 11 Pure Spectrum panelists were screened to "Science / Technology / Programming" at recruitment and roll into SaaS / Tech. The resulting 89% SaaS / Tech skew is a panel-composition artifact and a known limitation of this round.

Title Mix

"What's your current title?" Single-select with options including Technical Lead, AI Platform Lead, Head of Platform Engineering, Head of Infrastructure, Director of SRE, Director of Cloud / Reliability, Director of Engineering, VP of Engineering, and CTO. Open-ended responses normalized to the closest title.

Region

"What country do you live in?" Open-ended responses mapped to AMER (United States) and EMEA (United Kingdom). Heavy AMER skew is a known limitation of this round and will be addressed in subsequent fielding.

Company Size

"How many employees are at your company across all departments?" Bracket responses preserved; raw numbers binned to standard brackets.

AI Adoption Stage

"How far along is your organization in adopting AI for observability?" Single-select between Early, Growing, Significant, and Mature.

Research conducted via conversational AI-led interviews with 27 senior infrastructure leaders (CTO, VP, Director-level and above) at companies of 500+ employees. Respondents were screened for direct involvement in observability budget approval and AI tooling decisions. The field instrument was a hybrid structured plus open-ended conversation guide containing 14 main questions and 13 follow-ups, all asked of every respondent regardless of branching logic.

The sample skews AMER (78%), with EMEA representation at 22% and no APAC respondents in this round. Industry composition is heavily concentrated in SaaS / Tech (89%), reflecting the panel quotas at recruitment for both supplier panels: Cint panelists were sourced primarily from Information Technology and Computer Software, and Pure Spectrum panelists were screened to "Science / Technology / Programming." Healthcare, Financial Services, and Retail are each represented by a single respondent. Vertical breadth is a known limitation and is being addressed in the next fielding wave with explicit quotas across additional industries.

All percentages are calculated as a share of the 27 unique respondents (not total mentions). Multi-mention questions, including workflows in use, business-impact themes, and investment destinations, may sum to more than 100% as respondents could cite multiple categories. Open-ended responses were coded into themes by a single analyst using pre-defined keyword categories fixed before coding began; each respondent counted once per primary theme. Quotes are verbatim from respondents who passed both screener and post-fielding fraud review (42 of 69 completers were excluded for low-effort, off-topic, templated AI, or paste-artifact responses; only the remaining 27 contribute to this report).
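
For transparency, a minimal Python sketch of that share-of-respondents calculation; the respondent IDs and theme labels are illustrative, not actual data.

from collections import Counter

# Illustrative coded responses for one multi-mention question
# (not actual respondent data).
coded = {
    "r01": ["slower_incident_response", "wasted_engineer_time"],
    "r02": ["slower_incident_response"],
    "r03": ["customer_facing_impact", "slower_incident_response"],
}

N = 27  # universe: all unique respondents, including non-mentioners

# Each respondent counts at most once per theme (hence set()), and the
# denominator is always N, so multi-mention shares can sum past 100%.
mentions = Counter(t for themes in coded.values() for t in set(themes))
shares = {theme: count / N for theme, count in mentions.items()}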

Questions about data quality and signal-to-noise tuning are reported as distinct categories rather than grouped together, reflecting feedback from the prior round. The single-select trust prerequisites question was changed to a top-2 multi-select to surface meaningful differentiation; the prior round's flat single-select distribution offered none to report.

Datadog Research Q2 2026
The Smoke-Detector Problem