How AI is reshaping the role of the data engineer, the workloads they support, and the infrastructure they need to keep it all reliable.
AI has crossed a threshold inside data engineering. Eighty-nine percent of organizations now deploy AI workloads directly on their data warehouses and data lakes, four out of five teams use AI to answer business questions that used to belong to BI tools, and every single respondent in this survey said their responsibilities have changed in the past year because of AI.
That shift has exposed something data teams have long known but executives have not always heard. Data quality is now the number one blocker to AI adoption, three quarters of those quality issues originate outside the data team's own pipelines, and 95% of practitioners are concerned about the cost and performance impact of running AI on shared infrastructure. Organizations are betting their AI strategy on a foundation that most data engineers do not yet feel fully equipped to support.
A year ago, AI mostly lived in separate ML platforms and notebooks. Today it runs where the data lives. Almost nine in ten teams are deploying AI directly on their warehouses and lakes, and four-fifths of organizations now use AI to answer business questions that previously routed through BI dashboards. The data layer is no longer just a reporting substrate. It is an AI runtime. The pace varies by region: in Europe, 72% of teams report BI usage is decreasing as AI takes over more analytical workloads, while 42% of North American teams say their BI usage has actually grown over the past year alongside their new AI tools.
The data warehouse has quietly become an AI runtime, and that changes what "production" means for a data engineer. Models, copilots, and natural-language queries all read from the same tables that power dashboards. A quality issue that used to embarrass a Tuesday-morning report can now propagate into an LLM-generated answer that ends up in front of a customer. The day-to-day work is shifting too: respondents most often describe AI as a co-author for SQL and pipelines, an anomaly detector for quality issues, and a way to claw back time for higher-value work. The transition is happening everywhere, if not at the same pace, and either way the data layer now has to serve BI and AI queries from the same tables.
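To make the stakes concrete, here is a minimal sketch of what "production" now looks like when a natural-language question hits the warehouse directly. The `generate_sql` stub and the `ALLOWED_TABLES` guardrail are illustrative assumptions, not any particular product's API; the point is that a generated query reads the same vetted tables the dashboards do.

```python
# Minimal sketch: a natural-language question becomes SQL against the
# warehouse. Everything here is hypothetical; the allow-list check is
# the point: once the warehouse is an AI runtime, every generated
# query is production code.

import sqlite3

ALLOWED_TABLES = {"daily_revenue", "orders"}  # tables already vetted for BI

def generate_sql(question: str) -> str:
    """Placeholder for an LLM text-to-SQL call (hypothetical)."""
    return "SELECT region, SUM(amount) FROM orders GROUP BY region"

def referenced_tables(sql: str) -> set[str]:
    # Naive token scan for the sketch; a real SQL parser would do this properly.
    tokens = sql.replace(",", " ").split()
    return {tok.lower() for i, tok in enumerate(tokens)
            if i > 0 and tokens[i - 1].upper() in ("FROM", "JOIN")}

def answer(question: str, conn: sqlite3.Connection):
    sql = generate_sql(question)
    if not referenced_tables(sql) <= ALLOWED_TABLES:
        raise PermissionError(f"Generated SQL touches unvetted tables: {sql}")
    return conn.execute(sql).fetchall()
```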
When asked what is holding back AI adoption, data engineers point at the same thing more often than anything else: the data itself. Quality issues are cited as the single biggest blocker, more than twice as often as the next most common answer. And those issues mostly do not start where the data team can see them. Three out of four originate upstream, in third-party feeds, source applications, schema migrations, and manual entry. The intensity is not evenly distributed: in Europe, more than half of all respondents picked data quality as their #1 AI blocker, compared to just 19% in North America.
Detection is not the hard part. The hard part is tracing a downstream symptom back to the upstream system that caused it, fast enough to fix it before AI applications and dashboards consume the bad data. Twenty-three percent of teams still take more than a day to resolve a data quality issue once it is discovered, and roughly half of respondents describe a workflow where downstream users find problems first. That puts data engineers in a permanent reactive posture, fixing issues after they have already affected business decisions. The regional pattern adds nuance: 42% of North American teams say their quality issues come from third-party data providers, while 35% of European teams trace issues to their own internal source systems. Same observability gap, different upstream culprits.
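The tracing itself is mechanical once lineage exists. A rough sketch, assuming a simple asset-to-upstream map and a per-asset health flag (all names hypothetical): walk upstream from the failing asset and return the closest unhealthy ancestor.

```python
# Sketch of upstream root-cause tracing over a lineage graph. The
# UPSTREAM map and HEALTH flags are illustrative, not any tool's API.

from collections import deque

UPSTREAM = {  # asset -> assets it reads from
    "llm_answers": ["revenue_mart"],
    "revenue_mart": ["orders_clean"],
    "orders_clean": ["orders_raw"],
    "orders_raw": ["vendor_feed"],   # third-party source
}

HEALTH = {"vendor_feed": False}  # e.g. schema drift detected at the source

def first_bad_upstream(asset: str) -> str | None:
    """Breadth-first walk from a failing asset to its nearest unhealthy ancestor."""
    queue, seen = deque(UPSTREAM.get(asset, [])), set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if not HEALTH.get(node, True):
            return node          # closest unhealthy ancestor wins
        queue.extend(UPSTREAM.get(node, []))
    return None

print(first_bad_upstream("llm_answers"))  # -> vendor_feed
```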
Running AI on the data layer is not free. Ninety-five percent of respondents have at least some concern about its cost and performance impact, and 58% describe themselves as "very" or "extremely" concerned. The specifics are revealing: storage growth, budget unpredictability, and resource contention with non-AI workloads top the list. These are the symptoms of putting a new, heavy workload on shared infrastructure with limited visibility into what is actually consuming compute. The pressure is not felt equally everywhere. Forty-eight percent of North American respondents say they are very or extremely concerned, compared to 63% in Europe and 79% in Asia-Pacific.
The traditional data observability question was "is the pipeline broken?" The new question is "is this query worth what it just cost?" As AI moves into the warehouse, FinOps and observability are converging. Most teams do not yet have a single view that connects performance, spend, and the workload responsible for both, which is why 45% say they cannot reliably predict or budget AI infrastructure costs. The survey measures that gap, not its cause.
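A single view starts with attribution: tagging each query with the workload it serves and rolling spend up by tag. A minimal sketch, assuming query logs already carry a workload tag and a per-query cost estimate (both field names are made up for illustration):

```python
# Sketch of spend attribution by workload tag. Field names and the
# shape of the query log are assumptions for the example.

from collections import defaultdict

query_log = [
    {"workload": "bi_dashboards", "cost_usd": 0.14, "runtime_s": 3.1},
    {"workload": "llm_retrieval", "cost_usd": 2.90, "runtime_s": 41.0},
    {"workload": "llm_retrieval", "cost_usd": 3.40, "runtime_s": 55.2},
]

totals = defaultdict(lambda: {"cost_usd": 0.0, "runtime_s": 0.0, "queries": 0})
for q in query_log:
    t = totals[q["workload"]]
    t["cost_usd"] += q["cost_usd"]
    t["runtime_s"] += q["runtime_s"]
    t["queries"] += 1

# Most expensive workloads first: the start of an answer to
# "is this query worth what it just cost?"
for workload, t in sorted(totals.items(), key=lambda kv: -kv[1]["cost_usd"]):
    print(f'{workload}: ${t["cost_usd"]:.2f} across {t["queries"]} queries')
```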
The good news in this data: AI has not displaced data engineers. It has elevated them. Eighty-seven percent feel more valuable than a year ago, and every respondent reports new responsibilities, especially around data quality for AI training and AI output monitoring. The harder news: only 30% feel "very well-equipped" to handle the demands AI is placing on their work, and the skills they most want to develop (MLOps, real-time streaming, observability, and prompt engineering) sit at the intersection of AI and the data infrastructure that has to support it.
The role is expanding faster than the toolset. Engineers are being asked to govern the data feeding AI models, monitor model outputs, and contain AI infrastructure costs, all while keeping the regular pipelines running. Seventy percent say they have skill or tooling gaps. The skills they want to develop point at the same answer the rest of this report does: observability for the new workloads they are responsible for. The priorities split by region in a revealing way. Data observability is the #1 skill priority in North America, named by 52% of respondents, while only 30% of European respondents and 21% of Asia-Pacific respondents put it on their list. European teams instead lead on AI-native skills (51% want MLOps; 51% want prompt engineering and LLM integration). The survey shows the priority split, not the reason behind it.
We asked one final, open-ended question: if you could wave a magic wand and fix one thing about how your company manages data reliability and quality today, what would it be? The 109 free-text answers describe the same reliability and visibility gaps. Catch issues at the source. See them end-to-end. Consolidate the monitoring tools. Standardize the definitions. Automate the root cause. Give every dataset an owner. Stop being reactive. Different vocabulary, one underlying ask.
Across those answers, respondents kept describing the same thing in different words: detect issues before users do, trace them back to the source, stop spending nights firefighting, see everything in one place. Four of the top themes point at the same gap: proactive monitoring, end-to-end lineage, source-level validation, and unified tooling.
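Source-level validation is the most concrete of the four. A small sketch of what respondents are asking for, with hypothetical thresholds and field names: check a feed's freshness and null rate before anything downstream, dashboard or model, reads it.

```python
# Sketch of a source-level validation gate. Thresholds, the "amount"
# field, and the feed shape are all illustrative assumptions.

from datetime import datetime, timedelta, timezone

def validate_feed(rows, loaded_at,
                  max_age=timedelta(hours=6),
                  max_null_rate=0.02):
    """Return a list of problems; an empty list means the feed is safe to publish."""
    problems = []
    # Freshness: loaded_at is assumed to be timezone-aware UTC.
    if datetime.now(timezone.utc) - loaded_at > max_age:
        problems.append("stale: feed is older than its freshness SLO")
    # Completeness: null rate on a critical column.
    if rows:
        null_rate = sum(r.get("amount") is None for r in rows) / len(rows)
        if null_rate > max_null_rate:
            problems.append(f"null rate {null_rate:.1%} exceeds {max_null_rate:.0%}")
    return problems

rows = [{"amount": 12.5}, {"amount": None}, {"amount": 7.0}]
print(validate_feed(rows, loaded_at=datetime.now(timezone.utc)))
# -> ['null rate 33.3% exceeds 2%']
```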
The same teams running AI on their warehouses are the ones who keep asking for full-lifecycle visibility, lineage that traces issues to their upstream source, and a single view that ties cost, performance, and reliability together.
Datadog Data Observability gives data teams what this research describes by name: end-to-end lineage from source systems through pipelines, warehouses, and the BI and AI tools that consume them. Anomaly detection catches problems before stakeholders do. And cost, performance, and reliability metrics live in the same place, so the same view answers "is this broken?" and "is this worth it?"
Explore Datadog Data Observability

This research was conducted in April and May of 2026 with 109 data engineering practitioners across North America, Europe, and Asia-Pacific. Respondents were screened to ensure they currently work as Data Engineers, Senior or Lead Data Engineers, Analytics Engineers, Data Platform Engineers, or Data Architects, and that their day-to-day responsibilities include data ingestion, pipelines and orchestration, data lakes, data warehouses, modeling, or architecture.
The survey covered AI's impact on the data engineering role, data quality and reliability practices, infrastructure deployment patterns, cost and performance pressures, and the skills practitioners are prioritizing for 2026.
All percentages are calculated from the 109 unique respondents who completed the survey, except where noted as multi-select. The role-distribution chart shows the 107 respondents who specified a role. Single-select question percentages are rounded using the largest-remainder method so each chart sums to exactly 100%. Open-text responses for the three free-text questions were coded into recurring themes; each respondent could be assigned to multiple themes. Regional cross-tabulations use sample sizes of North America (n = 52), Europe (n = 43), and Asia-Pacific (n = 14); readers should treat the Asia-Pacific percentages as directional given the smaller sub-sample.
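For readers who want to reproduce the rounding, the largest-remainder method works like this: floor every exact percentage, then hand the leftover points to the categories with the largest fractional remainders so the chart sums to exactly 100.

```python
# Largest-remainder rounding, as described in the methodology above.

import math

def largest_remainder(counts, total=100):
    exact = [c * total / sum(counts) for c in counts]
    floored = [math.floor(x) for x in exact]
    leftover = total - sum(floored)
    # Hand the remaining points to the largest fractional remainders.
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - floored[i], reverse=True)
    for i in order[:leftover]:
        floored[i] += 1
    return floored

# The regional sub-samples from this survey (52 + 43 + 14 = 109):
print(largest_remainder([52, 43, 14]))  # -> [48, 39, 13], sums to 100
```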