How AI is reshaping the role of the data engineer, the workloads they support, and the infrastructure they need to keep it all reliable.
AI has crossed a threshold inside data engineering. Eighty-nine percent of organizations now deploy AI workloads directly on their data warehouses and data lakes, four out of five teams use AI to answer business questions that used to belong to BI tools, and every single respondent in this survey said their responsibilities have changed in the past year because of AI.
That shift has exposed something data teams have long known but executives have not always heard. Data quality is now the number one blocker to AI adoption, three quarters of those quality issues originate outside the data team's own pipelines, and 95% of practitioners are concerned about the cost and performance impact of running AI on shared infrastructure. Organizations are betting their AI strategy on a foundation that most data engineers do not yet feel fully equipped to support.
A year ago, AI mostly lived in separate ML platforms and notebooks. Today it runs where the data lives. Almost nine in ten teams are deploying AI directly on their warehouses and lakes, and four-fifths of organizations now use AI to answer business questions that previously routed through BI dashboards. The data layer is no longer just a reporting substrate. It is an AI runtime. The pace varies by region: in Europe, 72% of teams report BI usage is decreasing as AI takes over more analytical workloads, while 42% of North American teams say their BI usage has actually grown over the past year alongside their new AI tools.
The data warehouse has quietly become an AI runtime, and that changes what "production" means for a data engineer. Models, copilots, and natural-language queries all read from the same tables that power dashboards. A quality issue that used to embarrass a Tuesday-morning report can now propagate into an LLM-generated answer that ends up in front of a customer. The day-to-day work is shifting too: respondents most often describe AI as a co-author for SQL and pipelines, an anomaly detector for quality issues, and a way to claw back time for higher-value work. The transition is happening everywhere, if not at the same pace, and either way the data layer now has to serve BI and AI queries from the same tables.
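To make the stakes concrete, here is a minimal sketch of what "production" now looks like when a natural-language question hits the warehouse directly. The `generate_sql` stub and the `ALLOWED_TABLES` guardrail are illustrative assumptions, not any particular product's API; the point is that a generated query reads the same vetted tables the dashboards do.

```python
# Minimal sketch: a natural-language question becomes SQL against the
# warehouse. Everything here is hypothetical; the allow-list check is
# the point: once the warehouse is an AI runtime, every generated
# query is production code.

import sqlite3

ALLOWED_TABLES = {"daily_revenue", "orders"}  # tables already vetted for BI

def generate_sql(question: str) -> str:
    """Placeholder for an LLM text-to-SQL call (hypothetical)."""
    return "SELECT region, SUM(amount) FROM orders GROUP BY region"

def referenced_tables(sql: str) -> set[str]:
    # Naive token scan for the sketch; a real SQL parser would do this properly.
    tokens = sql.replace(",", " ").split()
    return {tok.lower() for i, tok in enumerate(tokens)
            if i > 0 and tokens[i - 1].upper() in ("FROM", "JOIN")}

def answer(question: str, conn: sqlite3.Connection):
    sql = generate_sql(question)
    if not referenced_tables(sql) <= ALLOWED_TABLES:
        raise PermissionError(f"Generated SQL touches unvetted tables: {sql}")
    return conn.execute(sql).fetchall()
```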
When asked what is holding back AI adoption, data engineers point at the same thing more often than anything else: the data itself. Quality issues are cited as the single biggest blocker, more than twice as often as the next most common answer. And those issues mostly do not start where the data team can see them. Three out of four originate upstream, in third-party feeds, source applications, schema migrations, and manual entry. The intensity is not evenly distributed: in Europe, more than half of all respondents picked data quality as their #1 AI blocker, compared to just 19% in North America.
Detection is not the hard part. The hard part is tracing a downstream symptom back to the upstream system that caused it, fast enough to fix it before AI applications and dashboards consume the bad data. Twenty-three percent of teams still take more than a day to resolve a data quality issue once it is discovered, and roughly half of respondents describe a workflow where downstream users find problems first. That puts data engineers in a permanent reactive posture, fixing issues after they have already affected business decisions. The regional pattern adds nuance: 42% of North American teams say their quality issues come from third-party data providers, while 35% of European teams trace issues to their own internal source systems. Same observability gap, different upstream culprits.
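The tracing itself is mechanical once lineage exists. A rough sketch, assuming a simple asset-to-upstream map and a per-asset health flag (all names hypothetical): walk upstream from the failing asset and return the closest unhealthy ancestor.

```python
# Sketch of upstream root-cause tracing over a lineage graph. The
# UPSTREAM map and HEALTH flags are illustrative, not any tool's API.

from collections import deque

UPSTREAM = {  # asset -> assets it reads from
    "llm_answers": ["revenue_mart"],
    "revenue_mart": ["orders_clean"],
    "orders_clean": ["orders_raw"],
    "orders_raw": ["vendor_feed"],   # third-party source
}

HEALTH = {"vendor_feed": False}  # e.g. schema drift detected at the source

def first_bad_upstream(asset: str) -> str | None:
    """Breadth-first walk from a failing asset to its nearest unhealthy ancestor."""
    queue, seen = deque(UPSTREAM.get(asset, [])), set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if not HEALTH.get(node, True):
            return node          # closest unhealthy ancestor wins
        queue.extend(UPSTREAM.get(node, []))
    return None

print(first_bad_upstream("llm_answers"))  # -> vendor_feed
```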
Running AI on the data layer is not free. Ninety-five percent of respondents have at least some concern about its cost and performance impact, and 58% describe themselves as "very" or "extremely" concerned. The specifics are revealing: storage growth, budget unpredictability, and resource contention with non-AI workloads top the list. These are the symptoms of putting a new, heavy workload on shared infrastructure with limited visibility into what is actually consuming compute. The pressure is not felt equally everywhere. Forty-eight percent of North American respondents say they are very or extremely concerned, compared to 63% in Europe and 79% in Asia-Pacific.
The traditional data observability question was "is the pipeline broken?" The new question is "is this query worth what it just cost?" As AI moves into the warehouse, FinOps and observability are converging. Most teams do not yet have a single view that connects performance, spend, and the workload responsible for both, which is why 45% say they cannot reliably predict or budget AI infrastructure costs. The survey measures that gap, not its cause.
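A single view starts with attribution: tagging each query with the workload it serves and rolling spend up by tag. A minimal sketch, assuming query logs already carry a workload tag and a per-query cost estimate (both field names are made up for illustration):

```python
# Sketch of spend attribution by workload tag. Field names and the
# shape of the query log are assumptions for the example.

from collections import defaultdict

query_log = [
    {"workload": "bi_dashboards", "cost_usd": 0.14, "runtime_s": 3.1},
    {"workload": "llm_retrieval", "cost_usd": 2.90, "runtime_s": 41.0},
    {"workload": "llm_retrieval", "cost_usd": 3.40, "runtime_s": 55.2},
]

totals = defaultdict(lambda: {"cost_usd": 0.0, "runtime_s": 0.0, "queries": 0})
for q in query_log:
    t = totals[q["workload"]]
    t["cost_usd"] += q["cost_usd"]
    t["runtime_s"] += q["runtime_s"]
    t["queries"] += 1

# Most expensive workloads first: the start of an answer to
# "is this query worth what it just cost?"
for workload, t in sorted(totals.items(), key=lambda kv: -kv[1]["cost_usd"]):
    print(f'{workload}: ${t["cost_usd"]:.2f} across {t["queries"]} queries')
```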
The good news in this data: AI has not displaced data engineers. It has elevated them. Eighty-seven percent feel more valuable than a year ago, and every respondent reports new responsibilities, especially around data quality for AI training and AI output monitoring. The harder news: only 30% feel "very well-equipped" to handle the demands AI is placing on their work, and the skills they most want to develop (MLOps, real-time streaming, observability, and prompt engineering) sit at the intersection of AI and the data infrastructure that has to support it.
The role is expanding faster than the toolset. Engineers are being asked to govern the data feeding AI models, monitor model outputs, and contain AI infrastructure costs, all while keeping the regular pipelines running. Seventy percent say they have skill or tooling gaps. The skills they want to develop point at the same answer the rest of this report does: observability for the new workloads they are responsible for. The priorities split by region in a revealing way. Data observability is the #1 skill priority in North America, named by 52% of respondents, while only 30% of European respondents and 21% of Asia-Pacific respondents put it on their list. European teams instead lead on AI-native skills (51% want MLOps; 51% want prompt engineering and LLM integration). The survey shows the priority split, not the reason behind it.
We asked one final, open-ended question: if you could wave a magic wand and fix one thing about how your company manages data reliability and quality today, what would it be? The 109 free-text answers describe the same reliability and visibility gaps. Catch issues at the source. See them end-to-end. Consolidate the monitoring tools. Standardize the definitions. Automate the root cause. Give every dataset an owner. Stop being reactive. Different vocabulary, one underlying ask.
Across those answers, respondents kept describing the same thing in different words: detect issues before users do, trace them back to the source, stop spending nights firefighting, see everything in one place. Four of the top themes point at the same gap: proactive monitoring, end-to-end lineage, source-level validation, and unified tooling.
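Source-level validation is the most concrete of the four. A small sketch of what respondents are asking for, with hypothetical thresholds and field names: check a feed's freshness and null rate before anything downstream, dashboard or model, reads it.

```python
# Sketch of a source-level validation gate. Thresholds, the "amount"
# field, and the feed shape are all illustrative assumptions.

from datetime import datetime, timedelta, timezone

def validate_feed(rows, loaded_at,
                  max_age=timedelta(hours=6),
                  max_null_rate=0.02):
    """Return a list of problems; an empty list means the feed is safe to publish."""
    problems = []
    # Freshness: loaded_at is assumed to be timezone-aware UTC.
    if datetime.now(timezone.utc) - loaded_at > max_age:
        problems.append("stale: feed is older than its freshness SLO")
    # Completeness: null rate on a critical column.
    if rows:
        null_rate = sum(r.get("amount") is None for r in rows) / len(rows)
        if null_rate > max_null_rate:
            problems.append(f"null rate {null_rate:.1%} exceeds {max_null_rate:.0%}")
    return problems

rows = [{"amount": 12.5}, {"amount": None}, {"amount": 7.0}]
print(validate_feed(rows, loaded_at=datetime.now(timezone.utc)))
# -> ['null rate 33.3% exceeds 2%']
```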
The same teams running AI on their warehouses are the ones who keep asking for full-lifecycle visibility, lineage that traces issues to their upstream source, and a single view that ties cost, performance, and reliability together.
Datadog Data Observability gives data teams what this research describes by name: end-to-end lineage from source systems through pipelines, warehouses, and the BI and AI tools that consume them. Anomaly detection catches problems before stakeholders do. And cost, performance, and reliability metrics live in the same place, so the same view answers "is this broken?" and "is this worth it?"
Explore Datadog Data Observability

This research was conducted in April and May of 2026 with 109 data engineering practitioners across North America, Europe, and Asia-Pacific. Respondents were screened to ensure they currently work as Data Engineers, Senior or Lead Data Engineers, Analytics Engineers, Data Platform Engineers, or Data Architects, and that their day-to-day responsibilities include data ingestion, pipelines and orchestration, data lakes, data warehouses, modeling, or architecture.
The survey covered AI's impact on the data engineering role, data quality and reliability practices, infrastructure deployment patterns, cost and performance pressures, and the skills practitioners are prioritizing for 2026.
All percentages are calculated from the 109 unique respondents who completed the survey, except where noted as multi-select. The role-distribution chart shows the 107 respondents who specified a role. Single-select question percentages are rounded using the largest-remainder method so each chart sums to exactly 100%. Open-text responses for the three free-text questions were coded into recurring themes; each respondent could be assigned to multiple themes. Regional cross-tabulations use sample sizes of North America (n = 52), Europe (n = 43), and Asia-Pacific (n = 14); readers should treat the Asia-Pacific percentages as directional given the smaller sub-sample.
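For readers who want to reproduce the rounding, the largest-remainder method works like this: floor every exact percentage, then hand the leftover points to the categories with the largest fractional remainders so the chart sums to exactly 100.

```python
# Largest-remainder rounding, as described in the methodology above.

import math

def largest_remainder(counts, total=100):
    exact = [c * total / sum(counts) for c in counts]
    floored = [math.floor(x) for x in exact]
    leftover = total - sum(floored)
    # Hand the remaining points to the largest fractional remainders.
    order = sorted(range(len(exact)),
                   key=lambda i: exact[i] - floored[i], reverse=True)
    for i in order[:leftover]:
        floored[i] += 1
    return floored

# The regional sub-samples from this survey (52 + 43 + 14 = 109):
print(largest_remainder([52, 43, 14]))  # -> [48, 39, 13], sums to 100
```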