How data engineering teams are navigating upstream failures, AI-driven transformation, and the collapse of traditional BI — and what it means for observability.
Despite massive investment in AI capabilities, data engineering teams say the fundamentals are still broken. Forty-three percent of respondents named data quality as the single biggest barrier preventing their organizations from fully leveraging AI — outpacing budget constraints, talent gaps, and tooling limitations by wide margins.
Data quality isn't a new challenge — but AI has raised the stakes dramatically. Models amplify the impact of bad data, turning small inconsistencies into compounding errors at scale. Teams that previously tolerated "good enough" data are now discovering that AI demands a far higher quality bar than traditional analytics ever did.
"We spent three months building an ML pipeline, and it all fell apart because our source data had silent schema drift nobody caught. The model was technically working, just on garbage inputs."
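The silent schema drift this respondent describes is the kind of failure a lightweight check at ingestion can surface early. Below is a minimal sketch, not a description of any particular product: it compares an expected column-to-type mapping against what a sample row actually contains. The table columns, the `EXPECTED_SCHEMA` mapping, and both helper functions are hypothetical names chosen for illustration.

```python
from typing import Dict, List

# Hypothetical expected schema for an illustrative source table (an assumption,
# not taken from the survey).
EXPECTED_SCHEMA: Dict[str, str] = {
    "order_id": "int",
    "amount": "float",
    "created_at": "str",
}


def infer_schema(row: dict) -> Dict[str, str]:
    """Infer a column -> Python type name mapping from a single sample row."""
    return {column: type(value).__name__ for column, value in row.items()}


def detect_schema_drift(expected: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Return human-readable differences between two column -> type mappings."""
    problems: List[str] = []
    for column, expected_type in expected.items():
        if column not in observed:
            problems.append(f"missing column: {column}")
        elif observed[column] != expected_type:
            problems.append(
                f"type changed: {column} {expected_type} -> {observed[column]}"
            )
    for column in observed:
        if column not in expected:
            problems.append(f"unexpected column: {column}")
    return problems


# A drifted sample row: amount arrives as a string, created_at is gone,
# and a new customer column has appeared.
sample_row = {"order_id": 1001, "amount": "19.99", "customer": "acme"}
drift = detect_schema_drift(EXPECTED_SCHEMA, infer_schema(sample_row))
for problem in drift:
    print(problem)
```

A check like this only catches structural drift (missing, added, or retyped columns). Changes in meaning, such as a unit switching from dollars to cents, would need value-level checks on top.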
When data breaks, data teams are the ones who hear about it. But 64% of respondents say the root cause of most issues lies upstream — in source systems, application teams, and third-party data feeds that data engineers have limited ability to control or monitor.
This ownership-vs-control gap is the defining frustration of modern data engineering. Teams are accountable for data reliability but lack visibility into the systems where problems start. Without end-to-end observability that extends upstream into source applications and ingestion layers, data engineers are stuck playing detective after the damage is done.
"We're basically the canary in the coal mine. By the time we know something broke upstream, three dashboards are already wrong and the CFO is asking questions."
"Honestly, our most reliable monitoring system is a Slack message from an analyst saying 'the numbers look weird today.' That's not a process; that's a prayer."
Traditional BI is losing its monopoly on business intelligence. Nearly three-quarters of respondents say their organizations now use AI to answer questions that would have gone through dashboards and reports just a year ago — and 58% are seeing measurable declines in traditional BI tool usage.
The shift from BI to AI-powered analytics isn't just about new tools — it fundamentally changes what data teams need to deliver. When AI is querying your warehouse directly, the tolerance for stale, inconsistent, or undocumented data drops to zero. Observability becomes the prerequisite, not the afterthought.
"Our product managers used to open Looker five times a day. Now they ask the AI assistant and it hits the warehouse directly. It's faster, but it means our data has to be perfect. There's no dashboard logic to mask the mess anymore."
The enthusiasm for AI is real — but so is the concern about what it costs to run. A striking 92% of data engineers expressed some level of concern about the cost and performance impact of integrating AI into their data infrastructure, making it the most universally shared anxiety in the survey.
Cost concerns don't mean teams want to slow down — they want to move fast with visibility. The gap isn't ambition; it's observability. Without clear insight into query costs, compute utilization, and performance bottlenecks across their AI workloads, teams are flying blind through the most expensive infrastructure decisions they've ever made.
"We turned on an AI feature that hit our Snowflake bill like a freight train. Nobody had visibility into the query patterns until the invoice came. We need observability for cost the same way we have it for uptime."
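One simple form of the cost visibility this respondent asks for is attributing spend to workloads before the invoice arrives. The sketch below assumes query-log records have already been exported as `(workload_tag, credits_used)` pairs. The record shape, the tag names, the credit figures, and the budgets are all hypothetical, and no warehouse API is called.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical query-log records: (workload_tag, credits_used).
# Illustrative values only, not survey or billing data.
QUERY_LOG: List[Tuple[str, float]] = [
    ("dashboard", 1.5),
    ("ai_assistant", 4.0),
    ("etl", 2.0),
    ("ai_assistant", 6.5),
    ("dashboard", 0.5),
]


def credits_by_workload(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Sum credits used per workload tag."""
    totals: Dict[str, float] = defaultdict(float)
    for tag, credits in records:
        totals[tag] += credits
    return dict(totals)


def over_budget(totals: Dict[str, float], budgets: Dict[str, float]) -> List[str]:
    """Return, in sorted order, the workload tags whose spend exceeds their budget.

    Workloads without a configured budget are never flagged.
    """
    return sorted(
        tag for tag, used in totals.items() if used > budgets.get(tag, float("inf"))
    )


totals = credits_by_workload(QUERY_LOG)
alerts = over_budget(totals, {"dashboard": 5.0, "ai_assistant": 10.0, "etl": 5.0})
print(totals)
print(alerts)
```

Grouping by a workload tag rather than by user or by individual query is a design choice for this sketch: it makes a new AI feature show up as its own line item next to dashboards and ETL.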
Data engineering in 2026 looks fundamentally different than it did two years ago. Eighty-one percent of respondents say their responsibilities have expanded to include AI-related work, from managing ML feature stores to optimizing AI query pipelines. The role isn't disappearing — it's becoming more strategic, more complex, and more critical.
The expanding scope of data engineering is both an opportunity and a risk. As responsibilities grow, so does the blast radius of failures. Teams need observability that scales with their expanding mandate — covering not just pipelines and warehouses, but AI workloads, cost optimization, upstream dependencies, and cross-functional SLAs.
"My title hasn't changed, but my job has. Two years ago I was building ETL pipelines. Now I'm managing feature stores, optimizing AI queries, and explaining cost trade-offs to the CFO. I've never felt more important, or more overwhelmed."
Datadog Data Observability gives you end-to-end visibility — from upstream sources to AI workloads — so your team can detect issues before they cascade, control costs before they spiral, and deliver reliable data at the speed AI demands.
This research was conducted via conversational AI-moderated surveys with qualified data engineering professionals. Respondents were screened for active involvement in data pipelines, modeling, architecture, or warehouse management. Data Analysts and Data Scientists were excluded to maintain practitioner focus. All percentages are calculated from unique respondents (sessions), not total mentions. Multi-select questions may total more than 100%.
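To make the counting rule concrete, here is a small worked sketch of percentages computed per unique respondent rather than per mention. The sessions and answers below are invented toy data, not survey results, and `percent_by_respondent` is a hypothetical helper name.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy data, NOT survey results: (session_id, option mentioned).
MENTIONS: List[Tuple[str, str]] = [
    ("s1", "data quality"),
    ("s1", "data quality"),  # repeated mention within the same session
    ("s1", "budget"),
    ("s2", "data quality"),
    ("s3", "budget"),
    ("s3", "talent"),
    ("s4", "talent"),
    ("s4", "data quality"),
]


def percent_by_respondent(mentions: List[Tuple[str, str]]) -> Dict[str, float]:
    """Percentage of unique sessions that mentioned each option at least once."""
    sessions = {session for session, _ in mentions}
    unique_pairs = set(mentions)  # collapses repeat mentions within a session
    counts = Counter(option for _, option in unique_pairs)
    return {option: 100 * n / len(sessions) for option, n in counts.items()}


shares = percent_by_respondent(MENTIONS)
print(shares)
print(sum(shares.values()))
```

With four sessions, "data quality" is counted three times rather than four because the repeat mention in `s1` collapses, giving 75%. Because each session can select several options, the shares sum to 175%, which is why multi-select totals can exceed 100%.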