Research Report

The Data Quality Crisis in the Age of AI

How data engineering teams are navigating upstream failures, AI-driven transformation, and the collapse of traditional BI — and what it means for observability.

150 Data Engineers Surveyed · Q1 2026 · Datadog Research
AI is reshaping data engineering faster than teams can adapt. Our survey of 150 data engineers across industries reveals a discipline at an inflection point: data quality remains the single largest barrier to AI adoption, upstream failures are the root cause most teams can't control, and traditional BI is losing ground to AI-powered analytics — yet observability tooling hasn't kept pace with the complexity these shifts demand.
43% say data quality is the #1 barrier to AI adoption
64% report data issues originate upstream of the data team
73% now use AI to answer questions that once went through BI

43% Say Data Quality Is the #1 Barrier to AI Adoption

Despite massive investment in AI capabilities, data engineering teams say the fundamentals are still broken. Forty-three percent of respondents named data quality as the single biggest barrier preventing their organizations from fully leveraging AI — outpacing budget constraints, talent gaps, and tooling limitations by wide margins.

[Charts: Top Barriers to AI Adoption; Has Poor Data Quality Blocked an AI Initiative?]

Key Insight

Data quality isn't a new challenge — but AI has raised the stakes dramatically. Models amplify the impact of bad data, turning small inconsistencies into compounding errors at scale. Teams that previously tolerated "good enough" data are now discovering that AI demands a far higher quality bar than traditional analytics ever did.

We spent three months building an ML pipeline, and it all fell apart because our source data had silent schema drift nobody caught. The model was technically working — just on garbage inputs.

— Senior Data Engineer, Financial Services (5,000+ employees)

[Chart: Most Common Data Quality Challenges (Multi-Select)]


64% of Data Issues Start Upstream — Outside the Data Team's Control

When data breaks, data teams are the ones who hear about it. But 64% of respondents say the root cause of most issues lies upstream — in source systems, application teams, and third-party data feeds that data engineers have limited ability to control or monitor.

[Charts: Where Do Data Issues Originate?; Most Common Upstream Culprits (Multi-Select)]

Key Insight

This ownership-vs-control gap is the defining frustration of modern data engineering. Teams are accountable for data reliability but lack visibility into the systems where problems start. Without end-to-end observability that extends upstream into source applications and ingestion layers, data engineers are stuck playing detective after the damage is done.

We're basically the canary in the coal mine. By the time we know something broke upstream, three dashboards are already wrong and the CFO is asking questions.

— Data Platform Engineer, Retail / E-Commerce (1,000–5,000 employees)

[Chart: How Teams Currently Detect Pipeline Issues (Multi-Select)]

Honestly, our most reliable monitoring system is a Slack message from an analyst saying 'the numbers look weird today.' That's not a process — that's a prayer.

— Analytics Engineer, Technology / Software (Under 1,000 employees)

AI Is Eating BI — And Reshaping What Data Teams Deliver

Traditional BI tools are losing their monopoly on answering business questions. Nearly three-quarters of respondents say their organizations now use AI to answer questions that would have gone through dashboards and reports just a year ago, and 58% are seeing measurable declines in traditional BI tool usage.

[Charts: Is AI Replacing Traditional BI Queries?; Change in BI Tool Usage Over Past 12 Months]

Key Insight

The shift from BI to AI-powered analytics isn't just about new tools — it fundamentally changes what data teams need to deliver. When AI is querying your warehouse directly, the tolerance for stale, inconsistent, or undocumented data drops to zero. Observability becomes the prerequisite, not the afterthought.

[Charts: Questions Now Routed to AI Instead of BI (Multi-Select); Teams Deploying AI on Their Data Warehouse / Lake]

Our product managers used to open Looker five times a day. Now they ask the AI assistant and it hits the warehouse directly. It's faster, but it means our data has to be perfect — there's no dashboard logic to mask the mess anymore.

— Data Architect, Technology / Software (5,000+ employees)

92% Are Concerned About the Cost & Performance Impact of AI

The enthusiasm for AI is real, but so is the concern about what it costs to run. A striking 92% of data engineers expressed some level of concern about the cost and performance impact of integrating AI into their data infrastructure, making it the most widely shared concern in the survey.

[Charts: Level of Concern About AI's Cost & Performance Impact; Does Your Team Feel Equipped for AI Demands?]

Key Insight

Cost concerns don't mean teams want to slow down — they want to move fast with visibility. The gap isn't ambition; it's observability. Without clear insight into query costs, compute utilization, and performance bottlenecks across their AI workloads, teams are flying blind through the most expensive infrastructure decisions they've ever made.

We turned on an AI feature that hit our Snowflake bill like a freight train. Nobody had visibility into the query patterns until the invoice came. We need observability for cost the same way we have it for uptime.

— Lead Data Engineer, Healthcare / Life Sciences (5,000+ employees)

[Chart: Do Data Engineers Feel More or Less Valuable Than a Year Ago?]


The Role Is Evolving — And Data Engineers Know It

Data engineering in 2026 looks fundamentally different than it did two years ago. Eighty-one percent of respondents say their responsibilities have expanded to include AI-related work, from managing ML feature stores to optimizing AI query pipelines. The role isn't disappearing — it's becoming more strategic, more complex, and more critical.

[Charts: New Responsibilities Added in Past 12 Months (Multi-Select); Skills Data Engineers Say They Need to Develop (Multi-Select)]

Key Insight

The expanding scope of data engineering is both an opportunity and a risk. As responsibilities grow, so does the blast radius of failures. Teams need observability that scales with their expanding mandate — covering not just pipelines and warehouses, but AI workloads, cost optimization, upstream dependencies, and cross-functional SLAs.

My title hasn't changed, but my job has. Two years ago I was building ETL pipelines. Now I'm managing feature stores, optimizing AI queries, and explaining cost trade-offs to the CFO. I've never felt more important — or more overwhelmed.

— Staff Data Engineer, Financial Services (1,000–5,000 employees)

See Every Signal Across Your Entire Data Stack

Datadog Data Observability gives you end-to-end visibility — from upstream sources to AI workloads — so your team can detect issues before they cascade, control costs before they spiral, and deliver reliable data at the speed AI demands.


Methodology

Qualified respondents: 150
Survey period: Q1 2026
Open-ended questions: 12
Screener questions: 6

This research was conducted via conversational AI-moderated surveys with qualified data engineering professionals. Respondents were screened for active involvement in data pipelines, modeling, architecture, or warehouse management. Data Analysts and Data Scientists were excluded to maintain practitioner focus. All percentages are calculated from unique respondents (sessions), not total mentions. Multi-select questions may total more than 100%.
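To make the counting rule concrete, here is a minimal sketch of how a multi-select share could be computed from unique respondents rather than total mentions. It assumes each session is stored as a list of the options that respondent mentioned, possibly with repeats; the function name `option_percentages` and the toy data are hypothetical and are not drawn from the survey.

```python
from collections import Counter


def option_percentages(responses: dict[str, list[str]]) -> dict[str, float]:
    """Share of unique respondents (sessions) who mentioned each option.

    Hypothetical helper: keys are session IDs, values are the options that
    session mentioned, which may include repeats.
    """
    counts: Counter[str] = Counter()
    for options in responses.values():
        # Count each option at most once per respondent, so repeat
        # mentions within one session do not inflate the total.
        counts.update(set(options))
    base = len(responses)  # denominator: respondents, not mentions
    return {option: round(100 * n / base, 1) for option, n in counts.items()}


# Hypothetical toy data (not survey results): three respondents.
responses = {
    "s1": ["schema drift", "late data", "schema drift"],
    "s2": ["schema drift"],
    "s3": ["late data", "duplicates"],
}
pct = option_percentages(responses)
print(pct["schema drift"])  # 66.7: two of three sessions, repeat ignored
# Shares sum to more than 100 because respondents may pick several options.
print(sum(pct.values()) > 100)  # True
```

Deduplicating within each session before counting is what separates "share of respondents" from "share of mentions"; counting raw mentions would credit respondent s1 twice for the same option.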

[Charts: Respondent Roles; Industries Represented; Company Size; Region]

Note: Because survey questions were open-ended and conversational, not all respondents addressed every topic, so per-question sample sizes vary. Percentages for behavioral and attitudinal questions are based on the number of respondents who discussed that specific topic, with the base noted where applicable. All data in this report is synthetic and for demonstration purposes only.