The hidden reason your AI assistant feels so sluggish
This post appeared first on The New Stack.
AI workloads are exposing a mismatch in how most teams have built their data platforms.
You see it whether you are building agentic apps, shipping conversational analytics, or using AI to speed up incident response. Suddenly, the database has to handle far more concurrent queries, return answers in well under a second, and retain much more granular data for much longer. Systems built for batch reporting and periodic dashboards start to look out of step with the job at hand.
This is already happening across three areas that are converging faster than most teams expected: application development, business analytics, and observability.
Agents don’t query like humans
The move from human-driven to agent-driven analytics may be the biggest shift in database workload patterns in the last decade.
When a person asks a question in natural language, the model behind the scenes usually does not issue one tidy SQL query. It can trigger dozens in rapid succession while it explores the schema, tests different paths, and reasons through multiple possibilities in parallel. One prompt turns into a burst of concurrent queries. Analyst workloads start to resemble customer-facing production traffic: high concurrency, low latency, interactive response times.
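The burst pattern above can be sketched in a few lines. This is a toy simulation, not a real agent: the query list and the 50 ms per-query latency are assumptions standing in for an exploratory fan-out against a fast analytical engine, but it shows why concurrency, not single-query speed, is the constraint.

```python
import asyncio
import time

# Hypothetical stand-in for a database call: each "query" takes ~50 ms,
# as a short scan on a fast analytical engine might.
async def run_query(sql: str) -> str:
    await asyncio.sleep(0.05)
    return f"result of {sql!r}"

# One natural-language prompt fans out into many exploratory queries:
# schema probes, candidate aggregations, sanity checks.
EXPLORATION = [f"SELECT ... /* candidate {i} */" for i in range(24)]

async def answer_prompt() -> float:
    start = time.perf_counter()
    # The agent issues the whole burst concurrently, not one at a time.
    await asyncio.gather(*(run_query(q) for q in EXPLORATION))
    return time.perf_counter() - start

elapsed = asyncio.run(answer_prompt())
# With true concurrency the burst costs roughly one query's latency,
# not 24x; serialize the same burst and the assistant feels sluggish.
print(f"24 queries answered in {elapsed:.2f}s")
```

The same arithmetic applies on the database side: an engine that handles one heavyweight query at a time turns this burst back into a serial wait.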
That breaks some of the assumptions on which traditional cloud data warehouses were built. Those systems are generally optimized for throughput on relatively infrequent, heavyweight queries, not thousands of short, concurrent ones. Put AI analyst workloads on top of that architecture, and you usually end up in one of two bad places: latency that makes the assistant feel sluggish, or costs that rise faster than the value you are getting.
This is why real-time analytical databases built for interactive workloads are starting to look less like a nice-to-have and more like the natural fit. The rise of MCP servers that expose databases directly to agents, analytics bots in Slack, and open-source agentic architectures gives a pretty clear picture of what production agentic analytics looks like in practice: natural language in, SQL out, answers back in seconds, with the database quietly handling the concurrency.
Postgres + OLAP is becoming the default
One of the clearest signals in the market is the growing consensus around a simple architecture: Postgres for transactions, paired with a columnar OLAP engine for analytics. GitLab described this pattern back in 2022, and it has increasingly become the default open-source stack for scaling agentic AI applications.
Postgres handles row-oriented transactional workloads. ClickHouse, or another columnar engine, handles the analytical side: fast ingestion, sub-second queries across very large datasets, and the concurrency that AI-powered features demand.
AI makes this architecture feel less optional and more urgent. Features like AI-generated insights, natural-language product interfaces, and autonomous analysis all depend on a much tighter loop between transactional writes and analytical reads. The closer the integration between those two layers, the faster teams can ship useful products rather than fight their plumbing.
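The shape of that loop can be sketched in miniature. This toy uses `sqlite3` as a stand-in for both engines; in the real pattern the row writes below would land in Postgres and the aggregate query would run on a columnar engine such as ClickHouse. Table and column names are illustrative.

```python
import sqlite3

# Stand-in for both stores; in production: Postgres for writes,
# a columnar OLAP engine for the analytical read.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (user_id INTEGER, action TEXT, amount REAL)")

# Transactional side: individual row writes as the app serves requests.
rows = [(1, "purchase", 30.0), (2, "purchase", 12.5), (1, "refund", -30.0)]
db.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)
db.commit()

# Analytical side: an AI feature asks an aggregate question moments
# after the writes land -- the "tight loop" the pattern exists to serve.
(total,) = db.execute(
    "SELECT SUM(amount) FROM events WHERE action = 'purchase'"
).fetchone()
print(total)  # 42.5
```

The hard part in production is not the queries but the replication between the two stores; the shorter that lag, the fresher the answers an AI feature can give.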
Observability runs into the same problem
Observability is running into the same architectural problem.
The classic three-pillar model, with metrics, logs, and traces stored separately, was shaped by an era when storage was expensive and query patterns were more predictable. AI-driven SRE workflows do not fit that model very well.
They need granular, high-cardinality data with long retention so an agent can triage incidents, correlate signals, and work backward to a root cause. Sampled logs and aggressively rolled-up metrics are a poor substrate for that kind of reasoning. If an AI agent is trying to connect an error spike to a deployment event from three days earlier, the real constraint is often not the model, but the missing data.
This is the shift Charity Majors has described as Observability 2.0: wide, structured events in a columnar engine, with metrics and traces derived at query time rather than precomputed in advance. A growing number of modern observability companies have moved in this direction. Traditional vendors are stuck with an uncomfortable tradeoff: their per-GB pricing pushes customers to ingest less data, which is the opposite of what AI-heavy workflows need.
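The "derive at query time" idea is simple to illustrate. Below is a minimal sketch, assuming a handful of hypothetical wide events (one structured row per request; field names are invented for the example): instead of pre-aggregating an error-rate metric at ingest, the metric is computed on demand, grouped by a high-cardinality dimension.

```python
from collections import defaultdict

# Hypothetical wide, structured events -- the Observability 2.0
# substrate. One row per request, many fields, no pre-aggregation.
events = [
    {"service": "api", "status": 500, "duration_ms": 120, "deploy": "v41"},
    {"service": "api", "status": 200, "duration_ms": 35,  "deploy": "v41"},
    {"service": "api", "status": 500, "duration_ms": 140, "deploy": "v42"},
    {"service": "api", "status": 500, "duration_ms": 150, "deploy": "v42"},
]

# Derive an error-rate "metric" at query time, grouped by a
# high-cardinality dimension (deploy version) that a pre-rolled-up
# metric would likely have dropped.
errors, totals = defaultdict(int), defaultdict(int)
for e in events:
    totals[e["deploy"]] += 1
    errors[e["deploy"]] += e["status"] >= 500

error_rate = {d: errors[d] / totals[d] for d in totals}
print(error_rate)  # {'v41': 0.5, 'v42': 1.0}
```

Because the raw events are retained, an agent can regroup the same data by any field after the fact, which is exactly what the sampled-and-rolled-up model forecloses.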
Two categories and one set of requirements
For years, observability and data warehousing were treated as separate categories, with different buyers, budgets, and tooling. Technically, though, they are starting to look a lot alike.
Both write into object storage. Both need low-latency, high-concurrency queries. Both are layering on AI-driven analysis. And the underlying data overlaps more than most teams assume. API calls can also be purchases. Errors can also be failed transactions. Open table formats like Iceberg are making this convergence much more practical, with columnar databases serving as the fast query layer on top.
The cost of waiting is going up
The database market is being redrawn around the requirements AI workloads impose: high concurrency, real-time performance, full-fidelity retention, and direct accessibility for agents.
Columnar analytical databases built for interactive workloads are in a strong position because those requirements line up with what they were designed to do. But the bigger point is architectural, not just vendor-specific.
Teams will need tight integration between transactional and analytical systems, as in the Postgres + OLAP pattern. They will need native agent interfaces, such as MCP, so AI systems can access the data without layers of bespoke glue code. And they will need LLM observability tooling to trace, evaluate, and govern agent behavior in production.
The cost of migrating off legacy platforms is real, but finite. The cost of spending the next five years on a platform that cannot handle agentic query volumes is not.