
Towards Robustness: A Critique of Current Vector Database Assessments

arXiv cs.DB | Submitted on 1 Jul 2025 (v1), last revised 2 Apr 2026 (this version, v2) | April 3, 2026



Abstract: Vector databases are critical infrastructure in AI systems, and average recall is the dominant metric for their evaluation. Both users and researchers rely on it to choose and optimize their systems. We show that relying on average recall is problematic. It hides variability across queries, allowing systems with strong mean performance to underperform significantly on hard queries. These tail cases confuse users and can lead to failure in downstream applications such as RAG. We argue that robustness (consistently achieving acceptable recall across queries) is crucial to vector database evaluation. We propose Robustness-$\delta$@K, a new metric that captures the fraction of queries with recall above a threshold $\delta$. This metric offers a deeper view of the recall distribution, helps select a vector index to match application needs, and guides the optimization of tail performance. We integrate Robustness-$\delta$@K into existing benchmarks and evaluate mainstream vector indexes, revealing significant robustness differences. More robust vector indexes yield better application performance, even with the same average recall. We also identify design factors that influence robustness, providing guidance for improving real-world performance.
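To make the contrast between average recall and Robustness-$\delta$@K concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names, the choice of `>=` for "above a threshold", and the per-query recall values for the two indexes are all assumptions made for illustration.

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Recall@K for one query: share of the true top-K neighbors that were returned."""
    truth = set(ground_truth[:k])
    if not truth:
        raise ValueError("ground_truth must contain at least one neighbor id")
    return len(truth & set(retrieved[:k])) / len(truth)


def average_recall(recalls: Sequence[float]) -> float:
    """Mean of the per-query recall values (the conventional headline metric)."""
    return sum(recalls) / len(recalls)


def robustness_delta_at_k(recalls: Sequence[float], delta: float) -> float:
    """Fraction of queries whose Recall@K reaches the threshold delta.

    Assumption: "above a threshold" is read as recall >= delta.
    """
    return sum(1 for r in recalls if r >= delta) / len(recalls)


# Hypothetical per-query Recall@K values for two indexes (illustrative, not measured).
index_a = [0.75, 0.75, 0.75, 0.75]  # uniformly decent
index_b = [1.0, 1.0, 1.0, 0.0]      # perfect on most queries, fails on one

for name, recalls in (("A", index_a), ("B", index_b)):
    print(
        f"index {name}: "
        f"avg={average_recall(recalls):.2f}, "
        f"robustness-0.5={robustness_delta_at_k(recalls, 0.5):.2f}, "
        f"robustness-0.9={robustness_delta_at_k(recalls, 0.9):.2f}"
    )
```

Both hypothetical indexes have the same average recall, yet they differ at every threshold: at $\delta = 0.5$ index A satisfies all queries while index B misses one entirely, and at $\delta = 0.9$ the ranking flips. Which $\delta$ matters depends on how much recall the downstream application needs per query.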

Subjects:

Databases (cs.DB)

Cite as: arXiv:2507.00379 [cs.DB]

(or arXiv:2507.00379v2 [cs.DB] for this version)

https://doi.org/10.48550/arXiv.2507.00379

arXiv-issued DOI via DataCite

Submission history

From: Zikai Wang
[v1] Tue, 1 Jul 2025 02:27:57 UTC (255 KB)
[v2] Thu, 2 Apr 2026 00:55:48 UTC (203 KB)
