
From BM25 to Corrective RAG: Benchmarking Retrieval Strategies for Text-and-Table Documents

arXiv, April 2, 2026

Authors: Meftun Akarsu, Recep Kaan Karaman, Christopher Mierbach


Abstract: Retrieval-Augmented Generation (RAG) systems critically depend on retrieval quality, yet no systematic comparison of modern retrieval methods exists for heterogeneous documents containing both text and tabular data. We benchmark ten retrieval strategies spanning sparse, dense, hybrid fusion, cross-encoder reranking, query expansion, index augmentation, and adaptive retrieval on a challenging financial QA benchmark of 23,088 queries over 7,318 documents with mixed text-and-table content. We evaluate retrieval quality via Recall@k, MRR, and nDCG, and end-to-end generation quality via Number Match, with paired bootstrap significance testing. Our results show that (1) a two-stage pipeline combining hybrid retrieval with neural reranking achieves Recall@5 of 0.816 and MRR@3 of 0.605, outperforming all single-stage methods by a large margin; (2) BM25 outperforms state-of-the-art dense retrieval on financial documents, challenging the common assumption that semantic search universally dominates; and (3) query expansion methods (HyDE, multi-query) and adaptive retrieval provide limited benefit for precise numerical queries, while contextual retrieval yields consistent gains. We provide ablation studies on fusion methods and reranker depth, actionable cost-accuracy recommendations, and release our full benchmark code.
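To make the evaluation terms in the abstract concrete, here is a minimal Python sketch of Recall@k, MRR@k, and binary-relevance nDCG@k, applied to a fused ranking. This is an illustration under stated assumptions, not the authors' benchmark code: the abstract does not say which fusion method the hybrid pipeline uses, so reciprocal rank fusion (RRF) with the conventional constant 60 is assumed here, and the function names and document ids (`d1`, `d3`, ...) are hypothetical. The paper's exact metric definitions may also differ.

```python
import math
from collections import defaultdict


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant document ids that appear in the top k."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant id in the top k, else 0."""
    relevant = set(relevant)
    for rank, doc_id in enumerate(ranked[:k], start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked, relevant, k):
    """nDCG@k with binary relevance (an assumption; graded gains are also common)."""
    relevant = set(relevant)
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def reciprocal_rank_fusion(rankings, c=60):
    """Fuse several ranked lists: each list adds 1 / (c + rank) to a document's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (c + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical top-3 lists from a sparse (e.g. BM25) and a dense retriever.
sparse_ranking = ["d3", "d1", "d7"]
dense_ranking = ["d1", "d9", "d3"]
fused = reciprocal_rank_fusion([sparse_ranking, dense_ranking])

gold = {"d3"}  # hypothetical single gold document for one query
print(fused)
print(recall_at_k(fused, gold, 5), mrr_at_k(fused, gold, 3), ndcg_at_k(fused, gold, 3))
```

In a two-stage setup like the one the abstract describes, a fused list of this kind would be the candidate set handed to a reranker; the reranking step itself is not sketched here. Per-query scores from these functions would then be averaged over the query set to give corpus-level figures.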

Comments: 11 pages, 6 figures, 6 tables

Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL)

MSC classes: H.3.3, H.3.4

Cite as: arXiv:2604.01733 [cs.IR]

(or arXiv:2604.01733v1 [cs.IR] for this version)

https://doi.org/10.48550/arXiv.2604.01733

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Recep Kaan Karaman
[v1] Thu, 2 Apr 2026 07:53:40 UTC (94 KB)
