OmniRAG-Agent: Agentic Omnimodal Reasoning for Low-Resource Long Audio-Video Question Answering

arXiv, March 31, 2026

arXiv:2602.03707v4 Announce Type: replace

Authors: Yifan Zhu, Xinyu Mu, Tao Feng, Zhonghong Ou, Yuning Gong, Haoran Luo

Abstract: Long-horizon omnimodal question answering answers questions by reasoning over text, images, audio, and video. Despite recent progress on OmniLLMs, low-resource long audio-video QA still suffers from costly dense encoding, weak fine-grained retrieval, limited proactive planning, and no clear end-to-end optimization. To address these issues, we propose OmniRAG-Agent, an agentic omnimodal QA method for budgeted long audio-video reasoning. It builds an image-audio retrieval-augmented generation module that lets an OmniLLM fetch short, relevant frames and audio snippets from external banks. Moreover, it uses an agent loop that plans, calls tools across turns, and merges retrieved evidence to answer complex queries. Furthermore, we apply group relative policy optimization to jointly improve tool use and answer quality over time. Experiments on OmniVideoBench, WorldSense, and Daily-Omni show that OmniRAG-Agent consistently outperforms prior methods under low-resource settings and achieves strong results, with ablations validating each component.
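The abstract names two mechanisms that a short example may make more concrete: fetching a few relevant items from an external bank, and scoring sampled answers relative to their group, as in group relative policy optimization. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the dictionary layout of bank items, the use of cosine similarity, and the use of the population standard deviation are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import math
from statistics import mean, pstdev


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec, bank, k):
    """Return the k bank items most similar to the query.

    `bank` is a hypothetical list of dicts with "id" and "vec" keys, standing
    in for a frame bank or audio-snippet bank. `k` plays the role of a
    retrieval budget.
    """
    ranked = sorted(bank, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return ranked[:k]


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    This is the group-relative advantage commonly associated with GRPO:
    (r_i - mean) / (std + eps). The population standard deviation is an
    assumption here; implementations differ on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical toy data: three bank items with 2-d embeddings.
bank = [
    {"id": "a", "vec": [1.0, 0.0]},
    {"id": "b", "vec": [0.0, 1.0]},
    {"id": "c", "vec": [0.9, 0.1]},
]
top = retrieve([1.0, 0.0], bank, k=2)
top_ids = [item["id"] for item in top]

# Four sampled answers to one question, rewarded 1.0 if correct, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this toy setup, answers scoring above the group mean receive positive advantages and those below receive negative ones, so the update signal depends only on how samples compare within the same group rather than on a separately learned value estimate.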

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2602.03707 [cs.CL] (or arXiv:2602.03707v4 [cs.CL] for this version)

DOI: https://doi.org/10.48550/arXiv.2602.03707 (arXiv-issued DOI via DataCite)

Submission history

From: Xinyu Mu

[v1] Tue, 3 Feb 2026 16:28:24 UTC (15,724 KB)
[v2] Wed, 4 Feb 2026 03:33:14 UTC (15,724 KB)
[v3] Sun, 22 Feb 2026 15:44:32 UTC (15,724 KB)
[v4] Mon, 30 Mar 2026 11:14:57 UTC (15,726 KB)
