OmniRAG-Agent: Agentic Omnimodal Reasoning for Low-Resource Long Audio-Video Question Answering
arXiv:2602.03707v4, announce type: replace
Authors: Yifan Zhu, Xinyu Mu, Tao Feng, Zhonghong Ou, Yuning Gong, Haoran Luo
Abstract: Long-horizon omnimodal question answering answers questions by reasoning over text, images, audio, and video. Despite recent progress on OmniLLMs, low-resource long audio-video QA still suffers from costly dense encoding, weak fine-grained retrieval, limited proactive planning, and no clear end-to-end optimization. To address these issues, we propose OmniRAG-Agent, an agentic omnimodal QA method for budgeted long audio-video reasoning. It builds an image-audio retrieval-augmented generation module that lets an OmniLLM fetch short, relevant frames and audio snippets from external banks. Moreover, it uses an agent loop that plans, calls tools across turns, and merges retrieved evidence to answer complex queries. Furthermore, we apply group relative policy optimization to jointly improve tool use and answer quality over time. Experiments on OmniVideoBench, WorldSense, and Daily-Omni show that OmniRAG-Agent consistently outperforms prior methods under low-resource settings and achieves strong results, with ablations validating each component.
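The abstract names group relative policy optimization but does not describe the reward design or training setup. As a minimal sketch of the general idea only, assuming several rollouts are sampled for the same question and each receives a scalar reward, the group-relative advantage normalizes each reward against its own group. The function name, the example rewards, and the use of the population standard deviation are illustrative assumptions, not details taken from the paper.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against the other rollouts in its group.

    `rewards` holds one scalar per rollout sampled for the same question.
    `eps` avoids division by zero when every rollout earns the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one question (1.0 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that score above the group mean get a positive advantage and those below get a negative one, so no separate value model is needed to produce a baseline.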
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2602.03707 [cs.CL]
(or arXiv:2602.03707v4 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2602.03707
arXiv-issued DOI via DataCite
Submission history
From: Xinyu Mu
[v1] Tue, 3 Feb 2026 16:28:24 UTC (15,724 KB)
[v2] Wed, 4 Feb 2026 03:33:14 UTC (15,724 KB)
[v3] Sun, 22 Feb 2026 15:44:32 UTC (15,724 KB)
[v4] Mon, 30 Mar 2026 11:14:57 UTC (15,726 KB)