SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning

arXiv, March 31, 2026

arXiv:2512.13874v2 (announce type: replace)

Authors: Jitesh Jain, Jialuo Li, Zixian Ma, Jieyu Zhang, Chris Dongjoo Kim, Sangho Lee, Rohun Tripathi, Tanmay Gupta, Christopher Clark, Humphrey Shi

Abstract: As humans, we are natural any-horizon reasoners, i.e., we can decide whether to iteratively skim long videos or watch short ones in full when necessary for a given task. With this in mind, one would expect video reasoning models to reason flexibly across different durations. However, SOTA models are still trained to predict answers in a single turn while processing a large number of frames, akin to watching an entire long video, requiring significant resources. This raises the question: Is it possible to develop performant any-horizon video reasoning systems? Inspired by human behavior, we first propose SAGE, an agent system that performs multi-turn reasoning on long videos while handling simpler problems in a single turn. Secondly, we introduce an easy synthetic data generation pipeline using Gemini-2.5-Flash to train the orchestrator, SAGE-MM, which lies at the core of SAGE. We further propose an effective RL post-training recipe essential for instilling any-horizon reasoning ability in SAGE-MM. Thirdly, we curate SAGE-Bench with an average duration of greater than 700 seconds for evaluating video reasoning ability in real-world entertainment use cases. Lastly, we empirically validate the effectiveness of our system, data, and RL recipe, observing notable improvements of up to 6.1% on open-ended video reasoning tasks, as well as an impressive 8.2% improvement on videos longer than 10 minutes.
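The abstract describes an agent that answers simpler problems in a single turn and reasons over multiple turns on long videos. The control flow might look roughly like the Python sketch below. It is an illustration only, not the authors' implementation: `run_any_horizon`, `Answer`, `ToolCall`, `toy_policy`, `toy_tool`, the 60-second threshold, the 300-second clip length, and the `max_turns` budget are all hypothetical names and values that do not come from the paper. In SAGE the decision is made by the trained orchestrator, SAGE-MM; here a hand-written rule stands in for it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, Union


@dataclass
class Answer:
    """Final answer emitted by the policy (hypothetical type)."""
    text: str


@dataclass
class ToolCall:
    """Request to inspect one time span of the video (hypothetical type)."""
    start_s: float
    end_s: float


@dataclass
class Trace:
    """Observations gathered so far and the number of turns used."""
    observations: List[str] = field(default_factory=list)
    turns: int = 0


Action = Union[Answer, ToolCall]
Policy = Callable[[str, float, List[str]], Action]
Tool = Callable[[float, float], str]


def run_any_horizon(question: str, duration_s: float, policy: Policy,
                    tool: Tool, max_turns: int = 8) -> Tuple[str, Trace]:
    """Query the policy until it answers or the turn budget runs out."""
    trace = Trace()
    for _ in range(max_turns):
        trace.turns += 1
        action = policy(question, duration_s, trace.observations)
        if isinstance(action, Answer):
            return action.text, trace
        # The policy asked to look at a clip: run the tool and keep the result.
        trace.observations.append(tool(action.start_s, action.end_s))
    # Budget exhausted without an answer (assumed fallback behaviour).
    return "unknown", trace


def toy_policy(question: str, duration_s: float, observations: List[str],
               short_s: float = 60.0, chunk_s: float = 300.0) -> Action:
    """Hand-written stand-in for a learned orchestrator (assumption)."""
    if duration_s <= short_s:
        return Answer("direct")  # short video: answer in a single turn
    start = len(observations) * chunk_s
    if start >= duration_s:
        return Answer(f"answer after {len(observations)} clips")
    return ToolCall(start, min(start + chunk_s, duration_s))


def toy_tool(start_s: float, end_s: float) -> str:
    """Placeholder for a real video tool; returns a textual clip tag."""
    return f"clip[{start_s:.0f}-{end_s:.0f}]"


if __name__ == "__main__":
    print(run_any_horizon("What happens?", 30.0, toy_policy, toy_tool))
    print(run_any_horizon("What happens?", 700.0, toy_policy, toy_tool))
```

With these toy settings, a 30-second video is answered in one turn, while a 700-second video takes three clip inspections followed by a fourth turn that produces the answer.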

Comments: Project Page: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2512.13874 [cs.CV] (or arXiv:2512.13874v2 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2512.13874 (arXiv-issued DOI via DataCite)

Submission history

From: Jitesh Jain

[v1] Mon, 15 Dec 2025 20:14:19 UTC (6,108 KB)

[v2] Sun, 29 Mar 2026 19:34:05 UTC (6,110 KB)

Original source

arXiv

https://arxiv.org/abs/2512.13874
