
VIRST: Video-Instructed Reasoning Assistant for SpatioTemporal Segmentation

arXiv · March 31, 2026 · 2 min read

arXiv:2603.27060v1 · Announce Type: new · Authors: Jihwan Hong, Jaeyoung Do


Abstract: Referring Video Object Segmentation (RVOS) aims to segment target objects in videos based on natural language descriptions. However, fixed keyframe-based approaches that couple a vision-language model with a separate propagation module often fail to capture rapidly changing spatiotemporal dynamics and to handle queries requiring multi-step reasoning, leading to sharp performance drops on motion-intensive and reasoning-oriented videos beyond static RVOS benchmarks. To address these limitations, we propose VIRST (Video-Instructed Reasoning Assistant for Spatio-Temporal Segmentation), an end-to-end framework that unifies global video reasoning and pixel-level mask prediction within a single model. VIRST bridges semantic and segmentation representations through Spatio-Temporal Fusion (STF), which fuses segmentation-aware video features into the vision-language backbone, and employs the Temporal Dynamic Anchor Updater to maintain temporally adjacent anchor frames that provide stable temporal cues under large motion, occlusion, and reappearance. This unified design achieves state-of-the-art results across diverse RVOS benchmarks under realistic and challenging conditions, demonstrating strong generalization to both referring and reasoning-oriented settings. The code and checkpoints are available at this https URL.
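The abstract says the Temporal Dynamic Anchor Updater keeps anchor frames temporally adjacent to the current frame, but it does not give the update rule. The sketch below is only an illustration of one plausible policy, not the authors' method: a small sliding buffer that admits a frame as an anchor when its mask confidence is high enough and evicts the oldest anchor. The class name `AnchorBuffer`, the parameters `max_anchors` and `min_confidence`, and the confidence values are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AnchorBuffer:
    """Hypothetical sliding buffer of anchor frame indices (not the paper's module)."""

    max_anchors: int = 3          # assumed buffer size
    min_confidence: float = 0.5   # assumed threshold on per-frame mask confidence
    anchors: deque = field(default_factory=deque)

    def update(self, frame_idx: int, confidence: float) -> list[int]:
        # Only confidently segmented frames become anchors. During occlusion
        # the earlier anchors are kept, so a reappearing object can still be
        # matched against them.
        if confidence >= self.min_confidence:
            self.anchors.append(frame_idx)
            # Evict the oldest anchors so the set stays close in time
            # to the current frame.
            while len(self.anchors) > self.max_anchors:
                self.anchors.popleft()
        return list(self.anchors)


def run_demo() -> list[list[int]]:
    buffer = AnchorBuffer(max_anchors=2, min_confidence=0.5)
    # Made-up per-frame confidences; frames 2 and 3 simulate an occlusion.
    confidences = [0.9, 0.8, 0.1, 0.2, 0.7]
    return [buffer.update(i, c) for i, c in enumerate(confidences)]


if __name__ == "__main__":
    for frame_idx, anchors in enumerate(run_demo()):
        print(frame_idx, anchors)
```

In this toy run the anchors stay at frames 0 and 1 through the simulated occlusion and shift to frames 1 and 4 once the object is confidently segmented again. This contrasts with a fixed keyframe scheme, where the anchor set would never change.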

Comments: CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.27060 [cs.CV]

(or arXiv:2603.27060v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.27060

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Jihwan Hong
[v1] Sat, 28 Mar 2026 00:34:15 UTC (27,536 KB)

Original source

arXiv

https://arxiv.org/abs/2603.27060



