Advancing Complex Video Object Segmentation via Tracking-Enhanced Prompt: The 1st Winner for 5th PVUW MOSE Challenge
Abstract: In the Complex Video Object Segmentation task, methods must track and segment specified targets within cluttered environments, which rigorously tests their capacity for target comprehension and environmental adaptability. Although SAM3, the current state-of-the-art solution, exhibits unparalleled segmentation performance and robustness on conventional targets, it underperforms on tiny and semantics-dominated objects. The root cause of this limitation lies in SAM3's insufficient comprehension of these specific target types. To address this issue, we propose TEP: Advancing Complex Video Object Segmentation via Tracking-Enhanced Prompts. As a training-free approach, TEP leverages external tracking models and Multimodal Large Language Models to introduce tracking-enhanced prompts, thereby alleviating the difficulty SAM3 faces in understanding these challenging targets. Our method achieved first place (56.91%) on the test set of the PVUW Challenge 2026: Complex Video Object Segmentation Track.
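The abstract gives only the high-level design, so the following is a minimal, hypothetical sketch of what such a tracking-enhanced prompting loop could look like: an external tracker proposes a per-frame box prompt, an MLLM check gates implausible proposals, and a promptable SAM3-style segmenter converts accepted boxes into masks. All names here (tep_pipeline, track, mllm_confirms, segment, and the fall-back-to-last-box heuristic) are illustrative assumptions, not the paper's actual interfaces.

```python
# Hypothetical sketch of a tracking-enhanced prompting loop (TEP-style).
# None of these callables come from the paper; they stand in for:
#   track(frame)              -> (x0, y0, x1, y1) box from an external tracker
#   mllm_confirms(frame, box) -> bool, a multimodal-LLM plausibility check
#   segment(frame, box)       -> binary mask from a promptable segmenter (SAM3-like)
from typing import Callable, Iterable, List, Optional, Tuple

import numpy as np

Box = Tuple[int, int, int, int]


def tep_pipeline(
    frames: Iterable[np.ndarray],
    track: Callable[[np.ndarray], Box],
    mllm_confirms: Callable[[np.ndarray, Box], bool],
    segment: Callable[[np.ndarray, Box], np.ndarray],
) -> List[np.ndarray]:
    """Training-free loop: box prompts from a tracker, gated by an MLLM,
    drive a promptable segmenter frame by frame."""
    masks: List[np.ndarray] = []
    last_box: Optional[Box] = None
    for frame in frames:
        box = track(frame)
        # If the MLLM rejects the tracker's proposal (e.g. on tiny or
        # semantics-heavy targets), reuse the last trusted prompt instead.
        if not mllm_confirms(frame, box) and last_box is not None:
            box = last_box
        else:
            last_box = box
        masks.append(segment(frame, box))
    return masks
```

The design choice this sketch illustrates is that prompt quality, not segmentation capacity, is treated as the bottleneck: the segmenter stays frozen and only its prompts are improved, which is what keeps the approach training-free as the abstract states.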
Comments: 1st Place Solution for the 5th PVUW MOSE Challenge (CVPR 2026 Workshop)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Cite as: arXiv:2604.00395 [cs.CV]
(or arXiv:2604.00395v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2604.00395
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Xusheng He [v1] Wed, 1 Apr 2026 02:23:23 UTC (18,147 KB)