
Live Interactive Training for Video Segmentation

arXiv, March 31, 2026

Authors: Xinyu Yang, Haozheng Yu, Yihong Sun, Bharath Hariharan, Jennifer J. Sun

Abstract: Interactive video segmentation often requires many user interventions for robust performance in challenging scenarios (e.g., occlusions, object separations, camouflage). Yet even state-of-the-art models like SAM2 use corrections only for immediate fixes and do not learn from this feedback, leading to inefficient, repetitive user effort. To address this, we introduce Live Interactive Training (LIT), a novel framework for prompt-based visual systems in which models also learn online from human corrections at inference time. Our primary instantiation, LIT-LoRA, implements this by continually updating a lightweight LoRA module on the fly. When a user provides a correction, this module is rapidly trained on that feedback, allowing the vision system to improve performance on subsequent frames of the same video. Leveraging the core principles of LIT, our LIT-LoRA implementation achieves an average 18-34% reduction in total corrections on challenging video segmentation benchmarks, with a negligible training overhead of ~0.5s per correction. We further demonstrate its generality by successfully adapting it to other segmentation models and extending it to CLIP-based fine-grained image classification. Our work highlights the promise of live adaptation to transform interactive tools and significantly reduce redundant human effort in complex visual tasks. Project: this https URL.
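
The abstract describes the update loop only at a high level, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a toy frozen linear layer in place of a segmentation model, a squared-error loss, and plain gradient descent; the names `LoRALinear`, `correct`, and `sq_error`, along with the learning rate and step count, are hypothetical choices made for illustration.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sq_error(y, t):
    """Squared error between prediction y and target t."""
    return sum((a - b) ** 2 for a, b in zip(y, t))


class LoRALinear:
    """Toy frozen linear layer W plus a trainable low-rank update B @ A."""

    def __init__(self, W, rank, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = [row[:] for row in W]  # frozen base weights, never updated
        self.A = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]  # zero init: adapter starts as a no-op

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + d for b, d in zip(base, delta)]

    def correct(self, x, target, lr=0.2, steps=100):
        """Run a few gradient steps on A and B only, for a single correction."""
        for _ in range(steps):
            h = matvec(self.A, x)
            y = self.forward(x)
            err = [yi - ti for yi, ti in zip(y, target)]  # dL/dy for L = 0.5 * ||y - t||^2
            # Backpropagate through B before B is modified: dL/dh = B^T err
            g_h = [sum(self.B[i][k] * err[i] for i in range(len(err)))
                   for k in range(len(h))]
            for i in range(len(self.B)):
                for k in range(len(h)):
                    self.B[i][k] -= lr * err[i] * h[k]   # dL/dB = err h^T
            for k in range(len(self.A)):
                for j in range(len(x)):
                    self.A[k][j] -= lr * g_h[k] * x[j]   # dL/dA = g_h x^T


# Hypothetical "frame feature" and user-corrected target, chosen for illustration.
layer = LoRALinear(W=[[1.0, 0.0], [0.0, 1.0]], rank=2)
x, target = [0.6, 0.8], [1.0, 0.0]

before = sq_error(layer.forward(x), target)
layer.correct(x, target)          # one user correction triggers a short training burst
after = sq_error(layer.forward(x), target)
print(f"error before correction: {before:.4f}, after: {after:.6f}")
```

In this sketch the base weights stay fixed and only the low-rank factors change, which is what keeps each update cheap; a real system would apply the adapter inside the segmentation model and evaluate the effect on later frames rather than on the corrected input itself.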

Comments: CVPR 2026

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26929 [cs.CV]

(or arXiv:2603.26929v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26929

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Xinyu Yang
[v1] Fri, 27 Mar 2026 19:10:23 UTC (9,203 KB)
