Live
Black Hat USADark ReadingBlack Hat AsiaAI BusinessAssessing Marvell Technology (MRVL) After Nvidia’s US$2b AI Partnership And Connectivity Push - simplywall.stGNews AI NVIDIADutchess to host artificial intelligence summit at Marist in Poughkeepsie - Daily FreemanGoogle News: AIAnthropic’s Catastrophic Leak May Have Just Handed China the Blueprints to Claude Al - TipRanksGoogle News: ClaudeOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-UpWired AIMeta's AI push is reshaping how work gets done inside the companyBusiness InsiderOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-Up - WIREDGoogle News: OpenAI[P] Remote sensing foundation models made easy to use.Reddit r/MachineLearningFirst time NeurIPS. How different is it from low-ranked conferences? [D]Reddit r/MachineLearningScaling Trust: How Salesforce s Security Team Uses Agentforce to Triage Security Reports at Speedengineering.salesforce.com$18M Salary: UBTech Robotics Wants to Hire a Humanoid AI Mastermind - eWeekGoogle News - AI roboticsAI & Tech brief: Ireland ascendant - The Washington PostGNews AI EUPeople would rather have an Amazon warehouse in their backyard than a data centerTechCrunch AIBlack Hat USADark ReadingBlack Hat AsiaAI BusinessAssessing Marvell Technology (MRVL) After Nvidia’s US$2b AI Partnership And Connectivity Push - simplywall.stGNews AI NVIDIADutchess to host artificial intelligence summit at Marist in Poughkeepsie - Daily FreemanGoogle News: AIAnthropic’s Catastrophic Leak May Have Just Handed China the Blueprints to Claude Al - TipRanksGoogle News: ClaudeOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-UpWired AIMeta's AI push is reshaping how work gets done inside the companyBusiness InsiderOpenAI's Fidji Simo Is Taking Medical Leave Amid an Executive Shake-Up - WIREDGoogle News: OpenAI[P] Remote sensing foundation models made easy to use.Reddit r/MachineLearningFirst time NeurIPS. How different is it from low-ranked conferences? [D]Reddit r/MachineLearningScaling Trust: How Salesforce s Security Team Uses Agentforce to Triage Security Reports at Speedengineering.salesforce.com$18M Salary: UBTech Robotics Wants to Hire a Humanoid AI Mastermind - eWeekGoogle News - AI roboticsAI & Tech brief: Ireland ascendant - The Washington PostGNews AI EUPeople would rather have an Amazon warehouse in their backyard than a data centerTechCrunch AI
AI NEWS HUBbyEIGENVECTOREigenvector

From Static to Dynamic: Exploring Self-supervised Image-to-Video Representation Transfer Learning

arXivMarch 30, 202610 min read0 views
Source Quiz

arXiv:2603.26597v1 Announce Type: new Abstract: Recent studies have made notable progress in video representation learning by transferring image-pretrained models to video tasks, typically with complex temporal modules and video fine-tuning. However, fine-tuning heavy modules may compromise inter-video semantic separability, i.e., the essential ability to distinguish objects across videos. While reducing the tunable parameters hinders their intra-video temporal consistency, which is required for stable representations of the same object within a video. This dilemma indicates a potential trade- — Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Xilin Zhao, Qingming Huang

View PDF

Abstract:Recent studies have made notable progress in video representation learning by transferring image-pretrained models to video tasks, typically with complex temporal modules and video fine-tuning. However, fine-tuning heavy modules may compromise inter-video semantic separability, i.e., the essential ability to distinguish objects across videos. While reducing the tunable parameters hinders their intra-video temporal consistency, which is required for stable representations of the same object within a video. This dilemma indicates a potential trade-off between the intra-video temporal consistency and inter-video semantic separability during image-to-video transfer. To this end, we propose the Consistency-Separability Trade-off Transfer Learning (Co-Settle) framework, which applies a lightweight projection layer on top of the frozen image-pretrained encoder to adjust representation space with a temporal cycle consistency objective and a semantic separability constraint. We further provide a theoretical support showing that the optimized projection yields a better trade-off between the two properties under appropriate conditions. Experiments on eight image-pretrained models demonstrate consistent improvements across multiple levels of video tasks with only five epochs of self-supervised training. The code is available at this https URL.

Comments: Accepted at CVPR 2026

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.26597 [cs.CV]

(or arXiv:2603.26597v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26597

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Yang Liu [view email] [v1] Fri, 27 Mar 2026 16:56:50 UTC (16,863 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
From Static…researchpaperarxivcomputer-vi…image-recog…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 119 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers