
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration

arXiv, March 31, 2026

arXiv:2601.22674v3 Announce Type: replace

Authors: Hanxun Yu, Wentong Li, Xuan Qu, Song Wang, Junbo Chen, Jianke Zhu


Abstract: Multimodal large language models (MLLMs) suffer from high computational costs due to excessive visual tokens, particularly in high-resolution and video-based scenarios. Existing token reduction methods typically focus on isolated pipeline components and often neglect textual alignment, leading to performance degradation. In this paper, we propose VisionTrim, a unified framework for training-free MLLM acceleration, integrating two effective plug-and-play modules: 1) the Dominant Vision Token Selection (DVTS) module, which preserves essential visual tokens via a global-local view, and 2) the Text-Guided Vision Complement (TGVC) module, which facilitates context-aware token merging guided by textual cues. Extensive experiments across diverse image and video multimodal benchmarks demonstrate the performance superiority of our VisionTrim, advancing practical MLLM deployment in real-world applications. The code is available at: this https URL.
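The abstract does not say how DVTS scores tokens or how TGVC merges them, so the following is only a generic illustration of the select-then-merge idea it describes, not VisionTrim's actual method. It assumes each visual token already has an importance score (for example, an attention weight) and that a single text embedding is available. The names `reduce_tokens`, `scores`, and `text_vec`, as well as the use of cosine similarity, are hypothetical choices made for this sketch.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def reduce_tokens(tokens, scores, text_vec, keep):
    """Hypothetical sketch: keep the `keep` highest-scoring tokens, then fold
    each dropped token into its most similar kept token, weighted by the
    dropped token's relevance to the text embedding."""
    if not 1 <= keep <= len(tokens):
        raise ValueError("keep must be between 1 and len(tokens)")

    order = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept_idx = sorted(order[:keep])      # preserve the original token order
    dropped_idx = order[keep:]

    # Running weighted sums; each kept token starts with weight 1.0.
    sums = {i: list(tokens[i]) for i in kept_idx}
    weights = {i: 1.0 for i in kept_idx}

    for j in dropped_idx:
        w = max(cosine(tokens[j], text_vec), 0.0)  # text relevance, clipped at 0
        if w == 0.0:
            continue                               # unrelated to the text: discard
        target = max(kept_idx, key=lambda i: cosine(tokens[i], tokens[j]))
        sums[target] = [s + w * x for s, x in zip(sums[target], tokens[j])]
        weights[target] += w

    return [[s / weights[i] for s in sums[i]] for i in kept_idx]

# Toy usage: four 2-D "visual tokens", a text vector, and a budget of two tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, -1.0]]
scores = [0.9, 0.8, 0.1, 0.2]
text_vec = [1.0, 0.0]
out = reduce_tokens(tokens, scores, text_vec, keep=2)
```

In this toy run the third token is text-relevant and is averaged into the first kept token, while the fourth token has no positive similarity to the text and is simply dropped. A real training-free pipeline would operate on batched tensors inside the model rather than on Python lists.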

Comments: ICLR 2026, Code Link: this https URL

Subjects: Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2601.22674 [cs.CV] (or arXiv:2601.22674v3 [cs.CV] for this version)

DOI: https://doi.org/10.48550/arXiv.2601.22674 (arXiv-issued DOI via DataCite)

Submission history

From: Hanxun Yu
[v1] Fri, 30 Jan 2026 07:45:48 UTC (8,833 KB)
[v2] Mon, 2 Feb 2026 09:21:10 UTC (8,832 KB)
[v3] Sat, 28 Mar 2026 09:00:26 UTC (8,827 KB)
