
A Comprehensive Information-Decomposition Analysis of Large Vision-Language Models

arXiv cs.LG · By Lixin Xiu, Xufang Luo, Hideki Nakayama · April 1, 2026



Abstract: Large vision-language models (LVLMs) achieve impressive performance, yet their internal decision-making processes remain opaque, making it difficult to determine if the success stems from true multimodal fusion or from reliance on unimodal priors. To address this attribution gap, we introduce a novel framework using partial information decomposition (PID) to quantitatively measure the "information spectrum" of LVLMs -- decomposing a model's decision-relevant information into redundant, unique, and synergistic components. By adapting a scalable estimator to modern LVLM outputs, our model-agnostic pipeline profiles 26 LVLMs on four datasets across three dimensions -- breadth (cross-model & cross-task), depth (layer-wise information dynamics), and time (learning dynamics across training). Our analysis reveals two key results: (i) two task regimes (synergy-driven vs. knowledge-driven) and (ii) two stable, contrasting family-level strategies (fusion-centric vs. language-centric). We also uncover a consistent three-phase pattern in layer-wise processing and identify visual instruction tuning as the key stage where fusion is learned. Together, these contributions provide a quantitative lens beyond accuracy-only evaluation and offer insights for analyzing and designing the next generation of LVLMs. Code and data are available at this https URL.
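For readers unfamiliar with PID, the decomposition the abstract refers to splits the joint information I(T; Xv, Xl) that a prediction target T receives from visual input Xv and textual input Xl into four non-negative parts: Redundant + Unique_v + Unique_l + Synergy. The abstract does not spell out the paper's scalable estimator, so the sketch below is only a minimal toy illustration using the simple minimum-mutual-information (MMI) redundancy on discrete samples; the function names and the XOR example are hypothetical, not the authors' pipeline.

import numpy as np
from collections import Counter

def mutual_information(xs, ys):
    # I(X;Y) in bits, estimated from paired samples of discrete variables.
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * np.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def pid_mmi(target, x_vision, x_language):
    # Four-way PID of I(T; Xv, Xl) under the MMI redundancy measure:
    # Red = min(I(T;Xv), I(T;Xl)); the remaining terms follow from the
    # standard PID consistency equations.
    i_v = mutual_information(x_vision, target)
    i_l = mutual_information(x_language, target)
    i_joint = mutual_information(list(zip(x_vision, x_language)), target)
    red = min(i_v, i_l)                 # carried by both modalities
    return {
        "redundant": red,
        "unique_vision": i_v - red,     # recoverable from vision alone
        "unique_language": i_l - red,   # recoverable from language alone
        "synergy": i_joint - i_v - i_l + red,  # needs both jointly
    }

# XOR target: neither input alone predicts T, so (up to sampling noise)
# all ~1 bit is synergistic -- the signature of genuine multimodal fusion.
rng = np.random.default_rng(0)
xv = rng.integers(0, 2, 10_000)
xl = rng.integers(0, 2, 10_000)
print(pid_mmi(xv ^ xl, xv, xl))

Under this toy measure, a "language-centric" model would show decision information dominated by the unique-language term, while a "fusion-centric" model would show a large synergy term; the paper's layer-wise and training-time profiles track how these components shift.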

Comments: Accepted at ICLR 2026. Project page: this https URL

Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2603.29676 [cs.LG]

(or arXiv:2603.29676v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.29676

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Lixin Xiu [v1] Tue, 31 Mar 2026 12:32:29 UTC (684 KB)
