CARPE: Context-Aware Image Representation Prioritization via Ensemble for Large Vision-Language Models
arXiv:2601.13622v3 Announce Type: replace-cross
Authors: Donghee Lee, Rui Cai, Zhe Zhao
Abstract: Large vision-language models (LVLMs) are typically trained using autoregressive language modeling objectives, which align visual representations with linguistic space. While effective for multimodal reasoning, this alignment can weaken vision-centric capabilities, causing LVLMs to underperform their base vision encoders on tasks such as image classification. To address this limitation, we propose Context-Aware Image Representation Prioritization via Ensemble (CARPE), a lightweight framework that integrates raw vision features with aligned LLM representations through vision-integration layers and a context-aware ensemble mechanism. This design enhances the model's ability to adaptively weight visual and textual modalities and enables the model to capture various aspects of image representations. Extensive experiments demonstrate that CARPE improves performance on both image classification and diverse vision-language benchmarks. Our results suggest that modality balancing plays a critical role in multimodal generalization by improving representation utilization within autoregressive LVLMs.
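The abstract does not spell out how the context-aware ensemble combines the two sources, so the following is only a generic illustration of the idea of adaptively weighting a vision-encoder prediction against an LLM-side prediction. It is not CARPE's actual mechanism. The function names (`gated_ensemble`, `softmax`, `sigmoid`), the scalar `gate_score`, and the choice of a sigmoid gate over class probabilities are all assumptions made for this sketch.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_ensemble(
    vision_logits: List[float],
    llm_logits: List[float],
    gate_score: float,
) -> List[float]:
    """Mix two class distributions with a context-dependent weight.

    Hypothetical sketch: `gate_score` stands in for whatever signal a
    model might derive from the input context. A high score favors the
    vision-encoder prediction, a low score favors the LLM-side one.
    """
    if len(vision_logits) != len(llm_logits):
        raise ValueError("both heads must score the same set of classes")
    w = sigmoid(gate_score)  # weight on the vision branch, in (0, 1)
    p_vision = softmax(vision_logits)
    p_llm = softmax(llm_logits)
    # Convex combination of two distributions is itself a distribution.
    return [w * pv + (1.0 - w) * pl for pv, pl in zip(p_vision, p_llm)]


# Example: the two branches disagree on the top class.
vision = [2.0, 0.0, 0.0]  # vision branch prefers class 0
llm = [0.0, 2.0, 0.0]     # LLM branch prefers class 1
balanced = gated_ensemble(vision, llm, gate_score=0.0)
vision_heavy = gated_ensemble(vision, llm, gate_score=50.0)
```

In this toy setup, a gate score of zero weights both branches equally, while a large positive score makes the combined prediction follow the vision branch. In a trained system the gate would presumably be learned from the input rather than supplied by hand.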
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2601.13622 [cs.CV]
(or arXiv:2601.13622v3 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2601.13622
arXiv-issued DOI via DataCite
Submission history
From: Dong Hee Lee
[v1] Tue, 20 Jan 2026 05:44:33 UTC (283 KB)
[v2] Wed, 18 Mar 2026 04:41:36 UTC (278 KB)
[v3] Thu, 26 Mar 2026 21:38:46 UTC (278 KB)