Research Papers research paper arxiv ai artificial-intelligence

Learning to Select Visual In-Context Demonstrations

arXivby [Submitted on 24 Mar 2026]March 31, 20262 min read1 views

arXiv:2603.26775v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) adapt to visual tasks via in-context learning (ICL), which relies heavily on demonstration quality. The dominant demonstration selection strategy is unsupervised k-Nearest Neighbor (kNN) search. While simple, this similarity-first approach is sub-optimal for complex factual regression tasks; it selects redundant examples that fail to capture the task's full output range. We reframe selection as a sequential decision-making problem and introduce Learning to Select Demonstrations (LSD), training a Reinforc — Eugene Lee, Yu-Chi Lin, Jiajie Diao

View PDF HTML (experimental)

Abstract:Multimodal Large Language Models (MLLMs) adapt to visual tasks via in-context learning (ICL), which relies heavily on demonstration quality. The dominant demonstration selection strategy is unsupervised k-Nearest Neighbor (kNN) search. While simple, this similarity-first approach is sub-optimal for complex factual regression tasks; it selects redundant examples that fail to capture the task's full output range. We reframe selection as a sequential decision-making problem and introduce Learning to Select Demonstrations (LSD), training a Reinforcement Learning agent to construct optimal demonstration sets. Using a Dueling DQN with a query-centric Transformer Decoder, our agent learns a policy that maximizes MLLM downstream performance. Evaluating across five visual regression benchmarks, we uncover a crucial dichotomy: while kNN remains optimal for subjective preference tasks, LSD significantly outperforms baselines on objective, factual regression tasks. By balancing visual relevance with diversity, LSD better defines regression boundaries, illuminating when learned selection is strictly necessary for visual ICL.

Comments: 21 pages, 12 figure, accepted to Computer Vision and Pattern Recognition Conference (CVPR) 2026 Findings Track

Subjects:

Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)

ACM classes: I.2; I.4; H.3

Cite as: arXiv:2603.26775 [cs.LG]

(or arXiv:2603.26775v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.26775

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Eugene Lee [view email] [v1] Tue, 24 Mar 2026 18:07:40 UTC (5,122 KB)

Original source

arXiv

https://arxiv.org/abs/2603.26775

Was this article helpful?

Ask AI about this article

Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

ReleasesFresh

Delaunay Canopy: Building Wireframe Reconstruction from Airborne LiDAR Point Clouds via Delaunay Graph

arXiv:2604.02497v1 Announce Type: new Abstract: Reconstructing building wireframe from airborne LiDAR point clouds yields a compact, topology-centric representation that enables structural understanding beyond dense meshes. Yet a key limitation persists: conventional methods have failed to achieve accurate wireframe reconstruction in regions afflicted by significant noise, sparsity, or internal corners. This failure stems from the inability to establish an adaptive search space to effectively leverage the rich 3D geometry of large, sparse building point clouds. In this work, we address this challenge with Delaunay Canopy, which utilizes the Delaunay graph as a geometric prior to define a geometrically adaptive search space. Central to our approach is Delaunay Graph Scoring, which not only

arXiv cs.CV

1mabout 2 hours ago

ModelsFresh

SocioEval: A Template-Based Framework for Evaluating Socioeconomic Status Bias in Foundation Models

arXiv:2604.02660v1 Announce Type: new Abstract: As Large Language Models (LLMs) increasingly power decision-making systems across critical domains, understanding and mitigating their biases becomes essential for responsible AI deployment. Although bias assessment frameworks have proliferated for attributes such as race and gender, socioeconomic status bias remains significantly underexplored despite its widespread implications in the real world. We introduce SocioEval, a template-based framework for systematically evaluating socioeconomic bias in foundation models through decision-making tasks. Our hierarchical framework encompasses 8 themes and 18 topics, generating 240 prompts across 6 class-pair combinations. We evaluated 13 frontier LLMs on 3,120 responses using a rigorous three-stage

arXiv cs.CL

1mabout 2 hours ago

Research PapersFresh

Speaking of Language: Reflections on Metalanguage Research in NLP

arXiv:2604.02645v1 Announce Type: new Abstract: This work aims to shine a spotlight on the topic of metalanguage. We first define metalanguage, link it to NLP and LLMs, and then discuss our two labs' metalanguage-centered efforts. Finally, we discuss four dimensions of metalanguage and metalinguistic tasks, offering a list of understudied future research directions.

arXiv cs.CL

1mabout 2 hours ago

Knowledge Map

TopicsEntitiesSource

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 214 connections

Scroll to zoom · drag to pan · click to open

Discussion

No comments yet — be the first to share your thoughts!

More in Research Papers

Research PapersFresh

Let's Have a Conversation: Designing and Evaluating LLM Agents for Interactive Optimization

arXiv:2604.02666v1 Announce Type: new Abstract: Optimization is as much about modeling the right problem as solving it. Identifying the right objectives, constraints, and trade-offs demands extensive interaction between researchers and stakeholders. Large language models can empower decision-makers with optimization capabilities through interactive optimization agents that can propose, interpret and refine solutions. However, it is fundamentally harder to evaluate a conversation-based interaction than traditional one-shot approaches. This paper proposes a scalable and replicable methodology for evaluating optimization agents through conversations. We build LLM-powered decision agents that role-play diverse stakeholders, each governed by an internal utility function but communicating like a

arXiv cs.AI

2mabout 2 hours ago

Research PapersFresh

Speaking of Language: Reflections on Metalanguage Research in NLP

arXiv cs.CL

1mabout 2 hours ago

Research PapersFresh

Neural correlates of perceptual consciousness from within: a narrative review of human intracranial research

arXiv:2510.08736v2 Announce Type: replace Abstract: Despite many years of research, the quest to identify neural correlates of perceptual consciousness (NCC) remains unresolved. One major obstacle lies in methodological limitations: most studies rely on non-invasive neural measures with limited spatial or temporal resolution making it difficult to disentangle proper NCCs from concurrent cognitive processes. Additionally, the relatively low sensitivity of non-invasive neural measures limits the interpretation of null findings in studies targeting proper NCCs. In this review, we discuss how human intracranial recordings can advance the search for NCCs, by offering high spatiotemporal resolution, improved signal sensitivity, and broad cortical and subcortical coverage. We review studies that

arXiv q-bio.NC

1mabout 2 hours ago

Research PapersFresh

Size-structured populations with growth fluctuations: Feynman--Kac formula and decoupling

arXiv:2508.14680v2 Announce Type: replace-cross Abstract: We study a size-structured population model in which individual cells grow at a rate determined by a fluctuating internal variable (e.g., gene expression levels). Many previous models of phenotypically heterogeneous populations can be viewed as special cases of this model, and it has previously been observed that the internal variable decouples from cell size under certain conditions. In this work, we generalize these results and connect them to the Feynman-Kac formula, which yields relationships between the lineage dynamics and population distribution in branching processes. To this end, we derive conditions for decoupling, both in the lineage and population ensemble. When decoupling occurs in both ensembles, the size dynamics can

arXiv physics.data-an

1mabout 2 hours ago