
Contextual inference from single objects in Vision-Language models

arXiv, March 31, 2026

arXiv:2603.26731v1 Announce Type: cross

Authors: Martina G. Vilas, Timothy Schaumlöffel, Gemma Roig


Abstract: How much scene context a single object carries is a well-studied question in human scene perception, yet how this capacity is organized in vision-language models (VLMs) remains poorly understood, with direct implications for the robustness of these models. We investigate this question through a systematic behavioral and mechanistic analysis of contextual inference from single objects. Presenting VLMs with single objects on masked backgrounds, we probe their ability to infer both fine-grained scene category and coarse superordinate context (indoor vs. outdoor). We found that single objects support above-chance inference at both levels, with performance modulated by the same object properties that predict human scene categorization. Object identity, scene, and superordinate predictions are partially dissociable: accurate inference at one level neither requires nor guarantees accurate inference at the others, and the degree of coupling differs markedly across models. Mechanistically, object representations that remain stable when background context is removed are more predictive of successful contextual inference. Scene and superordinate schemas are grounded in fundamentally different ways: scene identity is encoded in image tokens throughout the network, while superordinate information emerges only late or not at all. Together, these results reveal that the organization of contextual inference in VLMs is more complex than accuracy alone suggests, with behavioral and mechanistic signatures
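The abstract's notions of "above-chance" and "partially dissociable" can be made concrete with a small sketch. This is not the authors' analysis code: the function names and the per-trial outcomes below are hypothetical, invented only to illustrate the idea, and the chance level assumes balanced classes. It compares accuracy against chance and checks whether scene accuracy depends on whether the object itself was identified correctly.

```python
def accuracy(outcomes):
    """Fraction of correct trials (outcomes are booleans)."""
    return sum(outcomes) / len(outcomes)


def chance_level(n_classes):
    """Chance accuracy, assuming balanced classes (an assumption)."""
    return 1.0 / n_classes


def conditional_accuracy(target, given):
    """Accuracy on `target`, split by whether `given` was correct.

    Returns (acc_when_given_correct, acc_when_given_wrong); an entry is
    None when that split contains no trials.
    """
    when_right = [t for t, g in zip(target, given) if g]
    when_wrong = [t for t, g in zip(target, given) if not g]
    return (
        accuracy(when_right) if when_right else None,
        accuracy(when_wrong) if when_wrong else None,
    )


# Hypothetical per-trial outcomes (invented for illustration, not paper data).
object_correct = [True, True, False, True, False, True]
scene_correct = [True, False, True, True, False, False]

scene_acc = accuracy(scene_correct)
binary_chance = chance_level(2)  # e.g. indoor vs. outdoor
given_right, given_wrong = conditional_accuracy(scene_correct, object_correct)
```

If the two conditional accuracies are close, scene inference in this toy data does not depend on getting the object right, which is the kind of pattern "dissociable" refers to. A large gap between them would indicate tighter coupling between the two levels.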

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.26731 [cs.CV]

(or arXiv:2603.26731v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2603.26731

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Martina G. Vilas
[v1] Fri, 20 Mar 2026 13:24:15 UTC (2,920 KB)

Original source

arXiv



