Contextual inference from single objects in Vision-Language models
arXiv:2603.26731v1, announce type: cross. Authors: Martina G. Vilas, Timothy Schaumlöffel, Gemma Roig
Abstract: How much scene context a single object carries is a well-studied question in human scene perception, yet how this capacity is organized in vision-language models (VLMs) remains poorly understood, with direct implications for the robustness of these models. We investigate this question through a systematic behavioral and mechanistic analysis of contextual inference from single objects. Presenting VLMs with single objects on masked backgrounds, we probe their ability to infer both fine-grained scene category and coarse superordinate context (indoor vs. outdoor). We found that single objects support above-chance inference at both levels, with performance modulated by the same object properties that predict human scene categorization. Object identity, scene, and superordinate predictions are partially dissociable: accurate inference at one level neither requires nor guarantees accurate inference at the others, and the degree of coupling differs markedly across models. Mechanistically, object representations that remain stable when background context is removed are more predictive of successful contextual inference. Scene and superordinate schemas are grounded in fundamentally different ways: scene identity is encoded in image tokens throughout the network, while superordinate information emerges only late or not at all. Together, these results reveal that the organization of contextual inference in VLMs is more complex than accuracy alone suggests, with behavioral and mechanistic signatures
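The stimulus construction described in the abstract (a single object shown on a masked background) can be illustrated with a minimal NumPy sketch. The abstract does not say how the masking is done, so everything here is an assumption made for illustration: the function name `mask_background`, the use of a binary object mask, and the uniform gray fill value of 128 are hypothetical and not taken from the paper.

```python
import numpy as np


def mask_background(image, object_mask, fill_value=128):
    """Return a copy of `image` with every pixel outside `object_mask` set to `fill_value`.

    Hypothetical helper: the uniform fill is an assumption, not the paper's procedure.
    """
    image = np.asarray(image)
    object_mask = np.asarray(object_mask, dtype=bool)
    if image.shape[:2] != object_mask.shape:
        raise ValueError("object_mask must match the image height and width")
    # Start from a uniformly filled canvas, then copy the object pixels back in.
    masked = np.full_like(image, fill_value)
    masked[object_mask] = image[object_mask]
    return masked


# Toy 4x4 RGB image with a 2x2 "object" in the centre.
image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
object_mask = np.zeros((4, 4), dtype=bool)
object_mask[1:3, 1:3] = True
masked = mask_background(image, object_mask)
```

In this sketch the object pixels are preserved unchanged and all other pixels carry no scene information, so any scene prediction a model makes from `masked` would have to come from the object alone.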
Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.26731 [cs.CV]
(or arXiv:2603.26731v1 [cs.CV] for this version)
https://doi.org/10.48550/arXiv.2603.26731
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Martina G. Vilas [v1] Fri, 20 Mar 2026 13:24:15 UTC (2,920 KB)