Communicating about Space: Language-Mediated Spatial Integration Across Partial Views
MLLMs demonstrate limited capability in collaborative spatial communication tasks, achieving only 72% accuracy compared to humans' 95%; models struggle to build consistent shared mental models, unlike human dialogues, which become more specific as partners converge.
Published on Mar 28
Authors:
Abstract
AI-generated summary
Humans build shared spatial understanding by communicating partial, viewpoint-dependent observations. We ask whether Multimodal Large Language Models (MLLMs) can do the same, aligning distinct egocentric views through dialogue to form a coherent, allocentric mental model of a shared environment. To study this systematically, we introduce COSMIC, a benchmark for Collaborative Spatial Communication. In this setting, two static MLLM agents observe a 3D indoor environment from different viewpoints and exchange natural-language messages to solve spatial queries. COSMIC contains 899 diverse scenes and 1250 question-answer pairs spanning five tasks. We find a consistent capability hierarchy: MLLMs are most reliable at identifying shared anchor objects across views, perform worse on relational reasoning, and largely fail at building globally consistent maps, performing near chance even for frontier models. Moreover, we find that thinking capability yields consistent gains in anchor grounding but is insufficient for higher-level spatial communication. To contextualize model behavior, we additionally collect 250 human-human dialogues. Humans achieve 95% aggregate accuracy, leaving significant room for improvement even for the best-performing model, Gemini-3-Pro-Thinking, which achieves 72% aggregate accuracy. Moreover, human conversations become increasingly specific as partners converge on a shared mental model, whereas model dialogues continue to explore new possibilities rather than converging, consistent with a limited ability to build and maintain a robust shared mental model. Our code and data are available at https://github.com/ankursikarwar/Cosmic
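The two-agent setup described in the abstract can be sketched as a simple turn-taking loop: each static agent holds a partial, viewpoint-dependent observation and alternately contributes a natural-language message to a shared transcript. The sketch below is purely illustrative; the stub agents, their names, and their canned observations are assumptions for demonstration, not the paper's actual benchmark code (which is in the linked repository), and no real MLLM is called.

```python
# Illustrative sketch of a COSMIC-style two-agent dialogue protocol.
# The "agents" here are trivial stand-ins for MLLMs: each one reports
# its partial view on the first turn, then acknowledges on later turns.

def make_stub_agent(name, observation):
    """Return a toy agent closure over a viewpoint-dependent observation."""
    def agent(dialogue):
        if not dialogue:
            return f"{name}: from my viewpoint I can see {observation}."
        return f"{name}: noted, combining that with my view of {observation}."
    return agent

def run_dialogue(agent_a, agent_b, max_turns=4):
    """Alternate messages between two static agents; return the transcript."""
    dialogue = []
    agents = [agent_a, agent_b]
    for turn in range(max_turns):
        message = agents[turn % 2](dialogue)  # each agent sees the full history
        dialogue.append(message)
    return dialogue

transcript = run_dialogue(
    make_stub_agent("Agent A", "a sofa left of a lamp"),
    make_stub_agent("Agent B", "the same lamp, with a door behind it"),
)
for line in transcript:
    print(line)
```

In the actual benchmark a final spatial query would be answered from the converged transcript; here the loop only shows the message-exchange structure over partial views.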
Get this paper in your agent:
hf papers read 2603.27183
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
