Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessPerplexity launches Secure Intelligence Institute to advance AI security, privacy, and safety research - Moneycontrol.comGoogle News: AI SafetyAnthropic Source Code Leak Exposes AI Security Logic Before $350B IPO - startupfortune.comGoogle News: ClaudeBoy, 16, takes his own life after chilling ChatGPT question and 'farewell' texts - Daily StarGoogle News: ChatGPTGiving up on EA after 13 yearsLessWrong AIThe End of the "I Am Not a Robot" Box: Why Your Next Login Will Require 5 SquatsDEV CommunityInstagram DMs to Amazon Connect ChatDEV CommunityThe Nines Are Lying to You: What 99.9% Uptime Actually CostsDEV CommunityThe jury verdicts against Meta and YouTube recognized some platform design features as defective, distinct from what Section 230 was created to protect (Casey Newton/Platformer)TechmemeAnthropic code leak sparks renewed concerns over AI security and operational risks - CXO DigitalpulseGoogle News: AI SafetyBefore You Upgrade Hardware, Fix the SoftwareDEV Community2026년, Postman 버릴 때? Axios npm 공격 후 안전한 API 테스트 및 마이그레이션DEV CommunityAnthropic accidentally leaks part of Claude Code source - Latest news from AzerbaijanGoogle News: ClaudeBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessPerplexity launches Secure Intelligence Institute to advance AI security, privacy, and safety research - Moneycontrol.comGoogle News: AI SafetyAnthropic Source Code Leak Exposes AI Security Logic Before $350B IPO - startupfortune.comGoogle News: ClaudeBoy, 16, takes his own life after chilling ChatGPT question and 'farewell' texts - Daily StarGoogle News: ChatGPTGiving up on EA after 13 yearsLessWrong AIThe End of the "I Am Not a Robot" Box: Why Your Next Login Will Require 5 SquatsDEV CommunityInstagram DMs to Amazon Connect ChatDEV CommunityThe Nines Are Lying to You: What 99.9% Uptime Actually CostsDEV CommunityThe jury verdicts against Meta and YouTube recognized some platform design features as defective, distinct from what Section 230 was created to protect (Casey Newton/Platformer)TechmemeAnthropic code leak sparks renewed concerns over AI security and operational risks - CXO DigitalpulseGoogle News: AI SafetyBefore You Upgrade Hardware, Fix the SoftwareDEV Community2026년, Postman 버릴 때? Axios npm 공격 후 안전한 API 테스트 및 마이그레이션DEV CommunityAnthropic accidentally leaks part of Claude Code source - Latest news from AzerbaijanGoogle News: Claude

See, Symbolize, Act: Grounding VLMs with Spatial Representations for Better Gameplay

arXivMarch 30, 202610 min read0 views
Source Quiz

arXiv:2603.11601v2 Announce Type: replace Abstract: Vision-Language Models (VLMs) excel at describing visual scenes, yet struggle to translate perception into precise, grounded actions. We investigate whether providing VLMs with both the visual frame and the symbolic representation of the scene can improve their performance in interactive environments. We evaluate three state-of-the-art VLMs across Atari games, VizDoom, and AI2-THOR, comparing frame-only, frame with self-extracted symbols, frame with ground-truth symbols, and symbol-only pipelines. Our results indicate that all models benefit — Ashish Baghel, Paras Chopra

View PDF HTML (experimental)

Abstract:Vision-Language Models (VLMs) excel at describing visual scenes, yet struggle to translate perception into precise, grounded actions. We investigate whether providing VLMs with both the visual frame and the symbolic representation of the scene can improve their performance in interactive environments. We evaluate three state-of-the-art VLMs across Atari games, VizDoom, and AI2-THOR, comparing frame-only, frame with self-extracted symbols, frame with ground-truth symbols, and symbol-only pipelines. Our results indicate that all models benefit when the symbolic information is accurate. However, when VLMs extract symbols themselves, performance becomes dependent on model capability and scene complexity. We further investigate how accurately VLMs can extract symbolic information from visual inputs and how noise in these symbols affects decision-making and gameplay performance. Our findings reveal that symbolic grounding is beneficial in VLMs only when symbol extraction is reliable, and highlight perception quality as a central bottleneck for future VLM-based agents.

Comments: 11 pages, 13 figures. Accepted to LMReasoning Workshop at AAAI 2026

Subjects:

Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.11601 [cs.AI]

(or arXiv:2603.11601v2 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2603.11601

arXiv-issued DOI via DataCite

Submission history

From: Ashish Baghel [view email] [v1] Thu, 12 Mar 2026 06:48:57 UTC (261 KB) [v2] Fri, 27 Mar 2026 09:16:31 UTC (231 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by AI News Hub · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

researchpaperarxiv

Knowledge Map

Knowledge Map
TopicsEntitiesSource
See, Symbol…researchpaperarxivaiartificial-…arXiv

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 219 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!

More in Research Papers