
Dream to Recall: Imagination-Guided Experience Retrieval for Memory-Persistent Vision-and-Language Navigation

arXiv, March 31, 2026

arXiv:2510.08553v2 (announce type: replace-cross). Authors: Yunzhe Xu, Yiyuan Pan, Zhe Liu


Abstract: Vision-and-Language Navigation (VLN) requires agents to follow natural language instructions through environments, with memory-persistent variants demanding progressive improvement through accumulated experience. Existing approaches for memory-persistent VLN face critical limitations: they lack effective memory access mechanisms, instead relying on entire memory incorporation or fixed-horizon lookup, and predominantly store only environmental observations while neglecting navigation behavioral patterns that encode valuable decision-making strategies. We present Memoir, which employs imagination as a retrieval mechanism grounded by explicit memory: a world model imagines future navigation states as queries to selectively retrieve relevant environmental observations and behavioral histories. The approach comprises: 1) a language-conditioned world model that imagines future states serving dual purposes: encoding experiences for storage and generating retrieval queries; 2) Hybrid Viewpoint-Level Memory that anchors both observations and behavioral patterns to viewpoints, enabling hybrid retrieval; and 3) an experience-augmented navigation model that integrates retrieved knowledge through specialized encoders. Extensive evaluation across diverse memory-persistent VLN benchmarks with 10 distinct testing scenarios demonstrates Memoir's effectiveness: significant improvements across all scenarios, with 5.4% SPL gains on IR2R over the best memory-persistent baseline, accompanied by 8.3× training speedup and 74% inference memory reduction. The results validate that predictive retrieval of both environmental and behavioral memories enables more effective navigation, with analysis indicating substantial headroom (73.3% vs 93.4% upper bound) for this imagination-guided paradigm.
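To make the retrieval idea concrete, here is a toy Python sketch of imagination-guided lookup over a viewpoint-anchored memory that stores observations and behaviors side by side. It rests on stated assumptions and is not Memoir's implementation. The world model is replaced by hand-written "imagined" state embeddings. Matching uses plain cosine similarity with top-k selection. The names `HybridViewpointMemory`, `ViewpointEntry`, `write` and `retrieve` are hypothetical. The abstract does not specify the actual encoders or scoring.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


@dataclass
class ViewpointEntry:
    # Hypothetical record: one entry per navigation-graph viewpoint.
    state_key: list[float]                                 # embedding used for matching (assumed)
    observations: list[str] = field(default_factory=list)  # environmental observations seen here
    behaviors: list[str] = field(default_factory=list)     # past decisions made at this viewpoint


class HybridViewpointMemory:
    """Toy memory anchoring observations and behaviors to the same viewpoint key."""

    def __init__(self) -> None:
        self.entries: dict[str, ViewpointEntry] = {}

    def write(self, viewpoint_id: str, state_key: list[float],
              observation: str | None = None, behavior: str | None = None) -> None:
        if viewpoint_id not in self.entries:
            self.entries[viewpoint_id] = ViewpointEntry(state_key=state_key)
        entry = self.entries[viewpoint_id]
        entry.state_key = state_key  # keep the most recent encoding of this viewpoint
        if observation is not None:
            entry.observations.append(observation)
        if behavior is not None:
            entry.behaviors.append(behavior)

    def retrieve(self, imagined_states: list[list[float]], k: int = 2,
                 min_sim: float = 0.0) -> list[tuple[str, float, ViewpointEntry]]:
        """Rank viewpoints by their best cosine match against any imagined future state."""
        if not imagined_states:
            return []
        scored = []
        for vp_id, entry in self.entries.items():
            best = max(cosine(q, entry.state_key) for q in imagined_states)
            if best >= min_sim:
                scored.append((vp_id, best, entry))
        scored.sort(key=lambda t: t[1], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    memory = HybridViewpointMemory()
    memory.write("vp_kitchen", [1.0, 0.0, 0.0], observation="fridge, counter",
                 behavior="turned left toward sink")
    memory.write("vp_hall", [0.0, 1.0, 0.0], observation="long corridor",
                 behavior="walked straight")
    memory.write("vp_bedroom", [0.0, 0.0, 1.0], observation="bed, lamp")

    # Stand-in for world-model imagination: two imagined future state embeddings.
    imagined = [[0.9, 0.1, 0.0], [0.1, 0.0, 0.2]]
    for vp_id, score, entry in memory.retrieve(imagined, k=2):
        # prints vp_kitchen (0.994) first, then vp_bedroom (0.894)
        print(vp_id, round(score, 3), entry.observations, entry.behaviors)
```

Taking the maximum over all imagined states lets any step of an imagined rollout pull in a relevant viewpoint. Storing observations and behaviors on the same entry means one match returns both kinds of experience, which loosely mirrors the "hybrid retrieval" the abstract describes.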

Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)

Cite as: arXiv:2510.08553 [cs.CV]

(or arXiv:2510.08553v2 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2510.08553

arXiv-issued DOI via DataCite

Submission history

From: Yunzhe Xu
[v1] Thu, 9 Oct 2025 17:58:01 UTC (4,261 KB)
[v2] Mon, 30 Mar 2026 09:03:04 UTC (5,799 KB)
