APEX-EM: Non-Parametric Online Learning for Autonomous Agents via Structured Procedural-Episodic Experience Replay

arXiv cs.IR · by Pratyay Banerjee, Masud Moshtaghi, Ankit Chadha · April 1, 2026 · 2 min read

arXiv:2603.29093v1 Announce Type: cross

Abstract: LLM-based autonomous agents lack persistent procedural memory: they re-derive solutions from scratch even when structurally identical tasks have been solved before. We present APEX-EM, a non-parametric online learning framework that accumulates, retrieves, and reuses structured procedural plans without modifying model weights. APEX-EM introduces: (1) a structured experience representation encoding the full procedural-episodic trace of each execution (planning steps, artifacts, iteration history with error analysis, and quality scores); (2) a Plan-Retrieve-Generate-Iterate-Ingest (PRGII) workflow with Task Verifiers providing multi-dimensional reward signals; and (3) a dual-outcome Experience Memory with hybrid retrieval combining semantic search, structural signature matching, and plan DAG traversal, enabling cross-domain transfer between tasks that share no lexical overlap but have analogous operational structure. Successful experiences serve as positive in-context examples; failures serve as negative examples with structured error annotations.

We evaluate on BigCodeBench [zhuo2025bigcodebench], KGQAGen-10k [zhang2025kgqagen], and Humanity's Last Exam (HLE) [phan2025hle] using Claude Sonnet 4.5 and Opus 4.5. On KGQAGen-10k, APEX-EM achieves 89.6% accuracy versus 41.3% without memory (+48.3pp), surpassing the oracle-retrieval upper bound (84.9%). On BigCodeBench, it reaches an 83.3% success rate (SR) from a 53.9% baseline (+29.4pp), exceeding MemRL's [memrl2025] +11.0pp gain under comparable frozen-backbone conditions (backbone differences are controlled for in our analysis). On HLE, entity graph retrieval reaches 48.0% from 25.2% (+22.8pp). Ablations show that component value is task-dependent: rich judge feedback is negligible for code generation but critical for structured queries (+10.3pp), while binary-signal iteration partially compensates for weaker feedback.
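
The abstract describes the dual-outcome memory and its hybrid retrieval only at a high level. The Python sketch below illustrates the general idea under several assumptions and is not the authors' implementation: the names `Experience`, `ExperienceMemory`, `bag_of_words`, `plan_edges`, and `build_prompt` are hypothetical; a bag-of-words cosine stands in for embedding-based semantic search; the "plan DAG" is approximated as a chain of consecutive steps compared by edge overlap rather than true graph traversal; the three signals are fused with hypothetical linear weights; and the query is assumed to carry a draft plan, since Plan precedes Retrieve in PRGII. The usage at the bottom shows a payroll-style task with no word overlap with any stored task still retrieving a structurally matching success and a related failure.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Experience:
    """One stored execution trace (a simplified, hypothetical schema)."""
    task: str                     # natural-language task description
    plan: list[str]               # ordered plan steps, e.g. ["read_csv", "group_by"]
    success: bool                 # verifier outcome: positive or negative example
    quality: float = 0.0          # verifier score in [0, 1]
    artifacts: dict[str, str] = field(default_factory=dict)  # e.g. generated code
    errors: list[str] = field(default_factory=list)          # per-iteration error notes


def bag_of_words(text: str) -> Counter:
    # Stand-in for a sentence embedding: lowercase token counts.
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def plan_edges(plan: list[str]) -> set[tuple[str, str]]:
    # Stand-in for a plan DAG: treat the plan as a chain of (step, next_step) edges.
    return set(zip(plan, plan[1:]))


class ExperienceMemory:
    """Dual-outcome store: successes and failures are both kept and retrievable."""

    def __init__(self, w_sem: float = 0.4, w_sig: float = 0.3, w_dag: float = 0.3):
        self.items: list[Experience] = []
        self.weights = (w_sem, w_sig, w_dag)  # hypothetical fusion weights

    def ingest(self, exp: Experience) -> None:
        self.items.append(exp)

    def score(self, task: str, draft_plan: list[str], exp: Experience) -> float:
        w_sem, w_sig, w_dag = self.weights
        semantic = cosine(bag_of_words(task), bag_of_words(exp.task))
        signature = jaccard(set(draft_plan), set(exp.plan))          # which operations
        dag = jaccard(plan_edges(draft_plan), plan_edges(exp.plan))  # in what order
        return w_sem * semantic + w_sig * signature + w_dag * dag

    def retrieve(self, task: str, draft_plan: list[str], k: int = 1):
        ranked = sorted(self.items, key=lambda e: self.score(task, draft_plan, e), reverse=True)
        positives = [e for e in ranked if e.success][:k]
        negatives = [e for e in ranked if not e.success][:k]
        return positives, negatives


def build_prompt(task: str, positives: list[Experience], negatives: list[Experience]) -> str:
    # Successes become positive in-context examples; failures carry their error notes.
    lines = [f"Task: {task}", "Successful past plans (imitate):"]
    lines += [f"- {' -> '.join(e.plan)} (quality {e.quality:.2f})" for e in positives]
    lines.append("Failed past plans (avoid):")
    lines += [f"- {' -> '.join(e.plan)}; errors: {'; '.join(e.errors)}" for e in negatives]
    return "\n".join(lines)


memory = ExperienceMemory()
memory.ingest(Experience("Sum monthly sales per region from a CSV",
                         ["read_csv", "group_by", "aggregate", "write"], True, 0.95))
memory.ingest(Experience("Average rainfall per station",
                         ["read_csv", "aggregate", "write"], False, 0.2,
                         errors=["aggregated before grouping by station"]))
memory.ingest(Experience("Resize all PNG images in a folder",
                         ["list_files", "load_image", "resize", "save"], True, 0.9))

query = "Total hours logged by each employee"  # no words shared with any stored task
draft = ["read_csv", "group_by", "aggregate", "write"]
positives, negatives = memory.retrieve(query, draft)
prompt = build_prompt(query, positives, negatives)
print(prompt)
```

Here the semantic term is zero for every stored task, so the ranking comes entirely from the structural signals. A weighted linear fusion is just one simple way to combine them; the abstract does not say how APEX-EM weights or merges its three retrieval channels.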

Comments: 17 pages, 13 figures

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR)

Cite as: arXiv:2603.29093 [cs.CL]

(or arXiv:2603.29093v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.29093

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Pratyay Banerjee
[v1] Tue, 31 Mar 2026 00:24:56 UTC (1,096 KB)