MemoryCD: Benchmarking Long-Context User Memory of LLM Agents for Lifelong Cross-Domain Personalization

arXiv, March 30, 2026

Authors: Weizhi Zhang, Xiaokai Wei, Wei-Chieh Huang, Zheng Hui, Chen Wang, Michelle Gong, Philip S. Yu

Abstract: Recent advancements in Large Language Models (LLMs) have expanded context windows to million-token scales, yet benchmarks for evaluating memory remain limited to short-session synthetic dialogues. We introduce MemoryCD, the first large-scale, user-centric, cross-domain memory benchmark derived from lifelong real-world behaviors in the Amazon Review dataset. Unlike existing memory datasets that rely on scripted personas to generate synthetic user data, MemoryCD tracks authentic user interactions across years and multiple domains. We construct a multi-faceted long-context memory evaluation pipeline that covers 14 state-of-the-art LLM base models and 6 memory method baselines on 4 distinct personalization tasks over 12 diverse domains, evaluating an agent's ability to simulate real user behaviors in both single-domain and cross-domain settings. Our analysis reveals that existing memory methods remain far from satisfying users across domains, and MemoryCD offers the first testbed for evaluating cross-domain lifelong personalization.

Comments: Published as a workshop paper in Lifelong Agent @ ICLR 2026

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2603.25973 [cs.CL] (or arXiv:2603.25973v1 [cs.CL] for this version)

DOI: https://doi.org/10.48550/arXiv.2603.25973 (arXiv-issued DOI via DataCite, pending registration)

Submission history

From: Weizhi Zhang
[v1] Thu, 26 Mar 2026 23:28:47 UTC (8,365 KB)
