AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment
Authors: Jianfei Xiao, Xiang Yu, Chengbing Wang, Wuqiang Zheng, Xinyu Lin, Kaining Liu, Hongxun Ding, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He
Abstract: As Large Language Models (LLMs) evolve into lifelong AI assistants, LLM personalization has become a critical frontier. However, progress is currently bottlenecked by the absence of a gold-standard evaluation benchmark. Existing benchmarks either overlook personalized information management that is critical for personalization or rely heavily on synthetic dialogues, which exhibit an inherent distribution gap from real-world dialogue. To bridge this gap, we introduce AlpsBench, An LLM PerSonalization benchmark derived from real-world human-LLM dialogues. AlpsBench comprises 2,500 long-term interaction sequences curated from WildChat, paired with human-verified structured memories that encapsulate both explicit and implicit personalization signals. We define four pivotal tasks (personalized information extraction, updating, retrieval, and utilization) and establish protocols to evaluate the entire lifecycle of memory management. Our benchmarking of frontier LLMs and memory-centric systems reveals that: (i) models struggle to reliably extract latent user traits; (ii) memory updating faces a performance ceiling even in the strongest models; (iii) retrieval accuracy declines sharply in the presence of large distractor pools; and (iv) while explicit memory mechanisms improve recall, they do not inherently guarantee more preference-aligned or emotionally resonant responses. AlpsBench aims to provide a comprehensive framework.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2603.26680 [cs.CL]
(or arXiv:2603.26680v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2603.26680
arXiv-issued DOI via DataCite
Submission history
From: Jianfei Xiao
[v1] Mon, 9 Mar 2026 11:06:19 UTC (1,423 KB)

