
AlpsBench: An LLM Personalization Benchmark for Real-Dialogue Memorization and Preference Alignment

arXiv · March 31, 2026

arXiv:2603.26680v1 Announce Type: cross

Authors: Jianfei Xiao, Xiang Yu, Chengbing Wang, Wuqiang Zheng, Xinyu Lin, Kaining Liu, Hongxun Ding, Yang Zhang, Wenjie Wang, Fuli Feng, Xiangnan He


Abstract: As Large Language Models (LLMs) evolve into lifelong AI assistants, LLM personalization has become a critical frontier. However, progress is currently bottlenecked by the absence of a gold-standard evaluation benchmark. Existing benchmarks either overlook personalized information management that is critical for personalization or rely heavily on synthetic dialogues, which exhibit an inherent distribution gap from real-world dialogue. To bridge this gap, we introduce AlpsBench, An LLM PerSonalization benchmark derived from real-world human-LLM dialogues. AlpsBench comprises 2,500 long-term interaction sequences curated from WildChat, paired with human-verified structured memories that encapsulate both explicit and implicit personalization signals. We define four pivotal tasks (personalized information extraction, updating, retrieval, and utilization) and establish protocols to evaluate the entire lifecycle of memory management. Our benchmarking of frontier LLMs and memory-centric systems reveals that: (i) models struggle to reliably extract latent user traits; (ii) memory updating faces a performance ceiling even in the strongest models; (iii) retrieval accuracy declines sharply in the presence of large distractor pools; and (iv) while explicit memory mechanisms improve recall, they do not inherently guarantee more preference-aligned or emotionally resonant responses. AlpsBench aims to provide a comprehensive framework.
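Finding (iii), that retrieval degrades as the distractor pool grows, can be illustrated with a toy sketch. This is not AlpsBench's protocol, data, or metric: the `Memory` type, the `retrieve` and `hit_at_k` helpers, the example memories, and the token-overlap scorer are all hypothetical stand-ins chosen only to show how a plausible-looking distractor can push the relevant memory out of the top-k results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Memory:
    """A hypothetical structured memory entry (not the benchmark's schema)."""
    memory_id: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase whitespace tokenization; a deliberately naive stand-in."""
    return set(text.lower().split())


def retrieve(query: str, pool: list[Memory], k: int) -> list[Memory]:
    """Rank memories by token overlap with the query and return the top k.

    Ties are broken by memory_id so the ranking is deterministic.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        pool,
        key=lambda m: (-len(query_tokens & tokenize(m.text)), m.memory_id),
    )
    return ranked[:k]


def hit_at_k(query: str, gold_id: str, pool: list[Memory], k: int) -> bool:
    """True if the gold memory appears among the top-k retrieved memories."""
    return any(m.memory_id == gold_id for m in retrieve(query, pool, k))


# Illustrative data only (assumed, not drawn from the benchmark).
gold = Memory("gold", "user prefers vegetarian recipes")
unrelated = Memory("d_cat", "user owns a cat")
lookalike = Memory(
    "d_dinner",
    "user asked about vegetarian restaurants and recipes for dinner",
)

query = "suggest recipes for dinner"
small_pool = [gold, unrelated]
large_pool = [gold, unrelated, lookalike]

# The gold memory wins in the small pool but is outranked at k=1 once a
# superficially similar distractor is added to the pool.
small_pool_hit = hit_at_k(query, "gold", small_pool, k=1)
large_pool_hit = hit_at_k(query, "gold", large_pool, k=1)
large_pool_hit_at_2 = hit_at_k(query, "gold", large_pool, k=2)
```

In this sketch the lookalike distractor shares more surface tokens with the query than the truly relevant memory does, so a shallow scorer ranks it first. Real retrievers are stronger than token overlap, but the same failure mode is one plausible reading of the abstract's observation about large distractor pools.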

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2603.26680 [cs.CL]

(or arXiv:2603.26680v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2603.26680

arXiv-issued DOI via DataCite

Submission history

From: Jianfei Xiao
[v1] Mon, 9 Mar 2026 11:06:19 UTC (1,423 KB)
