Live
Black Hat USAAI BusinessBlack Hat AsiaAI BusinessHere's what 'cracking' bitcoin in 9 minutes by quantum computers actually meansCoinDesk AIAnthropic is having a moment in the private markets; SpaceX could spoil the partyTechCrunchChinese AI lab DeepSeek to run v4 on Huawei chips - Tech in AsiaGNews AI HuaweiAmazon is selling a Samsung Galaxy tablet with AI-capabilities for just $270 - aol.comGNews AI SamsungThe Tool That Built the Modern World Is Still the Most Powerful Thing in an Engineer’s ArsenalMedium AI[P] GPU friendly lossless 12-bit BF16 format with 0.03% escape rate and 1 integer ADD decode works for AMD & NVIDIAReddit r/MachineLearningI Tested AI Coding Assistants on the Same Full-Stack App — Here’s the Real WinnerMedium AIIs the Arrow of Time a Crucial Missing Component in Artificial Intelligence?Medium AIv0.20.1: Revert "enable flash attention for gemma4 (#15296)" (#15311)Ollama ReleasesAutomation vs AI: Not Just Similar — They Solve Fundamentally Different ProblemsMedium AIWalmart's AI Checkout Converted 3x Worse. The Interface Is Why.DEV Community✨ Why Humanity Still Moves Toward AI.Medium AIBlack Hat USAAI BusinessBlack Hat AsiaAI BusinessHere's what 'cracking' bitcoin in 9 minutes by quantum computers actually meansCoinDesk AIAnthropic is having a moment in the private markets; SpaceX could spoil the partyTechCrunchChinese AI lab DeepSeek to run v4 on Huawei chips - Tech in AsiaGNews AI HuaweiAmazon is selling a Samsung Galaxy tablet with AI-capabilities for just $270 - aol.comGNews AI SamsungThe Tool That Built the Modern World Is Still the Most Powerful Thing in an Engineer’s ArsenalMedium AI[P] GPU friendly lossless 12-bit BF16 format with 0.03% escape rate and 1 integer ADD decode works for AMD & NVIDIAReddit r/MachineLearningI Tested AI Coding Assistants on the Same Full-Stack App — Here’s the Real WinnerMedium AIIs the Arrow of Time a Crucial Missing Component in Artificial Intelligence?Medium AIv0.20.1: Revert "enable flash attention for gemma4 (#15296)" (#15311)Ollama ReleasesAutomation vs AI: Not Just Similar — They Solve Fundamentally Different ProblemsMedium AIWalmart's AI Checkout Converted 3x Worse. The Interface Is Why.DEV Community✨ Why Humanity Still Moves Toward AI.Medium AI
AI NEWS HUBbyEIGENVECTOREigenvector

Aligning Multimodal Sequential Recommendations via Robust Direct Preference Optimization with Sparse MoE

arXiv cs.IRby Hejin Huang, Jusheng Zhang, Kaitong Cai, Jian Wang, Rong PanApril 1, 20261 min read0 views
Source Quiz

arXiv:2603.29259v1 Announce Type: new Abstract: Preference-based alignment objectives have been widely adopted, from RLHF-style pairwise learning in large language models to emerging applications in recommender systems. Yet, existing work rarely examines how Direct Preference Optimization (DPO) behaves under implicit feedback, where unobserved items are not reliable negatives. We conduct systematic experiments on multimodal sequential recommendation to compare common negative-selection strategies and their interaction with DPO training. Our central finding is that a simple modification, replacing deterministic hard negatives with stochastic sampling from a dynamic top-K candidate pool, consistently improves ranking performance. We attribute its effectiveness to two factors: (1) reducing er

View PDF HTML (experimental)

Abstract:Preference-based alignment objectives have been widely adopted, from RLHF-style pairwise learning in large language models to emerging applications in recommender systems. Yet, existing work rarely examines how Direct Preference Optimization (DPO) behaves under implicit feedback, where unobserved items are not reliable negatives. We conduct systematic experiments on multimodal sequential recommendation to compare common negative-selection strategies and their interaction with DPO training. Our central finding is that a simple modification, replacing deterministic hard negatives with stochastic sampling from a dynamic top-K candidate pool, consistently improves ranking performance. We attribute its effectiveness to two factors: (1) reducing erroneous suppressive gradients caused by false negatives, and (2) retaining informative hard signals while smoothing optimization via controlled stochasticity. With an optional sparse Mixture-of-Experts encoder for efficient capacity scaling, RoDPO achieves up to 5.25% NDCG@5 on three Amazon benchmarks, with nearly unchanged inference cost.

Subjects:

Information Retrieval (cs.IR); Computation and Language (cs.CL)

Cite as: arXiv:2603.29259 [cs.IR]

(or arXiv:2603.29259v1 [cs.IR] for this version)

https://doi.org/10.48550/arXiv.2603.29259

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Hejin Huang [view email] [v1] Tue, 31 Mar 2026 04:49:32 UTC (656 KB)

Was this article helpful?

Sign in to highlight and annotate this article

AI
Ask AI about this article
Powered by Eigenvector · full article context loaded
Ready

Conversation starters

Ask anything about this article…

Daily AI Digest

Get the top 5 AI stories delivered to your inbox every morning.

More about

modellanguage modelbenchmark

Knowledge Map

Knowledge Map
TopicsEntitiesSource
Aligning Mu…modellanguage mo…benchmarktrainingannounceapplicationarXiv cs.IR

Connected Articles — Knowledge Graph

This article is connected to other articles through shared AI topics and tags.

Knowledge Graph100 articles · 152 connections
Scroll to zoom · drag to pan · click to open

Discussion

Sign in to join the discussion

No comments yet — be the first to share your thoughts!