PLOT: Enhancing Preference Learning via Optimal Transport
arXiv:2604.01837v1 Announce Type: new Abstract: Preference learning in Large Language Models (LLMs) has advanced significantly, yet existing methods remain limited by modest performance gains, high computational costs, hyperparameter sensitivity, and insufficient modeling of global token-level relationships. We introduce PLOT, which enhances Preference Learning in fine-tuning-based alignment through a token-level loss derived from Optimal Transport. By formulating preference learning as an Optimal Transport Problem, PLOT aligns model outputs with human preferences while preserving the original distribution of LLMs, ensuring stability and robustness. Furthermore, PLOT leverages token embeddings to capture semantic relationships, enabling globally informed optimization. Experiments across tw
View PDF HTML (experimental)
Abstract:Preference learning in Large Language Models (LLMs) has advanced significantly, yet existing methods remain limited by modest performance gains, high computational costs, hyperparameter sensitivity, and insufficient modeling of global token-level relationships. We introduce PLOT, which enhances Preference Learning in fine-tuning-based alignment through a token-level loss derived from Optimal Transport. By formulating preference learning as an Optimal Transport Problem, PLOT aligns model outputs with human preferences while preserving the original distribution of LLMs, ensuring stability and robustness. Furthermore, PLOT leverages token embeddings to capture semantic relationships, enabling globally informed optimization. Experiments across two preference categories - Human Values and Logic & Problem Solving - spanning seven subpreferences demonstrate that PLOT consistently improves alignment performance while maintaining fluency and coherence. These results substantiate optimal transport as a principled methodology for preference learning, establishing a theoretically grounded framework that provides new insights for preference learning of LLMs.
Subjects:
Computation and Language (cs.CL)
Cite as: arXiv:2604.01837 [cs.CL]
(or arXiv:2604.01837v1 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2604.01837
arXiv-issued DOI via DataCite (pending registration)
Submission history
From: Liang Zhu [view email] [v1] Thu, 2 Apr 2026 09:51:56 UTC (79 KB)
Sign in to highlight and annotate this article

Conversation starters
Daily AI Digest
Get the top 5 AI stories delivered to your inbox every morning.
More about
modellanguage modelannounce
My forays into cyborgism: theory, pt. 1
In this post, I share the thinking that lies behind the Exobrain system I have built for myself. In another post, I'll describe the actual system. I think the standard way of relating to LLM/AIs is as an external tool (or "digital mind") that you use and/or collaborate with. Instead of you doing the coding, you ask the LLM to do it for you. Instead of doing the research, you ask it to. That's great, and there is utility in those use cases. Now, while I hardly engage in the delusion that humans can have some kind of long-term symbiotic integration with AIs that prevents them from replacing us [1] , in the short term, I think humans can automate, outsource, and augment our thinking with LLM/AIs. We already augment our cognition with technologies such as writing and mundane software. Organizi
Knowledge Map
Connected Articles — Knowledge Graph
This article is connected to other articles through shared AI topics and tags.







Discussion
Sign in to join the discussion
No comments yet — be the first to share your thoughts!