PLOT: Enhancing Preference Learning via Optimal Transport

arXiv cs.CL | by Liang Zhu, Yuelin Bai, Xiankun Ren, Jiaxi Yang, Lei Zhang, Feiteng Fang, Hamid Alinejad-Rokny, Minghuan Tan, Min Yang | April 4, 2026

Abstract: Preference learning in Large Language Models (LLMs) has advanced significantly, yet existing methods remain limited by modest performance gains, high computational costs, hyperparameter sensitivity, and insufficient modeling of global token-level relationships. We introduce PLOT, which enhances Preference Learning in fine-tuning-based alignment through a token-level loss derived from Optimal Transport. By formulating preference learning as an Optimal Transport Problem, PLOT aligns model outputs with human preferences while preserving the original distribution of LLMs, ensuring stability and robustness. Furthermore, PLOT leverages token embeddings to capture semantic relationships, enabling globally informed optimization. Experiments across two preference categories (Human Values and Logic & Problem Solving) spanning seven subpreferences demonstrate that PLOT consistently improves alignment performance while maintaining fluency and coherence. These results substantiate optimal transport as a principled methodology for preference learning, establishing a theoretically grounded framework that provides new insights for preference learning of LLMs.
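The abstract does not state the PLOT loss itself, so the following is only a generic illustration of the underlying idea: an entropic-regularized optimal transport cost between two token distributions, where the ground cost comes from distances between token embeddings. It is a minimal sketch under assumptions, not the authors' method. The function names (`cost_matrix`, `sinkhorn`, `transport_cost`), the toy three-token vocabulary, the 2-D embeddings, and the regularization value are all hypothetical.

```python
import math

def cost_matrix(emb_a, emb_b):
    """Pairwise squared Euclidean distances between two lists of embedding vectors."""
    return [[sum((x - y) ** 2 for x, y in zip(u, v)) for v in emb_b] for u in emb_a]

def sinkhorn(p, q, cost, reg=0.5, n_iters=500):
    """Entropic-regularized transport plan between histograms p and q.

    Uses plain Sinkhorn scaling iterations; p and q must be strictly
    positive and each sum to 1.
    """
    n, m = len(p), len(q)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows to match p, then columns to match q.
        u = [p[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [q[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

def transport_cost(plan, cost):
    """Total cost of moving mass according to the plan."""
    return sum(plan[i][j] * cost[i][j]
               for i in range(len(plan)) for j in range(len(plan[0])))

# Toy setup (assumed): three tokens with 2-D embeddings.
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
cost = cost_matrix(embeddings, embeddings)

model_probs = [0.7, 0.2, 0.1]    # hypothetical model distribution over tokens
target_probs = [0.1, 0.2, 0.7]   # hypothetical preferred distribution

plan = sinkhorn(model_probs, target_probs, cost)
print("transport cost:", transport_cost(plan, cost))
```

Because the cost matrix is built from embeddings, moving probability mass between semantically close tokens is cheaper than moving it between distant ones, which is one way a loss can take relationships across the whole vocabulary into account. A production implementation would more likely use a tensor library and a log-domain Sinkhorn for numerical stability.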

Subjects: Computation and Language (cs.CL)

Cite as: arXiv:2604.01837 [cs.CL] (or arXiv:2604.01837v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2604.01837

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Liang Zhu
[v1] Thu, 2 Apr 2026 09:51:56 UTC (79 KB)

Original source

arXiv cs.CL

https://arxiv.org/abs/2604.01837
