Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
arXiv:2510.00526v2 Announce Type: replace-cross
Authors: Gaotang Li, Ruizhong Qiu, Xiusi Chen, Heng Ji, Hanghang Tong
Abstract: Supervised fine-tuning (SFT) is the standard approach for post-training large language models (LLMs), yet it often shows limited generalization. We trace this limitation to its default training objective: negative log likelihood (NLL). While NLL is classically optimal when training from scratch, post-training operates in a different paradigm and could violate its optimality assumptions, where models already encode task-relevant priors and supervision can be long and noisy. Rather than proposing a single universally superior replacement loss, we systematically study various probability-based objectives and characterize when and why different objectives succeed or fail under varying conditions. Through comprehensive experiments and extensive ablation studies across 8 model backbones, 27 benchmarks, and 7 domains, we uncover a critical dimension that governs objective behavior: the model-capability continuum. Near the model-strong end, prior-leaning objectives that downweight low-probability tokens (e.g., $-p$, $-p^{10}$, thresholded variants) consistently outperform NLL; toward the model-weak end, NLL dominates; in between, no single objective prevails. Our theoretical analysis further elucidates how objectives trade places across the continuum, providing a principled foundation for adapting objectives to model capability. The code is provided at this https URL.
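To make the contrast between NLL and the prior-leaning objectives concrete, the following is a minimal sketch of per-token losses as functions of the probability p that the model assigns to the target token. It is not the authors' implementation: the function names, the `tau` cutoff, and the exact form of the thresholded variant are assumptions made for illustration, since the abstract names these objectives without defining them. The helper `weight_on_logp` reports the size of each loss's derivative with respect to log p, which is one simple way to see how strongly a token of a given probability drives the update.

```python
import math


def token_loss(p, objective="nll", k=10, tau=0.1):
    """Loss on one target token whose predicted probability is p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    if objective == "nll":
        return -math.log(p)
    if objective == "neg_p":
        return -p
    if objective == "neg_p_pow":
        return -(p ** k)
    if objective == "thresholded_nll":
        # Assumed form of a thresholded variant: tokens whose probability
        # falls below tau contribute nothing to the loss.
        return -math.log(p) if p >= tau else 0.0
    raise ValueError(f"unknown objective: {objective}")


def weight_on_logp(p, objective="nll", k=10):
    """Magnitude of d(loss)/d(log p) for the smooth objectives."""
    if objective == "nll":
        return 1.0           # every token counts equally
    if objective == "neg_p":
        return p             # weight shrinks linearly with p
    if objective == "neg_p_pow":
        return k * p ** k    # weight vanishes quickly for small p
    raise ValueError(f"unknown objective: {objective}")


def sequence_loss(probs, objective="nll", **kwargs):
    """Mean per-token loss over the target-token probabilities of a sequence."""
    return sum(token_loss(p, objective, **kwargs) for p in probs) / len(probs)


if __name__ == "__main__":
    for p in (0.01, 0.5, 0.99):
        weights = {
            name: weight_on_logp(p, name)
            for name in ("nll", "neg_p", "neg_p_pow")
        }
        print(p, weights)
```

Under these assumptions, NLL gives a low-probability token the same weight in log-probability space as a high-probability one, while $-p$ and $-p^{10}$ scale that weight down as p shrinks. This matches the abstract's description of prior-leaning objectives as ones that downweight low-probability tokens.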
Comments: 28 pages, 6 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as: arXiv:2510.00526 [cs.CL]
(or arXiv:2510.00526v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2510.00526
arXiv-issued DOI via DataCite
Submission history
From: Gaotang Li
[v1] Wed, 1 Oct 2025 05:17:47 UTC (233 KB)
[v2] Fri, 27 Mar 2026 05:33:29 UTC (271 KB)