Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages
Authors: David Samuel, Lilja Øvrelid, Erik Velldal, Andrey Kutuzov
Abstract: We propose a post-training method for lower-resource languages that preserves the fluency of language models even when aligned by disfluent reward models. Preference optimization is now a well-researched topic, but previous work has mostly addressed models for English and Chinese. Lower-resource languages lack both datasets written by native speakers and instruction-tuned language models capable of generating fluent synthetic data. To address this, we focus on developing a fluent preference-aligned language model without any instruction-tuning data in the target language. Our approach uses an on-policy training method, which we compare with two common alternatives: supervised finetuning on machine-translated data and multilingual finetuning. We conduct a case study on Norwegian Bokmål and evaluate fluency through native-speaker assessments. The results show that the on-policy aspect is crucial and outperforms the alternatives without relying on any hard-to-obtain data.
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as: arXiv:2512.08777 [cs.CL]
(or arXiv:2512.08777v2 [cs.CL] for this version)
https://doi.org/10.48550/arXiv.2512.08777
arXiv-issued DOI via DataCite
Journal reference: The Fourteenth International Conference on Learning Representations (ICLR 2026)
Submission history
From: David Samuel
[v1] Tue, 9 Dec 2025 16:31:48 UTC (620 KB)
[v2] Fri, 27 Mar 2026 10:41:05 UTC (664 KB)