Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages

arXiv · March 30, 2026

Authors: David Samuel, Lilja Øvrelid, Erik Velldal, Andrey Kutuzov

Abstract: We propose a post-training method for lower-resource languages that preserves the fluency of language models even when aligned by disfluent reward models. Preference optimization is now a well-researched topic, but previous work has mostly addressed models for English and Chinese. Lower-resource languages lack both datasets written by native speakers and instruction-tuned language models capable of generating fluent synthetic data. To address this, we focus on developing a fluent preference-aligned language model without any instruction-tuning data in the target language. Our approach uses an on-policy training method, which we compare with two common alternatives: supervised finetuning on machine-translated data and multilingual finetuning. We conduct a case study on Norwegian Bokmål and evaluate fluency through native-speaker assessments. The results show that the on-policy aspect is crucial and outperforms the alternatives without relying on any hard-to-obtain data.
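
The abstract does not spell out the training algorithm, so the following is only a generic sketch of what "on-policy" preference data collection can look like, not the authors' method. Candidates are sampled from the model being trained, and a reward function (which may itself be an imperfect judge) is used only to rank them. All names here (`build_on_policy_pairs`, `toy_policy`, `toy_reward`, `RESPONSES`) are hypothetical, and the toy policy and reward stand in for a real language model and reward model.

```python
import random
from typing import Callable, List, Tuple

def build_on_policy_pairs(
    prompts: List[str],
    sample_fn: Callable[[str, random.Random], str],
    reward_fn: Callable[[str, str], float],
    num_samples: int = 4,
    seed: int = 0,
) -> List[Tuple[str, str, str]]:
    """Return (prompt, chosen, rejected) triples built from the policy's own samples."""
    rng = random.Random(seed)
    pairs = []
    for prompt in prompts:
        # On-policy: every candidate comes from the model being trained,
        # so both sides of each pair are text that model produced itself.
        candidates = [sample_fn(prompt, rng) for _ in range(num_samples)]
        # The reward function only ranks candidates; it never writes text.
        ranked = sorted(candidates, key=lambda c: reward_fn(prompt, c))
        chosen, rejected = ranked[-1], ranked[0]
        if chosen != rejected:  # skip prompts where all samples were identical
            pairs.append((prompt, chosen, rejected))
    return pairs

# Hypothetical stand-ins for a language model and a reward model.
RESPONSES = ["short", "a longer answer", "a much longer and detailed answer"]

def toy_policy(prompt: str, rng: random.Random) -> str:
    return rng.choice(RESPONSES)

def toy_reward(prompt: str, response: str) -> float:
    return float(len(response.split()))  # toy proxy: prefers longer answers

pairs = build_on_policy_pairs(["q1", "q2"], toy_policy, toy_reward)
```

The resulting triples could be passed to any preference-optimization objective. In contrast, supervised finetuning on machine-translated data trains directly on text the model did not produce, which is the alternative the abstract compares against.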

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2512.08777 [cs.CL]

(or arXiv:2512.08777v2 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2512.08777

arXiv-issued DOI via DataCite

Journal reference: The Fourteenth International Conference on Learning Representations (ICLR 2026)

Submission history

From: David Samuel
[v1] Tue, 9 Dec 2025 16:31:48 UTC (620 KB)
[v2] Fri, 27 Mar 2026 10:41:05 UTC (664 KB)
