MuonEq: Balancing Before Orthogonalization with Lightweight Equilibration

arXiv · March 31, 2026

arXiv:2603.28254v1 Announce Type: new

Authors: Da Chang, Qiankun Shi, Lvgang Zhang, Yu Li, Ruijie Zhang, Yao Lu, Yongxiang Liu, Ganzhao Yuan

Abstract: Orthogonalized-update optimizers such as Muon improve training of matrix-valued parameters, but existing extensions mostly act either after orthogonalization by rescaling updates or before it with heavier whitening-based preconditioners. We introduce MuonEq, a lightweight family of pre-orthogonalization equilibration schemes for Muon in three forms: two-sided row/column normalization (RC), row normalization (R), and column normalization (C). These variants rebalance the momentum matrix before finite-step Newton–Schulz using row/column squared-norm statistics and only $\mathcal{O}(m+n)$ auxiliary state. We show that finite-step orthogonalization is governed by input spectral properties, especially stable rank and condition number, and that row/column normalization is a zeroth-order whitening surrogate that removes marginal scale mismatch. For the hidden matrix weights targeted by MuonEq, the row-normalized variant R is the natural default and preserves the $\widetilde{\mathcal{O}}(T^{-1/4})$ stationarity guarantee of Muon-type methods. In LLaMA2 pretraining on C4, the default R variant consistently outperforms Muon on 130M and 350M models, yielding faster convergence and lower validation perplexity.
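
The abstract does not spell out the exact update rule, so the following is only a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It assumes variant R divides each row of the momentum matrix by the square root of that row's squared norm (C does the same for columns), computed from the current momentum alone rather than from any running statistics the paper may keep in its $\mathcal{O}(m+n)$ state; RC is not shown. It also swaps in the classic cubic Newton–Schulz iteration $X \leftarrow 1.5X - 0.5XX^\top X$ in place of the tuned polynomial used in Muon implementations. The helper names (`row_normalize`, `col_normalize`, `newton_schulz`, `muon_eq_direction`, `orth_error`) and the example matrix are hypothetical.

```python
import math

Matrix = list[list[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product; adequate for the tiny matrices used here."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def row_normalize(m: Matrix, eps: float = 1e-8) -> Matrix:
    """Variant R (assumed form): scale each row by 1/sqrt(its squared norm)."""
    scales = [1.0 / math.sqrt(sum(v * v for v in row) + eps) for row in m]
    return [[x * s for x in row] for row, s in zip(m, scales)]


def col_normalize(m: Matrix, eps: float = 1e-8) -> Matrix:
    """Variant C (assumed form): the same rescaling applied to columns."""
    return transpose(row_normalize(transpose(m), eps))


def newton_schulz(m: Matrix, steps: int = 5) -> Matrix:
    """Finite-step cubic Newton-Schulz approximation of the orthogonal polar factor."""
    # Frobenius normalization (as in Muon) puts every singular value in (0, 1],
    # inside the cubic iteration's convergence region (0, sqrt(3)).
    fro = math.sqrt(sum(v * v for row in m for v in row)) + 1e-12
    x = [[v / fro for v in row] for row in m]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        # Each singular value s is mapped to 1.5*s - 0.5*s**3.
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxt_x)]
    return x


def muon_eq_direction(momentum: Matrix, variant: str = "R", steps: int = 5) -> Matrix:
    """Equilibrate, then orthogonalize; variant="none" gives a plain Muon-style direction."""
    if variant == "R":
        momentum = row_normalize(momentum)
    elif variant == "C":
        momentum = col_normalize(momentum)
    elif variant != "none":
        raise ValueError(f"unknown variant: {variant}")
    return newton_schulz(momentum, steps)


def orth_error(x: Matrix) -> float:
    """Largest |entry| of X X^T - I; zero when the rows of X are orthonormal."""
    g = matmul(x, transpose(x))
    n = len(g)
    return max(abs(g[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n))


if __name__ == "__main__":
    # Illustrative momentum whose two rows differ in scale by about 500x.
    m = [[10.0, 5.0], [0.01, 0.02]]
    print("plain:", orth_error(muon_eq_direction(m, "none")))
    print("R:    ", orth_error(muon_eq_direction(m, "R")))
```

The sketch makes the spectral argument concrete. Frobenius normalization places the largest singular value at $1/\sqrt{\text{stable rank}}$ and the smallest at that value divided by the condition number, and each cubic step multiplies a small singular value by at most 1.5. In the example, the plain input's smallest normalized singular value is about 0.001, so five steps leave it far from orthogonal, whereas row normalization brings the condition number down to about 3 and five steps nearly orthogonalize it.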

Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)

Cite as: arXiv:2603.28254 [cs.LG]

(or arXiv:2603.28254v1 [cs.LG] for this version)

https://doi.org/10.48550/arXiv.2603.28254

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Da Chang

[v1] Mon, 30 Mar 2026 10:28:18 UTC (403 KB)
